from data import insight :
    About     Archive     Feed

I Have No Idea What I'm Doing

This blog was set up during my data science bootcamp, and they taught us “just enough to be dangerous” about how to make it all work.

When I decided maybe I should blog a bit, there were a lot of updates to apply, and it looks like there are a bunch of interesting Jekyll features that weren’t covered that I might want to experiement with in the future.

Which is just a fancy way of warning: “This blog is under construction” and things may update randomly as I experiment some.

Analyzing Angles with Pingouin

pingouin is a python package for calculating statistics on data organized in pandas dataframes. It has an easier to use interface than stats-models and a batteries-included philosophy where operations that maybe take multiple function calls in scipy.stats are rolled into one call in pingouin. The author calls it simple-but-exhaustive statistics.

While poking around the capabilities of this package, I discovered the circular module. I’d never heard of circular statistics before, but there are a lot of angles in the data I work with every day – and angles don’t always “play nice” with other types of scalar data. For example, if we’re talking about compass headings, the difference between a heading of 45 degrees and a heading of 48 degrees is 2 degrees. But, the difference between a heading of 358 degrees and 0 degrees is also 2 degrees. You can’t treat angles like other types of scalar data.

Even if you don’t work in physical coordinate systems, any data with dates and times can be worked in circular statistics. A year is a 360 degree trip around the sun. A day is a 360 degree rotation of the earth. Anything that cycles can be turned into an angle. Tuesday is just an angle away from Sunday. By converting date-times to radians, you can treat 11:59pm on Monday and 12:02am on Tuesday as the near-identical times they really are, rather than as two separate calendar days.

I generally think in degrees (it’s easier to imaging a 15 degree angle in my mind than a .26 radian angle, but trigonometry runs on radians. So, the first thing we need to do with any type of circular data analysis is convert it into radians. For radar data, np.radians works great. But for other types of data, pingouin’s covert_angles function lets you specify the number of “units” in your particular kind of circle.

Once your data is in radians, what sort of stats can you do? There is no simple analog to linear regression, unfortunately, but there are correlation functions for circular-to-circular or circular-to-scalar variables, ways to calculate values analogous to scalar mean and variance, and checks for a uniform circular distribution.

Check them out!