This blog was set up during my data science bootcamp, and they taught us “just enough to be
dangerous” about how to make it all work.
When I decided maybe I should blog a bit, there were a lot of updates to apply, and it looks like there are a bunch
of interesting Jekyll features that weren’t covered that I might want to experiment with in the future.
Which is just a fancy way of warning: “This blog is under construction” and things
may update randomly as I experiment some.
pingouin is a Python package for
calculating statistics on data organized in pandas dataframes. It has
an easier-to-use interface than statsmodels and a batteries-included
philosophy, where operations that might take multiple function calls in
scipy.stats are rolled into one call in pingouin. The author calls
it simple-but-exhaustive statistics.
While poking around the capabilities of this package, I discovered the
circular module. I’d never heard of circular statistics before, but
there are a lot of angles in the data I work with every day – and angles
don’t always “play nice” with other types of scalar data.
For example, if we’re talking about compass headings, the difference between
a heading of 45 degrees and a heading of 47 degrees is 2 degrees.
But, the difference between a heading of 358 degrees and 0 degrees is also
2 degrees. You can’t treat angles like other types of scalar data.
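Here is a minimal sketch of that wrap-around arithmetic in plain Python. The helper name `angular_difference` is my own for illustration and is not part of pingouin.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees."""
    # Wrap the raw difference into [0, 360) so that 358 and 0 sit next to each other.
    diff = (a_deg - b_deg) % 360.0
    # Going the "other way" around the circle may be shorter.
    return min(diff, 360.0 - diff)


near_northeast = angular_difference(45, 47)
across_north = angular_difference(358, 0)
print(near_northeast, across_north)
```

Plain subtraction would report 358 degrees for the second pair, which is exactly the trap that circular statistics is meant to avoid.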
Even if you don’t work in physical coordinate systems, any data
with dates and times can be analyzed with circular statistics.
A year is a 360 degree trip around the sun. A day is a 360 degree
rotation of the earth. Anything that cycles can
be turned into an angle. Tuesday is just an angle away from Sunday.
By converting date-times to radians, you can treat 11:59pm on Monday and
12:02am on Tuesday as the near-identical times they really are, rather than as
two separate calendar days.
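As a rough sketch of the idea, here is one way to map a time of day onto a circle using only the standard library. The function names and the example dates are my own assumptions for illustration; this is not how pingouin does it internally.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def time_of_day_to_radians(ts):
    """Map a datetime's time of day onto [0, 2*pi)."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return 2 * math.pi * seconds / SECONDS_PER_DAY


def circular_gap(a, b):
    """Smallest angular distance between two angles in radians."""
    diff = (a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


late_monday = datetime(2024, 1, 1, 23, 59)   # 11:59pm on a Monday
early_tuesday = datetime(2024, 1, 2, 0, 2)   # 12:02am on the following Tuesday

gap = circular_gap(
    time_of_day_to_radians(late_monday),
    time_of_day_to_radians(early_tuesday),
)
# Convert the angular gap back to minutes to make it readable.
gap_minutes = gap / (2 * math.pi) * 24 * 60
print(round(gap_minutes, 6))
```

On the circle, the two timestamps are only a few minutes apart, even though they fall on different calendar days.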
I generally think in degrees (it’s easier
to imagine a 15 degree angle in my mind than a 0.26 radian angle), but
trigonometry runs on radians.
So, the first thing we need to do with any type of circular data analysis is
convert it into radians.
For radar data, np.radians works great. But for other types of data,
pingouin’s convert_angles function
lets you specify the number of “units” in your particular kind of circle.
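To show what "units in your circle" means, here is a small stand-in written in plain Python. The `to_radians` helper and its `units_per_circle` argument are hypothetical names of mine. pingouin's own function has its own signature and defaults, so check its documentation before relying on the details.

```python
import math


def to_radians(values, units_per_circle):
    """Scale values measured in arbitrary circular units to radians in [0, 2*pi)."""
    return [
        2 * math.pi * (v % units_per_circle) / units_per_circle
        for v in values
    ]


# A compass has 360 units per circle, a clock day has 24.
headings = to_radians([0, 90, 180, 270], units_per_circle=360)
hours = to_radians([0, 6, 12, 18], units_per_circle=24)

for h, t in zip(headings, hours):
    print(round(h, 4), round(t, 4))
```

A quarter turn is a quarter turn whether it is measured as 90 degrees or as 6 hours, so both lists land on the same angles.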
Once your data is in radians, what sort of stats can you do?
There is no simple analog to linear regression, unfortunately, but
there are correlation functions for circular-to-circular or circular-to-scalar
variables, ways to calculate values analogous to scalar mean and variance,
and checks for a uniform circular distribution.
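To make "analogous to scalar mean and variance" concrete, here is a sketch of the usual textbook approach: treat each angle as a unit vector, average the vectors, and read off the direction and length of the result. I believe pingouin's circular module exposes this kind of calculation through functions such as circ_mean and circ_r, but the code below is my own plain-Python illustration, and the sample headings are made up.

```python
import math


def circular_mean(angles):
    """Direction of the average unit vector, in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def resultant_length(angles):
    """Length of the average unit vector: near 1 means tightly clustered, near 0 means spread out."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.hypot(s, c) / len(angles)


# Made-up headings clustered around north.
headings_deg = [350, 355, 5, 10]
angles = [math.radians(d) for d in headings_deg]

naive_mean = sum(headings_deg) / len(headings_deg)        # points due south
mean_deg = math.degrees(circular_mean(angles)) % 360      # lands at north (0, or equivalently 360)
spread = 1 - resultant_length(angles)                     # one common definition of circular variance

print(naive_mean, mean_deg, spread)
```

The arithmetic mean of these headings points in exactly the wrong direction, while the vector-based mean stays at north where the data actually cluster.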