Here is the help page for the covid project (it is the same for most of the scripts):
> python3 one_state.py --help
flags
  -h  --help     help
  -n  <int>      display the last n values, default: 7
  -N  <int>      display N rows of data: default: 50
  -c  --delta    change or delta, display day over day rise
  -d  --deaths   display deaths rather than cases (default)
  -r  --rate     compute statistics
  -s  --sort     (only if stats are asked for)
to do:
  -u  <int>      data slice ends this many days before yesterday
  -p  --pop      normalize to population
example:
  python one_state.py [state] -n 10 -sr
And here is today's output for a similar command (using the default of 7 values):
> python3 one_state.py SC -rs
              06/17  06/18  06/19  06/20  06/21  06/22  06/23  stats
Charleston     1264   1403   1554   1728   1836   2044   2251  0.094
Oconee           95    100    105    110    136    154    142  0.083
Pickens         348    367    429    464    499    529    570  0.083
Calhoun          47     48     58     62     69     73     74  0.082
...
total         20556  21533  22608  23756  24661  25666  26572  0.043
The statistic is the slope of a linear regression on the case counts, divided by the mean of those counts. The counties in my state (SC) are then sorted by the result. Charleston is my county and, unfortunately, it is the county with the highest growth rate of cases in the state. Currently in the US, the top 20 counties are:
> python3 us_by_counties.py -rs -n 4 -N 20
                  06/20  06/21  06/22  06/23  stats
Thomas, KS            0      0     10     12  0.836
Hot Spring, AR       53     53    138    226  0.514
Holmes, FL           47     47     58    121  0.341
Jim Wells, TX        22     27     34     46  0.245
Brewster, TX         24     24     39     45  0.236
Erath, TX            44     44     44     85  0.227
McDonald, MO        170    366    371    403  0.215
Sharkey, MS           9      9     13     16  0.213
Blanco, TX           14     14     22     24  0.205
Newton, TX            6      8     11     11  0.2
Aroostook, ME        11     17     19     21  0.188
Tehama, CA           34     34     53     54  0.181
Sioux, ND            12     12     19     19  0.181
Okfuskee, OK          7      7     11     11  0.178
Bourbon, KS           9      9     14     14  0.174
Lawrence, MO         11     13     13     19  0.171
Harvey, KS           13     13     20     20  0.17
Pontotoc, MS         93     93    128    146  0.169
Letcher, KY           8      8      8     13  0.162
Live Oak, TX         10     10     15     15  0.16
I chose n = 4 so the output would be formatted correctly for the blog post.
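Here is a minimal sketch of that statistic, assuming it is the least-squares slope of the counts against the day index, divided by the mean count. The function name growth_rate and the treatment of an all-zero series are illustrative assumptions, not taken from the scripts themselves.

```python
def growth_rate(values):
    """Least-squares slope of values vs. day index, divided by the mean value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    mean_x = (n - 1) / 2          # mean of the day indices 0, 1, ..., n-1
    mean_y = sum(values) / n
    if mean_y == 0:
        return 0.0                # assumption: a county with no cases has zero growth
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return (sxy / sxx) / mean_y


# Case counts copied from the South Carolina output above.
counties = {
    "Pickens": [348, 367, 429, 464, 499, 529, 570],
    "Calhoun": [47, 48, 58, 62, 69, 73, 74],
    "Charleston": [1264, 1403, 1554, 1728, 1836, 2044, 2251],
    "Oconee": [95, 100, 105, 110, 136, 154, 142],
}

# Sort county names by the statistic, largest first (what -s does in the output above).
ranked = sorted(counties, key=lambda name: growth_rate(counties[name]), reverse=True)

for name in ranked:
    print(f"{name:<12}{growth_rate(counties[name]):.3f}")
```

With these inputs the rounded values agree with the stats column shown earlier (0.094 for Charleston), and the same function gives 0.836 for the Thomas, KS row.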
As with any large dataset, there are some problems to work through, and not all of them are solved yet. I've also focused more on the U.S. lately, so the scripts for world data haven't been updated either.
What I got interested in, and want to show here, is generating maps of the US by state or by county (or of one or a few states by county), where the fill color is based on, for example, the growth rate of cases. Here is the US by states.
I haven't generated the color bar horizontally yet; I just cut it out and rotated it, so the writing is rotated as well.
This type of map is called a choropleth map. I stumbled across a Python tool for generating such maps; it's part of the plotly library. It is as simple as:
    import plotly.express as px

    # df, abbrev and st are described below
    fig = px.choropleth(
        df,
        locations=abbrev,
        locationmode='USA-states',
        color=st,
        color_continuous_scale='Plasma',
        scope="usa",
        labels={'color': 'growth'})
    fig.show()
The details are slightly complicated, but not bad.
df is a pandas DataFrame that maps each state's two-letter abbreviation to the corresponding statistic:
    import pandas as pd
    df = pd.DataFrame(data={'state': abbrev, 'value': st})
You need GeoJSON data for a county map (the state outlines are already known to plotly.express). Plotly makes such a data file available.
The colors are mapped to the statistic st as read from the data frame. The last line of the call to px.choropleth assigns the title to the color bar.
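For the county version, a rough sketch might look like the following. It assumes a local copy of a county GeoJSON file in which each feature's top-level id is the five-digit FIPS code. The path counties.json and the helper names load_geojson, features_for_state and county_figure are hypothetical. Filtering to one state is optional and only keeps the figure small.

```python
import json


def load_geojson(path):
    """Read a GeoJSON file from disk."""
    with open(path) as fh:
        return json.load(fh)


def features_for_state(geojson, state_fips):
    """Keep only features whose id starts with the two-digit state FIPS code.

    Assumes each feature carries its county FIPS code as a top-level "id".
    """
    keep = [
        feature
        for feature in geojson["features"]
        if str(feature.get("id", "")).startswith(state_fips)
    ]
    return {"type": "FeatureCollection", "features": keep}


def county_figure(geojson, growth_by_fips):
    """Build a county choropleth from a {fips: statistic} mapping."""
    # Imported here so the helpers above can be used without plotly installed.
    import plotly.express as px

    fips = list(growth_by_fips)
    fig = px.choropleth(
        geojson=geojson,                  # county outlines
        locations=fips,                   # matched against each feature's "id"
        color=[growth_by_fips[f] for f in fips],
        color_continuous_scale="Plasma",
        labels={"color": "growth"},       # title for the color bar
    )
    # Zoom to the counties that were actually passed in.
    fig.update_geos(fitbounds="locations", visible=False)
    return fig


# Usage (not run here; "counties.json" is a hypothetical local file):
# sc = features_for_state(load_geojson("counties.json"), "45")  # 45 = South Carolina
# county_figure(sc, {"45019": 0.094, "45073": 0.083}).show()
```

The locations values are matched against each feature's id, so the FIPS codes must be strings with their leading zeros intact.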
This is the state of South Carolina today. These colors are an attempt to make the positives pop out more.
There's much more to discuss. I have always wanted to make a map of the US with my road trips plotted on it, something like this. For that we need to talk about GeoJSON data and how to obtain it, as well as the Albers projection that is used in making maps. It turns out that the standard methods from plotly have a significant limitation, and I had a really weird bug in my code that I eventually figured out.
Finally, we'll need to find how to generate the data for each individual trip to overlay on the map. That's all for later.