But there is another format that is even more official, and is maintained by the US Census Bureau. That is a collection of Shapefiles.
The example is
The columns are:
To recap, here is a site where GeoJSON files for the Us are available in multiple sizes (small, medium, large), as well as their .kml and Shapefile .shp equivalents. The sizes are 500k, 5m, 20m, from largest to smallest (the labels are the scales, 500k being the most detailed).
The files were obtained from links on this page. It includes data for the US, for the states, and for counties. And it contains Congressional districts, which would be useful to remember.
The specification was developed at least partly by ESRI, which develops geographic information software.. The encoding was undoubtedly designed to save space, back when space (storage, transmission bandwidth) was much more expensive. Now, the opaque data is a liability.
Shapefiles
There is actually not just one file, but always a minimum of three including .shp, .shx and .dbf inside a zip container. The Shapefiles from the US Census also have .prj and .xml.
I can't tell much by looking at one with hexdump, except that most of it is aligned (in sections), on 16-byte boundaries. The format is described at this link, but I haven't worked with that.
One way to open and read a Shapefile is to use geopandas. I grab that with pip3 install geopandas
>>> d = 'gz_2010_us_040_00_20m' >>> fn = d + '/gz_2010_us_040_00_20m.shp' >>> df = gpd.read_file(fn)
The columns are:
- GEO_ID
- STATE
- NAME
- LSAD
- CENSUSAREA
- geometry
>>> df.NAME.head()
0 Arizona
1 Arkansas
2 California
3 Colorado
4 Connecticut
Name: NAME, dtype: object
For some reason the order is several joined partial lists of states, each one alphabetized.
We need to extract the coordinates for a particular state:
We need to extract the coordinates for a particular state:
import geopandas as gpdfn = 'ex.shp'
df = gpd.read_file(fn)sel = df['NAME'] == 'Maine'
g = me = df.loc[sel].geometryfrom shapely.geometry import mapping
D = mapping(g)for f in D['features']:
print(f['id'])
L = f['geometry']['coordinates']
for m in L:
print(len(m[0]))
print(m[0][0])
print(m[0][1])
print()
> python3 script.py
39
11
(-69.307908, 43.773767)
(-69.30675099999999, 43.775095)
11
(-69.42792, 43.928798)
(-69.423323, 43.922871)
...