Monday, June 19, 2023

Exploring Hawaii

Continuing from the previous post, I thought I would summarize how to drill down to individual polygons in a GeoDataFrame containing the Hawaii data.

I put the code to load the data in another file and imported it, so we start with hw.gdf.

gdf = hw.gdf       # gdf is a GeoDataFrame
print(gdf.shape)   # (1,6) a single row
gs = gdf.geometry  # gs is a GeoSeries

So gdf.geometry (or gdf['geometry'], it's basically the same) is a GeoSeries object containing all the geometry data of the original data frame. The GeoSeries allows indexing by the special .iloc notation (below). The first item is a MultiPolygon, a collection of Polygon objects. These are obtained from the Multipolygon with the geoms attribute.

mp = gs.iloc[0]    # mp is a MultiPolygon
p0 = mp.geoms[0]   # p1 is a Polygon
ext = p0.exterior  # ext is a LinearRing

We want to do some shapely geometry operations with the LinearRing. The reason to do all this is that there doesn't seem to be anything in the data identifying the individual islands. The rest of the listing (below) shows the code.

The last step is to plot the data, and this is best done with the original GeoDataFrame's plot method. It is key to capture the returned Axes to use for adding text later.

The task of actually annotating the plot is fussy and i direct you to the github repo for that.

The code below prints:

> p3 explore_hawaii.py
(1, 6)
(9, 6)
0 Hawaii
2 Ni'ihau
3 Kauai
4 Molokai
5 Kaho'olawe
6 Maui
7 Lanai
8 O'ahu
8 Ford Island
> 

We now have assigned names for each of the Polygons representing islands. Ford Island (poly no. 1) does not seem to "contain" the point I picked for it, but that point is contained in the polygon for O'ahu.

import sys,os,subprocess
import geopandas as gpd

import matplotlib as mpl
import matplotlib.pyplot as plt
from shapely.geometry import Point

import hawaii as hw

#---------------------------------

fig,ax = plt.subplots()

gdf = hw.gdf        # gdf is a GeoDataFrame
print(gdf.shape)   # (1,6) a single row
gs = gdf.geometry  # gs is a GeoSeries

mp = gs.iloc[0]    # mp is a MultiPolygon
p1 = mp.geoms[0]   # p1 is a Polygon
ext = p1.exterior  # ext is a LinearRing

# this plots all the islands the same color
# gdf.boundary.plot(ax=ax,cmap='magma')

# so dissolve the Multi collection
exp = gdf.explode(index_parts=True)
print(exp.shape)   # (9,6) 9 islands

# we want to know which island is in each row
# a random point inside p1 (island of Hawaii)

D = {"Hawaii":[-155.519783,19.625055], 
     "Kaho'olawe":[-156.607857,20.550829],
     "Kauai":[-159.567160,22.017814],
     "Lanai":[-156.930387,20.834303],
     "Maui":[-156.279557,20.758340],
     "Molokai":[-156.986996,21.134644],
     "Ni'ihau":[-160.148047,21.904692],
     "O'ahu":[-157.968125,21.488976],
     "Ford Island":[-157.959627,21.363596]}

# each row of exp is a Series (not a Polygon)
def f(row):
    poly = row.geometry
    n = row.name[1]
    for k in D:
        xy = D[k]
        if poly.contains(Point(xy)):
            print(n,k)
        
exp.apply(f,axis=1)