Geoviz Tutorial¶

Installation and setup¶

The latest and older versions of geoviz are available from http://pypi.org/project/geoviz/ and can be installed with pip install geoviz or pip3 install geoviz.

geoviz.choropleth is the main module to import. geoviz.preprocess is also useful when the data requires pre-processing.

import pandas as pd

import geoviz.choropleth as choro
import geoviz.preprocess as prc

Load and view sample data¶

There are three datasets here:

supp: county-level data on the amount of suppression in census data. It has both sequential (low to high) data, "pct_suppressed", and divergent (negative to positive) data, "resid".

fitn: MSA-level economic fitness scores. It has sequential data, "f_2016".

search: state-level search frequency data (beaches vs mountains).

supp = pd.read_csv('geoviz/data/test/cbp_suppression_2016_EMP.csv', dtype={'GEO.id2':str})
fitn = pd.read_csv('geoviz/data/test/fitness_msa.csv', dtype={'msa_code':str})
search = pd.read_csv('geoviz/data/test/states_beach_mountain.csv')
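
The dtype arguments above keep the geographic codes as strings. This matters because FIPS codes can start with a zero (for example 01001), and a column parsed as integers loses that zero. If the codes have already been read as numbers, the padding can be restored. The helper below is a hypothetical sketch (pad_fips is not part of geoviz), assuming 5-digit county codes and 2-digit state codes.

```python
def pad_fips(code, width=5):
    """Return a FIPS code as a zero-padded string.

    Hypothetical helper, not part of geoviz. Use width=5 for county
    codes and width=2 for state codes.
    """
    # int() strips any existing padding so the input can be an int or a string
    return str(int(code)).zfill(width)


print(pad_fips(1001))         # 01001 (Autauga County, Alabama)
print(pad_fips("01003"))      # 01003 (already padded, unchanged)
print(pad_fips(1, width=2))   # 01 (a state-level code)
```

Reading the column as a string in the first place, as done above with dtype, avoids the need for this step.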

Basic Choropleth¶

We start without any optional arguments and simply plot the proportion of data suppressed. The required arguments for choro.plot() are:

file_or_df: the dataframe or filepath
geoid_var: the geographic variable name
geoid_type: the geographic variable type: fips (highly recommended), name, cbsa (CBSA code), or abbrev (abbreviation, state only)
y_var: the variable to be plotted
y_type: the type of variable (sequential, sequential_single, divergent, categorical)

y_type¶

This argument refers to the data and palette type.

Sequential schemes are suited to ordered data that progress from low to high. Lightness steps dominate the look of these schemes, running from light colors for low data values to dark colors for high data values. Within this scheme, 'sequential' is multi-hued, which is helpful when we want the low values to be visibly different from missing values or the background. 'sequential_single' is used for single-hue palettes.
Diverging schemes put equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues.
Categorical schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes. Categorical schemes are best suited to representing nominal or qualitative data.

See http://colorbrewer2.org/ for more information on these palettes.
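
As a rough illustration of how the three scheme families map onto data, the sketch below guesses a y_type from a list of values. suggest_y_type is a hypothetical helper, not part of geoviz, and it encodes only a rule of thumb: non-numeric values suggest 'categorical', values on both sides of zero suggest 'divergent', and anything else suggests 'sequential'. Data with a meaningful midpoint other than zero (such as a percentage centered on 50) still needs a manual choice.

```python
from numbers import Real


def suggest_y_type(values):
    """Rule-of-thumb guess at a y_type (hypothetical helper, not in geoviz)."""
    values = [v for v in values if v is not None]  # ignore missing entries
    if not values:
        raise ValueError("no non-missing values to inspect")
    if not all(isinstance(v, Real) for v in values):
        return "categorical"   # labels such as 'beach' / 'mountain'
    if min(values) < 0 < max(values):
        return "divergent"     # values fall on both sides of zero
    return "sequential"        # ordered values running from low to high


print(suggest_y_type([0.447822, 0.209582, 0.730203]))     # sequential
print(suggest_y_type([-0.005593, -0.069317, 0.174549]))   # divergent
print(suggest_y_type(["beach", "mountain"]))              # categorical
```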

supp.head()

choro.plot('geoviz/data/test/cbp_suppression_2016_EMP.csv', 'GEO.id2', 'fips', 'pct_suppressed', 'sequential')

Notice that the breaks/color bar ticks don't fit our data perfectly. This could be due to
a) the default number of colors, which is 5, or b) the default min/max values used in the color mapper, which are the min/max of the series.

In this case, we do want 5 colors, but the min value is slightly above 0, so the automatically calculated breaks and ticks are not exactly on "round" or "clean" numbers.
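
To see why the default ticks land on awkward numbers, here is a minimal sketch that assumes the breaks are evenly spaced between the min and max (the linear 'lin' setting). linear_breaks is an illustrative helper, not a geoviz function, and the data minimum and maximum used below are made-up values.

```python
def linear_breaks(lo, hi, ncolors):
    """Return the ncolors + 1 evenly spaced bin edges between lo and hi.

    Illustrative only: assumes linear, equal-width bins.
    """
    step = (hi - lo) / ncolors
    # round to hide floating-point noise such as 0.6000000000000001
    return [round(lo + i * step, 4) for i in range(ncolors + 1)]


# Breaks derived from a (made-up) data range that starts slightly above 0
print(linear_breaks(0.03, 0.98, 5))   # [0.03, 0.22, 0.41, 0.6, 0.79, 0.98]

# Breaks after pinning min and max to round numbers
print(linear_breaks(0.0, 1.0, 5))     # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Pinning the range to 0 and 1 is what produces the "clean" tick values.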

Specifying formatting¶

We can specify these (and other formatting specs) manually by passing in a dict to formatting.

To see the available options and their default values, we can inspect

choro.DEFAULTFORMAT

{'width': 900,
 'background_color': None,
 'title': '',
 'font': 'futura',
 'title_fontsize': '14pt',
 'tools': 'zoom_in,zoom_out,pan,reset,save',
 'svg': None,
 'fill_alpha': 1,
 'line_color': '#d3d3d3',
 'line_width': 0.5,
 'simplify': 0,
 'tooltip_text': '',
 'hover_geolabel': 'Area name',
 'hover_ylabel': None,
 'ncolors': 5,
 'palette': 1,
 'min': None,
 'max': None,
 'reverse_palette': False,
 'lin_or_log': 'lin',
 'cbar_height': 'auto',
 'cbar_fontsize': None,
 'cbar_textfmt': None,
 'cbar_title': '',
 'cbar_title_align': 'center',
 'cbar_style': None,
 'cbar_tick_color': 'black',
 'cbar_tick_alpha': 1,
 'cbar_title_standoff_ratio': 0.006,
 'state_outline_options': ['none', 'before', 'after', 'both'],
 'state_outline': 'none',
 'st_alpha': 1,
 'st_fill': None,
 'st_line_color': 'black',
 'st_line_width': 1}
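
Because formatting is a plain dict, a misspelled key (say 'ncolor' instead of 'ncolors') is easy to miss. The sketch below checks a formatting dict against a dict of defaults before plotting. check_formatting is a hypothetical helper, and DEFAULTS is a small hand-copied subset of the values listed above; with geoviz installed, choro.DEFAULTFORMAT could be passed as defaults instead. How choro.plot() itself treats unknown keys is not covered here.

```python
# Hand-copied subset of the defaults shown above, for illustration only
DEFAULTS = {
    "width": 900,
    "ncolors": 5,
    "min": None,
    "max": None,
    "state_outline": "none",
}


def check_formatting(formatting, defaults=DEFAULTS):
    """Reject unknown keys, then return the defaults updated with the overrides.

    Hypothetical helper, not part of geoviz.
    """
    unknown = sorted(set(formatting) - set(defaults))
    if unknown:
        raise KeyError(f"Unrecognized formatting keys: {unknown}")
    return {**defaults, **formatting}  # overrides win over defaults


merged = check_formatting({"ncolors": 7, "min": 0, "max": 1.0})
print(merged["ncolors"], merged["width"])   # 7 900
```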

choro.plot('geoviz/data/test/cbp_suppression_2016_EMP.csv', 
           'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
           formatting={'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after'})

choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
           formatting={'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after', 
                       'background_color':'#c8c8c8', 'st_line_color':'#ffffff',
                       'simplify':0.1, ## how much to simplify shapefile geometry (larger is more simplified)
                       'tools':'', ## change which bokeh tools to include 
                       'width':600, 'cbar_height':75, 
                       'title':'Proportion of Data Suppressed', 'title_fontsize':'10pt'})

Divergent data and customizing palette¶

The 'palette' formatting argument takes either a palette name as a string (see https://bokeh.pydata.org/en/latest/docs/reference/palettes.html)
or an int between 1 and 3 to select one of the default palettes.

choro.plot(supp, 'GEO.id2', 'fips', 'resid', 'divergent', 
           formatting={'ncolors':7, 'min':-0.35, 'max':0.35, 
                       'state_outline':'after', 'simplify':0.01, 'palette':2, ## change to second default palette
                       'tooltip_text':'{0.000}', ## change tooltip text format. Note: must be wrapped by {} 
                       'title':'Actual vs Predicted Proportion of Data Suppressed'})

choro.plot(supp, 'GEO.id2', 'fips', 'resid', 'divergent', 
           formatting={'ncolors':7, 'min':-0.35, 'max':0.35, 
                       'state_outline':'after', 'simplify':0.01, 
                       'palette':'RdGy', ## specify specific palette
                       'hover_geolabel':'Counties', 'hover_ylabel':'Residual' ## change hover labels
                       })

State-level choropleth and categorical data¶

search.head()

choro.plot(search, geoid_var='state_name', geoid_type='name', 
           y_var='pct_beach', y_type='divergent', geolvl='state', ## specify if geo is not county
           formatting={'ncolors':10, 'min':0, 'max':100, 
                       'title':'% of searches that are "Beach" vs "Mountain"',
                       'cbar_title':'Percent', 'cbar_fontsize':'10pt', 'tooltip_text':'{0}%',})

choro.plot(search, geoid_var='state_abbr', geoid_type='abbrev', ## can also use state abbreviation or fips code
           y_var='more', y_type='categorical', geolvl='state', 
           formatting={'title':'"Beach" vs "Mountain"', 'fill_alpha':1, 
                       'palette':['#d8b365', '#5ab4ac']}) ## supply custom colors for palette

Plotting metropolitan statistical areas¶

The preprocess module provides a cbsa_to_fips() function that converts MSA-level dataframes to county level. It does this by duplicating each MSA row for each of its underlying counties (based on the OMB's definitions).

The function can be called separately before plotting, which may save time and resources compared with processing the data every time choro.plot() is called.
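
The row duplication can be sketched in plain Python as follows. This is not the geoviz implementation: expand_to_counties is a hypothetical helper, and the crosswalk holds only the two MSAs whose counties appear in the sample output of this tutorial, whereas cbsa_to_fips() relies on the full OMB definitions.

```python
# MSA code -> county FIPS codes (two MSAs only, for illustration)
crosswalk = {
    "10180": ["48059", "48253", "48441"],   # Abilene, TX
    "10420": ["39133", "39153"],            # Akron, OH
}

msa_rows = [
    {"msa_name": "Abilene, TX", "msa_code": "10180", "f_2016": 0.65},
    {"msa_name": "Akron, OH", "msa_code": "10420", "f_2016": 1.16},
]


def expand_to_counties(rows, crosswalk, code_key="msa_code"):
    """Copy each MSA row once per underlying county, adding a 'fips' field."""
    county_rows = []
    for row in rows:
        # MSAs missing from the crosswalk produce no county rows
        for fips in crosswalk.get(row[code_key], []):
            county_rows.append({**row, "fips": fips})
    return county_rows


for county_row in expand_to_counties(msa_rows, crosswalk):
    print(county_row["fips"], county_row["msa_name"], county_row["f_2016"])
```

Each county inherits the value of its MSA, which is why all counties within one MSA are drawn in the same color.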

fitn.head()

fitn['f_2016'].plot(kind='hist') ## recent pandas versions require kind to be passed as a keyword


prc.cbsa_to_fips(fitn, cbsa_var='msa_code').head()

The conversion also happens automatically when geoid_type='cbsa' is specified.

choro.plot(fitn, geoid_var='msa_code', geoid_type='cbsa', y_var='f_2016', y_type='sequential',
           formatting={'palette':2, 'ncolors':7, 'min':0, 'max':7,
                       'st_fill':'#c8c8c8', 'state_outline':'both'})

Note: When plotting MSA data, some parts of the map will be left blank, because not every county belongs to an MSA.
Specifying 'state_outline':'both' together with a st_fill fills in the state background before plotting the main layer, then outlines the states again with a transparent fill.

choro.plot(fitn, geoid_var='msa_code', geoid_type='cbsa', y_var='f_2016', y_type='sequential', 
           formatting={'palette':2, 'ncolors':7, 'min':0, 'max':7,
                       'st_fill':'#c8c8c8', 'state_outline':'both'},
           geolabel='cbsa_name') ## label hover tooltips with the official msa/cbsa name
           ## (in the plot above, labels are the default, i.e. the 'name' of each shape, in this case the county)
           ## geolabel='msa_name' also works, i.e. a variable originally in the df;
           ## if there are duplicate names, the user-supplied one is preserved (see documentation)

Outputting plots¶

A plot can be displayed in the Jupyter notebook (default) or saved as an html file, which retains all of the interactive capabilities. A static png image can be saved through a button in the toolbar.

We can also set whether the save button should output the static image as a png (default) or svg* file.

*May 2019 note: in the latest tests, svg output was a little buggy, with the toolbar and the colorbar ticks not shown. As a workaround, the svg-backend html can be opened in Chrome/Firefox and saved using https://nytimes.github.io/svg-crowbar/.

# choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
#            formatting={'simplify':0.05, 'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after'},
#            output='bokeh_test_png.html')

# choro.plot_empty(formatting={'svg':True})

# choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
#            formatting={'simplify':0.05, 'ncolors':5, 'min':0, 'max':1.0, 'svg':True, 'state_outline':'after'},
#            output='bokeh_test_svg.html')

Output of supp.head():

   GEO.id2  county                   county_sum  county_total  population  pct_suppressed  pred_pct  resid
0  01001    Autauga County, Alabama  5958        10790         55504.0     0.447822        0.453415  -0.005593
1  01003    Baldwin County, Alabama  48485       61341         212628.0    0.209582        0.278900  -0.069317
2  01005    Barbour County, Alabama  1850        6857          25270.0     0.730203        0.555653  0.174549
3  01007    Bibb County, Alabama     969         3387          22668.0     0.713906        0.569773  0.144133
4  01009    Blount County, Alabama   3287        6286          58013.0     0.477092        0.447670  0.029422

Output of search.head():

   state_name      state_abbr  state_fips  pct_beach  more
0  Florida         FL          12          95         beach
1  South Carolina  SC          45          89         beach
2  Hawaii          HI          15          88         beach
3  Virginia        VA          51          84         beach
4  Delaware        DE          10          86         beach

Output of fitn.head():

   msa_name     msa_code  f_2016
0  Abilene, TX  10180     0.65
1  Akron, OH    10420     1.16
2  Albany, GA   10500     0.68
3  Albany, NY   10580     1.78
4  Albany, OR   10540     0.52

Output of prc.cbsa_to_fips(fitn, cbsa_var='msa_code').head():

   cbsa   cbsa_name    county_name  fips   msa_name     msa_code  f_2016
0  10180  Abilene, TX  Callahan     48059  Abilene, TX  10180     0.65
1  10180  Abilene, TX  Jones        48253  Abilene, TX  10180     0.65
2  10180  Abilene, TX  Taylor       48441  Abilene, TX  10180     0.65
3  10420  Akron, OH    Portage      39133  Akron, OH    10420     1.16
4  10420  Akron, OH    Summit       39153  Akron, OH    10420     1.16