Geoviz Tutorial

Installation and setup

The latest and older versions of geoviz are available from http://pypi.org/project/geoviz/ and can be installed with pip install geoviz or pip3 install geoviz.

geoviz.choropleth is the main module to be imported. geoviz.preprocess is also useful when data requires pre-processing.

In [1]:
import pandas as pd

import geoviz.choropleth as choro
import geoviz.preprocess as prc

Load and view sample data

There are three datasets here:

supp: county-level data on the amount of suppression in census data. It has both sequential (low to high) data, "pct_suppressed", and divergent (negative-positive) data, "resid".

fitn: MSA-level economic fitness score, has sequential data, "f_2016"

search: state-level search frequency data (beaches vs mountains)

In [2]:
supp = pd.read_csv('geoviz/data/test/cbp_suppression_2016_EMP.csv', dtype={'GEO.id2':str})
fitn = pd.read_csv('geoviz/data/test/fitness_msa.csv', dtype={'msa_code':str})
search = pd.read_csv('geoviz/data/test/states_beach_mountain.csv')
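The dtype arguments above keep the geographic codes as strings. County FIPS codes such as 01001 start with a zero, and that zero is lost if the column is parsed as an integer. The sketch below shows the problem and one way to repair codes that were already read as integers. Note that normalize_fips is a hypothetical helper for illustration, not part of geoviz.

```python
def normalize_fips(code, width=5):
    """Return a geographic code as a zero-padded string.

    Hypothetical helper (not part of geoviz). width=5 assumes county FIPS
    codes; state FIPS codes would use width=2.
    """
    return str(code).strip().zfill(width)


# Parsing "01001" as an integer drops the leading zero.
as_int = int("01001")
print(as_int)                  # 1001
print(normalize_fips(as_int))  # 01001
print(normalize_fips(1, 2))    # 01
```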

Basic Choropleth

We start without any optional arguments, just plotting the percent of data suppressed. The required arguments for choro.plot() are...

  • file_or_df: the dataframe or filepath
  • geoid_var: the geographic variable name
  • geoid_type: the geographic variable type: fips (highly recommended), name, cbsa code, or abbreviation (state only)
  • y_var: the variable to be plotted
  • y_type: the type of variable (sequential, sequential_single, divergent, categorical)

y_type

This argument refers to the data and palette type.

  1. Sequential schemes are suited to ordered data that progress from low to high. Lightness steps dominate the look of these schemes, with light colors for low data values to dark colors for high data values. Within this scheme, 'sequential' is multi-hued, which is helpful when we want the low values to be visibly different from missing values or the background. 'sequential_single' is used for single-hue palettes.

  2. Diverging schemes put equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues.

  3. Categorical schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes. Categorical schemes are best suited to representing nominal or qualitative data.

See http://colorbrewer2.org/ for more information on these palettes.
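As a rough guide to choosing y_type, the three descriptions above can be turned into a simple rule of thumb. This is only a sketch: suggest_y_type is a hypothetical helper, not a geoviz function, and it assumes that a divergent variable is one that crosses zero.

```python
def suggest_y_type(values):
    """Guess a y_type string from a list of values (hypothetical heuristic)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no non-missing values to inspect")
    # Any non-numeric value means the data is nominal.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in present):
        return "categorical"
    # Values on both sides of zero suggest a meaningful midpoint.
    if min(present) < 0 < max(present):
        return "divergent"
    return "sequential"


print(suggest_y_type([0.447822, 0.209582, 0.730203]))     # sequential
print(suggest_y_type([-0.005593, -0.069317, 0.174549]))   # divergent
print(suggest_y_type(["beach", "mountain", "beach"]))     # categorical
```

The heuristic only detects a midpoint at zero. A variable such as a percentage with a meaningful midpoint at 50 can still be plotted as divergent, but that is a judgment the analyst has to make.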

In [3]:
supp.head()
Out[3]:
GEO.id2 county county_sum county_total population pct_suppressed pred_pct resid
0 01001 Autauga County, Alabama 5958 10790 55504.0 0.447822 0.453415 -0.005593
1 01003 Baldwin County, Alabama 48485 61341 212628.0 0.209582 0.278900 -0.069317
2 01005 Barbour County, Alabama 1850 6857 25270.0 0.730203 0.555653 0.174549
3 01007 Bibb County, Alabama 969 3387 22668.0 0.713906 0.569773 0.144133
4 01009 Blount County, Alabama 3287 6286 58013.0 0.477092 0.447670 0.029422
In [4]:
choro.plot('geoviz/data/test/cbp_suppression_2016_EMP.csv', 'GEO.id2', 'fips', 'pct_suppressed', 'sequential')
Out[4]:
Figure(
id = '1001', …)

Notice that the breaks/color bar ticks don't fit our data perfectly. This could be due to
a) the default number of colors, which is 5, or b) the default min/max values used in the color mapper, which are the min/max of the series.

In this case, we do want 5 colors, but the min value is slightly above 0, so the automatically calculated breaks and ticks are not exactly on "round" or "clean" numbers.
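To see why the ticks land on awkward values, consider equal-width breaks between the minimum and the maximum. This sketch assumes linear, equal-width bins, which may not match the internal calculation in geoviz exactly, and the minimum of 0.03 is a made-up value chosen for illustration.

```python
def linear_breaks(lo, hi, ncolors):
    """Return ncolors + 1 equally spaced break points from lo to hi."""
    step = (hi - lo) / ncolors
    # Round for display so floating-point noise does not clutter the output.
    return [round(lo + i * step, 3) for i in range(ncolors + 1)]


# Minimum slightly above zero: breaks fall on awkward numbers.
print(linear_breaks(0.03, 1.0, 5))  # [0.03, 0.224, 0.418, 0.612, 0.806, 1.0]
# Minimum pinned to zero: breaks fall on clean numbers.
print(linear_breaks(0.0, 1.0, 5))   # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```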

Specifying formatting

We can specify these (and other formatting specs) manually by passing in a dict to formatting.

To see the options for these, we can call

In [5]:
choro.DEFAULTFORMAT
Out[5]:
{'width': 900,
 'background_color': None,
 'title': '',
 'font': 'futura',
 'title_fontsize': '14pt',
 'tools': 'zoom_in,zoom_out,pan,reset,save',
 'svg': None,
 'fill_alpha': 1,
 'line_color': '#d3d3d3',
 'line_width': 0.5,
 'simplify': 0,
 'tooltip_text': '',
 'hover_geolabel': 'Area name',
 'hover_ylabel': None,
 'ncolors': 5,
 'palette': 1,
 'min': None,
 'max': None,
 'reverse_palette': False,
 'lin_or_log': 'lin',
 'cbar_height': 'auto',
 'cbar_fontsize': None,
 'cbar_textfmt': None,
 'cbar_title': '',
 'cbar_title_align': 'center',
 'cbar_style': None,
 'cbar_tick_color': 'black',
 'cbar_tick_alpha': 1,
 'cbar_title_standoff_ratio': 0.006,
 'state_outline_options': ['none', 'before', 'after', 'both'],
 'state_outline': 'none',
 'st_alpha': 1,
 'st_fill': None,
 'st_line_color': 'black',
 'st_line_width': 1}
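Only the keys we want to change need to be passed in. Any key left out keeps its default value. One plausible way to picture this is a dict merge in which user values override the defaults. The sketch below copies a few of the defaults listed above. Note that merge_formatting is a hypothetical helper, and geoviz may merge or validate options differently.

```python
# A few of the defaults listed above, copied here for illustration.
DEFAULTS = {"width": 900, "ncolors": 5, "min": None, "max": None,
            "state_outline": "none"}


def merge_formatting(user, defaults=DEFAULTS):
    """Overlay user options on the defaults, rejecting unknown keys.

    Hypothetical helper: geoviz may merge and validate differently.
    """
    unknown = sorted(set(user) - set(defaults))
    if unknown:
        raise KeyError(f"unknown formatting options: {unknown}")
    # Later entries win, so user values override the defaults.
    return {**defaults, **user}


merged = merge_formatting({"min": 0, "max": 1.0, "state_outline": "after"})
print(merged["width"], merged["min"], merged["state_outline"])  # 900 0 after
```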
In [6]:
choro.plot('geoviz/data/test/cbp_suppression_2016_EMP.csv', 
           'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
           formatting={'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after'})
Out[6]:
Figure(
id = '1109', …)
In [7]:
choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
           formatting={'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after', 
                       'background_color':'#c8c8c8', 'st_line_color':'#ffffff',
                       'simplify':0.1, ## how much to simplify shapefile geometry (larger is more simplified)
                       'tools':'', ## change which bokeh tools to include 
                       'width':600, 'cbar_height':75, 
                       'title':'Proportion of Data Suppressed', 'title_fontsize':'10pt'})
Out[7]:
Figure(
id = '1237', …)

Divergent data and customizing palette

The 'palette' formatting argument takes either a string naming a palette (see https://bokeh.pydata.org/en/latest/docs/reference/palettes.html)
or an int between 1 and 3 to select one of the default options.

In [8]:
choro.plot(supp, 'GEO.id2', 'fips', 'resid', 'divergent', 
           formatting={'ncolors':7, 'min':-0.35, 'max':0.35, 
                       'state_outline':'after', 'simplify':0.01, 'palette':2, ## change to second default palette
                       'tooltip_text':'{0.000}', ## change tooltip text format. Note: must be wrapped by {} 
                       'title':'Actual vs Predicted Proportion of Data Suppressed'})
Out[8]:
Figure(
id = '1355', …)
In [9]:
choro.plot(supp, 'GEO.id2', 'fips', 'resid', 'divergent', 
           formatting={'ncolors':7, 'min':-0.35, 'max':0.35, 
                       'state_outline':'after', 'simplify':0.01, 
                       'palette':'RdGy', ## specify specific palette
                       'hover_geolabel':'Counties', 'hover_ylabel':'Residual' ## change hover labels
                       })
Out[9]:
Figure(
id = '1483', …)
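Both divergent plots above set min and max to -0.35 and 0.35, so the middle of the color scale sits at zero. When suitable limits are not known in advance, one option is to derive symmetric limits from the data. This is a sketch using the five residuals shown in supp.head(), and symmetric_limits is a hypothetical helper, not part of geoviz.

```python
def symmetric_limits(values):
    """Return (min, max) limits centered on zero that cover all values."""
    bound = max(abs(v) for v in values)
    return -bound, bound


# Residuals from the first five rows of the supp dataframe.
resid = [-0.005593, -0.069317, 0.174549, 0.144133, 0.029422]
print(symmetric_limits(resid))  # (-0.174549, 0.174549)
```

In practice the bound is often rounded up to a clean number (as with 0.35 above) so that the breaks are easier to read.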

State-level choropleth and categorical data

In [10]:
search.head()
Out[10]:
state_name state_abbr state_fips pct_beach more
0 Florida FL 12 95 beach
1 South Carolina SC 45 89 beach
2 Hawaii HI 15 88 beach
3 Virginia VA 51 84 beach
4 Delaware DE 10 86 beach
In [11]:
choro.plot(search, geoid_var='state_name', geoid_type='name', 
           y_var='pct_beach', y_type='divergent', geolvl='state', ## specify if geo is not county
           formatting={'ncolors':10, 'min':0, 'max':100, 
                       'title':'% of searches that are "Beach" vs "Mountain"',
                       'cbar_title':'Percent', 'cbar_fontsize':'10pt', 'tooltip_text':'{0}%',})
Out[11]:
Figure(
id = '1611', …)
In [12]:
choro.plot(search, geoid_var='state_abbr', geoid_type='abbrev', ## can also use state abbreviation or fips code
           y_var='more', y_type='categorical', geolvl='state', 
           formatting={'title':'"Beach" vs "Mountain"', 'fill_alpha':1, 
                       'palette':['#d8b365', '#5ab4ac']}) ## supply custom colors for palette 
Out[12]:
Figure(
id = '1719', …)

Plotting metropolitan statistical areas

There is a cbsa_to_fips() function in the preprocessing module that converts MSA-level dataframes to county form. This is done by duplicating each MSA row into its underlying counties (based on the OMB's definitions).

The user can call the function separately before plotting, which may be more time- and resource-efficient than having the data processed every time choro.plot() is called.
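The row duplication described above can be pictured in plain Python. This is a sketch of the idea only, not the geoviz implementation: expand_to_counties and the small CBSA_TO_COUNTIES crosswalk are hypothetical, and the crosswalk covers just two MSAs (Abilene, TX and Akron, OH).

```python
# Hypothetical crosswalk from CBSA code to county FIPS codes. geoviz relies
# on the OMB definitions; only two MSAs are reproduced here.
CBSA_TO_COUNTIES = {
    "10180": ["48059", "48253", "48441"],  # Abilene, TX
    "10420": ["39133", "39153"],           # Akron, OH
}


def expand_to_counties(rows, cbsa_var="msa_code"):
    """Duplicate each MSA row once per underlying county."""
    out = []
    for row in rows:
        # MSAs missing from the crosswalk produce no county rows.
        for fips in CBSA_TO_COUNTIES.get(row[cbsa_var], []):
            out.append({**row, "fips": fips})
    return out


msa_rows = [
    {"msa_name": "Abilene, TX", "msa_code": "10180", "f_2016": 0.65},
    {"msa_name": "Akron, OH", "msa_code": "10420", "f_2016": 1.16},
]
county_rows = expand_to_counties(msa_rows)
for r in county_rows:
    print(r["fips"], r["msa_name"], r["f_2016"])
```

Every county row carries the value of its parent MSA, so all counties in one MSA are drawn in the same color.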

In [13]:
fitn.head()
Out[13]:
msa_name msa_code f_2016
0 Abilene, TX 10180 0.65
1 Akron, OH 10420 1.16
2 Albany, GA 10500 0.68
3 Albany, NY 10580 1.78
4 Albany, OR 10540 0.52
In [14]:
fitn['f_2016'].plot('hist')
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x11df58fd0>
In [15]:
prc.cbsa_to_fips(fitn, cbsa_var='msa_code').head()
Out[15]:
cbsa cbsa_name county_name fips msa_name msa_code f_2016
0 10180 Abilene, TX Callahan 48059 Abilene, TX 10180 0.65
1 10180 Abilene, TX Jones 48253 Abilene, TX 10180 0.65
2 10180 Abilene, TX Taylor 48441 Abilene, TX 10180 0.65
3 10420 Akron, OH Portage 39133 Akron, OH 10420 1.16
4 10420 Akron, OH Summit 39153 Akron, OH 10420 1.16

Alternatively, the conversion is done automatically when geoid_type='cbsa' is specified.

In [16]:
choro.plot(fitn, geoid_var='msa_code', geoid_type='cbsa', y_var='f_2016', y_type='sequential',
           formatting={'palette':2, 'ncolors':7, 'min':0, 'max':7,
                       'st_fill':'#c8c8c8', 'state_outline':'both'}) 
Out[16]:
Figure(
id = '1812', …)

Note: When plotting MSA data, some parts of the map will be left blank because counties outside any MSA have no data.
Specifying 'state_outline':'both' together with an st_fill color fills in the state background before plotting the main layer, then outlines the states again with a transparent fill.
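The drawing order implied by the note can be written out explicitly. The sketch below is an interpretation of the option names, not geoviz code: layer_order is hypothetical, and it assumes that 'before' draws the states underneath the data and 'after' draws them on top.

```python
def layer_order(state_outline, st_fill=None):
    """List the drawing layers from bottom to top (hypothetical illustration)."""
    if state_outline not in ("none", "before", "after", "both"):
        raise ValueError(f"unknown state_outline: {state_outline!r}")
    layers = []
    if state_outline in ("before", "both"):
        # Drawn first, so the fill shows through wherever data is missing.
        layers.append(f"states (fill={st_fill})")
    layers.append("main data layer")
    if state_outline in ("after", "both"):
        # Drawn last with a transparent fill, so only the borders show.
        layers.append("states (outline only)")
    return layers


print(layer_order("both", "#c8c8c8"))
# ['states (fill=#c8c8c8)', 'main data layer', 'states (outline only)']
```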

In [17]:
choro.plot(fitn, geoid_var='msa_code', geoid_type='cbsa', y_var='f_2016', y_type='sequential', 
           formatting={'palette':2, 'ncolors':7, 'min':0, 'max':7,
                       'st_fill':'#c8c8c8', 'state_outline':'both'},
           geolabel='cbsa_name') ## label hover tooltips with the official msa/cbsa name
           ## in the plot above, labels are the default (i.e. the 'name' of each shape, in this case the county)
##           geolabel='msa_name' ## can also use a variable originally in the df
           ## if there are duplicate names, the user-supplied one is preserved (see documentation)
Out[17]:
Figure(
id = '1960', …)

Outputting plots

A plot can be displayed in the Jupyter notebook (default) or saved as an html file by passing a filename to output. The html file retains all of the interactive capabilities. A static png image can be saved through a button in the toolbar.

We can also set whether the save button should output the static image as a png (default) or svg* file.

*May 2019 note: in the latest tests, svg output was a little buggy, with the toolbar and the colorbar ticks not shown. As a workaround, the svg-backend html can be saved using https://nytimes.github.io/svg-crowbar/ when opened in Chrome/Firefox.

In [18]:
# choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
#            formatting={'simplify':0.05, 'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after'},
#            output='bokeh_test_png.html')
In [19]:
# choro.plot_empty(formatting={'svg':True})
In [20]:
# choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential', 
#            formatting={'simplify':0.05, 'ncolors':5, 'min':0, 'max':1.0, 'svg':True, 'state_outline':'after'},
#            output='bokeh_test_svg.html')