Latest and older versions of geoviz can be downloaded from http://pypi.org/project/geoviz/ with either of:
pip install geoviz
pip3 install geoviz
geoviz.choropleth is the main module to be imported. geoviz.preprocess is also useful when the data requires pre-processing.
import pandas as pd
import geoviz.choropleth as choro
import geoviz.preprocess as prc
There are three datasets here:
supp: county-level data on the amount of suppression in Census data. It has both sequential (low to high) data, "pct_suppressed", and divergent (negative to positive) data, "resid".
fitn: MSA-level economic fitness scores, with sequential data, "f_2016".
search: state-level search frequency data (beaches vs mountains), with "pct_beach" (the share of searches that are "Beach") and the categorical "more".
supp = pd.read_csv('geoviz/data/test/cbp_suppression_2016_EMP.csv', dtype={'GEO.id2':str})
fitn = pd.read_csv('geoviz/data/test/fitness_msa.csv', dtype={'msa_code':str})
search = pd.read_csv('geoviz/data/test/states_beach_mountain.csv')
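The dtype={'GEO.id2':str} and dtype={'msa_code':str} arguments above matter because geographic codes such as county FIPS codes are fixed-width strings with leading zeros (Alabama's county codes, for example, start with 01). The minimal illustration below uses an in-memory CSV with made-up values. It shows what happens without the dtype argument and one way to repair codes that were already read as integers.

```python
import io

import pandas as pd

# Two made-up rows keyed by 5-digit county FIPS codes
csv_text = "GEO.id2,pct_suppressed\n01001,0.25\n06037,0.10\n"

# Default parsing infers an integer column and drops the leading zeros
as_int = pd.read_csv(io.StringIO(csv_text))

# Forcing str keeps the codes exactly as written in the file
as_str = pd.read_csv(io.StringIO(csv_text), dtype={'GEO.id2': str})

# Codes that were already read as integers can be repaired by zero-padding to 5 digits
repaired = as_int['GEO.id2'].astype(str).str.zfill(5)

print(as_int['GEO.id2'].tolist())   # [1001, 6037]
print(as_str['GEO.id2'].tolist())   # ['01001', '06037']
print(repaired.tolist())            # ['01001', '06037']
```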
We start without any optional arguments, just plotting the percent of data suppressed. The required arguments for choro.plot() are:
file_or_df: the dataframe or file path
geoid_var: the geographic variable name
geoid_type: the geographic variable type: 'fips' (highly recommended), 'name', 'cbsa' (CBSA/MSA code), or 'abbrev' (state abbreviation, state level only)
y_var: the variable to be plotted
y_type: the type of variable ('sequential', 'sequential_single', 'divergent', or 'categorical'). This argument refers to both the data type and the palette type.
Sequential schemes are suited to ordered data that progress from low to high. Lightness steps dominate the look of these schemes, with light colors for low data values and dark colors for high data values. Within this scheme, 'sequential' is multi-hued, which helps when the low values should be visibly different from missing values or the background. 'sequential_single' is used for single-hue palettes.
Diverging schemes put equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues.
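For a diverging palette to read correctly, the critical value should sit in the middle of the color range. One simple way to guarantee that is to make min and max symmetric around the midpoint. This is the kind of range passed later as 'min':-0.35, 'max':0.35. The sketch below assumes a midpoint of 0 by default, and the helper name is hypothetical rather than part of geoviz.

```python
def symmetric_limits(values, midpoint=0.0):
    """Return (lo, hi) centered on midpoint and wide enough to cover all values."""
    # Largest distance of any value from the midpoint sets the half-width
    half_width = max(abs(v - midpoint) for v in values)
    return midpoint - half_width, midpoint + half_width


resid = [-0.12, 0.05, 0.31, -0.02]  # made-up residuals
print(symmetric_limits(resid))      # (-0.31, 0.31)
```

With an odd ncolors (such as 7) and equal-width classes over a symmetric range, the middle class is centered on the midpoint. That is one reason odd counts are common for diverging palettes.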
Categorical schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes. Categorical schemes are best suited to representing nominal or qualitative data.
See http://colorbrewer2.org/ for more information on these palettes.
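Because a typo in one of the required string arguments only surfaces once plotting starts, it can help to check them up front. The sketch below assumes the accepted strings are the ones used in this notebook: 'fips', 'name', 'cbsa', 'abbrev' for geoid_type, and the four y_type values above. The helper is hypothetical, not part of geoviz, and it takes a dataframe rather than a file path.

```python
import pandas as pd

# Values taken from the examples in this notebook; geoviz may accept others
GEOID_TYPES = {'fips', 'name', 'cbsa', 'abbrev'}
Y_TYPES = {'sequential', 'sequential_single', 'divergent', 'categorical'}


def check_plot_args(df, geoid_var, geoid_type, y_var, y_type):
    """Raise early if the required choro.plot() arguments look inconsistent."""
    if geoid_type not in GEOID_TYPES:
        raise ValueError(f"geoid_type must be one of {sorted(GEOID_TYPES)}")
    if y_type not in Y_TYPES:
        raise ValueError(f"y_type must be one of {sorted(Y_TYPES)}")
    missing = {geoid_var, y_var} - set(df.columns)
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    # Sequential and divergent palettes only make sense for numeric data
    if y_type != 'categorical' and not pd.api.types.is_numeric_dtype(df[y_var]):
        raise TypeError(f"{y_var!r} must be numeric for y_type={y_type!r}")


demo = pd.DataFrame({'state_abbr': ['CA', 'CO'], 'pct_beach': [70.0, 35.0]})  # made-up values
check_plot_args(demo, 'state_abbr', 'abbrev', 'pct_beach', 'divergent')  # passes silently
```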
supp.head()
choro.plot('geoviz/data/test/cbp_suppression_2016_EMP.csv', 'GEO.id2', 'fips', 'pct_suppressed', 'sequential')
Notice that the breaks/color bar ticks don't fit our data perfectly. This could be due to
a) the default number of colors, which is 5, or
b) the default min/max values used in the color mapper, which is the min/max of the series.
In this case, we do want 5 colors, but the min value is slightly above 0, so the automatically calculated breaks and ticks are not exactly on "round" or "clean" numbers.
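To see why a minimum slightly above 0 produces unround ticks, here is a sketch that assumes the color mapper splits [min, max] into ncolors equal-width classes. That is an assumption about the mapping, not a description of geoviz internals, and the minimum of 0.013 is made up.

```python
def equal_breaks(lo, hi, ncolors=5, digits=4):
    """Class boundaries for ncolors equal-width bins between lo and hi."""
    step = (hi - lo) / ncolors
    # Round to hide floating-point noise such as 0.6000000000000001
    return [round(lo + i * step, digits) for i in range(ncolors + 1)]


print(equal_breaks(0.013, 1.0))  # data-driven minimum: boundaries land on unround values
print(equal_breaks(0, 1.0))      # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```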
We can specify these (and other formatting options) manually by passing a dict to the formatting argument.
To see the available options, we can call
choro.DEFAULTFORMAT
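In the examples that follow, the formatting dict only contains the keys being changed. A common way to implement that kind of option handling is to layer the user's dict over a defaults dict, as sketched below. DEFAULTS here is a hypothetical stand-in with a few keys from this notebook, not the actual contents of choro.DEFAULTFORMAT.

```python
# Hypothetical defaults; inspect choro.DEFAULTFORMAT for the real keys and values.
# None for min/max stands for "use the min/max of the series".
DEFAULTS = {'ncolors': 5, 'min': None, 'max': None, 'state_outline': None}


def resolve_format(user=None, defaults=DEFAULTS):
    """Overlay user-supplied formatting options on top of the defaults."""
    user = user or {}
    unknown = set(user) - set(defaults)
    if unknown:
        # Catch typos such as 'ncolor' instead of silently ignoring them
        raise KeyError(f"unknown formatting keys: {sorted(unknown)}")
    return {**defaults, **user}


fmt = resolve_format({'min': 0, 'max': 1.0})
print(fmt['ncolors'], fmt['min'], fmt['max'])  # 5 0 1.0
```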
choro.plot('geoviz/data/test/cbp_suppression_2016_EMP.csv',
'GEO.id2', 'fips', 'pct_suppressed', 'sequential',
formatting={'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after'})
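Instead of typing clean limits by hand, they can be derived from the data by rounding the observed minimum down and the maximum up to multiples of a chosen step. The sketch below shows this. The helper and the sample values are hypothetical, and decimal steps may still leave small floating-point noise in the result.

```python
import math


def nice_limits(values, step):
    """Round the data range outward to multiples of step."""
    lo = math.floor(min(values) / step) * step
    hi = math.ceil(max(values) / step) * step
    return lo, hi


pct = [0.013, 0.42, 0.87]       # made-up suppression shares
print(nice_limits(pct, 0.2))    # (0.0, 1.0)
print(nice_limits([0.4, 6.3], 1))  # (0, 7)
```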
choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential',
formatting={'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after',
'background_color':'#c8c8c8', 'st_line_color':'#ffffff',
'simplify':0.1, ## how much to simplify shapefile geometry (larger is more simplified)
'tools':'', ## change which bokeh tools to include
'width':600, 'cbar_height':75,
'title':'Proportion of Data Suppressed', 'title_fontsize':'10pt'})
The 'palette' formatting argument takes palette-name strings (see https://bokeh.pydata.org/en/latest/docs/reference/palettes.html), an int between 1 and 3 to select one of the default palettes, or a list of colors.
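The sketch below only classifies which of the three palette forms was passed and validates it. Resolving the palette to actual colors happens inside geoviz. The function name is hypothetical, and restricting custom lists to '#rrggbb' strings is an assumption of this sketch.

```python
import re

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def palette_kind(palette):
    """Classify a palette argument as 'default', 'named', or 'custom'."""
    # bool is a subclass of int, so reject it before the int check
    if isinstance(palette, bool):
        raise TypeError("palette must be a str, an int 1-3, or a list of colors")
    if isinstance(palette, int):
        if not 1 <= palette <= 3:
            raise ValueError("default palettes are numbered 1 to 3")
        return 'default'
    if isinstance(palette, str):
        return 'named'    # e.g. 'RdGy', looked up by name
    if isinstance(palette, (list, tuple)) and palette and all(
            isinstance(c, str) and HEX_COLOR.match(c) for c in palette):
        return 'custom'   # e.g. ['#d8b365', '#5ab4ac']
    raise TypeError("palette must be a str, an int 1-3, or a list of hex colors")


print(palette_kind(2), palette_kind('RdGy'), palette_kind(['#d8b365', '#5ab4ac']))
```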
choro.plot(supp, 'GEO.id2', 'fips', 'resid', 'divergent',
formatting={'ncolors':7, 'min':-0.35, 'max':0.35,
'state_outline':'after', 'simplify':0.01, 'palette':2, ## change to second default palette
'tooltip_text':'{0.000}', ## change tooltip text format. Note: must be wrapped by {}
'title':'Actual vs Predicted Proportion of Data Suppressed'})
choro.plot(supp, 'GEO.id2', 'fips', 'resid', 'divergent',
formatting={'ncolors':7, 'min':-0.35, 'max':0.35,
'state_outline':'after', 'simplify':0.01,
'palette':'RdGy', ## specify specific palette
'hover_geolabel':'Counties', 'hover_ylabel':'Residual' ## change hover labels
})
search.head()
choro.plot(search, geoid_var='state_name', geoid_type='name',
y_var='pct_beach', y_type='divergent', geolvl='state', ## specify if geo is not county
formatting={'ncolors':10, 'min':0, 'max':100,
'title':'% of searches that are "Beach" vs "Mountain"',
'cbar_title':'Percent', 'cbar_fontsize':'10pt', 'tooltip_text':'{0}%',})
choro.plot(search, geoid_var='state_abbr', geoid_type='abbrev', ## can also use state abbreviation or fips code
y_var='more', y_type='categorical', geolvl='state',
formatting={'title':'"Beach" vs "Mountain"', 'fill_alpha':1,
'palette':['#d8b365', '#5ab4ac']}) ## supply custom colors for palette
The preprocessing module has a cbsa_to_fips() function that converts MSA-level dataframes to county form. It does this by duplicating each MSA row once for each of its underlying counties (based on the OMB's definitions). The user can call the function separately before plotting, which may be more time- and resource-efficient than having the data processed every time choro.plot() is called.
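The row duplication described above is essentially a join against an MSA-to-county crosswalk. The sketch below reproduces that idea with pandas.merge on a tiny made-up crosswalk. It is not the geoviz implementation, and the codes and the 'fips' column name are hypothetical. For real data, use prc.cbsa_to_fips.

```python
import pandas as pd

# Made-up MSA-level data: one row per MSA
msa = pd.DataFrame({'msa_code': ['10000', '20000'], 'f_2016': [3.2, 5.8]})

# Made-up crosswalk: one row per (MSA, county) pair
crosswalk = pd.DataFrame({
    'msa_code': ['10000', '10000', '20000'],
    'fips': ['01001', '01003', '02001'],
})

# Each MSA row is repeated once for every county it contains;
# validate= raises if an MSA appears more than once on the left side
county = msa.merge(crosswalk, on='msa_code', how='inner', validate='one_to_many')
print(county)
```

In a crosswalk like this, counties that belong to no MSA never appear, so they are absent from the county-level result.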
fitn.head()
fitn['f_2016'].plot(kind='hist')
prc.cbsa_to_fips(fitn, cbsa_var='msa_code').head()
Alternatively, choro.plot() can do the conversion automatically when geoid_type='cbsa' is specified.
choro.plot(fitn, geoid_var='msa_code', geoid_type='cbsa', y_var='f_2016', y_type='sequential',
formatting={'palette':2, 'ncolors':7, 'min':0, 'max':7,
'st_fill':'#c8c8c8', 'state_outline':'both'})
Note: when plotting MSA data, some parts of the map will be left blank, since not every county belongs to an MSA. Specifying 'state_outline':'both' together with an 'st_fill' color fills in the state background before the main layer is plotted, then outlines the states again with a transparent fill.
choro.plot(fitn, geoid_var='msa_code', geoid_type='cbsa', y_var='f_2016', y_type='sequential',
formatting={'palette':2, 'ncolors':7, 'min':0, 'max':7,
'st_fill':'#c8c8c8', 'state_outline':'both'},
geolabel='cbsa_name') ## change variable used to label hover tooltips to be the official msa/cbsa name
## in the plot above, labels are default (i.e. the 'name' of each shape, in this case the county)
## geolabel='msa_name' ## can also use a variable from the original df.
## if there are duplicate names, the user-supplied one is preserved (see documentation)
The output can be shown as a plot in the Jupyter notebook (default) or saved as an HTML file, which retains all of the interactive capabilities. A static PNG image can be saved through a button in the toolbar.
We can also set whether the save button outputs the static image as a PNG (default) or SVG* file, by passing 'svg':True in formatting.
*May 2019 note: in the latest tests, SVG output was somewhat buggy, with the toolbar and the color bar ticks not shown. As a workaround, the SVG-backend HTML can be saved using https://nytimes.github.io/svg-crowbar/ when opened in Chrome or Firefox.
# choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential',
# formatting={'simplify':0.05, 'ncolors':5, 'min':0, 'max':1.0, 'state_outline':'after'},
# output='bokeh_test_png.html')
# choro.plot_empty(formatting={'svg':True})
# choro.plot(supp, 'GEO.id2', 'fips', 'pct_suppressed', 'sequential',
# formatting={'simplify':0.05, 'ncolors':5, 'min':0, 'max':1.0, 'svg':True, 'state_outline':'after'},
# output='bokeh_test_svg.html')