--- title: Explore and Download keywords: fastai sidebar: home_sidebar summary: "In this tutorial, the basics of Colabs are introduced and ACS data is Downloaded." ---

This Coding Notebook is the first in a series.

An interactive version can be found here: Open In Colab.

This colab and more can be found at https://github.com/BNIA/colabs

  • Content covered in previous tutorials will be used in later tutorials.

  • New code and information should have explanations or descriptions attached.

  • Concepts or code covered in previous tutorials will be used without being explained in their entirety.

  • If content cannot be found in the current tutorial and is not covered in previous tutorials, please let me know.

About this Tutorial:

Before we begin

Tips

  • If information is not covered in this tutorial, it may be because the material was covered in a prior tutorial.
  • A table of contents is provided in the menu to the left.
  • This notebook has been optimized for Google Colab running in the Chrome browser.
  • While still fully usable, non-critical sections of code (e.g. Python magics and HTML) may break if used in a different environment.

Disclaimer

Views Expressed: All views expressed in this tutorial are the author's own and do not represent the opinions of any entity whatsoever with which they have been, are now, or will be affiliated.

Responsibility, Errors and Omissions: The author makes no assurance about the reliability of the information. The author takes no responsibility for updating the tutorial or maintaining its performant status. Under no circumstances shall the author or its affiliates be liable for any indirect, incidental, consequential, special, or exemplary damages arising out of or in connection with this tutorial. Information is provided 'as is' with a distinct possibility of errors and omissions. Information found within the contents is provided under an MIT license. Please refer to the License for more information.

Use at Risk: Any action you take upon the information in this tutorial is strictly at your own risk, and the author will not be liable for any losses and damages in connection with the use of this tutorial and subsequent products.

Fair Use: This site contains copyrighted material, the use of which has not always been specifically authorized by the copyright owner. While no intention is made to unlawfully use copyrighted work, circumstances may arise in which such material is made available in an effort to advance scientific literacy. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in Section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 108, the material on this tutorial is distributed without profit to those who have expressed a prior interest in receiving the included information for research and education purposes.

For more information go to: http://www.law.cornell.edu/uscode/17/107.shtml. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.

License

Copyright © 2019 BNIA-JFI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

What's inside?

The Tutorial

In this tutorial, the basics of Colabs are introduced.

  • We will explore ACS data catalogs to locate data we like
  • We will programmatically download data from the American Community Survey (ACS)
    • Examples will use ACS table B19001, Baltimore City 2017 estimates
  • We will rework the datasets to be human friendly

Objectives

By the end of this tutorial users should have an understanding of:

  • Google Colabs
  • Census Data
  • Exploring, retrieving, and cleaning ACS data programmatically
  • The 'retrieve_acs_data()' function and how to use it in the future

Using Colabs:

Instructions: Read all text and execute all code in order.

How to execute code:

  • Locate labels taking the form: 'Run: (A Short Description)'
  • Left of this text you will see an open bracket [ ], possibly with a number inside it.
    • Hovering over the brackets will reveal a play button.
    • Click the button to execute code.

If you would like to see the code you are executing, double-click the label 'Run: '. The code is accompanied by brief inline descriptions.

Try It! Go ahead and try running the cell below. The result is a flow chart of how this tutorial may be used.

#@title Run: View User Path

# This box uses HTML magic '%%html' to denote 
# that anything after that tag will be in HTML not Python.
# Within the HTML I use Javascript magic to render 
# a graph from the markup written inside the html div
# The graph maps out the possible paths 
# you may take when using this notebook.

%%html
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
<script src="https://unpkg.com/mermaid@7.1.0/dist/mermaid.min.js"> </script>
<script> window.mermaid.init() </script>
<div class="mermaid">
  graph LR
      User>User] --> ExploreAcsTables
      ExploreAcsTables --> DownloadAcsTable
      DownloadAcsTable -- Repeat --> DownloadAcsTable
      DownloadAcsTable --> DecideWhatToDo{DecideWhatToDo}
      DecideWhatToDo --> RunAnotherNotebook{RunAnotherNotebook} 
      DecideWhatToDo --> FinishTheNotebook</div>
  

Background

About The Census Data

Census or ACS Data?

Census data comes in 2 flavors:


1) American Community Survey (ACS)

  • First released to the public in 2006
  • Derived using 5-year estimates (the past 5 years of data)
  • Delivered annually
  • Delivered at the tract level (READ -> 'Geographic Granularity')
  • The data's margins of error generally prohibit a more granular view
  • ACS data is accessible programmatically (READ -> 'ACS Programmatic Retrieval') (READ -> 'Geographic Reference Code')

2) Decennial Census

  • A full count of the population rather than a sample-based estimate
  • Conducted separately from the ACS, which replaced the census long form
  • Delivered once every 10 years
  • Delivered at the block level; the most accurate

Geographic Granularity

Census data can come in a variety of levels.

These levels define the specificity of the data.

I.e., whether a dataset reports on individual communities or on entire cities depends on its granularity.

The data we will be downloading in this tutorial, ACS Data, can be found at the Tract level and no closer.

Aggregating Tracts is the way BNIA calculates some of their yearly community indicators!

Each of the bolded words in the content below is a level that is identifiable through a Geographic Reference Code (READ -> 'Geographic Reference Code').

  • A census block is the smallest unit of measurement used by the Census
  • Information by census block is only available decennially (i.e. not in ACS data)
  • Block groups are the next smallest unit of measurement used by the Census and are composed of aggregated census blocks
  • Census tracts are composed of block groups and are the next largest unit of measurement used by the ACS
  • Counties, cities, and census designated places are composed of tracts
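
The nesting in the list above is mirrored in the Census GEOID, where each level's identifier is a prefix of the level below it. The following is a minimal sketch, assuming the standard 15-digit block GEOID layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block whose first digit is the block group). The function name `split_block_geoid` and the sample block number are made up for illustration and are not part of this tutorial's code.

```python
def split_block_geoid(geoid):
    """Split a 15-digit census block GEOID into its nested levels."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID")
    return {
        "state": geoid[:2],         # 2-digit state code (24 = Maryland)
        "county": geoid[:5],        # state + 3-digit county code
        "tract": geoid[:11],        # county + 6-digit tract code
        "block_group": geoid[:12],  # tract + 1-digit block group
        "block": geoid,             # tract + 4-digit block code
    }


# Hypothetical block 1000 inside Baltimore City tract 190100
parts = split_block_geoid("245101901001000")
for level, code in parts.items():
    print(level, code)
```

Because every code is a prefix of the next, aggregating blocks up to tracts or counties amounts to grouping by a shorter prefix.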

For more information on Geographic Reference Codes, refer to the table of contents for the section on that matter.

Run the following code to see how these different levels nest into each other!

#@title Run: Census Granularities

%%html
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
<script src="https://unpkg.com/mermaid@7.1.0/dist/mermaid.min.js"> </script>
<script> window.mermaid.init() </script>
<div class="mermaid">
  graph LR
      Census_Block --create--> block_group
      block_group --> tract_level
      tract_level --> csa_level
      csa_level --> jurisdiction_level 
      jurisdiction_level --> state_level
      state_level
</div>
  

Geographic Reference Codes

State, County, and Tract IDs are called Geographic Reference Codes.

This information is crucial to know when accessing data.

In order to successfully pull data, Census State and County Codes must be provided.

The code herein is configured by default to pull data on Baltimore City, MD and its constituent tracts.

In order to find your State and County code:


Either

A) Click the link: https://geocoding.geo.census.gov/geocoder/geographies/address. After entering an address, you can locate the state and county codes under the associated values 'Counties' and 'State'.

OR

B) Alternatively, click https://www.census.gov/geographies/reference-files/time-series/geo/tallies.html

  • The Geographies main page contains a lot of data assets.
  • The link to these tallies was located by opening the geographical references subdirectory of the Geographies main page and filtering for publications from 2010 (we are using the 2010 census boundaries).
  • Once there, scroll down to the header 'Tallies of Geographic Entities By State'.
  • Enter your state and press enter.
  • You will be redirected to a plain text file that contains all the information on your state and its counties.
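
Once the state and county codes are known, they are typically passed to the Census Data API as `for` and `in` clauses. The sketch below shows one way to assemble those clauses; the helper name `geography_params` is hypothetical, and the tutorial's own `retrieve_acs_data()` may build its request differently.

```python
from urllib.parse import urlencode


def geography_params(state, county, tract="*"):
    """Build the 'for'/'in' clauses that select tracts within a county."""
    return {
        "for": f"tract:{tract}",                 # '*' requests every tract
        "in": f"state:{state} county:{county}",  # parent geographies
    }


# Baltimore City, MD: state 24, county 510, all tracts
params = geography_params(state="24", county="510")
print(params)
print(urlencode(params))  # URL-encoded form, ready to append to a request
```

Keeping the codes as strings preserves leading zeros, which matters for states such as Alabama ('01').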

Working with the ACS Data

Searching for a dataset is the first step in the data processing pipeline.

In this tutorial we plan on processing ACS data in a programmatic fashion.

This tutorial will not only allow you to search and explore ACS tables and inspect their contents (attributes), but also to download, format, and clean them!

Search Advice

Although a table explorer section is provided, it is not the suggested approach. Instead, explore the available data tables and retrieve their IDs using the dedicated websites provided below:

American Fact Finder may assist you in locating and downloading data: https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml

Fact Finder provides a nice interface to explore available datasets. From Fact Finder you can grab a table's ID and continue the tutorial. Alternatively, you can download the data for your community directly through the Fact Finder interface and then continue the tutorial by loading the downloaded dataset as an external resource; instructions on how to do this are provided further below in this tutorial.

Update: 12/18/2019 " American FactFinder (AFF) will remain as an "archive" system for accessing historical data until spring 2020. " - American Fact Finder Website

The New American Fact Finder: https://data.census.gov/cedsci/

This new website is provided by the Census Bureau. Its 'Advanced Search' feature offers all the filtering abilities of the older, deprecated (soon to be discontinued) American Fact Finder website. It is still a bit buggy to date and may not apply all filters. Filters include year (you can only pick one year at a time), geography (state, county, tract), topic, survey, and table ID. The filters you apply are shown at the bottom of the query, and submitting the search will yield data tables ready for download as well as table IDs that you may use in this tutorial.

ACS Programmatic Retrieval

Notes on the Census API

From me:

  • Detailed and Subject Tables are derived using the 5-year ACS data.
    • As a reminder, ACS 5-year estimates arrive at the tract level.
  • These tables are created by the Census and are pre-compiled views of the data.

  • The Detailed Tables contain all possible ACS data.
  • The Subject Tables contain ACS data in convenient groups.
  • BNIA creates its data mostly using Detailed Tables, but sometimes pulling the data from a Subject Table is more convenient (the data would otherwise be spread across multiple Detailed Tables).

From the ACS Website:

  • Detailed Tables contain the most detailed cross-tabulations, many of which are published down to block groups. The data are population counts. There are over 20,000 variables in this dataset.

  • Subject Tables provide an overview of the estimates available in a particular topic. The data are presented as population counts and percentages. There are over 18,000 variables in this dataset.
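
Detailed and Subject Tables are served from different datasets in the Census Data API, so a request has to pick the right endpoint for the table ID. Here is a minimal sketch, assuming the ACS 5-year endpoints `.../acs/acs5` and `.../acs/acs5/subject` and a full four-digit year; the helper names `dataset_url` and `metadata_url` are hypothetical, and `S1701` is just an arbitrary S-type ID used for contrast.

```python
BASE_URL = "https://api.census.gov/data"


def dataset_url(table_id, year):
    """Return the ACS 5-year dataset URL for a Detailed (B) or Subject (S) table."""
    if table_id.upper().startswith("S"):
        return f"{BASE_URL}/{year}/acs/acs5/subject"
    return f"{BASE_URL}/{year}/acs/acs5"


def metadata_url(table_id, year):
    """Return the URL of the variable metadata (IDs and labels) for a table."""
    return f"{dataset_url(table_id, year)}/groups/{table_id}.json"


print(dataset_url("B19001", 2017))
print(metadata_url("B19001", 2017))
print(dataset_url("S1701", 2017))
```

Only the URLs are built here; no request is sent, so the sketch runs without network access.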

For more Information (via API) Please Visit

Guided Walkthrough

SETUP

You will need to run this next box first in order for anything following it to work.

%matplotlib inline
!jupyter nbextension enable --py widgetsnbextension

(Optional) Local File Access

Access Google Drive directories:

You can also import a file directly into a temporary folder in a public folder.

Now let's explore the file system using the built-in terminal:

By default you are positioned in the ./content/ folder.
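
If you prefer Python over terminal commands, the standard library can show where you are and what the folder contains. This is only an illustrative sketch; the helper name `list_directory` is made up for this example.

```python
import os


def list_directory(path="."):
    """Return (name, kind) pairs for the entries in `path`, sorted by name."""
    entries = []
    for name in sorted(os.listdir(path)):
        kind = "dir" if os.path.isdir(os.path.join(path, name)) else "file"
        entries.append((name, kind))
    return entries


print("Working directory:", os.getcwd())
for name, kind in list_directory():
    print(f"{kind:4} {name}")
```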

Explore Table Directories

Please note: The following section details a programmatic way to access and explore the census data catalogs. Rather than use this portion of the tutorial, it is advised that you read the section 'Searching For Data' --> 'Search Advice' above, which provides links to dedicated websites hosted by the Census Bureau explicitly for your data exploration needs!

Explore the Detailed Table Directory

Retrieve and search available ACS datasets through the ACS's table directory.

The table directory contains TableIds and descriptions for each data table the ACS provides.

By running the next cell, an interactive searchbox will filter the directory for keywords within the description.

Be sure to grab the TableId once you find a table with a description of interest.
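
The keyword filter described above can be pictured as a case-insensitive substring match over the directory's descriptions. The sketch below uses a tiny hand-typed sample directory rather than the real one, and the function name `search_tables` is hypothetical.

```python
def search_tables(directory, keyword):
    """Return (TableId, description) pairs whose description contains `keyword`."""
    keyword = keyword.lower()
    return [(tid, desc) for tid, desc in directory if keyword in desc.lower()]


# Hand-typed sample entries, not the full ACS table directory
sample_directory = [
    ("B19001", "HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)"),
    ("B01001", "SEX BY AGE"),
    ("B25003", "TENURE"),
]

matches = search_tables(sample_directory, "income")
for table_id, description in matches:
    print(table_id, description)
```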

Once a table from the explorer has been picked, you can inspect its column names in the next part.

This will help ensure it has the data you need!

Explore the Subject Table Directory

The data structure we receive is different from that of the prior table.

Intake and processing are different as a result.

Now let's explore what we got, just like before.

The only difference is that the column names are automatically included in this query.

Get Table Data

Intro

Hopefully, by now you know which datatable you would like to download!

The following Python function will do that for you.

  • It can be imported and used in future projects or used standalone.

retrieve_acs_data

retrieve_acs_data(state, county, tract, tableId, year, saveAcs)

Function Explanation

Description: This function returns ACS data given appropriate params.

Purpose: Retrieves ACS data from the web

Services

  • Download an ACS dataset from a Subject (S) table
  • Download an ACS dataset from a Detailed (B) table

Input:

  • state
  • county
  • tract
  • tableId
  • year
  • saveAcs

Output:

  • ACS data.
  • Saves to ../../data/2_cleaned/acs/

How it works

  • Before our program retrieves the actual data, it requests the table's metadata.

    • This metadata is used as a crosswalk to replace the awkward column names.
    • If this is not done, each column is denoted only by a column ID, which is not human readable.
  • The function changes the URL it requests data from depending on whether the user has requested an S or B type table.

  • Multiple calls for data must be made, as a single table may have several hundred columns.

    • Constructing a table requires merging the data from multiple responses.
  • Our program pulls not just tract-level data but also the aggregate for the county.

    • County totals are included automatically as 'tract 010000'.
      • The county total is not the sum of all other tracts but a separate, independent, and unique query.
    • Tract and county data tables must be merged to form a single dataset.
  • Finally, we will download the data in two different formats if desired.

  • If we choose to save the data, we save it once with the Table IDs + ColumnNames, and once without the Table IDs.
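
The batching, merging, and renaming steps above can be sketched offline with plain Python. This is not the source of `retrieve_acs_data()`: the helper names are hypothetical, the two "responses" are hand-typed in the header-plus-rows shape the Census API returns, and the batch size is left as a parameter because the per-request variable limit is not stated in this tutorial.

```python
def chunk(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rows_to_records(response):
    """Turn a header-plus-rows response into a list of dicts."""
    header, *rows = response
    return [dict(zip(header, row)) for row in rows]


def merge_responses(responses, keys=("state", "county", "tract")):
    """Combine partial responses into one record per geography."""
    merged = {}
    for response in responses:
        for record in rows_to_records(response):
            geo = tuple(record[k] for k in keys)
            merged.setdefault(geo, {}).update(record)
    return list(merged.values())


def rename_columns(records, crosswalk):
    """Append the readable label to every column ID found in the crosswalk."""
    def new_name(col):
        return f"{col}_{crosswalk[col]}" if col in crosswalk else col
    return [{new_name(c): v for c, v in rec.items()} for rec in records]


# Hand-typed sample data shaped like two partial API responses
part_1 = [
    ["NAME", "B19001_001E", "state", "county", "tract"],
    ["Census Tract 1901", "796", "24", "510", "190100"],
    ["Census Tract 1902", "695", "24", "510", "190200"],
]
part_2 = [
    ["NAME", "B19001_002E", "state", "county", "tract"],
    ["Census Tract 1901", "237", "24", "510", "190100"],
    ["Census Tract 1902", "63", "24", "510", "190200"],
]
# Crosswalk from column ID to label, as the metadata request would provide
crosswalk = {
    "B19001_001E": "Total",
    "B19001_002E": "Total_Less_than_$10_000",
}

batches = list(chunk(sorted(crosswalk), 1))  # one variable per request here
table = rename_columns(merge_responses([part_1, part_2]), crosswalk)
for row in table:
    print(row)
```

Merging on the state, county, and tract codes rather than on row order keeps the result correct even if two responses list the tracts differently. A county-level response could be merged the same way once it is given the placeholder tract code described above.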

Function Diagrams

#@title Run: Class Diagram retrieve_acs_data()
%%html
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
<script src="https://unpkg.com/mermaid@7.1.0/dist/mermaid.min.js"> </script>
<script> window.mermaid.init() </script>
<div class="mermaid" style="height: 300px;">
 classDiagram
      RetrieveAcsData <|-- RetrieveAcsData
      RetrieveAcsData : + String: state => required 
      RetrieveAcsData : + String: county => required 
      RetrieveAcsData : + String: tract => required 
      RetrieveAcsData : + String: tableId => required
      RetrieveAcsData : + String: saveAcs => required
      RetrieveAcsData : + String: includeCountyAgg => required 
      RetrieveAcsData : + Int: year => required
      RetrieveAcsData: + getParams(keys)
      RetrieveAcsData: + getBCityParams(keys)
      RetrieveAcsData: + readIn( url )
      RetrieveAcsData: + addKeys( table, params)
</div>
  
#@title Run: retrieve_acs_data Flow Chart

%%html
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
<script src="https://unpkg.com/mermaid@7.1.0/dist/mermaid.min.js"> </script>
<script> window.mermaid.init() </script>
<div class="mermaid">
  graph LR
      retrieve_acs_data>retrieve_acs_data] --> getMetaData
      getMetaData --> useSubjectTables{useSubjectTables?}
      useSubjectTables --> getTractData
      useSubjectTables --> getCountyData
      getTractData --> RenameCols
      getCountyData --> RenameCols
      RenameCols --> Save{Save}
</div>
  
#@title Run: Gantt Chart  retrieve_acs_data()

%%html
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
<script src="https://unpkg.com/mermaid@7.1.0/dist/mermaid.min.js"> </script>
<script> window.mermaid.init() </script>
<div class="mermaid">
gantt

	title retrieve_acs_data
	dateFormat  dd

	section getMetaData
	getMetaData        :a1, 01, 1d

	section getTractData
	getTractData      :a2, after a1 , 1d

	section getCountyData
	getCountyData      :after a1 , 1d

  	section Clean
	Clean      :a3, after a2 , 1d

  	section Save
	Save      :after a3 , 1d
</div>
  
#@title Run: Sequence Diagram  retrieve_acs_data()

%%html
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
<script src="https://unpkg.com/mermaid@7.1.0/dist/mermaid.min.js"> </script>
<script> window.mermaid.init() </script>
<div class="mermaid">
  sequenceDiagram
    retrieve_acs_data->>+getMetaData: URL 
    getMetaData-->>-retrieve_acs_data: MetaData
    retrieve_acs_data->>+getData: URL 
    getData-->>-retrieve_acs_data: Data
    
    retrieve_acs_data-->>+Prettify: Data
    Prettify-->>+Save: Data
</div>
  

Function Examples

Now use this function to Download the Data!

# Our download function will use Baltimore City's tract, county and state as internal parameters
# Changing these values in the cell below to different geographic reference codes will change those parameters
tract = '*'
county = '510'
state = '24'

# Specify the download parameters the function will receive here
tableId = 'B19001'
year = '17'
saveAcs = True
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
df.head()
Number of Columns 17
B19001_001E_Total B19001_002E_Total_Less_than_$10_000 B19001_003E_Total_$10_000_to_$14_999 B19001_004E_Total_$15_000_to_$19_999 B19001_005E_Total_$20_000_to_$24_999 B19001_006E_Total_$25_000_to_$29_999 B19001_007E_Total_$30_000_to_$34_999 B19001_008E_Total_$35_000_to_$39_999 B19001_009E_Total_$40_000_to_$44_999 B19001_010E_Total_$45_000_to_$49_999 B19001_011E_Total_$50_000_to_$59_999 B19001_012E_Total_$60_000_to_$74_999 B19001_013E_Total_$75_000_to_$99_999 B19001_014E_Total_$100_000_to_$124_999 B19001_015E_Total_$125_000_to_$149_999 B19001_016E_Total_$150_000_to_$199_999 B19001_017E_Total_$200_000_or_more state county tract
NAME
Census Tract 1901 796 237 76 85 38 79 43 36 35 15 43 45 39 5 0 6 14 24 510 190100
Census Tract 1902 695 63 87 93 6 58 30 14 29 23 38 113 70 6 32 11 22 24 510 190200
Census Tract 2201 2208 137 229 124 52 78 87 50 80 13 217 66 159 205 167 146 398 24 510 220100
Census Tract 2303 632 3 20 0 39 7 0 29 8 9 44 29 98 111 63 94 78 24 510 230300
Census Tract 2502.07 836 102 28 101 64 104 76 41 40 47 72 28 60 19 27 15 12 24 510 250207
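
The column names in the output above combine each variable ID with its label. The function description mentions a second saved version without the Table IDs; a minimal sketch of how such names could be derived is shown here. The regular expression and the name `strip_table_id` are assumptions for illustration and only cover Detailed (B) table estimate columns like the ones shown.

```python
import re

# Matches prefixes such as 'B19001_002E_' at the start of a column name
ID_PREFIX = re.compile(r"^B\d{5}[A-Z]?_\d{3}E_")


def strip_table_id(column):
    """Remove the leading variable ID from a column name, if present."""
    return ID_PREFIX.sub("", column)


columns = [
    "B19001_001E_Total",
    "B19001_002E_Total_Less_than_$10_000",
    "state",
    "county",
    "tract",
]
readable = [strip_table_id(c) for c in columns]
print(readable)
```

Columns without an ID prefix, such as the geography codes, pass through unchanged.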