I applied some data analysis concepts to a dataset obtained from Tau Beta Pi HQ.
Goals
- gain a better understanding of the Texas Gulf Coast Alumni Chapter of Tau Beta Pi’s constituency
- determine parts of town where events might be hosted to achieve greater turnout
Data was received in .XLSX format, so I naturally gravitated towards Microsoft Excel for data analysis.
I constructed a series of PivotTables to gain insight about the geographical distribution of the alumni chapter’s constituency.
Prior to mapping, I filtered to count only those entries that have an email address, yielding 2682 out of 9127 records. My focus is on those members that have provided their email addresses: without external support, it will be challenging and expensive to reach out to the broader population.
I discovered OpenHeatMap (GitHub) while searching for a tool that would easily map entries from a CSV file. Thanks to Pete Warden for producing OpenHeatMap.
Tested the effect of cleaning ZIP code data, stripping any ZIP+4 codes back to five-digit ZIP codes.
Limitations
- Some members list their work address instead of their home address, resulting in inflated counts from some parts of the city (namely downtown and Rice University)
- OpenHeatMap did not correctly resolve all ZIP codes
- This was noticeably worse when mapping ZIP+4 (ZIP+4 openheatmap.com vs. ZIP openheatmap.com)
I attempted to overcome the second limitation by providing more contextual location information, appending city and state to the ZIP code.
Success!
openheatmap.com/view.html?map=GangligliaNonfermentedReportings
Note that the maps above were generated prior any data cleaning.
The final map was generated after cleaning the data of any misspelled cities
openheatmap.com/view.html?map=ThallusesBridgetownPseudoskeleton
Updates
20170222 I learned of Google FusionTables, which can be used to generate heatmaps and much more. Beginning further analysis to see how this tool might be used. So far I have observed the Google Maps Geocoding service used by FusionTables to be rather slow – perhaps it’s single-threaded?
Unfortunately, the heatmaps generated by Google FusionTables show only 1000 rows of data. Additional limitations are listed here.