I applied some data analysis concepts to a dataset obtained from Tau Beta Pi HQ.
- gain a better understanding of the Texas Gulf Coast Alumni Chapter of Tau Beta Pi’s constituency
- determine parts of town where events might be hosted to achieve greater turnout
Data was received in .XLSX format, so I naturally gravitated towards Microsoft Excel for data analysis.
I constructed a series of PivotTables to gain insight about the geographical distribution of the alumni chapter’s constituency.
Prior to mapping, I filtered to count only those entries that have an email address, yielding 2682 out of 9127 records. My focus is on those members that have provided their email addresses: without external support, it will be challenging and expensive to reach out to the broader population.
Tested the effect of cleaning ZIP code data, stripping any ZIP+4 codes back to five-digit ZIP codes.
- Some members list their work address instead of their home address, resulting in inflated counts from some parts of the city (namely downtown and Rice University)
- OpenHeatMap did not correctly resolve all ZIP codes
I attempted to overcome the second limitation by providing more contextual location information, appending city and state to the ZIP code.
Note that the maps above were generated prior any data cleaning.
The final map was generated after cleaning the data of any misspelled cities
20170222 I learned of Google FusionTables, which can be used to generate heatmaps and much more. Beginning further analysis to see how this tool might be used. So far I have observed the Google Maps Geocoding service used by FusionTables to be rather slow – perhaps it’s single-threaded?
Unfortunately, the heatmaps generated by Google FusionTables show only 1000 rows of data. Additional limitations are listed here.