“What story do I want to tell?”
That question lies at the heart of every visualization. After two things were stolen in my first two weeks in Cambridge, I got curious about theft trends. Questions help me clarify stories. In this case there are two:
- How does total theft in some areas compare to total theft in others?
- How does each area’s theft trend over time compare with others around it?
My initial intent was to map the two questions using Cambridge and/or Boston metro area data. The closest I found was a reference from the Cambridge data links page to some pre-made 2005 maps. Mapping the questions sounded fun despite Cambridge data availability issues (apparently a shared problem), so I went ahead using data from Chicago’s awesome city data portal. The resulting map is on the right.
2003-2010 total Chicago thefts under $300 by ward
- Total per Ward
  - Darker == more total theft
- Theft Trend
  - % change from 2003 on a 20%-120% scale
Crafting the Stories
Time series and spatial relationships are a challenge to combine in a single visualization. Three options include animation, small multiples, and embedded charts.
One solution is motion – i.e., representing change over eight years by showing eight maps over eight seconds. I’m not a huge fan of animated choropleths since humans cannot effectively comprehend color transitions in fifty polygons (7.84MB).
Another option, advocated by Edward Tufte, is small multiple maps. Here, comparing many neighboring polygons across eight separate maps takes substantial effort, so it wasn’t my first choice.
Embedded charts are ideal. The combination of line charts and geographically positioned wards shows both spatial relationships and trends effectively. Still, they require some tweaking to get there: desaturation, map feature removal, selective recoloring, hiding polygon boundaries to emphasize the trend charts, and varying theft total saturation and lightness all help both stories stand out depending on focus. While absent polygon borders make it harder to tell individual wards apart, the major areas are more visible, a reasonable tradeoff of low-level details for high-level patterns and trends.
Assuming the data is valid for this purpose, reported thefts in all wards showed net declines between 2003 and 2010. Contrasts between wards with high and low total theft are easy to see – higher theft in the city center extends to the Northwest, West, and South.
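For anyone who wants to reproduce the two measures behind the map, here is a minimal Python sketch. It is not the workflow used for this post (that went through Fusion Tables and Google Refine). The `(ward, year, count)` record layout and the function name `summarize_thefts` are assumptions for illustration, and the trend is interpreted here as each year's count expressed as a percentage of that ward's 2003 count.

```python
from collections import defaultdict

BASE_YEAR = 2003  # the map's trend is measured relative to 2003


def summarize_thefts(records):
    """Summarize thefts per ward.

    `records` is an iterable of (ward, year, count) tuples. This layout is
    an assumption for illustration, not the schema of the Chicago data.
    Returns {ward: {"total": int, "trend": {year: percent_of_base} or None}}.
    """
    yearly = defaultdict(lambda: defaultdict(int))
    for ward, year, count in records:
        yearly[ward][year] += count

    summary = {}
    for ward, by_year in yearly.items():
        total = sum(by_year.values())
        base = by_year.get(BASE_YEAR)
        trend = None
        if base:  # skip wards with no (or zero) thefts in the base year
            trend = {
                year: 100.0 * n / base
                for year, n in sorted(by_year.items())
            }
        summary[ward] = {"total": total, "trend": trend}
    return summary
```

With made-up numbers, `summarize_thefts([(1, 2003, 200), (1, 2010, 100)])` gives ward 1 a total of 300 and a 2010 trend value of 50.0, i.e. half the 2003 level.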
Overall I’m happy with the outcome and had fun creating it. In fact, if you’re reading this and you happened to steal a fridge in front of my steps a few weeks ago, consider it a gift. Cheers!
Technologies that went into this visualization (roughly in order applied):
- Chicago Theft Data (CSV)
- Chicago Ward Boundaries (ESRI Shapefiles)
- Shpescape.com (Convert ESRI shapefiles to ward Fusion Tables)
- Fusion Tables (Merge ward geo data and thefts data. Export to Google Refine for cleaning. Re-import cleaned data. Format KML via handy style formatter in visualize>map menu)
- Google API Loader (Load the maps API)
- Google Maps API (Framework for interacting with Google Maps)
- Google Charts API – Image Charts (Info window bar charts and embedded line charts)
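The embedded line charts are plain images requested by URL, so the last step boils down to string building. Below is a rough Python sketch of how one such URL could be assembled. The function name, the default chart size, and the sample values are hypothetical; the query parameters (`cht=lc` for a line chart, `chs` for size, `chd=t:` for text-encoded data, `chds` for the data range) are the Image Charts ones, with the range set to the map's 20%-120% scale. Google has since deprecated the Image Charts service, so the code only builds the URL and does not fetch it.

```python
from urllib.parse import urlencode

CHART_BASE = "https://chart.googleapis.com/chart"


def line_chart_url(trend, width=60, height=30, lo=20, hi=120):
    """Build an Image Charts line-chart URL for one ward's trend.

    `trend` is a sequence of percent-of-2003 values, oldest year first.
    Width, height, and this function itself are illustrative assumptions.
    """
    values = ",".join(f"{v:.1f}" for v in trend)
    params = {
        "cht": "lc",                 # chart type: line chart
        "chs": f"{width}x{height}",  # chart size in pixels
        "chd": f"t:{values}",        # data series in text format
        "chds": f"{lo},{hi}",        # data range mapped to the y axis
    }
    # Keep ':' and ',' unescaped so the URL stays readable.
    return f"{CHART_BASE}?{urlencode(params, safe=':,')}"


# Made-up trend for a single ward, for illustration only.
print(line_chart_url([100.0, 90.0, 75.0]))
```

Fixing the range with `chds` keeps every ward's chart on the same vertical scale, which is what makes the small charts comparable across the map.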
Coming up eventually (probably): another visualization, mapping near-real-time human psychological well-being by country with the newly released NextStage SampleMatch data. Cool stuff!