Clouds across the moon

This movie shows a heatmap of London Bikeshare activity over the course of an average day. Red indicates the density of arrivals and cyan the density of departures, so white areas are where arrivals and departures match. Animation by Martin Zaltz Austwick (@sociablePhysics) with help from Oliver O’Brien (@oobr) of UCL-CASA.

This animation scales the intensity of colour to the all-time maximum – which is why the brightest colours occur at rush hour(s). Those two big dots are King’s Cross and Waterloo. This visualisation is better for comparing activity at different time periods, but is pretty useless for examining spatial patterns at the quieter times.

This animation scales the intensity of colour to the most intense activity at each time point. This leads to the strange paradox of the animation getting brighter as a whole outside rush hour. This is because many areas are similarly busy and no one area stands out, so many areas appear bright. This visualisation is more useful for understanding geographical patterns at each time point, but is useless for comparing total activity at different time periods.
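The difference between the two scalings can be sketched in a few lines of Python. This is only an illustration, not the code behind the animations: the function names and the activity numbers are made up, with one "rush hour" frame dominated by a single station and one "quiet" frame where everything is evenly low.

```python
def scale_global(frames):
    """Scale every value by the single largest value across all frames."""
    peak = max(max(frame) for frame in frames) or 1.0  # avoid dividing by zero
    return [[value / peak for value in frame] for frame in frames]


def scale_per_frame(frames):
    """Scale each frame by its own largest value."""
    scaled = []
    for frame in frames:
        peak = max(frame) or 1.0  # avoid dividing by zero
        scaled.append([value / peak for value in frame])
    return scaled


# Hypothetical activity at four stations (illustrative numbers only).
frames = [
    [100.0, 20.0, 10.0, 5.0],  # rush hour: one station dominates
    [4.0, 4.0, 3.0, 4.0],      # quiet period: evenly low activity
]

global_scaled = scale_global(frames)
frame_scaled = scale_per_frame(frames)
```

With the all-time scaling the quiet frame is nearly black (its brightest value is 0.04). With the per-frame scaling almost every station in the quiet frame is at full brightness, so the quiet frame is brighter in total than the rush-hour frame, which is the paradox described above.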

So how was this produced? From a network map, surprisingly. I looked at the Transport for London data of bike journeys (covering November 2010-May 2011) and, based on an average of all the data falling on weekdays, constructed a network which told me, minute by minute, how many bikes were on each route. By “route” I mean “edge”, as in “it’s 10.33 – how many bikes are travelling between London Bridge and Gower Place?” Then I summed those up, so “at 10.33, how many bikes in total are on journeys that started from London Bridge?” and “at 10.33, how many bikes are travelling towards Gower Place?” Network theorists: this is broadly like in- and out-degree.*
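As a rough sketch of that summing step, assuming the edge counts for one minute are held in a dictionary keyed by (origin, destination): the data structure, the function name and the numbers are assumptions for illustration, and only the station names come from the text.

```python
from collections import defaultdict


def station_totals(edge_counts):
    """Sum per-edge bike counts into per-station 'out' and 'in' totals.

    edge_counts maps (origin, destination) to the number of bikes on that
    edge at one time point.
    """
    outgoing = defaultdict(float)  # bikes on the road that started here
    incoming = defaultdict(float)  # bikes on the road that are heading here
    for (origin, destination), bikes in edge_counts.items():
        outgoing[origin] += bikes
        incoming[destination] += bikes
    return dict(outgoing), dict(incoming)


# Hypothetical edge counts at 10.33 (values are illustrative only).
edges_at_1033 = {
    ("London Bridge", "Gower Place"): 2.0,
    ("London Bridge", "Waterloo"): 1.5,
    ("Waterloo", "Gower Place"): 0.5,
}

outgoing, incoming = station_totals(edges_at_1033)
```

Here `outgoing["London Bridge"]` answers "how many bikes are on journeys that started from London Bridge" and `incoming["Gower Place"]` answers "how many bikes are travelling towards Gower Place".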

Bear in mind that this is not the same as the number of bikes leaving (arriving) at that time point – it is the number of bikes on the road at that time point that originated (will end up) at that source (destination). The former analysis is easier to do, in fact, but my code was set up for the latter.
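To make the distinction concrete, here is a small Python sketch that counts the same journeys both ways. The journeys and function names are hypothetical, and times are written as minutes since midnight, so 10.33 is minute 633.

```python
def in_transit(journeys, minute):
    """Count journeys on the road at `minute`: already started, not yet ended."""
    return sum(1 for start, end in journeys if start <= minute < end)


def departing(journeys, minute):
    """Count journeys that start exactly at `minute`."""
    return sum(1 for start, _ in journeys if start == minute)


# Hypothetical journeys from one station, as (start, end) in minutes since midnight.
journeys = [
    (620, 640),  # left at 10.20, still on the road at 10.33
    (633, 650),  # leaves at 10.33
    (600, 630),  # already finished by 10.33
]

on_road_at_1033 = in_transit(journeys, 633)
leaving_at_1033 = departing(journeys, 633)
```

The in-transit count at 10.33 includes the bike that left at 10.20, whereas the departure count only sees the bike leaving in that minute.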

That yields a set of points, each with data about the bikes which have left it and the bikes which will arrive at it. The colour scheme could easily be applied to point data, so let’s. Data is scaled to some maximum: the larger of the maximum in and out values, taken either over all time or at the current time, depending on the vis. The colours are overlaid and chosen to be complementary (in this case, red and cyan), so if the in and out activity is equal we get white: bright white for strong in and strong out, dimmer grey for weak but equal in and out.
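A minimal sketch of that colour mapping in Python, assuming 8-bit RGB and assuming cyan simply means equal green and blue channels. The function name and the clamping are illustrative choices, not taken from the original code.

```python
def station_colour(arrivals, departures, maximum):
    """Map arrival and departure activity to an (R, G, B) tuple.

    Arrivals drive the red channel; departures drive green and blue together
    (cyan). Equal arrivals and departures therefore give a grey, up to white.
    """
    if maximum <= 0:
        return (0, 0, 0)
    red = round(255 * min(arrivals / maximum, 1.0))
    cyan = round(255 * min(departures / maximum, 1.0))
    return (red, cyan, cyan)


balanced_busy = station_colour(10, 10, 10)   # strong in, strong out
balanced_quiet = station_colour(2, 2, 10)    # weak but equal in and out
arrivals_only = station_colour(10, 0, 10)    # pure red
departures_only = station_colour(0, 10, 10)  # pure cyan
```

Passing the all-time maximum or the current frame's maximum as `maximum` switches between the two scalings used in the animations.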

That’s the conceptually tricky part done. If you know what Gaussian convolution is, that’s what I did next. I played around with the window until it covered the space reasonably. To speed up the process, I created two Gaussian images (one red, one cyan), each extending to 3 standard deviations, and used the intensity point data to create a mask which could be used to scale the intensity of each Gaussian. Then the “new” Gaussian could be drawn, centred on the point position, and, using the blend() function, the total intensity of the overlapping Gaussians added to create the heatmap. This was repeated for all the points and for both the “in” and “out” point data. In the version rescaled at each time point, a final rescaling was carried out to ensure that the full dynamic range was being used. Using Processing’s built-in graphics methods seemed to be faster than “by hand” Gaussian convolution, but there are probably even faster ways to do it. Thanks to Jon Reades for hints on speeding up the calls to the MySQL database where the journey data sits.
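The stamp-and-add idea can be sketched in plain Python. This is a slow "by hand" version for one colour channel, not the Processing images and blend() calls described above. The function names, grid size and point data are assumptions for illustration.

```python
import math


def gaussian_stamp(sigma):
    """Build a square Gaussian kernel out to 3 standard deviations, peak 1.0."""
    radius = int(math.ceil(3 * sigma))
    return [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


def heatmap(points, width, height, sigma):
    """Add one intensity-scaled stamp per point, then rescale the grid to 0..1."""
    stamp = gaussian_stamp(sigma)
    radius = len(stamp) // 2
    grid = [[0.0] * width for _ in range(height)]
    for x, y, intensity in points:
        for j, row in enumerate(stamp):
            for i, weight in enumerate(row):
                px, py = x + i - radius, y + j - radius
                if 0 <= px < width and 0 <= py < height:
                    # Additive blending: overlapping stamps sum.
                    grid[py][px] += intensity * weight
    peak = max(max(row) for row in grid)
    if peak > 0:
        # Final rescaling so the full dynamic range is used.
        grid = [[value / peak for value in row] for row in grid]
    return grid


# Hypothetical points as (x, y, intensity) on a 12 x 10 pixel grid.
points = [(5, 5, 1.0), (7, 5, 0.5)]
grid = heatmap(points, 12, 10, 1.0)
```

Running this once for the "in" data and once for the "out" data would give the red and cyan channels respectively. For the all-time scaling, the final rescaling step would be skipped or replaced by a division by the all-time peak.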

Possible extensions: cartographers would probably like to see maps. That’s fairly easily done and would enhance readability whilst sacrificing the rather abstract nature, which I like. I would also have to work a bit harder on using graphics methods for the Gaussian convolution if I did that. Another simple extension would be to use actual arrival/departure data rather than the proxy I describe (I suspect this proxy leads to a certain amount of time-smoothing, which has certain advantages and does not massively skew the results).

*I divide each bike’s contribution to edge weight by its journey time so a bike on a long journey does not have undue weight on the system over all time just by appearing in multiple time windows. If I did not do this, long journeys would be more important than short ones over the course of the day. I don’t want to dwell on this but thought it important to mention – I will no doubt write about this again in the future.
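A short Python sketch of that weighting, under the assumption that each journey contributes 1/duration to its edge in every minute it is on the road. The journeys and the function name are hypothetical.

```python
def edge_weight(journeys, minute):
    """Weight on one edge at `minute`: each active journey adds 1 / its duration."""
    return sum(
        1.0 / (end - start)
        for start, end in journeys
        if start <= minute < end
    )


# Hypothetical journeys on one edge, as (start, end) in minutes since midnight.
short_journey = [(600, 610)]  # 10 minutes
long_journey = [(600, 640)]   # 40 minutes

# Summed over the whole day, each journey contributes the same total weight.
short_total = sum(edge_weight(short_journey, m) for m in range(24 * 60))
long_total = sum(edge_weight(long_journey, m) for m in range(24 * 60))
```

Both totals come out at 1 (up to floating-point rounding), so the 40-minute journey counts no more over the day than the 10-minute one, even though it appears in four times as many time windows.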
