There are lots of ways to visualise data, each giving a slightly different perspective. By now regular readers of this blog will have seen this London Bikeshare scheme animation rather a lot:
These animated tadpoles give a fascinating portrayal of one day of data – but they lack analysis. What if, instead, we consider all of the data collected on weekdays together and think about the total flows of bikes?
Here, each line represents a possible “route” – in the sense of a starting point and an end point. The weight of the line represents the frequency of the journey, and the colour represents the average cycling “rate” – defined as the straight-line distance divided by the journey time. Note that this is different from an average speed, which would be the journey distance (i.e. the length of the route actually ridden, not the straight-line distance between start and end) divided by the journey time. Even for one bike, that would represent the average speed over the journey.
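To make the “rate” definition concrete, here is a minimal sketch. It assumes each docking point is given as a (latitude, longitude) pair in degrees and the journey time in seconds; the function names and the use of the haversine formula for the straight-line distance are my own choices for illustration, not necessarily what was used to draw the image.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def straight_line_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cycling_rate_kmh(start, end, duration_s):
    """Straight-line distance divided by journey time, in km/h.

    start and end are (lat, lon) tuples (hypothetical input format).
    """
    if duration_s <= 0:
        raise ValueError("journey time must be positive")
    hours = duration_s / 3600.0
    return straight_line_km(*start, *end) / hours
```

Because the straight line is never longer than the route actually ridden, this rate can only underestimate the true average speed of a journey.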
In this static image I have chosen to use unrouted data – so two points are connected by a straight line and not a realistic route. Showing routes would have caused a lot of direct overlap (two routes sharing a line) rather than the crossings that we see above. We might consider this a first step of abstraction from the tadpole animation – we have averaged over three months of weekday data, and our routes have become more abstract edges.
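The step from individual journeys to edges can be sketched in a few lines. This assumes each journey has already been reduced to a (start, end, rate) tuple; that record layout and the function name are hypothetical, chosen only to show how the line weight (journey count) and colour (average rate) could be derived.

```python
from collections import defaultdict


def aggregate_edges(journeys):
    """Collapse journeys into directed edges.

    journeys: iterable of (start_id, end_id, rate) tuples (assumed format).
    Returns {(start_id, end_id): {"count": n, "mean_rate": r}}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # edge -> [count, sum of rates]
    for start, end, rate in journeys:
        entry = totals[(start, end)]
        entry[0] += 1
        entry[1] += rate
    return {
        edge: {"count": n, "mean_rate": rate_sum / n}
        for edge, (n, rate_sum) in totals.items()
    }


sample = [("A", "B", 10.0), ("A", "B", 14.0), ("B", "A", 8.0)]
edges = aggregate_edges(sample)
```

Here A to B and B to A are kept as separate edges, since a route is defined by its starting point and end point.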
One interesting facet of this is that we have defined a network of nodes and edges by flows alone. This means that there is nothing to explicitly link it to a physical infrastructure, and because people’s journeys change from hour to hour, day to day, month to month and year to year, the network changes with them. A snapshot at 9am would show a very different network from one at 1pm. Each of those snapshots will have its own statistics, hierarchies and global metrics, all things that we can track over time. In my next blogpost, I’d like to talk about those in more detail.
But there’s a final fact to notice – the system is massively underloaded. In 3 months we have 1.4 million journeys; the system had around 350 stands, giving rise to about 120,000 possible edges (one for every ordered pair of start and end points). If you were to spread those 1.4 million journeys equally across all the possible edges, you would get about 12 journeys per route. Let’s be clear – over three months, that’s an average of about one bike per route per week! Of course, the edges are not equally likely (as we will see), so some routes will have no bikes in that period and some will have hundreds of trips. The point is that even for three months of data we are talking about small average numbers per route. Slicing it to look at mornings (say) reduces that further; weekdays further still; and if we wanted to compare month by month (for seasonal effects), smaller still, to the point that a single person deciding to take a journey or not could have a significant impact on the “trends” we think we’re seeing. I’m not cycling today because I forgot to bring my helmet – have I messed up someone’s research project?
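For anyone who wants to check the back-of-envelope numbers, here is the arithmetic as a short sketch. It assumes that an edge is an ordered pair of two different stands and that three months is roughly 13 weeks; both are my simplifications, and the variable names are just for illustration.

```python
stands = 350          # approximate number of stands, from the text
journeys = 1_400_000  # journeys recorded in three months, from the text
weeks = 13            # assumption: three months is roughly 13 weeks

# Every ordered (start, end) pair of distinct stands is a possible edge.
possible_edges = stands * (stands - 1)

journeys_per_edge = journeys / possible_edges
journeys_per_edge_per_week = journeys_per_edge / weeks

print(f"possible edges:             {possible_edges:,}")
print(f"journeys per edge:          {journeys_per_edge:.1f}")
print(f"journeys per edge per week: {journeys_per_edge_per_week:.2f}")
```

Counting same-stand round trips as edges too (350 × 350) barely changes the result: either way it comes out at roughly a dozen journeys per edge, or about one a week.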