Methods of Visualisation Part 3: Visualising Networks

Our Boris Bike journey dataset has time and space explicit in it, but it also contains implied networks which connect the nodes (bike stations) via edges (flows). Because those edges are defined by the number of bikes travelling along them, they are weighted – we can sensibly talk about the “importance” of an edge relative to others, and indeed quantify it*. As before, the fact that these edges are defined by flows means that they can be time-sliced arbitrarily small – at the cost of increasing noise and losing significance. This means that we are dealing with a dynamically evolving network, whose structure and topology change throughout the day, week, month and year – and obviously we expect to see repeated behaviours and trends on each of those timescales.
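As a rough sketch of what time-slicing means in practice, the snippet below counts journeys between station pairs within a chosen window of hours. The record layout (start station, end station, hour of departure), the function name and the sample journeys are all assumptions made for illustration, not the actual format of the dataset.

```python
from collections import Counter

def edge_weights(journeys, hour_from=0, hour_to=24):
    """Count journeys per (start, end) pair departing in [hour_from, hour_to)."""
    weights = Counter()
    for start, end, hour in journeys:
        if hour_from <= hour < hour_to:
            weights[(start, end)] += 1
    return weights

# Hypothetical journey records: (start station, end station, hour of departure).
journeys = [
    ("Waterloo", "Holborn", 8),
    ("Waterloo", "Holborn", 8),
    ("Holborn", "Waterloo", 17),
    ("Hyde Park Corner", "Hyde Park Corner", 14),
]

whole_day = edge_weights(journeys)            # all journeys, one weight per route
morning_peak = edge_weights(journeys, 7, 10)  # only departures from 07:00 to 09:59
```

The narrower the window, the fewer journeys land on each edge, which is where the noise comes from.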

Let’s look at the aggregated data once again. I want to take space out of my visualisation and simply look at how things are connected to one another. I’ll use a fairly standard approach to lay out these nodes and edges in space, which will help to make some of the structures of the network clearer and more obvious.

The way I’ll approach it is to treat each node as a charged ball and each edge as a spring – something called a force-graph representation. If you’ve used software like Gephi, using force graphs to arrange and display networks should be familiar – but they’re not very hard to write from scratch; the examples you see below took a day or so.

So, the idea is this – nodes repel one another. If there are no forces of attraction between two nodes, they will move apart from one another. Nodes connected by an edge (“spring”) want to be a certain distance from one another – this distance is almost always smaller than the one repulsion alone would produce, so springs tend to pull connected nodes together. That’s basically it. Springs pull together, nodes push apart, and you hopefully end up with an arrangement where you can see the connections, clusters, hubs and loops reasonably clearly.
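To make the push and pull concrete, here is a minimal sketch of a single layout step. It is not the code behind the visualisations in this post: the inverse-square repulsion, the Hooke's-law springs, the parameter values and the simple "move a little along the net force" update are all assumptions chosen to keep the example short.

```python
import math

def force_step(pos, springs, repulsion=1.0, rest_length=1.0, dt=0.05):
    """Move every node a small step along the net force acting on it.

    pos:     dict of node -> (x, y)
    springs: dict of (node_a, node_b) -> spring stiffness
    """
    force = {n: [0.0, 0.0] for n in pos}

    def apply(a, b, magnitude):
        # Positive magnitude pulls a and b together, negative pushes them apart.
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
        fx, fy = magnitude * dx / dist, magnitude * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    # Every pair of nodes repels, like charged balls.
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dist = math.dist(pos[a], pos[b]) or 1e-9
            apply(a, b, -repulsion / dist ** 2)

    # Connected nodes are pulled (or pushed) towards the spring's rest length.
    for (a, b), stiffness in springs.items():
        dist = math.dist(pos[a], pos[b])
        apply(a, b, stiffness * (dist - rest_length))

    return {n: (pos[n][0] + dt * force[n][0], pos[n][1] + dt * force[n][1])
            for n in pos}

# Three hypothetical nodes: A and B share a spring, C is unconnected.
layout = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.5, 0.5)}
springs = {("A", "B"): 1.0}
for _ in range(200):
    layout = force_step(layout, springs)
```

Calling the step repeatedly lets the connected pair settle at a fixed separation while the unconnected node drifts away from both.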

There are a couple of other things you need to consider – firstly, how you introduce the edges. If you just draw them all in straight away, you often end up with a structure which can’t find its equilibrium (i.e. stable configuration). My solution to this is to bring in the edges gradually, starting with the strongest link, giving the system a few moments to find equilibrium, and then repeating with the next strongest link. This ensures that the overall structure is defined by the strongest links. This is better viewed in full screen, btw:

This approach is similar in spirit to a mathematical approach used quite a lot in physics called perturbation theory. But that’s not important right now.
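The gradual introduction of edges described above can be sketched as a short loop. This is an illustration under assumptions rather than the code used here: `relax` is a hypothetical callback that advances the layout by one step given the currently active edges, and "a few moments" is approximated by a fixed number of steps per new edge instead of a real equilibrium check.

```python
def introduce_edges_gradually(weights, relax, settle_steps=50):
    """Add edges strongest first, relaxing the layout after each addition.

    weights: dict of edge -> weight (e.g. journey count)
    relax:   callable taking the dict of currently active edges and
             advancing the layout by one step (hypothetical, caller-supplied)
    """
    active = {}
    strongest_first = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for edge, weight in strongest_first:
        active[edge] = weight
        for _ in range(settle_steps):
            relax(active)
    return active

# Stand-in for a real layout step: just record how many edges were active.
history = []
weights = {("A", "B"): 5, ("B", "C"): 20, ("A", "C"): 1}
introduce_edges_gradually(weights, lambda active: history.append(len(active)),
                          settle_steps=2)
```

Because the strongest link is placed first and everything else is added around it, the early, heavy edges set the overall shape.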

I have two other strategies to help this to succeed. First, the spring strengths are proportional to the exponential function of the number of journeys. I assume that there are nearly exponentially more weak links than strong links – so these weaker links could swamp the fewer, stronger links unless we make the stronger links exponentially stronger (we could check this mathematically, but I haven’t yet done that). The normal way is just to make the “strength” of the spring directly proportional to the weight – this would give a slightly different structure.
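As a sketch of the difference between the two weightings, the functions below compute an exponential and a linear spring strength from journey counts. The exact formula is not given in this post, so dividing by the largest count and the `rate` parameter are assumptions, added mainly so that the exponential does not overflow for routes with thousands of journeys.

```python
import math

def exponential_strengths(weights, rate=5.0):
    """Spring stiffness k = exp(rate * w / w_max) for each edge."""
    w_max = max(weights.values())
    return {edge: math.exp(rate * w / w_max) for edge, w in weights.items()}

def linear_strengths(weights):
    """The more usual choice: stiffness directly proportional to the weight."""
    w_max = max(weights.values())
    return {edge: w / w_max for edge, w in weights.items()}

# Hypothetical journey counts for two routes.
weights = {("A", "B"): 100, ("B", "C"): 50}
exp_k = exponential_strengths(weights, rate=2.0)
lin_k = linear_strengths(weights)
```

Raising `rate` widens the gap between the strongest and weakest springs, while the linear version always keeps the same ratio as the raw counts.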

Secondly, if we include all the links, we more than likely end up with a frustrated structure – one that doesn’t have a unique equilibrium and will tend to bounce around between different states. Because there are a lot of links with more than 0 bikes, this also gets computationally “expensive” and the whole thing slows down. I assigned an arbitrary cutoff – 90. This corresponds to an average of one bike per day on that route – if it’s less than that, I don’t think we need to worry about that edge. This gives a slightly simpler and clearer picture of the network (reducing over 130,000 edges to a ‘mere’ 12,000**), without losing all the nuance of the structure.
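The cutoff itself is a one-line filter. In this sketch the edge dictionary and its counts are made up for illustration; only the threshold of 90 comes from the text.

```python
def drop_weak_edges(weights, cutoff=90):
    """Keep only edges with at least `cutoff` journeys."""
    return {edge: w for edge, w in weights.items() if w >= cutoff}

# Hypothetical journey counts per route.
weights = {("A", "B"): 1200, ("B", "C"): 90, ("A", "C"): 12}
kept = drop_weak_edges(weights)
```

Here the route with 12 journeys is dropped and the other two survive, including the one sitting exactly on the threshold.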

Finally, we can use geographical starting points for our force graph – this is one way of seeing, rather crudely, whether the clustering of points in the network is primarily geographical, or whether it is strongly aspatial.
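One way to set up geographical starting points is to convert each station's latitude and longitude into flat x, y coordinates before the forces are switched on. The sketch below uses an equirectangular approximation, which is an assumption on my part and only reasonable over a city-sized area; the coordinates are approximate and purely illustrative, not taken from the dataset.

```python
import math

def geographic_start(stations):
    """Turn {name: (lat, lon)} into {name: (x, y)} centred on the mean position.

    Longitude differences are scaled by cos(mean latitude) so that
    east-west and north-south distances are roughly comparable.
    """
    lat0 = sum(lat for lat, _ in stations.values()) / len(stations)
    lon0 = sum(lon for _, lon in stations.values()) / len(stations)
    scale = math.cos(math.radians(lat0))
    return {name: ((lon - lon0) * scale, lat - lat0)
            for name, (lat, lon) in stations.items()}

# Approximate, illustrative coordinates in degrees.
stations = {
    "Hyde Park Corner": (51.503, -0.153),
    "Waterloo": (51.503, -0.113),
    "King's Cross": (51.531, -0.124),
}
start = geographic_start(stations)
```

The resulting positions are still in degrees, so they would need rescaling to whatever length units the springs use before being handed to the layout.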

More sophisticated models and analyses will tell us in more detail whether the clustering is spatial or organised along other lines – but this simple visualisation immediately gives us clues that, in this case, there is a very strong spatial component. For example, what I will call the “Hyde Park cluster” remains clustered close together in the west of the city (in fact, the links within the cluster ensure that it gets more tightly bound together); further east, you can see Waterloo and King’s Cross assuming more central positions by virtue of their centrality in the network***.

One final note on the visualisations –  if you’re eagle-eyed you may have spotted that flows can go both ways, but the connections in the force graphs don’t really reflect that. Representing two-way flows is not trivial and probably a good subject for future expansion of these visualisations.

* there are networks where the edges are binary – on/off, yes/no.

**Incidentally, only 76 routes have more than 900 bikes, or approx. ten per day; only 3 routes have more than one bike per hour (on average). See what I mean about sparse!?

***To reiterate, the network does not consider space; but distance can (and in this case does) influence the strength of the edges (springs)
