Datavisualisation programming: a recap

About a year ago, I wrote this post which rounded up some useful books showcasing and providing techniques for datavis. I should say that I’m primarily a programmatic visualator (i.e. I tend not to deal with the GUI-style visualisation platforms, like Gephi or GIS packages, for example). Looking at the new books I’ve seen on datavis reveals a more fractured landscape, which came as a bit of a surprise to me. Two years ago, Processing was still the most powerful language for datavis, at least as far as I was concerned, and I thought it would only be a matter of time before R and Python (which I saw as its main alternatives) would create packages to generate animations and interactive visuals at the same level of elegance and sophistication. What I have seen instead is Processing going in a slightly different direction, JavaScript doing some amazing work, and R and Python not moving forward in those directions as much as I expected. They have done some interesting stuff in other directions, though.

Rather than just a simple update of the literature I covered last time, I thought it might be more interesting to compare and contrast these programming languages and talk about where I see them in the datavis landscape. Usual caveats apply: this is my limited and personal take on things, and people may be aware of libraries or techniques that I’ve missed out on with these broad brush strokes. I think you can probably do absolutely anything with any of these languages in theory – I’m more interested in what people use them for in practice. Let me know about those in the comments, or on Twitter (@sociablePhysics).

Processing
Tl;dr – Processing is a language for creating images and animations, and looks lovely. It is less well suited to analysis, and not really native to the web. It’s moderately easy to learn.

Processing is a language based on Java, which uses various utility features to make Java programming easier and nicer and less of a syntactic spaghetti. I like Processing as a teaching language, because I think it is fairly approachable, but requires you to know what you are doing. It has types, does object orientation well, it uses curly brackets, and all the stuff a programming language should have. Once you’ve been programming Processing for a while, you’re programming Java. And Java is powerful and fast. If you’re into agent based models, NetLogo and Repast are Java based, so you’re doing it already. Processing has a structure that lends itself to interaction and animation; it’s built to do that. Processing.js and Processing for Android mean that you can create an app that can run on desktop, mobile or web with one (slightly modified, or at least carefully created) bit of code, and that has the potential to be pretty awesome. Finally, Processing looks great.
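To give a feel for that structure, here is a minimal sketch of the "set up once, then draw every frame" pattern that a Processing sketch follows. It is plain Python rather than Processing, and every name in it (state, WIDTH, run) is made up for illustration; in real Processing the environment calls setup() and draw() for you and actually paints to the screen.

```python
# Plain-Python imitation of Processing's setup()/draw() structure.
# All names here are hypothetical; this is not the Processing API.

WIDTH = 100   # width of the imaginary canvas, in pixels
state = {}    # shared sketch state, like the global variables in a sketch


def setup():
    """Run once at the start: initialise the sketch state."""
    state["x"] = 0
    state["speed"] = 30


def draw():
    """Run once per frame: move the ball one step and bounce at the edges."""
    state["x"] += state["speed"]
    if state["x"] >= WIDTH or state["x"] <= 0:
        # Reverse direction and keep the ball inside the canvas.
        state["speed"] = -state["speed"]
        state["x"] = max(0, min(WIDTH, state["x"]))


def run(frames):
    """Stand-in for the host environment: call setup(), then draw() per frame."""
    setup()
    positions = []
    for _ in range(frames):
        draw()
        positions.append(state["x"])
    return positions


if __name__ == "__main__":
    print(run(6))
```

Because all the animation state lives between calls to draw(), adding interaction is mostly a matter of letting a mouse or key handler change that state, which is roughly why the pattern suits animation so well.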

But. Processing has done some stuff that makes making video harder, and that’s a real shame. On interactive stuff, people don’t really use Java applets for the web; they use JavaScript (js). Processing is fast, but Processing.js is not especially fast, compared to libraries optimised for JavaScript. Also, Java is not the natural language for scientific computing, so if you want to get beyond vis to modelling or analysis, it can be harder work than it should be. Likewise I’ve always found the map libraries for Java to be large and unwieldy (with the exception of CASA alumnus Jon Reades’ MapThing), so GIS is not readily served here.

Reas, C. & Fry, B., 2007. Processing: A Programming Handbook for Visual Designers and Artists, The MIT Press.
Shiffman, D., 2008. Learning Processing: A Beginner’s Guide to Programming Images, Animation, and Interaction, Morgan Kaufmann.

These books provide a good intro to Processing for the beginner.

Fry, B., 2007. Visualizing Data: Exploring and Explaining Data with the Processing Environment 1st ed., O’Reilly Media.

The Fry book here is a bit dusty (it’s a few years old, now), but is the main book I’ve seen on using Processing for datavis.

Shiffman, D. & Fry, S., 2012. The Nature of Code.

Daniel Shiffman’s The Nature of Code covers a broad range of complexity, biology and physics techniques in Processing. It covers a lot of the approaches I take to programming in both the ASAV and AAC courses I teach and tutor on.

Within my centre, I’m a habitual Processing user, and Camillo Vargas-Ruis and Ed Manley are equally keen on Java.

d3.js
Tl;dr – d3 is pure web datavis. It’s very web friendly, easy to cut and paste but more complex to actually understand. It does web datavis amazingly well, but not much else.

d3 is a strong contender on the visualisation front to rival Processing. Author Mike Bostock (now at the NYT) has used the mantra of Data Driven Documents (hence “d3”), creating a js library which explicitly binds data to visual (svg) objects in the browser, and lots and lots of smooth, well-optimised libraries for creating graphs, bubble charts, force graphs, pie charts, and maps. It is very diverse, fast and web-ready.
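The core idea of binding data to visual objects can be sketched in a few lines. This is not d3 and not JavaScript; it is a hypothetical Python function (bind_circles is my own name) that produces one SVG circle per data value, assuming the values are non-negative numbers. d3 does the same kind of mapping inside the browser, and also handles updating and removing elements when the data changes.

```python
def bind_circles(values, width=300, height=100):
    """Return an SVG string containing one <circle> per data value.

    Illustrative only: each value gets its own horizontal slot, and its
    radius is scaled so that the largest value fills half a slot.
    """
    step = width / len(values) if values else 0
    largest = max(values, default=0)
    parts = []
    for i, value in enumerate(values):
        cx = step * (i + 0.5)  # centre of this value's slot
        r = (value / largest) * (step / 2) if largest else 0
        parts.append(f'<circle cx="{cx:g}" cy="{height / 2:g}" r="{r:g}"/>')
    body = "".join(parts)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{body}</svg>'
    )


if __name__ == "__main__":
    # Three data values become three circles of increasing size.
    print(bind_circles([1, 2, 4]))
```

Saving the printed string to a .svg file and opening it in a browser should show the three circles; change the list and the picture changes with it, which is the "data driven" part.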

But. Maybe it’s just because I’m used to Java, but js is WEIRD. Dynamic typing, callback functions, anonymous functions – they are all kinda kooky. I think d3 is pretty easy to use for cut and paste programmers, but I find some of the things it does very odd, when you get under the hood. d3 is a visualisation library, and it’s very good at that; but I can’t imagine doing anything properly analytical with it. And programming d3 is still programming; I would characterise js as one of the harder languages to use.

Murray, Scott. Interactive Data Visualization for the Web: [an Introduction to Designing with D3]. Sebastopol, CA: O’Reilly, 2013.

This is a nice introduction to d3, and even gives some tips for those new to js. It doesn’t cover much more than the basics, but I found it a very good way in. I don’t have any recommendations for general js books, but there are plenty out there; Codecademy was quite useful for me here.

Within CASA, Rob Levy, Panos Mavros, Elio Marchione and Robin Edwards are d3 users.

Python
Tl;dr – Python does all sorts. It’s very easy to learn and use, and very flexible and powerful. But it’s not super web-friendly, and it’s not all that pretty.

Python is pretty much designed to be nice to use and learn. The syntax is way easier than any of these other languages for anything you might actually want to do. The new Python notebook lets you write narrative around your code in a nicely presented format, yet another reason why it’s a good language to learn if you’re new to programming. For scientific computing, there are tons of packages and a nice friendly user community, making Python very powerful if you want to get into modelling and analysis.

But. All that dynamic typing and whitespace can make you a sloppy programmer if you don’t know better; and I shudder at the thought of debugging large programmes which use meaningful white space. Python outputs aren’t all that pretty. They’re fine, and it’s ok for mapping and graphing, but I haven’t seen Python produce anything really innovative and beautiful. I’m not sure I’d know where to start if I wanted to build something interactive with Python, which doesn’t mean it’s impossible – here’s something new but I’ve not had a look yet.

McKinney, Wes. Python for Data Analysis: [agile Tools for Real-World Data]. Sebastopol, CA: O’Reilly, 2013.

This covers pandas, one of the user-friendly data manipulation packages that Python uses. The book builds on an IPython approach, which is a nice, friendly, literate programming environment. For educational users, Enthought Canopy is a good IPython environment.

At CASA, Steph Hugel uses Canopy to teach on the BASc data course, and Python use is pretty widespread – Hannah Fry and lots of others on the Enfolding project are keen. Python is pretty ubiquitous in scientific computing.

R
Tl;dr – R is a powerful statistical programming language. It’s great for maps, but not very flexible or web-friendly.

R is my least favourite programming language, but even I have to admit how powerful it is, and how much it’s improved in the last few years. It’s not wildly dissimilar from Python in the things it’s used for, but it started life as primarily a statistical language. It is really powerful at this; for almost any statistical analysis you might wish to do, from k-means clustering to Support Vector Machines, there is a package that does it. RStudio is a nice (MatLab-like) environment (IDE) and has notebook outputs for that “literate programming” vibe*. If you know what you’re doing, you can make really nice maps, too.

But. R is kinda funny-lookin. Its traditional assignment operator is an arrow (<-) rather than an equals sign, although = has also been accepted for some years now. I don’t really know why that is. R makes it very easy to do very complex stuff, and I suspect that’s a double edged sword. R isn’t particularly suited to interactive visualisation or animation, although as with all these languages, it is possible, and apparently there are ways of pushing R outputs into d3. I don’t think R is a particularly flexible language, but as with all of these, I imagine that people are figuring out clever ways to do all sorts of stuff with it.

Yau, N., 2011. Visualize This: The Flowing Data Guide to Design, Visualization, and Statistics, John Wiley & Sons.

Nathan Yau’s Flowing Data is a wonderful website of beautiful visualisations – and this book is full of great vis and great advice. He also released Data Points last year, but I’ve only just ordered it from my local bookshop, so I haven’t had a chance to read it.

James Cheshire is CASA’s R ninja, and produces a lot of beautiful maps and sharp analyses with R. A lot of geographers, like Adam Dennett, like it too.

There are other specific libraries and software packages that people use, but these are probably the most common. Actually, asking around the office, there’s a pretty even spread – as well as a fair few people using C++ and MatLab, which I’ve not even mentioned†. And in the wider world, Ruby and PHP seem to be popular, although I’ve never written a line of either language. Increasingly, though, a smorgasbord approach might be the best way forward if you want to learn the skills to visualise, analyse, model and share interactive web visuals.

So there you go; if you’re interested in studying these techniques, our MRes ASAV covers a lot of Processing, some R and a little bit of Python in Adam Dennett’s GIS modules (as well as a range of 3D visualisation techniques taught by centre director Andy Hudson-Smith). If you’re already at UCL, some of these modules can be taken as options, and if you’re studying UCL’s BASc, the second year “Digital Literacy and Data Visualisation” uses Python for data analysis and some simple visualisation. Get in touch if you’d like to know more.

*I know that there are a bunch of neckbeards that won’t do any programming unless it’s on the command line with some rubbish text editor, but I like living in the 21st Century. Colour coding! Debugging! Dropdown menus! Oooooh. If you must, Sublime Text is a lovely text editor.

†MatLab is commercial and mainly used by physical scientists; I actually think it’s very good, but if you’re new to programming, I’d recommend Python or R instead. I don’t know much about C++ except it’s vaguely similar to Java but I think it interfaces with hardware a bit better than Java.

3 thoughts on “Datavisualisation programming: a recap”

  1. Pingback: The Functional Art and other stories | Sociable Physics

  2. processing.py is an interesting option for those that wish to use Processing’s capabilities while using Python coding. It uses Jython to port the code to Java. External libraries are also mostly supported…

  3. Pingback: Data Visualisation for Public Engagement at #scicomm14 | Sociable Physics
