Data Visualisation for Public Engagement at #scicomm14

UCL sustainability research around energy (credit: Martin Zaltz Austwick and Charlotte Johnson 2014)

I’m excited to be chairing a session on Data Visualisation for Public Engagement at the British Science Association’s annual Science Communication conference, which is in sunny Guildford this year. It’s not until May, but when you keen scicommers, academics, science journalists, students, museums people and scicurious freelancers sign up, you’ll need to tell the nice people that you want to come to our session and not one of the equally awesome other ones, so I thought I’d get in ahead of time.

Data visualisation (aka “datavis”) is in the news constantly. The British Library are currently running an exhibition of scientific visualisation, books about visualisation and infographics sell by the truckload, and broadsheets and tabloids alike are running data journalism and visualisation blogs. What does this mean for public engagement with research, and science in particular? I’ve put together this session because I want to understand these issues. I’m a lecturer in spatial analysis and visualisation – which means I teach students (mainly from an architecture or geography background) techniques for visualising “human” data (like demographics, transport, twitter data, research funding data) and models (networks, agent-based, cellular automata, neural nets). I think datavis is already having a massive impact in social sciences, but I’m a physicist at heart, and I am really curious about how this all works in the natural sciences.

To this end, I’ve put together what I think is a really exciting panel. Damien George is the most focussed on communicating research outputs from natural science – not only that, but his efforts to map the research landscape in physics articulate what research is, for both publics and practitioners. Andrew Steele has done great work visualising government science spending with his Scienceogram, and continues to find ways to communicate and challenge science policy via datavis. Artemis Skarlatidou has worked with communities in mapping potential sites for nuclear waste disposal, and has particular expertise around building trust through visualisation. Together, I want to explore what I think are key questions about datavis – what can it articulate that other ways of communicating cannot? How can it be used for meaningful engagement? Who can use these tools? What opportunities are we missing? And what are the limits of these techniques?

But of course, it won’t just be the panel doing all the talking. Each panellist will discuss datavis in general, and visualisations they’ve worked on, for about ten minutes each, leaving a generous 45 minutes for a decent discussion – technical, ethical, practical, or otherwise. Because datavis is fairly current, I’m expecting a lot of interesting views in the room – but we don’t require attendees to be experts, so even if you don’t know the right end of a visualisation from the wrong one, come along to question, debate and see what the fuss is about.

The session runs from 3.30 on Thursday May 1st – I hope to see you there.

If you’re a newcomer, I recently wrote a post recommending some introductory books, as well as one which has my thoughts about which languages you might consider using if you want to get into the nitty-gritty programming side of things.


My favourite episode of Global Lab


The Magic of Podcasting

CASA’s homegrown podcast, The Global Lab, is shortly to relaunch with a new team of interviewers appearing alongside the wizened faces (/voices) of Steve Gray, Hannah Fry and Claire Ross (and me, of course). As part of this relaunch, we’re also getting our back catalogue onto Soundcloud and linking to all of those interviews over the last two and a bit years, and getting the original team to give a shout out to their favourite episode as it goes up. Because it’s March, I started to use the #marchOfGlobalLab hashtag, which quickly turned into #theImplacableMarchOfGlobalLab in my mind, thanks to its connotations of an army of podcasters and interviewees.

What have I learned from Global Lab since Steve and I started it in 2011? Well, I arguably already knew a fair bit about podcasts, at least if my share of two Sony awards is to be taken seriously*. I suppose I learned things I already knew, namely that the most important and valuable thing about any endeavour is usually people, and people in demanding jobs with other priorities will struggle with time-consuming things like podcasting. But I’ve also thought about ways to help with that, so I’ll share some of those here for other podcasters.

So, firstly, lower the barrier to entry. The original show format contained a “news section”, which was basically a chat between two hosts, followed by a short interview, then a brief outro. The news section might take half an hour to prepare, an hour to record and two hours to edit. Half a day’s work every fortnight was too much, so we ditched it. Also, a light edit is ok in the right circumstances. Our interviews used to run for 30-60 minutes; we either edited them down to 15-20 minutes (which takes at least two hours unless you’re very fast/experienced) or ran them unedited, which is too long for a casual audience (IMHO). Now we record 20-25 minute interviews and edit very sparingly.

That means interviews have to be well done. Although I’m not an expert interviewer, I love interviewing people. It is fascinating, it’s a real art, and I think I’ve massively improved at it since my first attempts. Making inexperienced interviewees feel at ease is important, and usually the best way to do that is to be better at interviewing – for example, knowing when to interrupt and interject, because then it will feel more like a chat and less like a monologue, and when not to, because it can be offputting. It’s important to know that the interviewer isn’t there to look like an authority on the topic. The interviewer is the voice of the audience, so if I know (or am busy showing I know) too much, I may not ask important questions at the points where the audience is getting lost. I tend to think that the audience aren’t tuning in to hear my personality, but for a show like Global Lab, we have different guests each episode, and the interviewers are the glue that binds things together, so we need to have a little personality. Hopefully not a deplorable excess.

From the perspective of bringing people in as part of the team, I’ve increasingly tried to make the tech easy. Our original workflow led to a really strong web frontend, but the process was a bit complex and not readily transferable. So we’ve reduced edit expectations and are experimenting with a Soundcloud feed. Sometimes the off-the-shelf option is the best. Also, if you have a team, use the team to train each other in the tech, technique and workflow – they will improve by teaching, and the learning process is fresh in their minds when they train someone else.

My current thinking is that getting any ongoing outreach or engagement activity rolling is in great part about finding enthusiastic people and lowering barriers to them starting and continuing, so that it becomes a small bit of their research life that they look forward to! Having a group of people who are keen really helps, as they can support one another. I hope that this normalisation of public engagement, outreach and dissemination as part of the research process will have long-term impacts. I guess we will have to check with the Global Lab team and see what they say a little bit down the line.

I’ve carefully avoided divulging my favourite Global Lab episode to date – possibly the Sounds of Science panel I participated in, but that’s not a proper Global Lab episode, just me talking about microphones and the sound of a shuttle taking off**. I honestly don’t have a favourite interviewee, and it would be a bit unfair to pick if I did. Maybe Nicholas Peroni’s social life of bats, or Jason Dittmer’s nationalist superheroes. Now if only I’d got James Kneale to talk about H P Lovecraft…

*if you haven’t heard of the Sony Awards and therefore struggle to take them, or me, remotely seriously – you are reluctantly forgiven

**it is really good

The Functional Art and other stories

Force-directed graph of whisky flavours (using d3.js)

I recently recapped on some of the datavis languages, and some books I’ve found useful to get started with them. I didn’t talk about the more conversational/popsci end of things, so I thought I’d mention some of those here. The previous post would be useful for people with some programming chops, or masters- and PhD-level students; the books here should be accessible to most, or useful as context for undergrads. This is nowhere near comprehensive – I’ll add more in additional blogposts as I go along.

First, the venerable classics:
David McCandless’ Information is Beautiful is the coffee table book of the genre, selling millions of copies and spawning an awards series. McCandless’ work is well-liked by many, but not universally so – and some people just don’t like infographics much. Edward Tufte is incredibly influential, The Visual Display of Quantitative Information being his most read. (I think) I’ve said before that I don’t tend to agree with everything he has to say – his redesign of the scatter graph just isn’t going to catch on – and his “data/ink ratio” heuristic has the danger of leading to visuals so information-rich that no one knows what’s going on (although I don’t believe that’s what he intended). He is a critical thinker, though, and a talented designer in his own right, and rightly cited in any discussion on information design. Beautiful Visualization, edited by Julie Steele and Noah Iliinsky, is a good showcase of visual design, but given it’s an edited collection, has less of an authorial position than Tufte.

Datavis for data journalism seems to be a growing literary genre – The Data Journalism Handbook (various authors) is a good roundup of data journalism case studies, focussed on journalism as much as data. Simon Rogers (formerly of the Guardian datablog, now at Twitter) released Facts are Sacred last year, which sits somewhere between McCandless and the above. It has some quite nice case studies, but the image quality is not always what it should be for a vis book. Incidentally, CASA podcast The Global Lab interviewed John Burn-Murdoch (formerly of the Guardian datablog, with FT interactive at time of writing), who gives a very good overview of what data journalism is and how to get into it.

I was very impressed with Alberto Cairo’s The Functional Art; Cairo is an experienced data journalist and visual designer, and in this 2013 book he weaves questions of journalistic practice into some quite detailed exploration of the principles of visual design and perception. He illustrates this using personal case studies and interviews with key practitioners. To my mind, this is one of the best recent books on the subject – a disadvantage for researchers or scientists is that it tends to focus on news media and infographics more than datavis. But it marries journalism and graphic design wonderfully, even offering balanced critique of Tufte, who people often seem reluctant to criticise by virtue of his stature.

Nathan Yau’s FlowingData website is a must-read for datavis, and his technical book on the subject is great. Last year’s Data Points is his foray into a more popular style, and is well-designed and full of great example visuals and discussions of datavis and some general design principles. To my mind, the long arc is less compelling than his individual examples, but this is something I find with many vis books I read (/see?), so it might just be my preference. Certainly it’s well designed, but while the print quality is generally good, some of the more detailed images suffer a bit from the smaller format. But at least it’s small enough to read on the tube. One in the eye* for Tufte and McCandless.

Although it’s not a book (yet!), one site is worth a look – it finds a terrible visualisation, says pithy and sarcastic things about it, and moves on to the next. It is a fun antidote to serious journalists talking seriously about the importance of telling stories and serious graphic designers eulogising about hand-drawn piecharts, and a good palate cleanser if you’re working your way through these.

Finally, if you’re reading this at time of publishing and you’re London-based, the British Library’s Beautiful Science exhibition on visualising science is open now, and runs until May 26th (2014). The “science” they cover is a pleasingly broad church; they have John Snow’s cholera map, which, while being epidemiological, is also considered the birth of GIS, and there’s biology and climatology and all sorts. The BL is right next to Kings Cross and Euston, so if you’re visiting the old smoke, it’s definitely worth spending 20 minutes looking at exhibits drawn from their GARGANTUAN archives alongside more recent digital creations.

Happy visualating!

*see what I did there?

Datavisualisation programming: a recap

About a year ago, I wrote this post which rounded up some useful books showcasing and providing techniques for datavis. I should say that I’m primarily a programmatic visualator (i.e. I tend not to deal with the GUI-style visualisation platforms, like Gephi or GIS packages, for example). Looking at the new books I’ve seen on datavis reveals a more fractured landscape, which came as a bit of a surprise to me. Two years ago, Processing was still the most powerful language for datavis, at least as far as I was concerned, and I thought it would only be a matter of time before R and Python (which I saw as its main alternatives) would create packages to generate animations and interactive visuals at the same level of elegance and sophistication. What I have seen instead is Processing going in a slightly different direction, JavaScript doing some amazing work, and R and Python not moving forward in those directions as much as I expected. They have done some interesting stuff in other directions, though.

Rather than just a simple update of the literature I covered last time, I thought it might be more interesting to compare and contrast these programming languages and talk about where I see them in the datavis landscape. Usual caveats apply: this is my limited and personal take on things, and people may be aware of libraries or techniques that I’ve missed out on with these broad brush strokes. I think you can probably do absolutely anything with any of these languages in theory – I’m more interested in what people use them for in practice. Let me know about those in the comments, or on twitter (@sociablePhysics).

Tl;dr – Processing is a language for creating images and animations, and looks lovely. It is less well suited to analysis, and not really native to the web. It’s moderately easy to learn.

Processing is a language based on Java, which uses various utility features to make Java programming easier and nicer and less of a syntactic spaghetti. I like Processing as a teaching language, because I think it is fairly approachable, but requires you to know what you are doing. It has types, does object orientation well, uses curly brackets, and has all the stuff a programming language should have. Once you’ve been programming Processing for a while, you’re programming Java. And Java is powerful and fast. If you’re into agent-based models, NetLogo and Repast are Java-based, so you’re doing it already. Processing has a structure that lends itself to interaction and animation; it’s built to do that. Processing.js and Processing for Android mean that you can create an app that can run on desktop, mobile or web with one (slightly modified, or at least carefully created) bit of code, and that has the potential to be pretty awesome. Finally, Processing looks great.

But. Processing has done some stuff that makes making video harder, and that’s a real shame. On interactive stuff, people don’t really use Java applets for the web; they use JavaScript (js). Processing is fast, but Processing.js is not especially fast, compared to libraries optimised for JavaScript. Also, Java is not the natural language for scientific computing, so if you want to get beyond vis to modelling or analysis, it can be harder work than it should be. Likewise, I’ve always found the map libraries for Java to be large and unwieldy (with the exception of CASA alumnus Jon Reades’ MapThing), so GIS is not readily served here.

Reas, C. & Fry, B., 2007. Processing: A Programming Handbook for Visual Designers and Artists, The MIT Press.
Shiffman, D., 2008. Learning Processing: A Beginner’s Guide to Programming Images, Animation, and Interaction, Morgan Kaufmann.

These books provide a good intro to Processing for the beginner.

Fry, B., 2007. Visualizing Data: Exploring and Explaining Data with the Processing Environment 1st ed., O’Reilly Media.

The Fry book here is a bit dusty (it’s a few years old, now), but is the main book I’ve seen on using Processing for datavis.

Shiffman, D. & Fry, S., 2012. The Nature of Code

Daniel Shiffman’s The Nature of Code covers a broad range of techniques related to complexity/biology/physics in Processing. It covers a lot of the approaches I take to programming in both the ASAV and AAC courses I teach and tutor on.

Within my centre, I’m a habitual Processing user, and Camillo Vargas-Ruis and Ed Manley are equally keen on Java.

Tl;dr – d3 is pure web datavis. It’s very web friendly, easy to cut and paste but more complex to actually understand. It does web datavis amazingly well, but not much else.

d3 is a strong contender on the visualisation front to rival Processing. Author Mike Bostock (now at the NYT) has used the mantra of Data-Driven Documents (hence “d3”), creating a js library which explicitly binds data to visual (svg) objects in the browser, with lots and lots of smooth, well-optimised libraries for creating graphs, bubble charts, force graphs, pie charts, and maps. It is very diverse, fast and web-ready.

But. Maybe it’s just because I’m used to Java, but js is WEIRD. Dynamic typing, callback functions, anonymous functions – they are all kinda kooky. I think d3 is pretty easy to use for cut-and-paste programmers, but I find some of the things it does very odd when you get under the hood. d3 is a visualisation language, and it’s very good at it; but I can’t imagine doing anything properly analytical with it. And programming d3 is still programming; I would characterise js as one of the harder languages to use.

Murray, Scott. Interactive Data Visualization for the Web: [an Introduction to Designing with D3]. Sebastopol, CA: O’Reilly, 2013.

This is a nice introduction to d3, and even gives some tips for those new to js. It doesn’t cover much more than the basics, but I found it a very good way in. I don’t have any recommendations for general js books, but there are plenty out there; Codecademy was quite useful for me here.

Within CASA, Rob Levy, Panos Mavros, Elio Marchione and Robin Edwards are d3 users.

Tl;dr – Python does all sorts. It’s very easy to learn and use, and very flexible and powerful. But it’s not super web-friendly, and it’s not all that pretty.

Python is pretty much designed to be nice to use and learn. The syntax is way easier than that of any of these other languages for anything you might actually want to do. The new IPython notebook lets you write narrative around your code in a nicely presented format, yet another reason why it’s a good language to learn if you’re new to programming. For scientific computing, there are tons of packages and a nice friendly user community, making Python very powerful if you want to get into modelling and analysis.

But. All that dynamic typing and whitespace can make you a sloppy programmer if you don’t know better; and I shudder at the thought of debugging large programs which use meaningful white space. Python outputs aren’t all that pretty. They’re fine, and it’s ok for mapping and graphing, but I haven’t seen Python produce anything really innovative and beautiful. I’m not sure I’d know where to start if I wanted to build something interactive with Python, which doesn’t mean it’s impossible – here’s something new, but I’ve not had a look yet.
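For what it’s worth, here’s a minimal sketch (with made-up numbers, not real data) of the kind of serviceable-but-plain chart Python gives you with matplotlib, its standard plotting package:

```python
# A quick, perfectly fine, unglamorous bar chart - the data is invented.
import matplotlib
matplotlib.use("Agg")  # render off-screen, so this runs without a display
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013]
papers = [12, 18, 25, 31]

fig, ax = plt.subplots()
ax.bar(years, papers)
ax.set_xlabel("Year")
ax.set_ylabel("Papers published")
ax.set_title("A serviceable bar chart")
fig.savefig("papers.png")  # write the figure to a file
```

It works, and it took five lines of actual plotting code – but nobody is going to mistake it for a d3 showpiece.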

McKinney, Wes. Python for Data Analysis: [agile Tools for Real-World Data]. Sebastopol, CA: O’Reilly, 2013.

This covers pandas, one of the user-friendly data manipulation packages available for Python. The book builds on an IPython approach, which is a nice, friendly, literate programming environment. For educational users, Enthought Canopy is a good IPython environment.
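As a taste of the workflow pandas is built for – this is my own toy example with invented numbers, not one from the book – the bread-and-butter split-apply-combine step looks like this:

```python
# Group rows by a category and summarise each group - the core pandas move.
import pandas as pd

df = pd.DataFrame({
    "borough": ["Camden", "Camden", "Hackney", "Hackney"],
    "energy_use": [310, 290, 260, 275],
})

# Mean energy use per borough: split by "borough", apply mean(), combine
summary = df.groupby("borough")["energy_use"].mean()
print(summary)
```

Three lines of work that would be a tedious loop in Java; this sort of thing is why Python is so pleasant for data wrangling.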

At CASA, Steph Hugel uses Canopy to teach on the BASc data course, and Python use is pretty widespread – Hannah Fry and lots of others on the Enfolding project are keen. Python is pretty ubiquitous in scientific computing.

Tl;dr – R is a powerful statistical programming language. It’s great for maps, but not very flexible or web-friendly.

R is my least favourite programming language, but even I have to admit how powerful it is, and how much it’s improved in the last few years. It’s not wildly dissimilar from Python in the things it’s used for, but it started life as primarily a statistical language. It is really powerful at this; for almost any statistical analysis you might wish to do, from K-means clustering to support vector machines, there will be a package that does it. RStudio is a nice (MatLab-like) environment (IDE) and has notebook outputs for that “literate programming” vibe*. If you know what you’re doing, you can make really nice maps, too.

But. R is kinda funny-lookin. It’s only recently started using equals signs for assignment. I don’t really know why that is. R makes it very easy to do very complex stuff, and I suspect that’s a double-edged sword. R isn’t particularly suited to interactive visualisation or animation, although as with all these languages, it is possible, and apparently there are ways of pushing R outputs into d3. I don’t think R is a particularly flexible language, but as with all of these, I imagine that people are figuring out clever ways to do all sorts of stuff with it.

Yau, N., 2011. Visualize This: The Flowing Data Guide to Design, Visualization, and Statistics, John Wiley & Sons.

Nathan Yau’s Flowing Data is a wonderful website of beautiful visualisations – and this book is full of great vis and great advice. He also released Data Points last year, but I’ve only just ordered it from my local bookshop, so I haven’t had a chance to read it.

At CASA, James Cheshire is the resident R ninja, and produces a lot of beautiful maps and sharp analyses with R. A lot of geographers, like Adam Dennett, like it too.

There are other specific libraries and software packages that people use, but these are probably the most common. Actually, asking around the office, there’s a pretty even spread – as well as a fair few people using C++ and MatLab, which I’ve not even mentioned†. And in the wider world, Ruby and PHP seem to be popular, although I’ve never written a line of either language. Increasingly, though, a smorgasbord approach might be the best way forward if you want to learn the skills to visualise, analyse, model and share interactive web visuals.
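One way that smorgasbord plays out in practice – a sketch with invented data and field names – is to do the number-crunching in Python and then hand the results to a d3 page as JSON, since d3 reads JSON natively:

```python
# Do the analysis in Python, then export a node/link structure as JSON
# for a d3 force-directed graph (or any other web frontend) to pick up.
import json

nodes = [{"name": "Laphroaig"}, {"name": "Lagavulin"}, {"name": "Glenlivet"}]
links = [{"source": 0, "target": 1, "weight": 0.9}]

with open("whisky.json", "w") as f:
    json.dump({"nodes": nodes, "links": links}, f, indent=2)
```

Each language does the bit it’s best at: Python the analysis, JavaScript the interactive display.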

So there you go; if you’re interested in studying these techniques, our MRes ASAV covers a lot of Processing, some R and a little bit of Python in Adam Dennett’s GIS modules (as well as a range of 3D visualisation techniques taught by centre director Andy Hudson-Smith). If you’re already at UCL, some of these modules can be taken as options, and if you’re studying UCL’s BASc, the second year “Digital Literacy and Data Visualisation” uses Python for data analysis and some simple visualisation. Get in touch if you’d like to know more.

*I know that there are a bunch of neckbeards that won’t do any programming unless it’s on the command line with some rubbish text editor, but I like living in the 21st Century. Colour coding! Debugging! Dropdown menus! Oooooh. If you must, Sublime Text is a lovely text editor.

†MatLab is commercial and mainly used by physical scientists; I actually think it’s very good, but if you’re new to programming, I’d recommend Python or R instead. I don’t know much about C++ except that it’s vaguely similar to Java, but I think it interfaces with hardware a bit better than Java.

Academic New Year’s Resolutions 2014

I feel like I’ve come a long way in the last year; written papers, applied for grants, read more, directed a course, improved my modules, created new ones, reflected a lot on my teaching – all the stuff a Proper Academic is supposed to do. And while there are still plenty of ways in which I have things to learn, I feel like there are lots of areas in which I’ve understood the basics, oriented myself and got the lay of the land.

So be a bit braver is my first resolution. In many ways I now have a comfort zone I can go outside; two or three years ago that probably wasn’t the case! It’s necessary to have a little base camp to return to sometimes, or you freeze to death. It’s nice to feel I have something like that. But now isn’t the time to sit around eating beans over an open fire and belching. Not being afraid to fail is the flip side of this. After all, it would be silly to be afraid of something I appear to be so good at.

Using reading as a source of inspiration is my next resolution. A lot of academic writing is terrible, especially with the way that academic papers are incentivised to do a set of things unrelated to clear communication†. In 2013, I started transforming reading from a chore to a reflex. In 2014, I want to make it a pleasure. While I doubt I’ll ever do that with the intricacies of module proposal paperwork, I’d like to do the same for other aspects of my work. I’m a fan of the maxim that “every action is an opportunity for creativity”, but it takes a lot of energy to live up to that. It’s good enough, I think, to pick a few of those opportunities and make the most of them.

TEDx LSE (in March 2013) was an interesting event for me to take part in; and the theme I kept seeing was “connections”: whether in Helen Arney’s “use everything” – take everything you do and put it in the pot, or Ellie Saltmarshe’s talk in praise of the generalist. I’d like to form and continue to find new connections, whether it’s overlapping teaching and research, working with external partners on student projects or running new projects over the course of my public engagement fellowship. And make sure there are fun, creative things happening outside of my academic life. A lot of these things work their way into making me a better academic one way or another, and certainly contribute to my being a better and happier human being.

On the subject of connections, I like collaboration; in the last year, I’ve had the chance to collaborate on exciting work with people I like, respect and trust – academia has given me the privilege of doing that, and I’d like more, please. Taking more risks is easier when there are people to catch you, and be caught by you. I’d like to read more by people I don’t agree with*, I’d like to find time to blog more regularly, because it helps me to organise my thoughts about things I’ve read, and incentivise setting aside time for proper critical and comparative reading and reflection.

And that’s it. Doubt I’ll get all that done by next January. If you’d like to tweet me yours, I’m @sociablephysics.


†oh, was it REF year? I hadn’t noticed.

*partly for the very specific reason that I want to write about the vision of architecture in The Fountainhead and how that relates to a normal person’s conception of the built environment

Communication on the web for smart men 101


I’ve seen a lot of very unhelpful comments lately, by men, on blogs written by women – usually posts about sexism or some aspect of the way women are treated in particular high-skill industries (tech, science, journalism, or academia) – and it’s been acutely embarrassing to read so many dismissive, rude, point-missing and point-scoring discussions instigated by seemingly intelligent people. I expect some of them are just misogynist bullies and trolls, and know exactly what they are doing – but I’m prepared to give some the benefit of the doubt and say that I think there are men who may not realise that they are being unhelpful or dismissive or irrelevant or childish or hectoring or bullying. I’m mentioning this on what is nominally an academic blog because a lot of this seems to come from men who are middle-class and educated and sciencey, which superficially describes me pretty well, so perhaps I feel I recognise where some of their behaviour is coming from, which is all the more frustrating. That education and relative level of comfort doesn’t correlate with thoughtfulness, it would seem.

I’m not trying to be patronising, but there seem to be some ways in which they would get a lot more out of internet discussions by thinking about the way they interact. Here are ten I thought of for starters:

1) Don’t be a troll. If it helps, think of it this way: there is no such thing as a troll. There are bullies, there are people who tease other people to get a rise, there are people who are trying to play devil’s advocate or use humour to start an argument, there are misogynists angrily objecting to women calling out bad behaviour, and so on. Sometimes these overlap, but I don’t think any of these are very fun, useful, or worthwhile. The internet is not a contact sport, and I think everyone has a worthwhile time when these stereotypes are given as wide a berth as possible. Know when you’re being playful and when you’re being a bore. TIP: people will tell you.

2) Your humour is not a get-out-of-jail-free card. Neither is “irony”. When did “I was using HUMOR” become synonymous with “don’t object to anything I just said”? Phrasing something “as if it’s a joke” doesn’t give you carte blanche to say whatever you want, consequence-free. NB: the only consequence might be that everyone thinks you’re an idiot, or a disagreeable person, and ignores you; there will be no actual Thought Police kicking down your door, just a bunch of people wondering why you’re being unpleasant and not wishing to continue the conversation with you in it.

3) You’re probably not that funny. At least not to most people. That’s ok, humour is subjective; most people don’t find the things I say very funny either. I don’t mean you should stop having a sense of humour, just be aware that if it doesn’t come across to someone, it might not be their fault.

Newsflash: people use humour to say some very unpleasant things. Often this is to communicate something unpleasant in a palatable way. So, Doc, what you’re saying is, “Don’t buy green bananas”. Sometimes the teller thinks these unpleasant things are bad (cf. sexism, racism, homophobia…), but sometimes they’re saying these things because they believe or like what they express (cf. sexism, racism, homophobia…), and humour lets them pretend that they don’t really mean it if they get called on it. If there’s any ambiguity, it might be worth rephrasing what you’ve written in case everyone does think you want women to be chained to the cooker. There are people in the world who sincerely believe that, and being a Nice, Educated Chap doesn’t mean you’re automatically Not One of Those People. Indeed, it’s not unusual for Nice, Educated Chaps to do and say things that don’t seem very nice or educated at all.

4) Don’t derail. You might find it terribly interesting to raise the issue of what Evolutionary Psychology tells us about women having supple digits for manipulating dish scourers*, but it may not be what other commenters want to talk about. How about respecting that, and if people don’t seem interested in your tangent, take it somewhere else? You could write your own blog, and have the chat there if you like, what’s wrong with doing that? If people are interested, they can join in. In the same way that, when you comment, you’re taking part in a discussion on a topic that the blogger has chosen.


5) It’s not all about science. “Science” does not make every social situation easy to understand if we model it as a perfect sphere in a vacuum. You may feel like you’ve got a lever which moves the argument wonderfully, but simplifying a situation by ignoring important factors and claiming you’re being “scientific” or “rational” or “logical” can make people feel like you’re disregarding or diminishing their comments without adding much else to the mix. Maybe you have a great, simplifying insight, or maybe those details you’ve left out, or not thought about, are pretty crucial.

While you’re being “scientific”, consider the evidence. All the evidence**. Ok, more of the evidence. Including the evidence being presented to you by the blogger, and other commenters, even if they’re not blokes and they’re not using the same approach as you or agreeing with you. You’ve already thought about which evidence is important, now think about what evidence other people think is important and why. Maybe you could ask them (nicely) if you don’t get why. It’s not their job to educate you, but you can always ask (nicely).

If you don’t actually think science is helping and you are just using it as a cheap point-scoring tactic, please stop, it’s so boring. No one likes a Sophist. The goal of a conversation is not to win points and level up†. It is not a boss fight.

6) Don’t make it about you. This is a very man-specific thing that women have pointed out to me again and again. It goes a bit like this:

Woman: [X] community has some crappy behaviour towards women
Man [from Community X]: Well, I don’t do that
Woman: I wasn’t addressing it to you – it’s a wider issue that women need to be aware of
Man: Yeah but I’m not doing it so why are you accusing me
Woman: I wasn’t, but it’s something that needs to change
Man: I don’t do that – why won’t you admit it?
And so on

Don’t be that man. It’s unpleasant to hear that members of a community you’re part of are doing something awful. That will produce some cognitive dissonance – maybe you’ll think “oh I know all those guys, they wouldn’t behave that way, it must have been misconstrued or fabricated”; well, consider the possibility that some people do behave in that way. It doesn’t take many people doing something horrible to have a disproportionate effect; it doesn’t mean everyone is behaving that way; and neither does that mean it should not be taken seriously. And if somebody says something unpleasant happened, there is every possibility that they aren’t lying or misconstruing and it is true. Bear in mind that unpleasant people can be clever, and if they have a lot of practice at doing whatever bad thing it is they enjoy doing, they are often quite skillful in leaving enough ambiguity in their behaviour that it can make even those directly affected question how they should be feeling. Usually the answer is far from positive.

Anyway – if you’re asking yourself “Is this person writing about me?” or “Does this apply to my behaviour?” – either the answer is yes, and you need to think about changing your behaviour, or the answer is no and you need to think about others’ behaviour.


7) Don’t expect automatically to be listened to or taken seriously. Everyone has the right to an opinion, and everyone has a right to ignore your opinion if they don’t think it’s helpful or especially well-informed. If you go to someone’s blog and comment ignorantly, divisively or tangentially on the subject, don’t expect anyone to care. If you’re not respectful of the person writing, why should they be respectful of you? If you think you’re making a valid point which is being ignored, never mind. Worse things happen at sea. And to women and minorities every day. Withdraw gracefully. Not grumpily. You might want to chat to these people another day, even if they seem ill-disposed to do so today.

More generally,

8) Do be sincere. Please don’t treat someone’s discussion of an issue that upsets and impacts them as an opportunity to put on your Clever Hat and show off your knowledge of logical fallacies. (NB. Being sincere is not the same thing as being humourless).

9) Be forgiving. The internet is written in ink, and people make errors, whether factual, typographical, tonal or otherwise. Actually calling someone an idiot or otherwise being rude or patronising doesn’t give them anywhere to go if they do change their mind about their views. I’ve seen people be swayed by good, compassionate argument. People so often argue against things they know to be true – the cognitive dissonance of recognising a truth and not wanting to deal with the consequences of accepting it is quite a motivator. Learn to recognise it in yourself as well as seeing it in others.

10) “Oh but that’s how people act on the internet” is not an excuse. Sure, we behave differently in different contexts, I would never call Ayn Rand an idiot to her face*** (as I imply repeatedly on the world wide web), but that doesn’t mean we should expect people to behave cruelly, dismissively and rudely as a matter of course. Don’t do it, or excuse it. Lead by example.

Finally, I apologise again to readers who find this patronising or simplistic. If you find it either of those things, hopefully you’re going around not doing any of these things. This really is Internet 101 as far as I’m concerned, but I’ve seen so much that doesn’t manage to meet even these basic standards.  I’ve done more than one of these things in the past, I’ve certainly called people idiots I shouldn’t have, but they were bigger boys and they called me worse back so that’s ok. I doubt that the men who are being bullies on the internet will pay me much mind, but for those men who care about more than showing off – and I think that’s a lot of men reading – just chill out a bit and listen. You can do so much better and have much more interesting conversations and learn interesting and valuable things.

I’m guessing this won’t entirely solve the Internet, but here’s to optimism.

*obviously this is nonsense, to be clear, I just made some nonsense up

**actually, you won’t be able to do that, but while you’re gathering All The Evidence In The World, we will get some peace and quiet

***she’s dead

†Unless you’re a character from an Ayn Rand novel. Then life is a debate you’ve conclusively won. Well done Dagny, you’re a Level 3 pain in the neck.

Big Social Data and Invasive Species

Spot the invasive species

I just read Emma Uprichard’s excellent piece on big data in the social sciences; I’d recommend doing the same. She argues persuasively that Big Data is not the panacea that will solve social ills, and drills into some specific concerns that social scientists might want to think about as the hype machine grinds into gear. There were a few points I wanted to address and reinforce in there.

I’ll start by saying that I don’t think there are consistent definitions of Big Data, and I’m ok with that. Big Data is something I’ve always seen defined functionally (“there sure is a lot of data”) and not structurally; for example, data with large dimension (information about lots of characteristics of some population), large scale (lots of members of a population) or high rate (“live” or frequently sampled data) could all generate massive datasets. I don’t think any one of these is a necessary or sufficient condition, and I’ve seen arguments in the past which elide some of these features, which can be problematic because each poses different problems. I thought the comparison with qualitative data was especially illuminating. Here, you may have a small number of subjects but a very rich “data” set around them (questionnaires, recorded interviews, and so on). This represents a very high-dimensional dataset around a smallish population. You don’t have to be a reductionist, of course – it may not be the best analysis method to try to convert interview data to purely quantitative data and do a regression. But if you want to, you can find big data all over. The UK census is pretty big.

This leads onto the question of expertise. Why are physical scientists/computer scientists/engineers doing this work? Because they have the technical skills. I have no doubt that in a generation, social scientists will graduate with the technical chops to do the machine learning, databasing, visualisation and so on to do it themselves (in fact, we train some of those people, at least at Masters level). In the meantime, transplanted physicists become naturalised in their social soil. I’m not terribly keen to be identified as a positivist invasive species – surely this route into social sciences is as valid as any other? Couldn’t we instead communicate to undergrad physical/computer scientists the value of their skills in social sciences, and encourage them to take on some of the ideas of these disciplines? So many physicists and engineers end up in the most dismal of the social sciences when they go and get finance jobs following graduation – wouldn’t it be good to snag some of those? The fact that so many people with this high-consensus training choose to cross over suggests that there is an appetite amongst hard scientists to work in these areas (I mean in academia – apparently the financial sector offers remuneration to entice physical scientists so they may not be as tempted). I’m not sure this needs to be viewed with such suspicion, even if you disagree on approach and methodology.

I don’t see the “methodological genocide” occurring that Dr Uprichard fears. Big Data self-evidently doesn’t have All The Answers. No one method can. Big Data’s not even a method, really. And there are plenty of important questions big data doesn’t ask, or effect change in response to. The article seems to be suggesting that sociologists need to be ready to argue back. Is that something sociologists are good at? I hadn’t noticed.*

There are some other bits of the article that I think are as true of little data as of big data. I wasn’t sure whether this was the point, but it’s certainly one worth making. Big data opens new questions and fills in detail for some older ones, but (like all data) it doesn’t predict, models and theories do that, and this is hard in social sciences, even with whizzy agent based models and suchlike. Compressing and reducing the data does tend to regress towards the mean – but as hinted, that also allows those who aren’t in the “mainstream” to be spotted. Often it’s these behaviours which are more interesting. The ethics of how and why this is done absolutely does need to be explored – but potentially, identifying the majority allows you to chuck out that data and see interesting outliers. There are a lot of interesting quantitative techniques out there.
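To make the “chuck out the majority, keep the outliers” idea concrete, here is a minimal sketch in Python. The data and the z-score threshold are invented for illustration; this is just one of the many quantitative techniques alluded to above, not a method the article itself proposes.

```python
# Illustrative sketch: discard the "mainstream" of a dataset and keep the
# unusual cases. Values more than z_threshold standard deviations from the
# mean are treated as outliers. All numbers here are made up.
from statistics import mean, stdev

def outliers(values, z_threshold=2.0):
    """Return the values lying more than z_threshold sample standard
    deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    return [v for v in values if abs(v - m) / s > z_threshold]

# A hypothetical behavioural measure: mostly "mainstream" values around 10,
# plus one atypical individual.
daily_activity = [9, 10, 11, 10, 9, 10, 11, 10, 9, 10, 11, 47]
print(outliers(daily_activity))
```

One caveat worth noting: a single extreme value inflates the standard deviation and can mask milder outliers, which is one reason robust alternatives (such as the median absolute deviation) are often preferred in practice.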

Data, and models, are an imperfect representation of the world. To Tukey’s “no data set is large enough to provide complete information about how it should be analysed”, we might add “no data set is large enough to describe the world we’re examining”. Data is filtered by experimental design, theoretical question, and increasingly, by the data that’s available. Data, models and analysis always need context and interpretation, to identify patterns, results and meaningless anomalies. Ironically, this is something good (natural) scientists and engineers do a lot of, too. But as the article pointed out, physical scientists aren’t used to the atoms changing their behaviour in response to their experiment**, or needing to persuade government that the results of their experiments require a change in policy**, or thinking about whether a study is ethical in the first place**. Big data won’t obliterate people interpreting things, but it might mean some of those people have (gasp) an engineering degree, or a social sciences degree that has a lot of things that make it look like a turn of the century stats or compsci degree. I’m actually rather hopeful about areas like big data, because they will allow people like me to learn a lot from sociologists. I think in asking “How and why is big data useful, and for whom?”, we will need all the expertise we can get.

*in the sweep of low-consensus subjects, I would have thought sociology an example par excellence.

**well, they are, sort of