Why your graph should be as simple as possible (also motion charts)

A few weeks ago I attempted my first motion chart. I suspect that Rosling will go down as the David Bowie of statisticians, his alternative and exciting graphs breathing life into statistics.

But I am not Hans Rosling. And you are not Hans Rosling. There are a few reasons why motion charts are often not a good idea. Firstly, the way the data needs to be formatted is extremely time-consuming. Here is an example:

Screen Shot 2013-05-10 at 12.53.55

In Google’s motion charts you need at least three columns. The date has to be in the first column, a dimension in the second column and anything can go in the third. Either the third or the fourth column has to be a number on which the change can be anchored. Click the Google Charts wizard widget within the Google Spreadsheet and there you have it.
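If the reformatting is the slow part, a few lines of code can do it for you. Here is a minimal sketch, assuming your data starts out "wide" (one list of values per dimension) and you want one row per date and dimension in the column order described above. The function name and the figures are made up for illustration.

```python
def to_motion_rows(wide, dates):
    """Turn {dimension: [one value per date]} into (date, dimension, value) rows."""
    rows = []
    for position, date in enumerate(dates):
        for dimension in sorted(wide):
            rows.append((date, dimension, wide[dimension][position]))
    return rows


# Made-up figures, purely to show the shape of the output.
dates = ["2010", "2011"]
wide = {"Men": [10, 12], "Women": [7, 9]}

rows = to_motion_rows(wide, dates)
for row in rows:
    print(row)
```

Each printed row can then be pasted into the spreadsheet as date, dimension, number.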

As Rosling’s graph shows, a motion chart works really well when you have two statistical measures, because then two changes over time can be shown. That is key, because otherwise it’s a really overcomplex way of showing something quite simple. In almost every case a line chart would be better.

Simplicity works

In February I made a bubble chart showing the different modes of accidental death for both men and women. See below (won’t work on Chrome, only a JavaScript-enabled browser like Firefox).

Accidental deaths by injury or poisoning registered in 2011 E+W



Paul Bradshaw gave me the following feedback:


Screen Shot 2013-05-10 at 13.21.27
Screen Shot 2013-05-10 at 13.21.37


Translation: don’t try and make something that hides the meaning. Now when I see visualisations that obfuscate the message for the sake of flashiness I am quite wary.

Motion charts

Zach Gemignani did this excellent write-up of when it is appropriate to use a motion chart and I pretty much concur with him.

Although animation looks nice and is impressive when you pull it off, with the wrong data it is simply confusing.

Also, you do not get the pleasure of Hans Rosling explaining what is going on for you while it happens.


Islington Now – and working on data in the newsroom

In March, I worked on Islington Now, a hyperlocal blog and unpublished newspaper that forms part of our MA timetable. Throughout the three weeks that it was running I worked on some data stories there and these are some of my reflections.


In the newsroom, geography is everything (H/T to Henry Taylor for that quote). As part of our team’s content strategy we wanted to make data pretty central to the Islington Now operation and we meant that literally.

The first week, when I was online editor, I made sure that the online team (all of whom were Interactive students) were in earliest so we got to sit at the very centre of the newsroom. That is where pitches and stories get thrown around, and catching a passing idea may lead to a chance to do some data journalism. So, for example, we got to do some data work on Islington Council’s budget, which went into the paper (but unfortunately was not published online).

It firmly embedded what we could do in the minds of the other student journalists. So, despite being moved further away from the others in week two (due to room changes), we then got approached quite a lot when anybody had a story that could have a potential data element.

Saying “no”

“You guys are good with numbers, right?” On the one hand it is lovely to be approached by journalists with some numbers, but often there is simply no point in using a graph to show something. For example, a pie chart showing that the son gets half of someone’s fortune and the daughter gets the other half.

Screen Shot 2013-05-10 at 12.17.46


Static visualisations should be just like journalism. In almost every case, you should get the angle and make sure that is the most prominent thing. Sometimes the numbers just do not have a way of being visualised that is of any use to the story.

Verifying numbers

During the second week, I was passed numbers from the National Housing Federation by reporter Ben Finch that showed their estimated number of people affected by bedroom tax. I used a simple formula to divide that by the population and to see how many people were affected per thousand residents. Islington came out as being the third worst hit borough and twice as bad as the London average.
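For anyone who wants to repeat that kind of calculation, here is a minimal sketch of the per-thousand formula. The borough names and figures are invented placeholders, not the NHF estimates, and the function name is my own.

```python
def affected_per_thousand(affected, population):
    """Return the number of people affected per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return affected * 1000 / population


# Hypothetical figures: (people affected, total population).
boroughs = {
    "Borough A": (3000, 200000),
    "Borough B": (1500, 250000),
}

rates = {
    name: affected_per_thousand(affected, population)
    for name, (affected, population) in boroughs.items()
}

# Rank boroughs from worst hit to least hit.
ranked = sorted(rates, key=rates.get, reverse=True)
print(ranked)
print(rates)
```

Dividing by population first is what makes boroughs of different sizes comparable, which a ranking on raw counts would not do.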

I was pretty sure that I was tight on my numbers and the news team were pretty keen to make it the splash of the paper. But I just wanted to check the figures over with the NHF to make sure that what I was doing was a legitimate way to use the numbers. Also, I was concerned about the announcement that morning that there had been concessions made on who the tax would apply to. A quick phone call told me the data had been created as I had expected and we were able to run with this:
Islington hit twice as hard by bedroom tax

With a Google Fusion Map added in to make the story a little bit more interactive.

Sometimes you do work for no result

We were due to run some analysis on stabbings in Islington and I made this map of arrests relating to assaults with offensive weapons.

Screen Shot 2013-05-10 at 12.11.00


It did not take too long but I made sure it was almost ready for publication. Unfortunately the analysis got dropped because of something else and the map never had a vehicle to work with. This can happen wherever you are and you just have to accept that it’s not right for that time. There may have been another opportunity for us to use the map within the remaining weeks though, so it was probably worth doing in any case.

Using “separators and labels” in Outwit Hub pro – open notes

N.B. Separators and labels are only available to users of Outwit Hub Pro. Similar and probably better results can be achieved if you use regex (regular expressions), so I would try that out if you have a lot of time prior to covering an event. But if you are like me (with limited regex knowledge) and want to get this information quickly, separators and labels are incredibly useful.

I wanted all the names of all the candidates in the Lincolnshire election. The council’s website includes this snazzy interactive map with all the information you want but it requires a lot of clickthroughs and copying and pasting if you were to get each name manually.

Naturally my first inclination was scraping with Outwit Hub. This is how I got on:

1. Click on any part of the map – I went for East Lindsey, a district bigger than the counties of Surrey, Hertfordshire, and Buckinghamshire.

Screen Shot 2013-05-03 at 23.25.36


2. You’ll see that you get a page with another map – click on one of the wards within it. I went for Louth Wolds.

Screen Shot 2013-05-03 at 23.39.58

3. Now you can boot up Outwit Hub. Copy the link url: http://www.lincolnshire.gov.uk/ElectionsResultsDetail.aspx?division=6&locationGroup=144 and paste it into the browser bar. The same page should come up. On the left hand column, underneath automators, is an option saying scrapers. Click that and you should get this:
Screen Shot 2013-05-04 at 00.11.54

That’s HTML – almost all of you will know that it is the code which says what to display on the page. If you don’t know anything about HTML… don’t panic. You really don’t need to know what any of that code does to do the next part. All you need to know is that bits of it contain the bits of text that you want.
Screen Shot 2013-05-04 at 00.27.54

Beneath the code is something that looks like this. Like I have done, write ward in row 1 column 1, just to try it out.

6. You need to find the bits of the page that you want. So we want the name of the ward, the name of all the candidates, their parties, their results and the overall turnout. Let’s start with the ward. Press CMD/CTRL+F to bring up the search box and look up “Louth Wolds” (minus inverted commas).

This will be what comes up first:

Screen Shot 2013-05-04 at 00.23.25

Now Louth Wolds does not stand out here among the wards and it should look like a very important place because it is the subject of the page. Press next on the search.

Screen Shot 2013-05-04 at 00.25.44

That looks like the Louth Wolds you can see on the page, prefaced by Electoral Division Map and above the election results themselves.

7. As you can see at the bottom of the screen there is a column saying name – which you should have written ward in – marker before, marker after. If you put one bit of HTML in marker before and one bit of HTML in marker after, Outwit will scrape anything between the two. Powerful stuff.

Scraping is about knowing exactly what bit of the page you want. As all the electoral ward pages look pretty similar it should work on every page to get the right bit of information you want.

Here we want the ward name first – as you can see what comes before the “Louth Wolds” that we want is the following:

<div class="flash container sleeve">
<h2>Electoral Division Map -

Copy and paste that from the HTML (not from here) into the “marker before” column.

You generally don’t need to be as precise about the “marker after” but, just to be safe, do the same and copy the

<div class="elecflash">

after Louth Wolds.

8. If you press “Execute” now you should just get the name “Louth Wolds”. Minor success!
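If you would rather try the regex route, the marker before / marker after idea can be sketched in a few lines of Python. This is not how Outwit Hub works internally, just an approximation of the same behaviour. The sample HTML is a made-up fragment built around the two markers quoted above, and I am assuming that leftover tags should be stripped from the result.

```python
import re


def scrape_between(html, marker_before, marker_after):
    """Return the text found between each pair of markers, with tags removed."""
    pattern = re.escape(marker_before) + r"(.*?)" + re.escape(marker_after)
    results = []
    for match in re.findall(pattern, html, flags=re.DOTALL):
        text = re.sub(r"<[^>]+>", "", match)  # drop any leftover HTML tags
        results.append(text.strip())
    return results


# Made-up fragment for illustration; the real page will differ.
sample_html = """
<div class="flash container sleeve">
<h2>Electoral Division Map - Louth Wolds</h2>
<div class="elecflash">
"""

ward = scrape_between(
    sample_html,
    "<h2>Electoral Division Map -",
    '<div class="elecflash">',
)
print(ward)
```

The non-greedy `(.*?)` matters here: it stops at the first "marker after" rather than the last one on the page.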

9. Now, let’s find the winner and their percentage. Search the page for “Marfleet” (i.e. the winner of the seat – H. Marfleet).

Screen Shot 2013-05-04 at 08.33.21

There he is!

But this is where it gets a bit tricky. As you can see, the code around B.P. Burnett is more or less identical to the code around H. (Hugo) Marfleet. If we want Marfleet’s party, percentage etc. then we need to use another feature of Outwit Hub – separators and labels. If you’ve used delimiters in Excel they work in a similar way, splitting the information into the columns that you want. Start off by marking your second scraper row “Candidate”.
Put the bit before the candidates start in marker before:


Look at the bottom of the list of candidates and you should find this code:

<div id="elec_map">

Paste that in to marker after.

Now what Outwit Hub is currently doing is looking at all the code between what we have said are the marker before and the marker after. The only thing that separates one candidate from the next is an <li> (list item) tag (you can see it on code line 1294 in the picture above).

Simply put <li> in the separator column.

Labels will mark each column which you have separated. As the candidates are graded in descending order of vote we can put

“winner”, “runner up”

Press execute, name your scraper and you should have something like this:

Screen Shot 2013-05-04 at 10.52.15

Success! We have all the information we want for each candidate in a way that we can clean up later. I also got the overall turnout/electorate but that is up to you.
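For the curious, separators and labels boil down to "split the block, then name the pieces". Here is a rough Python sketch of that idea, not Outwit Hub’s actual implementation. The candidate names, parties and percentages are placeholders rather than the real results, and the function name is my own.

```python
import re


def split_with_labels(block, separator, labels):
    """Split a block of HTML on a separator and label the pieces in order."""
    pieces = []
    for piece in block.split(separator):
        text = re.sub(r"<[^>]+>", "", piece).strip()  # strip tags and whitespace
        if text:
            pieces.append(text)
    # zip stops at the shorter list, so extra pieces or labels are ignored.
    return dict(zip(labels, pieces))


# Placeholder block; on the real page this is whatever sits between the markers.
sample_block = (
    "<ul>"
    "<li>Candidate A, Party X, 55%</li>"
    "<li>Candidate B, Party Y, 45%</li>"
    "</ul>"
)

result = split_with_labels(sample_block, "<li>", ["winner", "runner up"])
print(result)
```

Because the labels are assigned purely by position, this only makes sense when the candidates are already listed in descending order of vote.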

The next bit is pretty specific to the Lincolnshire page, but read on if you would like to find out how I got a workaround.

10. Getting all the links I needed was a little trickier than first anticipated. If you had a normal page with a series of links inside, you should be able to pull out those links and scrape them automatically. Unfortunately this is an animation and the links to each individual page are not available in the HTML. However, there is a way:

If you search the original page we were scraping for one of the other wards, say “Boston Coastal”, you will see it underneath some HTML saying “locationid”, with each ward having an option value. So Boston Coastal’s option value is <option value="99">.

Look at the URL we had originally:


Change that to simply:


And you get the same result! Exciting. But what is more exciting is that the location ids are sequential from 94 to 170. I put a list of URLs together (using Excel and concatenate, but you can choose your own poison) and then published it on this website.
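If Excel is not your poison, the same list can be built with a short script. This is a minimal sketch: the URL template below is a stand-in on example.org with a made-up parameter name, so swap in the simplified results URL and put `{location_id}` where the id goes.

```python
def build_urls(template, first_id, last_id):
    """Return one URL per location id, from first_id to last_id inclusive."""
    return [
        template.format(location_id=location_id)
        for location_id in range(first_id, last_id + 1)
    ]


# Hypothetical template, not the council's real URL.
TEMPLATE = "http://example.org/results?location={location_id}"

urls = build_urls(TEMPLATE, 94, 170)
for url in urls:
    print(url)
```

The printed list can be pasted straight into a page or text file for Outwit Hub to pick up.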

I then put the website page I had created into Outwit Hub and in the left hand menu column selected “links”

Screen Shot 2013-05-04 at 11.31.03

You get quite a few URLs but make good use of the “Catch” option that lies at the bottom of Outwit Hub and put in:


All the links you want should now be selected in a lovely shade of lime green. Right click on them, press fast-scrape and apply the scraper you created earlier.

Now make a cup of tea while all 76 are scraped. Clean ’em up, put ’em together and here they are:

Get the data

…Well I had to clear it up a bit

Data byline: “Are Conservative MPs a load of posh boys?” – The Guardian

Screen Shot 2013-05-05 at 23.26.27
Yes…and how…

Maybe they need a visit to the thought camp to help them get back on track.

Seriously though, this was quite interesting as I was sticking quite closely to the findings of an academic paper, so I had to pick out the data that was most chart-a-ble from that. A few people told me that Francis Maude was an Old Etonian. He was not; he actually went to school in Cambridge – a private one, mind.

Data byline: “Who works the most hours MPs or teachers?” – The Guardian

Screen Shot 2013-05-05 at 23.10.24

Probably the data byline I am happiest with so far – it was pretty fun to come out with the opposite conclusion to the one I expected going into it. I was sure it would upset some teachers but I was pretty careful with my numbers and, I think, fair to both MPs and teachers in how I analysed it. A few people criticised me in the comments for not including the number of hours that teachers work during the holidays compared to MPs. But just to re-emphasise what I said in the article:

It was not clear in the Hansard Society research how much MPs were working during the recess but a similar survey in 1983 by SSRB found that MPs were doing a 42 hour week compared to the 69 hours they were doing while Westminster was sitting. It is also unclear how much teachers are working during their 13-week break.

Data byline: “How does the London Marathon compare to other races worldwide” – The Guardian

Screen Shot 2013-05-05 at 23.04.37

I wrote some of this and also put together the basis for the Tableau visualisation at the bottom.

What I found quite interesting was how the larger marathons seemed to spread by continent. They were being founded in the US sporadically for the first half of the twentieth century and then suddenly there was a rush in the late 70s and early 80s in Europe. A similar thing happened later on for Asia.

The sample size is not that big because obviously there are not that many marathons with over 10,000 people running but it’s still quite a nice thing to watch on the map. I could get a Tableau animation in worksheet view but not in dashboard view. Anybody else have this problem?