Who is the perfect Doctor?

[Photo: the Tardis. Credit: Toenex Lacey]

*This is a bit of fun. Probably (see: definitely) not statistically sound.

In a live show hosted tonight on BBC1, Matt Smith’s replacement as the Doctor will be unveiled. But what can the Time Lord’s past incarnations tell us about what the man (or woman) whose face will be revealed tonight will look like?

I used the age, height and hair colour of every Doctor to have stepped out of the Tardis since William Hartnell in 1963 to find the “perfect Doctor”.

[Image: age, height and hair colour of every Doctor]

Sources: Celebheights, How tall is Doctor Who?

Firstly, they tend to have brown hair, which is not too unusual. Seven of the 12 previous Doctors (including John Hurt – Doctor Zero?) have been rocking some brown locks, with a brief blonde blip coming in the form of Peter Davison and Colin Baker in the 80s. Ginger does not even enter the equation.

The Doctor’s age is quite interesting, as Doctor Who writer Steven Moffat allegedly wanted an older actor than Smith to play the role originally. If he were to go with past form in his latest casting, he would pick someone close to 40.5 – the median age of the actors when they first got to pull their “ohmygod” regeneration face.

[Graph: each Doctor’s age at first appearance]

(Click on graph for full interactive)

Historically, the age began to slide down from William Hartnell’s 55 as soon as the regenerations started, although it has mostly followed an older-younger-older-younger pattern since then.

I would personally rule out an older actor. Admittedly, Moffat did cast the youngest Doctor ever: Matt Smith was 27 when he donned his bow tie and natty boots. However, he has also since cast the oldest, with 73-year-old John Hurt being a whopping 32.5 years above average. Casting the 55-year-old bookies’ favourite Peter Capaldi does not seem like much of a stretch when you put it in that perspective.

Finally, the third metric is height. The titan that was Tom Baker is still the tallest Doctor so far – at 191cm, it was not just the hair. Meanwhile, Scottish actor Sylvester McCoy, whose Doctor was known for his sinister game-playing, was the shortest at 168cm.

Since the show’s reboot with Christopher Eccleston, all the Doctors have been around the six-foot mark, with David Tennant stacking up the highest at 185cm.

[Graph: each Doctor’s height]

(Click on graph for full interactive)

Six foot seems to be the perfect height for the Doctor generally, with the average across the series’ history being 181.5cm. Time Lords are tall sorts of chaps, aren’t they?

So – we have the dimensions of our perfect Doctor: around 41 years of age, about 181.5cm tall and probably with brown hair. When we compare most of the favourites (according to William Hill) against those dimensions, who is the most likely to become the Doctor?


[Photo: Chris Addison. Credit: Stuart Crawford]

Chris Addison! Yes, the comedian and The Thick of It actor is pretty much spot on the dimensions. At 41 years of age and 180cm tall, he is the best match from the top 14 list of potential Doctors, trouncing his former co-star and bookies’ favourite Peter Capaldi.
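For the curious, the “best match” here is nothing cleverer than a distance from the target numbers. Below is a minimal sketch of that scoring in Python, using only the figures quoted in this post (Addison’s 41 years and 180cm, Capaldi’s 55 years); Capaldi’s height and hair in the sketch are placeholders rather than checked figures, and the weighting is entirely arbitrary – it is a bit of fun, remember.

# Rough "perfect Doctor" matching sketch. The target figures (41 years,
# 181.5cm, brown hair) are the averages worked out above; the candidate
# numbers are only the ones quoted in this post, plus placeholders.
TARGET_AGE = 41
TARGET_HEIGHT_CM = 181.5
TARGET_HAIR = "brown"

candidates = {
    # name: (age, height in cm, hair colour)
    "Chris Addison": (41, 180, "brown"),
    "Peter Capaldi": (55, 183, "brown"),  # height and hair are placeholders, not checked figures
}

def distance(age, height, hair):
    # Smaller is better: years off target + cm off target + a flat penalty for hair.
    hair_penalty = 0 if hair == TARGET_HAIR else 10
    return abs(age - TARGET_AGE) + abs(height - TARGET_HEIGHT_CM) + hair_penalty

for name, stats in sorted(candidates.items(), key=lambda kv: distance(*kv[1])):
    print(f"{name}: {distance(*stats):.1f} away from the 'perfect Doctor'")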

[Table: the top 14 candidates compared against the “perfect Doctor” dimensions]
(I could not find Rory Kinnear’s height listed anywhere.)

Others who come close to the perfect age, height and hair colour are Sherlock star Benedict Cumberbatch and Chiwetel Ejiofor.

Of course, this is not the perfect way to predict the next Doctor. Obviously it is skewed against the female and BME candidates for the role, because every Doctor so far has been white and male. It also does not take into account past fame: none of the Doctors so far have had the in-vogue fame of Benedict Cumberbatch, for example.

However, just taking what we have considered – it looks like a star from the Thick of It might just be about to rip an arsehole in the timestream with their sonic screwdriver, just not, um, Malcolm Tucker.

Why your graph should be as simple as possible (also motion charts)

A few weeks ago I attempted my first motion chart. I suspect that Hans Rosling will go down as the David Bowie of statisticians, his alternative and exciting graphs breathing life into statistics.

But I am not Hans Rosling. And you are not Hans Rosling. There are a few reasons why motion charts are not often a good idea. Firstly, formatting the data for them is extremely time-consuming. Here is an example:

[Screenshot: data laid out for a Google motion chart]

In Google’s motion charts you need at least three columns: the date has to be in the first column, a dimension in the second, and anything can go in the third, as long as at least one of the remaining columns is a number that the change can be anchored to. Click the Google Charts wizard widget within the Google Spreadsheet and there you have it.

As Rosling’s graph shows, it works really well when you have two statistical measures, because then two changes over time can be shown at once. That is key: with only one measure, a motion chart is a really overcomplex way of showing something quite simple, and in almost every case a line chart would be better.

Simplicity works

In February I made a bubble chart showing the different modes of accidental death for both men and women. See below (it won’t work on Chrome, only on a JavaScript-enabled browser like Firefox).

[Interactive chart: Accidental deaths by injury or poisoning registered in 2011, England and Wales]

Paul Bradshaw gave me the following feedback:

 

[Screenshots: Paul Bradshaw’s comments]

 

Translation: don’t make something that hides the meaning. Now when I see visualisations that obfuscate the message for the sake of flashiness, I am quite wary.

Motion charts

Zach Gemignani did an excellent write-up of when it is appropriate to use a motion chart, and I pretty much concur with him.

Although animation looks nice and is impressive when you pull it off, with the wrong data it is simply confusing.

Also, you do not get the pleasure of Hans Rosling explaining what is going on for you while it happens.

Islington Now – and working on data in the newsroom

In March, I worked on Islington Now, a hyperlocal blog and unpublished newspaper that forms part of our MA timetable. Throughout the three weeks it was running, I worked on some data stories there, and these are some of my reflections.

Geography

In the newsroom, geography is everything (H/T to Henry Taylor for that quote). As part of our team’s content strategy we wanted to make data pretty central to the Islington Now operation and we meant that literally.

In the first week, when I was online editor, I made sure that the online team (all of whom were Interactive students) came in earliest so we got to sit at the very centre of the newsroom. That is where pitches and stories get thrown around, and catching a passing idea can lead to a chance to do some data journalism. So, for example, we got to do some data work on Islington Council’s budget, which went into the paper (but unfortunately was not published online).

It firmly embedded what we could do in the minds of the other student journalists. So, despite being moved further away from the others in week two (due to room changes), we got approached quite a lot whenever anybody had a story with a potential data element.

Saying “no”

“You guys are good with numbers, right?” On the one hand, it is lovely to be approached by journalists with some numbers; on the other, often there is simply no point in using a graph to show something. For example, a pie chart showing that, of someone’s fortune, the son gets half and the daughter gets half.

[Pie chart: the fortune split 50/50 between son and daughter]

Useful!

Static visualisations should be just like journalism. In almost every case, you should get the angle and make sure that is the most prominent thing. Sometimes the numbers just do not have a way of being visualised that is of any use to the story.

Verifying numbers

During the second week, I was passed numbers from the National Housing Federation by reporter Ben Finch showing their estimate of the number of people affected by the bedroom tax. I used a simple formula: divide that estimate by each borough’s population to see how many people were affected per thousand residents. Islington came out as the third worst hit borough, and twice as badly hit as the London average.
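The sum itself is nothing fancy. Here is a minimal sketch of it in Python, with made-up placeholder figures standing in for the NHF estimates and the real borough populations:

# Affected residents per 1,000 population. The figures below are placeholders
# for illustration only, not the NHF's actual estimates or real populations.
boroughs = {
    # borough: (estimated people affected, population)
    "Islington": (4000, 206000),
    "Camden": (2500, 220000),
}

for borough, (affected, population) in boroughs.items():
    per_thousand = affected / population * 1000
    print(f"{borough}: {per_thousand:.1f} affected per 1,000 residents")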

I was pretty sure my numbers were right, and the news team were pretty keen to make it the splash of the paper. But I just wanted to check the figures over with the NHF to make sure that what I was doing was a legitimate way to use the numbers. I was also concerned about the announcement that morning that there had been concessions made on who the tax would apply to. A quick phone call told me the data had been put together as I had expected, and we were able to run with this:
Islington hit twice as hard by bedroom tax

With a Google Fusion Map added in to make the story a little bit more interactive.

Sometimes you do work for no result

We were due to run some analysis on stabbings in Islington and I made this map of arrests relating to assaults with offensive weapons.

[Map: arrests relating to assaults with offensive weapons in Islington]

It did not take too long, and what I made was almost ready for publication. Unfortunately the analysis got dropped because of something else and the map never had a vehicle to work with. This can happen wherever you are, and you just have to accept that it is not right for that time. There may have been another opportunity for us to use the map within the remaining weeks, though, so it was probably worth doing in any case.

Using “separators and labels” in Outwit Hub Pro – open notes

[Screenshot: an Outwit Hub scraper]

N.B. Separators and labels are only available to users of Outwit Hub Pro. Similar and probably better results can be achieved with regex (regular expressions), so I would try that out if you have a lot of time before covering an event. But if you are like me (with limited regex knowledge) and want to get this information quickly, separators and labels are incredibly useful.

I wanted the names of all the candidates in the Lincolnshire election. The council’s website includes a snazzy interactive map with all the information you want, but getting each name manually would require a lot of clicking through and copying and pasting.

Naturally my first inclination was scraping with Outwit Hub. This is how I got on:

1. Click on any part of the map – I went for East Lindsey, a district bigger than the counties of Surrey, Hertfordshire, and Buckinghamshire.

[Screenshot: the East Lindsey page on the council’s interactive map]

Cripes.

2. You’ll see that you get a page with another map – click on one of the wards within it. I went for Louth Wolds.

[Screenshot: the Louth Wolds ward page]

3. Now you can boot up Outwit Hub. Copy the link URL – http://www.lincolnshire.gov.uk/ElectionsResultsDetail.aspx?division=6&locationGroup=144 – and paste it into the browser bar. The same page should come up. In the left-hand column, underneath “automators”, is an option saying “scrapers”. Click that and you should get this:
[Screenshot: the scrapers view in Outwit Hub, showing the page’s HTML]

That’s HTML – as most of you will know, that is the code which says what to display on the page. If you don’t know anything about HTML, don’t panic. You really don’t need to understand what that code does to do the next part. All you need to know is that bits of it contain the bits of text that you want.
[Screenshot: the empty scraper rows beneath the HTML]

Beneath the code is something that looks like this. As I have done, write “ward” in row 1, column 1, just to try it out.

6. You need to find the bits of the page that you want. We want the name of the ward, the names of all the candidates, their parties, their results and the overall turnout. Let’s start with the ward. Press CMD/CTRL+F to bring up the search box and look up “Louth Wolds” (minus inverted commas).

This will be what comes up first:

[Screenshot: the first search result for “Louth Wolds” in the HTML]

Here Louth Wolds does not stand out among the other wards, even though it should look like a very important place, because it is the subject of the page. Press next on the search.

[Screenshot: the “Louth Wolds” heading in the HTML]

That looks like the Louth Wolds you can see on the page, prefaced by “Electoral Division Map” and above the election results themselves.

7. As you can see, at the bottom of the screen there are columns saying “name” (which you should have written “ward” in), “marker before” and “marker after”. If you put one bit of HTML in marker before and one bit of HTML in marker after, Outwit will scrape anything between the two. Powerful stuff.

Scraping is about knowing exactly what bit of the page you want. As all the electoral ward pages look pretty similar, the same markers should work on every page to pull out the right bit of information.

Here we want the ward name first – as you can see, what comes before the “Louth Wolds” that we want is the following:

<div class="flash container sleeve">
<h2>Electoral Division Map -

Copy and paste that from the HTML (not from here) into the “marker before” column.

You generally don’t need to be as precise about the “marker after” but, just to be safe, do the same and copy the

</h2>
<div class="elecflash">

that comes after Louth Wolds into the “marker after” column.

8. If you press “Execute” now you should just get the name “Louth Wolds”. Minor success!
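As an aside, the “marker before / marker after” idea translates directly into a few lines of Python if you would rather go the regex route mentioned at the top. This is a rough sketch, assuming the Lincolnshire page still serves the markup quoted above, with the requests library standing in for Outwit Hub’s built-in browser:

# Sketch: grab the ward name from between the same "marker before" and
# "marker after" used in the Outwit Hub scraper above.
# Assumes the page still serves the same HTML as quoted in this post.
import re
import requests

URL = ("http://www.lincolnshire.gov.uk/ElectionsResultsDetail.aspx"
       "?division=6&locationGroup=144")

html = requests.get(URL).text

# Everything between "Electoral Division Map - " and the closing </h2>
match = re.search(r"Electoral Division Map -\s*(.*?)\s*</h2>", html, re.DOTALL)
if match:
    print(match.group(1))  # should print "Louth Wolds"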

9. Now, let’s find the candidates and their percentages. Search the page for “Marfleet” (i.e. the winner of the seat – H. Marfleet).

[Screenshot: the search result for “Marfleet” in the HTML, among the other candidates]

There he is!

But this is where it gets a bit tricky. As you can see, the code around B.P. Burnett is more or less identical to the code around H. (Hugo) Marfleet. If we want Marfleet’s party, percentage and so on, then we need to use another feature of Outwit Hub – separators and labels. If you’ve used delimiters in Excel they work in a similar way, splitting the information into the columns that you want. Start off by marking your second scraper row “Candidate”.
Put the bit just before the candidates start into “marker before”:

<li="first">
<strong>

Look at the bottom of the list of candidates and you should find this code:

<li>
</ul>
</div>
</div>
</div>
</div>
<div id="elec_map">

Paste that into “marker after”.

Now what Outwit Hub is doing is looking at all the code between what we have said are the markers before and after. The only unique separator between each candidate is an <li>, or list item tag (you can see it on code line 1294 in the picture above).

Simply put <li> in the separator column.

Labels will name each column which you have separated. As the candidates are listed in descending order of votes, we can put

“winner”, “runner up”

Press execute, name your scraper and you should have something like this:

[Screenshot: the scraper output, one row per candidate]

Success! We have all the information we want for each candidate, in a form that we can clean up later. (I also got the overall turnout and electorate, but that is up to you.)
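For regex-minded readers, the separators-and-labels step is also just string handling: take the block between the two candidate markers, split it on <li>, and label the chunks in order. A hedged sketch follows, again in Python; the sample string below is a simplified stand-in, not the page’s exact HTML.

# Sketch of the "separators and labels" step in plain Python: take the HTML
# between the candidate markers, split on the <li> separator and strip the
# remaining tags. The sample string is simplified, not the page's exact HTML.
import re

def split_candidates(html):
    # Everything between the first <li ...> and the elec_map div (the markers above)
    block = re.search(r'<li[^>]*>(.*?)<div id="elec_map">', html, re.DOTALL)
    if not block:
        return []
    chunks = block.group(1).split("<li>")                      # the separator
    cleaned = [re.sub(r"<[^>]+>", " ", c) for c in chunks]     # drop the tags
    return [" ".join(c.split()) for c in cleaned if c.strip()]

sample = ('<li class="first"><strong>H. Marfleet</strong> (votes here)'
          '<li><strong>B.P. Burnett</strong> (votes here)'
          '</ul><div id="elec_map">')

# Labels: the candidates come out in descending order of votes.
for label, candidate in zip(["winner", "runner up"], split_candidates(sample)):
    print(label, "->", candidate)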

The next bit is pretty specific to the Lincolnshire page, but read on if you would like to find out how I found a workaround.

10. Getting all the links I needed was a little trickier than first anticipated. On a normal page with a series of links inside, you should be able to pull out those links and scrape them automatically – unfortunately this is an animation and the links to each individual page are not available in the HTML. However, there is a way:

If you search the same original page we were scraping for one of the other wards – “Boston Coastal” – you will see it underneath some HTML saying “locationid”, with each ward having an option value. So Boston Coastal’s option value is <option value="99">.

Look at the URL we had originally:

http://www.lincolnshire.gov.uk/ElectionsResultsDetail.aspx?division=6&locationGroup=144

Change that to simply:

http://www.lincolnshire.gov.uk/ElectionsResultsDetail.aspx?locationGroup=144

And you get the same result! Exciting. But what is more exciting is that the location IDs are sequential, running from 94 to 170. I put a list of URLs together (using Excel and CONCATENATE, but you can choose your own poison) and then published it on this website.
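If Excel and CONCATENATE are not your poison, the same list of URLs is a couple of lines of Python – a sketch, assuming (as the option values above suggest) that each sequential ID slots into the locationGroup parameter:

# Build the list of results URLs for the sequential location IDs 94-170.
# Assumes each ID slots into the locationGroup parameter, as described above.
BASE = "http://www.lincolnshire.gov.uk/ElectionsResultsDetail.aspx?locationGroup={}"

urls = [BASE.format(location_id) for location_id in range(94, 171)]  # 94 to 170 inclusive

with open("lincolnshire_ward_urls.txt", "w") as f:
    f.write("\n".join(urls))

print(len(urls), "URLs written")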

I then put the page I had created into Outwit Hub and, in the left-hand menu column, selected “links”.

[Screenshot: the links view in Outwit Hub]

You get quite a few URLs, but make good use of the “Catch” option that lies at the bottom of Outwit Hub and put in:

“ElectionsResultsDetail.aspx”

All the links you want should now be selected in a lovely shade of lime green. Right-click on them, press fast-scrape and apply the scraper you created earlier.

Now make a cup of tea while all 76 are scraped. Clean ’em up, put ’em together and here they are:

Get the data

…Well, I had to clean it up a bit

Data byline: “Are Conservative MPs a load of posh boys?” – The Guardian

[Chart from the Guardian article]
Yes…and how…

Maybe they need a visit to the thought camp to help them get back on track.

Seriously though, this was quite interesting, as I was sticking quite closely to the findings of an academic paper, so I had to pick out the data from it that was most chartable. I had a few people tell me that Francis Maude was an Old Etonian. He was not; he actually went to school in Cambridge – a private one, mind.

Data byline: “Who works the most hours MPs or teachers?” – The Guardian

[Chart from the Guardian article]

Probably the data byline I am happiest with so far – it was pretty fun to come out with the opposite conclusion to the one I expected going in. I was sure it would upset some teachers, but I was pretty careful with my numbers and, I think, fair to both MPs and teachers in how I analysed them. A few people criticised me in the comments for not including the number of hours that teachers work during the holidays compared to MPs. But just to re-emphasise what I said in the article:

It was not clear in the Hansard Society research how much MPs were working during the recess but a similar survey in 1983 by SSRB found that MPs were doing a 42 hour week compared to the 69 hours they were doing while Westminster was sitting. It is also unclear how much teachers are working during their 13-week break.