The Language of Statistics

In November 2012 I wrote about the use of statistics in football, and the backlash against them, for a now defunct website. You can read the article in its original context via the Wayback Machine, or by scrolling down. Some of my examples feel a bit dated, but otherwise I’d stand by what I wrote.

Over recent years, the use of statistics in sport has become increasingly common. Match of the Day and Sky Sports routinely present us with the basic stats at the end of matches – shots, shots on target, fouls, etc. These stats are useful and, to a point, insightful. Even if you only see the stats at the bottom of a print media match report, you can get a sense of how balanced the match was and whether the winners deserved their win.

But every trend will inspire a counter-trend. A few weeks ago, the Mail on Sunday’s Chief Sports Writer Patrick Collins wrote the following:

Question: what do you do when you don’t really like sport but wish to convey an air of blokeish authority? Answer: you produce a statistic. Stats are what they serve up in gastro pubs and Premier League hospitality boxes. Always they are preceded by the crushing query: ‘Did you know?’

Each weekend yields a new and gloriously useless crop – the most ‘assists’, the greatest number of ‘flick-ons’ – and Saturday morning’s gem was up there with the best.

Did you know that West Bromwich have blocked more of their opponents’ shots than any other Premier League side this season? A total of 44. Just in front of Sunderland and QPR. How amazing is that? Yes, I’ll have another sandwich, please. Prawn, for preference.

It’s a reaction that isn’t just in place in our sporting culture – in Clint Eastwood’s latest film Trouble With the Curve, he plays an old school baseball scout, making recommendations of which youngsters have the potential to make it big based on observation and instinct. He has a rival, played by Matthew Lillard, who’s an expert number cruncher but doesn’t really understand in human terms what those stats mean. Critics seem to agree the sub-plot on the conflict between the two has a strong anti-Moneyball streak to it.

Are they right? Are statistics totally worthless?

I’ll detour slightly. Possession is often used as a measure of a team’s quality – a losing or drawing manager will cite his team’s superior possession as proof that they competed well and on another day, would have picked up a win. Barcelona’s 60%+ possession numbers are often used post-match to display how utter their dominance was. By contrast, Stoke City have still not, to the best of my knowledge, kept more than 50% of possession in any match in the four+ years since they rose into the Premier League. So does that mean that Stoke didn’t deserve any of their 49 league wins in that time?

Stoke City’s Rory Delap in 2010, literally throwing the ball away. / Ronnie MacDonald, Wikimedia Commons

Rather than possession being a measure of how good a team is, it’s a measure of… how much of the ball the team had during the game. Obvious maybe. But the problem with statistics isn’t that they prove nothing, but that people overstate their value.

Stoke City’s modus operandi involves getting the ball forward quickly, taking on the opposition with pace, physicality and set-piece routines. This means that they rarely have the time to build up the possession percentage with passes around defence or in safe areas of the midfield. In basic terms, 51% of possession against Stoke is probably less impressive than 49% of possession against Swansea – a team of similar talent whose style of play does involve holding on to the ball. Effectively, statistics is a language, and you have to understand precisely what certain terms mean before you can wrap your head around the information they’re giving you.

So, going back to the stat quoted by Patrick Collins. Statistics are probably most useful (amongst fans) to people like myself – I’m not always a great reader of the game. To use West Brom as an example, I’m not completely sure why the Baggies are doing so well this year. I don’t mean to talk them down – Morrison, Gera and Brunt are really good playmakers and Odemwingie, Long and Lukaku are a pretty decent array of forward options. But I don’t really understand why they’ve done so much better than last year. I like watching them – they have a style that’s efficient, with a dash of flair. But I can’t see by myself why they’ve been more successful this year than last.

If we take Collins’ stat as truth, that by October 27th West Brom had collectively blocked 44 shots in eight matches, the best in the league, it shows us at least one of their strengths. Perhaps the Baggies have been putting in more last ditch tackles than last year? Maybe they’ve pushed forward in numbers more often, putting more pressure on the defence to make last gasp blocks, and their backline have risen to the challenge? Or maybe the defensive structure has fallen apart since Michael Appleton and Roy Hodgson departed in close succession, and defenders are having to make desperate lunges to block shots from forwards who would have been prevented from receiving the ball this time last year? The trouble with statistics is that you need a lot of them to build up an all-round image of a match. Really, you’re probably better off treating statistics as a seasoning rather than the heart of a meal, using your eyes as the main method of insight into a game.

In the summer Squawka was set up, a website providing statistical analysis in real time via their Match Centre and writing statistically based blogs on a variety of football subjects. One recent feature on Luis Suarez grabbed my attention.

Judging by what I’ve seen, I’d consider Suarez a very selfish player. A few ridiculously long range shots (there was one from just inside the opposition half against Norwich) were eye-grabbing, and helped form this opinion. But statistics treat the eye-grabbing and the bland equally. Looking at Squawka’s statistical report on Suarez, a surprising number of his shots have been less than five yards from being on target, and he’s created far more ‘chances’ for his team-mates than I’d noticed.

After 11 games, the 33 ‘chances’ created by Suarez resulted in only two assists. This could mean that the opportunities he provides are only just good enough to count as such. But when added to the fact that Suarez is the only Liverpool player to score more than once in the league, the stats paint a picture of an all-round forward being let down by his wasteful team-mates. It’s this kind of all-round picture that makes statistics worthwhile and interesting.

That’s not to say that statistics are perfect.

Squawka have an algorithm to form an overall ‘Player Performance Rating’, which allows ratings to range from minus numbers to well above fifty. It’s an approach that makes more sense to me than the finite ‘out of ten’ rating, which seems to assume that players should be judged against a scale of potential perfection. But it does lead to some surprising results. In a recent article, putting together the best of both teams on Merseyside, the midfield three who were most impressive statistically didn’t include Marouane Fellaini, the outstanding midfield player for either side this season. (He did come in fourth, behind Joe Allen, Steven Gerrard and Leon Osman.)

Among the statistics covered by Squawka are ‘Squawka Awards’ for individual players, positive and negative. Of the three central midfielders chosen, only Allen (seven) has more positive awards than Fellaini (six), but Fellaini is the only one of the four to have more than one negative award (eight). Allen’s awards are for either the overall ‘Top man in team’ (three) or his specialist area of ‘Best possession’ (four). Fellaini’s six positive awards are two each for ‘Best attack’, ‘Top man in team’ and ‘Man of the match’. His eight negative awards are two each for ‘Worst defence’, ‘Worst possession’, ‘Worst man in team’ and ‘Worst man of match’. This surprised me because, while I’m only an occasional Everton watcher, Fellaini has always struck me over the years as being both a strong defensive presence and a tidy passer.

Dempsey and Fellaini levitate a football with their minds. / Erik Drost, Flickr

Combining the stats and what I’ve seen, it seems that Everton’s tactical system allows Fellaini to be a specialist attacking midfielder, free from some of the more practical duties, while the likes of Leon Osman provide a solid base behind him. If this is so, how would you statistically tell the difference between a player who is told to specialise and has teammates ready to make up for his specialisation, and a player who through laziness neglects duties he’s been assigned but doesn’t want to do? It’s difficult to say with any sort of factual basis which of the two performance types, Fellaini or Osman, is ‘better’. Players capable of specialising to the extent Fellaini has this season are rarer, but they need a reliable backup to free them to do their stuff. Ironically, it’s usually the Osman type of player who goes under-rated, but it’d seem that Squawka’s Player Performance Rating (and probably all statistical models) under-rates the specialists.

Nic English, Head of Data and Content at Squawka, acknowledges that this bias exists in their calculations.

“A lot of the arguments we make with the Performance Score relates to what a team needs from its players, that’s why the score is broken down into Attack, Possession and Defence. As an example: Jelavic last season only scored goals, he did nothing else for his team and that’s all Everton needed from him; it doesn’t mean he’s a bad player, it just means that he has a role. Players like Leighton Baines have numerous responsibilities for their team and do them very very well so have a higher overall score by contributing in many ways.”

Any language can be ambiguous and confusing in a sufficiently small sample size. The whole genre of farce relies on this fact – it doesn’t prove that English is an awful language for communicating in. I don’t think it’s possible to put together a set of statistics to say definitively which of two players of different styles is better than the other. And even statistics that compare two similar players can be misleading. But statistics, when used in an intelligent way, can give us another angle to look at the game from, to draw our attention to minutiae we might otherwise have missed.
