Tuesday, January 1, 2013
It has been great fun blogging about soccer statistics, but now it's time for a new hobby project. Thanks for following me.
Saturday, October 6, 2012
Is number of cards related to league position?
Haven't got much time these days, but I think it's about time to deliver a new post.
Inspired by an article from the homepage of my absolute favourite soccer magazine http://www.offside.org/de-tre-redaktorerna/2012/snallast-vinner-ibland?page=2 (Swedish), I decided to investigate the relationship between the number of cards and league position. The article doesn't state any historical relationship, but it gives concrete examples from last season, which indicate that the pattern could differ between leagues.
Once again I have looked at my favourite league, the English Premier League. I have collected card data (yellow + red) for the last 17 seasons (1995-2012) from www.statbunker.com. The graphical representation of cards against league position is shown below:
Below you see the average number of cards for each league position.
The best linear fit is shown in the figures. The confidence interval for the position coefficient is (0.2305, 0.6669), which clearly rejects the hypothesis of no relation between position and number of cards. The assumptions of the linear model are verified by graphical inspection of the residuals, which indicates that the distribution of the number of cards for each league position is approximately normal and has the same variance.
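For the curious, a fit like this is easy to redo. Below is a small Python sketch (not the code used for this post) showing how the linear model and the confidence interval for the position coefficient could be computed with statsmodels; since the per-season card counts are not reproduced here, it generates an illustrative synthetic data set with a similar structure.

```python
# Sketch: fitting cards ~ position and extracting a confidence interval for
# the position coefficient. The data below is a synthetic stand-in, not the
# actual statbunker.com figures.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
seasons, positions = 17, 20
df = pd.DataFrame({"position": np.tile(np.arange(1, positions + 1), seasons)})
# Synthetic stand-in: roughly 55 cards plus ~0.45 extra cards per league position.
df["cards"] = 55 + 0.45 * df["position"] + rng.normal(0, 8, len(df))

X = sm.add_constant(df["position"])          # intercept + position
fit = sm.OLS(df["cards"], X).fit()
print(fit.params)                            # intercept and slope
print(fit.conf_int().loc["position"])        # 95% CI for the position coefficient
```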
The conclusion is that there exists a linear relation between league position and the number of cards given in the English Premier League, and that, on seasonal average, bottom teams receive more cards than top teams.
But what about the distribution of cards between teams? Are there significant differences between teams?
43 teams have taken part in the English Premier League between 1995 and 2012. 30 of these 43 teams have played 3 or more seasons in the period, and they are represented in the figure below.
If you find the details interesting, feel free to look into the team data below. Notice that among the former big four, Chelsea and Arsenal have on average received 10-15 more cards per season than Liverpool and Manchester United.
Team | Cards per season (avg.) | Seasons |
Arsenal | 64,82 | 17 |
Aston Villa | 58,47 | 17 |
Chelsea | 67,06 | 17 |
Everton | 65,18 | 17 |
Liverpool | 52,88 | 17 |
Manchester United | 55,47 | 17 |
Tottenham Hotspur | 58,88 | 17 |
Newcastle United | 59,63 | 16 |
Blackburn Rovers | 72,80 | 15 |
West Ham United | 69,86 | 14 |
Bolton Wanderers | 68,92 | 13 |
Middlesbrough | 69,31 | 13 |
Manchester City | 61,58 | 12 |
Fulham | 56,82 | 11 |
Sunderland | 70,45 | 11 |
Southampton | 59,10 | 10 |
Leeds United | 76,22 | 9 |
Charlton Athletic | 53,00 | 8 |
Birmingham City | 65,29 | 7 |
Derby County | 77,57 | 7 |
Leicester City | 58,43 | 7 |
Portsmouth | 61,43 | 7 |
Wigan Athletic | 67,71 | 7 |
Coventry City | 67,33 | 6 |
West Bromwich Albion | 54,00 | 6 |
Sheffield Wednesday | 51,20 | 5 |
Wimbledon | 53,00 | 5 |
Stoke City | 70,00 | 4 |
Wolverhampton Wanderers | 67,50 | 4 |
Nottingham Forest | 69,00 | 3 |
Bradford City | 55,00 | 2 |
Crystal Palace | 65,50 | 2 |
Hull City | 72,00 | 2 |
Ipswich Town | 38,50 | 2 |
Norwich City | 48,50 | 2 |
Queens Park Rangers | 73,50 | 2 |
Reading | 52,50 | 2 |
Watford | 60,00 | 2 |
Barnsley | 73,00 | 1 |
Blackpool | 49,00 | 1 |
Burnley | 58,00 | 1 |
Sheffield United | 70,00 | 1 |
Swansea City | 43,00 | 1 |
The seasonal card distribution clearly differs from team to team, as we see from the averages, and a boxplot of the two extreme teams makes it clear that there are differences between (at least) some of the teams.
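If you want to play with a comparison like this yourself, here is a rough Python sketch of the boxplot-plus-test idea. The seasonal card counts below are placeholders, not the actual statbunker.com figures, so treat it purely as an illustration.

```python
# Sketch: comparing two teams' seasonal card counts with a two-sample test
# and a boxplot. The numbers are hypothetical placeholders.
from scipy import stats
import matplotlib.pyplot as plt

team_a_cards = [48, 52, 55, 50, 57, 49, 53, 56, 51, 54]   # hypothetical low-average team
team_b_cards = [72, 80, 75, 78, 82, 79, 74]               # hypothetical high-average team

# Welch's t-test (no equal-variance assumption); a Mann-Whitney U test would be
# a reasonable non-parametric alternative for small samples.
t, p = stats.ttest_ind(team_a_cards, team_b_cards, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")

plt.boxplot([team_a_cards, team_b_cards])
plt.xticks([1, 2], ["Team A (low avg.)", "Team B (high avg.)"])
plt.ylabel("Cards per season")
plt.show()
```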
Sunday, August 19, 2012
Which part of the league table has the most goals? Top or bottom?
My thesis was that the top teams of a league account for the majority of the big wins during a season, and the bottom teams for the majority of the big losses. The approach should be something like this: take the top 3 (or 5) and bottom 3 (or 5) of each year's final league table, sum goals for and goals against to obtain a total goal count for each team, remove goals scored in matches between the selected teams, and analyze the distribution of the remaining goals for the top and bottom sides.
I chose to analyze the English Premier League, and I found that a great deal of the teams with high total goal counts were actually middle teams. It was of course not surprising to see some middle teams with a high total goal count, but the number of occurrences was too great to neglect. Therefore, I took a different approach: take the top 6 total goal counts of each season and analyze the league positions of these teams (in case of ties between the 6th and the 7th highest total goal count, I included both teams in the data).
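As an illustration of the selection step (not the code actually used for this post), here is a small pandas sketch that computes total goals per team and keeps the six highest of a season, including ties with the sixth value. Only a few rows of the 2011-2012 table are used as example input.

```python
# Sketch of the selection step: for each season, compute total goals (GF + GA)
# and keep the 6 highest totals, keeping ties with the 6th value.
import pandas as pd

table = pd.DataFrame({
    "season":   ["2011-2012"] * 4,
    "position": [19, 3, 18, 1],
    "team":     ["Blackburn Rovers", "Arsenal", "Bolton Wanderers", "Manchester City"],
    "gf":       [48, 74, 46, 93],
    "ga":       [78, 49, 77, 29],
})
table["gt"] = table["gf"] + table["ga"]

def top_total_goals(season_table: pd.DataFrame, n: int = 6) -> pd.DataFrame:
    """Teams whose goal total is at least the n-th highest of the season."""
    cutoff = season_table["gt"].nlargest(n).min()
    return season_table[season_table["gt"] >= cutoff].sort_values("gt", ascending=False)

top6 = table.groupby("season", group_keys=False).apply(top_total_goals)
print(top6[["season", "position", "team", "gt"]])
```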
Here's the 2011-2012 data (GF = goals for, GA = goals against, GT = goals total).
Pos. | Team | GF | GA | GT |
19 | Blackburn Rovers | 48 | 78 | 126 |
3 | Arsenal | 74 | 49 | 123 |
18 | Bolton Wanderers | 46 | 77 | 123 |
1 | Manchester City | 93 | 29 | 122 |
2 | Manchester United | 89 | 33 | 122 |
20 | Wolverhampton Wndrs | 40 | 82 | 122 |
The table shows that Blackburn Rovers finished 19th in the league table; scoring 48 goals and conceding 78 made them the team with the highest goal total of the season. This year, the top-6 goal totals belonged to the top-3 and bottom-3 positions in the league table.
Collecting corresponding data for the 17 seasons between 1995 and 2012 gives the following distribution of top-6 total goal occurrences for each league table position (the parabola is the best 2nd-degree fit).
The result is not surprising. It simply states that top and bottom teams more often have the highest total goal counts than the rest of the league table.
If we re-categorize (sum) the positions into 3 groups (top, middle and bottom), we obtain measures that are easier to compare in order to answer the initial question. As suggested above, I look at top/bottom-3 and top/bottom-5. The aggregated results are as follows.
Category | Occurrences | Average total goals |
Top (1-3) | 25 | 117,48 |
Middle | 56 | 109,32 |
Bottom (18-20) | 24 | 113,96 |
Category | Occurrences | Average total goals |
Top (1-5) | 35 | 115,31 |
Middle | 37 | 109,84 |
Bottom (16-20) | 33 | 111,9 |
As we can see from the aggregates, the numbers of top and bottom occurrences are almost identical, indicating that, on average, the same number of top and bottom teams are in the top-6 of total goals for a season. The average total goals column indicates that the top of the table on average accounts for more goals than the bottom, but Tukey-corrected confidence intervals for pairwise differences between the 3 groups (top, middle, bottom) show that the averages of top and bottom don't differ significantly (p-values of 0.2840 and 0.2346, respectively). That is, we can assume that top and bottom account for the same number of goals (middle is significantly different from top).
The source of the data is www.statto.com, and the highest total goal counts since 1995 are as follows.
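For reference, a Tukey-style comparison like the one above could be set up roughly as in the Python sketch below. The individual goal totals behind the group averages are not listed in this post, so the numbers here are illustrative stand-ins.

```python
# Sketch: Tukey-corrected pairwise comparison of mean total goals between the
# three position groups. Group sizes follow the occurrence counts in the post,
# but the GT values themselves are illustrative stand-ins.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
gt = np.concatenate([
    rng.normal(117, 10, 25),   # illustrative "top" group
    rng.normal(109, 10, 56),   # illustrative "middle" group
    rng.normal(114, 10, 24),   # illustrative "bottom" group
])
groups = ["top"] * 25 + ["middle"] * 56 + ["bottom"] * 24

print(pairwise_tukeyhsd(endog=gt, groups=groups, alpha=0.05))
```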
Season | Pos. | Team | GF | GA | GT |
1999-2000 | 1 | Manchester United | 97 | 45 | 142 |
2009-2010 | 1 | Chelsea | 103 | 32 | 135 |
2010-2011 | 19 | Blackpool | 55 | 78 | 133 |
2001-2002 | 3 | Manchester United | 87 | 45 | 132 |
2010-2011 | 11 | West Bromwich Albion | 56 | 71 | 127 |
2007-2008 | 11 | Tottenham Hotspur | 66 | 61 | 127 |
2002-2003 | 2 | Arsenal | 85 | 42 | 127 |
2011-2012 | 19 | Blackburn Rovers | 48 | 78 | 126 |
2001-2002 | 4 | Newcastle United | 74 | 52 | 126 |
1995-1996 | 14 | Wimbledon | 55 | 70 | 125 |
That is, top teams so far account for 3 of the 4, and 5 of the 10, overall highest total goal counts since 1995 in the English Premier League.
Monday, August 13, 2012
Are midfielders more likely to become managers than other players?
Is it right to assume that midfielders have more tactical knowledge than the rest of the team? And is it right to assume that there is more manager potential in a midfield player?
It is a hard task to find out which position current managers preferred to play during their professional playing careers, but I have done the research for the managers of today's English Premier League teams. Most of the data is from Wikipedia.
*P.P. is short for preferred position. Managers without a P.P. listed had no, or only a very short, professional playing career.
Team | Manager | Highest League | P.P. | Captain |
Arsenal | Arsene Wenger | |||
Aston Villa | Paul Lambert | First div. Germany | CDM | Lower league |
Chelsea | Roberto Di Matteo | First div. England | CDM | No |
Everton | David Moyes | Third div. England | CD | No |
Fulham | Martin Jol | First div. England | OCM | No |
Liverpool | Brendan Rodgers | |||
Manchester City | Roberto Mancini | First div. Italy | F | No |
Manchester United | Sir Alex Ferguson | First div. Scotland | F | No |
Newcastle United | Alan Pardew | First div. England | CM | No |
Norwich City | Chris Hughton | First div. England | LD | No |
Queens Park Rangers | Mark Hughes | First div. England | F | No |
Reading | Brian McDermott | First div. England | RM | No |
Southampton | Nigel Adkins | Third div. England | GK | No |
Stoke City | Tony Pulis | Second div. England | CD | No |
Sunderland | Martin O'Neill | First div. England | CM | National team |
Swansea City | Michael Laudrup | First div. Spain | OCM | National team |
Tottenham Hotspur | Andre Villas-Boas | |||
West Bromwich Albion | Steve Clarke | First div. England | RD | No |
West Ham United | Sam Allardyce | First div. England | CD | Lower league |
Wigan Athletic | Roberto Martinez | Third div. England | CDM | Yes |
The expected proportion of midfielders should be calculated in order to do any comparison. A simple (and perhaps not entirely correct) way to do this is to look at the most common playing systems and count the midfielders. Without any references, I have chosen the following systems (4-4-2, 4-5-1, 3-5-2, 4-3-3), which I hope are representative of the game 10-30 years ago.
I have calculated the expected proportions without weighting the systems. For example, the expected proportion of midfielders is the number of midfielders summed over the systems divided by the total number of players: (4+5+5+3)/44 = 0.3864.
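The arithmetic is simple enough to write down as a few lines of Python (just a sketch of the calculation described above):

```python
# Sketch: expected position proportions from the four chosen systems,
# unweighted (goalkeeper included, so 44 players in total).
systems = [(4, 4, 2), (4, 5, 1), (3, 5, 2), (4, 3, 3)]   # (defenders, midfielders, forwards)

counts = {
    "Goal Keeper": len(systems),                       # one keeper per system
    "Defender":    sum(d for d, m, f in systems),
    "Midfielder":  sum(m for d, m, f in systems),
    "Forward":     sum(f for d, m, f in systems),
}
total = sum(counts.values())                           # 44
expected = {pos: n / total for pos, n in counts.items()}
print(expected)   # Midfielder -> 17/44 = 0.3864
```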
Position | Count | Percent | Expected |
Goal Keeper | 1 | 5,88 | 9,09 |
Defender | 5 | 29,41 | 34,09 |
Midfielder | 8 | 47,06 | 38,64 |
Forward | 3 | 17,65 | 18,18 |
Total | 17 | 100,00 | 100,00 |
A proportion test of 8 out of 17 against 0.3864 gives a p-value of 0.6428 (CI: 0.2386-0.7147), clearly failing to reject the hypothesis of equal proportions, which means that midfielders aren't more likely to become managers than other players.
A test of the 4 observed counts against the expected proportions is done with Pearson's chi-squared goodness-of-fit test and gives a p-value of 0.8927, indicating that the observed position distribution is very close to the expected one. The sample size is of course too small to draw any final conclusions, and the assumptions of the test are not really fulfilled, but the results so far give a very strong indication.
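The goodness-of-fit test itself is easy to redo; the sketch below uses scipy and should land close to the quoted p-value of 0.8927 (small rounding differences aside):

```python
# Sketch: Pearson's chi-squared goodness-of-fit test of the observed manager
# positions against the expected proportions from the four systems.
from scipy.stats import chisquare

observed = [1, 5, 8, 3]                                  # GK, D, M, F (17 managers)
expected_prop = [4 / 44, 15 / 44, 17 / 44, 8 / 44]
expected = [p * sum(observed) for p in expected_prop]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```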
What about captains? Are they more likely to become managers than non-captains? I will set the expected proportion of captains to 1/11; other estimates are very welcome. A proportion test of 5 out of 17 against 1/11 gives a p-value of 0.0127 (CI: 0.1138-0.5595), indicating that captains are more likely to become managers than non-captain players.
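For completeness, here is a Python sketch of the two proportion tests. It uses exact binomial tests, whereas the p-values quoted above most likely come from a chi-squared based proportion test, so the numbers may differ a little even if the conclusions should stay the same.

```python
# Sketch: the two proportion tests above, here as exact binomial tests.
from scipy.stats import binomtest

midfielders = binomtest(k=8, n=17, p=17 / 44, alternative="two-sided")
captains    = binomtest(k=5, n=17, p=1 / 11, alternative="two-sided")

print("midfielders:", midfielders.pvalue, midfielders.proportion_ci(confidence_level=0.95))
print("captains:   ", captains.pvalue, captains.proportion_ci(confidence_level=0.95))
```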
Monday, July 23, 2012
Did Wiggins do better than Froome in the Tour?
Le Tour de France 2012 finished yesterday with the usual laps on the Champs-Élysées, and not surprisingly, it all came down to a bunch sprint, where Mark Cavendish proved to be the fastest.
Unfortunately I haven't been able to watch very much of the Tour this year, but I have read a couple of times that Team Sky has done incredibly well and that many people were wondering whether young Chris Froome could have taken the overall victory if his role in the team hadn't been to help Bradley Wiggins to the win. It looks like Wiggins settled the discussion on Saturday in the last time trial of the Tour, which he won with a solid margin of 1:16 over Froome.
But let's put the results to a quick test. I have collected Wiggins' and Froome's time differences (in seconds) to the winning time of each stage. The source is http://www.dr.dk/Sporten/Tour_de_France/ and here is the resulting data.
Stage | Wiggins | Froome | Diff. (Froome - Wiggins) |
Prologue | 7 | 16 | 9 |
1 | 0 | 85 | 85 |
2 | 0 | 0 | 0 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 0 |
5 | 0 | 0 | 0 |
6 | 4 | 4 | 0 |
7 | 2 | 0 | -2 |
8 | 26 | 26 | 0 |
9 | 0 | 35 | 35 |
10 | 196 | 196 | 0 |
11 | 57 | 55 | -2 |
12 | 474 | 474 | 0 |
13 | 0 | 0 | 0 |
14 | 1095 | 1095 | 0 |
15 | 710 | 710 | 0 |
16 | 429 | 429 | 0 |
17 | 19 | 19 | 0 |
18 | 4 | 4 | 0 |
19 | 0 | 76 | 76 |
20 | 9 | 9 | 0 |
The one-sided paired Wilcoxon signed-rank test, testing for a true location shift less than zero, gives a p-value of 0.07056 > 0.05, which at the 95% confidence level means that Wiggins didn't do significantly better than Froome in this year's Tour.
The p-value, however, is so close to the rejection region that just one more stage with Wiggins 3 seconds (or more) faster than Froome would have made his performance significantly better (according to the test). On the other hand, if Froome hadn't been caught up in the run of crashes within the last 25 kilometers of stage 1, he probably wouldn't have conceded the 85-second gap to Wiggins, and leaving out that single observation gives a more reliable p-value (0.1393).
Notice that the test only considers the observations different from zero (the ties are left out), which means that the actual number of observations for the test is down to only 6.
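If you want to rerun the test described above, here is a rough Python version of it, using the stage-by-stage gaps from the table (seconds behind the stage winner, prologue first). scipy's Wilcoxon implementation also drops the zero differences, but its tie handling and normal approximation may give a slightly different p-value than the 0.07056 quoted.

```python
# Sketch: one-sided paired Wilcoxon signed-rank test on the stage time gaps.
from scipy.stats import wilcoxon

wiggins = [7, 0, 0, 1, 0, 0, 4, 2, 26, 0, 196, 57, 474, 0, 1095, 710, 429, 19, 4, 0, 9]
froome  = [16, 85, 0, 1, 0, 0, 4, 0, 26, 35, 196, 55, 474, 0, 1095, 710, 429, 19, 4, 76, 9]

# alternative="less": is Wiggins' gap to the stage winner systematically smaller
# than Froome's? zero_method="wilcox" drops the zero differences (the ties).
res = wilcoxon(wiggins, froome, zero_method="wilcox", alternative="less")
print(res)
```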
Monday, July 9, 2012
How predictable was EURO 2012? - Historical view
In my latest post How predictable was EURO 2012?, I introduced 2 predictability measures, and in the preceding comment I came up with a third measure of predictability for a tournament like the EUROs.
All 3 measures are based on the FIFA/Coca-Cola Zonal Ranking for the members of UEFA (FIFA ranking). The latest ranking table before the finals reflects the general expectations in the model.
FIFA plain ranking
Pearson correlation between the final ranking and a prediction based solely on the FIFA ranking, without taking group composition and tournament structure into account. That is, the highest-ranked team is expected to win, the second-highest-ranked team is expected to become runner-up, and so on. The final ranking is a 1-to-16 list of the qualified teams, based on (in the given order)
a) final position or position in group,
b) points,
c) goal difference,
d) goals scored.
If two or more teams are equal, the previous stage of the tournament is consulted, to decide who has done best.
FIFA group dependent
Pearson correlation between the final ranking and a prediction based on the FIFA ranking under the constraints of the tournament structure and the group compositions. The zonal ranking decides each round of the tournament, based on the zonal ranking difference in a match or the zonal ranking order in a group.
FIFA prediction score
Apply the FIFA ranking to the members of each group. Give a score for how many teams were correctly predicted to qualify for the knockout stage (x_1 of 8). In the quarter finals, reapply the FIFA ranking and give a score for how many teams were correctly predicted to qualify for the semifinals (x_2 of 4). Repeat for the semifinals (x_3 of 2) and the final (x_4 of 1). This gives a total score of (x_1 + … + x_4)/15.
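In code, the score is just a normalised count of correct predictions per round; a minimal sketch (the numbers in the example call are made up, not actual EURO results):

```python
# Sketch: the FIFA prediction score as described above. The x_i arguments are
# the number of correctly predicted teams per round (group stage, quarter
# finals, semifinals, final).
def fifa_prediction_score(x1: int, x2: int, x3: int, x4: int) -> float:
    """(x1 of 8) + (x2 of 4) + (x3 of 2) + (x4 of 1), normalised by 15."""
    assert 0 <= x1 <= 8 and 0 <= x2 <= 4 and 0 <= x3 <= 2 and 0 <= x4 <= 1
    return (x1 + x2 + x3 + x4) / 15

print(fifa_prediction_score(6, 3, 1, 1))   # hypothetical example: 11/15 ≈ 0.73
```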
In order to determine whether the 3 predictability scores for EURO 2012 were high or low, we should compare them with their historical counterparts. I have done the math, and here is the result in a nice diagram (the sources are www.fifa.com/worldranking/ and www.uefa.com).
As the diagram illustrates, EURO 2012 scored highest in 3 out of 3 measures, which indicates that EURO 2012 has been at least as predictable as the previous 4 European Championships, if not the most predictable.
The diagram is quite interesting. Beside the above conclusion, that EURO 2012 in fact was predictable compared to the previous tournaments, it gives a lot of information about the strengths and the weaknesses of the different measures.
First, the FIFA plain ranking and the FIFA group dependent (the correlation measures) are closely related. That should perhaps not come as a surprise, since the teams (besides the hosts) are seeded according to the FIFA ranking. I would say that the FIFA group dependent is the more valuable of the two, because the tournament structure constraints are obeyed.
Second, EURO 1996, EURO 2008 and EURO 2012 display the same pattern: the level of the FIFA plain ranking and the FIFA group dependent is 0.2-0.3 points below the FIFA prediction score. EURO 2000 has very low correlation scores: the favorites France won, but many teams either surprised a lot (Netherlands, Portugal, Turkey) or didn't live up to the expectations of the FIFA ranking at all (Germany, Czech Republic, Denmark).
The correlation measures give more (negative) weight to great disappointments than the FIFA prediction score, and the positive effect of the overall win of the favorites is not noticeable. The FIFA prediction score, on the contrary, gives a lot of weight to the overall win of the favorites, since they add to the score in each round of the tournament. In 2004 it is the exact opposite: the overall winners Greece were ranked 13th of the 16 qualified teams, so they had a large negative impact on the FIFA prediction score, because they surprised in every round of the tournament, but relatively little impact on the correlation measures.
Tuesday, July 3, 2012
How predictable was EURO 2012?
The World Cup champions and reigning European champions Spain won their third consecutive major tournament a few days ago. Many bookmakers and experts had Germany, Netherlands and Spain as favourites, and I personally (not an expert!) had the World Cup runners-up Netherlands as my favourites.
Now it's time to evaluate. How predictable was the outcome? Who surprised and who disappointed the most?
The FIFA/Coca-Cola World Ranking seems like a good measure of the expectations. A lot of different sources could be equally good or even better, for example one of the leading bookmakers' odds before the tournament, but I will make use of the FIFA ranking table from June 6th, 2012 (http://www.fifa.com/worldranking/rankingtable/index.html).
Team | Final position | FIFA prediction plain ranking | Error plain ranking | FIFA prediction group dependent | Error group dep. |
Spain | 1 | 1 | 0 | 1 | 0 |
Italy | 2 | 8 | 6 | 10 | 8 |
Portugal | 3 | 7 | 4 | 13 | 10 |
Germany | 4 | 2 | -2 | 2 | -2 |
England | 5 | 4 | -1 | 4 | -1 |
Czech Republic | 6 | 14 | 8 | 12 | 6 |
Greece | 7 | 11 | 4 | 8 | 1 |
France | 8 | 10 | 2 | 7 | -1 |
Russia | 9 | 9 | 0 | 6 | -3 |
Croatia | 10 | 5 | -5 | 5 | -5 |
Denmark | 11 | 6 | -5 | 9 | -2 |
Ukraine | 12 | 15 | 3 | 15 | 3 |
Sweden | 13 | 12 | -1 | 11 | -2 |
Poland | 14 | 16 | 2 | 16 | 2 |
Netherlands | 15 | 3 | -12 | 3 | -12 |
Republic of Ireland | 16 | 13 | -3 | 14 | -2 |
The above table needs some explanation. Final position is based on (in the given order)
a) final placement or position in group,
b) points,
c) goal difference,
d) goals scored
in each round of the tournament. FIFA prediction plain ranking is the EURO teams in the order they appear in the FIFA ranking table from June. Error is the FIFA prediction minus the final position. FIFA prediction group dependent is the result of a simulated tournament where the ranking decides the outcome of each round. The simulation is shown below. The number in parentheses after each team is the FIFA zonal ranking (the ranking among the members of UEFA), and final position in the simulation is based on
a) zonal ranking difference in match or position in group,
b) zonal ranking
Group A: Russia (9), Greece (11), Czech Republic (16), Poland (32)
Group B: Germany (2), Netherlands (3), Denmark (6), Portugal (7)
Group C: Spain (1), Croatia (5), Italy (8), Rep. of Ireland (13)
Group D: England (4), France (10), Sweden (12), Ukraine (28)
Quarter final 1 (winner group A v runner-up group B): Russia (9) v Netherlands (3)
Quarter final 2 (winner group B v runner-up group A): Germany (2) v Greece (11)
Quarter final 3 (winner group C v runner-up group D): Spain (1) v France (10)
Quarter final 4 (winner group D v runner-up group C): England (4) v Croatia (5)
Semifinal 1 (winner QF 1 v winner QF 3): Spain (1) v Netherlands (3)
Semifinal 2 (winner QF 2 v winner QF 4): Germany (2) v England (4)
Final (winner SF 1 v winner SF 2): Spain (1) v Germany (2)
As we can see from the table, Netherlands (-12) disappointed the most of all teams in both predictive models, while Czech Republic (+8) surprised the most according to the plain ranking model, and Portugal (+10) and Italy (+8) surprised the most according to the group dependent model.
A graphical representation of the above results is shown in the scatter plots below.
As the figures illustrate, the FIFA prediction was better than random choosing, but far from superb. This statement is based on the fact that random choosing will, on average, have zero correlation with the final position, while a perfect prediction will have perfect correlation with the final position, that is, a correlation coefficient of 1. The correlation coefficient between prediction and final position can therefore be used as a measure of the goodness of the prediction, or, in our case where the FIFA ranking represents our expectations, as a measure of the predictability of the tournament.
The correlation between the predicted rankings and the final positions is 0.47 for the plain ranking and 0.40 for the group dependent prediction.
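These correlations can be checked directly from the table above; a small Python sketch that should reproduce roughly the 0.47 and 0.40:

```python
# Sketch: the two predictability correlations, computed from the table above
# (final position vs. the two FIFA-based predictions, teams in final-position order).
from scipy.stats import pearsonr

final_position = list(range(1, 17))
plain_ranking  = [1, 8, 7, 2, 4, 14, 11, 10, 9, 5, 6, 15, 12, 16, 3, 13]
group_dep      = [1, 10, 13, 2, 4, 12, 8, 7, 6, 5, 9, 15, 11, 16, 3, 14]

r_plain, _ = pearsonr(final_position, plain_ranking)
r_group, _ = pearsonr(final_position, group_dep)
print(f"plain ranking: {r_plain:.2f}, group dependent: {r_group:.2f}")
```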