
Saturday, October 6, 2012

Is number of cards related to league position?

Haven't got much time these days, but I think it's about time to deliver a new post.

Inspired by an article from the homepage of my absolute favourite soccer magazine http://www.offside.org/de-tre-redaktorerna/2012/snallast-vinner-ibland?page=2  (Swedish), I decided to investigate the relationship between number of cards and league position. The article doesn't state any historical relationship, but gives concrete examples from last season, which indicates that the pattern could differ between leagues.

Once again I have looked at my favourite league, the English Premier League. I have collected cards data (yellow + red) for the last 17 seasons (1995-2012) from www.statbunker.com. The graphical representation of cards against league position is shown below:


Below you see the average number of cards for each league position.


The best linear fit is shown in the figures. The confidence interval for the position coefficient is (0.2305, 0.6669), which clearly rejects the hypothesis of no relation between position and number of cards. The assumptions of the linear model are verified by graphical inspection of the residuals, which indicates that the distribution of the number of cards for each league position is approximately normal and has the same variance.
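
For those who want to redo the fit, here is a minimal sketch of the regression described above, assuming the team-season data have been collected into a CSV file with one row per team and season (the file and column names are my own placeholders, not part of the original analysis):

import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical input: one row per team-season with columns "position" and "cards"
df = pd.read_csv("epl_cards_1995_2012.csv")

# Linear model: cards ~ intercept + position
X = sm.add_constant(df["position"])
model = sm.OLS(df["cards"], X).fit()

print(model.params)                 # slope estimate for league position
print(model.conf_int(alpha=0.05))   # 95% CI; should be close to (0.2305, 0.6669)

# Residual check: residuals against fitted values to judge normality and equal variance
plt.scatter(model.fittedvalues, model.resid)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()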

The conclusion is that there exists a linear relation between league position and the number of cards given in the English Premier League, and that, on seasonal average, bottom teams receive more cards than top teams.


But how about the distribution of cards between teams? Are there significant differences between teams?

43 teams have taken part in the English Premier League between 1995 and 2012. 30 of these 43 teams have played 3 or more seasons in the period, and they are represented in the figure below.


If you find the details interesting, feel free to look into the team data below. Notice that among the former big four, Chelsea and Arsenal have on average received 10-15 more cards per season than Liverpool and Manchester United.

Team Cards/S Seasons
Arsenal  64,82 17
Aston Villa  58,47 17
Chelsea  67,06 17
Everton  65,18 17
Liverpool  52,88 17
Manchester United  55,47 17
Tottenham Hotspur  58,88 17
Newcastle United  59,63 16
Blackburn Rovers  72,80 15
West Ham United  69,86 14
Bolton Wanderers  68,92 13
Middlesbrough  69,31 13
Manchester City  61,58 12
Fulham  56,82 11
Sunderland  70,45 11
Southampton  59,10 10
Leeds United  76,22 9
Charlton Athletic  53,00 8
Birmingham City  65,29 7
Derby County  77,57 7
Leicester City  58,43 7
Portsmouth  61,43 7
Wigan Athletic  67,71 7
Coventry City  67,33 6
West Bromwich Albion  54,00 6
Sheffield Wednesday  51,20 5
Wimbledon  53,00 5
Stoke City  70,00 4
Wolverhampton Wanderers  67,50 4
Nottingham Forest  69,00 3
Bradford City  55,00 2
Crystal Palace  65,50 2
Hull City  72,00 2
Ipswich Town  38,50 2
Norwich City  48,50 2
Queens Park Rangers  73,50 2
Reading  52,50 2
Watford  60,00 2
Barnsley  73,00 1
Blackpool  49,00 1
Burnley  58,00 1
Sheffield United  70,00 1
Swansea City  43,00 1


The seasonal card distribution clearly differs from team to team, as we see from the averages, and a boxplot of the two extreme teams makes it clear that there are differences between (at least) some of the teams.

Sunday, August 19, 2012

Which part of the league table has the most goals? Top or bottom?

My hypothesis was that the top teams of a league account for the majority of the big wins during a season, and the bottom teams for the majority of the big losses. The approach should be something like this: take the top 3 (or 5) and bottom 3 (or 5) of each year's final league table, sum goals for and goals against to obtain a total goal count for each team, remove goals scored in matches between the selected teams, and analyze the distribution of the remaining goals for the top and bottom sides.

I chose to analyze the English Premier League, and I found that a great deal of the teams with high total goal counts were actually mid-table teams. It was of course not surprising to see some mid-table teams with a high total goal count, but the number of occurrences was too great to neglect. Therefore, I took a different approach: take the top 6 total goal counts of each season and analyze the league positions of these teams (in case of ties between the 6th and 7th highest total goal counts, I included both teams in the data).

Here's the 2011-2012 data.

Pos. Team GF* GA* GT*
19 Blackburn Rovers  48 78 126
3 Arsenal  74 49 123
18 Bolton Wanderers  46 77 123
1 Manchester City  93 29 122
2 Manchester United  89 33 122
20 Wolverhampton Wndrs  40 82 122
*GF=goals for, GA=goals against, GT=goals total

The table shows that Blackburn finished 19th in the league table; scoring 48 goals and conceding 78 made them the team with the highest goal total of the season. This year, the top-6 total goal positions were exactly the top-3 and bottom-3 of the league table.

Collecting corresponding data for the 17 seasons between 1995 and 2012 gives the following distribution of top-6 total goal occurrences for each league table position (the parabola is the best 2nd-degree fit).
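
As a sketch, the parabola can be fitted with an ordinary least-squares polynomial fit; the occurrence counts per position come from the figure, so the file name below is just a placeholder of mine:

import numpy as np

positions = np.arange(1, 21)
# Hypothetical file with one value per league position: the number of times that
# position appeared among the top-6 total goal counts over the 17 seasons
occurrences = np.loadtxt("top6_goal_occurrences.txt")

coeffs = np.polyfit(positions, occurrences, deg=2)   # best 2nd-degree fit
fitted = np.polyval(coeffs, positions)
print(coeffs)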


The result is not surprising. It simply states that top and bottom teams more often have the highest total goal counts than the rest of the league table.

If we re-categorize (sum) the positions into 3 groups (top, middle and bottom), we obtain measures that are easier to compare in order to answer the initial question. As suggested above, I look at top/bottom-3 and top/bottom-5. The aggregated results are as follows.


Category Occ. Average TG
Top (1-3) 25 117,48
Middle 56 109,32
Bottom (18-20) 24 113,96


Category Occ. Average TG
Top (1-5) 35 115,31
Middle 37 109,84
Bottom (16-20) 33 111,9

As we can see from the aggregates, the numbers of top and bottom occurrences are almost identical, indicating that, on average, the same number of top and bottom teams appear in the top-6 of total goals for a season. The Average TG column indicates that the top of the table on average accounts for more goals than the bottom, but Tukey-corrected confidence intervals for pairwise differences between the 3 groups (top, middle, bottom) show that the averages of top and bottom don't differ significantly (p-values of 0.2840 and 0.2346, respectively). That is, we can assume that top and bottom account for the same number of goals (middle is significantly different from top).
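
The Tukey-corrected comparison can be reproduced along these lines, assuming each top-6 team-season has been stored with its total goal count and a top/middle/bottom label (the file and column names are placeholders of mine):

import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical input: one row per top-6 team-season with columns "group" and "total_goals"
df = pd.read_csv("top6_total_goals.csv")

result = pairwise_tukeyhsd(endog=df["total_goals"], groups=df["group"], alpha=0.05)
print(result)   # adjusted confidence intervals and p-values for top-middle, top-bottom, middle-bottom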

The source of the data is www.statto.com, and the highest total goal counts since 1995 are following.


Season Pos. Team GF GA GT
1999-2000 1 Manchester United  97 45 142
2009-2010 1 Chelsea  103 32 135
2010-2011 19 Blackpool  55 78 133
2001-2002 3 Manchester United  87 45 132
2010-2011 11 West Bromwich Albion  56 71 127
2007-2008 11 Tottenham Hotspur  66 61 127
2002-2003 2 Arsenal  85 42 127
2011-2012 19 Blackburn Rovers  48 78 126
2001-2002 4 Newcastle United  74 52 126
1995-1996 14 Wimbledon  55 70 125

That is, top teams so far account for 3 of the 4, and 5 of the 10, highest total goal counts since 1995 in the English Premier League.

Monday, August 13, 2012

Are midfielders more likely to become managers than other players?

Is it right to assume that midfielders have more tactical knowledge than the rest of the team? And is it right to assume that there is more manager potential in a midfield player?

It is a hard task to find out which position current managers preferred to play in during their professional playing careers, but I have done the research for the managers of today's English Premier League teams. Most of the data is from Wikipedia.


Team Manager Highest League P.P. Captain
Arsenal  Arsene Wenger - - -
Aston Villa  Paul Lambert First div. Germany CDM Lower league
Chelsea  Roberto Di Matteo First div. England CDM No
Everton  David Moyes Third div. England CD No
Fulham  Martin Jol First div. England OCM No
Liverpool  Brendan Rodgers - - -
Manchester City  Roberto Mancini First div. Italy F No
Manchester United  Sir Alex Ferguson First div. Scotland F No
Newcastle United  Alan Pardew First div. England CM No
Norwich City  Chris Hughton First div. England LD No
Queens Park Rangers  Mark Hughes First div. England F No
Reading  Brian McDermott First div. England RM No
Southampton  Nigel Adkins Third div. England GK No
Stoke City  Tony Pulis Second div. England CD No
Sunderland  Martin O'Neill First div. England CM National team
Swansea City  Michael Laudrup First div. Spain OCM National team
Tottenham Hotspur  Andre Villas-Boas - - -
West Bromwich Albion  Steve Clarke First div. England RD No
West Ham United  Sam Allardyce First div. England CD Lower league
Wigan Athletic  Roberto Martinez Third div. England CDM Yes
*P.P. is short for preferred position. Managers without p.p. have no or very short professional playing career.

The expected proportion of midfielders should be calculated in order to do any comparison. A simple (and perhaps not quite right) way to do this is to take a look at the most preferred playing systems and count the midfielders. Without any references, I have chosen the following systems (4-4-2, 4-5-1, 3-5-2, 4-3-3), which I hope are representative of the game 10-30 years ago.

I have calculated the expected proportions without weighting the systems. For example, the expected proportion of midfielders is the number of midfielders summed over the systems divided by the total number of players: (4+5+5+3)/44 = 0.3864.

Position Count Percent Expected
Goal Keeper 1 5,88 9,09
Defender 5 29,41 34,09
Midfielder 8 47,06 38,64
Forward 3 17,65 18,18

Total 17 100,00 100,00

A proportion test of 8 out of 17 against an expected proportion of 0.3864 gives a p-value of 0.6428 (CI: 0.2386-0.7147), so we clearly cannot reject the hypothesis of equal proportions, which suggests that midfielders aren't more likely to become managers than other players.

A test of the 4 counts against the expected proportions is done with Pearson's chi-squared goodness-of-fit test and gives a p-value of 0.8927, indicating that the playing position distribution is very much as expected. The sample size is of course too small to draw any final conclusions, and the assumptions of the test are not really fulfilled, but the results so far give a strong indication.
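
Here is a sketch of the two tests, using the counts from the tables above. I assume R-style prop.test/chisq.test were used originally, so the exact binomial test below may give slightly different p-values:

from scipy.stats import binomtest, chisquare

# Expected proportions for GK, defenders, midfielders, forwards,
# summed over the four formations (4-4-2, 4-5-1, 3-5-2, 4-3-3)
expected_prop = [4/44, 15/44, 17/44, 8/44]   # 0.0909, 0.3409, 0.3864, 0.1818
observed = [1, 5, 8, 3]                      # counts among the 17 managers with a playing career

# Proportion test: 8 midfielders of 17 against the expected 0.3864
print(binomtest(8, 17, p=17/44))

# Pearson goodness-of-fit test of all four counts against the expected proportions
print(chisquare(observed, f_exp=[p * 17 for p in expected_prop]))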


What about captains? Are they more likely to become managers than non-captains? I will set the expected proportion of captains to 1/11; other estimates are very welcome. A proportion test of 5 out of 17 against 1/11 gives a p-value of 0.0127 (CI: 0.1138-0.5595), indicating that captains are more likely to become managers than non-captain players.
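
The same kind of sketch for the captains, under the assumed 1/11 baseline (again an exact binomial test, so the p-value will not match the quoted 0.0127 exactly):

from scipy.stats import binomtest

# 5 of the 17 managers captained their sides; expected proportion set to 1/11
print(binomtest(5, 17, p=1/11, alternative="two-sided"))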

Monday, July 23, 2012

Did Wiggins do better than Froome in the Tour?

Le Tour de France 2012 finished yesterday with the usual laps of the Champs-Élysées, and not surprisingly, it all came down to a bunch sprint, where Mark Cavendish proved to be the fastest.

Unfortunately I haven't been able to watch very much of the Tour this year, but I have read a couple of times that the Sky team has done incredibly well and that many people were wondering whether young Chris Froome could have taken the overall first prize if his role in the team hadn't been to help Bradley Wiggins to the general classification victory. It looks like Wiggins settled the discussion on Saturday in the last time trial of the Tour, where he won by a solid margin of 1:16 over Froome.

But let's put the results to a quick test. I have collected Wiggins's and Froome's time differences (in seconds) to the winning time of each stage. The source is http://www.dr.dk/Sporten/Tour_de_France/ and here is the resulting data.


Stage Wiggins Froome diff
Prologue 7 16 9
1 0 85 85
2 0 0 0
3 1 1 0
4 0 0 0
5 0 0 0
6 4 4 0
7 2 0 -2
8 26 26 0
9 0 35 35
10 196 196 0
11 57 55 -2
12 474 474 0
13 0 0 0
14 1095 1095 0
15 710 710 0
16 429 429 0
17 19 19 0
18 4 4 0
19 0 76 76
20 9 9 0

The one-sided paired Wilcoxon signed-rank test, testing for a true location shift less than zero, gives a p-value of 0.07056 > 0.05, which at the 95% confidence level means that Wiggins didn't do significantly better than Froome in this year's Tour.

The p-value, however, is so close to the rejection region that only one more stage time difference with Wiggins being 3 seconds (or more) faster than Froome would have made his performance significantly better (according to the test). On the other hand, if Froome hadn't been caught up in the middle of the run of crashes within the last 25 kilometers of stage 1, he probably wouldn't have conceded the 85-second time difference to Wiggins, and leaving out that single observation gives a more reliable p-value (0.1393).

Notice that the test only considers the observations different from zero (the ties are left out), which means that the actual number of observations for the test is down to only 6.
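
Here is a sketch of the test with the stage-by-stage gaps from the table above; scipy drops the zero differences, like the test described here, and with the continuity correction switched on the p-value should come out close to the quoted 0.07056:

from scipy.stats import wilcoxon

# Time gaps (seconds) to each stage winner: prologue plus stages 1-20
wiggins = [7, 0, 0, 1, 0, 0, 4, 2, 26, 0, 196, 57, 474, 0, 1095, 710, 429, 19, 4, 0, 9]
froome  = [16, 85, 0, 1, 0, 0, 4, 0, 26, 35, 196, 55, 474, 0, 1095, 710, 429, 19, 4, 76, 9]

# One-sided test: are Wiggins's gaps systematically smaller than Froome's?
print(wilcoxon(wiggins, froome, zero_method="wilcox", correction=True, alternative="less"))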

Monday, July 9, 2012

How predictable was EURO 2012? - Historical view

In my latest post, How predictable was EURO 2012?, I introduced 2 predictability measures, and in a comment to that post I came up with a third measure of predictability for a tournament like the EUROs.

All 3 measures are based on the FIFA/Coca-Cola Zonal Ranking for the members of UEFA (FIFA ranking). The latest ranking table before the finals reflects the general expectations in the model. 

FIFA plain ranking
Pearson correlation between the final ranking and a prediction based solely on the FIFA ranking, without taking group composition and tournament structure into account. That is, the highest ranked team is expected to win, the second highest ranked team is expected to become runner-up, and so on. The final ranking is a 1 to 16 list of the qualified teams, based on (in the given order)
a) final position or position in group,
b) points,
c) goal difference,
d) goals scored.
If two or more teams are equal, the previous stage of the tournament is consulted to decide who has done best.

FIFA group dependent
Pearson correlation between the final ranking and a prediction based on the FIFA ranking under the constraints of the tournament structure and the group compositions. The zonal ranking has decided each round of the tournament, based on the zonal ranking difference in a match or the zonal ranking order in a group.

FIFA prediction score
Apply FIFA ranking to the members of each group. Give score for how many teams were predicted to qualify for the knockout stage (x_1 of 8). In the quarter finals, reapply the FIFA ranking, and give score for how many teams were predicted to qualify for the semifinals (x_2 of 4). Repeat for the semifinals (x_3 of 2) and the final (x_4 of 1). This gives a total score of (x_1 + … + x_4)/15.
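
Expressed as code, under my reading of the definition above (names and sets are placeholders of mine):

def prediction_score(predicted_rounds, actual_rounds):
    """Each argument is a list of four sets: the teams predicted/observed to reach
    the knockout stage, the semifinals, the final and the title (8, 4, 2 and 1 teams)."""
    hits = sum(len(pred & actual) for pred, actual in zip(predicted_rounds, actual_rounds))
    return hits / 15   # 8 + 4 + 2 + 1 possible correct picks

# Hypothetical usage:
# score = prediction_score([pred_knockout, pred_semis, pred_final, pred_winner],
#                          [act_knockout, act_semis, act_final, act_winner])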


In order to determine whether the 3 predictability scores for EURO 2012 were high or low, we should compare them with their historical counterparts. I have done the math, and here is the result in a nice diagram (the sources are www.fifa.com/worldranking/ and www.uefa.com).


As the diagram illustrates, EURO 2012 scored highest in 3 out of 3 measures, which indicates that EURO 2012 has been at least as predictable as the previous 4 European Championships, if not the most predictable.

The diagram is quite interesting. Beside the above conclusion, that EURO 2012 in fact was predictable compared to the previous tournaments, it gives a lot of information about the strengths and the weaknesses of the different measures.

First, the FIFA plain ranking and the FIFA group dependent measures (the two correlation measures) are closely related. That should perhaps not come as a surprise, since the teams (besides the hosts) are seeded according to the FIFA ranking. I would say that the FIFA group dependent measure is the more valuable of the two, because the tournament structure constraints are obeyed.

Second, EURO 1996, EURO 2008 and EURO 2012 display the same pattern: the level of the FIFA plain ranking and FIFA group dependent measures is 0.2-0.3 points below the FIFA prediction score. EURO 2000 has very low correlation scores: the favourites France won, but many teams surprised a lot (Netherlands, Portugal, Turkey) or didn't live up to the expectations of the FIFA ranking at all (Germany, Czech Republic, Denmark).

The correlation measures give more (negative) weight to great disappointments than the FIFA prediction score, and the positive effect of the overall win of the favourites is not noticeable. The FIFA prediction score, on the contrary, gives very much weight to the overall win of the favourites, since they add to the score in each round of the tournament. In 2004 it is the exact opposite. The overall winners Greece were ranked 13th of the 16 qualified teams, so they had a great negative impact on the FIFA prediction score, because they surprised in all rounds of the tournament, and relatively little impact on the correlation measures.

Tuesday, July 3, 2012

How predictable was EURO 2012?

The World Cup champions and reigning European champions Spain won their third consecutive major tournament a few days ago. Many bookmakers and experts had Germany, Netherlands and Spain as favourites, and I personally (not an expert!) had the World Cup runners-up Netherlands as my favourites.

Now it's time to evaluate. How predictable was the outcome? Who surprised and who disappointed the most?

The FIFA/Coca-Cola World Ranking seems like a good measure of the expectations. A lot of different sources could be equally good or even better, for example one of the leading bookmakers' odds before the tournament, but I will make use of the FIFA ranking table from June 6th 2012 (http://www.fifa.com/worldranking/rankingtable/index.html).

Team  Final position  FIFA prediction plain ranking  Error plain ranking  FIFA prediction group dependent  Error group dep.
Spain 1 1 0 1 0
Italy 2 8 6 10 8
Portugal 3 7 4 13 10
Germany 4 2 -2 2 -2
England 5 4 -1 4 -1
Czech Republic 6 14 8 12 6
Greece 7 11 4 8 1
France 8 10 2 7 -1
Russia 9 9 0 6 -3
Croatia 10 5 -5 5 -5
Denmark 11 6 -5 9 -2
Ukraine 12 15 3 15 3
Sweden 13 12 -1 11 -2
Poland 14 16 2 16 2
Netherlands 15 3 -12 3 -12
Republic of Ireland 16 13 -3 14 -2

The above table needs some explanation. Final position is based on (in the given order)
a) final placement or position in group,
b) points,
c) goal difference,
d) goals scored
in each round of the tournament. FIFA prediction plain ranking lists the EURO teams in the order they appear in the FIFA ranking table from June. Error is FIFA prediction minus final position. FIFA prediction group dependent is the simulated tournament where the ranking has decided the outcome of each round of the tournament. The simulation is shown below. The number next to each team is the FIFA zonal ranking (ranking for members of UEFA), and final position is based on
a) zonal ranking difference in match or position in group,
b) zonal ranking

Group A: Russia 9, Greece 11, Czech Republic 16, Poland 32
Group B: Germany 2, Netherlands 3, Denmark 6, Portugal 7
Group C: Spain 1, Croatia 5, Italy 8, Rep. of Ireland 13
Group D: England 4, France 10, Sweden 12, Ukraine 28

Quarter final 1 (winner group A - runner up group B): Russia 9 - Netherlands 3, Netherlands advance
Quarter final 2 (winner group B - runner up group A): Germany 2 - Greece 11, Germany advance
Quarter final 3 (winner group C - runner up group D): Spain 1 - France 10, Spain advance
Quarter final 4 (winner group D - runner up group C): England 4 - Croatia 5, England advance

Semifinal 1 (winner QF 1 - winner QF 3): Netherlands 3 - Spain 1, Spain advance
Semifinal 2 (winner QF 2 - winner QF 4): Germany 2 - England 4, Germany advance

Final (winner SF 1 - winner SF 2): Spain 1 - Germany 2, Spain win


As we can see from the table, Netherlands (-12) disappointed the most of all teams in both predictive models, Czech Republic (+8) surprised the most according to the plain ranking model, and Portugal (+10) and Italy (+8) surprised the most according to the group dependent model.

A graphical representation of the above results is shown in the scatter plots below.

As the figures illustrate, the FIFA prediction was better than random choosing, but far from superb. This statement is based on the fact that random choosing will, on average, have zero correlation with the final position, while a perfect prediction will have perfect correlation with the final position, that is, a correlation coefficient of 1. The correlation coefficient between prediction and final position can therefore be used as a measure of the goodness of the prediction, or in our case, where the FIFA ranking represents our expectations, the correlation coefficient will measure the predictability of the tournament.

The correlation between the predicted rankings and the final positions is 0.47 and 0.40 respectively.
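
The two coefficients can be checked directly from the table above:

from scipy.stats import pearsonr

final_position  = list(range(1, 17))
plain_ranking   = [1, 8, 7, 2, 4, 14, 11, 10, 9, 5, 6, 15, 12, 16, 3, 13]
group_dependent = [1, 10, 13, 2, 4, 12, 8, 7, 6, 5, 9, 15, 11, 16, 3, 14]

print(pearsonr(final_position, plain_ranking))    # correlation approximately 0.47
print(pearsonr(final_position, group_dependent))  # correlation approximately 0.40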

Monday, July 2, 2012

Do goals scored correlate with yellow cards?

Last week I amused myself with the Statistical Kit for the Olympic Football Tournament in London. It is 56 pages of statistics on the Men's Olympic Football Tournament, produced by the FIFA Division Communications & Public Affairs. It can be found here: http://www.fifa.com/mensolympic/organisation/documents/index.html. Very interesting, and nice that it's publicly available, thumbs up.

Very well. On the last page of the report you find the following diagram.

It indicates a positive correlation between goals scored and yellow cards. Is this a coincidence, or a pattern worth further investigation?


I have checked the counts for the European Championships and World Cups in the same period. Sources are www.uefa.com and www.fifa.com.

The Euro and World Cup graphs don't display the same pattern as the Olympics graph, so a general correlation between goals scored and yellow cards is unlikely. A correlation plot should support this statement, but in order to compare data from the three different tournaments in one diagram, we should divide the series by games played.
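
A small sketch of that comparison, assuming the tournament totals have been typed into a table with goals, yellow cards and games played per edition (the file and column names are placeholders of mine):

import pandas as pd

# Hypothetical input: one row per tournament edition with columns
# "tournament", "edition", "goals", "yellows", "games"
df = pd.read_csv("tournament_totals.csv")

df["goals_per_game"] = df["goals"] / df["games"]
df["yellows_per_game"] = df["yellows"] / df["games"]

# Pooled correlation across all editions of the three tournaments
print(df["goals_per_game"].corr(df["yellows_per_game"]))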

Wednesday, June 27, 2012

Which of the semifinalists in EURO 2012 has done best?

Comparing the four semifinalists in the categories supplied in the Castrol Edge team statistics available on www.uefa.com gives the following ranking results.


Spain Portugal Germany Italy
Goals scored 2 3 1 4
Goals conceded 1 3,5 3,5 2
Attempts on target 2 3,5 3,5 1
Attempts off target 4 1,5 3 1,5
Passes completed 1 4 2 3
Ball possession 1 4 2 3
Corners 2 1 4 3
Cards 2 3 1 4
Fouls committed 2 3,5 1 3,5
Fouls suffered 2 3 4 1
Average 1,9 3,0 2,5 2,6

The ranking results should be read as: Germany has scored more goals than Spain, which in turn has scored more goals than Portugal, and so on. When the count is "bad" the ranking is reversed, as for example in the category Cards, where Germany has received fewer cards than Spain, which in turn has received fewer cards than Portugal, and so on.

The average of the ranking table indicates that Spain has done better than the three other semifinalists so far. But one of the great things about soccer is of course that you don't win on stats, and anything can happen in a single game.

Monday, June 25, 2012

Is England poor or unlucky in shoot-outs?

After their 4-2 loss in the quarter finals of this year’s European Championship finals, England holds a record of 6 losses and only 1 win in penalty shoot-outs in the European Championship and the World Cup finals.

World Cup Finals 1990 West Germany England 4-3
European Championship Finals 1996 Spain England 2-4
European Championship Finals 1996 Germany England 6-5
World Cup Finals 1998 Argentina England 4-3
European Championship Finals 2004 Portugal England 6-5
World Cup Finals 2006 England Portugal 1-3
European Championship Finals 2012 England Italy 2-4
(Source http://www.englandfootballonline.com/TeamPenalty/EngPenKickShootoutMtchs.html)

Journalists, players, coaches and managers often describe or compare penalty shoot-outs to a draw of lots or a coin toss. If that's true, each team in a penalty shoot-out has exactly a 50% chance of winning, even England.

The probability of 0 or 1 wins out of 7 independent shoot-outs, if the chances are even, can be read from a table of binomial probabilities as P{X <= 1} = 0.0078 + 0.0547 = 0.0625. That is, if England has the same chances as its opponents, the chance of winning only one (or zero) of seven shoot-outs is 6.25%. If we look through the glasses of a 95% confidence interval or acceptance region, we can't reject that England has just been unlucky, but the low likelihood gives some evidence to believe that England is poorer than their opponents, that is, the assumption of a 50% probability of a win for England is probably false.
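
The binomial calculation itself, under the coin-toss assumption:

from scipy.stats import binom

# P(X <= 1) for 7 shoot-outs with a 50% chance of winning each
p = binom.cdf(1, 7, 0.5)   # = 0.0078 + 0.0547 = 0.0625
print(p)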

Saturday, June 16, 2012

How important is goal difference in the European Championship group stage?

In the 1980 championship, UEFA introduced a group stage to the finals, with 8 teams split into 2 groups, from which the best team qualified directly for the final and the runners-up played for 3rd place. In 1984 UEFA introduced a semifinal round. This model continued until 1996, when the number of teams doubled and 16 teams were split into 4 groups. Furthermore, the points for a win increased from 2 to 3. This model is still alive, but in France 2016, 24 teams will enter the tournament in 6 groups, which means that the same number of teams will not qualify from each group for the knockout phase.

A group of 4 teams is decided after only 6 games, each team playing 3 matches, one against each group member. The possible outcomes are therefore limited to 3^6 = 729 different outcomes. But in which of these outcomes does goal difference play a role? Well, whenever 2 (or 3) of the 3 teams with the most points in a group are equal on points and the points obtained in the matches played between the teams in question are also equal, then goal difference is important.

So we need to do one analysis for each of the above models to answer the question firmly. However, I will only do the analysis for the current model, that is, the model introduced in 1996, which will be outdated in France 2016. I will simplify the analysis by assuming that all teams are perfectly matched, in the sense that the probability of 1, X and 2 in each match is 1/3. Furthermore, I will only investigate when goal difference is important for qualification to the knockout stage, that is, I will only look at the importance of goal difference in deciding who finishes in the top two of the group; the order of the top two teams will be ignored (of course the order is important, because the group winner won't be paired with another group winner in the first round of the knockout stage).

In the table below I have listed all final group standings (on points), in which goal difference can decide who will finish top two in the group.

Standing Frequency Occurrences
9-4-4-0 12
9-3-3-3 8
9-2-2-2 4
7-4-4-1 12 1996 grp A
6-6-6-0 8
6-4-4-3 36 2004 grp A
5-5-5-0 4 2004 grp C
5-3-3-2 12
4-4-4-4 6
4-4-4-3 8
3-3-3-3 1

The sum of the frequencies is 111, so under the assumption of perfectly matched teams, the probability of a final group standing in which goal difference is important is 111/729 = 0.1523. That is, in roughly one out of 7 groups, goal difference will decide who qualifies for the knockout stage of the European Championship, if all teams are perfectly matched.
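
For the curious, here is a sketch of the enumeration under the stated assumptions (perfectly matched teams, 3 points for a win), treating goal difference as important whenever a tie on points straddles the boundary between 2nd and 3rd place and the points from the matches between the tied teams do not settle it; it should reproduce the 111 out of 729:

from itertools import product

TEAMS = range(4)
MATCHES = [(i, j) for i in TEAMS for j in TEAMS if i < j]   # the 6 group matches

def add_points(points, match, result):
    home, away = match
    if result == '1':
        points[home] += 3
    elif result == '2':
        points[away] += 3
    else:               # 'X'
        points[home] += 1
        points[away] += 1

def goal_difference_matters(outcome):
    # Full group table
    pts = [0, 0, 0, 0]
    for match, result in zip(MATCHES, outcome):
        add_points(pts, match, result)
    ranked = sorted(TEAMS, key=lambda t: pts[t], reverse=True)
    if pts[ranked[1]] != pts[ranked[2]]:
        return False                      # 2nd and 3rd separated on points alone
    tie_value = pts[ranked[1]]
    tied = [t for t in TEAMS if pts[t] == tie_value]
    slots = 2 - sum(1 for t in TEAMS if pts[t] > tie_value)   # top-two places left for the tied teams
    # Head-to-head points, counting only the matches between the tied teams
    hh = {t: 0 for t in tied}
    for match, result in zip(MATCHES, outcome):
        if match[0] in hh and match[1] in hh:
            add_points(hh, match, result)
    ordered = sorted(hh.values(), reverse=True)
    return ordered[slots - 1] == ordered[slots]   # head-to-head points do not settle the boundary

count = sum(goal_difference_matters(o) for o in product('1X2', repeat=6))
print(count, count / 3**6)   # expected: 111 and about 0.1523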

How does this agree with the empirical statistics up until now? Well, since 1996, 16 groups have been formed and finished, and 3 times (see the table above) goal difference has been decisive for second place in the group. That is, the empirical statistic (0.1875) is quite close to the theoretical probability under the assumption of perfectly matched teams (0.1523), which could perhaps indicate that the assumption of perfectly matched teams is quite a nice simplification for this particular problem.

Tonight, group A of this year's European Championship will be decided. There are 9 different outcomes of the 2 remaining games, and the only outcome which will make goal difference important is Greece-Russia 1 and Czech Republic-Poland X. Then the final standing is 4-4-4-3, where Poland has drawn against all opponents and Russia, Greece and Czech Republic have won one each.

Sunday, group B will be decided, and two outcomes will make goal difference important: Portugal-Netherlands 1 and Denmark-Germany 1 (6-6-6-0), or Portugal-Netherlands 2 and Denmark-Germany 2 (9-3-3-3).

Monday, group C will be decided, and the only outcome which will make goal difference important is Croatia-Spain X and Italy-Republic of Ireland 1 (5-5-5-0).

Tuesday, group D will be decided, and the only outcome which will make goal difference important is England-Ukraine 2 and Sweden-France 1 (6-4-4-3).

The source for the Euro results is www.uefa.com. The frequencies in the table should be recalculated if needed in any analysis of importance.

Sunday, May 13, 2012

How often do we experience an unbeaten season?

This season, Italian side Juventus fulfilled an extraordinary achievement. Not only did they win the Italian league, they did it without losing a single game on the way. But exactly how extraordinary is this achievement?

Well, after clicking through all final league tables of the 5 big European leagues (England, Spain, Italy, Germany, France) at Wikipedia, I found that unbeaten seasons are a rather new phenomenon. It happened twice in the early thirties in Spain, when the league consisted of only 10 teams, making the likelihood of an unbeaten season much higher than today, where the league consists of 20 teams (exactly how much more likely is not interesting right now). After these two occurrences, the remaining 4 unbeaten seasons in the 5 leagues are to be found from the late seventies up to this season.


Country Season Team W D L Placement
Italy 2011-2012 Juventus 23 15 0 1
England 2003-2004 Arsenal 26 12 0 1
Italy 1991-1992 Milan 22 12 0 1
Italy 1978-1979 Perugia 11 19 0 2

Just looking at the table, it looks like it wasn't a coincidence that the unbeaten seasons should appear in Italy, but the observations are very few, so let us assume that unbeaten seasons have the same probability of occurring in all 5 leagues. Then, counting from the 1946-1947 season (1963-1964 for Germany) until the 2011-2012 season, the estimated likelihood of an unbeaten season in each league is 4/(66*4 + 49) = 0.0128. That is, in approximately 1 of every 78 league seasons we will experience an unbeaten season, or we will experience 1 unbeaten season every 16 years across all 5 leagues.

Say Italy is special, in the sense that unbeaten seasons are more likely in Italy than in the other 4 leagues; then the estimated likelihood of an unbeaten season in any given year in Italy is 3/66. That is, every 22nd year an Italian team will have an unbeaten season.

Sunday, April 1, 2012

How many seasons does a newly promoted team survive in the higher league?

Seldom can all newly promoted teams avoid relegation in their first season in the new and higher league, but quite often I hear people say that the second season is the toughest. How can both statements possibly be true?

With the help of Wikipedia, I have done some summary statistics on promotion and relegation to and from the highest division in England and France. I have only counted from one season after the last change to the number of teams in each league, that is, since 2003 for the French league (expanded from 18 to 20 teams) and since 1996 for the English league (reduced from 22 to 20 teams).

First I chose the French league, because I thought that the number of teams promoted and relegated and the size of the league had been constant since the 1958-59 season, but I hadn't done my research well enough. I realized that at least in the period 1997-2002 the French league was reduced from 20 to 18 teams.

Then, to take the analysis a little further than just the 2003-2011 seasons of the French league, I decided to also look at the English league, which has the same size and the same number of teams promoted and relegated.

The results are surprisingly similar between the two national leagues.

France

Promoted teams 2003-2010
24

Seasons in the highest division before relegation
Seasons Club count Percent
1 11 45,8
2 3 12,5
3 1 4,2
4 0 0,0
5 or more 6 25,0
Still in highest division but for less than 5 seasons 3 12,5

England

Promoted teams 1996-2010
45

Seasons in the highest division before relegation
Seasons Club count Percent
1 21 46,7
2 6 13,3
3 0 0,0
4 2 4,4
5 or more 12 26,7
Still in highest division but for less than 5 seasons 4 8,9

As expected, many of the newly promoted teams are relegated in their first season. In fact, in 1998 all three newly promoted teams in the English league (Barnsley, Bolton, Crystal Palace) were relegated after just one season in the highest league. But two things surprised me about the data.

First, the data for the two countries are almost identical (look at the percentages), especially if we regroup the data so that 3 and 4 seasons add up. Second, the statement about the second season being the hardest might in fact make sense. As we see, almost no teams are relegated after 3 or 4 seasons, which could be interpreted as: if a team survives 2 seasons, it has probably established itself in the league and will most likely do better than the newly promoted teams.


How about relegation then? How long does a team stay down after relegation? The summary statistics are following.

France

Relegated teams 2003-2010
24

Seasons away from highest division
Seasons Club count Percent
1 6 25,0
2 2 8,3
3 1 4,2
4 0 0,0
5 or more 8 33,3
Not back in highest division and relegated no more than 5 seasons ago 7 29,2

England

Relegated teams 1996-2010
45

Seasons away from highest division
Seasons Club count Percent
1 12 26,7
2 5 11,1
3 1 2,2
4 1 2,2
5 or more 20 44,4
Not back in highest division and relegated no more than 5 seasons ago 6 13,3
 
Be careful with the interpretation of the above data. I think the story the relegation data tells is that many teams make an extraordinary effort to get back into the higher league (back to the big money) in the first season or two, but if that fails, the team might be gone for a long time.

The relegation data differ a bit between the two leagues (look at the last two percentages). I haven't investigated it further, but maybe this is due to differences in the size of league 2 and in the promotion rules between the two countries? In France, league 2 consists of 20 teams and the top 3 teams qualify directly for league 1, whereas in England, league 2 consists of 24 teams, the top 2 teams qualify directly, and the 4 teams that finish 3rd to 6th go through playoffs, with two-legged semifinals and a final, for the last promotion spot.

Monday, February 27, 2012

How often does a cup final go to penalties?


Yesterday Liverpool won the English League Cup and took home their first trophy since the FA Cup triumph in 2006 and the following FA Community Shield win. As a Liverpool supporter I'm of course very happy, but I wasn't impressed by the performance, especially not the performance of Captain Fantastic (Gerrard). His performance was miles away from the outstanding shows he pulled off in the 2005 Champions League final and the 2006 FA Cup final. He can't run away from his age, and I'm sad to say that one of my absolute favorites of all time doesn't have many seasons left as a top player.

Liverpool excels in giving their supporters nail-biting and nerve-wracking finals. Yesterday's performance was no exception, and an exciting penalty shoot-out had to decide it all in the end. A total of 5 missed penalties tells the story: Liverpool didn't earn it, but Cardiff couldn't close it.

Enough about Liverpool. Lately I watched another final, the African Cup of Nations final between Ivory Coast and Zambia. Ivory Coast were big favorites before the match and didn't concede a single goal during the entire tournament. The final was a goalless draw, and in the penalty shoot-out, Zambia won the lottery.

After watching two consecutive finals go to penalties, and thinking about Liverpool's 2005 and 2006 wins (also on penalties), I found myself browsing for different kinds of statistics on penalties in finals.

A hypothesis is that finals and semifinals (and perhaps quarterfinals) go to penalties more often than preliminary-round knock-out matches, because the teams are more evenly matched at that stage of the tournament. But in collecting the information below I have already spent my weekly blog hour, so it will have to do for now.

It’s hard to find information on when (for the first time) finals in specific tournament was scheduled to be decided on shoot-outs after full time and 30 minutes extra time. It has been introduced in different years in different tournaments. And often penalties are introduced several years before in earlier knock-out rounds than in the final.

I’m surprised that the two English Cups have introduced penalties in finals so late compared to the other tournaments I have investigated. Take a look at the stats. It certainly looks like the African teams should practice penalties before the final!

Tournament Shoot-out scheduled in final first time Finals since first scheduled Shoot-out finals %
African Cup Of Nations 1980* 16 43,75
Fifa World Cup 1986 7 28,57
Uefa Champions League 1980** 32 28,13
English League Cup 1998 15 20,00
Uefa European Championship 1976 9 11,11***
English FA Cup 1993 19 10,53

*My best guess. In 1974 the drawn final triggered a replay. In 1976 the final stage of the tournament didn’t involve knock-out matches. In 1978 a qualification match was decided on shoot-out, but shoot-outs were also used in the qualification in 1974. In 1980 the first ever match of the knock-out stage of the finals was decided on penalties.
**My best guess. In 1974 the drawn final triggered a replay. In 1980 the drawn final of the Uefa Cup Winners Cup was decided on penalties after the 30 minutes of extra time. In 1984 the first Uefa Champions League final was decided on penalties.
***Two finals which possibly could have gone to penalties were decided on golden goal.


And here is a really interesting one on penalties. By Karel Stokkermans and Carles Lozano, enjoy:
http://www.rsssf.com/miscellaneous/penalties.html