Wednesday, December 31, 2003

How well can monkeys rank football teams? [repost]

We've all experienced befuddlement upon perusing the NCAA Division I-A college football Bowl Championship Series (BCS) standings, because of the seemingly divine inspiration that must have been incorporated into their determination. The relatively small number of games played between a large number of teams makes any ranking immediately suspect because of the dearth of head-to-head information. Perhaps you've even wondered whether a bunch of monkeys could have ranked the football teams as well as the expert coaches' and sportswriters' polls and the complicated statistical ranking algorithms.

We had these thoughts, so we set out to test this hypothesis, although with simulated monkeys (random walkers) rather than real ones.

Each of our simulated "monkeys" gets a single vote to cast for the "best" team in the nation, making his decision based on only one simple guideline: he periodically looks up the win-loss outcome of a single game played by his favorite team and flips a weighted coin to decide whether to switch his allegiance to the other team. To make this process even modestly reasonable, the coin is weighted so that the monkey's allegiance and vote are more likely to go with the team that won the head-to-head contest. For instance, the weighting might be chosen so that, say, 75% of the time the monkey changes his vote to go with the winner of the game, leaving only a 25% chance of voting for the loser.

The monkey starts by voting for a randomly chosen team. Each monkey then meanders around a network that describes the collection of teams, randomly changing allegiance from one team to another along connections representing games played between the two teams that year. This network is graphically depicted in the figure here, with the monkeys---okay, technically one is a gorilla---not so happily lent to us by Ben Mucha (inset). It's a simple process: if the outcome of the weighted coin flip indicates that he should be casting his vote for the opposing team, the monkey stops cheerleading for the old team and moves to the site in the network representing his new favorite team. While we let the monkeys change their minds over and over again---indeed, a single monkey voter will forever be changing his vote in this scheme---the percentage of votes cast for each football team quickly stabilizes. By looking at the fraction of monkeys voting for each team, we thereby obtain rankings each week, and at the end of the season, based on the games played up to that point.


The Simple Rules for Each "Monkey" (sketched in code below)

(1) Pick a game played by your "favorite" team; that is, the team you are currently casting your single vote for.
(2) Flip a weighted coin that is more likely to come up heads. [How much more likely? What percentage of the time does it come up heads? That is the one number we can modify.]
(3) Completely forgetting which team you voted for before, go with the winner of the game if heads, the loser if tails, changing your vote if necessary.
(4) Return to step 1.
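
For concreteness, here is a minimal simulation sketch of these rules in Python. All of the names (simulate_monkeys, p, n_monkeys) and the input format---a list of (winner, loser) pairs---are our own choices for illustration; this is a sketch of the procedure described above, not the code behind the actual rankings.

```python
import random
from collections import Counter

def simulate_monkeys(games, p=0.75, n_monkeys=1000, n_steps=500):
    """Simulate random-walking 'monkey' voters on a season of games.

    games: list of (winner, loser) pairs, one per game played.
    p: probability the weighted coin comes up heads (vote goes to the winner).
    Returns the fraction of monkeys voting for each team.
    """
    teams = sorted({t for game in games for t in game})
    # Index the games by team, so each monkey can pick a game
    # played by its current favorite (rule 1).
    by_team = {t: [] for t in teams}
    for w, l in games:
        by_team[w].append((w, l))
        by_team[l].append((w, l))

    # Each monkey starts by voting for a randomly chosen team.
    votes = [random.choice(teams) for _ in range(n_monkeys)]

    for _ in range(n_steps):
        for i, team in enumerate(votes):
            w, l = random.choice(by_team[team])        # rule 1: pick a game
            heads = random.random() < p                # rule 2: weighted coin
            votes[i] = w if heads else l               # rule 3: winner if heads

    counts = Counter(votes)
    return {t: counts[t] / n_monkeys for t in teams}

if __name__ == "__main__":
    # Toy schedule: A beat B twice, B beat C.
    season = [("A", "B"), ("A", "B"), ("B", "C")]
    shares = simulate_monkeys(season)
    print(sorted(shares.items(), key=lambda kv: -kv[1]))
```

On the toy schedule at the bottom, team A should collect the largest vote share for any weighting above one half, since A won every game it played and B's win over C funnels still more voters toward A.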

Mathematically analyzing how these simulated monkeys behave, we examined the resulting rankings for the past 33 seasons of Division I-A football. The calculations involved are related to a class of so-called "direct methods" of ranking, but the interpretation in terms of random-walkers appears to be novel. Under this system, winning games is directly rewarded and strength of schedule is automatically incorporated because games played against highly-ranked opponents lead to more monkeys inquiring about and making decisions based on the outcome of such games. Armed only with the single simple rule of more often voting for the winner of a game instead of the loser, the top few teams determined by total vote counts are typically quite reasonable. For instance, it isn't any surprise that the pre-bowl-game monkey rankings at the end of the 2002 season choose Miami and Ohio State as the top two teams, nor is it surprising that they pick Miami as the top team in 2001 and Oklahoma as top in 2000 (all 4 were major, undefeated teams).
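
Since each monkey follows a Markov chain, the expected long-run vote shares can also be computed directly---no simulation needed---as the stationary distribution of the chain's transition matrix, which is presumably where the connection to "direct methods" enters. The construction below is our reading of the rules above, offered as a sketch rather than as the authors' exact formulation:

```python
import numpy as np

def expected_vote_shares(games, p=0.75, n_iter=2000):
    """Expected long-run vote fractions for the random-walker rankings.

    games: list of (winner, loser) pairs.  Builds the row-stochastic
    transition matrix T of the monkeys' Markov chain and power-iterates
    to its stationary distribution.
    """
    teams = sorted({t for g in games for t in g})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    T = np.zeros((n, n))
    games_played = np.zeros(n)
    for w, l in games:
        wi, li = idx[w], idx[l]
        games_played[wi] += 1
        games_played[li] += 1
        T[wi, wi] += p        # winner's voter: heads keeps him on the winner
        T[wi, li] += 1 - p    # tails sends him to the loser
        T[li, wi] += p        # loser's voter: heads sends him to the winner
        T[li, li] += 1 - p    # tails keeps him on the loser
    T /= games_played[:, None]   # each row: pick one of your games uniformly

    v = np.full(n, 1.0 / n)      # start from the uniform vote distribution
    for _ in range(n_iter):
        v = v @ T
    return dict(zip(teams, v))
```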

More intriguing are the differences between the #2 teams in the pre-bowl monkey rankings and the BCS standings at the ends of the controversial 2000 and 2001 seasons. The results of the monkey ranking system depend on the precise value selected to describe how strongly the flipped coin is weighted, but over a wide range of the coin's weighting, the monkeys select Tennessee to play Miami for the championship at the end of 2001 and Washington to play Oklahoma for the championship at the end of 2000. Both of these selections are mildly surprising, as neither team was commonly backed in the controversies at the ends of the respective seasons. The pre-bowl monkey rankings select these two teams in part because our simple rule includes neither information about the dates of games played (no special weight on Tennessee losing the SEC Championship game to LSU), nor margin of victory (Washington won a number of close games). We could have included date of game and margin of victory in such a system by modifying the weighting of the coin according to some formula describing these factors, but such redefinitions would require essentially arbitrary choices about how strongly to weight such factors---in the face of such potential arbitrariness, we prefer the simpler system with just a single parameter to determine the transition probability of going with a game winner.
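
To check for yourself how a conclusion holds up "over a wide range of the coin's weighting," one can simply sweep the weighting and watch the top of the table. Here is a hypothetical usage of the expected_vote_shares sketch above, where season_games stands in for a real season's list of (winner, loser) pairs:

```python
# Hypothetical usage: season_games is a full season's (winner, loser) list.
for p in (0.55, 0.65, 0.75, 0.85, 0.95):
    shares = expected_vote_shares(season_games, p=p)
    top2 = sorted(shares, key=shares.get, reverse=True)[:2]
    print(f"p = {p:.2f}: top two = {top2}")
```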


Also important in these controversial #2 selections are the relatively high ratings that the monkey votes accord to the top teams in the SEC in 2001 and the Pac 10 in 2000---in part because all 7 losses by the top three teams in the 2001 SEC were in conference, while 3 of the 4 losses by the top three teams in the 2000 Pac 10 were in conference.

One might very well argue over whether such selections are correct in any sense. We do not advocate this method as superior to any other; rather, our interest was to develop and study a very simplistic ranking system. Rather than directly rating the teams, the random-walking monkeys are a simplistic behavioral model for voters who get to choose who they believe is the top team. Any arguments about who "should" have been picked for the National Championship game in controversial years remain inconclusive, underscoring the fundamental difficulty of attempting to rank college football teams based on the relatively small numbers of games played. Additionally, we should emphasize that the scheme is likely skewed towards distinguishing top teams, as opposed to separating, say, #31 from #32, since each random-walking monkey has only a single vote. It may seem ironic that a group of mathematicians would prefer the easier-to-describe algorithm; but in the absence of more complete information---remembering that we're using only the win-loss outcome of each game---we prefer this simple ranking system of coin-flipping, random-walking monkey voters with only one number (the weighting of the coin) that needs to be selected.

The virtue of this ranking system lies in its relative ease of explanation. Its performance is arguably on par with the expert polls and (typically more complicated) computer algorithms employed by the BCS. Can a bunch of monkeys rank football teams as well as the systems in use now? Perhaps they can.

[This is a partially-edited repost of material from the original, pre-blog, random walker rankings site. As such, the listed post date is approximate.]


Mathematicians examine the BCS? [repost]

The figure to the right represents the expected distribution of model random-walker votes cast for each NCAA Division I-A football [known as the "Football Bowl Subdivision"] team in the 2001 pre-bowl-games rankings. The organization of the teams and the lines connecting them represent the community-structure hierarchy, of which the conferences are one level of organization. The colors represent the expected percentage of votes cast per team at each level in the hierarchy, from individual teams up through intra-conference organization, the conferences, and the connections between conferences. The data represented in this figure use a biased probability of p = 0.65 for voting for the winner of a game. Details are interspersed throughout these pages.
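
The post doesn't specify which community-detection method produced the hierarchy in the figure, so treat the following only as a rough illustration of the idea: build the schedule graph (teams as nodes, games as edges) and ask for modularity-based communities, which for real Division I-A schedules tend to line up with the conferences. The greedy_modularity_communities routine is standard networkx; equating it with the hierarchical method behind the figure is our assumption.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def schedule_communities(games):
    """Group teams by community structure in the schedule graph.

    games: list of (winner, loser) pairs.  Teams are nodes; each game
    adds (or reweights) an edge.  Modularity-based communities in such
    a graph tend to line up with the conferences.
    """
    G = nx.Graph()
    for w, l in games:
        if G.has_edge(w, l):
            G[w][l]["weight"] += 1   # rematches strengthen the tie
        else:
            G.add_edge(w, l, weight=1)
    return [sorted(c) for c in greedy_modularity_communities(G, weight="weight")]
```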

This work grew out of a Research Experiences for Undergraduates (REU) project in Summer 2003 by Georgia Tech undergraduate Thomas Callaghan, in collaboration with postdoctoral visiting assistant professor Mason Porter and assistant professor Peter Mucha. The work was funded (to pay Thomas' summer salary) by an NSF VIGRE "vertical integration" grant, justified by the enrichment of Thomas' educational experience and by the project's true vertical-integration spirit of joint work among an undergraduate, a postdoc, and a professor. Later support for Thomas was also provided by the Georgia Tech President's Undergraduate Research Award (PURA). After graduating from GT, Thomas went on to a Ph.D. program in Computational and Mathematical Engineering at Stanford.

At the outset, we want to make three things very clear:

(1) We have NOTHING to do with the official Bowl Championship Series (BCS) standings.

(2) Volumes have been written by many mathematically and statistically inclined football fans who have developed a multitude of different ways of ranking college football teams (see David Wilson's excellent site).

(3) We don't claim that the system described here is "better"; rather, our approach was to ask if the most naive ranking system we could dream up would do a reasonable job. So we envisioned a collection of random walkers, which you might prefer to think of as monkeys.

[This is a partially-edited repost of material from the original, pre-blog, random walker rankings site. As such, the listed post date is approximate.]
