Sunday, August 23, 2009

Quarterback ratings

Trent McCotter is back again as one of the targets of links in today's post, courtesy of his first column in The News & Observer, which he will write in his copious spare time as a UNC law student. Yesterday, I sang the praises of Steve Strogatz, so now it's Trent's turn. Trent is one of the most outwardly mellow people I know, but that calm conceals a strong passion for sports statistics. A four-time winner of the Jack Kavanagh Memorial Youth Baseball Research Award from SABR (three times in the college division, once in the high school division), Trent distinguishes himself by frequently taking a different tack in his work while also delving into the detailed numbers when needed.

In "How to fix the 'perfect game'," Trent avoids the details of the quarterback rating definition and gets right to the interesting issue of the recent prevalence of perfect passer rating performances, wondering along the way just how perfect they are.

While we're talking about quarterback ratings and looking forward to the upcoming season, check out "Vick as a Quarterback? He’s Underrated" by Brian Burke of Advanced NFL Stats in The Fifth Down (both sites are full of interesting items). Without getting into recalculations of possible quarterback ratings, Burke's discussion about more conventional statistics makes clear that neither they nor quarterback ratings tell the whole story.


Saturday, August 22, 2009

The Mathematics of Hitting Streaks

I hope that someone other than my coauthors will actually be reading these posts once the college football season arrives (when hits to the old page understandably ramped up in past years). In the meantime, one of the upsides of transitioning to a blog is that it makes it easy to point to other interesting work in the mathematics and statistics of sports.

A pair of papers about hitting streaks has appeared on arXiv.org in the past year. Making things particularly interesting, the two papers take completely different methodological approaches. Sam Arbesman and Steve Strogatz "examine Joe DiMaggio’s 56-game hitting streak and look at its likelihood, using a number of simple models. And it turns out that, contrary to many people’s expectations, an extreme streak, while unlikely in any given year, is not unlikely to have occurred about once within the history of baseball." Meanwhile, Trent McCotter uses permutation tests to find that there appear to have been significantly more 20-25 game streaks in real life than one would obtain in an independent-games model. You can hear Steve talk more about both studies in a Radiolab podcast from earlier this summer.
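To give a feel for what an independent-games model means, here is a minimal sketch (not the method of either paper) that computes the exact probability of seeing at least one streak of a given length when every game is an independent trial. The function name and the per-game hit probability q are assumptions made purely for illustration; the papers above use far richer models built from actual player data.

```python
def prob_streak_at_least(n_games, streak_len, q):
    """Probability of at least one run of `streak_len` consecutive
    games with a hit, over `n_games` independent games, where each
    game has a hit with probability `q` (an illustrative assumption)."""
    # state[k] = probability that the current streak is exactly k games
    # long and no streak of length streak_len has occurred so far.
    state = [0.0] * streak_len
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * streak_len
        for k, prob in enumerate(state):
            # A hitless game resets the current streak to zero.
            new_state[0] += prob * (1 - q)
            # A game with a hit extends the streak. If the streak would
            # reach streak_len, that probability is dropped from the
            # table, which marks it as "streak achieved".
            if k + 1 < streak_len:
                new_state[k + 1] += prob * q
        state = new_state
    # Whatever probability is left never reached the target streak.
    return 1.0 - sum(state)
```

For example, prob_streak_at_least(162, 20, 0.8) gives the chance that a hypothetical batter who gets a hit in 80% of games puts together a 20-game streak in one 162-game season, under the strong assumption that games are independent and identically distributed.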

Finally, for perhaps the only timely element of this post, Steve has a new book just out this past week, The Calculus of Friendship: What a Teacher and a Student Learned about Life while Corresponding about Math. If it's like everything else Steve does, it will be amazing.

----
Addition (29Aug): For more discussion about hitting streaks, other streaks, and the way that people tend to overinterpret streaks, check out Leonard Mlodinow's interesting WSJ essay, "The Triumph of the Random."

----
Another addition (31Aug): Trent McCotter's second N&O column is about hitting streaks, with a decidedly local-to-NC flavor ("Zimmerman best in state at hitting streaks").


Tuesday, August 11, 2009

Random walking through baseball

Now that the new site format appears to be largely up and working, it's time to start digging into a backlog of math-in-sports topics I've wanted to briefly write about. That said, if anyone has a general solution for the seemingly infamous "Publishing your blog is taking longer than expected" problem occasionally afflicting those of us who ftp-publish to other servers, I would love to hear about it, please!

Today's links are all about baseball. No, not the Yankees' recent four-game sweep of the Red Sox (just typing that hurts). Instead, consistent with the title of this site, today is all about random walker rankings applied to baseball players. Well, sort of. Specifically, some of my collaborators and I recently wrote a paper (submitted for publication) studying the network of baseball players defined by the collection of pitcher-batter matchups across 1954-2008. Our focus so far is the study of this large network, and one of the (many) ways to try to understand a network is to study some process occurring on that network: enter the biased random walkers that can be used to define a ranking. Of course, the result is a very crude ranking. If one wanted to turn this into a more serious ranking of baseball players, numerous effects could and indeed should be included.

Brandon Keim picked up the story about our work for Wired Science, nicely including some thoughts (both ours and his) about the limitations of using this as a ranking. From there it got some nice attention and further helpful comments, some of which we'll address and acknowledge in an eventual revision. My coauthor, Mason Porter, has already collected most of the resulting links, including an interview he did with 27pitches.com.

A big thanks to Brandon for writing such a nice story about our work.

Don't worry, we'll start discussing and adding links to less narcissistic topics soon. Maybe.


Sunday, August 9, 2009

Rebuilding the old site into a blog

With the new college football season almost upon us, I wanted to finally start to clean this site up and see if we can't give it a more uniform look and feel, now that we've switched to a blog format. The blog will have its benefits, including automatically archiving everything and allowing for comments. But converting the old, existing pages is a pain; in particular, there are already figures and tables in those pages that would have to be reformatted for the blog. I would rather use that time to start adding some new content here. So we're going to keep most of the old site as is, at http://rankings.amath.unc.edu/old.

Apologies to those who click through and find themselves on the old site. The old sidebar now includes a link back to the "RWR Blog" on most pages. If you end up somewhere without this link, please make use of your friendly browser's back button.


Press coverage (college football edition)

We're grateful for the positive attention that the random walker rankings have received as a means of ranking college football teams. We have particularly enjoyed the diversity of outlets interested in this project, including

ESPN the Magazine (issue dated Nov. 10, 2003),
Nature Science Update (Nov. 14, 2003),
Georgia Tech news releases [long and short] (Nov. 18, 2003),
The Chronicle of Higher Education (issue dated Nov. 28, 2003; subscription required),
CNN Headline News (Dec. 30, 2003),
La Recherche (Jan. 2004, subscription required),
Atlanta Business Chronicle (Jan. 16, 2004),
WGST AM640 (May 20, 2004),
The Atlanta Journal-Constitution (May 24, 2004; registration required),
WKY AM930 (May 28, 2004),
American Mathematical Society press release (Aug. 11, 2004),
Science News (week of Sept. 4, 2004),
The Washington Post (Sports columnist Sally Jenkins, Dec. 10, 2005), and
The Mathematical Tourist at MAA Online (Nov. 15, 2007).


Our manuscripts about college football

In addition to the rants on this collection of web pages, we have written a pair of scholarly, academic articles about ranking with random walkers. The now somewhat amusingly misnamed "Division I-A Football" article (it was properly named when it was submitted, and the Michigan Wolverines hadn't yet famously lost to the Appalachian State Mountaineers, an FCS née I-AA program) discusses a number of issues in greater depth than covered on this website, including the community structure of the football matchups network and its influence on rankings, ideas about choosing a good value of the probability p, and the improved properties of the RWFL ranking system.

"Random Walker Ranking for NCAA Division I-A Football,"
T. Callaghan, P. J. Mucha and M. A. Porter,
American Mathematical Monthly, 114, 761-777 (2007)
[originally made available as arxiv.org/physics/0310148].
Abstract: Each December, college football fans and pundits across America debate which two teams should meet in the NCAA Division I-A National Championship game. The Bowl Championship Series (BCS) standings employed to select the teams invited to this game are intended to provide an unequivocal #1 v. #2 game for the championship; however, this selection process has itself been highly controversial in four of the past six years. The computer algorithms that constitute one part of the BCS standings often act as lightning rods for the controversy, in part because they are inadequately explained to the public. We present an alternative algorithm that is simply explained yet remains effective at ranking the best teams. We define a ranking in terms of biased random walkers on the graph formed by the schedule of games played, with two teams (vertices) connected by an edge if they played each other. Each random walker moves from team to team by selecting a game and "voting" for its winner with probability p, tracing out a never-ending path motivated by the "my team beat your team" argument. We study the statistical properties of a collection of such walkers, relate the rankings to the community structure of the underlying network, and compare these rankings for recent NCAA Division I-A seasons. We also discuss the algorithm's asymptotic behavior, illustrated with some analytically tractable cases for round-robin tournaments, and discuss possible generalizations.
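The walker dynamics described in the abstract can be sketched in a few lines. What follows is one simple discrete-time reading of that description, not the formulation used in the article: at each step, a walker sitting at a team picks one of that team's games uniformly at random and moves to the winner with probability p, or to the loser otherwise. The function name, the (winner, loser) input format, the default p = 0.75, and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def random_walker_ranking(games, p=0.75, iterations=2000):
    """Approximate the long-run share of walkers at each team.

    games: list of (winner, loser) tuples (an assumed input format).
    p: probability that a walker votes for the winner of the chosen game.
    Assumes 0 < p < 1 and a connected schedule, so the shares converge.
    """
    teams = sorted({team for game in games for team in game})
    # Collect every game each team played, from either side.
    schedule = defaultdict(list)
    for winner, loser in games:
        schedule[winner].append((winner, loser))
        schedule[loser].append((winner, loser))

    # Start with the walkers spread evenly over all teams.
    share = {team: 1.0 / len(teams) for team in teams}
    for _ in range(iterations):
        new_share = dict.fromkeys(teams, 0.0)
        for team in teams:
            # Walkers at this team split evenly among its games...
            weight = share[team] / len(schedule[team])
            for winner, loser in schedule[team]:
                # ...and vote for the winner with probability p.
                new_share[winner] += p * weight
                new_share[loser] += (1 - p) * weight
        share = new_share
    return share
```

Calling random_walker_ranking([("A", "B"), ("B", "C"), ("A", "C")]) on a three-team round robin and sorting the teams by share ranks the undefeated team first and the winless team last. Values of p closer to 1 reward wins more heavily, while p = 0.5 ignores the game results altogether.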

"The Bowl Championship Series: A Mathematical Review,"
T. Callaghan, P. J. Mucha and M. A. Porter,
Notices of the American Mathematical Society 51, 887-893 (2004).
Abstract: We discuss individual components of the college football Bowl Championship Series. Comparing with a simple algorithm defined by random walks on a biased graph, we attempt to predict whether the proposed changes will truly lead to increased BCS bowl access for non-BCS schools. We conclude by arguing that the true problem with the BCS Standings lies not in the computer rankings, but rather in misguided addition.

2008 Random Walker Rankings [link]

http://rankings.amath.unc.edu/old/2008.htm

2007 Random Walker Rankings [link]

http://rankings.amath.unc.edu/old/2007.htm

2006 Random Walker Rankings [link]

http://rankings.amath.unc.edu/old/2006.htm

2005 Random Walker Rankings [link]

http://rankings.amath.unc.edu/old/2005.htm

2004 Random Walker Rankings [link]

http://rankings.amath.unc.edu/old/2004.htm

How well did the monkeys do in 2003? [link]

http://rankings.amath.unc.edu/old/2003.htm