December 30, 2002

While there are many valid criticisms of the BCS system, a few that are popping up are wrong and need to be debunked.

1. "They should be averaging the scores, not the rankings." Suppose that Massey has #1 Miami with a score of 2.0 and #2 Ohio State with 1.8, while Sagarin has #1 Ohio State with 2.0 and #2 Miami with 1.9. Under the BCS system, the teams are tied. Shouldn't Miami be ahead since its total score (3.9) exceeds Ohio State's (3.8)?

The quick answer is that the seven BCS ratings are scaled differently and consider different sets of teams. While it would be possible to eliminate all non-IA teams from every rating and rescale each one to a common mean and standard deviation, that would only put the bulk of the teams on a common scale. The BCS looks only at the top few teams, which is exactly where the scale differences are most severe. In short, there is no practical way to put all seven ratings on an equal scale.
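As a rough illustration (the numbers below are invented, not actual BCS data), here is a short Python sketch showing that even after two ratings are standardized to the same mean and standard deviation, the gap they see between the top two teams can still differ substantially:

    # Illustrative sketch only: two hypothetical computer ratings of the
    # same ten teams, on different scales.  Standardizing to a common mean
    # and standard deviation equalizes the bulk of the distribution, but the
    # gap between the top two teams can still differ because the shapes of
    # the tails differ.
    import statistics

    ratings_a = [2.0, 1.8, 1.5, 1.4, 1.3, 1.1, 1.0, 0.9, 0.8, 0.7]  # system A
    ratings_b = [98, 97, 88, 85, 84, 82, 80, 79, 77, 75]            # system B

    def standardize(scores):
        mean, stdev = statistics.mean(scores), statistics.stdev(scores)
        return [(s - mean) / stdev for s in scores]

    za, zb = standardize(ratings_a), standardize(ratings_b)
    print("top-two gap, system A:", round(za[0] - za[1], 2))  # about 0.47
    print("top-two gap, system B:", round(zb[0] - zb[1], 2))  # about 0.13
    # Even on a "common" scale, system A separates #1 and #2 more than three
    # times as strongly as system B, so summing the rescaled scores still
    # mixes incompatible scales exactly where the BCS cares most.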

2. "If you're going to throw away the worst rating, you should throw away the best too!" The BCS averages the six highest of the seven ratings, which initially appears to skew the ratings. However, since we are only looking at the top teams, it is much more likely for a computer to give a screwy bad rating than a screwy good rating. For example, #2 Ohio State could have easily been ranked #5 in one of the computer ratings, but it would have been impossible for it to be ranked #-1. Consequently, it is only the spuriously poor ratings that one needs to be overly concerned about, and the way the BCS handles this is the simplest reasonable approach.


December 5, 2002

It looks as if the BCS is going to be sending undefeateds Miami and Ohio State to the title game this season, which will make everyone happy and take some of the heat off the BCS. This may or may not be a good thing, as the system is fundamentally flawed and perhaps a third straight debacle would have forced a major overhaul of the system. The fundamental problems result from the BCS's use of computer ratings.

When the system was first designed, the fear was that the polls were inaccurate and somewhat unfair. That inaccuracy is most readily apparent in the randomness of the ballots cast and the discrepancies between the two major polls. However, after observing the systems at work, it is clear that the computer ratings heavily oversimplify the ranking process. More to the point, the differences between the polls and the computers stem more from the oversimplified computer algorithms than from the uncertainties in the polls.

This isn't to say that there is no use for the computers. There will certainly be cases in which the voting is very close, indicating ambiguity in the minds of the voters. In such cases, it is beneficial to invoke the statistically-based computer ratings, which can help resolve the ambiguity. The problem comes when the computer ratings are allowed to veto the near-unanimous opinion of the voters, as happened in 2000 and 2001. It would be better if the system were instead designed so that if, say, 2/3 of the voters put team A ahead of team B, then team A would be ranked ahead of team B regardless of the computer ratings.
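A minimal sketch of such a rule, under my own assumptions (a 2/3 threshold and access to the individual ballots; nothing here is part of the actual BCS formula): start from the combined standings and swap any adjacent pair of teams that a supermajority of voters ordered the other way.

    # Sketch of a hypothetical "voter supermajority override", assuming we
    # have access to the individual ballots.  Start from the combined BCS
    # order and swap any adjacent pair of teams whenever at least 2/3 of the
    # voters ranked them the other way, repeating until the order is stable.

    def supermajority_order(bcs_order, ballots, threshold=2/3):
        """bcs_order: list of team names, best first.
        ballots: one dict per voter, mapping team name -> rank on that ballot."""
        order = list(bcs_order)
        changed = True
        while changed:
            changed = False
            for i in range(len(order) - 1):
                ahead, behind = order[i], order[i + 1]
                prefer_behind = sum(1 for b in ballots
                                    if b.get(behind, 99) < b.get(ahead, 99))
                if prefer_behind / len(ballots) >= threshold:
                    order[i], order[i + 1] = behind, ahead
                    changed = True
        return order

    # Made-up example: the computers have B ahead of A, but three of the
    # four voters disagree, so A moves up.
    ballots = [{"A": 1, "B": 2}, {"A": 1, "B": 2}, {"A": 2, "B": 1}, {"A": 1, "B": 2}]
    print(supermajority_order(["B", "A"], ballots))   # ['A', 'B']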

A more subtle problem is the makeup of the computer ratings themselves. The computer ratings are not the objective measures of team performance that many think they are. To be sure, the equations are followed objectively by the computer. However, the creation of those equations requires the subjective judgment of the programmers. The biggest examples are the ratings of Billingsley, Anderson-Hester, and the NY Times, whose equations have no mathematical basis and are thus completely subjective. As an example, suppose that Billingsley decides that a win over the #40 team is worth 10 points. Why 10 and not 11 or 9? That choice is a subjective one on his part. Colley's system belongs on this list as well, since his statistics are bad enough to destroy any statistical validity of his ratings; while he may believe his ratings are statistically sound, they are not.

My objection isn't that there is anything wrong with these systems; everyone is entitled to his own opinion. (Actually, I do have one problem -- Anderson-Hester fraudulently claim to compute the most accurate schedule strengths when theirs are among the least accurate.) However, it isn't clear why such systems should be used by the BCS, since the computer portion is supposed to provide an objective counter to the subjective voters. Certainly giving a completely subjective computer rating system the importance of 20 voters is a mistake.

This leaves only three acceptable rating systems -- Sagarin, Wolfe, and Massey. Unfortunately, none is without its flaws. Sagarin does not use a maximum-likelihood solution. Wolfe and Massey do not consider home-field advantage. Wolfe uses an incorrect game probability distribution. Sagarin and Massey (very slightly) penalize a team for a win over a weak opponent. Wolfe discriminates against the mid-major I-A conferences. While all three of these systems are better than any of the other four, it is a shame that the BCS couldn't find or create a system without such flaws.
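To be concrete about what is being asked for, here is a minimal sketch (my own illustration, not any of the BCS systems) of a maximum-likelihood rating that does include a home-field term: model the probability that the home team wins as a logistic function of the rating difference plus a home-field advantage, and choose the ratings that maximize the likelihood of the observed results. The teams, results, and step sizes below are invented for illustration.

    # Minimal sketch of a maximum-likelihood (Bradley-Terry-style) rating
    # with a home-field term -- illustrative only, not any of the BCS
    # systems.  Model: P(home team wins) = 1 / (1 + exp(-(r_home - r_away + h))).
    # The ratings r and home edge h are chosen to maximize the log-likelihood
    # of the observed results by simple gradient ascent.
    import math

    # (home, away, home_won) -- invented results for illustration
    games = [("Miami", "FSU", True), ("OhioSt", "Michigan", True),
             ("FSU", "Clemson", True), ("Michigan", "Miami", False)]

    teams = sorted({t for g in games for t in g[:2]})
    r = {t: 0.0 for t in teams}   # team ratings
    h = 0.0                       # home-field advantage

    for _ in range(2000):         # gradient ascent on the log-likelihood
        grad, grad_h = {t: 0.0 for t in teams}, 0.0
        for home, away, home_won in games:
            p = 1.0 / (1.0 + math.exp(-(r[home] - r[away] + h)))
            err = (1.0 if home_won else 0.0) - p
            grad[home] += err
            grad[away] -= err
            grad_h += err
        for t in teams:
            r[t] += 0.05 * grad[t] - 0.001 * r[t]   # tiny ridge term keeps the ratings anchored
        h += 0.05 * grad_h

    for t in sorted(teams, key=r.get, reverse=True):
        print("%-10s %+.2f" % (t, r[t]))
    print("home edge  %+.2f" % h)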


June 24, 2002

The latest news is that the BCS will be eliminating margin of victory from the computer ratings and decreasing the quality win component. The margin-of-victory removal is an overreaction to a real problem. When typical rational people (such as voters) watch games, they note how convincing the win was, and that affects their rankings. In other words, a convincing win is better than a close call. Because of this, margin of victory should be considered by the computer ratings if they hope to give reasonable rankings. However, several of the computer ratings gave far too much weight to margin of victory, which caused some teams (most notably Oregon) to be unfairly penalized in the 2001 BCS rankings. The backlash against those systems resulted in the rule we have now.
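One standard way to keep margin of victory in the ratings without letting it dominate (a sketch of the general idea, not a description of any particular BCS computer) is to cap it or pass it through a function with diminishing returns, so that the difference between winning by 7 and by 14 counts for more than the difference between winning by 35 and by 42:

    # Sketch: two ways of damping margin of victory so that running up the
    # score stops paying off.  Illustrative only, not any specific BCS
    # computer's formula.
    import math

    def capped_margin(margin, cap=21):
        """Credit margin of victory only up to a fixed cap (here three touchdowns)."""
        return min(margin, cap)

    def damped_margin(margin):
        """Logarithmic damping: each additional point is worth less than the last."""
        return math.log1p(margin)

    for m in (3, 7, 14, 28, 56):
        print("margin %2d:  capped %2d   damped %.2f"
              % (m, capped_margin(m), damped_margin(m)))
    # Under either scheme a 56-point blowout earns little or no more credit
    # than a 28-point win, which is roughly how voters treat such games.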

The other adjustment was a reduction of the quality win component, which is a positive move. The quality win component was created because of the Miami-FSU mess in 2000, but it had a large hand in creating the mess of 2001. The real problem between Miami and FSU was not the number of quality wins, but specifically that Miami had beaten FSU, a fact weighed heavily by the voters but not by the computers. With the quality win component in place, Miami would have gotten "extra credit" for the win over FSU, which would have put it into the championship game. The pitfall of the quality win component was made clear in 2001, as the factor failed to put Colorado ahead of Nebraska (which it had beaten) and succeeded in putting both teams ahead of Oregon (which neither had beaten). Although it is too late to change it now, a better approach would have been to make a head-to-head adjustment only when a team had beaten the team directly ahead of it in the BCS standings.
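A sketch of that head-to-head rule, using invented standings and results (my own illustration of the idea, not an official BCS formula): after the standings are computed, a team moves up only if it sits directly behind a team it beat.

    # Sketch of the head-to-head adjustment described above: a team moves up
    # only past the team directly ahead of it in the standings, and only if
    # it won their head-to-head meeting.  Illustrative, not an official rule.

    def head_to_head_adjust(standings, wins):
        """standings: list of teams, best first.
        wins: set of (winner, loser) pairs from head-to-head meetings."""
        order = list(standings)
        i = 0
        while i < len(order) - 1:
            ahead, behind = order[i], order[i + 1]
            # swap only if the lower team beat the team just ahead of it
            # (and the two teams did not split a pair of meetings)
            if (behind, ahead) in wins and (ahead, behind) not in wins:
                order[i], order[i + 1] = behind, ahead
                i = max(i - 1, 0)            # re-check the newly formed pair above
            else:
                i += 1
        return order

    # 2001-style example: Colorado beat Nebraska but sat just behind it in
    # the standings; Oregon is unaffected here.
    standings = ["Nebraska", "Colorado", "Oregon"]
    wins = {("Colorado", "Nebraska")}
    print(head_to_head_adjust(standings, wins))   # ['Colorado', 'Nebraska', 'Oregon']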



Note: if you use any of the facts, equations, or mathematical principles introduced here, you must give me credit.

copyright ©2002 Andrew Dolphin