Unlike the other ratings on my site, the RPI formulas and hockey pairwise rankings are not my own. They are used in real-life tournament selection processes, and thus are important if inaccurate. The football RPI, improved RPI, and reinvented RPI, however, are my own inventions and credit must be given me if they are used.
A key rating used in all college sports is the RPI (ratings percentage index). The RPI consists of four factors:
TWP = (wins + 1/2 ties) / (# games)
OWP = the average of a team's opponents' TWP values
OOWP = the average of a team's opponents' OWP values
ADJ = adjustments for quality wins, bad losses, etc.
Note that when an opponent's winning percentage is averaged into OWP, its games against the team being rated are not counted. Also, if an opponent is played twice, its winning percentage is averaged in twice. Only games against division I opponents are counted.
The first three components are combined into a single value with different weights depending on the sport:
RPI = 0.25*TWP + 0.50*OWP + 0.25*OOWP (baseball, basketball, hockey)
RPI = 0.334*TWP + 0.444*OWP + 0.222*OOWP (football)
Note that RPI isn't officially used in football; however, the "loss", "schedule strength", and "quality win" portions of the BCS formula map extremely well onto the RPI formalism. In BCS terms, a team's football RPI ranking is about 1/7 as important as its ranking in the poll or computer averages, meaning that one can compute a "poor-man's BCS" by adding the poll average, computer average, and 1/7 of the RPI rank. The football RPI formula presented here is thus my own invention; the others were created by the NCAA.
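The three unadjusted components can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the NCAA's own code: the function name `basic_rpi`, the `(team_a, team_b, result)` game format, and the choice to treat an opponent with no other games as a 0.500 team are all assumptions made for the example.

```python
from collections import defaultdict

def basic_rpi(games, weights=(0.25, 0.50, 0.25)):
    """Unadjusted RPI for every team.

    games: list of (team_a, team_b, result), where result is 1.0 if
    team_a won, 0.5 for a tie, and 0.0 if team_b won (assumed format).
    """
    # For each team, one (opponent, points earned) entry per game played.
    results = defaultdict(list)
    for a, b, r in games:
        results[a].append((b, r))
        results[b].append((a, 1.0 - r))
    teams = list(results)

    def wp(team, exclude=None):
        # Winning percentage, optionally ignoring games against `exclude`.
        pts = [p for opp, p in results[team] if opp != exclude]
        # Assumption: an opponent with no other games counts as 0.500.
        return sum(pts) / len(pts) if pts else 0.5

    twp = {t: wp(t) for t in teams}
    # One term per game, so an opponent played twice is averaged in twice.
    owp = {t: sum(wp(o, exclude=t) for o, _ in results[t]) / len(results[t])
           for t in teams}
    oowp = {t: sum(owp[o] for o, _ in results[t]) / len(results[t])
            for t in teams}
    w_twp, w_owp, w_oowp = weights
    return {t: w_twp * twp[t] + w_owp * owp[t] + w_oowp * oowp[t]
            for t in teams}

# Three teams, each pair playing once: A beats B and C, B beats C.
rpi = basic_rpi([("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0)])
```

Passing `weights=(0.334, 0.444, 0.222)` gives the football weighting listed above.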
The final element of the formula is adjustments that are added, which vary from sport to sport. The real adjustments used in college basketball and baseball are secret; the values I provide here are estimates.
College basketball:
- +0.0024 if your median non-conference opponent's RPI rank was #50 or better
- +0.0012 for beating RPI #1-25 on the road
- +0.0009 for beating RPI #1-25 at a neutral site
- +0.0006 for beating RPI #1-25 at home
- +0.0008 for beating RPI #26-50 on the road
- +0.0006 for beating RPI #26-50 at a neutral site
- +0.0004 for beating RPI #26-50 at home
- -0.0002 for losing to RPI #163-250 on the road
- -0.0003 for losing to RPI #163-250 at a neutral site
- -0.0004 for losing to RPI #163-250 at home
- -0.0004 for losing to RPI #251-324 on the road
- -0.0006 for losing to RPI #251-324 at a neutral site
- -0.0008 for losing to RPI #251-324 at home
- -0.0006 for losing to a non-division I opponent on the road
- -0.0009 for losing to a non-division I opponent at a neutral site
- -0.0012 for losing to a non-division I opponent at home
- -0.0024 if your median non-conference opponent's RPI rank was in the bottom half of the league (#163 or worse in a 324-team league)
College baseball:
- +0.0024 for beating RPI #1-25 on the road
- +0.0018 for beating RPI #26-50 on the road
- +0.0012 for beating RPI #51-75 on the road
- -0.0012 for losing to a bottom 51-75 school at home
- -0.0018 for losing to a bottom 26-50 school at home
- -0.0024 for losing to a bottom 1-25 or non-division I school at home
Overall, the adjustment made to a baseball team's RPI is about twice that made to a basketball team's RPI. The larger bonuses are offset by the fact that bonuses are only given for road wins, but baseball teams play about twice as many games and thus have twice as many chances to pick up bonuses.
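As an illustration, the per-game basketball estimates listed above can be written as a lookup table. This sketch covers only the game-by-game bonuses and penalties, not the two median non-conference schedule adjustments; the names `basketball_game_adjustment`, `WIN_BONUS`, `LOSS_PENALTY`, and `NON_D1_LOSS` are hypothetical, and the values are the estimates from this page, not official NCAA numbers.

```python
# Estimated per-game basketball adjustments, keyed by opponent RPI rank range.
WIN_BONUS = {
    (1, 25): {"road": 0.0012, "neutral": 0.0009, "home": 0.0006},
    (26, 50): {"road": 0.0008, "neutral": 0.0006, "home": 0.0004},
}
LOSS_PENALTY = {
    (163, 250): {"road": -0.0002, "neutral": -0.0003, "home": -0.0004},
    (251, 324): {"road": -0.0004, "neutral": -0.0006, "home": -0.0008},
}
NON_D1_LOSS = {"road": -0.0006, "neutral": -0.0009, "home": -0.0012}

def basketball_game_adjustment(won, opp_rank, site):
    """Adjustment earned in one game.

    won: True for a win, False for a loss.
    opp_rank: opponent's RPI rank, or None for a non-division I opponent.
    site: "road", "neutral", or "home" from the rated team's point of view.
    """
    if opp_rank is None:
        return 0.0 if won else NON_D1_LOSS[site]
    table = WIN_BONUS if won else LOSS_PENALTY
    for (low, high), by_site in table.items():
        if low <= opp_rank <= high:
            return by_site[site]
    return 0.0  # no adjustment for any other result

# Road win over #10, home loss to #300, neutral-site loss to a non-D-I team.
total = (basketball_game_adjustment(True, 10, "road")
         + basketball_game_adjustment(False, 300, "home")
         + basketball_game_adjustment(False, None, "neutral"))
```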
College football:
- +0.030 for beating #1 team
- +0.027 for beating #2 team
- +0.024 for beating #3 team
- +0.021 for beating #4 team
- +0.018 for beating #5 team
- +0.015 for beating #6 team
- +0.012 for beating #7 team
- +0.009 for beating #8 team
- +0.006 for beating #9 team
- +0.003 for beating #10 team
Again, the "football RPI" is my own invention. The adjustments here are the quality win component. It is clear why the quality win component is so controversial. A typical bonus of 0.01 points for beating a lower top-10 team would be the equivalent of a basketball team winning 11 games over a top-25 team at a neutral site. So while the bonus achievable in one football game is 25 times that achievable in one basketball game, the overall effect (considering that bonuses are only given out for top-10 wins and that the football season is 11 games instead of about 30) of the football quality win component is comparable to that in the other sports.
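The football bonuses listed above fall by 0.003 per ranking position, so they can be written as a one-line rule. The sketch below (with the hypothetical name `football_quality_win_bonus`) also repeats the two comparisons made in the paragraph above, using the estimated basketball values from this page.

```python
def football_quality_win_bonus(opp_rank):
    """Quality-win bonus for beating the team ranked `opp_rank`."""
    if 1 <= opp_rank <= 10:
        return round(0.003 * (11 - opp_rank), 3)
    return 0.0  # no bonus outside the top 10

# Largest single-game bonus in football vs. basketball (road win over #1-25).
single_game_ratio = football_quality_win_bonus(1) / 0.0012

# Number of neutral-site basketball wins over top-25 teams needed to
# match a typical 0.01 football bonus.
equivalent_wins = 0.01 / 0.0009
```

Here `single_game_ratio` comes out at about 25 and `equivalent_wins` at about 11, matching the figures quoted in the text.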
The Improved RPI
There are a few obvious shortcomings to the RPI. One is that it doesn't consider opponents' opponents' opponents or beyond. On the surface, this is a huge shortcoming since it appears to make the assumption that every team's opponents' opponents' opponents are of equal ability. This is not actually the case; one can statistically estimate the more distant relationships from just the opponents' opponents win percentage. Of course, it is better to actually compute the terms, which is what is done in my improved RPI.
The improved RPI is a self-consistent RPI-like rating scheme. There are two basic principles involved: (1) all games count equally [RPI-like] and (2) the rating of a team that goes 0.500 should equal the average of its opponents' ratings [self-consistent]. I am thus looking for a solution where a team's rating is given by:
Rating = (WP-0.5) + average of opponents' ratings.
Since the ratings are both input and output data in this system, the solution must run iteratively until convergence is reached (i.e. the result from one iteration is indistinguishable from that from the next).
In order to produce something resembling an RPI rating, the final team ratings should be multiplied by 0.25 (basketball and baseball), 0.35 (hockey), or 0.334 (football) and increased by 0.5.
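A minimal sketch of such an iteration is shown below. It implements only the rating equation and the final rescaling; home field advantage and priors are left out. Two details are assumptions added for the example rather than part of the description above: each update is averaged with the previous value (damping) so that the iteration cannot oscillate, and the ratings are re-centered to a mean of zero each pass, since the equation only fixes ratings up to an additive constant. The name `improved_rpi` and the `(team_a, team_b, result)` game format are likewise hypothetical.

```python
from collections import defaultdict

def improved_rpi(games, scale=0.25, tol=1e-10, max_iter=10000):
    """Iterate Rating = (WP - 0.5) + average of opponents' ratings.

    games: list of (team_a, team_b, result) with result 1.0 if team_a
    won, 0.5 for a tie, 0.0 if team_b won (assumed format).
    Returns 0.5 + scale * rating for every team.
    """
    opponents = defaultdict(list)
    points = defaultdict(float)
    for a, b, r in games:
        opponents[a].append(b)
        opponents[b].append(a)
        points[a] += r
        points[b] += 1.0 - r
    teams = list(opponents)
    win_bonus = {t: points[t] / len(opponents[t]) - 0.5 for t in teams}

    rating = {t: 0.0 for t in teams}
    for _ in range(max_iter):
        new = {}
        for t in teams:
            sched = sum(rating[o] for o in opponents[t]) / len(opponents[t])
            # Damped update (assumption): move halfway toward the target.
            new[t] = 0.5 * rating[t] + 0.5 * (win_bonus[t] + sched)
        # Re-center (assumption): ratings are defined up to a constant.
        mean = sum(new.values()) / len(new)
        new = {t: v - mean for t, v in new.items()}
        change = max(abs(new[t] - rating[t]) for t in teams)
        rating = new
        if change < tol:  # one iteration indistinguishable from the next
            break
    return {t: 0.5 + scale * rating[t] for t in teams}

# A beats B and C, B beats C.
ratings = improved_rpi([("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0)])
```

For this three-team example the unscaled ratings converge to +1/3, 0, and -1/3, which satisfy the rating equation exactly.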
A second problem is that, aside from the bonuses, the RPI makes no distinction between home and road games. Since many prominent college teams are known for playing most of their non-conference games at home, this unfairly boosts their RPI. The obvious and simple fix is to add another bonus to the RPI:
HB = X * (#road games - #home games) / (# games),
where X is between 0.01 and 0.03, depending on the sport.
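As a small worked example of the HB formula, the sketch below uses a hypothetical function name `home_road_bonus` and an illustrative default of X = 0.02, chosen only because it lies inside the range given above.

```python
def home_road_bonus(road_games, home_games, total_games, x=0.02):
    """HB = X * (#road games - #home games) / (# games)."""
    return x * (road_games - home_games) / total_games

# A team with 16 home games, 10 road games, and 4 neutral-site games
# in a 30-game season, with X = 0.03.
hb = home_road_bonus(road_games=10, home_games=16, total_games=30, x=0.03)
```

In this example the team is penalized by 0.006 for its home-heavy schedule.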
If you do not already have a home field advantage factor calculated, you should first calculate the team ratings with no home field advantage. Then, using all non-neutral site games, take the average of:
(road team rating) - (home team rating) + (outcome),
where outcome is +0.5 for a home win, 0 for a tie, or -0.5 for a road win. Multiply this by either 0.25 (basketball or baseball), 0.334 (football), or 0.35 (hockey) to determine the home advantage factor.
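The home advantage estimate just described can be sketched as follows. The function name `home_advantage_factor` and the `(home, road, outcome)` tuple format are assumptions for the example; `rating` is expected to hold the unscaled ratings computed with no home field advantage, and neutral-site games are assumed to have been removed by the caller.

```python
def home_advantage_factor(games, rating, scale=0.25):
    """Average of (road rating) - (home rating) + (outcome), rescaled.

    games: list of (home, road, outcome) for non-neutral games, where
    outcome is +0.5 for a home win, 0 for a tie, -0.5 for a road win.
    rating: dict of unscaled team ratings.
    scale: 0.25 (basketball, baseball), 0.334 (football), 0.35 (hockey).
    """
    terms = [rating[road] - rating[home] + outcome
             for home, road, outcome in games]
    if not terms:
        return 0.0  # assumption: no data means no measurable advantage
    return scale * sum(terms) / len(terms)

# Two games between the same teams, each won by the home side.
rating = {"A": 0.2, "B": -0.2}
games = [("A", "B", 0.5), ("B", "A", 0.5)]
factor = home_advantage_factor(games, rating)
```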
A final problem in the RPI is that, in the computation of schedule strength, a team will be counted as a tougher opponent to somebody that beat it than it will to somebody that it beat because the games between the teams are subtracted. This isn't a huge "problem"; it just means that a team's record counts more than face value in the RPI. In football, where a team's games count about 1/11 of its opponents' records, the team's record thus counts about 37.4% of its RPI instead of 33.4%. It does become problematic when the number of games played against each opponent is not constant, however. A football team that plays one opponent twice accounts for 2/12 of that opponent's record, which is of course added twice into the OWP component for an increase of a factor of four. In other words, a game played against a team that you play twice is more important than a game played against a team you play only once. If you play somebody three times, each game is even more important in your RPI. I'm sure that this is unintentional on the part of those who invented the BCS. The obvious solution is to not subtract head-to-head games from the opponents' records; ironically, this would greatly simplify the RPI calculation. At any rate, this problem is fixed by the improved RPI's use of a self-consistent solution.
The "improved" RPI rating is thus the best one can do with this sort of rating system. To summarize, it differs from the real RPI equations in five important ways:
- The depth is infinite, so that it doesn't stop counting at a team's opponents' opponents.
- The formulas produce a self-consistent solution, so that a team's schedule strength equals the average rating of its opponents.
- Head-to-head games are not ignored, so that a team that goes 7-4 against a slate of opponents is not considered to have played a more difficult schedule than one that goes 4-7 against those same teams.
- Home field advantage is considered.
- To enhance early-season stability, priors are included.
The Reinvented RPI
With the "improved" RPI described, it is worth seeing how well one can do using an RPI-like formula. It may be surprising that it can be duplicated extremely well. Fixing the head-to-head and road/home problems is trivial and can be done as noted above. The final question is how to estimate a team's full schedule strength from just its opponents' and opponents' opponents' records.
Defining win bonus ("WB") as WP-0.5, it is straightforward to see that a team's rating is equal to:
Rating = WB + OWB + OOWB + OOOWB + OOOOWB + ...
Since teams play most of their games against conference opponents, there is a strong correlation between a team's opponents' record and its opponents' opponents' records. One can approximate this with:
OOWB = 1/x * OWB,
OOOWB = 1/x * OOWB,
OOOOWB = 1/x * OOOWB = 1/(x*x) * OOWB,
and so on. "1/x" is related to the fraction of games that are played against conference opponents; measured values of 1/x are 0.574 (basketball), 0.638 (baseball), 0.719 (football), and 0.742 (hockey). The mathematical solution to this problem is trivial:
OOWB + OOOWB + OOOOWB + ... = x/(x-1) * OOWB.
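The result is the sum of a geometric series: each term is 1/x times the previous one, so the total is OOWB times 1 + 1/x + 1/x^2 + ..., which equals 1/(1 - 1/x) = x/(x-1) whenever 1/x is less than 1. A quick numerical check, using hypothetical helper names and the measured 1/x values quoted above:

```python
def series_multiplier(inv_x, terms=200):
    """Partial sum of 1 + 1/x + 1/x^2 + ... with `terms` terms."""
    return sum(inv_x ** k for k in range(terms))

def closed_form_multiplier(inv_x):
    """x / (x - 1), written in terms of x = 1 / inv_x."""
    x = 1.0 / inv_x
    return x / (x - 1.0)

# Measured 1/x values from the text.
for sport, inv_x in [("basketball", 0.574), ("baseball", 0.638),
                     ("football", 0.719), ("hockey", 0.742)]:
    print(sport, series_multiplier(inv_x), closed_form_multiplier(inv_x))
```

The two columns agree to many decimal places for all four sports.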
This implies that a team's rating can be adequately approximated by:
RPI = WP + OWP + x/(x-1)*OOWP, or
RPI = WP + [1 + 1/(x*x-x)]*OWP + OOWP.
The first formula is more correct since a team's opponents' opponents' opponents and beyond are more correlated with opponents' opponents than with opponents; the second is more popular since teams prefer to have their rating more determined by factors they can control (whom they play). The exact value of x will vary from sport to sport, and is primarily a function of the fraction of games played against conference opponents. Given the values of x measured from real-life teams, the ideal RPI formulae would be:
RPI = 0.23*WP + 0.23*OWP + 0.54*OOWP or
RPI = 0.27*WP + 0.46*OWP + 0.27*OOWP (college basketball),
RPI = 0.21*WP + 0.21*OWP + 0.58*OOWP or
RPI = 0.24*WP + 0.52*OWP + 0.24*OOWP (college baseball),
RPI = 0.18*WP + 0.18*OWP + 0.64*OOWP or
RPI = 0.22*WP + 0.52*OWP + 0.26*OOWP (college football), and
RPI = 0.17*WP + 0.17*OWP + 0.66*OOWP or
RPI = 0.19*WP + 0.62*OWP + 0.19*OOWP (college hockey).
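The weights can be recomputed from the measured 1/x values by normalizing each formula's coefficients to sum to 1. In the sketch below the names `INV_X`, `normalized`, and `reinvented_weights` are hypothetical. Under this reading the first (OOWP-heavy) form matches the listed values to two decimals for all four sports, and the second form agrees to within about 0.01 for basketball, baseball, and hockey; the second football row is not reproduced by it, so treat the sketch as illustrative only.

```python
# Measured 1/x values from the text.
INV_X = {"basketball": 0.574, "baseball": 0.638,
         "football": 0.719, "hockey": 0.742}

def normalized(weights):
    """Rescale a tuple of weights so that they sum to 1."""
    total = sum(weights)
    return tuple(w / total for w in weights)

def reinvented_weights(inv_x):
    """Return (WP, OWP, OOWP) weights for both forms of the formula."""
    x = 1.0 / inv_x
    # RPI = WP + OWP + x/(x-1)*OOWP
    oowp_heavy = normalized((1.0, 1.0, x / (x - 1.0)))
    # RPI = WP + [1 + 1/(x*x-x)]*OWP + OOWP
    owp_heavy = normalized((1.0, 1.0 + 1.0 / (x * x - x), 1.0))
    return oowp_heavy, owp_heavy

for sport, inv_x in INV_X.items():
    first, second = reinvented_weights(inv_x)
    print(sport,
          [round(w, 2) for w in first],
          [round(w, 2) for w in second])
```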
The basketball and baseball formulae above are thus extremely close to those used by the NCAA; the football and hockey ones place lower weight on winning percentage than do the NCAA versions. (Even so, college basketball and baseball would be better served by adopting the top versions with something like a 0.25/0.25/0.50 weighting than the formulae they actually use.)
As described in the poll description, human voters tend to underemphasize the importance of schedule by 25%. Thus, in terms of producing human-like rankings, the RPI formulae would be:
RPI = 0.35*WP + 0.41*OWP + 0.24*OOWP (college basketball),
RPI = 0.31*WP + 0.47*OWP + 0.22*OOWP (college baseball),
RPI = 0.28*WP + 0.48*OWP + 0.24*OOWP (college football), and
RPI = 0.25*WP + 0.46*OWP + 0.29*OOWP (college hockey).
It is worth noting that the "RPI" formulae used by the BCS and in college hockey are quite close to these ratios. The basketball and baseball RPIs are more "computer-like".
Hockey Pairwise Rankings
The final "standard" rating is the hockey pairwise ratings. The NCAA hockey tournament selection process is done by computer, but unlike the BCS the computer was programmed to think in human terms. Instead of giving a score for each team, a comparison is made between every pair of teams under consideration (0.500 or better), using the following points of comparison:
- overall RPI
- record against other teams above 0.500
- record in last 16 games
- head-to-head results
- record vs. common opponents
Each team gets 1 point for each head-to-head win and 1 point for winning each of the other four comparisons. The team with the most "points" wins the comparison, with a tie broken by RPI. Teams are then ordered based on the number of comparisons won; ties again broken by RPI rating. The top 6 at-large teams are selected. The pairwise rating shown on my ranking page is the percent of comparisons won rather than the number of comparisons.
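The comparison procedure described here can be sketched as follows. This is an illustration only, not the selection committee's program: it assumes every criterion has already been reduced to a number where higher is better, and the names `compare`, `pairwise_ranking`, `criteria`, `common_wp`, and `h2h_wins`, along with the sample figures, are all made up for the example.

```python
def compare(a, b, criteria, common_wp, h2h_wins):
    """Return the winner of the pairwise comparison between a and b.

    criteria: team -> {"rpi", "vs_winning", "last16"} values.
    common_wp: (team, other) -> team's record vs. opponents shared with other.
    h2h_wins: (winner, loser) -> number of head-to-head wins.
    """
    # One point per head-to-head win.
    pts_a = h2h_wins.get((a, b), 0)
    pts_b = h2h_wins.get((b, a), 0)
    # One point for winning each of the other four comparisons.
    pairs = [(criteria[a][k], criteria[b][k])
             for k in ("rpi", "vs_winning", "last16")]
    pairs.append((common_wp.get((a, b), 0.0), common_wp.get((b, a), 0.0)))
    for val_a, val_b in pairs:
        if val_a > val_b:
            pts_a += 1
        elif val_b > val_a:
            pts_b += 1
    if pts_a != pts_b:
        return a if pts_a > pts_b else b
    return a if criteria[a]["rpi"] >= criteria[b]["rpi"] else b  # tie-break

def pairwise_ranking(teams, criteria, common_wp, h2h_wins):
    """Order teams by comparisons won, ties broken by RPI."""
    won = {t: 0 for t in teams}
    for i, a in enumerate(teams):
        for b in teams[i + 1:]:
            won[compare(a, b, criteria, common_wp, h2h_wins)] += 1
    order = sorted(teams, key=lambda t: (won[t], criteria[t]["rpi"]),
                   reverse=True)
    percent = {t: 100.0 * won[t] / (len(teams) - 1) for t in teams}
    return order, percent

criteria = {
    "X": {"rpi": 0.60, "vs_winning": 0.55, "last16": 0.70},
    "Y": {"rpi": 0.58, "vs_winning": 0.60, "last16": 0.60},
    "Z": {"rpi": 0.50, "vs_winning": 0.40, "last16": 0.50},
}
common_wp = {("X", "Y"): 0.5, ("Y", "X"): 0.5, ("X", "Z"): 0.7,
             ("Z", "X"): 0.4, ("Y", "Z"): 0.6, ("Z", "Y"): 0.5}
h2h_wins = {("Y", "X"): 2}  # Y beat X twice
order, percent = pairwise_ranking(["X", "Y", "Z"], criteria, common_wp, h2h_wins)
```

In this made-up example Y wins its comparison with X by 3 points to 2 despite the lower RPI, because its two head-to-head wins outweigh X's edge in RPI and recent record.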
I consider this pairwise system to be inferior to the one I created for basketball and baseball, because of its many simplifications, such as the use of integers only and the limited amount of data considered. However, it is a huge step in the right direction (compared with RPI or BCS ratings), in that team-to-team comparisons are used rather than raw scores according to some ranking.
Note: if you use any of the facts, equations, or mathematical principles introduced here, you must give me credit.
copyright ©2001-2002 Andrew Dolphin