The Rating Adjustment Algorithm
The MyPoolStats Ratings use the "Fargo" method: ratings are determined mathematically from the game scores of competitive matches. A description of the system and its mathematics can be found in the Fargo documentation. Here we discuss the system at a more general level, which is more useful for regular users who want to understand how it works.
Here is a simplified form of the algorithm.
Rating adjustment = (actual versus expected performance) * (fit modifier) * (scaling factor)
Actual Versus Expected Performance
The difference between the players' ratings is used to determine the expected match outcome. If the match score matches the expected result, then the players performed as expected and the rating change for both players is zero. Rating changes occur when the final match score differs from the expected result. The rating for the player that over-performed goes up, and the rating for the player that under-performed goes down. The amount of change depends on how different the final match score is from the expected result. If only one game is played between the players (e.g. for a league match), the winner's rating will go up and the loser's rating will go down. This may seem counter-intuitive at first, but keep in mind that neither player is expected to win every game, so winning a single game always beats the expectation.
For example, consider a race to 5 where, according to the players' ratings, you are expected to win 3 out of 9 games (33%). When the match is played, however, you actually win 4 (44%). The difference (11%) is what is used in the adjustment. Adjustments are proportional to how different the match outcome was from the expected one. So, for example, if someone is rated incorrectly or plays exceptionally better or worse than average, the rating adjustment will be larger.
The formula takes the number of games in the match into account because it looks at the fraction of games won out of the total games played in the match. Longer races provide slightly more accurate rating adjustments; however, the overall effect is that the rating reflects your average performance. The ratings will move up and down from match to match, but the overall trend is quite stable in the long run. You can see this by viewing a random selection of players' rating history graphs.
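The arithmetic in this example can be written out as a short sketch. The function name and its parameters are illustrative only and are not taken from the MyPoolStats or Fargo code.

```python
def performance_difference(games_won: int, games_played: int,
                           expected_fraction: float) -> float:
    """Actual fraction of games won minus the expected fraction."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return games_won / games_played - expected_fraction


# The example above: expected to win 3 of 9 games, actually won 4 of 9.
diff = performance_difference(games_won=4, games_played=9, expected_fraction=3 / 9)
print(f"{diff:.1%}")  # 11.1%
```

A positive value means the player over-performed; the opponent's value for the same match is the same size with the opposite sign.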
The system adjusts new players quickly at first and then tapers off as the player becomes more established. Ratings for new players therefore tend to fluctuate more while the system figures out where they belong, and these fluctuations smooth out as the player becomes more established in the system. Another reason for this approach is to prevent new players from affecting established players' ratings quite as much. The system recognizes that the new player's rating could be less accurate, so their rating will change more than the established player's rating.
The fit value is what the system uses to distinguish new players from established players. It also describes the rating's reliability: a higher fit value indicates the system has more information on which to base the rating.
For games like 8-ball, 9-ball, and 10-ball, the fit is the number of games the rating is based on. However, the system is capable of calculating ratings for other games as well. It works very well for straight pool and likely for games like billiards, 1-pocket, and banks. In these cases, the fit value would be the number of balls the rating is based on.
The fit modifier is constant up to a fit of 40, then decreases steadily until the player reaches a fit of 400, after which it stays constant again. These are not magic numbers; a good way to think of them is as a way to "tune" the system. These values ensure that new players are adjusted quickly to home in on their rating, but not so quickly that their rating overshoots and moves too far from their ability. They also allow the system to respond to the natural slumps and hot streaks of established players, but not so much that it becomes easy to manipulate a rating. Consider the case where an established player is out of the game for a lengthy period of time. When they return, they will still be at the same rating, but their rating will adjust to their new level after a few events. Some may feel this is a weak point that sandbaggers could exploit (more on this later); however, these natural trends in players' games have a very different fingerprint than malicious behaviors.
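The shape described here (flat, then falling, then flat) can be sketched as a piecewise function. Only the break points 40 and 400 come from the text. The modifier values 1.0 and 0.25 and the straight-line decrease between them are assumptions made for illustration; the actual values and curve are not given here.

```python
def fit_modifier(fit: float,
                 low_fit: float = 40, high_fit: float = 400,
                 max_modifier: float = 1.0, min_modifier: float = 0.25) -> float:
    """Flat up to low_fit, decreasing until high_fit, flat afterwards.

    max_modifier, min_modifier, and the linear ramp are hypothetical.
    """
    if fit <= low_fit:
        return max_modifier
    if fit >= high_fit:
        return min_modifier
    # Fraction of the way from low_fit to high_fit (0.0 to 1.0).
    progress = (fit - low_fit) / (high_fit - low_fit)
    return max_modifier + progress * (min_modifier - max_modifier)


for fit in (10, 40, 220, 400, 1000):
    print(fit, fit_modifier(fit))
```

Under these assumptions a brand-new player's rating moves four times as far as a fully established player's rating for the same match result.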
Ratings are color-coded to indicate their reliability: red indicates a preliminary rating, green is a well-established rating, and yellow is anywhere in between. In addition to the fit, the system also tracks how many events the player has participated in. The color coding is based on the fit, number of events, and the date the rating was last updated.
An important thing to keep in mind is that a player rating is just an average value of a wide range of ability. It is kind of like trying to measure temperature across an entire state. A person's game varies day to day, opponent to opponent, on and off season. Many players are wary of rating systems for this reason, having had prior experiences with other systems that look at very little information or are not updated regularly. This system addresses each of these concerns directly. Any one measurement could be close, but the best way to find a player's level is to look at as much information as possible. That is why the Fit value is such an important piece of information: it tells you how much information the rating is based on. It is also important to consider when the rating was last updated. A player may not have been playing, or may be playing quite a bit without their events having been submitted to the system.
The ratings are put into a logarithmic scale, like the Richter scale for earthquakes, which makes it easier to cover the wide range of skill present. The rule of thumb here is that if the rating difference between two players is 100, the higher rated player is twice as good. So they are expected to win twice as many games, or for example, in a race to 8, the better player is expected to win 8-4. A rating difference of 200 means the higher rated player is 4X better than the lower rated player. A rating difference of 300 means the higher rated player is 8X better which means they are expected to win 8-1 in a race to 8!
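The rule of thumb above (every 100 rating points doubles the expected ratio of games won) can be sketched as follows. The function names are illustrative, and the code only encodes the doubling rule stated in this paragraph.

```python
def expected_win_ratio(rating: float, opponent_rating: float) -> float:
    """Games the player is expected to win for each game the opponent wins."""
    return 2 ** ((rating - opponent_rating) / 100)


def expected_win_fraction(rating: float, opponent_rating: float) -> float:
    """Expected share of all games played that the player wins."""
    ratio = expected_win_ratio(rating, opponent_rating)
    return ratio / (1 + ratio)


def expected_opponent_games(rating: float, opponent_rating: float,
                            race_to: int) -> float:
    """Games the lower-rated opponent is expected to win in a race."""
    return race_to / expected_win_ratio(rating, opponent_rating)


for diff in (100, 200, 300):
    ratio = expected_win_ratio(500 + diff, 500)
    games = expected_opponent_games(500 + diff, 500, race_to=8)
    print(f"difference {diff}: {ratio:g}X, expected score 8-{games:g}")
```

For a difference of 100 this gives a 2X ratio and an expected 8-4 score, and for 300 it gives 8X and 8-1, matching the figures quoted above.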
The game is 8-ball. Player 1 has a rating of 300 and a fit of 50. Player 2 has a rating of 200 and a fit of 200. The difference between the ratings is 100, so the better player is expected to win twice as many games. So, in a race to 6, player 1 is expected to win 6-3. If the final score turns out to be 6-4, then player 2's rating would go up and player 1's rating would go down. The amount of rating change also depends on the fit. Player 2 has a much more established rating, so their rating will change less than player 1's.
Check out the Algorithm Calculator to experiment with different ratings, fits, and match scores and find out how various combinations affect each player's rating adjustment.
A player is given an initial rating when their first match is entered into the system. This value is the only subjective aspect of the system and is estimated from another system or from known ability. It isn't important for the system that this value be exact, as the system quickly adjusts new players' ratings. However, if it is a new player in a handicapped event, it is important from the tournament perspective. This is discussed in more detail in the handicapping section.
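This example can be run through the simplified formula as a rough sketch. The expected share of games uses the 100-points-doubles rule of thumb. The fit modifier values (1.0 falling linearly to 0.25) and the scaling factor of 100 are hypothetical stand-ins, since the real values are not given here, so only the direction and relative size of the changes are meaningful.

```python
def expected_win_fraction(rating: float, opponent_rating: float) -> float:
    """Expected share of games won, using the 100-points-doubles rule."""
    ratio = 2 ** ((rating - opponent_rating) / 100)
    return ratio / (1 + ratio)


def fit_modifier(fit: float) -> float:
    """Flat up to fit 40 and from fit 400; values and linear ramp are assumed."""
    if fit <= 40:
        return 1.0
    if fit >= 400:
        return 0.25
    return 1.0 - 0.75 * (fit - 40) / 360


SCALING_FACTOR = 100.0  # hypothetical; not taken from the real system


def rating_adjustment(rating: float, fit: float, opponent_rating: float,
                      games_won: int, games_played: int) -> float:
    """(actual versus expected performance) * (fit modifier) * (scaling factor)."""
    actual = games_won / games_played
    expected = expected_win_fraction(rating, opponent_rating)
    return (actual - expected) * fit_modifier(fit) * SCALING_FACTOR


# Player 1: rating 300, fit 50. Player 2: rating 200, fit 200. Final score 6-4.
p1_change = rating_adjustment(300, 50, 200, games_won=6, games_played=10)
p2_change = rating_adjustment(200, 200, 300, games_won=4, games_played=10)
print(f"player 1: {p1_change:+.2f}, player 2: {p2_change:+.2f}")
```

Player 1 won 60% of the games against an expected 67%, so their change is negative. Player 2's change is positive and smaller in size, because the higher fit gives a smaller modifier.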
It is actually possible to start everyone at the same value; however, in practice it is better to have an appropriate start rating. This minimizes the adjustment period when a group of new players is introduced to the system.
The rating adjustment formula takes over from there. Manual adjustments are still possible but are rarely needed. Manual adjustments are always labeled so it is clear when and why they were performed.
The system functions best when there is a good mixing of players rather than having groups of players that only play in their local area. If a group of players is relatively isolated, their ratings could become slightly inflated or deflated compared to others in the system. These sorts of offsets are typically fairly small, but they are something the system administrators need to consider and watch for.
For this reason, ratings may be "optimized" periodically to fine-tune the ratings of established players. To explain: during regular operation, the system adjusts the two players' ratings for each match. However, depending on who is playing whom and how often, some players' ratings can end up slightly different from their optimal value. The optimization process is able to look at all the matches at once and make rating adjustments accordingly.
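One standard way to fit ratings to all matches at once is a maximum-likelihood fit of a Bradley-Terry model, sketched below. This is an assumption chosen for illustration; the text does not say which method the MyPoolStats optimization actually uses. The function name, the match tuple layout, and the anchor value of 500 are all hypothetical.

```python
import math
from collections import defaultdict


def optimize_ratings(matches, anchor_mean=500.0, iterations=200):
    """Fit ratings to every match at once (Bradley-Terry, iterative updates).

    matches: list of (player_a, player_b, games_won_by_a, games_won_by_b).
    Assumes every player has won at least one game and that all players
    are linked to each other through some chain of matches.
    """
    wins = defaultdict(float)        # total games won per player
    pair_games = defaultdict(float)  # games played per (player, opponent)
    for a, b, won_a, won_b in matches:
        wins[a] += won_a
        wins[b] += won_b
        pair_games[(a, b)] += won_a + won_b
        pair_games[(b, a)] += won_a + won_b

    players = sorted(wins)
    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(n / (strength[p] + strength[q])
                        for (x, q), n in pair_games.items() if x == p)
            updated[p] = wins[p] / denom
        # Rescale so the numbers stay in a comfortable range.
        total = sum(updated.values())
        strength = {p: s * len(players) / total for p, s in updated.items()}

    # Doubling the strength corresponds to 100 rating points.
    raw = {p: 100 * math.log2(strength[p]) for p in players}
    shift = anchor_mean - sum(raw.values()) / len(raw)
    return {p: r + shift for p, r in raw.items()}


ratings = optimize_ratings([("A", "B", 20, 10), ("B", "C", 20, 10)])
print({name: round(value) for name, value in ratings.items()})
```

In this toy data A wins twice as many games as B, and B twice as many as C, so the fitted ratings come out 100 points apart. The fit fixes only the differences, so the average has to be pinned to a chosen anchor.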
In addition, there may be times when ALL the player ratings are adjusted up or down by the same amount. This system is excellent at determining relative differences between players. In other words, player A is better than player B and player B is better than player C. However, it cannot automatically determine where the entire group of players lies on an absolute scale. So it is necessary to periodically review and manually adjust ALL the ratings to align the system to general expectations for player ability. Because ALL the players are adjusted up or down BY THE SAME AMOUNT, it has no effect on the rating system as a whole.
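A quick check shows why a uniform shift is harmless: expected results depend only on rating differences. The sketch below uses the 100-points-doubles rule of thumb, and the player names and ratings are made up for illustration.

```python
import math


def expected_win_fraction(rating: float, opponent_rating: float) -> float:
    """Expected share of games won, using the 100-points-doubles rule."""
    ratio = 2 ** ((rating - opponent_rating) / 100)
    return ratio / (1 + ratio)


ratings = {"A": 620.0, "B": 540.0, "C": 455.0}        # made-up values
shifted = {name: r + 25 for name, r in ratings.items()}  # everyone moved up 25

for first, second in (("A", "B"), ("B", "C"), ("A", "C")):
    before = expected_win_fraction(ratings[first], ratings[second])
    after = expected_win_fraction(shifted[first], shifted[second])
    print(first, second, math.isclose(before, after))
```

Every pairing gives the same expected result before and after the shift, so future rating adjustments are unaffected.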
Rating optimization updates are always labeled so it is clear when and why changes were made.
Resistance to Sandbagging
Most handicapping systems are easily manipulated and this is a serious problem. While most players are honest, a small number of unscrupulous players begin trying to cheat any new system. Then when other players hear about this they feel they need to join in or be played the fool. The best way to deal with this problem is to use a system like this one that is open, transparent, and naturally resistant to manipulation.
Here are the ways an unscrupulous player might attempt to manipulate the system and how they are managed.
A new player could manage to enter the system with an inappropriate initial rating. For an extreme example, suppose a player with an actual skill of 600 is put in as a 400. This is where the Fit Modifier portion of the formula comes into play as new players experience rapid rating adjustments. So the player would quickly be adjusted to an appropriate rating as their events are put into the system.
Fraudulent entry could be a more serious problem when a player's first or an early event is a high-entry-fee, high-payout handicapped event. There are two simple ways to deal with this: any player with a Fit under a certain threshold either may not enter or has their rating adjusted after each match. For larger, unhandicapped events limited to certain skill levels, such as an under-500 tournament, a minimum Fit can be required to enter.
An unscrupulous player may attempt to have matches reported in a way that gives them a slightly more favorable rating adjustment. The single best defense against fraudulent reporting is the simple fact that a player cannot gain an unfair advantage without causing a direct unfair disadvantage to the winning player of the same match. The two players have opposing incentives.
Intentionally Losing Games
One strategy for the unscrupulous player is to lie down during league or casual tournaments only to come alive for the high-stakes tournaments. A simple way for the system to deal with this situation is to decrease rating adjustments for more casual events and increase them for more serious events.
Many players will have a rating for several different games. These ratings are calculated independently of each other, so any sandbagger would need to try to cheat all the ratings, not just one. It is perfectly normal for a player to be slightly stronger in some games than others, but it raises red flags when a player has drastically different ratings.
There are also statistical signatures that emerge for players attempting to manipulate the system. A score pattern that differs significantly from expectation is straightforward to detect.
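One simple form such a check could take is a z-score comparing a player's total games won with the total their rating predicts. This is an illustrative sketch, not the system's actual test. It treats every game as an independent trial, which is only an approximation, and uses the 100-points-doubles rule of thumb for the expected share of games.

```python
import math


def expected_win_fraction(rating: float, opponent_rating: float) -> float:
    """Expected share of games won, using the 100-points-doubles rule."""
    ratio = 2 ** ((rating - opponent_rating) / 100)
    return ratio / (1 + ratio)


def performance_z_score(rating: float, matches) -> float:
    """Standard deviations between actual and expected games won.

    matches: list of (opponent_rating, games_won, games_played).
    Each game is treated as an independent trial (an approximation).
    """
    actual = expected = variance = 0.0
    for opponent_rating, games_won, games_played in matches:
        p = expected_win_fraction(rating, opponent_rating)
        actual += games_won
        expected += p * games_played
        variance += games_played * p * (1 - p)
    if variance == 0:
        raise ValueError("no games to evaluate")
    return (actual - expected) / math.sqrt(variance)


# A 500-rated player who wins only 30 of 100 games against 500-rated opponents.
print(performance_z_score(500, [(500, 30, 100)]))  # -4.0
```

A value near zero is ordinary variation, while a large negative value sustained over many games is the kind of pattern that could prompt a closer look.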
Player reports of suspicious activity, as well as automated checks by the system, can trigger an investigation. These problems can be quickly and easily corrected with manual rating adjustments or by alerting the appropriate pool event organizer about the offending player.
While the possibility of manipulating the system can never completely be eliminated, the fact that every game contributes to a player’s rating makes that manipulation much more difficult. Sandbaggers find it difficult to prosper by hiding their true ability. It takes a lot of discipline, time, and cost in entry fees to really cheat the system and any gains are short-lived.
A huge advantage of the system is its ability to objectively determine detailed and accurate ratings. There are no issues with broad, subjective skill divisions with vague and easily abused boundaries that are common in manual systems. Skill is a big source of pride for pool players and having legitimate ratings and public rankings is a great way to encourage players to improve their game.