Dr. Tuen
The TGL Ranking System
Table of Contents
Introduction
Motivation
Theory: Placement System
Theory: Gain-Loss System
Automation
Beta Test Results
Future Work
Closing Remarks
INTRODUCTION
The TGL Power Ranking System is a new and innovative method for ranking individuals in the Super Smash Brothers community. Using a system which accounts for each win and loss in tournament play, the TGL power ranking system has the potential to exceed the ranking accuracy currently seen in panel-based systems. While this accuracy has not yet been achieved, future work and constant dedication will steer this project toward the future of smash power rankings.
It should be noted here that the drawbacks of any given system, including the TGL PR System, are not overlooked. Instead, they are analyzed and solutions for future numerical power ranking systems are considered.
MOTIVATION
The original motivation for this project came from the lack of an Oregonian Power Ranking. General lack of interest made the formation of a panel difficult. Under the idea that numerical power ranking systems could be run by one person, various ideas were put forth and tested lightly. Eventually a gain-loss system was implemented and the system has been steadily improving ever since.
The biggest issue being addressed here is the accuracy of panel-based power rankings. On the regional level, it is difficult for a panel to discern, with high accuracy, the places of the players ranking 3rd and below. On higher scales, the error may even extend to the 2nd and 1st place positions. This error occurs because while the first few places may be obvious, the places below that become very dependent on several factors which must be accounted for simultaneously for accurate placement. These factors include: tournament attendance, average placement, highest placement, lowest placement, general reputation, and performance against other ranked players.
The benefits of a gain-loss system include removing the reputation factor, tabulating all wins and losses, accounting for new players, and carrying over interactions between tournaments.
The drawbacks include the following: high work load, complicated factors, and difficulties in accounting for a lack of tournament attendance.
THEORY: Placement System
This section of the overview covers the theory behind the two numerical systems that have been considered. The first of the two is the Placement System. The placement system is an easy numerical system to put into place and follows these principles:
**Players who place in a tournament receive points
**Players who place under a certain threshold do not receive any points
**Points are carried over in between tournaments
**Players are ranked based on their accumulated points over the tabulation period
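The principles above can be sketched in a few lines of Python. This is only an illustration: the point table, the threshold (4th place), and the player names are all made up for the example, since no actual point values are given here.

```python
from collections import defaultdict

# Hypothetical point table; placements not listed fall under the threshold.
POINTS_BY_PLACE = {1: 10, 2: 7, 3: 5, 4: 3}

def tabulate_placements(tournaments):
    """Sum placement points over a list of tournaments.

    Each tournament is a dict mapping player name -> final placement.
    Placements missing from POINTS_BY_PLACE earn no points.
    """
    totals = defaultdict(int)
    for results in tournaments:
        for player, place in results.items():
            totals[player] += POINTS_BY_PLACE.get(place, 0)
    # Rank by accumulated points, highest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical players and results for two events.
season = [
    {"alice": 1, "bob": 2, "carol": 3, "dave": 5},
    {"bob": 1, "alice": 2, "dave": 3, "carol": 4},
]
ranking = tabulate_placements(season)
for name, points in ranking:
    print(name, points)
```

In this made-up season, alice and bob end up tied even though each beat the other once, which hints at how unclear the results can get.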
Benefits
**Easy to implement
**Accounts for player attendance
Drawbacks
**The difference between players placing 3rd and below can become very small, making the results somewhat unclear in some situations
**Players who create an early lead are not punished for huge losses
**Catching up to top players, even through beating them directly, is very difficult
Overall, the placement system does not yield very good results because the mobility of players below 3rd place is low. It is difficult to account for players who truly improve their average performance without resetting the entire power ranking.
THEORY: Gain-Loss System
The gain-loss system is the primary system used in the TGL Power Ranking System. This system acts on the idea that every match is important. Every single interaction in every single tournament is analyzed and taken into consideration when an event is tabulated.
Terms
WinnerScore = the winner’s starting score
LoserScore = the loser’s starting score
NewWinnerScore = the winner’s new score after a match
NewLoserScore = the loser’s new score after a match
NPI = New Player Index
These terms are used in the gain-loss equations, which are the backbone of the TGL power ranking system. Every match results in a new score for both players: the winner gains points and the loser loses points. The new scores are calculated as follows:
NewWinnerScore = WinnerScore + 100*(LoserNPI/20)*(LoserScore/WinnerScore)
NewLoserScore = LoserScore - 50*(WinnerNPI/20)*(LoserScore/WinnerScore)
There are a number of effects that must be described for these equations to make sense.
Primary Score Interaction
The ratio on the far right of each equation (LoserScore/WinnerScore) is the primary score interaction in these computations.
From the Winner’s Perspective
If the winner has an expected win (LoserScore < WinnerScore), then the key ratio will be less than one and the winner will gain very few points. If the winner’s victory is unexpected (WinnerScore < LoserScore), then the winner will gain many points because the key ratio will be greater than one.
From the Loser’s Perspective
If the loser experiences an expected loss (LoserScore < WinnerScore), then the key ratio will be less than one and the point loss will be minimal. If the loser’s loss is unexpected (WinnerScore < LoserScore), then the loser will sustain a large loss in points because the key ratio will be greater than one.
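To make the expected and unexpected cases concrete, here is a small Python sketch (not the MATLAB program itself). It assumes the 100x and 50x terms are the points added to the winner and taken from the loser, and it uses made-up scores of 2000 and 1000 with both players at an NPI of 20.

```python
def gain_loss_update(winner_score, loser_score, winner_npi=20, loser_npi=20):
    """Return (new_winner_score, new_loser_score) for one match.

    Assumption: the gain-loss terms are score changes, added to the
    winner's score and subtracted from the loser's score.
    """
    ratio = loser_score / winner_score      # the key ratio
    gain = 100 * (loser_npi / 20) * ratio   # points the winner gains
    loss = 50 * (winner_npi / 20) * ratio   # points the loser loses
    return winner_score + gain, loser_score - loss

# Expected win: the 2000-point player beats the 1000-point player (ratio 0.5).
expected = gain_loss_update(2000, 1000)
print(expected)  # (2050.0, 975.0)

# Upset: the 1000-point player beats the 2000-point player (ratio 2.0).
upset = gain_loss_update(1000, 2000)
print(upset)     # (1200.0, 1900.0)
```

The expected win moves the scores by 50 and 25 points, while the upset moves them by 200 and 100.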
New Player Index
This is a number that ranges from 10 to 20. When a new player enters their first tournament which is analyzed by this PR system, they are assigned an NPI of 10. This index increases by 1 for every match they play, up to the maximum of 20.
This index helps reduce any new player effects. The index is applied to the new player’s opponent, which can cut that player’s score change in half. Here is why this works:
Player A is new (NPI of 10)
Player B is a veteran (NPI of 20)
Player A is BETTER than player B
Without the NPI, Player A will beat Player B and strip hundreds of points from that player. As this occurs, Player B falls in the PR and Player A rises to his or her true score. When the tournament is over, Player A is at his or her appropriate score, and Player B’s score is lower than Player B’s true score.
With the NPI, Player A will beat Player B and Player B’s score loss will be reduced by Player A’s NPI. Since Player B’s NPI is 20, Player A’s score gain is unaffected. This allows Player A to rise to his or her appropriate score without negatively affecting Player B. This is the positive effect of the New Player Index.
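The Player A / Player B case can be worked through numerically. The sketch below assumes Player A starts at 1000 and, purely for illustration, that Player B sits at 1500. It also assumes the NPI simply stops rising once it reaches 20.

```python
def npi_after(matches_played):
    """NPI starts at 10 and rises by 1 per match; capped at 20 (assumed)."""
    return min(20, 10 + matches_played)

def score_changes(winner_score, loser_score, winner_npi, loser_npi):
    """Return (winner_gain, loser_loss) using the gain-loss terms."""
    ratio = loser_score / winner_score
    gain = 100 * (loser_npi / 20) * ratio  # scaled by the LOSER's NPI
    loss = 50 * (winner_npi / 20) * ratio  # scaled by the WINNER's NPI
    return gain, loss

# Player A: brand new (NPI 10). Player B: veteran (NPI 20), score 1500 (made up).
with_npi = score_changes(1000, 1500, npi_after(0), npi_after(10))
without_npi = score_changes(1000, 1500, 20, 20)

print(with_npi)     # (150.0, 37.5): A's gain is intact, B's loss is halved
print(without_npi)  # (150.0, 75.0)
```

Player A gains the same 150 points either way, but Player B only loses 37.5 points instead of 75.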
Benefits
**Accounts for all player interactions on all skill levels, yielding accuracy throughout the rankings
**Can properly account for new players entering a previously established power ranking
**Allows for players to overtake first place, should their performance warrant such movement
**Scores tend to be far separated from each other, so scores and placements are clear
Drawbacks
**Attendance can affect placement
**The work load associated with tabulating each match is tremendous (hundreds of computations per tournament)
**Inconsistent gain-loss distribution
Automation
This section addresses one of the main drawbacks of the gain-loss power ranking format: the work load. A program has been written that has a number of capabilities. First, a couple of explanations:
**The code is written in MATLAB. Actual coders will probably have a good laugh at that, but it’s true. When I asked several coders at Oregon State for assistance, all of my requests were turned down, so I learned how to program using the most common language for Chemical Engineers (the area I have a degree in).
**TIO files can be converted to text files for easy handling
**Inside the TIO file, each player is paired with an ID number, which is used for all events in the tournament
**An Excel file is used for score archiving. A score progression from event to event can be viewed after the PR calculations are finished.
The program has a number of features:
**Events of any size can be tabulated
**Byes are properly accounted for, and have no effect on player score
**Grand Finals and Double Grand Finals are properly tabulated
**Names are checked individually to account for multiple pseudonyms and misspellings
**Names, Scores, ID numbers, and NPI are all properly paired before the tournament is analyzed
**New scores and names are recorded automatically
**New players are automatically given a new row in the Archive file, a score of 1000, and an NPI of 10
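Two of these features, new-player defaults and byes, are easy to picture in code. The Python sketch below is not the MATLAB program; a plain dict stands in for the Excel archive, the `(winner, None)` representation of a bye is made up for the example, and the NPI of 20 given to felix is an assumption.

```python
NEW_PLAYER_SCORE = 1000  # starting score for a new player
NEW_PLAYER_NPI = 10      # starting New Player Index

def pair_entrants(archive, entrant_names):
    """Return {name: record} for every entrant, adding unseen names to
    the archive with the new-player defaults."""
    for name in entrant_names:
        if name not in archive:
            archive[name] = {"score": NEW_PLAYER_SCORE, "npi": NEW_PLAYER_NPI}
    return {name: archive[name] for name in entrant_names}

def drop_byes(matches):
    """Keep only real matches. A bye is represented here (hypothetically)
    as a (winner, None) pair, so it never reaches the score update."""
    return [(winner, loser) for (winner, loser) in matches if loser is not None]

# Stand-in archive with one known player and one first-time entrant.
archive = {"felix": {"score": 2714, "npi": 20}}
entrants = pair_entrants(archive, ["felix", "newcomer"])
matches = drop_byes([("felix", None), ("felix", "newcomer")])

print(entrants["newcomer"])  # {'score': 1000, 'npi': 10}
print(matches)               # [('felix', 'newcomer')]
```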
Drawbacks
**NAMES. The program operator must have a good social knowledge of the players in question to cover the possibility of multiple names, misspellings, and other mistakes. The largest source of errors is through extra pseudonyms created for fun or via a name change the player undergoes at one point or another. If this is not properly accounted for, the player is given a new score and the interactions from there on out are inaccurate.
Beta Test Results
The finished program underwent a beta test using pertinent tournaments from the North West Region (Oregon, Washington and Idaho). Below is a list of tournaments included:
TP3 Pools http://allisbrawl.com/ttournament.aspx?id=3482
TP3 Pro http://allisbrawl.com/ttournament.aspx?id=3482
GC November http://allisbrawl.com/ttournament.aspx?id=7726
GC December http://allisbrawl.com/ttournament.aspx?id=7916
GO 3.0 http://allisbrawl.com/ttournament.aspx?id=6534
GC January http://allisbrawl.com/ttournament.aspx?id=7917
PRI Smash II http://allisbrawl.com/ttournament.aspx?id=7931
TP4 http://allisbrawl.com/ttournament.aspx?id=7727
Using these tournaments, the following results were obtained:
1 felix 2714
2 jem 2256
3 nerd 1982
4 carlos 1940
5 pwneroni 1901
6 bladewise 1863
7 valdens 1845
8 gage 1729
9 zeionut 1704
10 itakio 1660
11 sagemoon 1624
12 chip 1616
13 weruop 1571
14 c!z 1556
15 eggz 1538
16 t1mmy 1537
17 mr.b0jangle 1498
18 uchiha78 1412
19 tuen 1383
20 dr.mario12 1373
Oregonian players occupy positions 5 (pwneroni), 8 (gage), 16 (t1mmy), and 19 (tuen). All other players are Washington players.
Looking through the tournaments, there are player-based arguments that can justify the scores of each player. To go through these separately would be lengthy and tedious. Feel free to inspect the performance of any individual player.
Noted Criticism
These numbers are not without error. Here are some of the issues that need to be worked out as this project moves forward:
Tournament Inclusion
The addition of PRI II (an Oregonian tournament) without including IES (a Washington tournament) has been noted. This prompted a discussion about which tournaments should be added and in what context. It has been suggested that regional tournaments with a certain degree of region interaction are the only tournaments which warrant tabulation on a regional scale. Other arguments ask for inclusion of many region-specific tournaments while keeping a balance between those regions in doing so.
Attendance
In this Power Ranking, C!Z is an anomaly. He has not attended an event since one of the first tournaments in the list and is still ranked 14th. Without a method for decaying a player’s score based on attendance, this kind of error can occur. Carlos has been noted as another attendance anomaly, having only attended two of the events listed. His performance at one of them was outstanding (a run which includes TWO defeats of the current #1 ranked player), but his attendance record still calls for some kind of minor decay to his score, in some opinions.
Regional attendance with respect to regional tournament inclusion creates another layer of complication. If a player can only attend events in their state due to financial reasons, what is to be done about missing out-of-state regionals? Do players who still attend their in-state events (regionals and state-specific) still experience attendance decay?
The automation for this kind of decay is not a problem. But as discussed here, the appropriate implementation is.
Gain-Loss Distribution
This is an issue because the benefit a player receives for beating someone of near-equal score is not balanced against the benefit for beating someone with a much higher score. This will be covered in more detail in “Future Work”.
Future Work
This section details the work that is currently being done on this project. The project will attempt to move forward whenever possible, seeking higher accuracy and a wider spectrum of application within the Smash Community.
Gain-Loss Distribution
This is a primary concern with this project. Two players who have exactly equal scores (this can happen if two new players play each other) experience high levels of fluctuation because their score exchange is not too different from the score exchange of two players whose scores differ by a factor of 1.5.
This occurs because the points gained are linearly related to the score ratio (LoserScore/WinnerScore). Ratios over 1 imply an upset (the winner had a lower score than the loser), and ratios under 1 imply the winner had the higher score. As the ratio increases, the points gained increase linearly as well. This creates very little difference between beating someone of equal score to you (exchange about 100 points) and someone who is 1.5 times your score (exchange about 150 points).
The other issue is the fact that point gain is UNBOUNDED. Upsets, while great, should not yield hundreds and hundreds of points. Statistically speaking, all players have some minute chance of beating the best in the world. If a player does, it may be an uncharacteristically good performance. If that player’s average tournament performance changes and that player starts beating all the best players consistently, then the number one spot is not unearned. Otherwise, it is a single non-repeating occurrence which can throw off future tournament data.
To fix this, the ratio can be put through the Logistic Function. Here is more information on this curve:
http://en.wikipedia.org/wiki/Logistic_function
This function can be adjusted so that different values occur at different places. For its initial tests, it will be set so that players of equal score (key ratio is one) exchange 30 points, and players who have a difference of 1.5 times (key ratio is 1.5) exchange 90 points. Score exchanges are capped at 100 points.
Note, these large score changes are for unexpected wins. Expected wins and expected losses yield very few points of gain and very few points of loss.
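As a rough check on those targets, the Python sketch below fits a logistic curve of the form cap / (1 + exp(-k*(ratio - r0))) through the two stated points (30 points at a ratio of 1, 90 points at a ratio of 1.5) with a cap of 100. The specific form of the curve and the names `k` and `r0` are assumptions; the final tuning may differ.

```python
import math

CAP = 100.0  # maximum points exchanged in one match

def fit_logistic(r1, p1, r2, p2, cap=CAP):
    """Solve for steepness k and midpoint r0 so that
    cap / (1 + exp(-k * (r - r0))) passes through (r1, p1) and (r2, p2)."""
    # Rearranging the curve gives k * (r - r0) = ln(p / (cap - p)).
    z1 = math.log(p1 / (cap - p1))
    z2 = math.log(p2 / (cap - p2))
    k = (z2 - z1) / (r2 - r1)
    r0 = r1 - z1 / k
    return k, r0

K, R0 = fit_logistic(1.0, 30.0, 1.5, 90.0)

def points_exchanged(ratio):
    """Points exchanged for a given key ratio (LoserScore/WinnerScore)."""
    return CAP / (1.0 + math.exp(-K * (ratio - R0)))

for ratio in (0.5, 1.0, 1.5, 3.0):
    print(ratio, round(points_exchanged(ratio), 1))
```

With this fit, an expected win at a ratio of 0.5 exchanges only about 2 points, and even a huge upset stays under the 100 point cap.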
Tournament Categorization
In the Elo ranking system, which this is loosely based on, tournaments are ranked based on the average score of the players attending. THIS IS NOT A FUNCTION OF ATTENDANCE. Here’s an example. Sakura Con holds a tournament every year, and gets a very large turnout (80+ I think). Under this system’s ranking, their scores (if estimated properly) would be around 500-700 each. The average score would be fairly low. If you compare this to PRI II, an Oregon tournament held last January, which had 30-some-odd people attend, you’d get an average score of approximately 1200.
This tournament ranking would yield a small effect on the overall score change (5%), but it could be useful for improving accuracy.
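One possible way to turn the average score into a small multiplier is sketched below in Python. The mapping is entirely hypothetical: the baseline of 1000, the linear scaling, and the clamp are placeholders chosen only to keep the effect within 5%.

```python
from statistics import mean

def tournament_weight(entrant_scores, baseline=1000.0, max_effect=0.05):
    """Return a multiplier between 1 - max_effect and 1 + max_effect
    based on the average entrant score (hypothetical mapping)."""
    avg = mean(entrant_scores)
    relative = (avg - baseline) / baseline   # e.g. -0.4 for an average of 600
    clamped = max(-1.0, min(1.0, relative))  # keep the effect bounded
    return 1.0 + max_effect * clamped

# Made-up fields: a large, low-scoring event and a small, high-scoring one.
large_event = [600] * 80
small_event = [1200] * 30

print(round(tournament_weight(large_event), 3))  # 0.98
print(round(tournament_weight(small_event), 3))  # 1.01
```

The larger event gets the smaller weight, since the weight depends on the average score and not on the head count.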
Performance Rating
This is another Elo idea which allows statistically anomalous performances to be taken into account. Say, for instance, that a player ranked 17th won a single Northwest regional. After this regional, this person’s performance didn’t rise above 10th. That one win is an outlier overall.
This information can be used to make that one anomalous performance affect the PR slightly less (while still benefitting the one-time winner). It should be noted that when a player’s average performance changes (in this case, getting 2nd and 3rd consistently after that one win), then scores will change appropriately.
The calculations necessary for this factor to be appropriately implemented are not yet fully understood.
Attendance Correction
The basic idea for an attendance correction is to deduct points for missing a certain number of events in a row. However, with multiple regions under consideration, doing this without taking points in an unreasonable fashion is troublesome. No reasonable ideas have yet been put forth.
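Just to show the basic idea in code (and not as a proposal for the actual rule), here is a Python sketch. The grace period of 3 events, the 25 point penalty, and the floor of 1000 are all made-up numbers, and the question of which events should count as "missed" is left open.

```python
def apply_attendance_decay(score, missed_in_a_row, grace=3, penalty=25, floor=1000):
    """Deduct `penalty` points for each consecutive missed event beyond
    `grace`. Decay never pushes a score below `floor`, and scores already
    at or below `floor` are left alone. All parameters are hypothetical."""
    if score <= floor:
        return score
    extra_misses = max(0, missed_in_a_row - grace)
    return max(floor, score - penalty * extra_misses)

print(apply_attendance_decay(1556, 2))   # 1556 (still inside the grace period)
print(apply_attendance_decay(1556, 7))   # 1456 (4 misses past the grace period)
print(apply_attendance_decay(1050, 10))  # 1000 (stopped at the floor)
```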
Program Translation
The tournament tabulation program, the “TGL-9000”, is written in MATLAB. At some point, far in the future, this will be converted to Java, or something common like that, to facilitate widespread usage.
Closing Remarks
This project has been in the works for nearly three quarters of a year and is still going strong. The hope is that once some of these new ideas are in place, the inaccuracies of the system will fall below those of a panel-based power ranking. In truth, the system is tending towards displaying widely accurate results with blatantly obvious inaccuracies. In a hybrid system, panel and numerical, the panel could easily pick out the anomaly and have it removed.
With more hard work and support from the community, this could become the future of smash power rankings. Whether or not that actually happens is of no consequence to me; I am working through this project for the sake of my own curiosity. Those who believe in the system will follow it, and those who do not believe in the system will not. Either way, I intend on working on this until it reaches the efficiency and accessibility which will allow any player to tabulate interactions in their own regions.
A clear picture of who is ranked where can motivate players, invigorate a region, and create exciting rivalries. That clarity is all that this project seeks.
Until the next update, thanks for reading!
Best Regards,
Tuen
Head of the TGL Power Ranking Project
----
The original project page can be viewed here:
http://allisbrawl.com/group.aspx?id=9557
====
Updates:
4/15/2010
Programming
It looks like there are MATLAB clones out there available for download. I hear that none of them have 100% compatibility with MATLAB's ".m" files, but I don't think I used very complicated functions to begin with. After the equations settle out, this may be an accessible alternative to converting my nearly 1000 lines of code.
Efficiency
It may become useful to add this into the code: check for names that match the archive EXACTLY (100%). None of these names would have to be double checked and re-typed by the user. Depending on how accurately the TO inputs names, this could save the number cruncher a lot of time.
Drawback: when the scale of this project increases, the number of randoms who enter under their real name (say, eric) increases, and so do the chances of two different players sharing a name. It has also been shown that for fun and jokes, people like to enter tournaments under famous smash names (M2K, Larry, Ally, etc), which could be an issue for this lazy method.
Solution: Do a quick run through of the bracket (in TIO) before running it. If you have a good knowledge of your players, the fixes are easy.
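The exact-match idea could look something like the Python sketch below. It is not part of the MATLAB program. The close-match suggestions use Python's standard `difflib.get_close_matches`, which is an extra convenience added here, and the bracket names are made up.

```python
from difflib import get_close_matches

def split_entrants(bracket_names, archive_names):
    """Return (exact, needs_review).

    exact: bracket names that match an archive name character for character.
    needs_review: dict mapping every other bracket name to a list of similar
    archive names (possibly empty) for the operator to check by hand.
    """
    known = set(archive_names)
    exact = [name for name in bracket_names if name in known]
    needs_review = {
        name: get_close_matches(name, archive_names, n=3, cutoff=0.6)
        for name in bracket_names
        if name not in known
    }
    return exact, needs_review

archive_names = ["felix", "jem", "t1mmy"]
exact, needs_review = split_entrants(["felix", "timmy", "eric"], archive_names)

print(exact)         # ['felix']
print(needs_review)  # {'timmy': ['t1mmy'], 'eric': []}
```

An exact match still cannot tell a real player from a joke entry under a famous name, so the quick run through of the bracket is still needed.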
---
4/2/2010
The variable gain-loss parameter has been programmed into the automated tournament ranking system. With this adjustment in place, the TGL system becomes more and more like the Elo system, with some key differences. However, because of this, the system has inherited Elo's slow score convergence trait.
Scores in Elo can take YEARS to converge to their proper values. The smash community needs something that can adjust to players' changing skill levels in a matter of MONTHS. Adjustments will be made to the key equations until something suitable is set.
Once this is complete, an update for the Pac Northwest will be tabulated. After which, discussion for this project's expansion can begin.