Mathematically Calculated Tier List. SECOND RESULTS IN! First post updated.

Mogwai · Mar 27, 2007

I have an idea for computing the tier list using a complex analysis of matchups in a given metagame and I'm curious if people would be interested in this. What I'd like to do with this is take an initially perfectly diverse metagame (equal number of players playing each character), and then evaluate the respective power levels of every character using the following formula:

P = S(M * N)/T
P: Power
S: Sum over all characters (stands in for a sigma)
M: Matchup vs. a given character (Taken from Phanna chart)
N: Number of players playing a given character
T: Total number of Players
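
A minimal Python sketch of the power formula, for anyone who wants to see it as code. The three-character matchup table is invented purely for illustration (it is not taken from the Phanna chart), and the name power_levels is hypothetical.

```python
# Hypothetical 3-character matchup table on the 0-10 scale.
# matchups[a][b] is a's rating against b; the values are invented.
matchups = {
    "A": {"A": 5, "B": 6, "C": 8},
    "B": {"A": 4, "B": 5, "C": 7},
    "C": {"A": 2, "B": 3, "C": 5},
}


def power_levels(matchups, players):
    """P = (sum over opponents of M * N) / T, for every character."""
    total = sum(players.values())
    return {
        char: sum(row[opp] * players[opp] for opp in players) / total
        for char, row in matchups.items()
    }


# Perfectly diverse metagame: equal number of players per character.
players = {"A": 100, "B": 100, "C": 100}
powers = power_levels(matchups, players)
print(powers)
```

With equal player counts, each power is simply the average of that character's row in the table.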

Then I plan to have some form of formula to determine how players will change their character decisions based on the relative powers of each character. I was thinking of something simple, along the lines of:

Version 1.0
N = T * (P / S)
N: Number of players playing the character
T: Total number of players
P: Relative power of the character
S: Sum of all characters' powers

Version 1.1
N = T * (wP / wS)
N: Number of players playing the character
T: Total number of players
wP: Weighted power of the character (wP = e ^ P)
wS: Sum of the weighted powers of all characters (sigma(wP))
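
The two population formulas can be sketched in Python like this. The power levels fed in are made-up numbers, and the function names are hypothetical; only the two formulas themselves come from the post.

```python
import math


def update_linear(powers, total):
    """Version 1.0: N = T * (P / S)."""
    s = sum(powers.values())
    return {c: total * p / s for c, p in powers.items()}


def update_exponential(powers, total):
    """Version 1.1: N = T * (wP / wS), with wP = e ^ P."""
    weighted = {c: math.exp(p) for c, p in powers.items()}
    ws = sum(weighted.values())
    return {c: total * w / ws for c, w in weighted.items()}


powers = {"A": 6.0, "B": 5.0, "C": 3.0}  # invented power levels
v10 = update_linear(powers, 10000)
v11 = update_exponential(powers, 10000)
print(v10)
print(v11)
```

Under Version 1.0 the ratio of players between two characters equals the ratio of their powers; under Version 1.1 it equals e raised to the difference of their powers, so gaps in power are punished much harder.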

Repeating this calculation for a large number of iterations should result in giving a fairly accurate assessment of the optimal power level and ranking of each character. What I'd like to know is how high the interest level for this sort of thing is and if anyone would like to see something done differently.

My main motivation in this sort of endeavor would be to provide the community with some sort of concrete tier list and ranking of characters based on a purely mathematical approach so that when arguments arise, there is some reference. Of course it wouldn't be perfect, as it would be assuming correctness of the Phanna chart, as well as ignoring the fact that not all players select their character based on tournament viability, but I think it might go a long way to settling tier list disputes.

Stable Tiers Version 1.1
Falco Players: 17.96% Power Level: 5.80
Sheik Players: 15.68% Power Level: 5.66
Marth Players: 12.63% Power Level: 5.44
Fox Players: 12.16% Power Level: 5.41
Peach Players: 8.87% Power Level: 5.09
Ice Climbers Players: 5.98% Power Level: 4.70
Samus Players: 5.88% Power Level: 4.68
Jigglypuff Players: 4.49% Power Level: 4.41
Ganon Players: 2.97% Power Level: 4.00
C. Falcon Players: 2.60% Power Level: 3.86
Doc Players: 2.25% Power Level: 3.72
Luigi Players: 1.57% Power Level: 3.36
Mario Players: 1.20% Power Level: 3.09
Y. Link Players: 0.95% Power Level: 2.86
Link Players: 0.76% Power Level: 2.64
DK Players: 0.61% Power Level: 2.41
Roy Players: 0.50% Power Level: 2.22
Ness Players: 0.49% Power Level: 2.20
Zelda Players: 0.48% Power Level: 2.17
Pikachu Players: 0.45% Power Level: 2.11
Kirby Players: 0.38% Power Level: 1.93
Yoshi Players: 0.37% Power Level: 1.90
Mr.G&W Players: 0.28% Power Level: 1.62
Pichu Players: 0.20% Power Level: 1.29
Mewtwo Players: 0.18% Power Level: 1.20
Bowser Players: 0.11% Power Level: 0.70

Also, for reference here is the All-Characters Matchup chart, created by Phanna (Thanks a bunch to Phanna for permission to use)

I think some slight changes may need to be made, but I'm going to go ahead and use the unaltered chart for the first run through the program. If something upsets you on this chart and you want it changed permanently, here's the link to Phanna's Thread: http://smashboards.com/showthread.php?t=92025

sonicPearltsl25 · Mar 27, 2007

woah...this sounds really cool....and I'd like to see it all worked out and everything

idk how you came up with those formulas and stuff though.....props

tarheeljks · Mar 27, 2007

What exactly is S and how do you plan to observe T? Something similar to this was discussed within the matchup chart. I like the idea in theory but I'm not sure how you will get your hands on the data necessary to "test" the model.

Zankoku · Mar 27, 2007

The idea sounds interesting, though the two equations seem to be rather recursive. I'd like to know how you get out of the loop.

The source of data will be major tournament brackets, I'm guessing?

quak · Mar 27, 2007

i suggest that you hang on to your time instead; while it is a good idea, people just really won't care, and will still refer to the tier list that is currently used.

Mogwai · Mar 27, 2007

tarheeljks said:
What exactly is S and how do you plan to observe T? Something similar to this was discussed within the matchup chart. I like the idea in theory but I'm not sure how you will get your hands on the data necessary to "test" the model.

S in the first formula would normally be a sigma, but seeing as how I don't know how to make such a thing on the boards, I did what I could. That function is just the sum, over all matchups, of the matchup value times the number of times you would expect to see that matchup. T is just going to be a set number; I'm not going to be basing this off of the world as we know it. I will be setting up an easily observed world of, say, 10000 smashers or so.

Ankoku said:
The idea sounds interesting, though the two equations seem to be rather recursive. I'd like to know how you get out of the loop.

The source of data will be major tournament brackets, I'm guessing?

Well, it is recursive; the idea is that after a certain number of iterations you would reach a stable number of players for each character, which would result in the optimal tier list and power rankings.
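
A rough Python sketch of that loop, assuming the Version 1.0 population formula and an invented three-character matchup table (not Phanna's numbers). The function name, the tolerance, and the iteration cap are all hypothetical choices.

```python
# Invented 3-character matchup table on the 0-10 scale.
matchups = {
    "A": {"A": 5, "B": 6, "C": 8},
    "B": {"A": 4, "B": 5, "C": 7},
    "C": {"A": 2, "B": 3, "C": 5},
}


def iterate_metagame(matchups, total=10000, tol=1e-9, max_iters=10000):
    chars = list(matchups)
    # Start from a perfectly diverse metagame.
    players = {c: total / len(chars) for c in chars}
    for step in range(1, max_iters + 1):
        # Power formula: P = sum(M * N) / T.
        powers = {
            c: sum(matchups[c][o] * players[o] for o in chars) / total
            for c in chars
        }
        # Version 1.0 population update: N = T * (P / S).
        s = sum(powers.values())
        new_players = {c: total * powers[c] / s for c in chars}
        change = max(abs(new_players[c] - players[c]) for c in chars)
        players = new_players
        if change < tol:  # player counts have stopped moving
            break
    # powers holds the values computed from the last evaluated population.
    return players, powers, step


players, powers, steps = iterate_metagame(matchups)
print(steps, players, powers)
```

With the Version 1.0 update, the new player counts are just the matchup table applied to the old counts and then rescaled, so this loop is the same thing as power iteration on the matchup table. That would explain why it settles down quickly when every entry in the table is positive.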

As I first stated, I would use the Phanna chart as the source for character matchups. If you have beef with a rating, we can alter ratings as necessary in this thread to make people as happy as possible with the matchups.

quak said:
i suggest that you hang on to your time instead; while it is a good idea, people just really won't care, and will still refer to the tier list that is currently used.

Meh, it actually won't be too hard to do at all, and honestly, as long as just 1 other person on the boards cares about it, I figure it's worth posting. Mostly I wanted to hear whether people thought the general process in this calculation was OK or if major changes needed to be made before I go ahead and do it.

MARIOWNAGE · Mar 28, 2007

Protip: you're spelling tier wrong

Mogwai · Mar 28, 2007

MARIOWNAGE said:
Protip: you're spelling tier wrong

Point well taken. . . Silly, "'i' before 'e' except after 'c'" rule. Guess I should've paid more attention in elementary school. I've fixed it in all my posts, but I can't fix it in the thread title, drat.

D20 · Mar 28, 2007

Wesley, this sounds like it's something out of NUMB3RS (that TV show that I always miss because of Super Smash Fridays). If anyone has ever seen that show, then they would know that mathematics can solve any crime, bring about world peace, and create an accurate tier list. I'd like to see you run with this, and I'd be glad to help you out if you needed it.

*Thumbs up*

KevinM · Mar 28, 2007

Yea wes i wouldn't mind helping you with this at all its a really cool project

Linkster47 · Mar 28, 2007

This looks awesome, and I'm glad to know Link isn't the worst character! I'm willing to help if you need anything.

Mogwai · Mar 28, 2007

KevinM said:
Yea wes i wouldn't mind helping you with this at all its a really cool project

Linkster47 said:
This looks awesome, and I'm glad to know Link isn't the worst character! I'm willing to help if you need anything.

D20 said:
Wesley, this sounds like it's something out of NUMB3RS (that TV show that I always miss because of Super Smash Fridays). If anyone has ever seen that show, then they would know that mathematics can solve any crime, bring about world peace, and create an accurate tier list. I'd like to see you run with this, and I'd be glad to help you out if you needed it.

*Thumbs up*

Thanks for the positive feedback guys. Currently, I don't think I'll need any help unless you see something that looks fishy in the process I'll be using.

Some of the stuff in the Phanna chart looks a little wrong to me, mostly with spacies vs. Low Tiers (Falco is equally good at beating Captain Falcon as Pichu? And Mewtwo has a 7 vs. Fox? etc.) But I don't think these should affect the end result too badly. As the Phanna chart gets more refined, my calculations should become better and better, but in the meantime, I've decided to let it be.

The only really relevant matchups that I think might need a 2nd look are Fox vs. Marth, Marth vs. Sheik, and Sheik vs. CF. From my experience, Fox vs. Marth is even (5), Marth vs. Sheik is a very slight edge to Sheik (4/6), and CF vs. Sheik is a very slight edge to Sheik (4/6). Again, my current plan is to leave the chart alone, but I think the end results might be a little messed up due to these matchups.

Mogwai · Mar 29, 2007

My first attempts at this are complete, however, I'm dissatisfied with a few things and will want input on how to better help this process later. Before I go into the flaws, here are the end results of a system starting with 100000 smashers playing each character after 10000 iterations:

Sheik Players: 145619 Power Level: 6.818546538461538
Falco Players: 138387 Power Level: 6.479876153846154
Marth Players: 136117 Power Level: 6.373618461538461
Fox Players: 134746 Power Level: 6.309395
Peach Players: 126350 Power Level: 5.916252307692307
Ice Climbers Players: 126084 Power Level: 5.903832307692308
Samus Players: 122487 Power Level: 5.735366153846154
C. Falcon Players: 118838 Power Level: 5.564524230769231
Ganon Players: 117155 Power Level: 5.485741153846154
Jigglypuff Players: 114412 Power Level: 5.357301538461538
Doc Players: 108730 Power Level: 5.091223076923077
Mario Players: 104260 Power Level: 4.88190076923077
Luigi Players: 95463 Power Level: 4.470020384615385
Ness Players: 94331 Power Level: 4.417008076923077
Y. Link Players: 92755 Power Level: 4.343205769230769
Pikachu Players: 90349 Power Level: 4.230528461538461
Link Players: 90055 Power Level: 4.216781538461539
Zelda Players: 83248 Power Level: 3.8980726923076925
DK Players: 83115 Power Level: 3.891828846153846
Yoshi Players: 80712 Power Level: 3.7793138461538462
Roy Players: 78245 Power Level: 3.6637769230769233
Mr.G&W Players: 78245 Power Level: 3.663766153846154
Kirby Players: 69053 Power Level: 3.2333638461538463
Mewtwo Players: 63127 Power Level: 2.9558957692307692
Pichu Players: 56604 Power Level: 2.6504542307692307
Bowser Players: 51500 Power Level: 2.4114884615384615

The first problem I see is that I think the matchups are currently flawed in a lot of places. Kirby still has a 4 vs. Sheik which drastically boosts his numbers, despite Kirby not actually standing a chance vs. Sheik. Y. Link's matchups are basically strictly better than Link's on the chart, whereas Link should have many better matchups like Marth, C.F., and Ganon. Sheik's Low Tier matchups on the chart are all **** while spacies and Peach are severely underestimated in these matchups. Ness is given ungodly good numbers in the low tier and even some very favorable numbers in the high tiers. I suppose these are only my opinions, but I do think these numbers need work to give a moderately accurate system.

The next problem is in the direct relationship between the power level and number of people selecting a character. As I'm currently doing it, if one character has a power of 6 and another has a power of 3, the character with a 6 will only get twice as many players as the one with a 3, even though the 6 is far far better than the 3. I need to find a better way to do this, perhaps a logarithmic scale? I also was thinking of perhaps just subtracting a constant from power level in this calculation so that in the above example, the 6 would become say a 4 and the 3 would become a 1, making it a 4:1 ratio rather than 2:1. I think there are several ways to do this, all with pros and cons, so any input would be appreciated.
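
To make the 6 vs. 3 example concrete, here is a small Python sketch comparing the player ratio under a few weightings. The helper name share_ratio is hypothetical, and reading the "logarithmic scale" idea as "log of players proportional to power" (that is, weighting by e ^ P) is an assumption on my part.

```python
import math


def share_ratio(p_high, p_low, weight):
    """Ratio of player counts between two characters under a weighting."""
    return weight(p_high) / weight(p_low)


linear = share_ratio(6, 3, lambda p: p)        # current approach
shifted = share_ratio(6, 3, lambda p: p - 2)   # subtract a constant of 2
exponential = share_ratio(6, 3, math.exp)      # assumed reading of the log idea

print(linear, shifted, exponential)
```

The shifted version depends heavily on which constant is chosen, and it breaks down for any character whose power falls at or below that constant.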

I'm sure other problems will arise with further versions, but these are the current 2 glaring ones.

Dan2 · Mar 29, 2007

First of all, this is a really cool project. Math strikes again! Second, I think you're right on two points. The matchup list is pretty skewed in some areas and above and below five need different magnitudes. One thing you may want to think about incorporating is a positive/negative scale on the power level. I'm not sure if this is even possible for you, but I'll explain what I mean. First, you assign a 5 in the chart a value of zero in the corresponding area in the formula. Then you can make 0-4 negative, and 6-10 positive. Again, I'm not sure if you could use this, or even if it's a good idea at all. Just wanted to throw out a possibility ^_^

Banks · Mar 29, 2007

I guess it would represent the tiers based on some things, but not on true potential. If everyone quit Fox and started playing Pichu it still wouldn't mean Pichu is better than Fox.

ArticulacyFTW · Mar 29, 2007

I think that for this analysis you're going to actually have to do, you know, research. If you want it to be as accurate as possible anyway. Something like look at the big tournaments and see how many of each character are actually being played. Yeah, rough, I know. But that's the only way to come up with an accurate statistic other than just guessing at which characters have certain popularity or just purely going by how good they are. You have to start at the correct level, otherwise being good against unpopular characters will help that character out too much every time. I guess the best you could do at short notice without too much effort is just have a poll and ask for it to be stickied, "What character do you use at tournaments?" Obviously it would be imperfect, as people use multiple characters (which, by the way, this doesn't take into account), etc., but the complexity of that sort of poll is impossible with this kind of thing.

Of course, since this is a purely mathematical exercise in the "What if?" then I guess you don't need to worry about that.

The other thing I think you have to ask yourself, are the match up point values weighted correctly? Is a 5-3 really worth 70% of a 5-0? Certainly it is difficult to put an exact value on the difference between the two. I think it would be a better approach to use the probability of victory at equal skill level, rather than this system, because while there is certainly a relationship between these two things, there certainly isn't a DIRECT correlation. A probability of victory would make this system much more accurate. This, of course, is a much more difficult statistic to obtain, but...yeah. It's better.

The other problem, which may be too complex to deal with, is that I'm pretty sure that at the low levels of tournaments there is a different mix of characters than at high levels. This means that different characters will do better in the initial rounds than the characters that will do better in the later rounds. While it probably isn't a huge factor, it's still there.

So, that's my input.

PS: Very good job, great idea.

PPS: You mention that Mewtwo has a 7 vs. Fox. Err, you meant the other way around right?

moogle · Mar 29, 2007

My suggestion: you should probably get a new model for how many people play each character. The way you have it now, it shows that there are about 2.7 times more Falco players than Bowser players in tournaments. It's closer to 50 times more Falcos.

I'll throw out some numbers that may or may not be correct, but it should get you on the right path. About 75% of the characters you see in mid-to-high level play are Fox, Falco, Sheik, Marth, or Peach. I'll say it's about 40-50% top tier, 25-35% high tier, 15-20% mid tier, ~5% low and bottom tier.

Okay, P seems to be a character's chance to win, on a scale of 1 to 10 (or 0 to 9). Based on your results, max is about 6.8, min is about 2.4. Now when it comes to N, the number of players using that character, you have N linearly related to P, but it really shouldn't be. I think N should be directly proportional to some power of P... either 2 or 3 (or somewhere in between). Just experiment and find a good way to map "power" onto "usage."

Good luck! I am hoping your results will be useful to the community.

p.s. to change your topic's title, find your topic in Melee Discussion, then double click on some empty space near your topic (without clicking on the link).

EDIT: I didn't read very closely.

You addressed this in the post with results. Yeah, a logarithmic relation would probably work... just gotta find the right base. log_b(N) ~ P (that is, N ~ b^P). Or my method, N ~ P^(b). Similar idea.

Mogwai · Mar 29, 2007

ArticulacyFTW said:
I think that for this analysis you're going to actually have to do, you know, research. If you want it to be as accurate as possible anyway. Something like look at the big tournaments and see how many of each character are actually being played. Yeah, rough, I know. But that's the only way to come up with an accurate statistic other than just guessing at which characters have certain popularity or just purely going by how good they are. You have to start at the correct level, otherwise being good against unpopular characters will help that character out too much every time. I guess the best you could do at short notice without too much effort is just have a poll and ask for it to be stickied, "What character do you use at tournaments?" Obviously it would be imperfect, as people use multiple characters (which, by the way, this doesn't take into account), etc., but the complexity of that sort of poll is impossible with this kind of thing.

Of course, since this is a purely mathematical exercise in the "What if?" then I guess you don't need to worry about that.

Well, you definitely have the right idea, but the theory I'm working with here is that the less popular characters should naturally become less popular due to their innately low power levels, and then from there, the power weights should slowly get better and better until reaching a stable system. If we had some sort of accurate statistic for numbers of players playing each character in tournaments, calculating power would be as simple as using the power formula I have once. However, such a system would be very very heavily influenced by the current tier list, as many players select characters based on this list. I'm trying to use a perfect world to figure out the natural order of all characters.

ArticulacyFTW said:
The other thing I think you have to ask yourself, are the match up point values weighted correctly? Is a 5-3 really worth 70% of a 5-0? Certainly it is difficult to put an exact value on the difference between the two. I think it would be a better approach to use the probability of victory at equal skill level, rather than this system, because while there is certainly a relationship between these two things, there certainly isn't a DIRECT correlation. A probability of victory would make this system much more accurate. This, of course, is a much more difficult statistic to obtain, but...yeah. It's better.

Excellent point! I actually thought of this last night in bed, but I couldn't figure out how precisely to do it. This is really a tough problem, but I suppose I could use the predicted match result to generate a percentage expected to win any given game and then use that to determine the chance to win a best 2 out of 3 set and use that as the "matchup" value, rather than the raw number. I'll put up how this approach would map the matchup values later when I have time to compute them.

ArticulacyFTW said:
The other problem, which may be too complex to deal with, is that I'm pretty sure that at the low levels of tournaments there is a different mix of characters than at high levels. This means that different characters will do better in the initial rounds than the characters that will do better in the later rounds. While it probably isn't a huge factor, it's still there.

I think this could be handled by creating a lower level matchup chart that represents players who aren't fully comfortable with the high level game. Other than that, I'm pretty sure this one is over our heads. However, doing something like calculating low tiers for an entirely low tier tournament would be fairly simple: I would simply exclude the high tier characters and let the program run, which would produce meaningful results, I think.

ArticulacyFTW said:
So, that's my input.

Thanks a bunch. I love getting meaningful input on my project.

ArticulacyFTW said:
PS: Very good job, great idea.

Thanks!

ArticulacyFTW said:
PPS: You mention that Mewtwo has a 7 vs. Fox. Err, you meant the other way around right?

Yea, other way around, sorry. btw, I'm sure this matchup is at best a 1. I've played Foxes with M2 and it's no fun getting uaired off at like, 50%.

moogle said:
My suggestion: you should probably get a new model for how many people play each character. The way you have it now, it shows that there are about 2.7 times more Falco players than Bowser players in tournaments. It's closer to 50 times more Falcos.

I'll throw out some numbers that may or may not be correct, but it should get you on the right path. About 75% of the characters you see in mid-to-high level play are Fox, Falco, Sheik, Marth, or Peach. I'll say it's about 40-50% top tier, 25-35% high tier, 15-20% mid tier, ~5% low and bottom tier.

Okay, P seems to be a character's chance to win, on a scale of 1 to 10 (or 0 to 9). Based on your results, max is about 6.8, min is about 2.4. Now when it comes to N, the number of players using that character, you have N linearly related to P, but it really shouldn't be. I think N should be directly proportional to some power of P... either 2 or 3 (or somewhere in between). Just experiment and find a good way to map "power" onto "usage."

Good luck! I am hoping your results will be useful to the community.

p.s. to change your topic's title, find your topic in Melee Discussion, then double click on some empty space near your topic (without clicking on the link).

EDIT: I didn't read very closely. You addressed this in the post with results. Yeah, a logarithmic relation would probably work... just gotta find the right base. log_b(N) ~ P (that is, N ~ b^P). Or my method, N ~ P^(b). Similar idea.

Yea, I think I've addressed everything here between my previous post and addressing Articulacy's post. For the record, the scale is 0-10, in a world of all sheiks and 1 bowser, that bowser would have a power of 0 and in a world of all bowsers and 1 sheik, that sheik would have a power of 10.

I think you might be on to a better theory with the power scale rather than logarithmic, though I think both might work. I'll try both out and post results when I have the time.

Thanks a lot for your input and for telling me how to change topic title.

ArticulacyFTW · Mar 29, 2007

Wesley said:
Excellent point! I actually thought of this last night in bed, but I couldn't figure out how precisely to do it. This is really a tough problem, but I suppose I could use the predicted match result to generate a percentage expected to win any given game and then use that to determine the chance to win a best 2 out of 3 set and use that as the "matchup" value, rather than the raw number. I'll put up how this approach would map the matchup values later when I have time to compute them.

This process should be fairly simple, if we only had the correct data to start with. The problem is that it's not clear that a 5-1 match-up really means that there's a 5/6 chance of winning any particular game, but I think that's the best we can go on right now. Unless I'm just being dumb.

If I'm not being dumb, then here is the process. In a best two of three, there are six possibilities (W=win, L=Lose): W-W; W-L-W; W-L-L; L-L; L-W-L; L-W-W. From this there is a simple way of calculating the probability of winning. Simply calculate the probabilities of the winning cases happening, and add them together (you get these probabilities by multiplying the probability of each win or loss together).

The winning cases, of course, are W-W, W-L-W, and L-W-W. The probability of W-L-W and L-W-W happening are the same (of course, ignoring any psychological influence) so we get a formula of W^2+(W^2*L)*2. The probability of L, of course, is (1-W).

Using this formula, we get probabilities of:

Match-up 0= 0%
Match-up 1= 7.4074074%
Match-up 2= 19.8250729%
Match-up 3= 31.6406250%
Match-up 4= 41.7009602%
Match-up 5= 50%
Match-up 6= 58.2990398%
Match-up 7= 68.3593750%
Match-up 8= 80.1749271%
Match-up 9= 92.5925926%
Match-up 10= 100%

So yeah, if you figure out a better way of calculating the initial probabilities, then that's the way to go anyhow.
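
The best two of three calculation above can be sketched in Python as follows. The mapping from chart rating to per-game win chance is inferred from the posted percentages rather than stated outright: a rating of 9 is read as a 5-1 matchup (5/6 per game), a rating of 1 as 1-5 (1/6 per game), and so on. The function names are hypothetical.

```python
from fractions import Fraction


def game_win_chance(rating):
    """Per-game win chance, reading ratings as 5-x results (assumed mapping)."""
    if rating >= 5:
        return Fraction(5, 5 + (10 - rating))  # e.g. 9 -> 5-1 -> 5/6
    return Fraction(rating, rating + 5)        # e.g. 1 -> 1-5 -> 1/6


def best_of_three(w):
    """Chance to win the set: W-W, W-L-W or L-W-W."""
    l = 1 - w
    return w**2 + 2 * (w**2 * l)


for rating in range(11):
    chance = best_of_three(game_win_chance(rating))
    print(f"Match-up {rating} = {float(chance):.7%}")
```

Using exact fractions keeps the arithmetic free of rounding until the final print.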

Mogwai · Mar 29, 2007

ArticulacyFTW said:
This process should be fairly simple, if we only had the correct data to start with. The problem is that it's not clear that a 5-1 match-up really means that there's a 5/6 chance of winning any particular game, but I think that's the best we can go on right now. Unless I'm just being dumb.

If I'm not being dumb, then here is the process. In a best two of three, there are six possibilities (W=win, L=Lose): W-W; W-L-W; W-L-L; L-L; L-W-L; L-W-W. From this there is a simple way of calculating the probability of winning. Simply calculate the probabilities of the winning cases happening, and add them together (you get these probabilities by multiplying the probability of each win or loss together).

The winning cases, of course, are W-W, W-L-W, and L-W-W. The probability of W-L-W and L-W-W happening are the same (of course, ignoring any psychological influence) so we get a formula of W^2+(W^2*L)*2. The probability of L, of course, is (1-W).

Using this formula, we get probabilities of:

Match-up 0= 0%
Match-up 1= 7.4074074%
Match-up 2= 19.8250729%
Match-up 3= 31.6406250%
Match-up 4= 41.7009602%
Match-up 5= 50%
Match-up 6= 58.2990398%
Match-up 7= 68.3593750%
Match-up 8= 80.1749271%
Match-up 9= 92.5925926%
Match-up 10= 100%

So yeah, if you figure out a better way of calculating the initial probabilities, then that's the way to go anyhow.

Looks correct to me. These numbers don't seem like they'll affect the formula too much though, as the change in ratios of matchups is very subtle. I went through and did the same calculations using a 0 on the chart to represent 0-10, a 1 on the chart to represent 1-9, etcetera, which results in the following for best two out of three:

0 - 0%
1 - 2.8%
2 - 10.4%
3 - 21.6%
4 - 35.2%
5 - 50%
6 - 64.8%
7 - 78.4%
8 - 89.6%
9 - 97.2%
10 - 100%
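
As a quick check of the table above, this Python sketch applies the same best two out of three formula with a chart rating of k taken as a k/10 chance per game. It uses the simplified form of the formula: W^2 + 2 * W^2 * (1 - W) works out to W^2 * (3 - 2W).

```python
def best_of_three(w):
    # w^2 + 2 * w^2 * (1 - w) simplifies to w^2 * (3 - 2w)
    return w * w * (3 - 2 * w)


# Percentages as posted above, rating 0 through 10.
posted = [0, 2.8, 10.4, 21.6, 35.2, 50, 64.8, 78.4, 89.6, 97.2, 100]
computed = [100 * best_of_three(k / 10) for k in range(11)]

for k, (p, c) in enumerate(zip(posted, computed)):
    print(k, p, round(c, 1))
```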

I'm going to try using both these sets of mappings and see how they affect the end result. While I don't think either of them is truly correct, this is the best option we have using the current matchup chart right now. However, I think this really points out the lack of flexibility in a chart on a 0-10 scale. Perhaps I should start more precise matchup discussion threads within character specific forums so that we can get more useful numbers for the purpose of this simulation.

jaywinner · Mar 29, 2007

ArticulacyFTW said:
The problem is that it's not clear that a 5-1 match-up really means that there's a 5/6 chance of winning any particular game, but I think that's the best we can go on right now.

While calculating 5-1 as a 5/6 chance may be the best method, it is also clear that the first 5 games cannot be won by the same character since that would result in a 5-0 win. Would 4/5 be more accurate? It seems odd but would account for a 4-1 match most likely followed up by a 5th win to finish off the series. Somewhere in between?

Mogwai · Mar 29, 2007

jaywinner said:
While calculating 5-1 as a 5/6 chance may be the best method, it is also clear that the first 5 games cannot be won by the same character since that would result in a 5-0 win. Would 4/5 be more accurate? It seems odd but would account for a 4-1 match most likely followed up by a 5th win to finish off the series. Somewhere in between?

Not true. The first 5 games could be won by the same character in a 5-1 matchup, it's just that this is not the most likely outcome. Remember, the outcomes on the Phanna chart are not set in stone, they simply represent the expected result. Unexpected results may occur, but based on the strengths and weaknesses of characters in given matchups, the most likely results are posted in the Phanna chart. Since we expect a 5-1 matchup to be 5-1, it naturally follows that the advantageous character in this matchup had a 5 in 6 chance of winning each game. The key here is that we're not observing a single match for each matchup, we're estimating the most likely outcome.

jaywinner · Mar 29, 2007

Wesley, you are correct.

What I meant was that while a 5-1 character's most likely outcome is not to win the first 5 games, it might happen too often for statistical calculations. If this is not the case then by all means continue as planned.

Mogwai · Mar 29, 2007

jaywinner said:
Wesley, you are correct.

What I meant was that while a 5-1 character's most likely outcome is not to win the first 5 games, it might happen too often for statistical calculations. If this is not the case then by all means continue as planned.

Well, the trick is that there is a wide range of possible game winning percentages that result in the most likely outcome being a 5-1 match. Both the values 4/5 and 5/6 result in an expected 5-1 match, but the 5/6 maximizes the chance of that particular outcome.
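
A small Python sketch of that point, assuming a "5-1" result is modeled as exactly 5 wins out of 6 independent games (the modeling choice, the function name, and the grid of candidate values are all assumptions for illustration).

```python
from math import comb


def chance_of_5_1(p, games=6, wins=5):
    """Chance of exactly `wins` wins in `games` independent games."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


# Scan per-game win chances from 0 to 1 in steps of 0.001.
candidates = [i / 1000 for i in range(1001)]
best_p = max(candidates, key=chance_of_5_1)

print(best_p, chance_of_5_1(4 / 5), chance_of_5_1(5 / 6))
```

The scan lands on the grid point closest to 5/6, and 5/6 gives a slightly higher chance of a 5-1 result than 4/5 does.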

I agree 100% that this system is flawed, but the only better option I see is to discuss every matchup on every character specific board until we get some form of consensus on game winning percentages for each matchup. While I personally like this idea a lot and think it will do wonders for the accuracy of my system, it would prove fairly time consuming and difficult. However, if there is enough interest in getting this data and people are willing to share the load of gathering it, I say it's worth a try. Thoughts?

pockyD · Mar 30, 2007

*nerd alert!*

Zankoku · Mar 30, 2007

Wesley said:
Well, the trick is that there is a wide range of possible game winning percentages that result in the most likely outcome being a 5-1 match. Both the values 4/5 and 5/6 result in an expected 5-1 match, but the 5/6 maximizes the chance of that particular outcome.

I agree 100% that this system is flawed, but the only better option I see is to discuss every matchup on every character specific board until we get some form of consensus on game winning percentages for each matchup. While I personally like this idea a lot and think it will do wonders for the accuracy of my system, it would prove fairly time consuming and difficult. However, if there is enough interest in getting this data and people are willing to share the load of gathering it, I say it's worth a try. Thoughts?

It's always a possibility, though it may take a while before people on Smashboards agree on something as definite as game-winning percentages.

Mogwai · Mar 30, 2007

pockyD said:
*nerd alert!*

d(-'.'-)> Sure thing big guy. btw, thanks for the input over aim, I won't let anyone know about your secret dabblings in nerdiness.

Ankoku said:
It's always a possibility, though it may take a while before people on Smashboards agree on something as definite as game-winning percentages.

True nuff. . . Getting people on smashboards to agree about anything is d@mn near impossible. Getting reasonable numbers that a majority can accept is probably the best we could hope for, and even that might prove brutal, but might be worth a shot. Currently, I lack the initiative to start an undertaking, but it's an idea for the future.

<π · Mar 30, 2007

cool idea, keep up the good work.

pockyD · Mar 30, 2007

Wesley said:
d(-'.'-)> Sure thing big guy. btw, thanks for the input over aim, I won't let anyone know about your secret dabblings in nerdiness.

lol i'm not ashamed, i just have a crappy time explaining myself without a back-and-forth

also ideas occur to me in regular intervals of 93 seconds so i don't want to spam

but in summary: exponential scale for power levels (e^P), percentage representation of the # of players per character, put whatever your most current findings are in your first post

Zankoku · Mar 30, 2007

I attempted to implement these formulas in a spreadsheet, using some revised stuff.

First, I used the weighted matchup numbers:
0 - 0%
1 - 2.8%
2 - 10.4%
3 - 21.6%
4 - 35.2%
5 - 50%
6 - 64.8%
7 - 78.4%
8 - 89.6%
9 - 97.2%
10 - 100%
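For reference, these eleven values match the curve 3x² − 2x³ with x = rating / 10. That curve is also the chance of winning a best-of-three set when each game is won with probability x. Whether that is how the numbers were actually derived is an assumption on my part; the function name below is hypothetical.

```python
def weighted_matchup(rating):
    """Map a 0-10 matchup rating to a weighted percentage via 3x^2 - 2x^3."""
    x = rating / 10
    return 100 * (3 * x**2 - 2 * x**3)

# Rebuild the table, rounded to one decimal like the posted values.
table = {rating: round(weighted_matchup(rating), 1) for rating in range(11)}
for rating, percent in table.items():
    print(rating, percent)
```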

Then, for population, I used the formula:
N = T * (P² / S)
N: Number of players playing the character
T: Total number of players
P: Relative power of the character
S: Sum of the squares of all characters' powers

Here's the list I arrived at:
Tier List (character - share of players - power level)
Sheik - 9.51% - 6.58
Falco - 9.01% - 6.4
Fox - 8.31% - 6.15
Marth - 8.30% - 6.14
Peach - 6.83% - 5.57
ICs - 6.16% - 5.29
Samus - 6.14% - 5.28
Ganon - 5.28% - 4.9
J. Puff - 5.23% - 4.88
C. Falcon - 5.20% - 4.86
Doc - 4.14% - 4.34
Mario - 3.35% - 3.9
Luigi - 2.96% - 3.67
Y. Link - 2.55% - 3.4
Ness - 2.30% - 3.23
Link - 2.24% - 3.19
Pikachu - 2.17% - 3.14
Zelda - 1.81% - 2.87
DK - 1.79% - 2.86
Roy - 1.48% - 2.59
Yoshi - 1.42% - 2.54
Mr. G&W - 1.27% - 2.4
Kirby - 1.09% - 2.23
Mewtwo - 0.65% - 1.72
Pichu - 0.52% - 1.53
Bowser - 0.28% - 1.13
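As a sanity check on the list above, here is a minimal sketch that applies N = T * (P² / S) to the posted power levels with T = 100, so N comes out as a percentage. The data is copied from the list; the function and variable names are only illustrative. Small gaps are expected because the posted powers are rounded to two decimals.

```python
# (character, listed share in %, listed power level), copied from the list above
TIER_LIST = [
    ("Sheik", 9.51, 6.58),
    ("Falco", 9.01, 6.4),
    ("Fox", 8.31, 6.15),
    ("Marth", 8.30, 6.14),
    ("Peach", 6.83, 5.57),
    ("ICs", 6.16, 5.29),
    ("Samus", 6.14, 5.28),
    ("Ganon", 5.28, 4.9),
    ("J. Puff", 5.23, 4.88),
    ("C. Falcon", 5.20, 4.86),
    ("Doc", 4.14, 4.34),
    ("Mario", 3.35, 3.9),
    ("Luigi", 2.96, 3.67),
    ("Y. Link", 2.55, 3.4),
    ("Ness", 2.30, 3.23),
    ("Link", 2.24, 3.19),
    ("Pikachu", 2.17, 3.14),
    ("Zelda", 1.81, 2.87),
    ("DK", 1.79, 2.86),
    ("Roy", 1.48, 2.59),
    ("Yoshi", 1.42, 2.54),
    ("Mr. G&W", 1.27, 2.4),
    ("Kirby", 1.09, 2.23),
    ("Mewtwo", 0.65, 1.72),
    ("Pichu", 0.52, 1.53),
    ("Bowser", 0.28, 1.13),
]

def population_shares(powers):
    """Percent of players per character under N = T * P^2 / S, with T = 100."""
    total = sum(p ** 2 for p in powers)  # S: sum of squared powers
    return [100 * p ** 2 / total for p in powers]

shares = population_shares([power for _, _, power in TIER_LIST])
worst_gap = max(abs(s - listed) for s, (_, listed, _) in zip(shares, TIER_LIST))
print(f"largest gap vs. listed share: {worst_gap:.3f} percentage points")
```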

Again, for some inexplicable reason Sheik makes it to the top. Also of note is that my spreadsheet tells me the numbers stabilized after a mere 7 iterations.

takieddine · Mar 30, 2007

OMG wesely

can you say TLDR?!

Edit: good job, but does the data talk about all of smash or just competitive?
If it's competitive, I think Fox deserves a higher spot than Marth.

pockyD · Mar 30, 2007

you should also scale up the power levels so that some of them can be OVER 9000

ArticulacyFTW · Mar 30, 2007

takieddine said:
If it's competitive, I think Fox deserves a higher spot than Marth.

Err, I don't know if you understood the purpose of this correctly?

It's a calculation of what the tiers should be (or perhaps to see if the tier list itself can be recreated through sound mathematics), based on the match-up data created by Phanna and using mathematical formulae. He can't exactly just decide to make Fox go higher--something in the match-up data or methods would have to change, and he can't really change that to arbitrarily make Fox perform better.

Endless Nightmares · Mar 30, 2007

Sheik will probably always be at the top because of her HUGE advantages against the lower tiers...

moogle · Mar 30, 2007

Here's a model for power level vs % use:
%use = (3^P)/70

P=6.5, %use = 18.0% (Ex: Falco)
P=6.0, %use = 10.4% (Ex: Marth)
P=5.5, %use = 6.0% (Ex: Falcon)
P=5.0, %use = 3.5% (Ex: Jiggly)
P=4.5, %use = 2.0% (Ex: Luigi)
P=4.0, %use = 1.2% (Ex: Pikachu)

Maybe the base should be higher than 3... doing that would make it more likely Fox/Falco would be above Sheik.

Maybe try %use = (3.5^P)/175.
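To make the two candidate curves easy to compare, here is a small sketch that evaluates %use = (base^P) / divisor at the power levels listed above. The function name and its parameters are just illustrative, not part of the model as posted.

```python
def percent_use(power, base=3.0, divisor=70.0):
    """Percent of players predicted for a power level: base**power / divisor."""
    return base ** power / divisor

# Compare the original curve with the suggested steeper one.
for power in (6.5, 6.0, 5.5, 5.0, 4.5, 4.0):
    original = percent_use(power)                          # (3^P) / 70
    steeper = percent_use(power, base=3.5, divisor=175.0)  # (3.5^P) / 175
    print(f"P={power}: {original:.1f}% vs {steeper:.1f}%")
```

With the higher base, the gap between neighbouring power levels widens, so the top few characters take a larger share relative to the rest.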

ArticulacyFTW · Mar 30, 2007

Err, the numbers you came up with, while interesting, seem a bit, I dunno, arbitrary?

How did you pick them?

pockyD · Mar 30, 2007

yeah, seems like you're just trying to get the formula to produce the current tier list, which is really not the purpose of this project x.x

moogle · Mar 30, 2007

Like.. the 18.0% use? I aimed for top tier characters having about 20% each of the share, high tier having 10% each, mid tier having 2-5% each, low tier 0.5-1.5% each, bottom tier 0.1-0.5% each. Yeah, that's kinda guesswork from me. Maybe more people could supply their estimates on what those values should be.

If you mean 6.5, 6.0, etc... that seemed to be the upper range of power levels that you guys were generating, so that's what I chose to correlate with my percentages for top tier, high tier, etc.

@pockyD: Let me try an analogy. Say you guys are trying to draw an apple. So far, you have a square drawn on your paper. I'm telling you guys to try drawing a circle instead, cause that's closer to the shape of an apple. I'm just offering a new model to replace an old one.

Mogwai · Mar 30, 2007

pockyD said:
you should also scale up the power levels so that some of them can be OVER 9000

Lol, already thought about that, I'll see what I can do to get some style points later, right now I need to work the kinks out.

56k said:
Sheik will probably always be at the top because of her HUGE advantages against the lower tiers...

Well, as the chart currently stands, yes, but honestly I think that other top tier characters are underrated in these matchups on the Phanna chart, especially Fox.

moogle said:
Here's a model for power level vs % use:
%use = (3^P)/70

P=6.5, %use = 18.0% (Ex: Falco)
P=6.0, %use = 10.4% (Ex: Marth)
P=5.5, %use = 6.0% (Ex: Falcon)
P=5.0, %use = 3.5% (Ex: Jiggly)
P=4.5, %use = 2.0% (Ex: Luigi)
P=4.0, %use = 1.2% (Ex: Pikachu)

Maybe the base should be higher than 3... doing that would make it more likely Fox/Falco would be above Sheik.

Maybe try %use = (3.5^P)/175.

hmmm, sounds intriguing, I'll be trying this out along with raising power level to a set exponent and observing the effects. One question though, why the constant in the denominator? No need to scale them all down since we're not using this to replace power level, just to calculate new use, so a constant multiplier or divider will do nothing in the calculation. Thanks for the suggestion though, I think this idea has a lot of merit.

ArticulacyFTW said:
Err, the numbers you came up with, while interesting, seem a bit, I dunno, arbitrary?

How did you pick them?

Which numbers are you talking about? Sometimes arbitrary constant decisions must be made, but outside of those, all numbers in the process have a reason. If you're talking about moogle's numbers, some constants need to be used and a little guesswork is necessary. His process described above seems as reasonable as we could hope for.

moogle said:
@pockyD: Let me try an analogy. Say you guys are trying to draw an apple. So far, you have a square drawn on your paper. I'm telling you guys to try drawing a circle instead, cause that's closer to the shape of an apple. I'm just offering a new model to replace an old one.

Lol, I'm 100% sure he's just joking around. Anyone who knows how to multiply could easily manipulate power levels onto a scale that would result in 1 or more of them being OVER 9000!!!!!!!!.

pockyD · Mar 30, 2007

well if you don't use a summation to go along with judging % use exclusively by power level, you won't get a total of 100% for all characters combined

instead of putting a constant in the denominator (70? 175?) you have to just sum everyone's effective power level (sum all 3^P over each character)
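A rough sketch of that normalization, with made-up power levels purely for illustration: each character's share is its 3^P divided by the sum of 3^P over all characters, so the shares always total 100%. It also shows why a fixed constant like 70 drops out once you normalize. The function name and the `divisor` parameter are hypothetical.

```python
def normalized_use(powers, base=3.0, divisor=1.0):
    """Percent use per character: base**P over the sum of base**P for everyone.

    `divisor` only demonstrates that a constant scale factor cancels out.
    """
    weights = [base ** p / divisor for p in powers]
    total = sum(weights)
    return [100 * w / total for w in weights]

example_powers = [6.5, 6.0, 5.5, 5.0]  # hypothetical four-character metagame
plain = normalized_use(example_powers)
scaled = normalized_use(example_powers, divisor=70.0)

print([round(s, 1) for s in plain])
print(sum(plain))  # totals 100 (up to float rounding)
print(all(abs(a - b) < 1e-9 for a, b in zip(plain, scaled)))
```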
