Quantify skill with math


Comments

  • iNv|mikEy2k Posts: 96 Player
    You could do what ESEA does and rank players by their ADR (Average Damage per Round).
  • Saccho Posts: 1,577 Player
    I thought ESEA used RWS, round win share. Did they change that?

    https://play.esea.net/index.php?s=support&d=faq&id=212
  • iNv|mikEy2k Posts: 96 Player
    edited March 2016
    They do, I meant RWS. I've been concentrating on raising my ADR, but I do believe ADR is factored into RWS, if I'm not mistaken.
  • iNv|eKComm Posts: 394 Player
    So I got a basic Elo algorithm up and running.

    Now let's talk about applying this to a team game.

    Say the following two teams play one another with these Elo ratings:
    Team 1:
    Player 1 - 1500
    Player 2 - 1560
    Player 3 - 1420
    Player 4 - 1550

    Team 2:
    Player 5 - 1600
    Player 6 - 1400
    Player 7 - 1550
    Player 8 - 1540

    I was thinking that you could first average each team's ratings, getting 1507.5 and 1522.5 respectively. Those are the two scores I would put through the Elo algorithm. So if Team 1 wins, let's say for example that the Elo algorithm returns 1547.5 (a 40-point increase).
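
    As a minimal sketch of that step in Python (assuming the standard Elo expected-score formula; the K value is just illustrative, since the 40-point figure was an example):

        # Standard Elo expected score, applied to the team-average ratings.
        K = 32  # illustrative; the 40-point jump above was just an example

        def expected(r_a, r_b):
            """Expected score (0..1) for a rating r_a against a rating r_b."""
            return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

        team1 = [1500, 1560, 1420, 1550]
        team2 = [1600, 1400, 1550, 1540]

        r1 = sum(team1) / len(team1)  # 1507.5
        r2 = sum(team2) / len(team2)  # 1522.5

        e1 = expected(r1, r2)         # ~0.478: Team 1 slightly unfavored
        delta = K * (1.0 - e1)        # Team 1 wins, so actual score S = 1
        print(round(r1 + delta, 1))   # new team-average rating, ~1524.2

    With K = 32 the winning side gains about 17 points here rather than 40; the size of the jump is purely a function of K and the rating gap.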

    I was thinking I could do one of two things.

    1) Take the difference and apply it to each player based on how far their personal Elo rating is from the average Elo of the losing team. So if you're on the winning team but your personal rating is well above the losing team's average, you will get less of that 40 points than teammates who are below the other team's average rating.

    For example, Player 2 has the highest rating on Team 1. Since he is furthest above the other team's average rating, he may only get 20 points, but since Player 3 is below that average, maybe he gets the full 40 (I'm not 100% sure how I would get the exact amount mathematically yet; I'm just spit-balling here).

    2) Take the difference and apply it to each player based on their match accolades. For simplicity's sake, let's say the more kills you got, the higher the percentage of the 40 points you get.
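
    Neither option is fully specified yet, but as a rough sketch of how each might look (the tapering and kill-share formulas below are made-up placeholders, not a worked-out scheme):

        # Hypothetical sharing rules; the exact math is still open above.

        def share_by_distance(delta, rating, opp_avg, scale=400.0):
            # Option 1: full share at or below the losing team's average,
            # tapering off linearly the further a player sits above it.
            if rating <= opp_avg:
                return delta
            return delta * max(0.0, 1.0 - (rating - opp_avg) / scale)

        def share_by_kills(delta, kills, top_kills):
            # Option 2: share scales with kills vs the team's top fragger.
            return delta * (kills / top_kills if top_kills else 1.0)

        print(round(share_by_distance(40, 1560, 1522.5), 1))  # ~36.2, under 40
        print(round(share_by_distance(40, 1420, 1522.5), 1))  # full 40.0
        print(round(share_by_kills(40, 15, 20), 1))           # 30.0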

    Thoughts?
    2016 Flank Gaming Network Season 1 Champion
    2016 FraggedNation Season 4 Champion
    2015 FraggedNation Season 2 Main Champion
    2015 ESL Test Cup Second place
    2014 FraggedNation UMM Tournament Champion
    2014 FraggedNation Old School map Tournament Champion
    2014 TWL Season 1 Second Place
    2013 TWL 7v7 Beta Tournament Champion
    2013 TWL 5v5 Beta Tournament Second Place
  • Saccho Posts: 1,577 Player
    I'd advise against Elo since the system was designed for 1v1 and doesn't have a clear update scheme for team games. But moving on.

    Averaging each team's rating to determine match quality likely isn't the best way to go in this scheme. An alternative is to do a comparison for each 1v1 pairing between the teams (A1 v B1, A1 v B2, A1 v B3, ... An v Bn) and average *that* result.
    For example, in a {1200, 1200, 1200} v {800, 800, 2000} 3v3, simple averaging would say Ea = Eb = 0.5. The handshake method would say Eb is only 0.39 and the {1200, 1200, 1200} team should be favored to win.
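
    A quick check of that example in Python (standard Elo expected-score formula assumed):

        def expected(r_a, r_b):
            return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

        def pairwise_expected(team, opponents):
            # Average the 1v1 expected scores over every cross-team pair.
            pairs = [(a, b) for a in team for b in opponents]
            return sum(expected(a, b) for a, b in pairs) / len(pairs)

        print(round(pairwise_expected([800, 800, 2000], [1200, 1200, 1200]), 2))
        # ~0.39, even though both team averages are identical (1200 each)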


    One system I've seen (the Lehman Rating in bridge) assigns the win-share update proportionally to player strength. Part of the logic is that stronger players are more responsible for the win and deserve more credit. Consider a {10, 10, 3000} vs {1000, 1000, 1000} match. If the first team wins, it likely says more about the strong player's ability to carry than about the 10-rated players contributing more than expected.

    In your modified Elo, this could be implemented as
    R1' = R1 + K * (Sa - Ea) * (R1 / Ra)
    where R1 is the player's original rating, Ra is the team's rating, K is the Elo K-factor, and Sa and Ea are
    the final and expected team outcomes.


    You could also just update each player's personal Elo by
    R1' = R1 + K * (Sa - Ea)
    and forget any assignment of per-player performance. I don't think the system would be losing much with that approach.
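
    Both update rules side by side, as a sketch (taking Ra as the team's average rating, consistent with the averaging above; if Ra were the team's summed rating instead, the per-player shares would sum to the team delta):

        K = 32

        def expected(r_a, r_b):
            return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

        def update_proportional(r1, r_team, s_team, e_team, k=K):
            # Stronger players absorb a larger share of the team's delta.
            return r1 + k * (s_team - e_team) * (r1 / r_team)

        def update_uniform(r1, s_team, e_team, k=K):
            # Every player moves by the same amount, ignoring who "carried".
            return r1 + k * (s_team - e_team)

        # Team 1 wins (S = 1) against the earlier ratings:
        e = expected(1507.5, 1522.5)
        for r in (1500, 1560, 1420, 1550):
            print(r, round(update_proportional(r, 1507.5, 1.0, e), 1),
                  round(update_uniform(r, 1.0, e), 1))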


    I think you're likely to inflict more harm than good by trying to incorporate something like kills into the algorithm.
  • iNv|eKComm Posts: 394 Player
    I like that. I guess my fear was that once a player's rating is high enough, it will be easier for them to shoot up faster, creating a sort of exponential increase in potential gain for winning. But I guess that's limited by the K-factor, so it may not be a big deal. All in all, I like this. I'm going to try to apply it to what I have and see how it sims.
  • Saccho Posts: 1,577 Player
    iNv|eKComm wrote: »
    I like that. I guess my fear was that once a player's rating is high enough, it will be easier for them to shoot up faster, creating a sort of exponential increase in potential gain for winning. But I guess that's limited by the K-factor, so it may not be a big deal. All in all, I like this. I'm going to try to apply it to what I have and see how it sims.

    If a player is *so* good that they can basically carry in a 1vN across many of their matches, the algorithm will struggle. The bigger concern there is actually that your matchmaking pool doesn't have a good opponent to place them against. You could do the system you proposed, where the update rule is
    R1' = R1 + K * (Sa - E1)
    and E1 is calculated from player Elo vs enemy team Elo, but now you also have the problem where all the weakest players on the team get huge jumps in their ratings because they were heavily expected to lose.
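
    To make that concrete (same expected-score formula, illustrative K):

        K = 32

        def expected(r_a, r_b):
            return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

        enemy_avg = 1522.5
        for r1 in (1560, 1420):          # strong vs weak player on Team 1
            e1 = expected(r1, enemy_avg)
            print(r1, "->", round(r1 + K * (1.0 - e1), 1))  # update on a win
        # The 1420 player gains ~21 points to the 1560 player's ~14; widen the
        # gap and the weakest players collect very large jumps per win.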

    The Bayesian systems handle that by decreasing uncertainty without increasing the skill rating much -- "you've proven you're definitely better, but not whether you're a lot better or a little better". Elo-like systems just raise the skill rating and hope that the higher win expectation balances out the points gained.

    The K-factor is more about score volatility than setting a cap. Some systems do adjust the K-factor individually over time, lowering it as players accrue games or are believed to be closer to their "true" rating, but it's done more heuristically (e.g., K = 40 if you've played fewer than 25 matches, K = 20 if your Elo < 2000, K = 10 if your Elo > 2000). This is a weaker version of the player sigma value in the Bayesian approach, where mu represents average player skill and sigma represents variation or uncertainty in that rating.
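
    That example schedule, written out (the thresholds are just the heuristic ones above, nothing more):

        def k_factor(games_played, rating):
            # Heuristic schedule: high K while placing, lower K once settled.
            if games_played < 25:
                return 40
            return 20 if rating < 2000 else 10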
  • iNv|eKComm Posts: 394 Player
    I don't yet have the insight into Elo to really respond to every point. I get the gist of what you're saying, though I'm still leaning toward

    R1' = R1 + K * (Sa - E1), where E1 is the player's Elo vs the enemy team's Elo. It just makes the most sense to me. But let me address your concern about weak players potentially gaining huge jumps in rating when they win. This is why I wanted to take the Elo rating delta and modify it further based on performance. Let's leave exactly what performance means for later in the discussion. But ideally, if I got the performance part right, the players who score best would be the ones who were not expected to do well and also played very well. Isn't that what we want?

    My intention wouldn't be to increase the Elo rating delta beyond the K-factor (I know I'm still using it as an upper bound; that's just how I understand it for now). I would only apply percentage decreases.

    Basically, I want Elo to provide a base increase or decrease in rating, but then I want performance to be able to temper that change in cases where a bad player gets carried or a good player does poorly.
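
    Something like this, as a sketch (how "performance" gets scored is still the open question):

        def adjusted_delta(base_delta, performance):
            # performance in [0, 1]: 1.0 leaves the Elo delta untouched,
            # lower values shrink its magnitude; it never grows past K.
            performance = min(max(performance, 0.0), 1.0)
            return base_delta * performance

        print(adjusted_delta(40, 1.0))   # played well: keep the full delta
        print(adjusted_delta(40, 0.5))   # got carried: keep only half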

    I know you have experience here; this conversation has been extremely helpful for me so far.
  • Saccho Posts: 1,577 Player
    edited March 2016
    K-factor is an upper bound, but the only time anybody will actually hit that upper bound is in a major upset (expected score ~0, final score 1). Most of the time you'll hopefully have an expected score of ~0.5 (even teams), so you generally expect increments of around K/2. That's where your choice of K essentially determines how much your system remembers the past. If your K is huge, your last few games have a huge impact on your score and the system forgets the past more quickly. With a small K, an unlucky streak has less impact. That's the motivation behind systems altering K based on the number of games played and so on: quickly find a player's true skill, then reduce the impact of luck on rating volatility.
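
    A toy illustration of that trade-off (same expected-score formula; the streak is arbitrary):

        def run(k, results, r=1500.0, opp=1500.0):
            # Replay a win/loss streak (1 = win, 0 = loss) under a fixed K.
            for s in results:
                e = 1.0 / (1.0 + 10 ** ((opp - r) / 400.0))
                r += k * (s - e)
            return r

        streak = [1, 1, 1, 0, 1]
        print(round(run(40, streak)), round(run(10, streak)))
        # A large K swings the rating far on five games; a small K barely moves.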

    The big challenge with trying to skew the ratings based on "performance" (however you choose to define it), especially for a team game, is in the intangibles. A great in-game leader might coach every team he plays with to victory, but a kills-based metric might only see his own "poor" results. That's why most of these systems only care about the final "win or lose?" outcome. It also has the advantage of being simpler to handle. Elo was developed for chess, but it doesn't try to count how many pieces up the winner is or anything like that, because it's irrelevant. I could win while down on material. I shouldn't get more rating points for devastating an opponent by slowly eating the rest of his board after I'm already dominant and able to mate (bad sportsmanship if nothing else).

    I mentioned the Ghost Recon: Phantoms approach to team balance earlier. Their system actually did try to look at all of the more intangible elements of player behavior to try to make "fun" matches instead of "even" matches, considering whether players were campy snipers or rushy assaulters as well as things like K/D stats. They would survey players on how fun each match was on a 1-5 scale and engineered the system to maximize that score, not how even the results were. It didn't matter if one team had better players if the compositions meant everyone enjoyed themselves anyway. It's a really cool twist on how matchmaking and ratings have normally been done.