# Topic: Rating system

Posted at: 2019-07-08, 04:08

einstein13 wrote:

Hey!

I was able to make first attempt to 2 vs 2 game problem with small calculations for 1 vs 2 problem too. Everything you can find in this file:
http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_0.1.pdf
(also available on my site)

sorry to be contrary here, but I question the premises of your mathematical model. while i do believe that the mathematics is sound, i think the model would need experimental validation. You can't assign the value of how much difference is so much that the weaker player is virtually negligible, because that is different for every game, and cannot be calculated a priori.

I think it boils down to how much difference is enough to win 1v2, and how much damage a weaker player can inflict on the stronger one. in chess it is very easy to force a 1-for-1 exchange, and that's why a world champion would never be able to 1v2 even mediocre club players. in widelands, hero soldiers can face a lot of weaker enemies with no losses, so it's much easier to face multiple weak enemies. the difficulty is also dependent on the map (on a small map there is no time to upgrade soldiers, and fights between untrained soldiers mean weaker players can still deal lots of damage. on a large map, weaker players would have time to upgrade soldiers. on a medium map it's easier for the stronger player).

ultimately, I don't think the question of 2v2 can be tackled without some hard experimental data to set boundaries for a mathematical model.

Posted at: 2019-07-08, 08:05

einstein13 wrote:

What about the map the players took? I think that it is not needed for ratings, but it can be useful to see which maps are used the most and what positions are better than others. Just an idea.

Although I am not very keen to be ranked in any ranking, I have no objections against it. The only thing that would really be interesting is getting some statistics from the system about used maps, used tribes, and so on. We could use them to assess or demonstrate the balance of the tribes, among other things.

Posted at: 2019-07-08, 11:24

king_of_nowhere wrote:

sorry to be contrary here, but I question the premises of your mathematical model. while i do believe that the mathematics is sound, i think the model would need experimental validation. (...)

I have to say: "You're 100% right!". And that is not a contradiction, but rather a complement. I agree that we need experimental data to see if any model works, but from our (widelands) experience, we need a pretty good model to start with.

Remember the tree growth model? It was introduced and mathematically well designed, but it didn't work in the game. We (you) have changed some values, but the main model stayed. Now the trees are growing OK. I think that we have a similar situation here: somebody proposes a change or an addition to the game, and somebody else tries to solve the problems that occur with it. If it doesn't work, we can switch to whatever model Widelands needs. My model doesn't affect Glicko or Elo, it only solves the problem of more than 2 players at once. And I am trying to get the proper values for that. If it does not work, we can just change the model OR prohibit games other than 1 vs 1 for ranking.

einstein13
calculations & maps packages: http://wuatek.no-ip.org/~rak/widelands/

Posted at: 2019-07-09, 01:51

Today I was able to expand the model a bit: now it covers calculating R and RD for all games with 2 teams only. The new file is available on my site:
http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_0.2.pdf

I know that some of you may be sceptical about it, but hey, if it is possible, we can try it and then decide if it is OK or not.

Next step is to think about three or more teams... Is it possible to make a model for that?

Posted at: 2019-07-10, 00:14

einstein13 wrote:

Today I was able to expand a model a bit: now it covers calculating R and RD for all games with 2 teams only. New file available on my site:

http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_0.2.pdf

Quite impressive, but is it just me - or is the most important information missing: How the rating and the RD will change for each player?

And another thing looks weird: A 50 RD guy and a 130 RD guy form a team with less than 50 RD? Shouldn't it be in between?

I know that some of you may be sceptical about it, but hey, if it is possible, we can try it and then decide if it is OK or not.

Next step is to think about three or more teams... Is it possible to make a model for that?

No, why should it be?

einstein13 wrote: Now the trees are growing OK.

That's questionable

trimard wrote:

So at least make a floor of 600, if that seems fair in other games!

Yes, why not. This doesn't have to be the limit forever.

I think end of the tournament would make a good starting data table btw!

On the other hand, it's not so good to declare games as rated after they have been played, especially if not every player agrees to be ranked.

Huh, ok I'll try in the next few days!

Nice

BoeseKaiser wrote:

On the other hand, it is possible to win 1v2 against decent opponents, while in chess that would be unthinkable even in the case of the world champion against regular club players.

What would such a chess match look like? 16 pieces vs 32, on which kind of board, with which movement rules?

https://en.wikipedia.org/wiki/Three-player_chess

Ah, thanks

I played that a couple of times but I find it rather uninteresting. I'd like to see a GM (preferably Eric Hansen) play against 2 random opponents though.

If the opponents are a team but not completely incompetent they might exchange most pieces and leave the GM no chance.

Wanted to save the world, then I got widetracked

Posted at: 2019-07-10, 01:06

WorldSavior wrote:

How the rating and the RD will change for each player?

As simply as possible: using the Glicko-2 system, you calculate the gains and losses for the game from the team scores. Then you add the gain to all winners and subtract the loss from everyone on the opposite side. If you then recalculate the team ranks (with the new scores), they will be as expected: higher or lower by the given values. This behaviour is proven in the document in point 4. d).
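A minimal sketch of the bookkeeping step only, assuming the team-level gain and loss have already been computed elsewhere (for example by a Glicko-2 update on the two team ratings). The function name `apply_team_result`, the player names and the numbers are hypothetical. How the team rating is built from its members, and what happens to each player's RD, is defined in the PDF and not reproduced here.

```python
def apply_team_result(ratings, winners, losers, gain, loss):
    """Return a new ratings dict after one two-team game.

    ratings: dict mapping player name -> rating
    winners, losers: lists of player names on each side
    gain, loss: team-level rating changes, assumed to be computed elsewhere
    """
    updated = dict(ratings)  # copy, so the input is left untouched
    for name in winners:
        updated[name] += gain  # every winner gets the full team gain
    for name in losers:
        updated[name] -= loss  # every loser gets the full team loss
    return updated


# Hypothetical 2 vs 2 game: "a" and "b" beat "c" and "d".
ratings = {"a": 1500.0, "b": 1600.0, "c": 1550.0, "d": 1450.0}
new_ratings = apply_team_result(ratings, ["a", "b"], ["c", "d"], gain=12.0, loss=10.0)
```

Because every member of a team is shifted by the same amount, any team rating that is built from its members' ratings moves in the same direction as the members do.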

And another thing looks weird: A 50 RD guy and a 130 RD guy form a team with less than 50 RD? Shouldn't it be in between?

Yes, I was a bit surprised too, but the standard deviation is sometimes counterintuitive. Let's take an example. Take a wooden stick 1 meter long. Then pick something big, like a stadium (football, baseball, whatever). Try to measure the size of this stadium with the stick. You will get some number (say 535 sticks) with quite a high possible error. But you can measure the same thing again (the new result might be 523 sticks). If you collect many such measurements, each with a high standard deviation, you will be pretty sure that the stadium is 530 m long with a standard deviation of less than 1 meter. That is the power of collecting many (independent!) data points. That is also why in our case we get a lower RD than the initial RDs: the system is pretty sure that the new value is correct. I have experimented with the second RD and found that if it is very high (e.g. 300), the resulting RD is higher than 50.
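The stick example can be put into numbers with the usual standard error of the mean. This is only a sketch: it assumes independent measurements that all have the same standard deviation, the 10 m per-measurement error is a made-up value, and it is not the team RD formula from the PDF.

```python
import math


def standard_error(single_sd, n):
    """Standard deviation of the mean of n independent measurements,
    each with standard deviation single_sd."""
    return single_sd / math.sqrt(n)


# Hypothetical: one stick measurement of the stadium is off by about 10 m.
for n in (1, 4, 25, 100):
    print(n, "measurements ->", standard_error(10.0, n), "m")
```

The uncertainty shrinks with the square root of the number of measurements, so it takes four times as many measurements to halve it.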

Posted at: 2019-07-12, 17:24

## Glicko test in 1vs1

I used the data from the 2017 tournament

I don't think it's really necessary to ask for permission, because the results are already available to everyone and it's only for a test. It's not the data that will actually be used for the rating system.

### Procedure

I didn't want to redo the calculations, because I'm not as good as einstein at this kind of thing. So I used this script. I didn't group series of games together as is recommended in the glicko2 paper. I actually calculated each map one by one. It's not yet clear to me how to do it otherwise. Does anyone know, btw?

• Some players didn't play every match (they forfeited) --> no problem just don't change the rating at that time
• Missing data for round 4, 5 and 6 and had to deduce the score of each map from the general score --> minor nuisance, but maybe I made some mistakes
• I didn't remember how the last matches between king of nowhere and nemesis were played, so I didn't include them in the dataset. That might explain the differences in the results

Constants used (recommended in the initial glicko2 paper):

• Starting rating: 1500
• Starting deviation: 350
• player volatility: 0.06
• Tau: 1.0 (completely arbitrary, because I have no idea how to determine which value would fit best. It was the default in the script I used, so I stuck with it)
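On the question of grouping games: in the Glicko-2 description, one update takes all games a player played in a rating period at once, with every opponent's rating and RD taken from before the period. Below is a sketch of that single-player update written from the steps in Glickman's Glicko-2 paper. It is not the script used for the table below, and the function name `glicko2_update` and its argument layout are my own choices. Grouping by tournament round would mean calling it once per player per round with that round's games, which is one possible choice of rating period, not something decided in this thread.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def glicko2_update(rating, rd, vol, games, tau=1.0, eps=1e-6):
    """Update one player for one rating period.

    games: list of (opp_rating, opp_rd, score), score is 1, 0.5 or 0.
    Returns (new_rating, new_rd, new_vol).
    """
    mu = (rating - 1500.0) / SCALE
    phi = rd / SCALE
    if not games:
        # Player did not play: rating and volatility stay, RD grows.
        return rating, math.sqrt(phi ** 2 + vol ** 2) * SCALE, vol

    v_inv = 0.0
    diff_sum = 0.0
    for opp_rating, opp_rd, score in games:
        mu_j = (opp_rating - 1500.0) / SCALE
        g_j = _g(opp_rd / SCALE)
        e_j = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))  # expected score
        v_inv += g_j ** 2 * e_j * (1.0 - e_j)
        diff_sum += g_j * (score - e_j)
    v = 1.0 / v_inv
    delta = v * diff_sum

    # Iterative solve for the new volatility (Illinois variant of regula falsi).
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    low = a
    if delta ** 2 > phi ** 2 + v:
        high = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        high = a - k * tau
    f_low, f_high = f(low), f(high)
    while abs(high - low) > eps:
        mid = low + (low - high) * f_low / (f_high - f_low)
        f_mid = f(mid)
        if f_mid * f_high <= 0:
            low, f_low = high, f_high
        else:
            f_low /= 2.0
        high, f_high = mid, f_mid
    new_vol = math.exp(low / 2.0)

    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * diff_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_vol


# Worked example from the Glicko-2 paper: three games in one rating period.
example = glicko2_update(
    1500, 200, 0.06,
    [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)],
    tau=0.5,
)
```

For this input the paper reports a new rating of about 1464.06, an RD of about 151.52 and a volatility of about 0.05999. The empty-list branch covers the forfeits mentioned above: the rating stays the same and only the RD grows.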

### Result

| Player | Rating | Deviation |
|---|---|---|
| worldsavior | 2000.690 | 171.696 |
| nemesis 26 | 1725.854 | 164.302 |
| king of nowhere | 1697.907 | 169.930 |
| mars | 1640.066 | 156.209 |
| einstein13 | 1639.184 | 168.751 |
| kaputtnik | 1564.446 | 162.224 |
| trimard | 1462.397 | 167.161 |
| tando | 1426.643 | 156.455 |
| Hasi50 | 1382.564 | 171.562 |
| guncheloc | 1331.528 | 213.339 |
| animohim | 1133.031 | 171.307 |
| LAZA | 1107.468 | 195.517 |

## Non 1vs1 game

Einstein13

I was able to make first attempt to 2 vs 2 game problem with small calculations for 1 vs 2 problem too. Everything you can find in this file: http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_0.1.pdf (also available on my site)

I don't know what to say, so happy you were able to do that. I really want to test these equations. You're totally right, it will easily be done by a computer!

I agree with your whole reasoning, though I haven't done math in so long that I can't comment on your equations. Yes, it's exponential and not linear, that's for sure.

Today I was able to expand a model a bit: now it covers calculating R and RD for all games with 2 teams only. New file available on my site: http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_0.2.pdf

I'm so hyped to test these too

King of Nowhere

i think the model would need experimental validation. You can't assign the value of how much difference is so much that the weaker player is virtually negligible, because that is different for every game, and cannot be calculated a priori.

Yes totally, we need A LOT of tests. But the problem is that currently we have no data to test with. And we have no data because people don't play and then report their results (except during tournaments). So integrating this system, even if it uses "false" assumptions, will give us enough data to make better equations. It's a first stage. And it's good to have some equations to help with this first stage.

## Storing data

What about the map the players took?

Yes, and yes hessenfarmer we totally should use these data for balance discussions. That would be super useful!

Edited: 2019-07-12, 19:58

Posted at: 2019-07-12, 18:26

trimard wrote:

• Some players didn't play every match (they forfeited) --> no problem just don't change the rating at that time
• Missing data for round 4, 5 and 6 and had to deduce the score of each map from the general score --> minor nuisance, but maybe I made some mistakes
• I didn't remember how the last matches between king of nowhere and nemesis were played, so I didn't include them in the dataset. That might explain the differences in the results

there are no missing data. probably you only looked at the pairings and rankings, not all of which are present in the tournament thread.

the table shows, for every player, all his opponents and results, and it is complete: https://www.widelands.org/forum/topic/2912/?page=1#post-21254

Posted at: 2019-07-12, 18:54

the table shows, for every player, all his opponents and results, and it is complete: https://www.widelands.org/forum/topic/2912/?page=1#post-21254

Damn, I didn't scroll to that point, I knew you had a more detailed table! I'll check back later. Do you have an idea about the added matches?

• In round 5 nemesis26, einstein13 and mars played 2 more matches?

• In round 6 you played against Nemesis?

Really easy to integrate, but I prefer to be sure when these matches were played.

Posted at: 2019-07-12, 19:13

trimard wrote:

• In round 5 nemesis26, einstein13 and mars played 2 more matches?

No. the 1/1, 1/2, 0/1 are the direct-match tiebreak. It means that when two or more players have equal score and buchholz, the one who won a direct game (if one was played) is first in the ranking. after round 5 there were 3 people at equal score and buchholz, so i looked at all their matches. einstein had played against both mars and nemesis, won one game and lost one, so i gave him 1 out of 2 in direct match. mars had only played against einstein by that point, and he lost, so he got 0/1. nemesis had defeated einstein, so he got 1/1.

but no additional matches were played.
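The direct-match count described above can be sketched as follows. The function name `direct_match_scores` and the `(winner, loser)` tuple layout are assumptions for illustration. The two games between the tied players are the ones described in this post, and the game against `outsider` is a hypothetical entry that only shows that games outside the tied group are ignored.

```python
def direct_match_scores(tied_players, games):
    """Count wins/games among the tied players only.

    games: list of (winner, loser) tuples.
    Returns a dict: player -> (wins, games_played) within the tied group.
    """
    tied = set(tied_players)
    scores = {player: [0, 0] for player in tied_players}
    for winner, loser in games:
        if winner in tied and loser in tied:
            scores[winner][0] += 1  # one win for the winner
            scores[winner][1] += 1  # one direct game for each side
            scores[loser][1] += 1
    return {player: tuple(s) for player, s in scores.items()}


games = [
    ("einstein13", "mars"),
    ("nemesis26", "einstein13"),
    ("mars", "outsider"),  # hypothetical game, not part of the tie
]
tiebreak = direct_match_scores(["nemesis26", "einstein13", "mars"], games)
```

Here `tiebreak` holds the 1/1, 1/2 and 0/1 values from the table as `(wins, games_played)` pairs.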

• In round 6 you played against Nemesis?

no, i played against worldsavior in round 6. the table shows that I faced nemesis in round 1