
Topic: Rating system

trimard (Topic Opener)
Joined: 2009-03-05, 22:40
Posts: 230
Ranking: Widelands-Forum-Junkie
Location: Paris
Posted at: 2019-07-12, 19:23

aaaahh ok! I get it! Well my table is correct then :)


trimard (Topic Opener)
Joined: 2009-03-05, 22:40
Posts: 230
Ranking: Widelands-Forum-Junkie
Location: Paris
Posted at: 2019-07-15, 11:38

Ok, I've been thinking about this over the last few days. Let's talk a bit about implementation and deadlines; otherwise the idea will die soon.

First implementation stage

Entirely web-based, allowing us to test the rating system as well without asking too much coding time from the game devs.

Deadline: October fest. I will only have a few hours on Sundays, but it should be manageable.

A few options:

  • we remove "playtime scheduler" and put something like "let's play" --> an index page with "current ratings" and the rest of the playtime buttons

  • We add a new "rating" button somewhere on the top right

  • We add a new rating button in the main menu, because we're badasses

Rating page

Four subpages, accessible via a small menu, like on the profile page:

  • 1vs1 rating

  • 2vs2 rating

  • all-modes rating

  • upload a game

Ratings pages

One table with the current rating of everyone, from best to worst.

If you are ranked, your name appears in green and your RD is shown.

If you're not ranked, a message "you are not yet ranked in this mode" appears at the top of the table.
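Assuming the site stays Django-based (an assumption on my part), the data behind these pages could be as small as one record per player and mode. A hypothetical sketch, nothing here exists on widelands.org yet:

# Hypothetical Django-style model for the per-mode ratings shown on these pages.
# Field names are made up for illustration only.
from django.db import models

class Rating(models.Model):
    MODES = [("1vs1", "1vs1"), ("2vs2", "2vs2"), ("all", "all modes")]

    player = models.CharField(max_length=60)      # forum user name
    mode = models.CharField(max_length=4, choices=MODES)
    rating = models.FloatField(default=1500)
    deviation = models.FloatField(default=350)    # RD, shown next to ranked names
    volatility = models.FloatField(default=0.06)

    class Meta:
        unique_together = ("player", "mode")
        ordering = ["-rating"]                    # best to worst, as in the table above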

Upload page

Or maybe change the name; we might not need people to upload the replay every time?

A form:

  • map

  • type of game

  • player(s) against whom you played

  • player(s) on your team

  • upload the replay or replays if needed

  • Win? Tie? Loss?

  • exact game start time (the one shown in the first replay)

From the map, type and other players, automatically infer which type of game this is (1vs1, 2vs2, all modes).

--> send a message to the other player(s) when a game has been submitted.
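To illustrate the "automatically infer which type of game this is" step, something as small as this could run on the submitted form data; the field names are placeholders, not the real form:

# Hypothetical sketch: derive the game mode from the submitted player lists.
def infer_game_type(team_mates, opponents):
    """Return "1vs1", "2vs2" or "all modes" from the submitted form fields."""
    if not team_mates and len(opponents) == 1:
        return "1vs1"
    if len(team_mates) == 1 and len(opponents) == 2:
        return "2vs2"
    return "all modes"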

Arbiter pages

The arbiter will need a special role on the website.

He sees one extra button for moderation on the index page.

Set map pool

Two tables, 1vs1 and 2vs2, on which the arbiter can add/edit/remove a line for each map, with:

  • exact map name in the game

  • Game mode

Arbiter games

List of submitted games, from newest to oldest:

  • submitted by

  • type of game

  • map

  • result

  • link to replay (and number of replays)

  • start time

Add a red color when the same game was submitted twice?
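Whether we highlight duplicates in red or reject them outright (see the discussion further down), a simple key over the submitted data would already catch most cases. A rough sketch with placeholder field names:

# Placeholder sketch: two submissions that agree on map, start time and the set
# of players are treated as the same game.
def duplicate_key(submission):
    return (submission["map"], submission["start_time"], frozenset(submission["players"]))

def is_duplicate(new_submission, existing_submissions):
    key = duplicate_key(new_submission)
    return any(duplicate_key(old) == key for old in existing_submissions)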

Options for each game:

  • Delete game from table --> send message to both players

  • punish a player (can't submit games for x hours)

  • send a message to both players

Second implementation stage

If we find that our rating system works, that's when we add everything that has been discussed:

Game options

  • surrender button

  • set game for later

Game info sent to the server after a game is played

  • unique hash
  • info on both players: their tribes, their rating at the start of the game? (Harder to implement though?)
    • standard deviation, while we're at it
  • status (currently being played, paused for rescheduling, paused due to a connection problem)
  • number of interruptions per player:
    • player1
    • player2
    • playerN (if needed)
  • result for:
    • player1
    • player2
    • playerN (if needed)
  • Time: when started
  • Time: when ended
  • Time: last update
  • How many parts the game had (more if the game was interrupted)
  • How long the game lasted:
    • Game time (all the parts added together)
    • Real time
  • Substatus: win by surrender, win by manual arbiter decision, ...
  • Widelands version played
  • Map played
  • We'll certainly think of other things to put here in the meantime
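To make the list above concrete, the report the game sends could be a single JSON-like structure along these lines. Every field name and value here is a placeholder of my own, not an existing Widelands format:

# Placeholder shape for the post-game report; only a sketch of the list above.
game_report = {
    "hash": "…",                       # unique hash of the game
    "players": [
        {"name": "player1", "tribe": "barbarians", "rating": 1500, "rd": 350,
         "result": "win", "interruptions": 0},
        {"name": "player2", "tribe": "frisians", "rating": 1620, "rd": 120,
         "result": "loss", "interruptions": 1},
    ],
    "status": "finished",              # or "playing", "paused_reschedule", "paused_connection"
    "substatus": "win_by_surrender",   # or "win_by_arbiter", ...
    "started": "2019-07-16T20:00:00Z",
    "ended": "2019-07-16T22:30:00Z",
    "last_update": "2019-07-16T22:30:00Z",
    "parts": 1,                        # more if the game was interrupted
    "gametime": "1:45:00",             # all parts added together
    "realtime": "2:30:00",
    "widelands_version": "build20",
    "map": "Crater",
}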

Automatic

  • Automatically add game results to the current ratings

  • If one player leaves the game and doesn't come back within 20 min --> he loses

  • The arbiter gets all the info from the game in a table as described before, but now more complete.

  • The arbiter gets general game statistics like best tribes, most played map, etc.

Edit: I think I will try implementing the Glicko system in Python first. It will help me practise implementing the math before tackling einstein13's calculation afterwards.
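To get the ball rolling, here is a minimal, self-contained sketch of one Glicko-2 rating-period update in Python, following the steps of Glickman's Glicko-2 paper. The Player class and function names are my own; treat it as a starting point to check against the paper's worked example, not a final implementation:

# Minimal Glicko-2 rating-period update (sketch), following Glickman's paper.
import math

SCALE = 173.7178   # conversion factor between Glicko and Glicko-2 scales
TAU = 1.0          # system constant discussed in this thread
EPS = 1e-6         # convergence tolerance for the volatility iteration

class Player:
    def __init__(self, rating=1500.0, rd=350.0, vol=0.06):
        self.rating, self.rd, self.vol = rating, rd, vol

def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)

def _expected(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))

def update(player, opponents, scores, tau=TAU):
    """Update `player` after one rating period against `opponents`,
    with `scores` being 1 for a win, 0.5 for a tie, 0 for a loss."""
    # Step 2: convert to the internal Glicko-2 scale.
    mu, phi = (player.rating - 1500.0) / SCALE, player.rd / SCALE
    mus = [(o.rating - 1500.0) / SCALE for o in opponents]
    phis = [o.rd / SCALE for o in opponents]

    # Steps 3-4: estimated variance v and improvement delta.
    v = 1.0 / sum(_g(pj) ** 2 * _expected(mu, mj, pj) * (1.0 - _expected(mu, mj, pj))
                  for mj, pj in zip(mus, phis))
    delta = v * sum(_g(pj) * (s - _expected(mu, mj, pj))
                    for mj, pj, s in zip(mus, phis, scores))

    # Step 5: new volatility via the iterative procedure from the paper.
    a = math.log(player.vol ** 2)
    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2
    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > EPS:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Steps 6-8: new deviation and rating, converted back to the Glicko scale.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * sum(_g(pj) * (s - _expected(mu, mj, pj))
                                     for mj, pj, s in zip(mus, phis, scores))
    return Player(SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol)

# Example: a new player beats a 1400/30 opponent and loses to a 1700/300 one.
if __name__ == "__main__":
    result = update(Player(), [Player(1400, 30), Player(1700, 300)], [1, 0])
    print(round(result.rating, 1), round(result.rd, 1))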

Edited: 2019-07-16, 00:43

WorldSavior
Joined: 2016-10-15, 04:10
Posts: 2091
OS: Linux
Version: Recent tournament version
Ranking: One Elder of Players
Location: Germany
Posted at: 2019-07-16, 14:22

I'm impressed by your posts, you seem to do a lot for this.

trimard wrote:

  • we remove "playtime scheduler" and put something like "let's play" --> an index page with "current ratings" and the rest of the playtime buttons

Why remove the playtime scheduler? I'm rather against that.

  • We add a new "rating" button somewhere on the top right

Sounds like the best option to me.

Upload page

Or maybe change the name; we might not need people to upload the replay every time?

I think that replays are more or less necessary, to prevent abuse...

Add a red color when the same game was submitted twice?

Isn't it easier to make it impossible that games get submitted twice?

trimard wrote:

I didn't want to redo the calculations, because I'm not as good as einstein13 at these kinds of things, so I used this script. I didn't compact series of games together as recommended in the Glicko-2 paper; I actually calculated each map 1 by 1. It's not yet clear to me how to do otherwise. Does anyone know, btw?

Calculating each map 1 by 1 seems to make a lot of sense.

Constants used (recommended in the original Glicko-2 paper):

  • Starting rating: 1500
  • Starting deviation: 350
  • player volatility: 0.06
  • Tau: 1.0 (completely arbitrary, because I have no idea how to determine which value would fit best. It was the default in the script I used, so I stuck with it)

Result

Player             Rating     Deviation
worldsavior        2000.690   171.696
nemesis 26         1725.854   164.302
king of nowhere    1697.907   169.930
mars               1640.066   156.209
einstein13         1639.184   168.751
kaputtnik          1564.446   162.224
trimard            1462.397   167.161
tando              1426.643   156.455
Hasi50             1382.564   171.562
guncheloc          1331.528   213.339
animohim           1133.031   171.307
LAZA               1107.468   195.517

Looks like what one would expect, good. Usually 6 games are not enough to produce a "stable" rating. The chess website Lichess.org considers ratings "provisional" (unstable) if the rating deviation is above 110; such players get a question mark behind their rating, and their ratings cannot appear on leaderboards because they are too uncertain.

einstein13 wrote:

WorldSavior wrote:

How will the rating and the RD change for each player?

As simple as possible: according to the Glicko-2 system, you calculate the gains and losses for the game from the team scores. Then you add the gains to all winners and subtract the losses from the opposite side. If you recalculate the team ranks again (with the new scores), they will be as expected: higher or lower by the given values. This behaviour is proven in the document, in point 4. d).

This sounds unfair to me. Should somebody who is ranked 1000 points below his teammate really get as many points as his teammate, even though he didn't have to do anything for the victory? ;)

And another thing looks weird: a 50 RD guy and a 130 RD guy form a team with less than 50 RD? Shouldn't it be in between?

Yes, I was a bit surprised too, but the standard deviation is sometimes counter-intuitive. Let's make an example. Take a wooden stick of length 1 meter. Then pick something big, like a stadium (football, baseball, whatever), and try to measure the size of the stadium with the stick. You will get some number (say 535 sticks) with a fairly high possible error. But you can measure the same thing again (the new result might be 523 sticks). If you collect many of those measurements, each with a high standard deviation, you will be pretty sure that the stadium is about 530 m long with a standard deviation of less than 1 meter. That is the power of collecting many (independent!) data points. That is also why in our case we get a lower RD than the initial RDs - the system is pretty sure that the new value is correct. I have experimented with the second RD and found that if it is very high (e.g. 300), the resulting RD is higher than 50.

Okay, maybe you are right.
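For what it's worth, the numbers are at least consistent with treating the two teammates like two independent measurements that get combined. This is only an illustration of the stadium analogy; einstein13's paper may combine the RDs differently:

# Inverse-variance combination of two independent estimates with RD 50 and RD 130,
# as an illustration of the stadium analogy above (not necessarily the paper's formula).
import math

rd1, rd2 = 50.0, 130.0
combined = math.sqrt(1.0 / (1.0 / rd1 ** 2 + 1.0 / rd2 ** 2))
print(round(combined, 1))  # about 46.7, i.e. below both individual RDs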


Wanted to save the world, then I got widetracked

einstein13
Joined: 2013-07-29, 00:01
Posts: 1118
Ranking: One Elder of Players
Location: Poland
Posted at: 2019-07-23, 01:42

Hi,

The topic of creating the rank points model for arbitrary multiplayer games is finished for now.

http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_1.0.pdf

If you apply this to Widelands, it will be great. If not, not a problem. :)

@WorldSavior

This sounds unfair to me. Should somebody who is ranked 1000 points below his teammate really get as many points as his teammate, even though he didn't have to do anything for the victory?

Yes, since they were a team. The rank points should show not only a player's strength as an economist or military strategist in Widelands, but also as a politician who can make friends. If you don't want to play with a player 1000 pts below your rank, just don't do it. ;)


einstein13
calculations & maps packages: http://wuatek.no-ip.org/~rak/widelands/
backup website files: http://kartezjusz.ddns.net/upload/widelands/

trimard (Topic Opener)
Joined: 2009-03-05, 22:40
Posts: 230
Ranking: Widelands-Forum-Junkie
Location: Paris
Posted at: 2019-07-23, 02:18

I'm impressed by your posts, you seem to do a lot for this.

Thanks! I like this project, but I'm still afraid I will lack time for all this

Why remove the playtime scheduler? I'm rather against that.

I meant merging the two on the same page; we need to limit the number of links on the front page. For starters, I used your idea though.

Isn't it easier to make it impossible that games get submitted twice?

By different players, or if there is a time factor, like only one part being sent by one player.

Calculating each map 1 by 1 seems to make a lot of sense.

No, I meant about the time factor. See below.

Looks like what one would expect, good. Usually 6 games are not enough to produce a "stable" rating. The chess website Lichess.org considers ratings "provisional" (unstable) if the rating deviation is above 110; such players get a question mark behind their rating, and their ratings cannot appear on leaderboards because they are too uncertain.

I really like this idea.
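If we copy that rule, the display side would be tiny; the threshold of 110 is just the value quoted above, we could pick our own:

# Show a "?" behind a rating while its deviation is still high (provisional),
# and keep such ratings off the leaderboard.
def display_rating(rating, rd, threshold=110):
    return f"{rating:.0f}?" if rd > threshold else f"{rating:.0f}"

def leaderboard(rows, threshold=110):
    return [row for row in rows if row["rd"] <= threshold]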

This sounds unfair to me. Should somebody who is ranked 1000 points below his teammate really get as many points as his teammate, even though he didn't have to do anything for the victory? ;)

Yes, at some point, to create any rating at all, we have to make the assumption that the teammate participated just as much. But yes, it's an assumption that isn't really true. That's why some games prefer fixed teams in 2vs2: it's more logical to put a rating on a certain team than on someone from that team. It's not the player who wins, it's the team.

The topic of creating the rank points model for arbitrary multiplayer games is finished for now.

http://wuatek.no-ip.org/~rak/widelands/docs/MultiplayerRatingSystem/MultiplayerRatingSystem_1.0.pdf

Very nice!

TBH, I don't yet grasp all the details of your calculations, although it's better now that I've studied the Glicko paper a bit.

What is really cool is that you included some examples. We'll be able to use those to test whether the implementation works well too :)

If you apply this to Widelands, it will be great. If not, not a problem.

It doesn't seem like much more work than implementing Glicko, and it's a nice feature to test. Why shouldn't we apply it?

About the constants

Ok, I think I see a few starting options for the different Glicko constants.

  • Starting rating: 1500 (I don't see a reason to change this)

  • Starting deviation: 200 (I don't see a reason to change this)

  • Starting volatility: 0.06. Seems a bit tough to change, but we should test other values when we have the occasion. Skill doesn't change with time in the same way in different games. For example, I think in a fast-paced game like Counter-Strike skill changes a lot if you don't play for a few weeks, while in a slower game like Civilization skill decays more slowly. Only my opinion here.

  • Change of volatility, tau: 1.0. Same remark as above.

  • Time between score calculations, or cycle: after each game (not the default Glicko implementation, not sure how to do it)? After each day? After a week? After a month? It seems we should evaluate how many games are played per week on average and choose a cycle length so that there are 5 to 15 games per "cycle".
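Collected in one place, these starting values would look roughly like this. Note that Glickman's paper suggests 350 as the starting deviation and values of tau between 0.3 and 1.2, so the 200 and 1.0 above are our own choices to revisit:

# Proposed starting constants from this post; the rating period is still an open question.
GLICKO_CONFIG = {
    "start_rating": 1500,
    "start_deviation": 200,
    "start_volatility": 0.06,
    "tau": 1.0,
    "rating_period": "week",   # or per game / day / month, see the last bullet above
}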


einstein13
Joined: 2013-07-29, 00:01
Posts: 1118
Ranking: One Elder of Players
Location: Poland
Posted at: 2019-07-23, 02:46

trimard wrote:

  • Time between score calculations, or cycle: after each game (not the default Glicko implementation, not sure how to do it)? After each day? After a week? After a month? It seems we should evaluate how many games are played per week on average and choose a cycle length so that there are 5 to 15 games per "cycle".

I have to add something about my model: it works outside the rating system calculations. I haven't gone deep into the Glicko system; I rather assumed that it is applied to each battle alone. Maybe that was a wrong assumption and my model will not work at all with Glicko?

My guess is that if we apply Glicko after each game, it will be almost like Elo. But it is only a guess - I don't really know Glicko, only its main assumptions.

Another thing is that there will be some players (e.g. me) who will play ranked games rarely, maybe once a year, and some players who will play every week (~50 times more often). Does Glicko handle that well?


einstein13
calculations & maps packages: http://wuatek.no-ip.org/~rak/widelands/
backup website files: http://kartezjusz.ddns.net/upload/widelands/

trimard (Topic Opener)
Joined: 2009-03-05, 22:40
Posts: 230
Ranking: Widelands-Forum-Junkie
Location: Paris
Posted at: 2019-07-23, 12:27

I haven't gone deep into the Glicko system; I rather assumed that it is applied to each battle alone. Maybe that was a wrong assumption and my model will not work at all with Glicko?

No no, you can keep this calculation after each game; that's not the part where the number of games plays a role. Well, from what I understand.

My guess is that if we apply Glicko after each game, it will be almost like Elo. But it is only a guess - I don't really know Glicko, only its main assumptions.

Yes and no. Glicko just isn't made for that, but we could adapt it. And it would still be different from Elo, because the rating deviation would still change with time.

Another thing is that there will be some players (e.g. me) who will play ranked games rarely, maybe once a year, and some players who will play every week (~50 times more often). Does Glicko handle that well?

For the player who plays rarely, no rating system will be effective anyway. Glicko will just keep his rating deviation high, though.

For the player who plays a huge amount, no problem. It's just that he won't be able to see his score change right after his game; he would have to wait until the next day or the end of the week, depending on when we decide to run the cycle.


WorldSavior
Joined: 2016-10-15, 04:10
Posts: 2091
OS: Linux
Version: Recent tournament version
Ranking: One Elder of Players
Location: Germany
Posted at: 2019-07-25, 19:42

trimard wrote:

I haven't gone deep into the Glicko system; I rather assumed that it is applied to each battle alone. Maybe that was a wrong assumption and my model will not work at all with Glicko?

No no, you can keep this calculation after each game; that's not the part where the number of games plays a role. Well, from what I understand.

My guess is that if we apply Glicko after each game, it will be almost like Elo. But it is only a guess - I don't really know Glicko, only its main assumptions.

Yes and no. Glicko just isn't made for that, but we could adapt it. And it would still be different from Elo, because the rating deviation would still change with time.

Another thing is that there will be some players (e.g. me) who will play ranked games rarely, maybe once a year, and some players who will play every week (~50 times more often). Does Glicko handle that well?

For the player who plays rarely, no rating system will be effective anyway. Glicko will just keep his rating deviation high, though.

I don't know how fast the deviation grows during inactivity, but my practical experience says that it's a really slow growth...

For the player who plays a huge amount, no problem. It's just that he won't be able to see his score change right after his game; he would have to wait until the next day or the end of the week, depending on when we decide to run the cycle.

The cycle could also be 1 game; maybe that's easier, I don't know.

einstein13 wrote:

trimard wrote:

  • Time between score calculations, or cycle: after each game (not the default Glicko implementation, not sure how to do it)? After each day? After a week? After a month? It seems we should evaluate how many games are played per week on average and choose a cycle length so that there are 5 to 15 games per "cycle".

I have to add something about my model: it works outside the rating system calculations. I haven't gone deep into the Glicko system; I rather assumed that it is applied to each battle alone. Maybe that was a wrong assumption and my model will not work at all with Glicko?

No, it's not a wrong assumption; Glicko can work like this.

trimard wrote:

I'm impressed by your posts, you seem to do a lot for this.

Thanks! I like this project, but I'm still afraid I will lack time for all this

In that case, maybe one or two possible simplifications could be useful, for example:

1) Instead of doing some web design, just use a forum thread first

2) Instead of handling the Glicko system, first make only a rating list for 1-player collectors matches, simply sorted by the collectors points. (For example, the challenge could be to play collectors alone on a chosen map and to gain as many points as possible.)

  • Time between score calculations, or cycle: after each game (not the default Glicko implementation, not sure how to do it)? After each day? After a week? After a month? It seems we should evaluate how many games are played per week on average and choose a cycle length so that there are 5 to 15 games per "cycle".

After each game could be possible, but if other cases are easier to calculate, you could do it like that.

einstein13 wrote:

@WorldSavior

This sounds unfair to me. Should somebody who is ranked 1000 points below his teammate really get as many points as his teammate, even though he didn't have to do anything for the victory?

Yes, since they were a team. The rank points should show not only a player's strength as an economist or military strategist in Widelands, but also as a politician who can make friends. If you don't want to play with a player 1000 pts below your rank, just don't do it. ;)

I don't really agree, but it's no big problem.

By the way, are 1vs2 matches allowed? There, the player without a teammate could be considered to be in a team with one of your so-called "null players".


Wanted to save the world, then I got widetracked

einstein13
Joined: 2013-07-29, 00:01
Posts: 1118
Ranking: One Elder of Players
Location: Poland
Posted at: 2019-07-25, 22:12

WorldSavior wrote:

The cycle could also be 1 game; maybe that's easier, I don't know.

Yesterday I had some time to look into the Glicko equations (from the Wikipedia page) and Elo too. I was surprised, but Elo was designed to handle multiple matches in a cycle, not only one. Still, in practice the ranks are updated after every match, and the equation for that is very simple. Unfortunately, if you pick Glicko-2 and try to get one "simple" equation, it is not simple in any case. It is much better to calculate several intermediate values before calculating the main rank change.
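For comparison, the per-game Elo update really is a one-liner (K is the usual Elo factor, e.g. 20):

# Classic per-game Elo update for player A against player B
# (score_a: 1 for a win, 0.5 for a tie, 0 for a loss).
def elo_update(r_a, r_b, score_a, k=20):
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return r_a + k * (score_a - expected_a)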

My initial model works with single-match updates only, but now I am working on expanding it to multiple matches in a cycle. It is doable. :)

In that case, maybe one or two possible simplifications could be useful, for example:

1) Instead of doing some web design, just use a forum thread first

That is a good idea. No coding involved, some work on handling the thread, and you can see whether Widelanders are interested in ranked games.

2) Instead of handling the Glicko system, first make only a rating list for 1-player collectors matches, simply sorted by the collectors points. (For example, the challenge could be to play collectors alone on a chosen map and to gain as many points as possible.)

Another economy tournament? :)

By the way, are 1vs2 matches allowed? There, the player without a teammate could be considered to be in a team with one of your so-called "null players".

In my system they are allowed. And if you are the "1", then your effective score is lowered by about 120.4 points, because a "null player" is with you. And of course after your win or loss, the null player does not gain or lose points; it all goes to you. :)

So if you don't want to share the prize and you want to gain a bit more, you can play alone.


EDIT:
I have finished updating my model. Now in the paper (version 1.1) you can find two possibilities for applying it, plus an example of how to use the model (last step).

Edited: 2019-07-26, 01:21

einstein13
calculations & maps packages: http://wuatek.no-ip.org/~rak/widelands/
backup website files: http://kartezjusz.ddns.net/upload/widelands/

trimard (Topic Opener)
Joined: 2009-03-05, 22:40
Posts: 230
Ranking: Widelands-Forum-Junkie
Location: Paris
Posted at: 2019-07-26, 10:44

I don't know how fast the deviation grows during inactivity, but my practical experience says that it's a really slow growth...

Except that this totally depends on the values we set for the constants. If we feel it should move quicker, we can test new values for tau and sigma.
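Concretely, in Glicko-2 an inactive player's deviation grows once per rating period as phi' = sqrt(phi^2 + sigma^2) on the internal scale, so the growth speed is set by the volatility (and indirectly by tau). A quick way to play with it:

# RD growth for an idle player in Glicko-2: one step of phi' = sqrt(phi^2 + sigma^2)
# per rating period, with RD = 173.7178 * phi.
import math

SCALE = 173.7178

def idle_rd(rd, volatility=0.06, periods=1):
    phi = rd / SCALE
    for _ in range(periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * SCALE

# A 50-RD player idle for a year of weekly periods ends up at roughly 90 with these values.
print(round(idle_rd(50, 0.06, 52), 1))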

Unfortunately, if you pick Glicko-2 and try to get one "simple" equation, it is not simple in any case. It is much better to calculate several intermediate values before calculating the main rank change.

Yes, definitely. But it still works with one value! The important thing is that we can't really do "after each game", otherwise that would mean inventing a new rating system. Which is an option though :P.

In that case, maybe one or two possible simplifications could be useful, for example:

1) Instead of doing some web design, just use a forum thread first

2) Instead of handling the Glicko system, first make only a rating list for 1-player collectors matches, simply sorted by the collectors points. (For example, the challenge could be to play collectors alone on a chosen map and to gain as many points as possible.)

The point isn't only to test whether people are interested in the system, but more to motivate people who don't play a lot to play too. I wouldn't be motivated by a post on a forum, but rather by a constant, automatic rating.

Those are still good ideas, but I think king of nowhere is the most experienced at organizing tournaments in that format. And I don't want to replace the tournaments; I want to build something alongside them.

And finally, I have already started coding the thing :P I'm focusing on having a working arbiter page, where the arbiter can add as many games as he likes, and we automatically calculate the scores from that. It's the stage 0 I didn't mention in the big post...

