Posted at: 2023-08-19, 22:14
As I discussed with Nordfriese, I will write down my idea for online Widelands AI training.
The concept is that every game that is played against an AI will improve the AI a bit. The "normal" AI must become as strong as possible to be a challenge for humans. This will involve two parts: a) some extra logic (solver?) to micro-manage scarcity situations and b) improvement of the training weights. Once the AI is perfect, it makes sense for beginners to choose the "weaker" versions.
To improve the training, I suggest the following algorithm:
The game ships with 4 standard AIs for offline users. As soon as an internet connection is available, the game downloads the best available AIs from a server. This is fine according to GDPR because AIs are not considered "personal data".
Every AI has a hashsum to identify it over the network. Every game with an AI must measure the AI's performance but should also mutate it. This is a conflict: either we mutate it, but then it gets a different hashsum; or we evaluate it, but then evolution does not step forward. So my suggestion is to do both at the same time: every time an AI player is spawned, two random AIs are picked from the hard disk (probability weighted by their performance characteristics) and then mutated. The performance metrics of that AI at the end of the game are used for both: evaluating the AI and its parents. This way, we collect enough performance data for the AIs. Then the mutated AI (also containing the parents' hashes) is uploaded to the server together with its performance metrics.
Why two parents and not just one? Recombination of genes! When a good mutation is on one branch, it can be transferred to another branch by chance.
In order to improve the AI, we have to collect some performance metrics. The following factors should be weighted in:
- did it win or lose
- how steeply did the territory statistic climb on average
- how steeply did the military statistic climb on average
The "climb" metrics should both be maxed over all 15-minute slices of the game (best climb rate during the game) and averaged over the whole game (did it get stuck, or did it advance continuously?).
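The two climb metrics could be computed like this; a minimal sketch, assuming the game already samples the statistic once per 15-minute slice (the struct and function names are mine, not from the codebase):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical sketch: given the per-slice climb rates of one statistic
// (e.g. territory gained per 15-minute slice), compute both metrics:
// the best climb rate during the game and the average over the game.
struct ClimbMetrics {
	double best;     // steepest 15-minute slice
	double average;  // did the AI advance continuously?
};

ClimbMetrics climb_metrics(const std::vector<double>& slice_rates) {
	ClimbMetrics m{0.0, 0.0};
	if (slice_rates.empty()) {
		return m;
	}
	m.best = *std::max_element(slice_rates.begin(), slice_rates.end());
	m.average =
	   std::accumulate(slice_rates.begin(), slice_rates.end(), 0.0) /
	   slice_rates.size();
	return m;
}
```

The same function would be run once for the territory statistic and once for the military statistic, and the win/lose flag added on top.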
AI upload and mutation
- The AI dataset has to be extended by a field that holds the hashsums of its two parents
- The AI dataset itself must be hashed
- When creating a new AI (because an AI player joins), two random (weighted randomness) parents are picked
- The two parents are merged (for every neuron, roll a dice whether to take the mother's or the father's value)
- Do mutation on the resulting child
- play the game
- Calculate the performance metrics
- report the AI and its metrics to the server and store them to the hard disk
- on server and hard disk: metric = old_metric * 0.8 + new_metric * 0.2
- for every parent: metric = old_metric * 0.9 + new_child_metric * 0.1 (so parents are weighted according to the performance of their children)
- (also add the metric to the grandparents??)
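The merge, mutation and metric-update steps above could look like this; a sketch under the assumption that the AI weights are a flat `std::vector<double>` with Gaussian mutation (both assumptions of mine; the real DNA format and mutation scheme may differ):

```cpp
#include <random>
#include <vector>

// For every neuron, roll a dice whether to take the mother's or the
// father's value.
std::vector<double> crossover(const std::vector<double>& mother,
                              const std::vector<double>& father,
                              std::mt19937& rng) {
	std::bernoulli_distribution coin(0.5);
	std::vector<double> child(mother.size());
	for (size_t i = 0; i < mother.size(); ++i) {
		child[i] = coin(rng) ? mother[i] : father[i];
	}
	return child;
}

// Mutate the resulting child by adding small Gaussian noise
// (the noise model is an assumption for illustration).
void mutate(std::vector<double>& weights, std::mt19937& rng, double sigma) {
	std::normal_distribution<double> noise(0.0, sigma);
	for (double& w : weights) {
		w += noise(rng);
	}
}

// Exponential moving averages from the list above.
double update_own_metric(double old_metric, double new_metric) {
	return old_metric * 0.8 + new_metric * 0.2;
}
double update_parent_metric(double old_metric, double child_metric) {
	return old_metric * 0.9 + child_metric * 0.1;
}
```

The two EMA factors (0.8/0.2 and 0.9/0.1) mean a parent's score moves more slowly than the child's own score, so one lucky or unlucky game cannot wipe out an established lineage.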
The server will get all AIs and all AI metrics reported. Now it has to sort out:
- keep the best 100 AIs in the pool (AIs with lowest score are purged)
- whenever a widelands instance gets online, it will download the current 10 best AIs in the pool
- whenever a widelands game is played, the server will update the family tree performance of its AIs
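The server-side pool maintenance could be sketched like this (struct and function names are hypothetical; the limits 100 and 10 are the ones proposed above):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One pool entry as the server sees it.
struct AiEntry {
	std::string hashsum;
	double score;
};

constexpr size_t kMaxPool = 100;        // best 100 AIs survive
constexpr size_t kDownloadCount = 10;   // clients fetch the top 10

// Sort by score, best first, and purge everything beyond the limit.
void prune_pool(std::vector<AiEntry>& pool) {
	std::sort(pool.begin(), pool.end(),
	          [](const AiEntry& a, const AiEntry& b) { return a.score > b.score; });
	if (pool.size() > kMaxPool) {
		pool.resize(kMaxPool);
	}
}

// The slice a Widelands instance downloads when it gets online.
std::vector<AiEntry> best_for_download(const std::vector<AiEntry>& pool) {
	std::vector<AiEntry> sorted = pool;
	std::sort(sorted.begin(), sorted.end(),
	          [](const AiEntry& a, const AiEntry& b) { return a.score > b.score; });
	if (sorted.size() > kDownloadCount) {
		sorted.resize(kDownloadCount);
	}
	return sorted;
}
```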
The weighted probability algorithm works as follows:
- build a list of all available AIs (fields: hashsum, score)
- calculate the sum of all scores of all AIs
- draw a random integer between 0 and sum(score)-1 (inclusive)
- int random = randomInt(0, scoresum - 1);
- int i = 0;
- while (random >= list(i).score) { random -= list(i).score; i++; }
- (note the >= rather than >; otherwise random == score would overshoot the slot)
- list(i) is the selected AI
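As a compilable version of the selection above, with the random draw separated from the index walk so the walk can be tested deterministically (names are mine):

```cpp
#include <random>
#include <vector>

// Walk the score list: r is a draw from [0, sum(scores) - 1].
// Subtract each AI's score until r falls inside the current slot.
size_t pick_index(const std::vector<int>& scores, int r) {
	size_t i = 0;
	while (r >= scores[i]) {  // >=, not >: r == score would overshoot
		r -= scores[i];
		++i;
	}
	return i;
}

// Full weighted pick: draw r, then walk.
size_t weighted_pick(const std::vector<int>& scores, std::mt19937& rng) {
	int sum = 0;
	for (int s : scores) {
		sum += s;
	}
	std::uniform_int_distribution<int> dist(0, sum - 1);
	return pick_index(scores, dist(rng));
}
```

With scores {2, 3}, the draws 0 and 1 land on the first AI and the draws 2 to 4 land on the second, i.e. each AI is picked with probability proportional to its score.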
- There will be constant AI recombination and training
- Strong AIs will be shared over network
- Every single game will improve the AI gene pool
- On our server, we will keep only 100 AIs alive
- On each player's HDD, there will be one new AI per AI player per game (which is around the same amount as the recordings)
Edited: 2023-08-19, 22:18