Topic: Suggested algorithm for continuous crowdsourced AI enhancement

carli2
Topic Opener
Joined: 2023-08-17, 20:13 UTC+2.0
Posts: 26
OS: Linux
Version: git
Ranking
Pry about Widelands
Posted at: 2023-08-19, 23:14 UTC+2.0

As I discussed with Nordfriese, I will write down my idea for online Widelands AI training.

The concept is that every game that is played against an AI will improve the AI a bit. The "normal" AI must become as strong as possible to be a challenge for humans. This will involve two parts: a) some extra logic (solver?) to micro-manage scarcity situations and b) improvement of the training weights. Once the AI is perfect, it makes sense for beginners to choose the "weaker" versions.

To improve the training, I suggest the following algorithm:

The game ships with 4 standard AIs for offline users. As soon as an internet connection is available, the game downloads the best available AIs from a server. This is fine according to GDPR because AIs are not considered "personal data".

Every AI has a hashsum to identify it over the network. Every game with an AI must measure the AI's performance but should also mutate it. This is a conflict: either we mutate it, but then it gets a different hashsum, or we evaluate it, but then evolution does not move forward. So my suggestion is to do both at the same time: every time an AI player is spawned, two random AIs are picked from the hard disk (probability weighted by their performance characteristics), merged, and then mutated. The performance metrics of the resulting AI at the end of the game are used to evaluate both the AI and its parents. This way, we collect enough performance data for the AIs. Then the mutated AI (which also contains its parents' hashes) is uploaded to the server together with its performance metrics.

Why two parents and not just one? Recombination of genes! When a good mutation is on one branch, it can be transferred to another branch by chance.
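A minimal sketch of how such a hashsum could be computed, assuming an AI is just a flat list of numeric weights. The function name ai_hash, the JSON serialization and the choice of SHA-256 are all assumptions for illustration, not the existing Widelands format. Whether the parent hashes should be part of the hashed data is also an open design choice; here they are included.

```python
import hashlib
import json


def ai_hash(weights, parent_hashes):
    """Return a hex digest identifying an AI by its weights and its parents.

    Hypothetical format: the real AI dataset may be serialized differently.
    """
    # Sort the parent hashes so (mother, father) and (father, mother)
    # produce the same identifier.
    payload = json.dumps(
        {"weights": list(weights), "parents": sorted(parent_hashes)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this scheme any change to a weight yields a new hashsum, which is exactly the conflict described above: a mutated AI is always a new individual.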

Performance metrics
In order to improve the AI, we have to collect some performance metrics. The following factors should be weighted in:
- whether the AI won or lost
- how steeply the territory statistic climbed on average
- how steeply the military statistic climbed on average

The "climb" metrics should be both maximized over all 15-minute slices of the game (best climb rate during the game) and averaged over the whole game (did the AI get stuck, or did it advance continuously?).
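As a rough sketch, the two "climb" values could be derived like this, assuming the game records one sample of a statistic (territory or military) at every 15-minute boundary. The function name climb_metrics and the dictionary keys are hypothetical, and how these values are combined with win/loss into one score is deliberately left open.

```python
def climb_metrics(samples):
    """Compute the best and the mean climb per 15-minute slice.

    samples: statistic values taken at consecutive 15-minute boundaries
    (assumed sampling scheme, not the actual Widelands statistics format).
    """
    if len(samples) < 2:
        # Fewer than two samples means no complete slice was played.
        return {"best_climb": 0.0, "mean_climb": 0.0}
    # Climb of a slice = value at its end minus value at its start.
    climbs = [end - start for start, end in zip(samples, samples[1:])]
    return {
        "best_climb": float(max(climbs)),
        "mean_climb": sum(climbs) / len(climbs),
    }
```

For example, the samples 0, 10, 15, 35 give slice climbs of 10, 5 and 20, so the best climb is 20 and the mean climb is about 11.7.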

AI upload and mutation
- The AI dataset has to be extended by a field that holds the hashsums of its two parents
- The AI dataset itself must be hashed
- When creating a new AI (because an AI player joins), two random parents are picked (weighted randomness)
- The two parents are merged (for every neuron, roll a die to decide whether to take the mother's or the father's value)
- Mutate the resulting child
- Play the game
- Calculate the performance metrics
- Report the AI and its metrics to the server and store them on the hard disk
- On the server and on the hard disk: metric = old_metric * 0.8 + new_metric * 0.2
- For every parent: metric = old_metric * 0.9 + new_child_metric * 0.1 (so parents are weighted according to the performance of their children)
- (also add the metric to the grandparents??)
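The merge, mutation and metric update steps above could look roughly like this. It is only a sketch: it assumes an AI is a flat list of numeric weights, and the names crossover, mutate and blend as well as the mutation rate and noise width are made up for illustration. The real Widelands AI mutates its data differently.

```python
import random


def crossover(mother, father, rng):
    """Uniform crossover: each value comes from either parent with equal chance."""
    if len(mother) != len(father):
        raise ValueError("parents must have the same number of weights")
    return [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]


def mutate(weights, rng, rate=0.05, sigma=0.1):
    """Add Gaussian noise to each weight with probability `rate`.

    Both `rate` and `sigma` are hypothetical tuning values.
    """
    return [
        w + rng.gauss(0.0, sigma) if rng.random() < rate else w
        for w in weights
    ]


def blend(old_metric, new_metric, alpha):
    """Moving average: keep (1 - alpha) of the old value, take alpha of the new."""
    return old_metric * (1.0 - alpha) + new_metric * alpha
```

The child's own metric would then be updated with blend(old, new, 0.2) and each parent's metric with blend(old, child_metric, 0.1), matching the 0.8/0.2 and 0.9/0.1 factors in the list.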

Network interaction
The server will receive all reported AIs and AI metrics. It then has to sort them out:
- keep the best 100 AIs in the pool (AIs with the lowest scores are purged)
- whenever a Widelands instance comes online, it downloads the current 10 best AIs in the pool
- whenever a Widelands game is played, the server updates the family tree performance of its AIs
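The pool bookkeeping on the server could be sketched as follows, assuming the pool is a simple mapping from hashsum to score. The names prune_pool and top_ais are hypothetical, and a real server would also have to store the AI data itself and validate uploads.

```python
def _ranked(pool):
    """Return (hashsum, score) pairs sorted from best to worst score."""
    return sorted(pool.items(), key=lambda item: item[1], reverse=True)


def prune_pool(pool, keep=100):
    """Keep only the `keep` best AIs; pool maps hashsum -> score."""
    return dict(_ranked(pool)[:keep])


def top_ais(pool, count=10):
    """Return the hashsums of the `count` best AIs for clients to download."""
    return [hashsum for hashsum, _ in _ranked(pool)[:count]]
```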

Weighted probability
The weighted probability algorithm works as follows:
- build a list of all available AIs (fields: hashsum, score)
- calculate the sum of all scores of all AIs
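- draw a random number between 0 and sum(score)-1 (scores are assumed to be positive integers)
- int random = randomInt(0, scoresum - 1); // randomInt is assumed to include both bounds
- int i = 0; while (random >= list(i).score) { random -= list(i).score; i++; }
- load(list(i).hashsum)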
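If the scores turn out to be floating-point values rather than integers, the same idea still works. A minimal sketch, where pick_weighted is a hypothetical name and the AIs are given as (hashsum, score) pairs with non-negative scores:

```python
import random


def pick_weighted(ais, rng):
    """Pick one hashsum with probability proportional to its score."""
    total = sum(score for _, score in ais)
    if total <= 0:
        raise ValueError("at least one AI needs a positive score")
    remaining = rng.uniform(0.0, total)
    for hashsum, score in ais:
        if remaining < score:
            return hashsum
        remaining -= score
    # Only reached through floating-point rounding at the upper edge:
    # fall back to the last AI that has a positive score.
    return next(h for h, s in reversed(ais) if s > 0)
```

Python's standard library offers random.choices(population, weights=...) for the same purpose; the explicit loop is shown only because it mirrors the steps above.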

Implications
- There will be constant AI recombination and training
- Strong AIs will be shared over network
- Every single game will improve the AI gene pool
- On our server, we will keep only 100 AIs alive
- On each HDD of each player, there will be one new AI per AI-player per game (which is around the same amount as the recordings)

Feedback?

Edited: 2023-08-19, 23:18 UTC+2.0

hessenfarmer
Joined: 2014-12-11, 23:16 UTC+1.0
Posts: 2875
Version: always the latest
Ranking
One Elder of Players
Location: Bavaria
Posted at: 2023-08-20, 00:13 UTC+2.0

Well,
here are my concerns.
Measuring only 2 metrics and comparing the results from single games might lead to biased AIs. For example, some maps are more AI-friendly than others. Scores from such maps may be higher than from other maps, although the AI might fail on an AI-unfriendly map. Or we might end up with AIs optimized for the most popular maps, or they might be biased towards seafaring or non-seafaring. So we might need different pools for different map characteristics.
The same applies to our tribes. AIs might get optimized for the more popular tribes while getting worse for other tribes. So we might need one pool for each combination of map characteristics and tribe.

Currently we write some values to the log to evaluate them manually. To implement such a continuous training mechanism, we would need to define a proper fitness function, which seems to be the hardest part, as it needs to consider many more parameters than the steepness of the proposed values. If this were an easy task, we would have implemented it already, but we always came back to evaluating them manually, as we are not fully aware of all the weights we (humans) take into account while watching an AI do its job.

Currently not everything in the AI is based on genetics. I believe we have a lot of hardcoded stuff in it and maybe still a lot of wrong algorithms. I tried my best to fix wrong behaviours in the AI code where training could not have cured a wrong algorithm. And I am far from claiming that I have found all of the hidden issues.

