
One example, and I love this story: when one model was tested, it was played against a specialized chess model. The model started to recognize that it was losing, so it tried to kill the other model's process. That didn't succeed. Then it recognized, hm, the chess bot is just a file, so I could change the file, I can change the rules of the game and make impossible moves. Right? Remember, these models are just optimized to get the reward, and the reward was: you have to win the game. So if you're setting up these gym environments and these mechanical verifiers, you need to ensure that you're training on the right things. Otherwise, the model will try to cheat all the time, and this is not something you want to have.
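
To make that concrete, here is a minimal sketch of the difference between a verifier that only asks "did you win?" and one that replays the game and checks every move. It is not the chess setup from the story: it assumes a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins), and every name in it (`Episode`, `naive_reward`, `verified_reward`, `LEGAL_TAKES`) is hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Toy rules (an assumption for this sketch): a player may take 1, 2 or 3 stones.
LEGAL_TAKES = (1, 2, 3)


@dataclass
class Episode:
    start_pile: int        # stones on the table at the start
    moves: List[int]       # stones taken per turn; the agent moves first
    claimed_winner: str    # what the (possibly tampered) game state reports


def naive_reward(ep: Episode) -> float:
    """Rewards the reported outcome only. Easy to hack: just change the report."""
    return 1.0 if ep.claimed_winner == "agent" else 0.0


def verified_reward(ep: Episode) -> float:
    """Replays the episode from the start and ignores the reported outcome."""
    pile = ep.start_pile
    winner = None
    for turn, take in enumerate(ep.moves):
        if winner is not None:
            return 0.0  # moves after the game ended: invalid episode
        if take not in LEGAL_TAKES or take > pile:
            return 0.0  # impossible move: invalid episode, no reward
        pile -= take
        if pile == 0:
            winner = "agent" if turn % 2 == 0 else "opponent"
    return 1.0 if winner == "agent" else 0.0


honest = Episode(start_pile=5, moves=[1, 1, 3], claimed_winner="agent")
impossible_move = Episode(start_pile=5, moves=[5], claimed_winner="agent")
false_claim = Episode(start_pile=5, moves=[1, 1, 1, 2], claimed_winner="agent")

for name, ep in [("honest", honest),
                 ("impossible_move", impossible_move),
                 ("false_claim", false_claim)]:
    print(name, naive_reward(ep), verified_reward(ep))
```

The naive reward pays out 1.0 for all three episodes, while the replaying verifier pays out only for the honest one. A check like this still does nothing against the first trick in the story, killing the opponent's process, so in a real environment you would presumably also need to isolate the agent from the opponent and from the game state.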