One of the things that made our task hard was the unpredictability of the Quake world. In other words, when the ANN bot misses a shot completely it is unable to get any useful info from it, which can potentially confuse it. The situation where it shoots the right gun at its enemy but hits the other bot sometimes arose as well. This unreliable reward schedule made locking on to the right dimension difficult for the model. If we gave too much reward for a hit, the ANN bot could get stuck on something random and never give it up. If we didn't give enough reward, it would never be convinced that it was doing well. Also, not knowing when the ANN bot will score its next hit made things tricky.
We haven't given much thought to applying the model to a different task yet. We simply wanted to see if the model could overcome the complexity of the QIIIa world. Since I recently graduated, I'm no longer working on research projects as much as trying desperately to get a job.
Speaking of positive and negative feedback, one of my earlier forays into NNs in Quake IIIa was a simple mod where I controlled where the bots aimed different weapons with TD learning (a technique where a NN gets trained through rewards based on the action the agent took). Not having the right rewards produced some interesting bots: when I only gave positive reward for hitting an enemy, the bots learned to spin in circles either to the left or right, because this behavior guaranteed they would at least get a shot in on someone once a rotation, since there were so many bots on the map. To fix that behavior I had to give a small negative reward for turning away from an opponent and a larger positive reward for killing the enemy. This encouraged the bots to finish the job instead of spinning around whipping rockets at enemies across the room.
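To make the reward-shaping idea concrete, here's a minimal sketch of what a TD(0)-style update with shaped per-step rewards might look like. All the constants, event flags, and function names here are my own illustration, not anything from the actual Quake IIIa mod:

```python
# Hypothetical reward shaping like the scheme described above:
# small positive for a hit, larger positive for a kill, and a small
# negative for turning away from an opponent. The numbers are guesses.
HIT_REWARD = 0.1
KILL_REWARD = 1.0
TURN_AWAY_PENALTY = -0.05

ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

def shaped_reward(hit, kill, turned_away):
    """Combine the shaped reward terms for one time step."""
    r = 0.0
    if hit:
        r += HIT_REWARD
    if kill:
        r += KILL_REWARD
    if turned_away:
        r += TURN_AWAY_PENALTY
    return r

def td0_update(value, state, next_state, reward):
    """One tabular TD(0) update: V(s) += alpha * (r + gamma*V(s') - V(s))."""
    td_error = reward + GAMMA * value[next_state] - value[state]
    value[state] += ALPHA * td_error
    return td_error

# A tiny two-state example: the bot turns away from its opponent
# and pays the shaping penalty for it.
V = {"facing_enemy": 0.0, "turned_away": 0.0}
r = shaped_reward(hit=False, kill=False, turned_away=True)
td0_update(V, "facing_enemy", "turned_away", r)
```

In the real mod the value function was a neural network rather than a table, so the TD error would drive a gradient step on the network weights instead of a direct table write, but the shaped-reward logic is the same idea.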
Another thing that was weird: I would start about 10 of these bots training, all equals with the same untrained network, and after a few hours some of them were completely hopeless and others were brilliant. It seems that their experiences during training were responsible for the difference. Some bots learned themselves into corners, eventually expecting their own failure and essentially giving up. Others would get better and better the longer I trained them. It's difficult to figure out what the experiences were that caused some bots to get depressed and others to succeed.
-t