Slashdot Log In
'Tit for Tat' Defeated In Prisoner's Dilemma Challenge
Posted by
michael
on Thu Oct 14, 2004 09:06 AM
from the taking-one-for-the-team dept.
from the taking-one-for-the-team dept.
colonist writes "Tit for Tat, the reigning champion of the Iterated Prisoner's Dilemma Competition, has been defeated by a group of cooperating programs from the University of Southampton. The Prisoner's Dilemma is a game with two players and two possible moves: cooperate or defect. If the two players cooperate, they both have small wins. If one player cooperates and the other defects, the cooperator has a big loss and the defector has a big win. If both players defect, they both have small losses. Tit for Tat cooperates in the first round and imitates its opponent's previous move for the rest of the game. Tit for Tat is similar to the Mutual Assured Destruction strategy used by the two nuclear superpowers during the Cold War. Southampton's programs executed a known series of 5 to 10 moves which allowed them to recognize each other. After recognition, the two Southampton programs became 'master and slave': one program would keep defecting and the other would keep cooperating. If a Southampton program determined that another program was non-Southampton, it would defect."
Update: 10/14 15:08 GMT by J : If anyone wants to try writing their own PD strategy and see how it fares in a Darwinian contest, I'll host a tournament of Slashdot readers. Here are the docs, sample code, notes on previous runs, and my email address.
This discussion has been archived.
No new comments can be posted.
'Tit for Tat' Defeated In Prisoner's Dilemma Challenge
|
Log In/Create an Account
| Top
| 356 comments
(Spill at 50!) | Index Only
| Search Discussion
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Scary Stuff (Score:5, Funny)
(http://put-your-mone...r-mouth-is.com/blog/ | Last Journal: Monday January 29 2007, @02:44PM)
- If you confess and your partner denies taking part in the crime, you go free and your partner goes to prison for five years.
- If your partner confesses and you deny participating in the crime, you go to prison for five years and yor [sic] partner goes free.
- If you both confess you will serve four years each.
- If you both deny taking part in the crime, you both go to prison for two years.
This sounds pretty much like the RIAA might be involved. I would deny everything if I were you!Re:Practicality (Score:5, Insightful)
For
If we both share our source code, we will both will be more productive.
If I share my source code, and you don't, you can be more productive. (Assuming you can use mine.)
If neither of us share, we both will have to re-create other's work...
Re:Practicality (Score:5, Insightful)
(http://mysite.verizon.net/spitzak)
The GPL is an attempt to make it *not* the prisoner delimma by forcing the other side to cooperate if you do. This eliminates the losing part of the cooperation choice and thus it is no longer a delimma.
That's not really so special (Score:5, Interesting)
(http://tenebrion.livejournal.com/)
Re:That's not really so special (Score:5, Funny)
(http://www.neverwhen.net/)
Re:That's not really so special (Score:5, Informative)
As an analogy of unprofitable collusion, I could win the World Series of Poker by hiring enough shills and paying their time and entry fees. I would lose money by doing this, probably more than I could recoup with post-tournament income via endorsements/books/whatever.
The parent is correct. Tit-for-Tat is still superior in equal numbers, and a modified Tit-for-Tat that can spoof the recognition algorithm of colluders will trounce them.
Did the same thing a few years ago... (Score:5, Informative)
Re:That's not really so special (Score:5, Insightful)
(http://www.covenantspice.com/)
Had this been an actual prisoner's dilemma, this winning strategy would require recruiting a large number of thugs who LIKE going to prison and are willing to "take one for the team."
Although cooperation is not explicitly defined as being against the rules, IMHO, it goes against the "spirit" of the competition. The point is that each algorithm is supposed to act in a greedy manner.
This will no doubt spark a LOT of discussion, but to me, they "cheated." (OK. Maybe "worked the system" is a better phrase).
Don't you see the beauty? (Score:4, Interesting)
-Serpent
Re:Don't you see the beauty? (Score:5, Insightful)
(http://kisrael.com/)
Well, they kind of went for a win on the "metalevel", utilizing the circumstances of the competition rather than solving the originally stated issue in an abstract way. On the one hand that's cool because evolution can work like that sometimes, but on the other hand, it really isn't answering the original question any more. (the question is "what's probably the best strategy for any given individual in Prisoner's Dilemna" and they changed the question to "how can we get some individuals to be super-players with the way this prisoner's dilemna simulator is setup"
Re:That's not really so special (Score:4, Informative)
(http://mccarthy.vg/ | Last Journal: Wednesday October 24, @09:09AM)
I've written perl scripts to run Prisoner's Dilemma tournaments [mccarthy.vg] on my website (the perl source code for all the species in the last run is on that page) and new submissions for the next tournament are welcome. Since my tournament does Darwinian selection on every agent that plays in it -- agents which don't earn points eventually die off, those which earn a lot of points reproduce -- self-sacrificing pairs of strategies won't do very well.
I haven't yet written code which can trickle in new members over time, but that really wouldn't be very hard to add (I could probably set it up in an hour if you are really interested :)
That's why... (Score:5, Funny)
(http://www.chimairaworld.com/ | Last Journal: Wednesday February 14 2007, @11:50AM)
I'm off to join the Freemasons. Be back in a few.
Yes, actually (Score:4, Informative)
(http://tenebrion.livejournal.com/)
In short, if tit for tat is isolated, it won't do so well since everyone is fucking with it. If there are just a few tit for tats out there, their power increases significantly with each one added.
only thing I can say is... (Score:3, Funny)
Lameness filter encountered. Post aborted!
Reason: Don't use so many caps. It's like YELLING. It's not at all like a TALKING COMPUTER. You are a bad man. Go away.
Uh, isn't that just cheating? (Score:5, Interesting)
(http://www.rumorsdaily.com/)
The real trick is to find a program that can beat other DIFFERENT programs, not beat itself. This seems really stupid, or am I missing something?
Re:Uh, isn't that just cheating? (Score:5, Insightful)
Interestingly, this strategy is also fairly "brittle", I think, in that simple rule-changes could foil it. Requiring only one submission per team, for example, or scoring teams according to the total (or average) scores of all their programs, would complicate any strategy of collusion.
Re:Uh, isn't that just cheating? (Score:4, Insightful)
That's why this isn't a good answer to the problem -- not because it's somehow "cheating", but because its a strategy that only works in limited circumstances and fails spectacularly in others. Kind of like chasing down losses when gambling.
Re:Uh, isn't that just cheating? (Score:5, Insightful)
(http://www.neverwhen.net/)
Does this defeat the purpose? (Score:5, Interesting)
All this has done is make a meta-PD game in which the two programs create a meta-game in which they agree to cooperate. That is to say, this is a solution to the PD problem that relies on the cooperation of a cohort (Someone to keep choosing loyalty while you defect and get all the money). Which is exactly not the point of PD.
So the real headline, I think, is "Trivial flaw found in definition of Prisoner's Dillema problem. University of Southhampton wastes money demonstrating flaw instead of writing a goddamn paper like a normal person would."
...by cheating! (Score:5, Insightful)
(http://www.fantasticdamage.com/)
Yeah, that's not the Prisoner's Dilemma. Or even the Iterated PD [wikipedia.org]. This whole "signaligng Morse code" on the prison walls is nonsense, because it was not part of the original plan. Just because it's not in the rules doesn't mean you can do it. In Chess there's no rule specifically against me bringing a SuperGrape(TM) onto the board. The SuperGrape(TM) immediately destroys all pawns on a color of my choosing.
No, it doesn't work that way.
While this is an interesting experiment, it's not a true victory.
Re:...by cheating! (Score:5, Informative)
Evolutionarily stable? (Score:5, Insightful)
(http://ingles.homeunix.net/)
It's not clear to me how the entries determined who would be the 'master' and who would be the 'slave'. It seems that if you had lots of 'colluders' around who could be induced to 'suicide' for another's benefit, you'd very quickly get cheaters who worked to be the 'master' in all situations.
This strikes me as a lot more reminiscient of the Hawk/Dove [washington.edu] situation.
Tit for Tat (Score:5, Funny)
(Last Journal: Wednesday December 01 2004, @10:15PM)
Spirit of the PD (Score:5, Interesting)
Real prisoners only get to choose ONCE.
By taking advantage of the multiple-iteration aspect of the simulation with this sort of 'portknocking' strategy, the winning programs kind of take a cheap shot at the original PD.
Of course, it's all hypothetical anyway, and come to think of it Tit For Tat technically takes advantage of the multiple-iteration aspect as well by doing whatever the opponent did the last time...
Ah well, at least the Wikipedia entry makes a distinction between regular "Prisoner's Dilemma" and "Iterated Prisoner's Dilemma".
Re:Spirit of the PD (Score:4, Informative)
The important codicil to the story is... (Score:5, Insightful)
(http://www.aug24.co.uk/)
J.
The article got it wrong (Score:5, Informative)
(http://geocities.com/nelstomlinson/index.html)
Repeated games have radically different outcomes than one-time games. It's long been known that where cooperation is possible, cooperation can beat solitary strategies in repeated games. I really don't think there's anything surprising here.
Re:The article got it wrong (Score:5, Insightful)
The nuclear MAD comparison is apt, because of the time lag between launch detection and detonation. During the flight time of the first launch, there is time for several rounds to occur.
Actually, the nuclear standoff could be considered an ongoing PD game with both sides playing Tit-For-Tat strategies. The rounds occur every few minutes with both sides asking "did the other side screw us yet" and responding "no, so we won't screw them yet". This PD game has consisted of millions and millions of turns already, with both players using historical knowledge to influence their current choices.
Asian mentality (Score:4, Interesting)
ok, here's a weird thought. In many Asian countries, the mentality is to work as a group, rather than individually, with the individual sacrificing themselves for the group if necessary. In the USA and most of the "western" world, we tend to act more as individuals. We tend to think think our system is better, but what if we're wrong? Perhaps, as this experiment shows, the Asian mentality may actually be the superior strategy?
China has been most consistently the biggest superpower over mankinds history, and it looks like it's going to be that way again in a couple of decades. Perhaps these things are related...
Re:Asian mentality (Score:4, Insightful)
(http://tenebrion.livejournal.com/)
It is not the first (Score:5, Interesting)
It is easy to score better than Tit-for-Tat in Axelrod's (original) tournament. He included a program that played random moves. It is not difficult to recognise this program after, say, ten moves have been played. You can always defect against random, because its moves are unrelated to its history. So, a program that plays Tit-for-Tat by default, but always defects against Random, scores better than Tit-for-Tat.
Does this dillute Tit-for-Tat's accomplishment? Of course not. Tit-for-Tat still plays well. And it is such a simple strategy that it can be programmed in two lines ("C on move 1, then copy opponent's previous move"), which none of the other programs achieve. Tit-for-Tat is simple, elegant, and strong. It's beautiful.
Southamptom entries, on the other hand, are complex, sneaky, and cheating against (perhaps unwritten, but nonetheless agreed-upon) rules. They're ugly. They only prove that backstabbing cheating bastards may defeat just-and-fair if the referee is looking the other way for a moment.
It always works for card games (Score:3, Insightful)
(http://www.garbett.org/)
Kin Selection in Genetic Algorithms (Score:4, Informative)
(http://www.geocities.com/jim_bowery | Last Journal: Tuesday September 19 2006, @10:20PM)
A mathematical treatment of population genetics in groups was given by W. D. Hamilton in "Innate Social Aptitudes of Man [geocities.com]". In the last sentence of that paper, Hamilton, the originator of modern kin selection theory, states:
What Hamilton is referring to is the fact that in any structure of components vs composite, there is the opportunity to defect. An individual gene can defect against the organism within which it resides via, say, meiotic drive [ucl.ac.uk]. An individual may defect against his tribe made up of his close relatives. A tribe may defect against the others making up a nation. A nation may defect against others making up a geographic race. A geographic race may defect against others making up humanity as a whole.It is indeed a dilemma but it isn't without a rigorous treatement within genetic theory.
Steve Sailer [isteve.com] has written an an excellent review [vdare.com] of the politically touchy issue of ethnic nepotism given from Hamilton's group selective perspective.
Secret societies & paranoia (Score:3, Insightful)
If the agents in the game were capable of higher order reasoning and could see these coordinated actions between members, then they would become paranoid -- all the Southampton team members were "out to get them."
Tit for Tat always has been beatable... (Score:3, Insightful)
(http://slashdot.org/ | Last Journal: Thursday August 12 2004, @10:57AM)
Being that, beating Tit for Tat isn't that big of a deal. Doing BETTER than Tit for Tat consistently _IS_ a big deal.
The game is a positive sum game, so it pays off to end up in a cooperative (or semi-cooperative) sequence over repeated "defections".
For some good reading on the Prisoner's Dilemma Game and how it fits in some biological systems read;
"The Evolution of Cooperation" by Robert Axelrod (and newer books)
"The Selfish-Gene" by Richard Dawkins
There may be more recent books too, it's been while since I studied the subject.
Having one plan that can beat Tit for Tat
I must be missing something (Score:4, Interesting)
(http://slashdot.org/ | Last Journal: Sunday January 20 2002, @01:58PM)
1st iteration - Traitor defects, TfT cooperates, TfT loses and Traitor wins.
Nth iteration - both defect, minor losses for both
Thus Traitor beats TfT... What am I missing?
That's right (Score:4, Informative)
BUT, in an environment made up of a few players playing each strategy, then you have the following matchups:
Hawk vs Hawk. Horrible horrible loss for both of them.
TfT vs Hawk. Hawk wins, but only by a single round.
TfT vs TfT. Both TfT 'win' - neither betray the other.
So, overall, TfT does better than hawk.
The interesting part isn't beating TfT (which, as you point out, isn't THAT hard to do) but in doing consistently better than it against a wide variety of programs. Which is what TfT has long been the baseline for.
The winner basically cheated (good for him :) (Score:5, Interesting)
(http://mccarthy.vg/ | Last Journal: Wednesday October 24, @09:09AM)
Once I experimented with letting the agents recognize which "species" they were in and which "species" their opponent was. The runaway winner, of course, was the one which always cooperated with itself, and was less nice to every other species. (In my version, "less nice" meant playing Tit-For-Tat, but the idea's the same.)
Being able to do this is like having the teacher's edition. If recognizing which species other agents belong to is allowed, that's a pretty trivial strategy. It's not called cooperation. It's called xenophobia, or to put it into the most familiar anthropomorphization, racism.
(The life lesson, if I may go out on a limb, is that in an environment where some recognize a quality called "race" and discriminate based on it, being unable to see that quality is a liability. Being truly color-blind means you are unable to recognize not only race but racism, which means you will be taken advantage of.)
When I ran my first tournament [mccarthy.vg] and got some interesting results based on this, I realized that knowledge of what "species" an agent belongs to is too powerful, it throws a monkey wrench into the works. So I scrapped it and moved on to stuff I found more interesting [mccarthy.vg].
But the winner of this PD tournament was even craftier; he submitted a ton of entries, all of which were xenophobic in this way, except that they all recognized one "species" as the top dog. The other "species" essentially committed suicide to give the highest score to the top dog. That wouldn't have worked in my tournament, since they literally would have committed suicide (my agents starve to death if they don't score high enough) and that would have shaped the resulting environment. Every tournament is artificial in some way, and the human submitting entries to this one was clever enough to take advantage of these particular artificialities.
Since it's now been shown that inter-agent communication is possible, that's going to be fair game for every tournament from now on. The next step is going to be designing tournaments to work with this trick, not against it. As I wrote to this tournament's organizers:
Tit for Tat has already been beaten (Score:3, Informative)
Except that's not the "prisoner's dilemma" at all! (Score:3, Insightful)
(http://trolltalk.com/ | Last Journal: Thursday November 22, @03:36AM)
By using this "recognition system", the program is capable of "knowing" in a deterministic fashion what some of the other programs will do in advance.
In other words, at the very least, a cheat.
Not a cheat (Score:3, Interesting)
(http://www.dougshaw.com | Last Journal: Tuesday August 20 2002, @05:24PM)
It is kind of an orthodoxy in the literature: Tit for Tat always ties or loses by a little bit, but in tournaments, it is the best strategy.
Well - it ain't. Someone found a way around it. Instead of urging rule-changes to prevent this new challenger, we should all be happy and excited that PD tournaments have just got MORE INTERESTING.
I can't wait to see what happens next - what new programs will emerge to have the advantages of Tit for Tat but also the ability to defend against Master-Slave programs that communicate with each other.
The game has changed - now let's leave it alone and watch.
nested bogosity (Score:3, Interesting)
(Last Journal: Friday November 05 2004, @09:04PM)
But even if you don't agree with that view, another important question is:
in what meaningful sense is this new strategy a "victory"?
After all, it achieves "victory" for half of the cooperators, at the cost of sacrificing the other half.
To use one nuclear-war analogy, it's a choice between strategy "A",
where you acquiesce to the death of half of your populace, with the reward that the remaining populace is completely unaffected --
and strategy "B", with the guaranteed result that no one dies but everyone is injured.
Which populace would *you* choose to join on the eve of war?
The next game (Score:3, Interesting)
Creativity is cheating (Score:3)
(http://marciandgreg.com/ | Last Journal: Wednesday January 07 2004, @07:30PM)
Creativity is "just cheating." Creativity is breaking the rules in a novel way that sheds new light on reality. And isn't that the holy grail of AI?
So, was this just cheating? Hell yes. And it's fantastic.
Missing option... (Score:5, Insightful)
(Last Journal: Sunday May 08 2005, @01:52AM)
What I find disturbing this is the way that the problem is framed presupposes no underlying system of ethics. To wit....
* If you confess and your partner denies taking part in the crime, you go free and your partner goes to prison for five years. * If your partner confesses and you deny participating in the crime, you go to prison for five years and yor partner goes free. * If you both confess you will serve four years each. * If you both deny taking part in the crime, you both go to prison for two years. What do you do?
How about: Tell the truth? Regardless of what your partner does, tell the truth. I find it disturbing that the problem is framed in a way that the actual truth of the matter is irrelevant. (i.e. the problem would be unchanged if I replaced "You and your partner have committed a crime and are caught" with "You and a friend have been accused of a crime which you may or may not have committed.")
I'm not trolling or off-topic here. I'm dead serious. This formulation of the PD is ethically doomed from the get-go, and thus the results of the experiment may be of interest to mathematical game theorists of this particular game, but I find it unwise to think the results make any significant implications about ethics (or anything else for that matter).
Someone will counter that since this is a "Prisoner's" dilemma the person involved must be a criminal with no "ethical" principles other than an interest in self-preservation (i.e. the person is already debased as can contribute nothing meaningful on the subject of ethics!
Pavlov, Grim, and the other strategies. (Score:4, Interesting)
http://www.ncbi.nlm.nih.gov/entrez/query.f
It was also a bit dismaying to see how well "Grim" (hold a grudge forever) did in both games. In evolutionary versions of the game, Pavlov helps keep down the population of "suckers" (thereby decreasing the food supply for more predatory and parasitic strategies) while still rewarding "provokable" cooperators (thereby increasing the total aggregate "reward" of the ecosystem.
Also, one essential part of the payoff structure that deserves emphasis is that the payoff for cooperating has to be more than half the average of the winner and loser's payoff for defection, else one benefits by simply alternating each turn. This is a little bit like the winners did here, where they got the top spots at the cost of a lower total take for their "team". One real world example of slashdot interest where this might make sense is if you take these losses in order to eliminate your rivals from the game and then reap monopoly benefits once you control the game (not to mention any names...).
Maybe someone who has analyzed the results in more detail could comment on how the various well known strategies fared and why.
Key element - guaranteed draw strategy (Score:4, Interesting)
(http://pikus.net/)
Compare this with, for example, a chess tournament. You could secretly enter a team and have them all lose to you. While this will keep you from ending last, it won't assure victory, unless all players are roughly equal. If there is a very strong player, he'll win against all your team, yourself included. So you can cheat by redistributing players of comparable strenghts, but at least you can't rob a clear champion of his deserved victory.
This is not the case in the PD tournament. But let's redefine the problem slightly: say, if both sides cooperate, each gets a dollar. If then defect, each pays a dollar. Sucker's reward is paying 10 dollars. Now the Southampton team's strategy boils down to using the tournament to give all their money to one player, while paying a hefty tax in the process. There is a cheaper way to do this, just give all money to one guy outside the tournament
Does this really apply to human behaviour? (Score:3, Interesting)
(http://tachyphrenia.blogspot.com/)
The Southampton strategy is dependent upon large numbers of people who will sacrifice all for the good of the other, and not for the good of their community (the collective performance is worse than OTFT.) I can see sacrifice for the greater good, but this is sacrifice to another person without hope of recompensation or an increase in general wellbeing. This does happen in human societies (I think it's happening now in some political systems), but only when the winner has managed to convince the losers that its all in everyone's best interest. What Southampton has added to this mix is a capacity for extreme self-delusion that directly contravenes the economic assumption of informed choice and self-interest. For purposes of economic modelling, Southampton should probably be disqualified, or these assumptions dropped. But this should also tell you something about what could happen to those nice economic models when they hit the messy world of human beings, who for the most part aren't very informed and often work against their own best interests as a result.
The consequence for a societal group running Southhampton against an OTFT group would be the defeat of the Southhampton group every time. Selection works at individual AND group levels. So the challenge should probably be two-tier: run the programs individually against each other, and run them as tribes against each other.
Kobiyashi Maru (Score:3, Insightful)
(http://slashdot.org/)