Right there in step 1 is the problem. By requiring a link to a sentence someone read months ago, the burden on the user is raised unacceptably. Users won't bother policing when it's difficult, unless the case is severe enough to stir up an outrage - which would already result in more damage than just flagging a user's tweets.
Well yes, that's correct: if nobody ever notices the duplication, the plagiarizer won't get caught. But that's not a flaw in the algorithm, because I think that's an unsolvable problem -- if nobody ever notices the similar jokes, there's nothing anyone can do. What my algorithm ensures is that if even one person notices the plagiarized joke, it will at least get flagged (and once it's flagged, the random-sample vote determines whether it really is a duplicate). If the original joke-writer and the joke-duplicator both have non-trivial audiences, that increases the chances that at least one person will notice.
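To put a rough shape on that last point, here is a back-of-the-envelope sketch in Python. It rests on a simplifying assumption of mine, not anything in the proposal: some number of readers see both jokes, and each of them independently notices the duplication with the same small probability. The names `overlap_readers` and `p_notice` are hypothetical.

```python
def chance_someone_notices(overlap_readers: int, p_notice: float) -> float:
    """Probability that at least one reader who saw both jokes notices.

    Assumes (for illustration only) that every overlapping reader
    notices independently with the same probability p_notice.
    """
    # P(nobody notices) = (1 - p)^n, so P(at least one) is its complement.
    return 1.0 - (1.0 - p_notice) ** overlap_readers


if __name__ == "__main__":
    # Larger shared audiences push the chance of a flag toward 1.
    for n in (10, 100, 1000):
        print(n, chance_someone_notices(n, 0.01))
```

Under those assumptions, the chance of at least one flag climbs quickly as the shared audience grows, which is all the argument needs.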
Of course, the potential for abuse is also high. Changing a single word can parody an original post, yet changing a different single word may not avoid plagiarizing.
Yes, that's a good point: the system doesn't take into account the idea of making a small change for the purposes of parody. (For example, when "On the Internet, nobody knows you're a dog" got changed to "On the Internet, nobody knows you're a god -- Jerry Garcia.")
So, here's a proposed change: If a user flags your joke as a "duplicate" of an earlier joke, and you don't agree, you should have the opportunity to respond with a "rebuttal" and explain, "No, this alters the original and adds such-and-such which makes it into a new joke." To avoid ruining your joke by having to explain it, that "rebuttal" would not, by default, be displayed alongside your original joke (to your Twitter followers or wherever else people view the original). But, if the "flagger" does not agree with your rebuttal, and it gets pushed to a random-sample-vote anyway, then your rebuttal is displayed alongside the original, and the voters can take it into account when deciding if you really created a new joke or not. (My Jerry Garcia example isn't a very good one, because most voters would figure out that that's a genuinely new joke, even without having to read a "rebuttal". But there may be other examples where the difference is subtle enough that it has to be spelled out explicitly.)
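Here is a minimal Python sketch of that flag / rebuttal / vote flow, just to make the order of the steps concrete. Everything in it is hypothetical: the `Flag` class, the `resolve_flag` function, the sample size of 5, and the simple-majority rule are my own placeholders, since the proposal doesn't pin down how large the random sample is or what vote threshold counts as "duplicate". Voters are modeled as functions that see the rebuttal only once the dispute reaches a vote.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# A voter looks at (joke, earlier_joke, rebuttal) and returns True for "duplicate".
Voter = Callable[[str, str, Optional[str]], bool]


@dataclass
class Flag:
    joke: str                       # the joke accused of being a copy
    earlier_joke: str               # the earlier joke it allegedly copies
    rebuttal: Optional[str] = None  # author's explanation, hidden by default


def resolve_flag(
    flag: Flag,
    flagger_accepts_rebuttal: bool,
    voters: List[Voter],
    sample_size: int = 5,           # assumed value, not from the proposal
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the joke ends up marked as a duplicate."""
    rng = rng or random.Random()

    # If the flagger is satisfied by the rebuttal, the flag is dropped
    # and the rebuttal is never shown to anyone else.
    if flag.rebuttal is not None and flagger_accepts_rebuttal:
        return False

    # Otherwise the dispute goes to a random sample of voters, and only
    # now is the rebuttal (if any) displayed alongside the joke.
    sample = rng.sample(voters, min(sample_size, len(voters)))
    votes = [voter(flag.joke, flag.earlier_joke, flag.rebuttal) for voter in sample]

    # Assumed rule: a strict majority of the sample must say "duplicate".
    return sum(votes) > len(votes) / 2
```

The point of the sketch is the ordering: the rebuttal stays private unless the flagger rejects it, so a joke only has to be "explained" in public when the dispute actually goes to a vote.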
Do you think that would take care of that problem? If not, why not?
An automated algorithm won't likely be able to tell the difference, so it will fall to manual effort to identify which flagged duplicates are actually malicious.
True, but no part of my proposal involves an automated algorithm anyway.
Shakespeare plagiarized. Plato plagiarized. Tom Lehrer penned many verses praising plagiarism. The bottom line is that plagiarism goes hand-in-hand with creation, and it should always be evaluated only in the entire context of both works - the plagiarizing and the plagiarized. What is being said is often not what's being written.
All true, but those cases also involved authors adding new creative elements, to the point where nobody seriously disputes that they deserve credit for the resulting work. I'm talking about taking care of the low-hanging fruit, where someone just steals another person's 140-character joke and pretends they made it up.