Cheating Detector from Georgia Tech 941
brightboy writes "According to this Yahoo! News article, Georgia Tech has developed and implemented a "cheating detector"; that is, a program which compares students' coding assignments to each other and detects exact matches. This was used for two undergraduate classes: "Introduction to Computing" (required for any student in the College of Computing) and "Object Oriented Programming" (required for Computer Science majors)." 'Cuz remember, programmers: in the real world you are fired if you consult with a co-worker ;)
This is not news. (Score:2, Informative)
UCLA had this in 1985 (Score:2, Informative)
So you might obfuscate your copied code by moving it around, changing variable names, etc. but it would still catch you.
Re:I can make one of those (Score:3, Informative)
ln -s
Info on cheating detector (Score:5, Informative)
1. The cheating detector is not new. It's been in place for years. When I took intro programming in 1994 they mentioned it, and it wasn't new then.
2. Everybody at Tech knows about it. They tell you about this script the first day of class. Nobody here should be surprised they were caught. The fact that they were caught only shows them to be some of the stupidest people at Tech.
3. It catches people every term, usually below 5% of the class. The fact that it caught someone isn't news. The fact that it caught 10% of a class is news.
4. These classes are cake. There is no reason anyone should need to cheat to pass these classes. They are the most basic concepts of programming.
We've been using it for 8+ years; what's the problem? (Score:3, Informative)
The students are told it is ok to discuss the homework and projects with each other and that it is ok to discuss the concepts. However, it is NOT ok to copy each other's code.
The program does not just compare the text of each student's homework, which is what some people seem to think it does. It gets rid of variable names, function names and things like that, because a person cheating can simply change those. It compares the style of the code, and it is not given common code to look at. The only code checked is the code from problems that generally generate unique solutions.
In the time I spent there I know of over a hundred cheating cases caught by the program. In some of those cases, if you had given me the 2 pieces of code I never would have said the people were cheating, but when asked, the students confessed. I have never heard of someone being falsely accused. Most of the time when the 2 cheaters are asked separately, they admit to it.
Once again, Tech does not have any problem with people helping each other understand concepts, like the way pointers or vectors work or the differences between stacks and queues. What they have a problem with is when a student does not do his own work on an individual homework.
Even though some of the problems may not seem worth it, like writing your own version of strcpy, it is still necessary so that students understand how the library functions work, even if they will never write library functions in their lives.
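The identifier-stripping step described above can be sketched in a few lines of Python. This is purely illustrative; the real detector's internals aren't public, so treat the approach (map every identifier to one placeholder, drop comments and layout tokens) as an assumption about how such a tool might work:

```python
import io
import tokenize

def normalize(source: str) -> list[str]:
    """Reduce source to a token stream where all identifiers look alike,
    so renaming variables or functions changes nothing."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.append("ID")  # every identifier becomes the same placeholder
        elif tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                          tokenize.INDENT, tokenize.DEDENT):
            continue          # comments and layout are cosmetic, drop them
        else:
            out.append(tok.string)
    return out

a = "total = 0\nfor x in data:\n    total += x\n"
b = "dude  = 0\nfor funtime in stuff:\n    dude += funtime\n"
print(normalize(a) == normalize(b))  # prints True: a renamed copy still matches
```

Two submissions that differ only in names, spacing and comments normalize to identical token streams, which is exactly why "just change the variable names" doesn't fool these tools.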
Re:You're caught (Score:4, Informative)
Berkeley cheating detection (MOSS) (Score:2, Informative)
Re:Wow! (Score:1, Informative)
I actually got "caught" by one of these programs (Score:1, Informative)
Apoptosis
Re:How exact? (Score:5, Informative)
Besides, even if the system turns up a high match between two programs falsely, it is ultimately a human who gets to review the case and make the call, after (presumably) discussing the matter with the student before actually doing anything that would leave a mark on the record.
And as an answer to the knee-jerk reaction of "that's not how it works in the real world!": I tend to agree, but not completely. As an instructor of mine once said, you have to learn to dribble before you can play with other people on a basketball team, and likewise one needs to develop his or her own programming skills independently before he or she can work effectively in a team.
Of course, some could argue that learning in teams would be more effective and perhaps more useful, but the point is there needs to be a mix of team and independent projects. Without any independent projects, it is difficult to be sure that everyone is competent enough to pull their own weight; part of the role of universities in the world of business is to certify that a graduate possesses a good skillset, and without both team and individual assignments, this is impossible.
Of course, as is the case with everything, this doesn't stop cheating. If one collaborates with someone completely unrelated to the class, it can't catch that; but then again, there aren't that many people inclined to work their butts off at no benefit to themselves just to help some other person get a good grade... Of course, I have seen the case where a guy goes way out of his way to help a pretty girl, but that is another story entirely...
A better approach (Score:5, Informative)
Ok, check for exact match: diff source1.c source2.c
Great, I just wrote a program to check for exact matches in source code, and it took me three seconds. Maybe I should apply for a patent on my ingenious approach (maybe I'd get it!!)
At my organisation (in India), we've been developing something like this for quite some time for our internal tests.
While most of the work isn't (and probably won't be) publicly released, we can look at a systematic approach to building a better detector.
indent -i8 -kr    (canonicalise indentation and brace style)
sed -e 's/\([^[:space:]]\)[[:space:]]\+/\1 /g'    (collapse each run of whitespace to one space)
sed -e '/^[[:space:]]*$/d'    (delete blank lines)
i=i;
You may also want to first strip all #include <> statements (but not #include ""), and run the code through the C preprocessor to take care of #define and conditional compilation.
There's more, obviously, that I'm not sharing with you. These are the basics that anyone could figure out in a few minutes, not years.
Re:I remember when my school did this... (Score:2, Informative)
From what I understand the method used involved comparing source and generated assembly code for similarities.
And while I'm on my soapbox, this is another article posting a supposedly "new and newsworthy" technology to Slashdot that's really not so new. Check your facts, and find out if this is really a "first", why don't you?
Not always accurate results (Score:2, Informative)
It turns out that about half of the people "cheating" really weren't; they had all just happened to independently come up with a working implementation different from the one the professors originally intended, one the professors hadn't even thought of themselves.
All that ended up coming of this was that the professors apologized on the class newsgroup. I think they still check the code using the same program.
diff don't do it (Score:5, Informative)
I was a grader for the C++ and data structures class back when I was in school, and I saw my share of cheating. One instance that stands out is when a bunch of kids had variables called "dude" and "funtime". Problem was, they had enough differences elsewhere in their code that an automated diff wouldn't have worked. For a while I was going to write some fancy perl that would look for certain cheating patterns I was seeing, but then I got lazy.
One deeper way to check for cheating is to pass code through the front end of a compiler and check what comes out. If there are too many similarities, they will stand out even if kids change parameter names and the like.
Finally, consider this: checking for cheaters in a class isn't just doing a diff of two files. For every student in the class, you have to check his code against everyone else's. This is an O(n^2) problem. My class had around 350 people in it, so that's 350*349/2, about 61,000 pairwise checks to do. If it is anything more complex than a diff (multiple files, compiler front-end, fancy perl parsing) this can take a mad amount of computing.
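The pairwise sweep looks something like the sketch below (Python, with difflib's ratio standing in for whatever real similarity metric a grader would use; the student names and submissions are made up for illustration):

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    # ratio() returns 2*M/T, where M is the number of matched characters
    # and T the total length of both strings: 1.0 means identical.
    return SequenceMatcher(None, a, b).ratio()

submissions = {
    "alice": "int sum(int *a, int n){int s=0;for(int i=0;i<n;i++)s+=a[i];return s;}",
    "bob":   "int sum(int *x, int m){int t=0;for(int j=0;j<m;j++)t+=x[j];return t;}",
    "carol": "int sum(int *a, int n){return n ? a[0]+sum(a+1,n-1) : 0;}",
}

# Every submission is compared against every other one: n*(n-1)/2 pairs,
# which is why the cost blows up quadratically with class size.
suspicious = [(p, q, similarity(submissions[p], submissions[q]))
              for p, q in combinations(submissions, 2)]
for p, q, score in sorted(suspicious, key=lambda t: -t[2]):
    print(f"{p} vs {q}: {score:.2f}")
```

Here the alice/bob pair (the same loop with renamed variables) scores far higher than either does against carol's recursive version, which is the signal a grader would follow up on by hand.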
This is old (Score:3, Informative)
Re:Cheating (Score:2, Informative)
Re:Sounds like understaffed or just bad professors (Score:2, Informative)
Re:A better approach (Score:2, Informative)
--Xantho
Head TA Elaborates (Score:5, Informative)
No, I have no current affiliation with Georgia Tech.
Yes, the cheatfinder really, really, honest-to-God exists. We used it every quarter that I was associated with the class and caught _lots_ of people. You'd be stunned how many people thought we were just making it up to scare them into not cheating.
Yes, it actually works. It examines mostly source code, although some versions of it were twiddled to look at "in-between" assembler to help catch those who just change variable names and such. It scans for patterns in the logical constructs of code blocks, even if they've been rearranged or altered in other "cosmetic" ways. It also looks for exact matches in text (like the "commas in same places" mentioned by Kurt in the article), but this is misleading -- it does a whole lot more than that.
Yes, depending on how you run it, it can generate a boatload of false positives, but it contains several tweakable threshold levels that let you control how "suspicious" a pair-match has to be before it gets flagged, and these thresholds are made looser for simple programs where there's really only one way to do it.
No, no action is *ever* taken based on the output of the cheatfinder directly. It merely alerts the TA who's responsible for cheatfinder that quarter and he/she then manually reads the source code to see if it looks like a case of cheating. If so, it gets sent on to the professor for a final verification (and possible discussion with the student if it is a borderline case), before being forwarded to Kurt for examination and possible disciplinary action.
Finally, yes, it's an old and very "evolved" codebase. You wouldn't want to be the one to maintain it, but on the other hand, it has been tweaked to the point where you'd be really surprised at the sort of clever cheating it can detect (i.e. it works a lot better than diffing the source code).
Anyway, figured I should throw in my $0.02 on this one, since I used to run that class.
If anybody has any specific questions, please post to this comment and I'll reply. (Questions from current Tech students asking how to "get around" the cheatfinder will be happily ignored, of course.)
Only one step.... (Score:1, Informative)
It should also be noted, for those people whining about the innocent suffering at the hands of the automated, zero-tolerance faculty, that having your project and someone else's tagged as possible cheats simply means that both potential offenders are referred to the dean of the College of Computing, or to the instructor of the course. It isn't as if the algorithm is the first, last, and only say in whether or not you are tagged/punished for cheating. That would be stupid.
One thing I've learned as I've become more and more versed in technology is that it should never affect major events, such as one's potential expulsion from school, without a very close degree of human oversight on a per-case basis.
Another Cheating Detector (Score:2, Informative)
Measure Of Software Similarity [berkeley.edu]
We got it too... (Score:2, Informative)
TURNITIN.COM - Villanova now using this service... (Score:2, Informative)
I think this is a GOOD thing. Plagiarism at the undergrad level runs rampant. If you're not smart enough to do the work, perhaps you should consider an educational path less taxing on the mind.
Subject: Academic Integrity
Importance: High
Welcome back! I hope that you had a wonderful holiday season. I am writing this note to you to give you a "heads up" regarding a new process for which Villanova has contracted to help us enforce our academic integrity policy.
At Villanova, class papers can now go to "Turnitin.com," which is a search engine that compares papers with others from Villanova and with thousands of websites to determine whether the material is the same. Once the search is complete, faculty receive a detailed report of what materials have been copied and from where.
I am telling you this to help you avoid academic integrity violations. Please be VERY careful and provide complete citations for your work; if your professor has indicated that you are to do your work individually, then do your OWN work; and so on. If you have ANY questions AT ALL, please seek clarification from your professor PRIOR to submitting your work to him/her!
I sincerely wish you a very successful semester!
Best regards,
Dr. Victoria McWilliams
Associate Dean C&F
this can bite innocent parties right in the ass (Score:2, Informative)
The prof called us in separately a couple of weeks after the assignment was due, and I honestly had no idea what was going on. Despite my explanations of what happened, he decided that it wasn't his job to decide if I was telling the truth or what should be done, and so he turned us BOTH over to the honor council. We were tried separately, and with my roommate's testimony I was found innocent, and I never again gave my l/p to the guy so he could play games on my box when I wasn't around. The other guy got off too, but that was because he was a 2nd-semester senior with 2 weeks left and they just decided to get him out of there.
There were similar examples to this (where innocent parties got in trouble unfairly) due to people stealing printouts of people's code in a shared lab, taking printouts from the garbage, stealing floppy discs w/code, stealing code from
When a similar cheating detector was used in the CS101 intro to C class, something like 20% of the class got in trouble. It was a real mess for the honor council. Groups of people would steal code from smarter people and then share it around. Amazing...
wayne
I took the course (Score:2, Informative)
On a side note... this is actually not new... it happens there EVERY semester, it's just the first time it was announced en masse to the press.
Face it, in real life these students will have to collaborate on projects and problems. Telling them that they can't even give each other hints (I'm not joking, they devoted an entire lecture to what constitutes cheating!) is moronic in my opinion. And no, I was not one of the 186 students; I was so bored in the course that I never even went to class, I only bothered to show up for tests.
how we use such systems (Score:1, Informative)
We have been using a system to detect cheating for years---it started before I got here. The one we use is Moss [berkeley.edu] (from Berkeley). How does Moss work? I'm not sure, except that it does examine program structure, at least to an extent. I can comment on how it's used.
In an intro-sized course, 200-400 students, it's impossible to check the programs by hand, especially when they are graded by different TAs. Moss is very useful as a first pass in detecting cheating. When Moss flags a pair of assignments that are very similar, we examine them by hand and make a judgement.
If there's any error, the process errs on the side of the student. If there was plagiarism that is not caught by Moss, then the students will probably get away with it, since the chance that the TAs will discover it is small (although it does happen). No accusation of plagiarism relies on Moss---Moss is only used as a tool to narrow the manual comparison process.
Re:Erm. (Score:2, Informative)
I started at The University of Iowa [uiowa.edu] College of Engineering [uiowa.edu] in 1993 and, as far as I know, the professors teaching programming courses had something like this from day one.
Incidentally, an academic environment should be partially different from an industrial one. Sure, you won't get fired in industry for getting help from a colleague, but if you've never developed the basic skills needed to do your job (because you always relied on your more dedicated classmates to complete your coursework), you won't last long.
Duplications can happen naturally (Score:2, Informative)
I remember taking a class (I think in BASIC) at a community college. I already knew BASIC and had been programming in it for several years. (I was only taking the class for the easy credits.)
After a test was turned in, the teacher called me over. He showed me a code fragment submitted by another student. It was practically identical, even to the variable names. (Of course, in this old dialect of BASIC, variables were single letters.)
How did this happen? Outside this class, the other student and I were collaborating on an astrophysics simulator (also in BASIC) for another class. Today, our style of coding is called Extreme Programming [extremeprogramming.org]. In the course of this we had tacitly developed common coding conventions and styles.
Even so, I was surprised how similar our independent output was.
Fortunately for me, the teacher was a friend of mine and he believed my explanation. Even so, I sensed some doubt on his part. Were he a relative stranger, things might have gotten messy.
Berkeley's MOSS (Score:1, Informative)
GaTech program (Score:2, Informative)
The reason that so many different people get caught is that they only review the cheating at the end of the semester, so it gives everybody who wants to cheat the opportunity before they are caught. EVERY student is told all of these details in lecture at the beginning of the semester, so it should not be a shock, but some people don't believe that it actually exists and don't even try to change things. Some people put their CS programs on the network, or leave them on a shared computer and other people steal them without even knowing the other person. The administration is generally pretty good about finding those who are guilty, and those who are merely ignorant. But as the article indicates, most people are just plain cheating.
Re:GT CS Cheating Detector Is Actually New (Score:2, Informative)
That is incorrect. I was an STA (supervisor teacher's assistant) at Tech from 1997 - 2000 and I even modified some of the code for the cheat finder (which is a perl script). It is not an urban legend and has been around since about 1992.
The professors have always described the cheat finder as a white-space-eliminating, pattern-matching, we-will-catch-you-every-time cheat detector.
The code does in fact throw away ALL variable names, function names, indentation, white space, braces, and other irrelevant items. It then does a comparison, and anything below 97.9% similarity is thrown out as a non-cheater. Everything else is FLAGGED and then reviewed by 2 TAs and ALL the professors currently teaching that course.
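Purely as an illustration of that kind of threshold filter (the actual cheat finder is a Perl script whose metric isn't public, so the similarity measure here is an assumption, and identifier renaming is deliberately left out):

```python
import re
from difflib import SequenceMatcher

THRESHOLD = 0.979  # anything below this similarity is treated as a non-cheater

def strip_cosmetics(source: str) -> str:
    """Drop all whitespace, the kind of 'irrelevant item' the cheat
    finder is described as throwing away before comparing. (A real tool
    would also normalize identifiers via a tokenizer; omitted here.)"""
    return re.sub(r"\s+", "", source)

def flagged(a: str, b: str) -> bool:
    score = SequenceMatcher(None, strip_cosmetics(a), strip_cosmetics(b)).ratio()
    return score >= THRESHOLD

original = "int main() {\n    return add(2, 3);\n}\n"
reindented = "int main()\n{\n  return add(2,3);\n}\n"
print(flagged(original, reindented))  # prints True: reformatting alone doesn't hide a copy
```

A pair that clears the cutoff would then go to the human review step described above, never straight to a penalty.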
They finally deployed the legendary cheat finder once and for all at the end of last semester, and caught a significant number of students.
As I said, I was an STA for a long time, and I personally sent over 10 students during that time to the dean; every semester we averaged about 20 to 40 cheaters. Over that 3-year period ONLY 2 people were exonerated, and 7 people claimed they did not do it. The rest admitted to cheating.
The reason so many were caught last semester is that many students believed the cheat finder was a legend. When you go around telling people there has been a cheat finder since the early 90's and very few people can actually confirm it, people come to believe it's an urban legend meant to scare kids, and the more they think it's just a scare tactic, the more chances they're willing to take.