Cheating Detector from Georgia Tech
brightboy writes "According to this Yahoo! News article, Georgia Tech has developed and implemented a "cheating detector"; that is, a program which compares students' coding assignments to each other and detects exact matches. This was used for two undergraduate classes: "Introduction to Computing" (required for any student in the College of Computing) and "Object Oriented Programming" (required for Computer Science majors)." Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
Erm. (Score:5, Interesting)
How exact? (Score:2, Interesting)
It better check for exact duplicates only, down to the variable names. Many undergraduate CS assignments are programs so basic that there are really only a few ways to implement them. It would suck to be a student who from scratch used the same algorithm as another student, and have them both flagged as cheaters.
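For what "exact duplicates only" could mean in practice, here is a minimal sketch that groups submissions by a hash of their text. It is an illustration only, not the Georgia Tech program; the function name `exact_duplicates` and the sample submissions are made up for the example.

```python
import hashlib
from collections import defaultdict

def exact_duplicates(submissions):
    """Group student names whose submitted source text is identical."""
    groups = defaultdict(list)
    for student, source in submissions.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        groups[digest].append(student)
    # Keep only digests shared by two or more students.
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical submissions: bob's file is a character-for-character copy of alice's.
submissions = {
    "alice": "total = 0\nfor n in nums:\n    total += n\n",
    "bob": "total = 0\nfor n in nums:\n    total += n\n",
    "carol": "s = 0\nfor x in nums:\n    s += x\n",
}
print(exact_duplicates(submissions))  # [['alice', 'bob']]
```

A check this strict never flags two students who independently wrote the same algorithm with different names, but renaming a single variable is enough to defeat it.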
I remember when my school did this... (Score:5, Interesting)
Prior to that year, VT had an average of 75 cheating violations for the WHOLE university (25000+ students). For that one class, on one assignment, 150 students were found cheating by the cheating detector... out of the 500 or so students in the class.
Funny as hell
A Sad Fact (Score:2, Interesting)
I was one of the better students in my comp-sci classes, so other students looked to me for help. I would routinely point them to my own finished assignments as examples of how to do something, or provide listings so that we could discuss the assignment and how to approach it.
This worked well until I got called before the teacher regarding two students who had taken my listings and typed them in (with practically no modification whatsoever). I explained the truth - that I provided it for purposes of instruction, not stealing - and managed to escape. The other students were forced to retake the course.
After this incident I kept my eyes wider open and noticed more students "copying"...
It happens. Whether this program is really needed or not I think is more an indication of how well the teacher stresses the students on final exams and such.
Re:Real-world vs. school (Score:3, Interesting)
Good point. I know a 4th year student who doesn't know C, but watch out, his Counter-Strike skills are amazing.
Sigh... when will schools implement the other kind of cheating detector [evenbalance.com]?
Cheating (Score:5, Interesting)
Cuz remember programmers: in the real world you are fired if you consult with a co-worker
As someone who TAed classes at GA Tech, I take a lot of offense at this comment. There is a difference between working as a team in project-based classes (of which GA Tech has a good number, including one where we got to hack the Linux kernel and another where we got to deliver a product to a customer) once you've shown you understand the basics of programming, and wholesale copying of other people's work in entry-level classes where you are supposed to be learning to program on your own.
Beginning programmers need to learn how to program, find information from man pages & API docs, and come up with solutions on their own before being introduced into team-based environments. If not, they never learn how to be self-sufficient, or even whether they are cut out for programming at all.
It is true that in the real world no man is an island, but on the flip side, how many people have worked with co-workers who are completely clueless about how to perform their jobs but hold degrees or certifications that imply they should be knowledgeable about programming? These are the kind of people who hid behind the work of others in team-based projects and submitted others' work on individual projects.
A few words from a GA Tech grad (aka "flame bait") (Score:2, Interesting)
Not new, but not "diff" either... (Score:2, Interesting)
It caught a few guys that I know. When confronted they tried to say that they didn't cheat. So the prof does the only sensible thing that a CS prof should do when dealing with cheating intro students: Single out a common line of code in their programs and ask them what it did. Hint: How many of you knew the ternary operator in your first forays into C?
Slashdot Boggles Me Again... (Score:5, Interesting)
A University degree is supposed to signify that you demonstrated knowledge in certain areas.
Cheating is not demonstrating knowledge.
Undergraduate level programming assignments do not require even consultation with other students, IMHO. They are too simple. If you can't code an undergraduate programming project without extensive "consulting", then you can't program. Period.
I am sickened by the number of people with CS degrees only because of "teamwork" and "consulting". I would guess, from my experience, 95% of people with CS degrees can't write a sort routine. Widespread use of these kinds of programs might fix some of this. As would harsher grading. In the real world, you don't get partial credit for a program that only dumps core or doesn't meet any of the design objectives. (in my opinion, any program which doesn't properly run a set of tests, provided to the students in the project instructions, should receive an "F" grade)
No wonder the software industry is such a mess. I've seen CS *GRADUATE* students who couldn't use malloc(). Note that I did not say "who use malloc() wrong" - no, these students could not even figure out how to call malloc() nor explain what it did. There's something strange happening (I call it cheating) when someone can graduate with a CS degree yet never use dynamic memory allocation knowingly...
school != real world (Score:2, Interesting)
There's a much better and more accurate article on the topic at the AJC [accessatlanta.com]. Take the AP version with a grain of salt.
The fact that GA Tech uses software to detect possible cheating should not come as a surprise to anyone. Such systems have been in use at many schools across the country for many different disciplines besides CS. Nor should anyone be disturbed by the use of such systems: their purpose is to detect possible cheating, which according to the AJC article was clearly verboten to the students in the class.
In the real world, a completely different set of rules may exist, but the fact remains that if your boss tells you he wants you to do something on your own, then you'd damn well better do it on your own. When a teacher instructs a student to perform a task on his own, he so instructs not to make life more difficult for the student, but to ensure that the student is capable of independently executing the skills necessary for the completion of the assignment. When that student eventually enters the real world, he has demonstrated the ability to perform the skills to be expected of him in the real world, so when he then has the ability to collaborate with his peers, he can actually contribute to the group's performance. A student who has always relied on others to get by will offer minimal assistance to a group and will typically act as a hindrance.
So sure, in the real world you won't be fired for collaborating with your peers, but you will be if you can't get anything done without collaborating with your peers.
More Info (Score:5, Interesting)
With regard to the cheater-detector program (called 'cheatfinder'), it's significantly more complicated than diff(1). It involves checking the structure of the code (ignoring variable names, indentation, and whatnot). Admittedly, I've never seen the source for it (very few people have), but it's been around since at least 1997. The output of the program is a single number indicating the probability that two people collaborated on an assignment. The threshold is typically set fairly high (0.90+), so false positives are less likely. 187 students, the number caught this time around, is definitely the highest I've heard of, but it's definitely not the first time we've hit a large number -- just the first time it made the cover of the local newspaper.
Interestingly, many students (including myself before becoming a TA) think (well, thought now) cheatfinder is just something the profs made up to scare students.
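The comment above says cheatfinder's source is not public, so the following is only a rough sketch of the general idea it describes: compare code structure while ignoring variable names and indentation, then apply a high threshold. The tokenizer, the `KEYWORDS` set, and the use of Python's `difflib.SequenceMatcher` are assumptions made for illustration, and the resulting ratio is a similarity score, not a probability.

```python
import re
from difflib import SequenceMatcher

# Assumed keyword list for a C-like language; a real tool would use a full lexer.
KEYWORDS = {"int", "for", "while", "if", "else", "return", "void", "char"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def normalize(source):
    """Reduce source to a token stream that ignores names, literals and layout."""
    out = []
    for tok in TOKEN.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")    # every identifier looks the same
        elif tok[0].isdigit():
            out.append("NUM")   # every number looks the same
        else:
            out.append(tok)     # punctuation and operators are kept
    return out

def similarity(a, b):
    """Similarity of two sources in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()

THRESHOLD = 0.90  # the "0.90+" cutoff mentioned in the comment

# Hypothetical submissions: b is a with renamed variables and different spacing.
a = "int sum = 0;\nfor (i = 0; i < n; i++) { sum += v[i]; }"
b = "int total=0;\nfor (k=0; k<len; k++) {\n    total += data[k];\n}"
c = "while (p) { count++; p = p->next; }"

print(similarity(a, b) >= THRESHOLD)  # True
print(similarity(a, c) >= THRESHOLD)  # False
```

Anything scoring above the threshold would still need a human to look at both listings before any accusation is made.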
Re:How exact? (Score:4, Interesting)
He gave us a long, complicated piece of C code to do this, but instead I just used a stack (we didn't "learn" about those in class until a few weeks later). Well, it just so happened 1 other student felt like writing the 11-line stack implementation, rather than the 100+ line one the teacher recommended. The teacher then said we cheated.
Fortunately, we were both able to explain how our code worked.
I teach a CS course... (Score:2, Interesting)
One assignment: modify this open source program to fix an assigned bug.
Another assignment: Modify this (Same) open source program and add a feature that they've been wanting for a while.
It's not like the solutions aren't readily available and well documented, it's just like in the real world: it hasn't been done here yet.
As long as intro courses use textbook problems with textbook solutions, students should be penalized for not doing their own work. The point in these classes is to provide an educational foundation. As soon as the foundations are laid, the students should be given work that isn't straight out of a textbook, and should be allowed to use any legal method of getting a solution. It's not like they'll find an exact solution, so they will have to do some tweaking and patching to get it to work anyway.
This was the philosophy my teaching cohort and I presented to the college Provost and the head of the CS department. They bought into it and gave us the class.
Similar program at Stanford (Score:2, Interesting)
Re:How exact? (Score:3, Interesting)
Computer cases were the most common (4 of the 5 cases I sat in on). One day, we had three cases, with different defendants in each one. All programs, from about 15 students were essentially identical. What were the differences? Capitalization of variable names, and indenting style. That's it. So, while they were not 'exact' copies, they were close enough in my mind to merit guilt.
They were fairly trivial programs. I think a total of maybe 150 lines of code or so. Can't remember if it was some form of BASIC, or C (I really think it was the former). There were a few ways to do the problem (I think it sorted words or something). But the striking thing is that the variables were typical CS100 nonsense names (variablefoo, variablebar, but NOT simply 'i' for iterator or 'x') of four or five characters in length, differing only in that some students had all uppercase, and others all lowercase.
Now, I suppose that if the instructor had said 'use these variable names' there is a defense. But that was never mentioned.
I think the ultimate answer was that almost everyone admitted that they did some amount of copying, and all got zeroes on the assignment. I can't remember if any failed the class (and no, nobody was tossed from school).
But this is the interesting thing: Each of the three cases was about the same instructor, with the same program. But they were brought as three cases. We were presented the hard copy evidence for all three cases at the beginning of the morning. During a break after the first case, I flipped through the other evidence packs. I saw that the copying was very, VERY similar in all three cases. In fact, there were more similarities between Program A in case 1 and Program B in case 2 than between Program A in case 1 and Program B in case 1. To my mind, it was clear that the cheating was much broader than indicated. However, I was ignored. Our power was only as petit jury, judge, and executioner. We had no room to act as grand jury. (In addition, this was my first real world experience with a judicial system unable to understand technical issues. I was a chem major. Roommate was a CS major. I was the only hard-science guy on the board. The others were various history/business majors.)
Anyway, the point is: exact copies are probably always cheating. But near copies are also sometimes cheating.
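Near copies of the kind described in this comment, differing only in capitalization and indentation, can be caught with a very small amount of normalization. The sketch below is an assumption-laden illustration: the helper names and the BASIC-like sample programs are invented, and lowercasing everything would also hide differences inside string literals.

```python
def canonical(source):
    """Lowercase the text and collapse all whitespace runs to single spaces."""
    return " ".join(source.lower().split())

def near_copy(a, b):
    """True when two sources differ only in letter case and whitespace."""
    return canonical(a) == canonical(b)

# Hypothetical programs: the second is the first with case and indentation changed.
original = "FOR VARFOO = 1 TO 10\n  PRINT VARFOO\nNEXT VARFOO\n"
copied = "for varfoo = 1 to 10\n\tprint varfoo\nnext varfoo\n"
other = "for n = 1 to 10\nprint n * 2\nnext n\n"

print(near_copy(original, copied))  # True
print(near_copy(original, other))   # False
```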
Re:I remember when my school did this... (Score:3, Interesting)
In the teaching trenches... (Score:4, Interesting)
As to the difference between "consulting" with another and "cheating", I've found that "explain your own code" is a pretty good yardstick. If I spend 2-3 hours preparing to teach a lecture, I have no sympathy with someone who doesn't spend enough time to do the assigned work but instead cheats.
Re:Erm. (Score:2, Interesting)
Like they need a computer program to do this?
Um, when I was at the University of Illinois in the early '80s, they had a decidedly low-tech but nonetheless effective technique: very lightweight paper and bright lights.
Fully half of the class of 400 failed the final project -- a particularly tough one at that -- in the assembly language class I took. The TAs noted an awful lot of assignments that looked similar. Sure enough, a bright light was all it took to see that there were a lot of identical printouts. There were even a dozen or so students who didn't even have sense enough to remove the name of the author of the code from the comments!
Testing Methods (Score:2, Interesting)
I agree, so why not use our ability with networking to solve the problem. The goal is to see that students can code their way out of a paper bag.
Why not have a class of networked boxes and then 'test' the students by having them come in and write their code while the network is shut down, preventing the students from getting access to any help during the test? Take the floppy and CD-ROM drives out and they can't bring in outside help. There could be bonus marks for speed.
Thoughts of another Georgia Tech alumnus (Score:3, Interesting)
I took Intro to Computing in the Spring of 1996. It was cake for me because I was a Computer Science major and I dig this stuff. But a lot of non-CS people dreaded that class above all others, especially Management, International Affairs, and Architecture majors, but also some engineering people, such as Aerospace and Industrial Engineering.
(And can you really blame them? How many civil engineers really need to know how to sort numbers in O(N log N) time? Or insert into a linked list for that matter? They write hacked-up FORTRAN if they write anything at all.)
Kurt Eiselt came to the first lecture and gave us a scare speech about Cheatfinder. Knowing that it looks for similarities between two students' works, I was worried constantly about my homework answers. A typical problem was to write an inorder binary search tree traversal routine in pseudocode. Honestly, how many different ways are there to do this? And there are 500 people in all sections of the class?
Fortunately, I was never flagged, but I have heard a few stories (which may not be true, you know how that goes) of people who were flagged, and were only vindicated after losing student jobs and failing classes.
I don't think an automated cheat detection system is applicable to small problem sets like binary search, stacks, and Mergesort. For the later classes, say Sophomore level, I have no problem with it though.
Besides, many Greek orders and clubs on campus have extensive "word" banks--archives of previous homeworks and tests, with solutions, from previous class offerings. Are they going to check against all previous students' work too?
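To illustrate the worry about small problem sets, here is roughly what an in-order traversal of a binary search tree looks like. This is a sketch in Python rather than the course's pseudocode, and the `Node` class is an assumed minimal representation. There is very little room for two honest solutions to differ in structure.

```python
class Node:
    """Assumed minimal binary tree node for the example."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder(node):
    """Return the keys of a binary search tree in ascending order."""
    if node is None:
        return []
    # Visit the left subtree, then the node itself, then the right subtree.
    return inorder(node.left) + [node.key] + inorder(node.right)

tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(inorder(tree))  # [1, 2, 3, 4, 5, 6, 7]
```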
Talk about cheating (Score:2, Interesting)
Those dumb enough to carbon copy stuff deserve to be caught.
Re:diff don't do it (Score:1, Interesting)
Diff is a high cost way of doing it. There are more intelligent ways to create a cheating detector:
Voila. The top of your list of distances now tells you the most similar submissions. Get your TAs to compare them by hand, and get ready to deal with the consequences of catching people...
Is it possible to beat such a checker? Of course. Is beating it easier than doing the assignment itself? Highly unlikely. As a student you never know what statistics are being used, nor do you know what similarity thresholds are used, so anything you copy off of someone else may trigger such a checker.
Does it work? Yes. I have caught many cheaters using this technique...
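The specific steps of this comment did not survive, so the following is only a guess at the general shape of a statistics-and-distances approach: compute a profile for each submission, measure the distance between every pair, and sort. The choice of token counts as the statistic and Manhattan distance as the metric are assumptions for illustration, as are all names and sample data.

```python
import re
from collections import Counter
from itertools import combinations

def profile(source):
    """Count token occurrences; all identifiers and keywords collapse to 'ID'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return Counter("ID" if t[0].isalpha() or t[0] == "_" else t for t in tokens)

def distance(p, q):
    """Manhattan distance between two token-count profiles."""
    return sum(abs(p[k] - q[k]) for k in set(p) | set(q))

def rank_pairs(submissions):
    """Return (distance, name_a, name_b) tuples, most similar pairs first."""
    profiles = {name: profile(src) for name, src in submissions.items()}
    pairs = [(distance(profiles[a], profiles[b]), a, b)
             for a, b in combinations(sorted(profiles), 2)]
    return sorted(pairs)

# Hypothetical submissions: ben's code is ann's with the variables renamed.
submissions = {
    "ann": "x = a + b; print(x);",
    "ben": "y = c + d; print(y);",
    "cy": "while (n > 0) { n = n - 1; }",
}
for dist, a, b in rank_pairs(submissions):
    print(dist, a, b)
```

The pairs at the top of the list are the ones to hand to the TAs for a manual comparison, as the comment suggests.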
OT: reversing a list (Score:3, Interesting)
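#include <stdio.h>

int main(void) {
    int list[] = {0, 1, 2, 3, 4, 5};
    int i, tmp, len = sizeof(list) / sizeof(list[0]);
    /* Swap mirrored elements, working inward from both ends. */
    for (i = 0; i < len / 2; i++) {
        tmp = list[i];
        list[i] = list[len - i - 1];
        list[len - i - 1] = tmp;
    }
    for (i = 0; i < len; i++) printf("%d ", list[i]);
    return 0;
}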
Reversing a linked list would be marginally longer, but a doubly linked list would be just as short or shorter than this. Only a real novice would take 100 lines of C to do it. BTW, how could you possibly learn assembly before learning what a stack is?
A good cheating policy (Score:5, Interesting)
CMU 15251 Course Document and Cheating Policy [cmu.edu]
His policy encourages collaboration and specifically forbids cheating. It itemizes various types of cheating, for example copying from another student, letting another student copy you, and looking at someone else's files online (even if they forgot to set their file permissions).
Furthermore, he requires all of the students in his class to sign a statement saying that they have read and understand the cheating policy. Not only does that discourage some students from cheating, but it also makes it much easier for him to get students into serious trouble with the school when they are caught.
In addition to the course document, here's more or less what he had to say on the first day of class: (I apologize for paraphrasing; this is how I remember it) "Nobody plans to cheat. You all must be very smart, or you wouldn't be here. You think you're going to try hard and do well in this class. But later in the semester you'll get busy with other classes and activities, and all of a sudden an assignment will be due in one day and you haven't started. Or you'll be taking a test and realize that you forgot to study an important equation. Or you'll work hard on an assignment and almost completely get it working, but get stuck on one subroutine. Even though you never planned on cheating, all of a sudden you'll find yourself in a circumstance like that and it will seem tempting."
(BTW, I shouldn't have to say this, but Prof. Rudich's cheating policy is copyrighted. If you're a teacher or T.A., don't copy his cheating policy without his permission. That would be just as dishonest as cheating!!! If you want to use it, contact him and I'm sure he'd be delighted to let you use it, as long as you give him credit.)
Re:How exact? (Score:1, Interesting)
One would think that the midterms and finals would be enough to prevent something like this. However, though she consistently fails midterms and finals, her near-perfect projects keep her afloat and she gets B's and C's.
The continuous stream of outside help that she is receiving could never be detected by any cheating detector; she does in fact write all of her own code. However, she still somehow manages to know almost nothing about programming.
There are two things that truly sadden me about this situation. The first is that she will graduate with a little slip of paper that says she knows how to program, while I have seen many of my friends who could code circles around her wash out (usually due to depression-related life problems). The second is that my roommate and dear friend, who is dragging her through the undergrad program, is suffering for the effort. His grades have taken a noticeable hit, and the extra work shows on him. (Sadly, he doesn't stand a chance of catching her as an S.O. either.)
I guess it's just a bad situation all around. I pity the fool who hires her.
What a crock... (Score:1, Interesting)
Oh many ages ago when I was finishing my CS degree, I had an assembly course - I *KNEW* people were stealing my code from the trash because they were too stupid to write their own...
So I planted stuff in the trash that would print out "I was stolen from
On the lighter side - CS Dept's have been wrestling with this issue forever. They've concocted scripts written in a zillion different languages - but for the same reason that we can't filter out all the spam, they will never be able to detect all the cheaters...
Once the detection mechanism is known it's trivial to avoid - even if your program is "the same" as someone else's...
My advice would be for CS depts to actually have people with a brain teaching the courses, and have as many assignments as there are people in the course. That is, if you have 20 students, you have 20 different projects - each project is designed to test the same CS concept, but each requires a different program to be written. This would ENCOURAGE thought on the part of the Prof, and the students... Alas, it's too much to expect...
O-O Paradigm Encourages "Cheating" (Score:1, Interesting)
Stephen Ambrose, watch out! (Score:4, Interesting)
These instances only came to light because an author of a lifted passage noticed it while reading Ambrose's book. Subsequent episodes came about because other authors started looking, and now some people are checking out new likely sources; this works because Ambrose only lifted passages from books that he admired and heavily footnoted (at least, so far as we know!).
Perhaps Ambrose was really just lazy, as he was fairly open about crediting others for the ideas (he "just" failed to credit them for the words, too). There are many cases of sneakier plagiarism than that, both in academia and in journalism.
So, class, the programming problem for today is, given the text of two books, spit out the most likely candidates for lifted passages, based on length and similarity of words. You get a B if you can do this for exact, verbatim matches, an A if you can do it with individual word substitution, and an A+ if you can recognize re-ordered clauses. The end users for this tool would be 1) authors everywhere who want to protect their own writing, and 2) journalists looking for juicy plagiarism scandals.
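A minimal sketch of the "B grade" version of this assignment, verbatim matches only, might look like the following. It indexes every run of `min_words` consecutive words in one text and extends each hit as far as the two texts keep agreeing. The function names, the `min_words` default, and the sample sentences are all invented for the example; word substitution and re-ordered clauses are not handled.

```python
import re

def words(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shared_passages(text_a, text_b, min_words=5):
    """Return verbatim word runs of at least min_words found in both texts, longest first."""
    a, b = words(text_a), words(text_b)
    # Index every min_words-long window of b by its starting positions.
    index = {}
    for j in range(len(b) - min_words + 1):
        index.setdefault(tuple(b[j:j + min_words]), []).append(j)
    found = set()
    i = 0
    while i <= len(a) - min_words:
        best = 0
        for j in index.get(tuple(a[i:i + min_words]), []):
            n = min_words
            # Extend the match while both texts keep agreeing word for word.
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            best = max(best, n)
        if best:
            found.add(" ".join(a[i:i + best]))
            i += best   # skip past the passage just reported
        else:
            i += 1
    return sorted(found, key=lambda p: (-len(p.split()), p))

# Made-up sample texts, not taken from any real book.
text_a = "The quick brown fox jumps over the lazy dog near the old mill."
text_b = "Everyone agreed the quick brown fox jumps over the lazy cat instead."
print(shared_passages(text_a, text_b))
```

Sorting by length puts the most suspicious candidates first, which matches the "based on length" part of the problem statement.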
NO THIS IS NOT NEW (Score:2, Interesting)
Dear Professor Andrew S. Tanenbaum of the Free University of Amsterdam and his colleagues already used this kind of program to check the hand-in programs we students wrote in Modula-2 back 8 years ago or so, and actually it was kinda a big thing in Dutch news back then. Guess these guys just reinvented the wheel.
so
Naive Article (Score:1, Interesting)
Secondly, (as many posters have indicated), there is a difference between consultation and copying. I can't tell you how many CS-1 students I have had over the years who were mentally unable to either read or write code! In many cases, these were hard-working students who, while otherwise highly intelligent, simply did not have the correct wiring in their brains to be programmers. Driven to desperation (these students are entirely un-used to failing courses), they take someone else's code, and, with no understanding of what the code actually does, make cosmetic changes, and submit the program as their own.
Programs modified in this way can be matched with their original in several ways, most of which require no CPU assistance: 1) The program contains a weird error or unusual logic. When the grader sees this, a bell goes off in the brain, and then you look for the program where you first saw the same pattern. 2) Identifiers are slightly "off". For example, a variable that should be named "speed" is instead named "how_fast". When many identifiers are slightly misnamed, it is a good sign of use of global-search-and-replace. 3) (Used after plagiarism is suspected) The window test: put one listing on top of the other, looking to see the "envelope" of the text to see if they essentially match. 4) A computer program can match programs using several techniques. It can count tokens, compute a moment of inertia on the text, or do a simple diff, among other techniques. This fourth family of techniques works better than the others when you are mass-producing CS-1 students; the others work better when the professor does the actual grading him(her)self.
In any case, an accusation of plagiarism should not be made until a human being has personally inspected both listings. In my case, I don't assert plagiarism. I summon the individuals into my office and state, "These two programs look remarkably alike. Would you care to explain?" Usually, self-incrimination is the result.
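The window test in item 3 is a manual check, but a rough software analogue is easy to sketch. The version below assumes the "envelope" can be approximated by each line's indent width and total width, and it uses Python's `difflib.SequenceMatcher` to compare the two shapes. Those choices, the function names, and the sample listings are illustrative assumptions, not the author's method.

```python
from difflib import SequenceMatcher

def envelope(listing):
    """Describe each non-blank line by (indent width, total width), ignoring its characters."""
    shape = []
    for line in listing.expandtabs(4).splitlines():
        if line.strip():
            indent = len(line) - len(line.lstrip())
            shape.append((indent, len(line.rstrip())))
    return shape

def envelope_match(a, b):
    """Similarity in [0, 1] between the outlines of two listings."""
    return SequenceMatcher(None, envelope(a), envelope(b), autojunk=False).ratio()

# Hypothetical listings: the second renames "speed" to a name of the same length.
first = "speed = 0\nwhile speed < 10:\n    speed = speed + 1\nprint(speed)\n"
second = "veloc = 0\nwhile veloc < 10:\n    veloc = veloc + 1\nprint(veloc)\n"
third = "for i in range(3):\n    for j in range(3):\n        print(i, j)\n"

print(envelope_match(first, second))  # 1.0
print(envelope_match(first, third))   # 0.0
```

A rename that changes the length of a name, such as "speed" to "how_fast", shifts the line widths and lowers this score, whereas a person holding two printouts up to a window is more forgiving of small offsets.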
More-sophisticated copying probably cannot be easily caught. However, it is usually easier to write the assignment yourself than to take someone else's assignment and re-write it so that the copying won't be detected. In any event, if a student has the mental wherewithal to edit another program into something that both evades plagiarism-detection and also works, that student probably could successfully write it him(her)self anyway.
In my courses, I have just introduced a new plagiarism-deterrence policy: on exams, I will include a problem similar to one that was assigned as a programming project. My syllabus specifies that if the student cannot solve a problem on an exam that (s)he has solved successfully on a project, that project will be assumed to have been plagiarized.
More of Deterrent.. (Score:2, Interesting)
Re:Talk about cheating (Score:3, Interesting)
Of course, he was also aware of the limitations of the program (given that he wrote it), so I don't believe that he took the statistics as the sole arbiter of whether or not students were stealing code.
On the other end of the scale, Dan once found out that someone had published a solution to one of his assignments. He publicly announced in class that he was aware of the cheat. For anybody who had already submitted copied code and couldn't come up with a 'real' solution, he offered a partial amnesty (a zero on the assignment, but otherwise no punishment) for people who came forward and fessed up. He also warned that anybody trying to sneak the cheat past him would be failed from the course.
Despite his warning, a number of students still submitted the known cheat. Some blindly submitted the file without any edits whatsoever -- not even bothering to fix a simple syntax error that kept the program from compiling. As promised, they were removed from the course and reported for cheating.
There's no accounting for abject stupidity.