Closed-Source Tests

The NYTimes has a lengthy expose of the actions of a company that creates and administers standardized tests, one destined for RISKS Digest very shortly. A bug in their software sent students to summer school and resulted in teachers and superintendents being fired from their jobs, even though the company was notified of problems early. It's a fascinating story of the risks of going with a closed source vendor - how the company acts to perform damage control, lies, stalls, compartmentalizes the damage by telling each complainer that they are the only one experiencing problems, and finally, most of a year after being notified of the problem, fixes the bug. (It's a two-part series - the first part discusses problems with human scoring of tests.)

  • by Anonymous Coward
    This is not just about closed source software companies. This is endemic to ANY business with a customer service angle, which is, well, EVERY business.

    CS Drone: Hello, thank you for calling Frito-Lay Customer Service...

    Me: Your Olean potato chips made me shit blood. Is there some sort of problem with this product or is it just me?

    CS: Oh no sir, I'm sure it was something else you ate...have you spoken to your doctor?

    Oh yes, first karma whoring no registration link [nytimes.com].

  • by Anonymous Coward
    Yes, it could happen with open source.

    But, if it did, the guilty party would be naked and exposed and appropriate action could have been taken in time to avoid costly damage.

    With typical closed source, the damage is covered up as long as possible, the buck is passed because it is always someone else's fault, and the ultimate damage is maximized.

    Just ask yourself, why has Microsoft not caught on in avionics software or nuclear powerplant software, not even in the software for your car's controls?

    If I must fly in an aircraft, I want to know that the design engineer is held fully accountable for **all** of his work. I do not wish him to claim in an accident investigation, "but I can't be held accountable for the software, because it was a black box, but it was done by Microsoft and everybody knows that they are infallible!!!".

  • In any large school district that has both poorer and well-to-do schools, where do the newly hired teachers get assigned? Answer: The poorer schools. The poorer schools are also where teachers get transferred to when they're "disciplined" for something but not able to be fired for whatever reason.

    The crime-free schools in the clean white neighborhoods are considered "sweet jobs" where teachers with "a lot of seniority" get to work.

    Is it any wonder that poorer schools do miserably on standardized tests? They're treated like jails, not only from the student side BUT FROM THE TEACHING STAFF SIDE as well. Ever see the James Belushi movie, "The Principal"? There's some truth to that.

    This problem is not unique to schools, either. New cops get put on the worst beats too.

  • by Anonymous Coward on Monday May 21, 2001 @11:40AM (#208000)
    ...what are these pencil-scrawled changes on your report card?" "Corrections. Just a software bug. Will you just sign it already?"
  • Could still be worse. Fast forward to 'Gattaca' or some similar dystopia and the kid could be painlessly destroyed, or subjected to a course of medication to correct defectives, one with a risk of death or permanent brain damage. But hey, if the subject is already defective, what's the diff?
  • by Danse ( 1026 )

    What's with all the pathetic replies that do nothing but bash the author of the post? Y'all can't come up with a decent rebuttal? Maybe you should reconsider the value of teaching kids to think instead of teaching them to memorize.

  • Never said anything about them "b[ing] criativ indevijuals an fele GUD aboute themsalvs" being the only thing that matters. There will always be a certain amount of memorization that is necessary. I'm just saying that memorization isn't gonna do them a damn bit of good if you haven't taught them to think properly for themselves as well. Something that seems to be completely overlooked as teachers scramble to drill facts into the kids so they can pass the tests.

  • Well, simply put, there are a number of very important skills we should be teaching students that are not easily tested on a multiple guess exam: critical thinking, application of known skills to unfamiliar problems, experimentation.

    The extreme example of these are the Japanese Jukart (sp?) schools, the test drilling schools. While in grad school I encountered several Japanese grad students who attended those schools, and who had absolutely amazing physics GRE scores. But in the end they had a very hard time making it, because they had only been taught how to take tests, but not how to apply any of their schooling in real world situations.

    So how do you change things? Well, don't eliminate the tests. Tests should be given, but should not be relied upon to give the complete picture of a student's achievement. The problem is that other methods (student observations, interviews) have a tendency to be subjective and much more expensive to administer. I admit I don't know what the solution is, but I do know that going down the path of using standardized tests as the only measure of a student's potential is a really stupid idea.

  • I work at a school that is so desperate to have a BIG IT presence that they have morons on the staff and even a STUDENT teaching some of their IT courses.

    It's pretty bad when you sit around explaining TCP/IP to someone who is supposed to be teaching it in their class that night.

    Of course this has nothing to do with standardized tests, but just more to the general downward slope the whole US education system seems to be on.
  • I think what some folks might be forgetting here is this software doesn't work with a $129 Canon scanner. They use $100K high-speed machines [ncs.com], and much of the translation is done during the scanning process by customer software interfacing with the scanner company toolkit. You don't do this stuff for fun - trust me on this.

    After all, if the scanning software returns the wrong code the question will come up wrong even with the RIGHT answer key.

    But there's plenty of space in here for error: The operators are working 2nd and 3rd shifts stuffing piles of nasty test papers into a scanner (up to 6000/hr. for an NCS 5000i) taking pieces of paper and other garbage out, etc. etc.

    How do I know? Don't ask.
  • Because cheap scanners don't do the same kind of optical mark reading. The OMR scanners key off of marks on the page margins to provide information about the dark spots in a row as well as simply scan the document. Then they go through several levels of quality control - there are a lot of people who work on those things.

    And this is a service that handles millions of tests per year, which is why it's centralized and industrialized. You know, like people pay to outsource things so they don't have to do it themselves?

    My point is this is exactly the kind of situation which open source is NOT good at, simply because of the expensive specialized hardware involved.

    Sorry, you can't win 'em all!
  • So what's your solution? Non-standardized tests? Have every school district make up their own "special" test? Why do you consider this testing any more a "trivia quiz" than the normal tests teachers make up?

    The point of the article was that New York incorrectly based their cutoff scores for summer school on the test. The people at CTB/McGraw-Hill told them *not* to. Sounds like your example of students zoning out after the SOL is the same problem, but it has nothing to do with the test.

    Your rant about treating kids like potentials is heartwarming for a picket sign, but if you can't give an example how to change things, it's just hot wind. These tests try to *raise* the lowest common denominator. The brightest kids do fine. What's wrong with that?

    I'm sure that you have years of experience in testing science so tell me what type of testing is used for kids who think and solve problems? What's your idea? What type of test do you recommend?

  • As much as all of you appear to enjoy making fun of this comment, it contains a lot of truth.

    Texas has had a standardized test in place for several years, and the only real effect that I've seen is that schools are tending to focus more and more on teaching TO the test. If I remember correctly, at least one school district was in a heap of trouble a few years ago because they falsified their TAAS test results. It's more important to them that their schools score well on the test than it is for them to crank out well-educated kids.

    My wife, a teacher-in-training, just witnessed two weeks of in-class non-learning. These kids had already taken their TAAS test and knew that nothing they did would affect their grades. One might expect this kind of behavior out of high-school-age children, but these kids were in the 4th grade.

    What does THAT say about these kinds of tests?

    --- Chris
  • Everywhere you look, businesses get too greedy and screw up. Power generation, prisons, telephone service, education... When an industry is successful, 9 times out of ten you know that government had to hand-hold them up until they made the big money, and then they do a Cheney in public and claim the government had nothing to do with it. Examples: the Internet, defence and space projects... Every libertarian and republican should realize this before more real people with real children get affected, like they did in NYC and Indiana...

    I say sue the testing outfit to bankruptcy, let the stupid Bush (redundant) bankruptcy "reform" law do some good.
  • On the other hand, local assessments are still (empirically) better predictors of future success than standardized assessments.

    More bluntly: standardized tests don't work.
  • I know several people who have worked for standardized test manufacturers (there's a big one locally). They all left because the entire process was such a crock. From the way the tests are created, to the way the results are interpreted, it's all a pile of assumptions and conjectures.

    And the company was held to no standard whatever by any of their clients (various state school districts). Because of the "high standards!" craze sweeping the nation, they have guaranteed fat money coming down the pipe. They would ship tests that weren't normed correctly, bill clients extra hours to fund parties, etc., with no thought that maybe the money would dry up. It won't, as long as this political craziness continues.

    We can all speculate about where their corporate campaign contributions were going.
  • by bstadil ( 7110 ) on Monday May 21, 2001 @11:54AM (#208013) Homepage
    The story has an example where the answer sheet had 6 wrong out of 68 questions. As incompetent as this seems, it's mind-boggling that they do not pareto the answers and flag where the "right" answer is not the highest ranked. Much better strategies can be devised but we are not talking rocket science here. Letting the schools have access to the data coupled with a few lines of perl would fix these kinds of problems.
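
    The check described above (rank the responses per question and flag any question where the keyed answer is not the top-ranked one) could look roughly like this. It is a minimal sketch in Python rather than perl; the function name `flag_suspect_items`, the one-string-per-answer-sheet layout, and the sample data are all assumptions made for illustration, not anything the testing company is known to use.

```python
from collections import Counter


def flag_suspect_items(responses, answer_key):
    """Return (question index, keyed answer, most popular answer) for every
    question where the keyed answer is strictly less popular than the
    most common response. Ties with the keyed answer are not flagged."""
    suspects = []
    for q, keyed in enumerate(answer_key):
        # Tally what every answer sheet marked for question q.
        counts = Counter(sheet[q] for sheet in responses)
        top_choice, top_count = counts.most_common(1)[0]
        if top_choice != keyed and counts[keyed] < top_count:
            suspects.append((q, keyed, top_choice))
    return suspects


# Hypothetical data: four answer sheets, four questions each.
responses = ["ABCD", "ABCA", "ABDA", "CBCA"]
answer_key = "ABCD"  # the key entry for the last question is suspicious

print(flag_suspect_items(responses, answer_key))
```

    A flagged question is only a candidate for human review: a genuinely tricky question can also draw a popular wrong answer.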

  • The problem was a design/logical/programming error.

    The error had - erroneously - made the current test appear easier than the previous year's. To make the tests equal in difficulty, the computer had then compensated by making it harder for some students to do as well as they had last time. The error did not change students' right and wrong answers, but it did affect their comparative percentile scores.

    And regarding the 'disclosure of questions', I don't think we're asking for too much if we want the questions/answers one year after the testing. It may be too late for correcting the damage, but at least the testing company will know that its work will be double-checked.

    This process is necessary so scores one year can be compared with those from previous years, even if different questions are used. States ask for new questions because they are worried the old questions will leak out.

    Also, there is no such thing as a "computer error", not any more than there is a "pencil error" when writing.
    echo '[q]sa[ln0=aln80~Psnlbx]16isb15CB32EF3AF9C0E5D7272 C3AF4F2snlbxq'|dc

  • CTB's error hit hardest in New York City, the nation's largest school system. Apart from the children, the most prominent victim may have been the city's schools chancellor, Rudy Crew. The error showed - incorrectly - that reading scores citywide had stagnated after rising for two years, raising questions about Dr. Crew's leadership. Within months, he was out of a job.

    Before the mistake was discovered, Dr. Crew had been a leading advocate for using standardized tests to hold students and educators accountable.

    In the immortal words of WS Gilbert, the punishment fits the crime! [boisestate.edu]
  • by VValdo ( 10446 ) on Monday May 21, 2001 @12:43PM (#208016)
    I know this is one of those big words everyone likes to use on slashdot (along with "obfuscate" and a few others), but since this is a thread on education, I hope you'll forgive me.

    v. tr. obviated, obviating, obviates.

    To anticipate and dispose of effectively; render unnecessary.

    Maybe you meant "illustrates" or "highlights" or "illuminates"?

  • Gorilla - Adaptive tests don't work that way. Ya get one right, they give ya a harder question; get it wrong and it gets a little easier... til the computer can figure out what ya know and don't know. If ya are at the top of the test because ya just happened to get the right questions then there is something wrong with the test, as it should be punishing ya with harder and harder and harder q's. Yeah, this is all simplified (ya get into Bayesian logic and stuff) but it's essentially how it works. It ain't all about random questions...

    Heh! This is already day old news, but I figure folks like me will read back to see if anyone posted any response a few days later :)
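
    The "get one right, get a harder one" loop described above can be sketched as a simple staircase. This is a deliberately simplified illustration in Python, not how any real adaptive test is scored (those use statistical models, as the comment notes); `run_adaptive_test`, the 1 to 10 difficulty scale, and the simulated test taker are all assumptions made up for the example.

```python
def run_adaptive_test(answers_correctly, levels=10, start=5, num_items=20):
    """Simplified staircase: move up one difficulty level after a correct
    answer and down one level after a wrong one.

    answers_correctly is a callable taking a difficulty level and
    returning True if the test taker gets that item right."""
    level = start
    history = []
    for _ in range(num_items):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(levels, level + 1)  # harder next time
        else:
            level = max(1, level - 1)       # a little easier next time
    return history


# Hypothetical test taker who can handle difficulty 7 and below.
history = run_adaptive_test(lambda level: level <= 7)
levels_seen = [level for level, _ in history]
print(levels_seen)
```

    After a short climb the difficulty settles into bouncing between the highest level the test taker can handle and the one just above it, which is the sense in which the test "figures out what ya know".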

  • by clifyt ( 11768 ) on Monday May 21, 2001 @12:09PM (#208018)
    From what I understand, Opensourcing the thing wouldn't have done a damn thing.

    In my day job, I am Manager of Development at the Indiana University - Purdue Universities Testing Center. I've read quite a bit on this and have evaluated these guys' software and didn't care much for it (could be that my own software comes up with higher predictors than theirs and was much more flexible). With adaptive testing like their own (and this is all in layman's terms lest one of the wanna be psychometricists wants to correct me), ya build the item database, calibrate it, evaluate it and then calibrate it some more. Real testing may be going on in all this time, but even static items will be somewhat liquid in their numbers over the years.

    Unfortunately, companies like this like to change as many questions each year as possible. Doing this means that you will have better test security, but your items may not have all the correct weighting behind them. How does one Open Source this without losing all the data ya need to make this stuff adaptive? With standard testing, ya may ask 200 questions and a lot of times you are simply measuring a person's ability to do lots of work in a set amount of time. Adaptive testing uses a lot of calculations to figure out what ya know and what ya don't know. Instead of the 200 questions, ya might get 20 (or less on some of the new standardized ones) that you are free to take as long as you'd like.

    If the person taking this knew even a few of the questions they got beforehand, this would throw off the entire test. If you don't think folks cheat on these types of tests, you are an idiot. There are school systems that have gotten ahold of written tests and drilled their students on the exact questions presented. On the high stakes testing, we find folks that will go to such lengths as to take the test on the east coast in the morning under fictitious names, fly across the country to California and retake the afternoon test. There was a case where Law Students were memorizing one question each and as soon as they were outta the test, would cell phone in the questions, and someone would be selling code keyed pencils (we have one of these :) with different patterns from the different version tests (I think a set of these put the students back about $5k each).

    Anywho, no amount of Open Sourcing would have helped. Bad Software wasn't written, a bad analysis of the data was probably done. OS is not the answer to everything in life...

    Clif Marsiglio
  • It would also give the local school authorities an opportunity to cook the results. With so much riding on the results of the tests, the temptation to alter the results would be severe.
  • by Detritus ( 11846 ) on Monday May 21, 2001 @12:03PM (#208020) Homepage
    Why didn't the company validate the tests and the scoring process before releasing them to the school systems? There can be errors in the code, requirements and statistical models and techniques. They could have given the new tests to a sample of students along with reference tests that are never widely distributed. A comparison of the scores on the two tests should uncover any major errors.
  • Only if they're Mattel.
  • What is needed here is a peer process. Teachers get certified about what is important and what is not. Then they evaluate the student as part of their classwork, and input to the class.

    Personality conflicts can cause problems with this, but if there is an appeals process of some kind, most of this can be worked out.

    Tests do not even begin to reveal a student's achievements in school, or their worth to society. Peer review does.

    Imagine the students testing themselves. They know the requirements, let them work toward them. I am not saying let the students choose if they pass or fail, but make them involved in the process so they understand it, and can help each other.

    I know what I did in school. All of the really good stuff that mattered was not on the tests. It was the projects I did, and the papers I wrote, and the arguments I had with staff and friends.

    Of all the classes I have to say that Music and Drama were the most interesting from a testing point of view. These classes are peer reviewed by their nature. How do you know you are doing well? Do others say so? Did your performance at the play get some applause? Your teacher is a mentor in these sort of things. They take what is there and improve it. You will get an 'A' anyway, so why work hard at all? If things work the way they are supposed to in school, the teacher gets you motivated, and sets direction, your peers give you someone to work with and achieve goals and share success. Standardized tests totally ruin all of this.

    Point is simple. Teachers know the students best. Most of them actually care even if they are underpaid. Let them make choices, and help to form good citizens. Taking all the hard work, and boiling it down to one test is stupid. Even a genius will have a bad day. Should the rest of their life be changed because of it?

    Don't think so.
  • The simple-minded solution would be to have a 'user group' - i.e., a group formed of a representative educator from each state where the exams are given. Such a group should quickly discover such a nation-wide pattern. Yet, apparently the people in Tennessee and in New York weren't talking. Should one conclude the educators themselves were part of the coverup, trying to conceal their states' performance? Or are they just so parochial that they do not notice what is happening in other states?
  • People forget how the market works when so much of the market isn't working.

    When a company's product is known to be defective you stop using the product.
    Companies usually respond because eventually they will be caught.
    The software industry ignores this rule because so often, after being caught in a lie, the company in question continues to get away with it. Something is broken.

    Open source and Linux are solutions in that they provide an alternative that isn't crushed by monopolist tactics. Not because open source produces a better product every time, but because right now closed source is producing a poor product and an alternative is necessary.

    Open source produces quality even when users have no alternative. Open source isn't dependent on market demands so it isn't affected by them. If it didn't have another way to produce quality none would happen.

    Closed source produces quality as a direct result of competitiveness. Plain and simple. They are not the only kids on the block and if they don't provide the quality software tools users demand the users will get them elsewhere.

    With open source if the software tools the users demand are missing some users add them to the software. Then we all have them.

    Open source doesn't produce better tools in a healthy market. But when the market is poisoned the quality of open source remains strong.

    I'm not saying open source doesn't produce quality software. It does. It just doesn't need market demand to make it happen.

    If all were healthy Linux and Windows wouldn't be very different in quality.

    Microsoft is caught up in this game of ignoring consumer complaints. Usually companies die when they do that. It's not normal.

    I'm sure in the future we'll run into some grand examples of poisoned open source work. Right now however the poison is over at Microsoft.
  • I would think one could find the errors in the answer key simply by looking at the distribution of responses... Shouldn't the right answer be the most popular one if the question is not misleading or tricky? Or take the top 10% of the respondents and look at their answers... wouldn't it be a bit surprising if 75% of those answered 'a' when the answer key states 'b' is the answer?
    The questions with poor success rate should be checked over, not only to make sure the answer key is right but to evaluate the question's validity.
    A solution to the 'open question-answer' problem could be the publication of answer distribution (right answer: 70%, wrong answer a: 11.2%,...) and the questions for which the distribution looks wrong could be eliminated from the scoring or else published with its choices.
    My $0.03 Canadian.
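
    A rough sketch of the two ideas above, in Python: publish the per-question answer distribution, and check whether the top scorers overwhelmingly pick an answer the key calls wrong. Everything here is assumed for illustration (the function names, the one-string-per-sheet layout, the 75% threshold taken from the comment, and the sample data); note also that ranking respondents with a possibly faulty key is itself an approximation.

```python
from collections import Counter


def score(sheet, key):
    """Number of answers on a sheet that match the key."""
    return sum(1 for given, keyed in zip(sheet, key) if given == keyed)


def answer_distribution(sheets, q):
    """Percentage of sheets choosing each answer for question q."""
    counts = Counter(sheet[q] for sheet in sheets)
    return {choice: 100.0 * n / len(sheets) for choice, n in counts.items()}


def top_scorer_disagreements(sheets, key, top_fraction=0.10, threshold=75.0):
    """Flag questions where at least `threshold` percent of the top scorers
    chose an answer other than the keyed one."""
    ranked = sorted(sheets, key=lambda s: score(s, key), reverse=True)
    top = ranked[:max(1, int(len(ranked) * top_fraction))]
    flagged = []
    for q, keyed in enumerate(key):
        for choice, pct in answer_distribution(top, q).items():
            if choice != keyed and pct >= threshold:
                flagged.append((q, keyed, choice, pct))
    return flagged


# Hypothetical data; the tiny sample needs a larger top fraction than 10%.
sheets = ["ABCA", "ABCA", "ABDA", "CDDB"]
key = "ABCD"

print(answer_distribution(sheets, 0))
print(top_scorer_disagreements(sheets, key, top_fraction=0.5))
```

    With real data the default 10% cut would apply; the published distribution alone would let outsiders spot questions whose pattern looks wrong without seeing the questions themselves.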
  • Excellent troll. The best that I have seen in a long time...
  • While I don't think I'm yet quite old enough to not want to deal with tests (I'm only 28!) I do remember some poorly written tests in grad school. So I made every multiple guess test a short answer type test. Luckily, my classes were small enough that my profs would look at this.

    My first child is due to be born any day now. Idiocy like this might make him home-schooled (much as I dread that). My wife largely lost her teaching job due to not teaching to the test (PG County MD if that means anything) and now works at a private company teaching kids to write and do math that they don't learn at the regular schools (too busy teaching to the test). But, to bring up home-schooling again, she's also teaching reading, writing, math to kids who have parents who are totally unqualified to home school anything, yet think they are.

    I need to find a Jedi and just let my kid become a padawan...

  • You've never programmed assembler, have you?? Writing the ASCII characters in hex is not easy in the beginning :-)

    Which of these platforms do you know:
    -BOPS 2040

    I found out that "Hello world" (or the hex variant 0xc001dea1 or 0xc001babe) is quite a valuable program. If that works you know your buses are working and the core is running.

    I'm still young so I still have a lot of architectures in front of me...

  • No. Even if you have source code, and recompile the source, then you do not know what the object code being run will be. See Reflections on Trusting Trust, Ken Thompson, Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763. [acm.org]
  • Adaptive testing uses a lot of calculations to figure out what ya know and what ya don't know. Instead of the 200 questions, ya might get 20 (or less on some of the new standardized ones) that you are free to take as long as you'd like.

    I think we should be asking bigger questions. Is this form of testing worthwhile? To my mind, a heck of a lot of these are simply tests of how good you are at taking exams. If you happen to be the sort of person who can memorize large amounts of information and retrieve it in the time period you'll do well. Reducing the number of questions asked makes it more and more an instance of luck instead of skill. If the 20 questions asked happened to be ones you know, then you score brilliantly. On the other hand, if they happen to be questions you don't know, then you're stuck. This is acceptable on "Who Wants to Be a Millionaire", when nothing of consequence is at stake, but it's not acceptable in things which will affect a person's whole life. Unfortunately, those exams which are useful at telling the true skills of a student, e.g. open book essay questions, are expensive to take & grade, and therefore not the sort which are being forced upon us.

  • But harder is not absolute. It depends on your knowledge mapping to the question. If the subject was geography, and they asked you which river runs through Paris, and you happened to take your last vacation in Paris, you'd have a good chance of getting the right answer. Does this mean that you are good at geography? No, if they'd asked you the same question about Glasgow, you'd have had no clue, so the Paris question is for you, much easier than the Glasgow one. For someone who happened to go to Scotland on vacation, the opposite would be true. With 20 questions or less, then it would not take a lot of co-incidence for the questions to happen to map really well, or really badly, onto your knowledge.
  • Consider, please, that all of these errors are from someone who claims that his state has had these tests for "many years". I think that this bodes ill for the value of these tests to the educational system.
    I've got some good news and some bad news.

    The bad news is that instituting standardized testing has caused a 20% decrease in the quality of education that students have been receiving.

    The good news is that we've been able to measure that drop very accurately.

  • People advocating the use of these tests to determine the lives of millions of students seem to be studiously ignoring the fact that the companies providing these tests all but ship them with cover letters saying "PLEASE don't do that!"

    If you read the prequel article, some testing executives have justified providing tests into such an environment with the well known refrain that "If I didn't do it, someone else would". Despite such an attitude, some of them have still refused to bid on some of the projects because they were just SO indefensible.
    Unfortunately, the larger the contract, the less likely that a company capable of handling it will walk away because it's being used wrong. When President Bush offers a half-billion dollar RFQ my guess is that company executives would be under serious pressure from "the shareholders" to not stand up and say

    "This is insane!"

  • It can happen. In second year university, my assembly language course was a required course. I'd already been doing lots of assembly work, and my TA knew me because I'd helped him a number of times. When he walked in to the first class and saw me sitting at a desk, he exclaimed

    "What the hell are You doing here!?"

    I chose my section of class because the professor in question was used to a different processor (Intel) than the one I was used to (IBM 370), which was also the one the course was based on (this was the early '80s). Both my professor and my TA used me as an in-class resource.

    Now, it's not that these people weren't good. They were good (especially the professor). It's just that I'd already gotten a lot of experience in that area on my own. I guess that another way of putting it would be that I was the weird one, not them. In an environment where there weren't such hardwired rules as to who could teach what, I probably could have ended up teaching (or at least TAing) the course.

  • True, but open-source software is usually associated with other forms of openness - developers mailing lists and so forth. Had general principles of openness been applied here, the scoring problem would not have been allowed to affect test results. As the problem was known about anyway, it should not have been allowed to affect the results in the first place.

    In this instance, the use of a central location did not ensure that the ground was even. Maybe open source is not the solution, but more openness in general would at least have made the problem more inexcusable, and maybe would have prevented it from affecting people's test scores. A problem was found, and no one was told about it, and it was allowed to do harm.
  • by ahunter ( 48990 ) on Monday May 21, 2001 @12:01PM (#208036)
    Sheesh, read the article. The very first paragraph states that someone outside the company had found out about the problem and notified the company, who promptly sat on it until it went bad.

    A school district might not be able to justify the money to check a system, but I suspect it could not justify using a system with known errors and would have an interest in getting it fixed.

  • I think I took the same test about a year ago. The funny thing was, I was going for a server-side java contract and the test asked, nearly, all AWT/Swing questions. Plus, some of the questions were so ambiguous as to be meaningless. I must have passed because I got the contract. But I'm not sure what anyone can tell about me from the results of that test.

    Hint to people that take these tests: if you are doing it from home like myself, keep the two volume Core Java books handy. I answered half the questions by looking the answers up while I was taking the test.

  • Perhaps the most brazen of this behavior was the administrator's decision not to speak out about his concern because of the fear he had about his reputation.

    From the article: But as a national spokesman for the movement toward standardized assessment, Dr. Crew decided his credibility would be lost. He thought he would be seen as a crybaby.

    Yes, it's true he didn't speak up because he feared for his credibility, but in fact, he had made that decision because he accepted the results that the testing company were pushing at him. He was actually doing the right thing and accepting the responsibility that he had thrust upon himself. The testing company was giving him data that showed that kids weren't working up to the levels he wanted them to be working at. He had no other facts to negate the results, so he accepted responsibility and therefore lost his job. I see that as a very honorable thing and worthy of any good leader. NOW, on the other hand, the leadership of the testing company is another matter, and I would agree with everything you state, as long as it is geared towards them and not towards the school administrator.


  • by TomL ( 63825 ) on Monday May 21, 2001 @11:52AM (#208039) Homepage
    i always thought that errors in standardized test scoring put me in a talented and gifted class (i got kicked out of it after two years due to bad academic performance, thank god, heh).
  • While I agree with the idea that someone should be made responsible for this, exactly who should shoulder what part of the blame?

    On the one hand, we have the testing agency. They had problems with their software, lax quality control, and a PHB who withheld information from the schools. They scored the tests incorrectly (with regards to rankings and previous years), and as a result a lot of people got fired and a lot of kids spent a summer in school needlessly.

    On the other hand, we have the NYC school board. They made the decision to make the standardized test the end-all, be-all, despite the testing company's recommendations to the contrary. Even if the tests were scored and ranked properly, you have the opportunity for kids who test badly to end up in summer school.

    I think that it was "right" for Crew to lose his job, but not for the reasons that were given. Whether there were more people in the school system that needed to share the culpability will probably never be known. However I think that the testing company got off far too easily. And unfortunately, if any of the faculty in NYC that were fired because of the test results tried to sue them, they'd likely weasel out of it by saying "We didn't tell them to use this as a yardstick!"


  • Please keep in mind that I'm not trying to shift blame either way.

    I work for one of these companies. So just for some "inside info" here's the lowdown.

    As far as the QA process goes, it is pretty good. In fact, many states QA after we QA to double check. The fact is that things are missed. You wouldn't believe the data that comes through the systems, and while everything is supposed to be accounted for, it sometimes isn't. I have to say that the testing process is extensive. Oftentimes, though, the customer demands that the final product be delivered even when we know that the product has not been tested.

    As far as Open Source code, to the best of my knowledge there is no problem with a customer looking at our code. Do they have the monetary resources to hire somebody to look at it? Will that person interpret the specifications correctly? No they don't have the money and it takes a while to learn the process correctly.

    I have never scored an essay in my life but I can see where it would get terribly monotonous. Add on top of that supervisors pushing supervisors pushing little people and the monotonous job becomes a fast-paced, let's-get-it-out-the-door deal.

    I do have a problem with schools using these tests as a high stakes test where students graduate/not graduate or similar repercussions. Just as with anything created by man, there is a possibility of error. Just think of how many various things have been recalled. Heck, yesterday I saw a recall at a store for a Halloween costume because it was flammable. It's May, folks. Halloween was October. But stuff like this happens, be it in consumer goods, software, or whatever.

    I do think the tests are worthwhile (mildly) in comparison to other students, and to see if the student has improved. I do not think they should be used to decide a student's fate by any means.

    I dunno... I don't feel like typing anymore 'cause I could go on and on... anyhow just in case you weren't aware these opinions are my own and not the opinion of the company I work for. If you have any questions feel free to ask them and I'll try to answer them (in my opinion and with the information I have without giving away "those things you're not supposed to when you work for a company").
  • One might also need the data -- their collection of previous raw test scores and so forth, which the system used to determine relative difficulty of exams from year to year. Just the source code might not be as useful if the bugs were subtle and unusual.
  • I see it now, the company claiming their EULA protects them from anything their software might or might not do. The school system saying it is the company's fault and everybody in court. The students who may lose scholarships, parents who now have to pay because of it, teachers and staff that lost their positions, and of course the townspeople of any school district that is somehow in a legal battle.

    In the abstract it falls under the "okay we need to do this better next time.." however, this had a real impact on real people's lives. How many teachers would have a hard time getting hired AND be able to prove it? Like a claim of "improper behavior," this is not going away even if the screw up is completely resolved. The next school will wonder, are they a screw up or were they screwed? Better to fall on the side of "screw up"; at least I won't be held responsible later.

  • This story reminds me of the Therac-25 radiation therapy machine accidents (do a Google search). Of course, instead of getting test scores mixed up, people died.

    Poor QA and safety analysis. Trust in a product that was supposed to be foolproof. Lengthy process to fix the problem and recover damages.

    Frankly I think the problem is more that the company behavior was unethical and immoral. An all too common problem when dealing with money and people.
  • er, make that Obscure Store [obscurestore.com]
  • Boy takes two years of special ed after getting bum I.Q. results [ottawacitizen.com]

    "Mr. Kumpula hired a psychologist to review the original test results and found the boy's IQ was significantly higher than calculated. On a revised version of the same IQ test, the boy scored 99, which is within the average range of intelligence. Normal intelligence is between 85 and 110, plus or minus five, the statement of claim said.

    "Edmonton lawyer Scott Schlosser, who is acting for the Kumpula family, said the boy has subsequently scored 113, which is considered high average. "

    "The board has denied the allegations, none of which have so far been proved."

    (found this via the excellent Obsucre Store [obsucrestore.com])

  • Actually, that raises an interesting point which might apply to other areas like Carnivore. Is it possible to construct a computer which is trusted by two adversarial parties? In other words, the computer can prove to everyone that it is running the same code it claims to be running.
    This could easily be done with the assistance of a trusted party, which would modify the kernel. Doing it fully 'peer to peer' sounds hard, though.
  • I haven't thought this position through, but I increasingly believe that the 'closed answers' regime is a deeply unfair one, relying as it does on 'security through obscurity.' It creates a small minority of test-takers who know all the (previously asked) questions and answers. I think that this group exists for every sizeable test of this type.
    I wonder if this could be solved with parametrically generated questions - the question itself is merely a framework or script which when run (on a per-test-taker basis) creates a unique question instance.
    Under this scheme, the parametric templates would be released to the students, and would in fact constitute the definitive curriculum.
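
    One way to picture the parametric idea is the sketch below. The rectangle-area template, the function names, and the use of a per-test-taker seed are all illustrative assumptions, not something described in this thread or used by any real testing company.

```python
import random


def make_question(seed):
    """Build one instance of a hypothetical 'rectangle area' template.

    The template itself could be published; only the seed (and therefore
    the concrete numbers) differs from one test taker to the next.
    """
    rng = random.Random(seed)  # same seed always yields the same instance
    width = rng.randint(2, 20)
    height = rng.randint(2, 20)
    text = (
        f"A rectangle is {width} cm wide and {height} cm tall. "
        "What is its area in square cm?"
    )
    return {"text": text, "answer": width * height}


def grade(seed, response):
    """Regenerate the instance from the seed and check the response."""
    return make_question(seed)["answer"] == response


if __name__ == "__main__":
    # Hypothetical test-taker IDs used directly as seeds.
    for taker_id in (1001, 1002):
        print(taker_id, make_question(taker_id)["text"])
```

    Because the grader regenerates each instance from the seed, no stored answer key exists to leak; knowing last year's numbers does not help, only knowing how to solve the template does.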
  • Just because the scanning has been centralized and industrialized on a massive scale doesn't mean it should be. Instead of shipping tons of paper to a central location to put it through a very expensive, fast machine, why not scan it locally through a cheap scanner? Then a computer can upload the data to a central server.
    Another approach would be to use fax machines calling a tally server, since sheet-feeding fax machines are more common than sheet-feeding scanners.

    To put it another way, the power of the large scanner has become a self-fulfilling prophecy - we aggregate because we have large scanners, and need large scanners because we aggregate. But with the internet, physical aggregation is not needed for data aggregation.
  • Once the test are out of the way the kids spend three weeks until the end of the year loafing in class as the teachers have no reason to give them a final, they already had it and passed their SOL.Just and example of how the schools are warping to fit around the SOLs , soon they will be the official final

    There are models where this works better, and so there is a baby/bathwater issue here. One methodology is to collect the results within one school for students over a period of time (say two years) and use these results, gathered by whatever means you deem fit (a mix of subjective and objective analysis worked for me), to rank and "spread" the students within the school. THEN use the school's overall results in some form of standardized "statewide" test to determine where that school fits in the overall scheme of things. One can then present two results: 1) the standardised assessment score and 2) the public examination score. Anyone looking at the results can then examine the discrepancy (where one exists), and it is remarkable how little discrepancy there is.

    Standardisation of scores has benefits even when one values a more "expansive" approach to the education process. In the long run, how much does it really matter anyway, we all know that the results you get in the end of your schooling mean nothing, right?

    BTW the methodology outlined above is that used in the NSW (In Australia) school system for final assessment of year 12 students (17-18 year olds). Approx 65,000 per annum.

  • by 11thangel ( 103409 ) on Monday May 21, 2001 @11:38AM (#208051) Homepage
    Could the same bug have resulted in my Computer Programming teacher being hired? I still find it quite odd that she is teaching a top level programming class, yet doesn't understand what a function is. I still recall trying to explain the word "filesystem" to her. Hopefully, the same bug will assist my English grade (hey, it couldn't get worse).
  • In my view, this underscores the need for better auditing of the process. If you're going to use a piece of software for anything this important, then you need to be certain that it will work correctly. IMHO, the publisher was very nearly criminally negligent on this issue.

    'Course, I also think that this is yet another good example of the need for open source software, but that's me.
  • Regurgitating facts doesn't mean you can do analytical thinking well, doesn't mean you have the skills to form your own opinions about confusing issues, doesn't mean you can write to impress people, and doesn't mean that you can engage in creative thinking and/or problem solving. Furthermore, "facts" are only as good as the source. Are we teaching kids to just believe anything that is presented as "fact"?

    My friend is becoming a high school teacher. She says it's hard to get them to do ANYTHING but worksheets where they regurgitate answers anyway, not because they are lazy (which they certainly can be) but because they've been beaten down all these years to expect that's what you do in school, not actually LEARN things!

    What's the FIRST and most immediate thing you do in your family when you're running out of money? Get an additional job or try to reduce your spending? Now why is it that Cheney and his henchmonkey Dubya would have us work on SUPPLY before DEMAND, and then call it an American way of life? - my own political rant
  • Some of my favourite memories of high school and university were spotting errors in exam questions. The best one was in a Smalltalk exam where they gave us half a page of code, then spent several pages asking what outputs would be generated based on various inputs. I spotted that there was a period missing (same effect as a missing semicolon in most other languages). So I simply wrote "parse error, line 18" as an answer to each question.

    The problem was that we never got to examine the marked exams, so to this day I don't know if a clueless droid marked me as wrong, or had a good laugh and granted me the marks.

  • You're overlooking the fact that schools also have thousands of parents, and some of those parents would have had the technical sophistication to find such a bug and fix it, free of charge, if the source had been open. At the very least, they could have confirmed that something fishy was indeed going on. And it would have taken a hell of a lot less time than 8 months.

  • Do you have any idea the amount of time/money those students who had to take summer school lost? Add all of those together and this becomes a lot more serious. I guess that the fact that several high-quality administrators who were getting results (including one of the better chancellors of NYC schools in recent history) were fired because the tests didn't show the results they were getting is somewhat less serious. Yeesh.
  • I say that this is just symptomatic of a much bigger problem in the first place: Computer systems not having the proper amount of human oversight.

    Credit bureaus rely on their use of the computing systems for pretty much everything, and look how hard it is to get any error fixed at all.

    This could just as easily have been a private prison company (Which most all prisons in CA are) accidentally sending your traffic-ticket offender to a high security felon bin for 20 years instead of a 6 month stint for not paying their bills.
  • Proper scoring software should detect a bad answer key.

    A basic concept of standardized test design is that each question should have some degree of predictive power for the final result. A bad question or a wrong answer key should make that question show up as a terrible predictor. Test scoring software should be checking this.

    Informally, one way to look at this is to look at the exams for students who score high and see what questions many of them seemed to get wrong. Questions consistently missed by high scorers are in some way defective. This is basic test quality control.
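
    The informal check described above can be sketched roughly as follows. The function name, the top-quarter cutoff, and the 50% threshold are assumptions chosen for illustration; real item analysis uses more careful statistics.

```python
def flag_suspect_items(responses, key, top_fraction=0.25, min_correct=0.5):
    """Return indexes of questions that high scorers mostly 'missed'.

    responses: one list of answers per student
    key: the answer key being checked (possibly wrong)
    A question that most top scorers get 'wrong' is a candidate for a
    bad answer key or a defective question.
    """
    # Total score for each student under the key being checked.
    totals = [sum(r == k for r, k in zip(resp, key)) for resp in responses]

    # Pick the highest-scoring group of students.
    ranked = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
    top = ranked[: max(1, int(len(responses) * top_fraction))]

    suspects = []
    for q, correct in enumerate(key):
        hit_rate = sum(responses[i][q] == correct for i in top) / len(top)
        if hit_rate < min_correct:
            suspects.append(q)
    return suspects
```

    A flagged question is only a prompt for a human to re-check the key; a genuinely hard question can trip the same threshold.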

    New York State regulates tests by law. There's a "Truth in testing" [state.ny.us] law. Test-takers have the right to see their questions, answers, and the correct answers after the test. So something like this should have been noticed.

  • Given the fact that I count three spelling mistakes in the title and first paragraph of your post, let alone the rest, perhaps these tests are not such a bad idea....
  • One reason some school districts like the Terra Nova-based tests is that they are pretty dumbed down.
    An acquaintance of ours had a highly gifted kid that they were trying to get a grade skip for. She got a shockingly poor (for her) score of 70th percentile in one section on one of the Terra Nova tests, and this was being used by the school district as an argument that she wasn't advanced enough for a skip. After getting the raw scores (which there was a lot of resistance to giving, but our acquaintance is...tenacious), they found out that she had gotten every question in the section correct. But so had 30% of the students taking it, so that was a 70th percentile. On some of the other standardized tests, the report has a special notation if you "topped out" the test; the Terra Nova test didn't show that. But most shocking was that 30% of the students got every question in the section right. That indicates an awfully dumbed down test.
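
    The arithmetic behind that 70th percentile can be sketched like this. It assumes the percentile is "percent of test takers who scored strictly lower", which is one common convention; the thread does not say which convention Terra Nova actually uses, and the score values are made up.

```python
def percentile_rank(scores, score):
    """Percent of scores strictly below `score` (one common convention)."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)


def topped_out(score, max_score):
    """True when the test could not have measured the student any higher."""
    return score >= max_score


if __name__ == "__main__":
    MAX_SCORE = 20  # hypothetical number of questions in the section
    # 30 of 100 students get everything right; the rest score lower.
    scores = [MAX_SCORE] * 30 + [15] * 70
    rank = percentile_rank(scores, MAX_SCORE)
    note = " (topped out)" if topped_out(MAX_SCORE, MAX_SCORE) else ""
    print(f"{rank:.0f}th percentile{note}")
```

    With 30 perfect papers out of 100, a perfect paper lands at the 70th percentile; the "topped out" flag is the extra notation the other tests reportedly print.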
  • Did you even read the article? It has to do with systemic problems in the testing system. More and more tests are being asked for, with each being more and more comprehensive and tailored to the state's own criteria. This has nothing to do with what kind of software they are running. It has to do with them following their own procedures for quality assurance. It has to do with under-trained people not adhering to standards of grading. It has to do with schools placing the importance of those tests far higher than they should be, in life-altering decisions. What does that have to do with what kind of software they use?
  • ...but some companies handle crisis differently.

    NCS Pearson, McGraw-Hill's largest competitor, screwed up big time on standardized tests for the state of Minnesota. But they handled it a lot better [presentations.com].
  • This is probably a really old problem--I experienced it way back in 1990, when I was taught programming by the typing teacher. At that time, rumor had it that the class wasn't going to be taught again, because new regulations required that someone teaching a programming class must be certified with a master's degree. There are very few master's-degreed computer geeks who would be willing to work for $17,000, not to mention going back to a place where most of them were ridiculed and mocked. To top it off, to teach high school, you also usually need to be certified, which eliminated the possibility of hiring "adjunct" faculty like they do at colleges.

    My question is, which would you prefer: a lame computer class that provides some structure and a potential for learning new things, or no class at all? In today's public school system, those appear to be the choices. The truth is, nobody really learns how to program in a class anyway--they learn how to program by programming, often as a requirement for getting a grade in a class, but frequently not.

  • by faust2097 ( 137829 ) on Monday May 21, 2001 @11:44AM (#208064)
    This has a lot less to do with closed source than it does with quality control in general. Just because something's OSS doesn't mean anyone else has actually looked at the code. School districts aren't known for having budgets for consultants to check it out...
  • If you look at the clickwrap and the UCITA, these damages from known bugs are limited.

    Think of the graduating senior who now has to go to summer school instead of working for tuition money. This could delay their degree by a year. All caused by a known bug!

    Should the software company be held harmless?

  • It's a fascinating story of the risks of going with a closed source vendor

    Sorry, but no, it's not. All software has bugs. Open Source or Closed Source.

    "Whine whine, but if it was open source, they'd have found the bug and fixed it!"

    Uh huh. Someone tell me again how long that root compromise was in BIND before it was discovered?

    Is someone going to tell me with a straight face that there's a community of developers just salivating to pore over the source code to standardized testing software? The benefits of Open Source become less prevalent as the number of developers with an interest in the code goes down.
  • OK, I work for a Prometric testing center. I am also a teacher. What I noticed with the Scantron sheets (this was the first and last time I gave one) is that when I glanced over all of the tests, I had at least a 35% error rate on them. Mind you, the machines the testing companies use to score the tests are a heck of a lot higher quality. (I finally ended up hand-scoring the things.) Working at Prometric, I can't speak to the actual scoring, but computerized testing is nice when the hardware works (as with anything, it has its share of glitches). And it has one major major major major major advantage: most schools and testing agencies have to deal with the problem of scanning tests into the computer, and computerized testing doesn't, because the answers are already in the computer. I also know that as a teacher, if I screwed up on a test and graded it incorrectly, my *** would be hung out to dry, and I'd have to answer to quite a few angry parents. That is partly because the kids get a chance to review the questions. Herein lies the problem. I know for a fact that some of the testing agencies value their questions at more than $600.00 per question. Heck, I had to sign I don't know how many waivers just to take a computerized GRE, and then I had to take the thing at a specific time of the month, when ETS was apparently going to rotate their questions again. This is the main reason why that parent had to threaten a lawsuit to view the questions on the test. He was lucky that they let him see them at all, as this was not the only time that weird things have happened with standardized tests. If you check out the book about Escalante, he had a heck of a battle on his hands with ETS's AP Calc test. At least those kids got a second shot at the test.....
  • Heheh, I remember stories of the OAC (grade 13) computer programming teacher at my high school. Apparently, she was hired 'cause of prior work at Microsoft.... as a secretary. Or so the rumour ran. The course was in VB. She was given a nice, easy-to-understand course outline and manuals. Still, when it came time to work, she was clueless as could be. My favourite example was when a student asked how to work with "random files" - being a file type in VB. She replied that you should think of dice, how they can be any number and you don't know what. Of course, random files referred to random access, as in non-sequential, not random behavior, in that you could access any line of the file by its line number instead of reading the whole file into memory, like a primitive database. This was an entire unit of the course, and she didn't even have the faintest inkling of what it was about. Most people just pasted a few widgets together and didn't even bother coding the back end, and she passed them 'cause it looked right.
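
    For anyone who never met the concept, random access in this sense looks roughly like the sketch below. It is written in Python rather than the VB of the course, and the fixed 16-byte record size, the zero-based record numbers, and the function names are all assumptions made for illustration.

```python
import os
import tempfile

RECORD_SIZE = 16  # every record occupies exactly this many bytes


def write_records(path, names):
    """Write each name as one fixed-length, space-padded record."""
    with open(path, "wb") as f:
        for name in names:
            data = name.encode("ascii")[:RECORD_SIZE]
            f.write(data.ljust(RECORD_SIZE, b" "))


def read_record(path, index):
    """Jump straight to record `index` without reading the ones before it."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)  # byte offset = record number * size
        return f.read(RECORD_SIZE).decode("ascii").rstrip()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "students.dat")
    write_records(path, ["alice", "bob", "carol"])
    print(read_record(path, 2))
```

    Nothing about it involves dice: because every record has the same length, the position of record N can be computed directly.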
  • free as in "time"

    Nope... this would fall under the "Free as in beer" criteria. Why? 1: "Time is money." 2: "Money buys beer." So, "Free as in time", and "Free as in Money" equate to "Free as in beer."
  • by Frymaster ( 171343 ) on Monday May 21, 2001 @11:49AM (#208070) Homepage Journal
    really, the problem here is that
    a) bad software was written
    b) it was closed source.
    an open source solution would only address this problem if the purchasers (ie the school administration) actually sat down and audited the code. not very likely, imho. not that i'm dissing open source... far be it from me to do that, but oss is only as good as the people willing to audit it. being in the process of writing an online test generator [sourceforge.net] i can tell you that teachers and admins look at oss the same way they do proprietary software... the only difference is that it's good for their budget and doesn't come in a box...

    nuff said.

  • by Phronesis ( 175966 ) on Monday May 21, 2001 @11:53AM (#208071)
    If you read part 1 of the NYT story, you find out that many more students were penalized by incorrect answer keys than by computer errors. Thus, if we are to trumpet open-source as the appropriate way to deal with risks of errors in the algorithm for normalizing percentiles for test difficulty, then do we also conclude that all answers must be revealed so that erroneous answer keys can be caught?

    Perhaps. I always give my students answer keys after I give them a test. But for big tests such as are described in the story, I would worry that having to change all the questions several times every year would introduce so much opportunity for poorly worded, misleading, or fundamentally flawed questions that this risk would outweigh the current risk of incorrect answer keys.

    There is also the cost question. If companies had to rewrite the whole test every time they administered it (because they would publish the answers), then the costs would rise sharply and the tests would become less affordable to struggling school districts (or we would see large tax increases to pay for the tests). The benefits of increased test accuracy might not justify the cost to the taxpayers.

    None of these considerations apply to the case of the software used to grade the tests. There seems to be little risk that understanding the way the grades are curved would enable cheats or other gamesmanship. Since this is not networked software, the potential for attacking it from another computer is small (social engineering is still a risk, though, but no more so for open source than for closed-source software).

    Thus, although it would be tempting to put "Open Answer" on par with "Open Source," the first seems impractical and the latter seems well suited to the problem at hand.

  • yes, it's not just errors in closed-source grading that's sending some kids to summer school. have you considered the impact of closed-source testing? if we open-sourced the exams themselves, that would eliminate more errors than any other measure! not to mention the freedom it would engender: free as in "time".
  • "Open Answer" would certainly be a big, very expensive, change. These tests take years to develop and the testing companies are already overworked. In the second part of the article (in today's paper edition, not sure if it's online), they talked about how the testing companies are very busy with the current volume of work, which is only going to increase as more states see standardized testing as a panacea. The good news for me is that, since my wife is a sub-sub-contractor for the testing companies and writes test questions, I think it would be great for her income!
  • by Fros1y ( 179059 ) on Monday May 21, 2001 @01:18PM (#208074)

    What I found so horrifying about this situation was the response of the people responsible for the NYC catastrophe. It would seem that those in power were far too concerned with the politics of their jobs to even think about the children and subordinates that the test was slowly crushing. Perhaps the most brazen of this behavior was the administrator's decision not to speak out about his concern because of the fear he had about his reputation.

    If we are ever to have upstanding and capable students who have not only logical thought but also ethical beliefs about participation in society, it is precisely this sort of leadership that we can do without. I'm very sorry that the tragedy of these tests hurt so many children and so many competent superintendents, but was this really one of them?

  • You seem to be arguing that adaptive testing is entirely about collecting a database of responses. However, I'm pretty sure the article directly states that a programming error was made:

    As it turned out, CTB despite its assurances to Indiana and others had done an incomplete job of reviewing test data. When a much larger sample was reviewed, a programming error surfaced.

    So yes, bad software was written. Furthermore, since the error affected some but not all schools, I would assume that it's not a global problem with CTB's adaptive question database (i.e. having bad numbers for a few questions, which would have thrown every school off compared to prior years), but rather with some other part of the scoring software. Some components of CTB's software might potentially be Open-Sourced, in which case those people who were affected by CTB's error might have looked for the mistake (or more likely hired someone to look for the mistake) in the code. As long as this code could be released without giving away information on the test questions (a valid concern, of course), then someone would have a better chance of catching a company like CTB making an error. It would be one less opportunity for a proprietary process to proceed (correctly or incorrectly) without external review.

    So yes, Open-Sourcing might have helped here, if done properly.

  • Well, I will definatly watch my spelling on this reply!

    I don't recomend any testing, I recomend more dynamic funding (note I did not say MORE MONEY, just better directed were it goes). Vouchers look like a good route to me, you fill your school with know nothing teachers and inept administrators and the people will leave and take their money with them. I feel that a lot of the poor performance on the part of the school system is that they are a monopoly that must be payed. Even if you put your child in a private school or home school them you must still pay the taxes that fund the school system.

    As long as administrators and teachers get a free ride then they are not going to want to make things better. The only thing that is going to help the kids is more directed funding allowing a school to offer better classes with smaller groups better sorted by ability. It is just as much of an injustice to force a smart student to attend a remedial class as to force a not so bright student to take advanced classes. Both will start to fail. The smart one out of boredom and the less bright one out of lack of ability.

    More money won't help the problem, the best funded public schools in the nation are amongst the poorest performers. Until their is compition in the market place we will continue to see this lack of education.

    To break this down to it's points : Vouchers good, Testing Bad!
  • by Papa Legba ( 192550 ) on Monday May 21, 2001 @11:57AM (#208077)
    This is going to just get worse for kids. With press. Bush pushing to make standerdized testing nation wide this will become more and more common of an occurance.
    I live in Virginia, a state that implimented Standereds Of Learning tests (SOLs) years ago, The absolute paranoia that surrounds a test that "was just going to be used for monitorring purposes" is astounding. The schools have stopped pretending that they are teaching knowledge and instead spend all of their class time cramming facts down the students throats so they can pass the trivia quiz of the SOLs.
    It has gotten so bad that a local city has asked to extend the school year for kids just to prepare for these test. Once the test are out of the way the kids spend three weeks until the end of the year loafing in class as the teachers have no reason to give them a final, they already had it and passed their SOL.Just and example of how the schools are warping to fit around the SOLs , soon they will be the official final. This is the only outcome you can expect when a teacher and administrators depends on their raises based on how their school district does on the SOLs and their jobs depend on how well their own classes do on these tests. School should teach kids how to think and solve problems, not how to regurgitate facts at the drop of a hat, facts that can be easily found in a book or on the web if you were not sitting in a proctored testing room.
    This was a great idea to start but it is getting out of control, just like drug testing in the 80's early 90's. Seemed like a good idea until fly by night testing labs started turning in false positives by the truckload ruining people and their carrers.
    Kids don't need this pressure, Teachers ,maybe, but this is not how to apply it, school adminsitrators definatly need to be held accoutnable. I do not think this is the way to do it though. Ultimatly we do get rid of the incompitents , but we also get rid of the talented teacher. Once the lesson plan is dictated from the state or nations capital the chance for real learning is lost and it just becomes a numbers game. Kids are not numbers, they are potentials and should be treated as such! When we takes steps like these to teach to the lowest common denominator, the brightest of our children are wasted, we need to stop this and start teaching smart.
  • It is interesting to consider what would have happened if the error was to give higher scores. Why would any school district complain? All the politicians, parents, administrators would be very pleased.
    If I was running CTB I'd rather have errors that pleased my customers than ones that cause them to sue. A bias towards higher scores would be a good thing for my company so long as it wasn't large enough to be obvious.
  • I can't tell if you're joking or not. But if my kid were held back or told they had to do X, Y, or Z because they failed some standardized test like this, I'd want to see a copy of what the kid got wrong... but I bet it doesn't work like that. :)
  • Do I smell a class action lawsuit?

  • I find it interesting that although there are a large number of posts slagging off "closed" source, and saying how this problem would have never got as bad with "open" source, not one single post has been able to point to a suitable piece of open source code which they could have used instead.

    CAN anyone point to an existing open source alternative that meets all the necessary criteria?

    Because if not, then I'm afraid discussion of whether or not the problem would have been as big with open source is irrelevant if there IS no such open source product to begin with.

    Yes, "they" could have written software themselves, but do you really think they had the necessary expertise and time to spend writing such a product from scratch? The answer is "no", otherwise they would have.

    Just something to consider.

  • "If you read part 1 of the NYT story, you find out that many more students were penalized by incorrect answer keys than by computer errors. Thus, if we are to trumpet open-source as the appropriate way to deal with risks of errors in the algorithm for normalizing percentiles for test difficulty, then do we also conclude that all answers must be revealed so that erroneous answer keys can be caught?"

    Hmmm. It's not like the answers are unfathomable mysteries revealed only by divine inspiration. Presumably any given answer on these tests could be easily verified by someone who actually knows the topic being tested. Just like that father in the NYT story did. But look at all the nonsense that father had to go through to even find out what the heck his own kid did wrong.

    With a machine-graded test, why not prepare two or more separate answer keys independently? Making a proper answer key can't be _that_ hard. Run the tests through once with each key, compare the bulk results, and any problems like this immediately become glaringly obvious. I think a test that determines a student's entire academic future deserves a little simple error checking.
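    For what it's worth, the two-key check could be sketched roughly like this. It is only an illustration, not anything the testing companies are known to run; the key contents, student names, and function names are all made up for the example.

```python
# Hypothetical sketch: grade every answer sheet against two independently
# prepared keys and flag anything on which the keys disagree.
from typing import Dict, List

AnswerKey = Dict[int, str]      # question number -> correct choice
AnswerSheet = Dict[int, str]    # question number -> student's choice


def score(responses: AnswerSheet, key: AnswerKey) -> int:
    """Count how many questions on the sheet match the key."""
    return sum(1 for q, answer in key.items() if responses.get(q) == answer)


def disputed_questions(key_a: AnswerKey, key_b: AnswerKey) -> List[int]:
    """Questions where the two keys give different answers (or one omits it)."""
    questions = key_a.keys() | key_b.keys()
    return sorted(q for q in questions if key_a.get(q) != key_b.get(q))


def students_affected(sheets: Dict[str, AnswerSheet],
                      key_a: AnswerKey, key_b: AnswerKey) -> List[str]:
    """Students whose raw score changes depending on which key is used."""
    return sorted(name for name, sheet in sheets.items()
                  if score(sheet, key_a) != score(sheet, key_b))


# Made-up data: the two key writers disagree on question 2.
key_a = {1: "B", 2: "D", 3: "A"}
key_b = {1: "B", 2: "C", 3: "A"}
sheets = {
    "alice": {1: "B", 2: "D", 3: "A"},
    "bob":   {1: "B", 2: "C", 3: "A"},
    "carol": {1: "B", 2: "A", 3: "A"},
}

print("Questions to re-check:", disputed_questions(key_a, key_b))
print("Students whose score depends on the key:",
      students_affected(sheets, key_a, key_b))
```

    Any question that lands in the first list needs a human who knows the subject to look at it before scores go out.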

    The larger problem is that the testing companies have no independent oversight. They should be required to place their answer key and a copy of every question into the hands of a neutral party who can check these things if there's a dispute. (That's not necessarily some random government agency, as the gov't also has a pretty dismal track record as far as customer service and full disclosure are concerned.) How about requiring the different major testing companies to hold and audit each other's tests? Competition in action!


  • I'd say this says more about the problems of standardized tests than it does about open source v closed source.

    It is astonishing how much meaning we attach to a test where you sit down for one or two hours. It might be a bad day, or you may have just crammed the night before; either way, the score won't indicate what you know. And then to make decisions about a student's career based on this exam is simply criminal. How the students actually perform in the classroom and on homework and projects should have told these administrators right away that something was wrong. No test should substitute for actually interacting with the teachers and students.

    You'd think we already knew this, but unfortunately the trend is towards ever more standardized tests. Sadly, this is a trend that our "President" is encouraging, at the expense of real progress. Unfortunately, in my state so are the Board of Regents and the Commissioner of Education.

  • is that the bug lists are public. For example, you can report Emacs bugs to bug-gnu-emacs@gnu.org, a mailing list of people who see all the bug reports. Want to see all the bug reports yourself? Write bug-gnu-emacs@gnu.org and ask to be added to the bug-gnu-emacs list. Some other free software projects, but not all of them, publish their bug info like that. A non-software example is Intel's errata list, which was secret until the media uproar over the FDIV bug. Now it's public.

    I think publishing bug lists does a lot to keep developers honest. Simply opening source isn't enough. If the bug reports for those tests were available--not necessarily to the whole public, but say, to the test administrators of school districts using those tests--that would have prevented a lot of the consequences of this problem.

  • The previous message had a typing error--I meant to say, in order to be added to the bug-gnu-emacs mailing list, write to bug-gnu-emacs-request@gnu.org. Always use the -request convention for requests to be added or removed from a list. Sorry for the slip.
  • The factors that led to this seem to revolve around closed-minded thinking: Dr. Crew refused to believe the scores could be wrong, despite a dramatic difference. Mr. Tangentt (sorry if I get the name spelled wrong here) refused to admit there might be a problem, and apparently still does.

    Customers not knowing anything about other customers also exacerbated the situation. Had the customers been able to communicate with one another, they might have discovered the commonality of what they were seeing and had it resolved much sooner.

  • The consulting company I applied to just made me take an online aptitude test concerning J2EE. First of all, I hate tests. I'm too old for anything harder than "Millionaire". But this one really sucked. Not only were some of the possible answers ambiguous, it suffered from all of the same things that email and websites do -- poor spelling, improper punctuation, and bad grammar. That is arguably not a horrible thing when writing in English; but, when writing in a programming language, especially when that language is case-sensitive, it is absolutely unacceptable. I did not answer two of the questions because of the typos in the code that would have prevented compilation (javax.swing.Jframe, for example). I won't have to take summer school or anything like that, but it might cost me a dollar or two an hour.
  • Given the fact that I count one attitude flaw in your post, I'm still referring to the original which has several salient points.
  • This is interesting, but is it really a strict comparison between closed and open source?

    This is just an example of bad business practices. In fact, I think that any customer should expect better than this. Not to mention that doing this sort of thing is a huge legal liability, and thus not done by many companies, whether out of a sense of "fair play" or just CYA.

    Just because an external firm reported the bug, it doesn't follow that only a closed source company would be able to bury/hide it. Perhaps the chances are lower that an open source product would get away with it, but it isn't a foregone conclusion that by definition, an open source product couldn't/wouldn't do the same thing.
  • You think this is bad?
    Wait till some similar process screws an election up.
    Oh, wait....
  • by vls ( 312292 ) on Monday May 21, 2001 @01:03PM (#208102)

    Testing companies such as ETS and CTB quickly gain monopoly power because of a simple, but powerful, network externality:

    As the company administers more tests, the company's database of questions and performance gets larger. Also, the company usually includes experimental questions on the same tests as calibrated questions -- giving powerful statistics on how the new questions will perform relative to older questions.

    Thus as the company administers more tests, the company gets a bigger and bigger lead over its competitors. If a school switches testing companies, it won't be able to track trends from one side of the switch to the other.

    Like other network externality markets -- think operating system -- the monopolist's proprietary edge comes not from the originality or sweat of the incumbent, but simply from the size of the adopted user base.

    But the testing market is subtly different. In testing, much of the proprietary value comes from the answers the students themselves give. Indeed, if the school districts considered these data 'proprietary' then the testing companies might have to 'buy' their monopoly position from the customer.

    But even if the school retained ownership of its own pupils' data, the testing company would still have the power in the relationship. To truly move to 'tester portability' -- and thus competition -- schools would need 1) to be able to retain the actual wording of the tests (and share that with other testing companies) and 2) to be able to insert experimental questions from other testing companies into the testing, to allow for calibration if they were to switch vendors. The only hope I see for such an utter inversion of the relationship would be if districts comprising more than 50 percent of the testing market banded together. Possibly, if the U.S. Department of Education forced a change for a federal program.

  • Two other interesting points: there was no mention of lawsuits (and you know there's gotta be some brewing somewhere), and the woman in charge of the department now works for the company that administers the SATs. If I was a college-bound senior, I'd be real worried right now.
  • by tb3 ( 313150 ) on Monday May 21, 2001 @12:24PM (#208104) Homepage
    Yes, you can file lawsuits over software bugs. AECL was sued over the software bug in the THERAC-25 that caused six incidents of injury and/or death. Here's a good write-up [vt.edu].

    If the bug causes sufficient damage or harm, and the company was negligent, then that should be grounds for a lawsuit. (Of course, IANAL, but my sister is.)

  • Where I live (Michigan), we have statewide proficiency tests that actually are corrected by our teachers. While that's nice (the teacher-graders, not the tests themselves), it's not a solution for standardized tests, where finding the results isn't as simple as #correct/#total=%ile. There's a whole cadre of statistical manipulations the raw results go through, and we only have to look at the article to see how dangerous some of them are:

    Then CTB did something that it would not do in any other state: it simply raised the comparative rankings of many Tennessee students, and lowered some others, to conform with Mr. Sanders's statistical models - even though the company could find no error to justify those changes.

    My, my, my... adjusting results to jibe well with a statistical model, are we? That's some quality data, there. That such manipulation is possible is a little chilling, but this article seems to suggest it is commonplace. School districts are making decisions based on test data that's been twisted and pulled like Silly Putty.
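    To put numbers on why #correct/#total is not a %ile: here is a rough sketch, using entirely made-up norm groups, of how the same raw score can land on very different percentile ranks depending on who it is compared against. The midpoint percentile-rank formula below is a common textbook convention that I'm assuming for illustration; it is not claimed to be CTB's actual normalization, which the article doesn't spell out.

```python
# Hypothetical sketch: raw percent correct vs. percentile rank.
from bisect import bisect_left, bisect_right
from typing import Sequence


def percent_correct(correct: int, total: int) -> float:
    """The simple teacher-graded number: share of questions answered right."""
    return 100.0 * correct / total


def percentile_rank(raw_score: int, norm_scores: Sequence[int]) -> float:
    """Percent of the norm group scoring below raw_score, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)
    equal = bisect_right(ordered, raw_score) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Made-up norm groups for a 40-question test: one easy form, one hard form.
norm_easy_form = [30, 32, 35, 36, 38, 39, 40, 40, 40, 40]
norm_hard_form = [18, 20, 22, 25, 27, 28, 30, 31, 33, 35]

raw = 30
print("Percent correct:", percent_correct(raw, 40))
print("Percentile on easy form:", percentile_rank(raw, norm_easy_form))
print("Percentile on hard form:", percentile_rank(raw, norm_hard_form))
```

    Same kid, same 30 out of 40, and the reported ranking swings wildly with the norm table. Get that table wrong and every score downstream is wrong with it.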

    Seems to me that the real reason we have standardized tests is to cast the legitimizing shadow of external validity on some fundamentally meaningless numbers so that we can claim those numbers constitute a meaningful measure. We need something quantifiable to judge our students against, after all, and if there isn't a good yardstick available we'll use a crappy one even though we know it's crap, because we certainly need something to show the parents.

    Don't get me wrong -- I strongly believe that we need to monitor our educators and drum out the bad ones. We don't tolerate bad surgeons; we shouldn't tolerate bad teachers. But articles like this make it abundantly clear that standardized tests, while appearing to offer an easy gauge of student performance, are all surface and no substance.

  • by Ryan_Terry ( 444764 ) <MessEdUp@violent s i n . com> on Monday May 21, 2001 @11:52AM (#208115) Homepage
    ...very nearly criminally negligent?

    Heh, that is my vote for understatement of the day. Do you have any idea of the amount of time/money those students who had to take summer school lost? Add all of those together and this becomes a lot more serious. I think the only thing stopping them from serious legal troubles is the fact that these were high school kids. I'm not saying it is right, but teenage Americans don't get the same rights that their adult counterparts do. If this had been a corporation that screwed up on 40,000 paychecks, you'd better believe there'd be a legal battle to remember.

  • And the students are the ones who ultimately pay the price. The administrators may get some bad press, but in the end, students are the ones who feel the crunch from this fiasco. Why? Because low scores on standardized tests mean less money and poorer teachers for that school. By the time the school board realizes the students may have done okay on the tests, it will be too late.

    Whatever happened to the days when standardized tests were graded by a machine that checked for markings on the answer sheets? What was wrong with that system? It tended to be fairly accurate.

    Just my opinion; I could be wrong.

