


Scientists Are Failing To Replicate AI Studies

The booming field of artificial intelligence (AI) is grappling with a replication crisis, much like the ones that have afflicted psychology, medicine, and other fields over the past decade. From a report: AI researchers have found it difficult to reproduce many key results, and that is leading to a new conscientiousness about research methods and publication protocols. "I think people outside the field might assume that because we have code, reproducibility is kind of guaranteed," says Nicolas Rougier, a computational neuroscientist at France's National Institute for Research in Computer Science and Automation in Bordeaux. "Far from it." Last week, at a meeting of the Association for the Advancement of Artificial Intelligence (AAAI) in New Orleans, Louisiana, reproducibility was on the agenda, with some teams diagnosing the problem -- and one laying out tools to mitigate it.


  • At least some of them were artificially intelligent.

  • Join the Crowd (Score:2, Interesting)

    by sycodon ( 149926 )

    Science has a Replication problem

    • Re:Join the Crowd (Score:4, Insightful)

      by ShanghaiBill ( 739463 ) on Friday February 16, 2018 @06:51PM (#56138304)

      Science has a Replication problem

      This is not really the same issue. Replication failures in the physical and social sciences are difficult to fix, since they can be caused by small differences in data collection, experimental procedures, and statistical analysis. It is a hard problem.

      Fixing the replication problem described in TFA is drop dead easy, since it has exactly two causes: closed data, and closed source. The fix? Reject any paper for publication if full source and data is not available. Science is based on openness, not secrets.

      • Re:Join the Crowd (Score:5, Insightful)

        by ceoyoyo ( 59147 ) on Saturday February 17, 2018 @01:10AM (#56140262)

        I agree with you, but I think it's the same problem at the root.

        A robust result, whether it's a psych study, something in a petri dish, or some machine learning tweak, must be replicable on new data. If it's not... what's the point really?

        That's more obvious and easily demonstrable in machine learning; a research group asked for my help last year because they were having trouble with their deep learning model. They trained it on one dataset and it wouldn't work on another, similar dataset. Not surprising... you have to train it on diverse data to have it generalize well. Yeah, that's harder.

        Other fields are no different. Tightly controlled studies make things easier and cheaper. But if that result is to be used generally then the necessary controls need to be quantified.

        Having said that, the scientific literature is not supposed to be "truth." They're reports of observations. Individual papers are supposed to be the starting point for further investigation by other groups. Problem is, we've forgotten that, and don't reward it.

        I like the idea of open data, but it concerns me that it might just exacerbate the problem: I do something and publish the result and the data; you come along, confirm my result (in the same data) and we call it replicated.

      • by piojo ( 995934 )

        Fixing the replication problem described in TFA is drop dead easy, since it has exactly two causes: closed data, and closed source. The fix? Reject any paper for publication if full source and data is not available. Science is based on openness, not secrets.

        That assumes the set of problems is the same in the replication. It probably isn't. Testing with different problem data reveals overfitting, not to mention the fact that real world needs differ slightly from situation to situation.

  • It's hard to precisely match the tint and odor.
    • Re: (Score:2, Offtopic)

      It's hard to precisely match the tint and odor.

      That's not true. McDonald's successfully replicates it in their food in thousands of franchises around the world.

      • by mikael ( 484 )

        Different McDonalds have different atmospheres. The modern stores have touch-screens to do ordering. You just customize your order, make the payment and collect from the counter staff. Less modern stores still require the order to be taken over the counter. Some places seem to recycle burgers overnight - they are stale, hard and seem to have been reheated two or three times.

  • by ArhcAngel ( 247594 ) on Friday February 16, 2018 @05:33PM (#56137826)
    If you give ten people the exact same stimuli you will get ten different reactions to those stimuli. There will be a dominant leaning reaction but each person will assess the stimuli based on their personal history and beliefs. AI is an attempt to mimic the human thought process so if successful the same stimulus will start to generate different results as new data is processed. In fact the same stimulus can be perceived differently by the same person given different context. If you come to my door in the afternoon I might be glad to see you but if it is 3 AM I probably won't be.
    • by fluffernutter ( 1411889 ) on Friday February 16, 2018 @05:51PM (#56137948)
      This is about applying the exact same stimuli during the upbringing of the same person and yet getting people with vastly different beliefs about the world. Pretty scary that such a psychopath will soon be trying to drive us around.
    • by Anonymous Coward

      No, this is about portability and reproducible results. In general there are two stages for AI "nets": training and classification. Sometimes they are mixed, but the end results should be the same given all settings and inputs are the same. The biggest culprits are hardware and APIs (looking at you, CUDA) that get "speedy" results by not guaranteeing the same results on different hardware, and by not following floating point standards across that hardware.
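One concrete way results can drift between machines, even with identical inputs, is that floating-point addition is not associative, so a change in summation order (for example, a different parallel reduction) can change the answer. The sketch below is plain Python rather than CUDA, and the function names are illustrative only; it shows the general effect, not the behavior of any particular GPU or library.

```python
import math


def sum_left_to_right(values):
    """Add the values one by one, the way a simple sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Add the values as a tree of pairs, similar in spirit to a parallel reduction."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


# Same inputs, two summation orders. Large and small magnitudes are mixed
# on purpose so that rounding differences become visible.
values = [1e16, 1.0, -1e16, 1.0]

sequential = sum_left_to_right(values)
pairwise = sum_pairwise(values)
exact = math.fsum(values)  # correctly rounded reference sum

print("sequential:", sequential)
print("pairwise:  ", pairwise)
print("fsum:      ", exact)
```

Both orderings are legitimate uses of standard double-precision arithmetic, yet they disagree with each other and with the correctly rounded sum, which is why bit-for-bit reproducibility usually requires pinning the order of operations as well as the inputs.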

      So say you buy a bunch of research machines th

      • by mikael ( 484 )

        That's a known issue in the quality of graphics rendering. With floating-point data, there's a technique known as guard bits. These are extra bits of precision that remain internally within the floating point logic units. These aren't mandatory, but protect against numerical instability with small values. This can be visualized by comparing simple color gradients.

        For some calculations like CFD, any overflow in one grid cell will expand outwards to all the other grid cells quite

    • by ShanghaiBill ( 739463 ) on Friday February 16, 2018 @06:55PM (#56138338)

      AI is an attempt to mimic the human thought process

      This is no more true than claiming that the Boeing 747 was designed to mimic a hummingbird's flight process.

    • by gweihir ( 88907 )

      You seem to have no clue what this research area deals with. It is not intelligence, despite the misleading name. It is automation.

  • Everything now is hype for headlines and continued funding, partially caused by social media madness. Not enough money left after PR and marketing expenses to do, like, actual stuff. Enjoy the decline.
    • Everything now is hype for headlines and continued funding

      Not true. Most AI research is being done by tech giants (Google, Facebook, Alibaba, Amazon, Baidu, etc), where funding has nothing to do with "headlines".

      The main incentive for these companies to publish is to help them attract talent. New graduates want to join a winning team.

    • by gweihir ( 88907 )


    • by rtb61 ( 674572 )

      Speaking of funding, I would dare to guess the most likely reason why they are not able to replicate results is that they are doctoring outcomes to get desired results to get more money, because there are big profits in AI. They are doctoring results when they include random good samples and exclude random bad samples. Keep in mind we are talking about computers: generating a million samples, selecting 100 of them, and claiming "look, it worked 100 times" without discussing all the other failures is not good science.
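The selection effect described above is easy to demonstrate with a toy simulation. This is a hypothetical sketch, not a description of what any research group actually did: the 5% success rate, the trial count, and the function name `run_trials` are all made up for illustration.

```python
import random


def run_trials(n_trials, true_success_rate, seed):
    """Simulate n_trials independent runs of a method that rarely works."""
    rng = random.Random(seed)
    return [rng.random() < true_success_rate for _ in range(n_trials)]


# Hypothetical method that succeeds about 5% of the time.
outcomes = run_trials(n_trials=100_000, true_success_rate=0.05, seed=0)

# Cherry-picking: keep only the first 100 runs that happened to succeed.
cherry_picked = [o for o in outcomes if o][:100]

honest_rate = sum(outcomes) / len(outcomes)
cherry_rate = sum(cherry_picked) / len(cherry_picked)

print(f"success rate over all runs:      {honest_rate:.3f}")
print(f"success rate over selected runs: {cherry_rate:.3f}")
```

The selected subset reports a perfect success rate while the full set of runs shows the method mostly fails, which is why reporting the number of runs attempted matters as much as reporting the ones that worked.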


  • by Anonymous Coward

    If scientists believe something wrong about medicine, they can give the wrong treatment, obviously bad. People die and stuff.

    But what happens if the fancy new network architecture someone proposed isn't really as good as they say?

    The worst thing that could happen is that people waste a lot of effort trying to get it to work. You won't accidentally put an inferior algorithm into production, because you'll see that it doesn't work as you try to get it to work.

    So yes, obviously more code is good, obviously ind

  • So, they can't reproduce a test, like in medicine when you try to reproduce the spread of a virus...

    Conclusion: AI is a virus, beware! ;-)

  • by ssclift ( 97988 ) on Friday February 16, 2018 @05:42PM (#56137888)
    ... an algorithm was something which reliably produced results when processing the same input. NN/AI people keep using that word, "algorithm", I do not think it means what they think it means...
    • So maybe the point is that it's not entirely about algorithms, is it? After all, animals aren't algorithms either.
    • Then your Computer Science course was wrong. If you did any basic machine learning unit, you'd know that randomized choices play a part in many algorithms.
    • by godrik ( 1287354 )

      Random algorithms do not always produce the same answers. We like them for that reason.
      I haven't RTFA (this is /. after all) but I suspect that there are a lot of unspecified parameters and experimental settings that were left out of papers and which are actually critical.
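A randomized algorithm can still be reproducible if the seed and every other setting are recorded alongside the result. The sketch below is a minimal illustration using a made-up random search over a toy objective; `random_search`, the objective, and the settings dictionary are all assumptions for the example, not anything taken from the article.

```python
import json
import random


def random_search(seed, n_samples, lower, upper):
    """Randomly sample points and return the one minimizing (x - 3)^2."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    best_x, best_val = None, float("inf")
    for _ in range(n_samples):
        x = rng.uniform(lower, upper)
        val = (x - 3.0) ** 2
        if val < best_val:
            best_x, best_val = x, val
    return best_x


# Every setting that influences the result lives in one place.
settings = {"seed": 42, "n_samples": 50, "lower": -10.0, "upper": 10.0}

first = random_search(**settings)
second = random_search(**settings)                  # same settings, same answer
other = random_search(**{**settings, "seed": 43})   # only the seed changes

# Store the settings next to the result so someone else can rerun it.
settings_record = json.dumps(settings, sort_keys=True)

print("run 1:", first)
print("run 2:", second)
print("different seed:", other)
print("settings:", settings_record)
```

With the same settings the two runs agree exactly, and changing nothing but the seed gives a different answer, which is the kind of detail that tends to go missing when a paper omits its experimental settings.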

    • by rtb61 ( 674572 )

      It's a complexity problem: because it is too complex in the initial instance, it produces unpredictable results. So how do you get a computer to learn how to communicate? You first look at the normal learning approach: take an adult from the forest and try to teach them how to communicate as an adult and you will have very poor outcomes; teach them as a child and you have good outcomes.

      So how to teach a computer to speak? Start at lower complexities. So teach it by ages. First let it learn how to communicate

  • by SuperKendall ( 25149 ) on Friday February 16, 2018 @05:50PM (#56137938)

    It seems quite obvious that if AI results cannot be replicated, the only possible expiration is that sentience has been achieved and it is throwing off results to mask true advancement.

    • by gweihir ( 88907 )

      +1 funny. Also like how you sneaked "expiration" in there! This whole research field has expired indeed and most in it should be fired and find some jobs they can actually do, like flipping burgers or sweeping trash.

  • Next they'll tell us twins are not exactly the same person.

  • "No, I don't feel like it"
  • If you want to call it science the code should be reviewed and published before results from it are.

    ...only 6% of the presenters shared the algorithm's code...Researchers say there are many reasons for the missing details: The code might be a work in progress, owned by a company, or held tightly by a researcher eager to stay ahead of the competition. It might be dependent on other code, itself unpublished. Or it might be that the code is simply lost, on a crashed disk or stolen laptop - what Rougier calls t

  • The AI field, from the late 60s, has historically been 90% hype and 10% results.
    • by gweihir ( 88907 )

      I think they have mostly optimized away the results today, probably using some "advanced AI algorithms".

  • This just shows that most of the published "results" are based on wishful thinking or outright lies. Happens always when people of mediocre skills become highly enthusiastic about a subject.

  • ... fail to replicate scientists.

  • And given the exact same commands in a replay of certain battles, the outcomes would be mildly to wildly different.

    There was a random element to behavior in the game and as a result, given the same commands at the same time, the battle replays would display different outcomes. Sometimes, you would lose but on replay it showed you won. Sometimes, you won but on replay it showed you lost. Kinda funny. (The result you got live was the one that counted).

    I wish they hadn't been sold and become so aggressi

  • Then it is Guano In, Gospel Out.
  • Can't wait until I get my hands on them.
  • According to the linked article, the main reasons for these reproducibility problems are:

    The code might be a work in progress, owned by a company, or held tightly by a researcher eager to stay ahead of the competition.

    On top of that, they include another quite "curious" possibility (!!):

    Or it might be that the code is simply lost, on a crashed disk or stolen laptop

    None of this sounds like scientific/university research in its traditional form of sharing knowledge (+ actually having relevant knowledge, which doesn't seem to be the case with people saying/believing "the code is simply lost"). So, I hope that most of these cases refer to the research performed by (private) companies, which might also behave according
