AI

Furious AI Researcher Creates Site Shaming Non-Reproducible Machine Learning Papers (thenextweb.com) 128

The Next Web tells the story of an AI researcher who discovered that the results of a machine learning research paper couldn't be reproduced, then heard similar stories on Reddit's Machine Learning forum: "Easier to compile a list of reproducible ones...," one user responded.

"Probably 50%-75% of all papers are unreproducible. It's sad, but it's true," another user wrote. "Think about it, most papers are 'optimized' to get into a conference. More often than not the authors know that a paper they're trying to get into a conference isn't very good! So they don't have to worry about reproducibility because nobody will try to reproduce them." A few other users posted links to machine learning papers they had failed to implement and voiced their frustration with code implementation not being a requirement in ML conferences.

The next day, the researcher, posting as ContributionSecure14, created "Papers Without Code," a website that aims to create a centralized list of machine learning papers that are not implementable...

Papers Without Code includes a submission page, where researchers can submit unreproducible machine learning papers along with the details of their efforts, such as how much time they spent trying to reproduce the results... If the authors do not reply in a timely fashion, the paper will be added to the list of unreproducible machine learning papers.

  • Join the club (Score:4, Interesting)

    by Ubi_NL ( 313657 ) <joris.benschop@gmaiCOUGARl.com minus cat> on Monday March 08, 2021 @01:52AM (#61134954) Journal

    I am a molecular biologist, PhD, been in the business for over 20 years now. That the vast majority of publications cannot be reproduced is unfortunately quite well known to us, and it's something you learn quickly. This includes high-impact papers in Cell and (especially?) Nature.

    So yes, it's sad, but having an experienced researcher get so wound up about it is a bit silly.

    • I guess this is the logical end result of credential inflation?
      • Re: Join the club (Score:4, Interesting)

        by Vintermann ( 400722 ) on Monday March 08, 2021 @04:39AM (#61135250) Homepage

        Not really. Credentials are just the start, they don't get you into Cell or Nature.

        Rather, it's the logical end result of Goodhart's law. Metrics, from grades to citations to whatever, get gamed. If the stakes are high enough, they get gamed so hard they drift away from the thing they were supposed to target.

      • No. Partly it's because people need to publish to keep their jobs. And partly it's that most people, especially outside science, don't realise that a paper is the first word, not the last.

        • Even people who realize that don't want that word to be a lie.

          There's no excuse for an ML result to be unreproducible. Really none. If you have the same software versions, the same code to glue them together, and the same training corpus, you should wind up with the same result.
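
          Concretely, here's a minimal sketch of what pinning that determinism down can look like in practice (this assumes a PyTorch-style setup; the seed value and flags are illustrative, not taken from any particular paper):

            import random
            import numpy as np
            import torch

            def make_deterministic(seed: int = 42) -> None:
                # Pin every source of randomness the training run touches.
                random.seed(seed)
                np.random.seed(seed)
                torch.manual_seed(seed)
                torch.cuda.manual_seed_all(seed)
                # Ask the backend for deterministic kernels where it supports them.
                torch.backends.cudnn.deterministic = True
                torch.backends.cudnn.benchmark = False
                torch.use_deterministic_algorithms(True, warn_only=True)

            make_deterministic(42)
            # ...build the model and train on the published corpus as usual...

          Even with all of that, bitwise-identical numbers across different GPU models or driver versions aren't guaranteed, which is exactly why the "same software versions" part of the claim matters.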

            • The sub-thread had got a little off the topic of ML and started covering papers in Cell and Nature. It's much harder to make things reproducible when you need specific lab equipment, reagents, and cell lines in the first place.

            • "much harder" != "can't be done"
              • "much harder" != "can't be done"

                In some cases, no. No one can in practice reproduce CERN's results, because they'd need another CERN to do so. Bits of it can be reproduced, but no one else has a particle accelerator of that energy or luminosity.

                For cellular stuff it is really, really hard, and there may be no other labs with exactly the right mix of stuff at the time of submission to reproduce the results, especially as shipping things like genetically modified cell lines is non-trivial from a legal perspective. Even m

    • This includes high-impact papers in Cell and (especially?) Nature.

      I've noticed that Nature seems rather bad as well. They're looking for flashy, not quality. It's disappointing.

    • Re:Join the club (Score:5, Insightful)

      by stephanruby ( 542433 ) on Monday March 08, 2021 @02:39AM (#61135046)

      So yes, it's sad, but having an experienced researcher get so wound up about it is a bit silly.

      Maybe that's the real underlying problem here, that not enough senior researchers are actually furious about this.

      This researcher is not only furious, but they're actually trying to do something to fix the problem system-wide. That person should be applauded for their efforts instead of having their outrage belittled.

      • What's at stake with these kinds of failures is the reputation and utility of science itself. From that, a retreat in the progress of humanity. Knocking the species down a few rungs. Don't ever think "It can't happen" or that progress is a birthright or guarantee.

        If you've gotten so far down into the weeds that you can't see that, that's a problem. If everyone's in "the club", then it's a systemic problem. Don't be in that club.

    • Re:Join the club (Score:4, Insightful)

      by Rockoon ( 1252108 ) on Monday March 08, 2021 @03:34AM (#61135134)
      The problem is that other people are familiar with other fields, where the research, publications, and peer review process aren't such complete shit.

      This phenomenon that you think is so common was once really relegated only to the "social sciences", and it is primarily a function of how many "practitioners" there are in the field and therefore how many papers get published.

      The Math guys don't fuck around when they publish because there aren't that many of them, and it will be very fucking embarrassing as more than a few will figure out where you went wrong and will not be shy about discovering that fact. You only ever get to publish complete shit once as a math guy.

      Maybe, just maybe, the discipline in your field is complete shit, and it's time to make some fucking noise about it.
      • by Mitreya ( 579078 )

        other fields, where the research, publications, and peer review process aren't such complete shit.

        Yeah, but reproducing something in Math is very different from AI (or CS in general).
        You can include a complete proof in Math, but in CS-related fields even including your code may not be enough. Particularly if you are also reporting runtime performance measurements, which are highly system-dependent.
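
        One partial mitigation (a rough sketch, not something conferences currently require): report the runtime environment alongside the timing numbers, so a reader can at least tell whether a mismatch comes from the method or from the machine. Everything below is standard-library Python; the package names being probed are just examples.

          import json
          import platform
          import sys
          import time
          from importlib.metadata import PackageNotFoundError, version

          def environment_snapshot(packages=("numpy", "torch")):
              # Record the facts a runtime comparison silently depends on.
              snap = {
                  "python": sys.version,
                  "platform": platform.platform(),
                  "machine": platform.machine(),
              }
              for pkg in packages:
                  try:
                      snap[pkg] = version(pkg)
                  except PackageNotFoundError:
                      snap[pkg] = "not installed"
              return snap

          def timed(fn, *args, **kwargs):
              # Wall-clock a callable and return (result, seconds).
              start = time.perf_counter()
              result = fn(*args, **kwargs)
              return result, time.perf_counter() - start

          # Publish the measurement together with the environment it came from.
          _, elapsed = timed(sum, range(10_000_000))
          print(json.dumps({"elapsed_s": elapsed, "env": environment_snapshot()}, indent=2))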

    • Computer scientists shouldn't put up with this crap. The level of reproduction you get by running the same code isn't an unreasonable minimum standard for machine learning.

      • Re:Join the club (Score:4, Interesting)

        by gweihir ( 88907 ) on Monday March 08, 2021 @08:20AM (#61135610)

        The problem is that you run the code on data, and often you cannot publish that data. Sure, there are people who use this to simply fake the data or the results of the training. From personal experience, quite a few even good-looking papers are simply fakes. And I have had a situation where (as a PhD student) I was unable to convince my professor that a paper was bad. All he saw was the "quality" conference and the abstract. Worse, he wanted me to go in that direction with my own research. Funnily enough, about a year later, all the authors minus the first one (the lying scum PhD student) published a retraction. And that was how my PhD took a year longer. This system is utterly and completely broken because it gives the wrong incentives, and too many people cave to the pressure.

        Now later, I had a situation where I also could not publish a rather large data-set, because it was a confidential industrial one. (This was actually industrial research for my employer.) So I made both careful benchmarks and explained in detail why things worked and what alternatives I had looked at and why the others did not work. It was a lot of work, but I managed to do first a conference paper and then a journal paper (which are relatively rare in CS). But the moral is that it is pretty hard to make a good paper when the data-set is critical and cannot be published. First, you actually have to have something that works on other data too! And that is often not a given in the low-quality ML field. (Low-quality like pretty much all "AI" research with very, very few exceptions.) And second, you have to really understand what you are doing and describe it carefully. From doing paper reviews for a long time, I conclude that many, many authors in the CS field have no real understanding of what they are trying to publish. Many also have a very limited CS understanding, both practical and theoretical.

        For an academic researcher, publishing quality research means fewer publications and less flashy ones. That may well reduce or end funding when the others publish more and flashier stuff. As I said, the whole process is utterly broken.

        • I was unable to convince my professor that a paper was bad. All he saw was the "quality" conference and the abstract.

          Frankly, you should have put "professor" in quotes too. Being that credulous is inexcusable.

          • by gweihir ( 88907 )

            I was unable to convince my professor that a paper was bad. All he saw was the "quality" conference and the abstract.

            Frankly, you should have put "professor" in quotes too. Being that credulous is inexcusable.

            Yes, probably. Unfortunately, this guy was one of the better ones.

    • Re:Join the club (Score:4, Insightful)

      by thegarbz ( 1787294 ) on Monday March 08, 2021 @08:22AM (#61135622)

      So yes, it's sad, but having an experienced researcher get so wound up about it is a bit silly.

      I think it's more sad that you and your fellow researchers have accepted this kind of mediocrity. If something is sad, why is it silly to get wound up enough to try to do something about it, even if all it does is create a shame list that demonstrates the problem to the wider field?

    • If you found that a paper could not be reproduced, did you publish that?
  • AI/ML is the modern snake oil. There is a lot of money you can extract from VCs, and C-levels are really excited about investing in it even though it hardly ever solves the problem at hand. But, yeah... FOMO.

  • Comment removed based on user account deletion
    • Mod up as funny or informative; I am out of points. Sadly, "fake it until you make it" is here to stay: with x millions of farcebook likes, it must be revolutionary. Except there's no Joe Stalin to shoot these types. Next breakthrough: AI helps perpetual motion double its efficiency.
    • And their annual "Ig Nobel" prizes. A startling amount of amazing and practical science passes through their pages. The "invisible gorilla" award-winning paper from 2004 was fascinating; the prostate massage cure for hiccups was also practical and easily reproduced, and yet it still has not become common practice, much to the discomfort of permanent hiccup sufferers around the world.

  • I say cease! I cannot live without being constantly reminded of what I bought last year.
  • Throw a coffee mug at the wall furious? Or active-shooter furious?
  • Just might work.. (Score:4, Informative)

    by mattr ( 78516 ) <mattr.telebody@com> on Monday March 08, 2021 @03:38AM (#61135142) Homepage Journal

    Interesting, this might work. The first one I clicked on (the Bayesian optimization paper) seems to have had a bunch of code added 2 days ago. They should put a link to the results on the top page of the website, though. They should also put the dodgy author's name in lights, but I suppose the idea is to avoid hassles? From what I can see, the first three results are:
    1) "Code coming soon" actually arrived after being shamed;
    2) There is code, but they couldn't get the same results, which is baffling and needs a closer look; either the original author or the poster is perhaps making a coding mistake? More eyes might resolve this. ("We asked the authors for their code and they uploaded their code but running their code under their own hyperparameters does not yield their results.");
    3) Dodgy author Yisen Wang appears to be perpetrating a scam. --> Looks good to me!

  • They're seriously giving authors just 24 hours to respond? Better not take a vacation or weekend anytime soon.
    • Even if the author responds within a month, it's not the end of the world. It's not like being listed on that site is a severe penalty.

  • Apparently most psychological studies can't be reproduced either.
    • Re: (Score:3, Insightful)

      by h33t l4x0r ( 4107715 )
      Yeah, but unlike psychology, you can just put the ML code + training data in a Docker image with clear instructions to run it. I have to deliver a demo to my clients or I don't get hired. I don't know why these academics aren't held to a similar standard. Also, I would love to have a free week to fuck around with "implementing a paper". Unfortunately, I have bills to pay.
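
      For what it's worth, the "clear instructions" part doesn't have to be elaborate. A rough sketch of a single entrypoint (hypothetical file names and checksum; the actual train-and-evaluate step is left out) that refuses to run against the wrong data is already more than many papers ship:

        import argparse
        import hashlib
        import sys
        from pathlib import Path

        # Hypothetical checksum of the exact training corpus the paper used.
        EXPECTED_SHA256 = "replace-with-the-published-checksum"

        def sha256_of(path: Path) -> str:
            # Stream the file in 1 MiB chunks so large corpora don't blow up memory.
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            return digest.hexdigest()

        def main() -> int:
            parser = argparse.ArgumentParser(description="Reproduce the paper's main result.")
            parser.add_argument("--data", type=Path, default=Path("data/train.csv"))
            args = parser.parse_args()

            if sha256_of(args.data) != EXPECTED_SHA256:
                print("Training data does not match the published corpus; aborting.", file=sys.stderr)
                return 1

            # train_and_evaluate(args.data) would live in the repo; omitted here.
            print("Data verified; run training with the pinned seed and report the metric.")
            return 0

        if __name__ == "__main__":
            sys.exit(main())

      Bake that into the image's default command and "docker run" becomes the whole reproduction recipe.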
  • "...a website that aims to create a centralized list of machine learning papers that are not implementable..."

    You're gonna need a bigger boat.

  • So... machine learning researcher discovers they are working in machine learning? Seriously, this kind of problem is why it used to be such a joke subfield within AI, till cheap GPUs meant they could do so much of their work so quickly that the poor results stopped mattering. But seriously, this has always been a problem with that branch... your results are opaque and based on statistical chance, so reproduction has always been challenging and usually comes down to 'does method XYZ seem to have similar gai
  • Your job depends on getting papers published. It does not depend on having reproducible results, nor on sharing your code or data.

    In fact, sharing code and data might be counterproductive, since it *is* a competition in many cases.

    So, you keep everything to yourself, and journal editors try to make decisions by guessing whether the proposed method *could* actually work.
