Furious AI Researcher Creates Site Shaming Non-Reproducible Machine Learning Papers (thenextweb.com) 128
The Next Web tells the story of an AI researcher who discovered that the results of a machine learning research paper couldn't be reproduced. They then heard similar stories on Reddit's Machine Learning forum:
"Easier to compile a list of reproducible ones...," one user responded.
"Probably 50%-75% of all papers are unreproducible. It's sad, but it's true," another user wrote. "Think about it, most papers are 'optimized' to get into a conference. More often than not the authors know that a paper they're trying to get into a conference isn't very good! So they don't have to worry about reproducibility because nobody will try to reproduce them." A few other users posted links to machine learning papers they had failed to implement and voiced their frustration with code implementation not being a requirement in ML conferences.
The next day, ContributionSecure14 created "Papers Without Code," a website that aims to create a centralized list of machine learning papers that are not implementable...
Papers Without Code includes a submission page, where researchers can submit unreproducible machine learning papers along with the details of their efforts, such as how much time they spent trying to reproduce the results... If the authors do not reply in a timely fashion, the paper will be added to the list of unreproducible machine learning papers.
"Probably 50%-75% of all papers are unreproducible. It's sad, but it's true," another user wrote. "Think about it, most papers are 'optimized' to get into a conference. More often than not the authors know that a paper they're trying to get into a conference isn't very good! So they don't have to worry about reproducibility because nobody will try to reproduce them." A few other users posted links to machine learning papers they had failed to implement and voiced their frustration with code implementation not being a requirement in ML conferences.
The next day, ContributionSecure14 created "Papers Without Code," a website that aims to create a centralized list of machine learning papers that are not implementable...
Papers Without Code includes a submission page, where researchers can submit unreproducible machine learning papers along with the details of their efforts, such as how much time they spent trying to reproduce the results... If the authors do not reply in a timely fashion, the paper will be added to the list of unreproducible machine learning papers.
Join the club (Score:4, Interesting)
I am a molecular biologist, PhD, been in the business for over 20 years now. The fact that the vast majority of publications cannot be reproduced is unfortunately quite well known to us, and it's something you learn quickly. This includes high-impact papers in Cell and (especially?) Nature.
So yes, it's sad, but having an experienced researcher get so wound up about it is a bit silly.
Re: Join the club (Score:3)
Re: Join the club (Score:4, Interesting)
Not really. Credentials are just the start, they don't get you into Cell or Nature.
Rather, it's the logical end result of Goodhart's law. Metrics, from grades to citations to whatever, get gamed. If the stakes are high enough, they get gamed so hard they drift away from the thing they were supposed to target.
Re: (Score:2)
No. Partly it's because people need to publish to keep their jobs. And partly it's that most people, especially outside science, don't realise that a paper is the first word, not the last.
Re: (Score:2)
Even people who realize that don't want that word to be a lie.
There's no excuse for a ML result to be unreproducible. Really none. If you have the same software versions, the same code to glue them together, and the same training corpus you should wind up with the same result.
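Pinning those things down is mostly bookkeeping. A minimal sketch of that bookkeeping in Python (the function names and the numpy-only environment are illustrative assumptions, not anything from the article): record the library versions, fix the seeds, and fingerprint the training corpus so a reader can check they really have the same inputs.

    import hashlib
    import json
    import platform
    import random

    import numpy as np

    def snapshot_environment(path="environment.json"):
        # Record interpreter and library versions alongside the results.
        info = {"python": platform.python_version(), "numpy": np.__version__}
        with open(path, "w") as f:
            json.dump(info, f, indent=2)
        return info

    def fix_seeds(seed=0):
        # Seed every RNG the experiment touches.
        random.seed(seed)
        np.random.seed(seed)

    def data_fingerprint(path):
        # Hash the training corpus so "same data" is checkable, not assumed.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()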
Re: (Score:2)
The sub-thread had got a little off the topic of ML, and started covering papers in Cell and Nature. It's much harder to make things reproducible when you need specific lab equipment, reagents and cell lines in the first place.
Re: (Score:2)
Re: (Score:2)
"much harder" != "can't be done"
In some cases, no. No one can in practice reproduce CERN's results because they'd need another CERN to do so. Bits can be reproduced, but no one has a particle accelerator of that energy or luminosity.
For cellular stuff it is really, really hard, and there may be no other labs with exactly the right mix of stuff at the time of submission to reproduce the results, especially as shipping things like genetically modified cell lines is non-trivial from a legal perspective. Even m
Re: (Score:2)
This includes high-impact papers in Cell and (especially?) Nature.
I've noticed that Nature seems rather bad as well. They're looking for flashy, not quality. It's disappointing.
Re:Join the club (Score:5, Insightful)
So yes, it's sad, but having an experienced researcher get so wound up about it is a bit silly.
Maybe that's the real underlying problem here, that not enough senior researchers are actually furious about this.
This researcher is not only furious, but they're actually trying to do something to fix the problem system-wide. That person should be applauded for their efforts instead of having their outrage belittled.
Re: (Score:2)
What's at stake with these kinds of failures is the reputation and utility of science itself. From that, a retreat in the progress of humanity. Knocking the species down a few rungs. Don't ever think "It can't happen" or that progress is a birthright or guarantee.
If you've gotten so far down into the weeds that you can't see that, that's a problem. If everyone's in "the club", then it's a systemic problem. Don't be in that club.
Re:Join the club (Score:4, Insightful)
This phenomenon that you think is so common was once only really relegated to the "social sciences", and is a function primarily of how many "practitioners" there are in the field and therefore how many papers get published.
The Math guys don't fuck around when they publish, because there aren't that many of them and it will be very fucking embarrassing, as more than a few will figure out where you went wrong and will not be shy about discovering that fact. You only ever get to publish complete shit once as a math guy.
Maybe, just maybe, the discipline in your field is complete shit, and it's time to make some fucking noise about it.
Re: (Score:2)
other fields, where the research, publications, and peer review process aren't such complete shit.
Yeah, but reproducing something in Math is very different from AI (or CS in general).
You can include a complete proof in Math, but in CS-related fields even including your code may not be enough. Particularly if you are also reporting runtime performance measurements, which are highly system-dependent.
Re: (Score:2)
Computer scientists shouldn't put up with this crap. The level of reproduction you get by running the same code isn't an unreasonable minimum standard for machine learning.
Re:Join the club (Score:4, Interesting)
The problem is that you run the code on data and often you cannot publish that data. Sure, there are people that use this to simply fake the data or the results of the training. From personal experience, quite a few even good looking papers are simply fakes. And I have had a situation where (as a PhD student) I was unable to convince my professor that a paper was bad. All he saw was the "quality" conference and the abstract. Worse, he wanted me to go into that direction with my own research. Funnily enough, about a year later, all the authors minus the first one (the lying scum PhD student) published a retraction. And that was how my PhD took a year longer. This system is utterly and completely broken because it gives the wrong incentives and too many people cave to the pressure.
Now later, I had a situation where I also could not publish a rather large data-set, because it was a confidential industrial one. (This was actually industrial research for my employer.) So I made both careful benchmarks and explained in detail why things worked and what alternatives I had looked at and why the others did not work. It was a lot of work, but I managed to do first a conference paper and then a journal paper (which are relatively rare in CS). But the moral is that it is pretty hard to make a good paper when the data-set is critical and cannot be published. First, you actually have to have something that works on other data too! And that is often not a given in the low-quality ML field. (Low-quality like pretty much all "AI" research with very, very few exceptions.) And second, you have to really understand what you are doing and describe it carefully. From doing paper reviews for a long time, I conclude that many, many authors in the CS field have no real understanding of what they are trying to publish. Many also have a very limited CS understanding, both practical and theoretical.
For an academic researcher, publishing quality research means fewer publications and less flashy ones. That may well reduce or end funding when the others publish more and flashier stuff. As I said, the whole process is utterly broken.
Re: (Score:2)
I was unable to convince my professor that a paper was bad. All he saw was the "quality" conference and the abstract.
Frankly you should have put "professor" in quotes too. Being that credulous is inexcusable.
Re: (Score:2)
I was unable to convince my professor that a paper was bad. All he saw was the "quality" conference and the abstract.
Frankly you should have put "professor" in quotes too. Being that credulous is inexcusable.
Yes, probably. Unfortunately, this guy was one of the better ones.
Re:Join the club (Score:4, Insightful)
So yes, it's sad, but having an experienced researcher get so wound up about it is a bit silly.
I think it's more sad that you and your fellow researchers have accepted this kind of mediocrity. If something is sad, why is it silly to get wound up enough to try and do something about it, if nothing else but create a shame list to demonstrate a problem in the field more widely?
Re: (Score:2)
Modern snake oil (Score:2)
AI/ML is the modern snake oil. There is a lot of money you can extract from VCs, and C-levels are really excited about investing in it even though it hardly ever solves a problem at hand. But, yeah... FOMO.
Re: Modern snake oil (Score:2)
It's pretty good for mass surveillance.
Re: (Score:2)
It's pretty good for mass surveillance.
Not really. Many people think it is pretty good for mass surveillance, though, so there are a lot of authoritarian assholes you can sell things to.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
And their annual "Ig Nobel" prizes. A startling amount of amazing and practical science passes through their pages. The "invisible gorilla" award-winning paper from 2004 was fascinating: the prostate massage cure for hiccups was also practical, easily reproduced, and yet still has not become common practice, much to the discomfort of permanent hiccup sufferers around the world.
To those about to squander AI achievements (Score:2)
How furious was he? (Score:2)
Just might work.. (Score:4, Informative)
Interesting, this might work. The first one I clicked on (the Bayesian optimization paper) seems to have had a bunch of code added 2 days ago. They should put a link to the results on the top page of the website though. They should also put the dodgy author's name in lights, but I suppose the idea is to avoid hassles? From what I can see, the first three results are:
1: "Code coming soon" that actually arrived after being shamed;
2: There is code but they couldn't get the same results, which is baffling and needs a closer look; either the original author or the poster is perhaps making a coding mistake? More eyes might resolve this. ("We asked the authors for their code and they uploaded their code but running their code under their own hyperparameters does not yield their results.");
3: Dodgy author Yisen Wang appears to be perpetrating a scam. --> Looks good to me!
Better stay awake (Score:2)
Re: (Score:2)
Even if the author responds within a month, it's not the end of the world. It's not like being listed on that site is a severe penalty.
Same as psychology studies (Score:2)
Re: (Score:3, Insightful)
"Not implementable" (Score:2)
"...a website that aims to create a centralized list of machine learning papers that are not implementable..."
You're gonna need a bigger boat.
tautology (Score:2)
This is expected (Score:2)
Your job depends on getting papers published. It does not depend on having reproducible results, nor on sharing your code or data.
In fact, sharing code and data might be counter productive, since it *is* a competition in many cases.
So, you keep everything to yourself, and journal editors try to make decisions by guessing whether the proposed method *could* actually work.
Re:Say what? (Score:5, Informative)
I don't know WTF this article is trying to say, at all.
Back story: Most academic (and industry) research papers discussing advancements in "Machine Learning" illustrate how they got to their results. Think of it as a video recipe for a fancy dinner. The problem? The majority (over 50%) of these "recipes" don't actually work when you try them in your own kitchen!
The real-world take-away: these "advancements" cannot be repeated by literally anybody else; they are as much a computational fluke of the moment as anything, essentially rubbish (and are only "peer reviewed" for grammar rather than successfully creating the "recipe" from scratch).
Re: (Score:2)
Reproducibility and replicability are basic tenets of the scientific process. Without them, you may as well publish articles on Alchemy and Free Energy.
Re: (Score:2, Insightful)
Re: (Score:3, Informative)
Machine learning is a branch of computer science, and computer science is a branch of applied mathematics. It's pretty much as far from your "so-called social sciences" as you can get.
Re:Say what? (Score:5, Informative)
Until it becomes very much like the social sciences and homeopathy, with layers of abstract mathematical analyses on top of analyses indicating a "deduced" result without showing testable results or reliable induction from evidence. I'm afraid it's increasingly common in artificial intelligence research, with meta-analyses of meta-analyses "proving" results in the field, rather than providing testable or falsifiable theories and conclusions. It's a problem common to "dark matter" physics: presenting theories in search of data to support them. If you have funding for your theory to find data, it's startling how often you can manage to find data and ignore the rest of the field which refutes it.
A great deal of artificial intelligence research is, unfortunately, mislabeled engineering or computer science to add excitement and increase funding. It's a shame; the work is interesting enough without unnecessarily encumbering it with irrelevant "AI" labels.
Re: (Score:2)
It's a problem common to "dark matter" physics: presenting theories in search of data to support them.
You lost me right there.
That is pure, unadulterated, dumbfuckery.
DM isn't a theory searching for data to support them, it's a dwindling number of hypotheses attempting to match data that is very confusing.
We have very good gravitational data for DM. We have zero particle physics data for it.
Trying to reconcile those is problematic for some (in spite of the fact that gravity in general isn't reconciled with particle physics).
We have explored a hundred different ways the Universe could work that would f
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I'm referring to the extraordinarily contorted demands of increasingly extraordinary non-baryonic matter to explain what is a deduced, not a directly measured, shortage in the amount of detected matter in the cosmos. It's "dark matter" not because it's exotic, but because it's not easily detected yet apparently has some gravitational effects on cosmological scales.
The first logical possibility is that our estimates, and they are very contrived estimates, of the rate of the expansion are systematically skewe
Re: (Score:2)
I'm referring to the extraordinarily contorted demands of increasingly extraordinary non-baryonic matter to explain what is a deduced, not a directly measured, shortage in the amount of detected matter in the cosmos. It's "dark matter" not because it's exotic, but because it's not easily detected yet apparently has some gravitational effects on cosmological scales.
The demands are hardly contorted.
Your demand of direct measurement is absurd on its face.
Were you as skeptical about the Higgs before they found it at the LHC?
Massive neutrinos?
The Standard Model's current form is literally a running historical record as to why it should *not* be used as any kind of crutch for skepticism toward the nature of *whatever* causes the gravitational effects we observe.
The point being that there are far, far simpler explanations for the deduced "dark matter" which require no extraordinary properties.
And this is where you are simply wrong.
Flat out, simply wrong.
Let's play a game. You pick your favorite MON
Re: (Score:2)
Modification of Newtonian Dynamics? That is a straw man. I've mentioned _three_ theories, none of which demand inventing new physics. Extragalactic matter, essentially rogue planets or brown dwarfs formed without associated stars or galaxies. Systemic errors measuring the distance and velocity of the most distant detectable objects, errors that are quite likely with the poor understanding of galactic and stellar evolution over the course of tens of billions of years. And gravitational lensing screwing up t
Re: (Score:2)
Modification of Newtonian Dynamics? That is a straw man.
It wasn't. It was a bad assumption on my side.
I wrote that before I read your support for MACHOs- so I apologize for that.
So, more to the point, since you seem to support the MACHO model of DM (non-modified gravity, DM is baryonic) then you essentially support DM.
You're on the losing side of it, since there is very good evidence that the missing mass is non-baryonic in nature (or more indirectly, all of the evidence that supports our current Lambda-CDM is somehow wrong if it isn't) but that doesn't matte
Re: (Score:2)
That.... took me a few moments to follow. There is a great deal of support for the claim that there is no non-baryonic matter, or no significant amounts of it, namely the observations to date of our physical reality. The evidence for non-baryonic dark matter is _very_ thin, being primarily the observations of the expansion of the universe on a cosmological scale and for which there are several far more likely possibilities. It is _exciting_ to propose an entirely new form of matter, and difficult to falsify
Re: (Score:2)
The reason our current best cosmological model is Lambda-CDM is because it is what the evidence best supports.
You can point out problems with it all day, and nobody denies they exist.
But the fact is, it has more supporting evidence than anything else- and unless Lambda-CDM is *very* fucking wrong, there simply isn't enough baryonic matter to match what we see, gravitationally speaking.
Re: (Score:2)
> Lambda-CDM is *very* fucking wrong,
Yes, I'm afraid it is. The lack of evidence, or of any foreseeable evidence, for the outrageously complex theories of non-baryonic matter should be taken as evidence of the lack of non-baryonic matter, much as the absence of evidence of magnetic monopoles should be taken as evidence of their non-existence, despite the fact that they would make Maxwell's equations more visually elegant.
Re: (Score:2)
Yes, I'm afraid it is.
I look forward to reading your dissertation or paper.
The lack of evidence, or of any foreseeable evidence, for the outrageously complex theories of non-baryonic matter should be taken as evidence of the lack of non-baryonic matter
Ignoring that the premise of this is false, it falls prey to a basic logic error: absence of evidence is not evidence of absence.
Your argument, not long ago, could have been used to say that there is evidence of a lack of massive neutrinos.
This is again, false.
much as the absence of evidence of magnetic monopoles should be taken as evidence of their non-existence, despite the fact that they would make Maxwell's equations more visually elegant.
Much as the absence of evidence of germs was evidence of the miasma theory of pathology... in that it wasn't.
I'm sorry, but you're very wrong here. You're entitled to your opinion, but it wou
Re: (Score:2)
Re: (Score:2)
Please point to any SS studies that have done that.
Re: (Score:2)
Give a list. Be explicit.
Re: (Score:2)
Re: (Score:2)
Have you even seen the books in question? "Chinamen" and black people made to look like apes.
They actually changed it years ago to be less racist and nobody seemed to care.
Re: (Score:3)
Re: (Score:2)
Are they basic tenets?
Yes, obviously. The corruption of academia by capitalist interests can't change fundamental principles; it can get people to act against them, but it's a losing proposition in the long run.
If you pay "scientists" to "research" something without using science, your results are going to be useless garbage, because using science is the only way to get real results.
Actually using science produces results valuable in and of themselves that you don't need smoke and mirrors and a song and dance to convince peop
Re: (Score:2)
This has nothing to do with "researching" without "using science". It has to do with validating claims made using cutting edge technologies before res
Re: (Score:2)
Re: (Score:2)
You can still do research at student-focused universities and frankly the best scientists I know work at such universities. You act like they cannot find other jobs. They can, they just prefer sucking the grant tit.
Any organization has responsibilities. Selling guns to terrorists because people need to get fed is a perfect excuse for companies but they wash their hands by using gun runners.
Likewise tenure at different places is very different. Student-focused universities will still want professors to do re
Re: (Score:2)
Re: (Score:2)
A paper should include a link to the full source code and data that can reproduce the results.
If all the source and data are not available, the paper should not be published.
I remember Mark Weiser [wikipedia.org] making the same recommendation at a conference 30 years ago. It is shameful that it still isn't standard practice.
Re:Say what? (Score:4, Insightful)
Re: Say what? (Score:3)
The worst enemy of determinism is multi-threading. In theory you can make split-and-join programming deterministic, but what if both threads call the same RNG instance? They will then get the otherwise-determined random numbers in a random order.
But also small things, like using a pointer as a key in a map or a set, will reorder when you
Re: Say what? (Score:4, Insightful)
Re: (Score:3)
"whatever is necessary" may not fit in the paper. Have you neard of the concept of "laying on of hands", where the authors of a paper visit other labs and walk them through procedural details they may not have successfully conveyed in the paper? This happens in many fields, where subtle differences in laboratory equipment and procedure unintentionally block experiments from succeeding, and it takes a visit from the original authors or their students to work out the problem.
That kind of "laying of hands" pay
Re: (Score:2)
I have not, but that's pretty interesting. I (and a lot of the reproducibility literature I've read) recognize that for many laboratory tasks, it may just not be possible to express the lab procedures fully and with enough detail for others to perform them based only on the description. Clearly, if you can go and te
Re: (Score:2)
Many details are assumed. Even quite ordinary details like measuring temperature can vary based on measurement _inside_ an object, _outside_ an object, or measurement with a thermometer or an infra-red device when the object is being actively heated.
Re: (Score:2, Insightful)
but what if both threads call the same RNG instance?
Then you, the programmer, made sure your system was not deterministic.
For fuck's sake, Monte Carlo methods are not fucking new. Every popular PRNG has had counter modes, FOR DECADES, that allow any number of threads you want to each have their own location within the effectively inexhaustible randomized sequence.
You would have gotten a pass if you were complaining about an actual sequential issue, such as how the way chess engines use hash tables leads to non-determinism when multi-threading. In that ca
Re: (Score:2)
You would put a separate RNG with its own seed into each thread.
But keep in mind the stock RNG library will get you only 8 or 9 tails in a row, max, no matter how long you let it run. Which isn't that long today.
That's another thing that can get crushed by the tyranny of dimensions.
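A minimal sketch of that per-thread approach, assuming numpy (SeedSequence.spawn derives an independent, reproducible stream for each worker from one root seed; the function names are just illustrative):

    from concurrent.futures import ThreadPoolExecutor

    import numpy as np

    def worker(child_seed, n=5):
        rng = np.random.default_rng(child_seed)  # this thread's private generator
        return rng.random(n)

    def run(root_seed=1234, n_threads=4):
        children = np.random.SeedSequence(root_seed).spawn(n_threads)
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            results = list(pool.map(worker, children))
        return results  # identical from run to run, whatever the thread scheduling

    if __name__ == "__main__":
        print(run())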
Re: (Score:3)
While RNG variance can influence small sample sizes, AI training is anything but small. If you need to include a seed to reproduce your results, you got lucky, and need to refine your method so it produces more consistent results.
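One way to act on that, sketched below with an assumed train_and_evaluate stand-in (not any real codebase): run the method over several seeds and report the mean and spread instead of a single lucky number.

    import numpy as np

    def evaluate_over_seeds(train_and_evaluate, seeds=range(5)):
        # Run the whole pipeline once per seed and summarise the spread.
        scores = np.array([train_and_evaluate(seed=s) for s in seeds])
        return scores.mean(), scores.std(ddof=1)

    def dummy_run(seed):
        # Stand-in "model" whose accuracy wobbles with the seed.
        rng = np.random.default_rng(seed)
        return 0.90 + rng.normal(scale=0.01)

    mean, std = evaluate_over_seeds(dummy_run)
    print(f"accuracy = {mean:.3f} +/- {std:.3f} over 5 seeds")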
Re: (Score:3)
And, for code that uses some nondeterministic method, as is common in machine learning, the code must include the RNG seed and/or whatever is necessary to get the same answer as in the paper.
Deep learning training is almost always non-deterministic, because it's massively parallel. You can run the toolkits in deterministic mode for debugging, but the performance penalty is huge, so if you're anywhere near the cutting edge, you won't be able to get deterministic results.
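For reference, the debugging mode the parent describes looks roughly like this in PyTorch (a hedged sketch; exact flags vary by version, and some ops will simply raise because they have no deterministic kernel):

    import os

    import torch

    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some CUDA ops
    torch.manual_seed(0)
    torch.use_deterministic_algorithms(True)   # error out instead of silently diverging
    torch.backends.cudnn.benchmark = False     # autotuning picks kernels non-deterministically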
Re: (Score:3)
And, for code that uses some nondeterministic method, as is common in machine learning, the code must include the RNG seed and/or whatever is necessary to get the same answer as in the paper.
Hasn't Dijkstra taught us that forcing deterministic execution of non-deterministic computations only serves to hide bugs?
Re: Say what? (Score:3)
Not just the code; even the lab notes should really be included.
Re:Say what? (Score:5, Interesting)
eh...
I think this is massively overstated. I used to think as you do until I put my money where my mouth is. Quite a number of years ago, I wrote a relatively complex ML system as these things go (the modern easy-to-use toolkits didn't exist then as they do now, so it was a lot of work). I published the paper complete with the C++ source code and continued to maintain it (and continue to do so). The paper did well, clearing many citations so far, which is maybe not so good by deep-learning standards, but by pre-deep-learning standards it has a lot of exposure, way more than average.
Estimated number of people who ran the code: big, fat 0.
Other ones with many citations (hundreds) have got a lot of use of the code, or more realistically the compiled binaries.
The thing is, there are all sorts of problems. One is that it's actually relatively hard to get code to work reliably on someone else's machine. It's OK for me, but I've always leaned heavily in the direction of software engineering. For most researchers, working code works on their machine, and that's about it. It's got a bit easier with deep learning, given the prevalence of the all-in-one toolkits that do more or less everything for you, but for other and especially bigger systems, not a chance.
It also depends on the type of paper. My first one demonstrated that something was possible. People apparently wanted to know that, but had their own very different ideas based off mine, so no one was really interested in my implementation. The second was a technique: people wanted to use the technique on their own data, but weren't on the whole that interested in taking the idea further (some were, not the majority).
But many people hide behind non-reproducible statistical flukes.
So I've always been in favour and have put my time where my mouth is, and I think it's not nearly as clear as many people think.
Re:Say what? (Score:4, Insightful)
People could start archiving VMs running the code rather than just the code? I agree you can forget about running something 10+ years old.
Re: (Score:2)
Providing a VM is maybe overkill, but some have started to do just that.
I have also seen Docker containers, Jupyter notebooks, and git repositories with complete dependencies and install scripts.
That works fine for some things.
Some works are just harder to reproduce though. In parallel/high-performance computing, each platform may be somewhat different. And the results you get are machine-dependent, and the machine may be millions of dollars and won't exist anymore a year down the road (because configuration w
Re: (Score:2)
People could start archiving VMs running the code rather than just the code? I agree you can forget about running something 10+ years old.
Docker is getting more popular, though my code long predates Docker. I've maintained it so it still runs, unusually. But I wonder how long Docker will stay compatible for.
Re: (Score:2)
The point is, if they feel there are problems with your o/p, they **can** run it.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Say what? (Score:5, Interesting)
and are only "peer reviewed" for grammar rather than successfully creating the "recipe" from scratch
Note: there is no field where peer reviewers reproduce the paper. That's not how peer review works.
Peer review is like a low pass filter, that gets rid of the really bad papers.
Re: (Score:2)
and are only "peer reviewed" for grammar rather than successfully creating the "recipe" from scratch
Note: there is no field where peer reviewers reproduce the paper. That's not how peer review works.
Peer review is like a low pass filter, that gets rid of the really bad papers.
100% Agreed. I actually added the quip to stave off any "but they're peer-reviewed" responses. =)
Re: (Score:2)
Mathematics (and others; e.g. theoretical computer science does it for journals, or you are at least expected to do so as a reviewer - most ML does not get into journals, and whether it is TCS can also be debated) checks the proofs, which is more or less the equivalent.
Re: (Score:2)
Fixed.
Re: (Score:2)
Computer security is just as bad. If I had a penny for every paper I've read that presents some amazing new tool that does everything and beats every other tool out there, but that never releases the tool, I'd have... $121.72.
I estimate that maybe one in a hundred papers that presents some new tool or technique actually provides the tool for others to use. For the other 99, you have to take the researchers at their word that they haven't just made the results up.
Re: (Score:2)
The problem? The majority (over 50%) of these "recipes" don't actually work when you try it in your own kitchen!
How long have you been spying on me in my kitchen!?
Re: (Score:2)
How long have you been spying on me in my kitchen!?
Long enough to know to NOT try your recipes =OP
Re: (Score:2)
Reviewed for grammar? You're not a linguist. I've seen so many badly worded texts out there that one could wonder how they even managed to get through.
Re: (Score:2)
Reviewed for grammar? You're not a linguist. I've seen so many badly worded texts out there that one could wonder how they even managed to get through.
https://boredhumans.com/resear... [boredhumans.com]
https://www.nature.com/article... [nature.com]
https://news.mit.edu/2015/how-... [mit.edu]
https://www.sciencemag.org/new... [sciencemag.org]
I agree that a lot of research papers are terrible at grammar (and, by the way, the field of linguistics is not about proofreading for grammar compliance), but for the given audience, it may be intelligible enough to warrant publication (by the for-profit journal).
Re: (Score:2)
Honestly, this is really the fault of the field. Computer science has a long history of accepting single numbers as results. When you're talking about real computation that can be okay, but machine learning isn't computation, it's statistics.
Most people write their super-great-ML paper and quote some value for accuracy and say it's bigger than some other value someone else published. Then reviewers accept it. You can reproduce their result if you try hard enough, or they give you code, but it's meaningless
Re: (Score:2)
Honestly, this is really the fault of the field.
Very true! Just another headline today referencing a Microsoft-led team retracting some quantum computing paper https://science.slashdot.org/s... [slashdot.org]
Re: (Score:2)
At least that one sounds like it was fraud. Nothing protects against people outright lying other than painful replication. This issue with the machine learning field not doing basic statistics affects nearly every paper and is trivially avoidable. No confidence interval? No publish.
The use of leaderboards and the focus on "state-of-the-art" results just encourages it. Except for the big jumps in performance, pretty much everything else is a game of who can guess the best random seed. There are even papers about
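For the simple case of a single accuracy number on a fixed test set, a confidence interval costs a few lines; here is a sketch of a 95% Wilson score interval in plain Python (the example accuracies are made up):

    import math

    def wilson_interval(correct, n, z=1.96):
        # 95% interval for an accuracy of correct/n measured on n test examples.
        p = correct / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return centre - half, centre + half

    # 91.2% vs 90.8% on 10,000 examples: the intervals overlap,
    # so the "state-of-the-art" bump may just be noise.
    print(wilson_interval(9120, 10000))
    print(wilson_interval(9080, 10000))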
Re: (Score:2)
Was the other one posted by the same obvious troll account?
Re: (Score:2)
Peer review isn't noise. That's not to say it can't be improved in many areas, and there needs to be greater accountability. But trashing the system isn't going to help anything.
Re: (Score:3)
A few years ago, there were some really good AI bloggers. I remember in particular Andrej Karpathy and Chris Olah, and one other guy with an uncommon last name (sadly it's all ungoogleable now due to how SEO-spammed anything ML related has become). There was a bigger community of people who shared code, torch/pytorch had a terrific community around it. There were so many great ideas and people who shared code, and you'd see them all happily copying and pasting from e.g. the original DCGAN code.
Now Karpathy
Re: (Score:2)
Re: (Score:2)
> BTW, what about all those human-written theoretical proofs that nobody ever verifies in details?
I find that the reviewers in theoretical venues in CS are very careful about that. Even in proof-heavy domain conferences, they actually check the proofs. Sometimes not all of them, but anything fishy will usually get pointed out. And the program committees never take a chance; if they think it may be wrong, they would rather err on the side of caution and reject the paper.
Also, in a theoretical paper, th