AI Beats Humans at Reading Comprehension (bloomberg.com) 171

In what is being called a landmark moment for natural language processing, Alibaba and Microsoft have developed AIs that can outperform humans on a reading and comprehension test. From a report: Alibaba Group put its deep neural network model through its paces last week, asking the AI to provide exact answers to more than 100,000 questions comprising a quiz that's considered one of the world's most authoritative machine-reading gauges. The model developed by Alibaba's Institute of Data Science of Technologies scored 82.44, edging past the 82.304 that rival humans achieved. Alibaba said it's the first time a machine has outdone a real person in such a contest. Microsoft achieved a similar feat, scoring 82.650 on the same test, but those results were finalized a day after Alibaba's, the company said.
This discussion has been archived. No new comments can be posted.

  • Al seems to be able to do a lot of stuff lately. It seems this one guy named Al is doing everyone's jobs at once. How do I get him on my payroll?

  • This says way more about the quality of our school system...

    • If the test really were 100,000 questions, they're lucky they could get anyone to complete it at all. If they averaged 1 minute per question and did the test for 8 hours a day, it would take about 208 days to complete; throw in some weekends and holidays and you're looking at about a year.
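
      For what it's worth, a quick sanity check of that estimate (a sketch in Python, assuming one minute per question, 8-hour days, and 5-day weeks):

          # Back-of-the-envelope check of the figure above.
          questions = 100_000
          minutes_per_question = 1
          work_days = questions * minutes_per_question / (8 * 60)   # ~208.3 working days
          calendar_days = (work_days / 5) * 7                       # ~291.7 days with weekends off
          print(round(work_days, 1), round(calendar_days, 1))       # 208.3 291.7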

    • by Anonymous Coward

      This says way more about the quality of our school system...

      A 30-year old calculator can outperform just about any human in math. It doesn't take much to best a meatsack, which is exactly why good enough AI will be all it takes to start replacing human workers. We don't even have to come close to perfecting that technology as many people purport will be necessary to start affecting jobs. We currently pay humans a lot of money for nothing more than an imperfect result. AI adoption will be no different.

    • Is our school system in decline, or are the students populating it somehow different than they were in the first half of the 20th century? Hmm... On an unrelated note, Swedes seem to be losing a quarter of a point of IQ every year. Strange.
      • Literacy rates today are far, far better than in the first half of the 20th century -- even in the USA and Europe a large percentage of the population couldn't read or write at all back then.

        Of course, English literacy in the USA has likely decreased in the 21st century due to immigration. Back in 2003 I scored California High School Exit Exams for a bit, and it was obvious that a lot of the students simply had not learned English yet but may have been quite proficient in their own language.

        • by kqs ( 1038910 )

          Of course, English literacy in the USA has likely decreased in the 21st century due to immigration.

          Seems unlikely; it's not like we never had immigrants in the past. My great-grandparents (or maybe g-g-gp), who came to Pittsburgh to work in the steel mills, only spoke Italian or German and probably could not read their own language. My wife's g-g-gp only spoke Polish. I doubt if current immigrants decrease the English literacy more than the 20th century immigrants did, though I can believe that we measure current literacy far better than we measured the literacy of the previous immigration waves.

    • by rhazz ( 2853871 )
      I don't think the results are directly comparable, though the article doesn't elaborate on what the test is like. The question quoted in the article is "what causes rain". Do you score a point if you understand the question, or do you only get a point if you can both understand the question and provide the answer? AIs would parse the question and then return a result based on a massive knowledge base. Are humans allowed to look up the answer? Was the human score a single smart human or was it an average over many people?
      • by djinn6 ( 1868030 ) on Monday January 15, 2018 @04:29PM (#55933689)
        The questions are nothing like that. Here's the reading material [github.io]:

        Packet mode communication may be implemented with or without intermediate forwarding nodes (packet switches or routers). Packets are normally forwarded by intermediate network nodes asynchronously using first-in, first-out buffering, but may be forwarded according to some scheduling discipline for fair queuing, traffic shaping, or for differentiated or guaranteed quality of service, such as weighted fair queuing or leaky bucket. In case of a shared physical medium (such as radio or 10BASE5), the packets may be delivered according to a multiple access scheme.

        And here are the questions:

        How are packets normally forwarded?
        Answer: asynchronously using first-in, first-out buffering, but may be forwarded according to some scheduling discipline for fair queuing

        How is packet mode communication implemented?
        Answer: with or without intermediate forwarding nodes

        In cases of shared physical medium how are they delivered?
        Answer: according to a multiple access scheme

        So the test taker only needs to find a selection of the original text that answers the question.

        The way I see it, the real issue with the "reading comprehension" quiz is that you don't need to actually comprehend the text to answer it. A better question than "How are packets normally forwarded?" would be something like "What are some situations where packets are not forwarded in the fifo order?" The first question only requires you to find the words "packets", "normally" and "forwarded" in the paragraph and answer with the rest of the sentence. The second question requires you to understand that the text is presenting 2 options, one is "normal" and the other isn't.

        There are also some official answers that are just plain incorrect. The answer to "How is packet mode communication implemented?" is the entire rest of the paragraph, not just "with or without intermediate forwarding nodes".
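
        To make that concrete, here's a minimal keyword-overlap baseline in Python (purely illustrative; this is not how the Alibaba or Microsoft models work) that "answers" by returning whatever follows the question's keywords in the best-matching sentence:

            import re

            # Abridged version of the quoted passage.
            PASSAGE = ("Packet mode communication may be implemented with or without "
                       "intermediate forwarding nodes (packet switches or routers). "
                       "Packets are normally forwarded by intermediate network nodes "
                       "asynchronously using first-in, first-out buffering, but may be "
                       "forwarded according to some scheduling discipline for fair "
                       "queuing, traffic shaping, or for differentiated or guaranteed "
                       "quality of service.")

            def naive_answer(question, passage):
                """Pick the sentence sharing the most keywords with the question and
                return whatever follows them (roughly what span scoring rewards)."""
                stop = {"how", "are", "is", "the", "in", "of"}
                keywords = [w for w in re.findall(r"[a-z]+", question.lower()) if w not in stop]
                sentences = re.split(r"(?<=\.)\s+", passage)
                best = max(sentences, key=lambda s: sum(k in s.lower() for k in keywords))
                tokens = best.split()
                # Cut after the last keyword's first occurrence; the rest is the "answer".
                cut = max(next((i for i, t in enumerate(tokens) if k in t.lower()), 0)
                          for k in keywords)
                return " ".join(tokens[cut + 1:])

            print(naive_answer("How are packets normally forwarded?", PASSAGE))
            # -> "by intermediate network nodes asynchronously using first-in, first-out
            #     buffering, but may be forwarded according to ..." (truncated here),
            #     which already overlaps heavily with the official answer.

        Even a crude heuristic like this lands most of the way onto the official span, which is the point: the scoring rewards finding the right stretch of text, not comprehending it.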

    • Using the reading comprehension of Slashdot commenters as a gauge, I'm not a bit surprised that AI (or a child's toy) has better comprehension. Just this morning a guy here said "high explosives ... nobody is talking about low explosives" - in a thread about black powder. His own previous post said "explosives like black powder". Far too often, Slashdot commenters don't even comprehend their own posts, much less the article.

    • I was curious if they were really tested in "reading and comprehension" so I read the story, and it only talked about a reading test.

      The humans' comprehension isn't even good enough to talk about what the robot can do. It is like we're flapping our arms to understand an airplane.

    • Doug Lenat's Test (Score:5, Insightful)

      by sycodon ( 149926 ) on Monday January 15, 2018 @11:24AM (#55931571)

      “Mary saw a bicycle in the store window. She wanted it.”

      Does Mary want the bike, the store, or the window?

      • by amalcolm ( 1838434 ) on Monday January 15, 2018 @11:28AM (#55931597)
        Obviously, she was gagging for sex. The window shopping was just a distraction
      • by pem ( 1013437 )
        This is not an edge case. The rules of English, if properly followed by both writer and reader, render the object of Mary's desire unambiguous, and if this is the sort of thing Doug Lenat is focused on, it's no wonder he's falling behind.
        • It isn't unambiguous to normal people. That is the point, and that is the difference between intelligence and just following rules. You just proved his point.
          • by f00zbll ( 526151 )

            It isn't unambiguous to normal people. That is the point, and that is the difference between intelligence and just following rules. You just proved his point.

            Honestly, that statement isn't true for all situations. Without the context, the pronoun "it" could refer to the window or the store. If that sentence was in a paragraph about a girl who has dreamed of owning a bicycle shop, "it" probably refers to the store. If the girl was a stained-glass artist and the window had a stained-glass border, it could be the window. The intelligent answer to that question isn't "it refers to the bicycle." A more intelligent response is "tell me more about the girl and the context."

            • by Anonymous Coward

              No, if the statement is allowed to stand as is, then "the bicycle" is the only reasonable answer to what Mary wants. That's common sense based upon numerical probabilities and ordinary everyday business.

              And just suppose that Mary wants the window or the store. If the speaker doesn't make the effort to state that, the result is on them. Not the listener! An unusual request, statement or situation is entirely the responsibility of the speaker to clarify. If the listener does so that's fine, but the actua

        • by sycodon ( 149926 )

          Because people speaking in normal conversation always use the proper rules of English?

          That's the entire point of his work: to enable the computer to understand things that you and I intuitively understand, but which are vague and indeterminate to a computer.

          On the other hand, something AI could benefit from is a properly defined AI interface syntax. Like it does for coding, a properly defined syntax for interacting verbally with computers could move things ahead quite a bit by eliminating the need for the

        • by Kjella ( 173770 )

          This is not an edge case. The rules of English, if properly followed by both writer and reader, render the object of Mary's desire unambiguous, and if this is the sort of thing Doug Lenat is focused on, it's no wonder he's falling behind.

          That sentence is fairly unambiguous but the construct is not. "Mary remembered all the long trips in the back seat of daddy's car, she and her brother playing games and singing along to Elvis on the radio. She missed it." What did she miss, the long trips? The back seat? Daddy's car? Playing games? Singing along to the radio? Listening to Elvis? Childhood? Family? All of the above, individually? All of the above, simultaneously? The use of "in" doesn't even mean it's the object of desire, like "Mary caught

          • If the rest of it was well-written, I'd assume she wanted the mannequin. If the rest of it was drivel, I'd assume she wanted the dress and the writer sucked.

            Actually, that computer can probably do that meta-analysis very easily once they get to the point of trying to add that much context awareness.

          • The dress is described as something that might be desired, with some details, and the mannequin is just mentioned. How about "Mary caught sight of a finely made mannequin that appeared to match her figure dressed in a wedding gown."? This is not a matter of the rules of English, since I can keep the sentence and adjust adjectives to make Mary want either the mannequin or the gown. Heck, how about "Mary caught sight of a well-made mannequin dressed in a tacky, overblown wedding gown."? It depends on the context.

      • by f00zbll ( 526151 )
        Clearly the person who wrote the sentence doesn't know how to write succinctly without ambiguity, which sadly describes 80% of the US population. In my 18 years of experience in IT, 95% of the engineers write worse than that and don't realize their writing is shit.

        A deep neural net won't be able to do shit with that sentence; it'll probably guess roughly the same as random. If we make it into a whole paragraph to provide more context, a DNN can improve the accuracy. But the root of the problem is that far too m

        • "Mary saw a bicycle in the store window. She wanted it."

          Any intelligent system knows that "it" refers to the bicycle. There is no ambiguity when you use that sentence with (non-autistic) people. That is what is wrong with "a deep neural net". It isn't "deep" or anything like a brain.
          • Remember? The one where computers performed on par with humans on a task very closely related to this?

            Like, I understand why you're aroused by this impossible-to-overcome flaw you're imagining, where computers could never guess what the "it" is referring to - but it turns out they don't work like the text parsers in 80s chat-bots, and are capable of correctly interpreting these references. They continue to improve at exactly this kind of work.

            And hey, do you know what the "deep" in "deep neural net" means

          • I'm ASD, you insensitive clod!

            Anyway, you're not mentioning the store as a noun, and people want bicycles much more often than they want store windows. A lot of this is context. "Mary had been trying to decide what sort of small business to set up. Then she saw a bicycle in the window of a bicycle shop. She wanted it." I've made it a lot more ambiguous, by adding a completely different sentence in front of it.

      • See also Winograd schemas [wikipedia.org] which are more nuanced than that:

        The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?

        The city councilmen refused the demonstrators a permit because they advocated violence. Who advocated violence?

        Both instances of the schema are unambiguous, yet machines have difficulty telling them apart and knowing who does what in each.

        In these cases, you can't merely decide which one is the correct subject based on properties that only apply to that item in the sentence and discarding the others; you need to understand the situation.

        • The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?

          The city councilmen refused the demonstrators a permit because they advocated violence. Who advocated violence?

          ...

          In these cases, you can't merely decide which one is the correct subject based on properties that only apply to that item in the sentence and discarding the others; you need to understand the situation.

          You don't need to understand at all, you just need semantic analysis in addition to lexical and syntactic analysis.

          And you can then narrow it down until there is a single meaning. You don't need to "understand," which is an abstract concept that an AI can never hope to achieve. You just need enough semantic metadata about the words and phrases to construct additional rules beyond what the human writing teachers have enumerated as style guides. (For English has no rules.)

          If you can identify that permits hav
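
          As a toy illustration of that kind of rule for the councilmen/demonstrators schema (a sketch; the plausibility table below is invented purely for this example, and building broad coverage is the hard part):

              # Toy "semantic metadata + rules" resolver for the Winograd schema above.
              # Scores are made up for illustration.
              PLAUSIBILITY = {
                  ("councilmen",    "feared"):    0.9,
                  ("demonstrators", "feared"):    0.3,
                  ("councilmen",    "advocated"): 0.1,
                  ("demonstrators", "advocated"): 0.8,
              }

              def resolve_they(verb, candidates=("councilmen", "demonstrators")):
                  """Resolve "they" to the candidate most plausibly doing the verb."""
                  return max(candidates, key=lambda c: PLAUSIBILITY[(c, verb)])

              print(resolve_they("feared"))     # councilmen
              print(resolve_they("advocated"))  # demonstrators

          Whether enough entries like these ever amount to "understanding" is, of course, exactly what the reply below disputes.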

          • At some point (unless you're going to go mystical on us), semantic analysis blends in with understanding. You're saying that it needs to have semantic metadata on moderately common phrases, and there are tons of those. It would be easy to miss some, and suddenly your system fails unexpectedly. "...because they had trepidations about violence" - that's highly unlikely to be in your semantic metadata bank, yet it can be interpreted. An English speaker with sufficient vocabulary will parse the sentence correctly

      • The questions in this test weren't like that. The reading passages were Wikipedia articles, and the questions asked about objective statements that were clearly given in the passage. Here's an example [github.io].

        The test you're talking about looks at something totally different. It presents ambiguous sentences with no context. The reader is supposed to use their existing knowledge to resolve the ambiguity and infer what the sentence is talking about. These are both interesting and important problems. But they're

  • No Kidding (Score:4, Insightful)

    by tsqr ( 808554 ) on Monday January 15, 2018 @11:04AM (#55931469)

    Based upon the knee-jerk quality of many comments posted on /., this should not be a surprise to anyone.

  • by cstacy ( 534252 ) on Monday January 15, 2018 @11:14AM (#55931523)

    Sadly, this does not surprise me.

    Most people don't read and have shockingly poor comprehension when they do.
    This has gotten much worse (at least in the US) over the past 100 years.

    LOL I didn't bother to read TFA so perhaps totally don't comprehend what it said...

    • Sadly, this does not surprise me.

      Most people don't read and have shockingly poor comprehension when they do. This has gotten much worse (at least in the US) over the past 100 years.

      LOL I didn't bother to read TFA so perhaps totally don't comprehend what it said...

      With cutbacks to education and the abysmal teaching salaries, are you honestly surprised education has gotten so bad in the US?

      • by alvinrod ( 889928 ) on Monday January 15, 2018 @11:47AM (#55931751)
        It's okay in a lot of areas; it's just that no one really gives a shit about the inner-city school districts, and so they've gone to absolute shit. If you remove that from the equation, the U.S. as a whole is quite comparable to other western democracies. The U.S. seems more content to let this problem fester and to deal with the consequences rather than tackle it head on, so the problem just goes from bad to worse in a lot of ways.

        On a side note, if there weren't so many useless (not as in they suck at their jobs, just useless in that their jobs don't improve educational outcomes in any measurable way) administrators soaking up money, we could pay teachers a hell of a lot more. The U.S. spends more on education as a percentage of GDP than other countries that do as well or better than us, and over time our spending on education as a percentage of GDP has increased. Even though you hear about cutbacks all the time (who pays attention when funding is increased?) the trend has been moving upward over time. So it's not strictly a money problem.

        Here's a good report [edchoice.org] (PDF warning) that has looked into how public education has changed in the U.S. over time. The increase in administrative staff has done nothing to improve outcomes and removing the excess would allow for an additional ~$11,000 in yearly salary for every teacher.
        • Our drug policy + criminal justice system ensures the poor stay safely in the bounds of their own district. All without any messy discussions about segregation. When the Rodney King riots happened, we used militarized police to surround the neighborhoods, and they wrecked their own stuff, all without spilling over to the middle-class neighborhoods, let alone the rich ones.
        • Many people care about the inner-city schools. In my city, one group has consistently tried for many decades to get underperforming teachers and administrators fired, reassigned, or removed from inner-city schools. Their counterparts in city government and school administration have rebuffed them by calling them racists, and demanding that underperforming and detrimental administrators and teachers keep their jobs because of the color of their skin.

          Then one group tried to give children choices besides e

          • My point is that society as a whole doesn't really care, not that there aren't individual people out there trying to solve the problem. You should be able to figure out that in plain language saying "no one" doesn't imply a universal quantifier across the entire population. Individual teachers probably care (until they get burned out), but they can do fuck all and probably have their hands tied by the system as much as anything. The same goes for other groups and individuals as well, who lack
    • Most cases of poor reading comprehension that I encounter would better be described as sloppy reading. If people took their heads out of their own arses while reading, they'd understand perfectly.

    • Illiteracy has gone down over the past century in the US. What we talk about now is "functional illiteracy", which I don't believe they even tracked a hundred years ago.

  • Now I believe in AI. It provided "exact answers" to a quiz that is "authoritative". And more than 100,000 questions too! Very impressive!
  • And what material. Dice rolls could probably outperform slashdot readers on article summaries.

  • AI can answer "What causes it to rain?" ...meh
    AI can answer "Will it rain anywhere I'll be tomorrow?" ...okay, that is better
    AI can answer "What will the weather be 6 months from now when I want to go on a cruise?" ..now you've got my attention.

    Maybe have the AI read the farmer's almanac?
  • by Anonymous Coward

    This is not 'reading comprehension'. This was the problem with Common Core: comprehension is not recognition of memorized information; it is an actual understanding of how information relates to experience and new ideas, and those are unique to every person and almost impossible to quantify. (It's different with pure logic like math, where 2 + 2 always equals four; that is why computers are good at it.) What is described here is essentially transcription; the software doesn't 'know' diddly squat (well done!

  • In what is being called a landmark moment for natural language processing, Alibaba and Microsoft have developed AIs that can outperform humans on a reading and comprehension test.

    WHICH humans? I know people that my dog can probably outperform on a reading test. If this is basically a lookup contest ala Watson on Jeopardy, that's not really reading comprehension. That's an expert system doing what they are designed to do. It's only AI in the most rudimentary form.

  • This is plain stupid as it is just misleading. Granted, scholars can barely read these days. Nonetheless, what is being defined as 'AI' here might lead to frustration when people actually expect some sort of 'intelligence' from it.
    • Of course it is misleading. This is aimed squarely at hyping up "AI" to attract investment capital.
      • Get into AI and you'll have a definition of intelligence early on which is extremely broad. Get in far enough to realize where things are today and you'll see that it really stands for Applied Intelligence where nothing even begins to come close to what people think of as AI in sci-fi.

        Furthermore, what all such experiments demonstrate is the EVALUATION system and how well it can be gamed. An AI can figure out your exams or gameshows and learn to do better than the average human at them (and the average hu

        • So far there is only AI capability when there is a well-defined set of rules. Chess and Go have a small set of rules. Language has rules that are complex but it's not like grammar isn't something that is studied and well understood. Compare that to an activity like driving, where you may need to judge if a bent and half-obscured stop sign is a legal one, or interpret whether a front end loader operator wants you to wait for it or pass around it, or interpret what construction workers mean by analyzing po
          • IBM clobbered the best Jeopardy! players in history. Remember? Language didn't stop it. The reality is that even while NOT understanding English, a powerful AI today is able to find patterns that, without understanding, beat humans' real understanding when evaluated in a Jeopardy! exam.

            What they really did is learn the history of Jeopardy! questions and evaluate patterns that the limited set of question writers for that show use to create questions and answers and the syntax game for flipping answer/question Jeopard

        • Yeah, no, I already "got into AI" when I studied it at University. It is total BS at this point.
    • by Anonymous Coward

      Weak AI is also called AI, just as sharks can also be called fish. It makes sense to omit the "weak", since all AI we can currently make is weak AI.

      It also makes sense to call this AI instead of calling it algorithms, since it is trained to perform the tasks with training data, which makes it very different from traditional algorithms where every decision is hand coded by humans or carefully controlled by some library data.

      So AI is a good term in this case. If you have a better term, it doesn't matter, bec

  • by wafflemonger ( 515122 ) on Monday January 15, 2018 @12:06PM (#55931905)

    A much better test would be seeing if it could understand some deconstructionist literary criticism.

    • A much better test would be seeing if it could understand some deconstructionist literary criticism.

      As I see it, the whole point of deconstructionist literary criticism is that it's not understandable. And I don't mean not understandable by the hoi polloi; deconstructionist literary criticism fails if anybody, up to and including Derrida-quoting luminaries, manages to make any sense of it. I think deconstructionist literary criticism is a huge hoax played on society by a group of literary pranksters, who compete on seeing how far they can trick their marks into accepting and admiring meaningless drivel.

      I e

  • Maybe AI would have read the operator's manual better for Hawaii's emergency alert system ...
  • Another landmark of not A.I. achieved. Soon every task a human can do that can be done by a machine will also not be A.I.!

  • by PPH ( 736903 )

    Who is this Al guy everyone is speaking of?

  • Then they'll SEE how good its comprehension really is!

    • Might be interesting if it used moderation to weight inputs.

      Nothing below +3, with some sort of categorization by moderation type (Funny, Insightful, etc.).

      Let one of these systems process 10 years of Slashdot comments, see what comes out.

      Better yet, time limit response inputs for test questions.

  • My *dog* reads better than some of the younger people graduating from high school.
  • The Text to be "Comprehended":
    "The principle is mix the adhesive in a 1:1 ratio by weight. To mix the adhesive fill the bucket one third the up with Part A and weigh it subtracting the Tare weight. The bucket weighs 25 ounces and the scale reads 25 pounds. How much should hardener should we add to the bucket.
    Note: The Part A adhesive is 19.2 pounds per gallon and The Part B adhesive is 12.7 pounds per gallon."

    Feeds the text to an AI...
    Raw parse tree is generated and up comes a Google search about Vaping and
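
    For reference, here's the arithmetic the quoted text is actually asking for (a sketch using the stated figures; the per-gallon weights only matter if Part B is dispensed by volume):

        # Net weight of Part A = scale reading minus the bucket's tare weight.
        scale_lb = 25.0                 # bucket + Part A on the scale
        tare_lb = 25.0 / 16.0           # 25 ounces = 1.5625 lb
        part_a_lb = scale_lb - tare_lb  # ~23.44 lb of Part A

        part_b_lb = part_a_lb           # 1:1 ratio by weight
        part_b_gal = part_b_lb / 12.7   # ~1.85 gal, if measuring Part B by volume

        print(round(part_b_lb, 2), round(part_b_gal, 2))  # 23.44 1.85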

  • AI might PARSE sentences better than humans, and even relate them to other text... according to the way humans read its output.

    But today's state of "AI" doesn't "comprehend" a damn thing.

    There is nothing to do the comprehending. There is no mind. This is a completely one-off, specifically programmed task. Which we already know computers are good at.

    But "comprehension"? Not a chance.
