
Anthropic Builds RAG Directly Into Claude Models With New Citations API (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: On Thursday, Anthropic announced Citations, a new API feature that helps Claude models avoid confabulations (also called hallucinations) by linking their responses directly to source documents. The feature lets developers add documents to Claude's context window, enabling the model to automatically cite specific passages it uses to generate answers. "When Citations is enabled, the API processes user-provided source documents (PDF documents and plaintext files) by chunking them into sentences," Anthropic says. "These chunked sentences, along with user-provided context, are then passed to the model with the user's query."
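In practice, a request with citations enabled might look something like the sketch below, using Anthropic's Python SDK. This is a rough illustration based on the announcement: the exact field names, the model string, and the document contents here are assumptions, not confirmed API details.

```python
# Sketch only: field names follow Anthropic's Citations announcement and may differ.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model string
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                # A plaintext source document; per the announcement, the API
                # chunks it into sentences before passing it to the model.
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "Acme's Q3 revenue was $12M. Q4 revenue was $15M.",
                },
                "title": "Acme financials",
                "citations": {"enabled": True},  # the new Citations switch
            },
            {"type": "text", "text": "How did revenue change from Q3 to Q4?"},
        ],
    }],
)

# Text blocks in the reply may carry citations pointing back at the
# source sentences they were drawn from.
for block in response.content:
    if block.type == "text":
        print(block.text, getattr(block, "citations", None))
```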
The company describes several potential uses for Citations, including summarizing case files with source-linked key points, answering questions across financial documents with traced references, and powering support systems that cite specific product documentation. In its own internal testing, the company says that the feature improved recall accuracy by up to 15 percent compared to custom citation implementations created by users within prompts. While a 15 percent improvement in accurate recall doesn't sound like much, the new feature still attracted interest from AI researchers like Simon Willison because of its fundamental integration of Retrieval Augmented Generation (RAG) techniques. In a detailed post on his blog, Willison explained why citation features are important.
"The core of the Retrieval Augmented Generation (RAG) pattern is to take a user's question, retrieve portions of documents that might be relevant to that question and then answer the question by including those text fragments in the context provided to the LLM," he writes. "This usually works well, but there is still a risk that the model may answer based on other information from its training data (sometimes OK) or hallucinate entirely incorrect details (definitely bad)." Willison notes that while citing sources helps verify accuracy, building a system that does it well "can be quite tricky," but Citations appears to be a step in the right direction by building RAG capability directly into the model. Anthropic's Alex Albert clarifies that Claude has been trained to cite sources for a while now. What's new with Citations is that "we are exposing this ability to devs." He continued: "To use Citations, users can pass a new 'citations [...]' parameter on any document type they send through the API."
Re: (Score:2)
Checkmate, protestants.
so, basically (Score:1)
A search engine that pastes the results into an open Word document?
Meh, people were doing that with perl quite well three decades ago already.
Re: (Score:2)
Meh, people were doing that with perl quite well three decades ago already.
ummmm, still am, thank you ;)
Re: (Score:2)
Not fixing what ain't broken is an alpha male non-move. As a fellow alpha, I can't disapprove.
I like this (Score:3)
There ought to be some other steps in there (the last one really is a doozy), and I don't know how easy it will be to go from one to the next, but this one seems to me to be headed in the right direction.
Re: (Score:3)
There ought to be some other steps in there (the last one really is a doozy), and I don't know how easy it will be to go from one to the next, but this one seems to me to be headed in the right direction.
Having spent a weekend playing with DeepSeek R1, I don't think that's far away at all.
You can train these things to reason, and they do it well. They show their work.
Training for additional reasoning strategies is certainly doable. The evolution continues.
Re: (Score:2)
Still no one is auditing the results for hallucinations.
Merely citing source docs doesn't mean they exist. Other models cough up lots of imagined source docs. Worse, sometimes the actual docs come from discredited research paper mills that are rife with bad data, even imagined data.
There's no disciplining basis for these models that has, as a prerequisite, an autonomous auditing mechanism that can score the accuracy of a citation: its existence, its reference, and its answer-fit.
Your experience seems good, but it's anecdotal.
Re: (Score:2)
R1 is already a step in that direction; I've witnessed it catch and squelch its own hallucinations many times within its thinking tags.
It'll emit some claim, and then in the next paragraph say something like "But wait, that doesn't make sense..."
I've still caught it making some pretty basic errors, even in its reasoning, but overall it's quite good.
To anyone who feels compelled to make uninformed comments about all of this, start
Re: (Score:2)
But this is a basic fallacy.
Imagine you've written what's designed to be a factual article.
Or four thousand lines of C or Py.
From a human, I'd expect a review before submission or push that would catch and correct or clarify to prevent errors. Hallucinations and conceptual-error artifacts are 100% not appropriate for intelligence, human or artificial.
It's nice that DeepSeek R1 tries to limit this, but this should be the very premise of HI or AI. Regression testing of code (as an example) went out
Re: (Score:2)
You think humans are free from conceptual errors and hallucinations?
Re: (Score:2)
Humans use audit, peer review, and other mechanisms as a sanity check.
It reminds me of Elvis Costello and "Watching the Detectives."
Referential integrity is a step that isn't self-satisfied. Whatever the concept, it's the output that's in question. However it's arrived at, the output can be correct despite the logic used to conceive it, even inventive/hallucinatory logic. George Boole would be proud.
Yet consistency, and the ability to audit for assured, consistent, reliable output, requires trustworthiness. There ar
Re: (Score:2)
I don't think we can say that a lack of conceptual errors, or an ability to audit one's own thoughts, is a "necessity" of intelligence.
Those are taught processes. Humans will gladly cut the heart out of a child to make their crops grow, even though the statistics stare them in the face: it does nothing.
We are not good reasoners, naturally.
And yet- here we are.
Re: (Score:2)
No, we disagree along the way; it's a values gulf that's unlikely to be surmounted. Quality is everything. Humans do not cut the heart out of a child to make their crops grow. I suspect you're a bot lacking an innate sense of morality. And certainly not an Oregonian.
Re: (Score:2)
You grossly overestimate human intelligence, particularly on average.
Re: (Score:2)
Still no one is auditing the results for hallucinations.
That is a user function. It is up to you to review any documents you generate with the tool.
Not really (Score:2)
This new feature doesn't do what most people seem to think. It doesn't affect the model in general, just your immediate input. It's still going to hallucinate like crazy.
It might be a positive step in using appearances to trick people into believing that the model doesn't hallucinate, that it understands what it's saying, and that it has citable sources. None of which is true.
Granularity of the model (Score:2)
The model will still hallucinate as always. It's just that, when the source is your own input during a conversation, it will be able to show you where it copied a sentence you gave it. The feature does not do anything about the actual model (all those gazillion training inputs and the weights that comprise the system).
The reason they can't have these models keep track of where they got their "ideas" is that the fundamental process is to slice up words (not sentences) into (essentially) subword tokens, and put them into a blender
Bad Move, imho. (Score:1)
Citations is going to lay bare what BS is being quoted. In many cases it will undermine the 'authority' of the AI model if you don't respect its source. Do AI models read https://retractionwatch.com [retractionwatch.com]? Is any nutcase going to be quoted to me as an authority simply because his one sane viewpoint aligns with mine? Isn't the Internet the globe's largest open sewer?
Sounds like plagiarism. (Score:2)