>Carbon Health claims 88 percent of the verbiage can be accepted without edits
Yeah, I buy that. Then again, this is perhaps the situation where I *least* want to encounter the 12% of the verbiage where it hallucinates wildly.
I sometimes use GPT-4 to summarize corporate meetings, and yeah, 88% accuracy feels about right. It once hallucinated that our CEO opened the meeting by discussing the sacred indigenous land upon which our campus was situated, though. Which is inaccurate in about four different ways.
"She's allergic to sulfa drugs," "She's not allergic to sulfa drugs..." hey, they're both valid sentences that can be produced by language. That's the goal of contemporary LLMs, after all.