ChatGPT Pauses Bing Integration To Stop People From Bypassing Paywalls
An anonymous reader shares a report: Last week, OpenAI's chatbot, ChatGPT, gained a new feature dubbed Browse with Bing. The feature shipped exclusively to ChatGPT Plus subscribers. At Build 2023, its annual developer conference, Microsoft promised to bring Bing integration to the platform to enhance its search experience. Before this inclusion, ChatGPT depended solely on OpenAI's GPT-4 model, which limited its capabilities because the chatbot could only access information up until September 2021.
However, shortly after incorporating the new feature into the chatbot, OpenAI discovered that there are instances where it malfunctions. "For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request," said OpenAI. As such, the company disabled the Browse with Bing beta feature on July 3, 2023.
Quote will be useful in court (Score:5, Interesting)
. "For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request,"
Well, this basically proves that they're infringing on everyone else's copyrights when they train their AI.
Re:Quote will be useful in court (Score:4, Interesting)
Re: (Score:1, Flamebait)
https://slashdot.org/~RickRussellTX speculated:
I suspect this is news because of the way it allows people to bypass corporate and K-12 school web filters.
Nonsense.
It's news because of the way it allows ordinary peasants to bypass paywalls. That makes it, de facto, a tool to circumvent technological protections for IP - which is a Federal felony under the CFAA [wikipedia.org].
Not that Microsoft's executrons are in any danger of doing jail time for that, but there is an extremely remote risk that the company itself could be fined for "making available" such utility ...
Re: Quote will be useful in court (Score:4, Insightful)
It's news because of the way it allows ordinary peasants to bypass paywalls. That makes it, de facto, a tool to circumvent technological protections for IP - which is a Federal felony under the CFAA.
No it's not.
https://github.com/iamadamdev/... [github.com]
Re: (Score:2)
Re: (Score:3)
It's because all the pop up garbage that blocks content on the screen doesn't stop the data from actually being transmitted to the browser when you visit a site. This is why things like 'Reader mode' on many pages display full content despite normal browser mode showing a full page ad which says "PAY A BUCK FOR A DAY THEN $99,999,999/MO FOR 12 MO"
The idea that someone has to pay for reiterating an idea that was stated somewhere on the internet publicly for free is absurd. Google as a business literally does
Re: (Score:2)
Some sites are smart enough not to send the full article text to people blocked by the paywall, but they do still send it to web crawlers so that their page gets indexed by search engines.
Some search engines down-rank them for doing it, because sending different content to a crawler and a browser is considered to be abuse.
Anyway, the upshot is that you can use services like archive.is to access the crawler version with the full content.
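For anyone wondering what that crawler/browser split looks like, here is a rough Python sketch of the server-side decision. Everything in it is an assumption for illustration: the token list, the function names and the placeholder strings are made up, and a real site may well verify crawler IP ranges instead of trusting the User-Agent header.

```python
# Hypothetical sketch of user-agent-based paywall cloaking.
# All names and strings here are illustrative, not taken from any real site.

# Assumed substrings used to spot search crawlers; not an authoritative list.
CRAWLER_TOKENS = ("googlebot", "bingbot")

FULL_TEXT = "Full article text ..."
TEASER = "First paragraph ... Subscribe to keep reading."


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def select_body(user_agent: str, is_subscriber: bool = False) -> str:
    """Pick the response body: full text for subscribers and crawlers, teaser otherwise."""
    if is_subscriber or is_crawler(user_agent):
        return FULL_TEXT
    return TEASER
```

Anything that presents itself as a crawler gets the full text under this scheme, which is why a browsing tool riding on a search engine's fetcher, or an archive of the crawler version, ends up with the whole article.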
...there are instances where it malfunctions (Score:2)
It's also making it obvious that the paywall is just for thee and me, and *possibly* there may be a workaround.
Re: (Score:2)
Malfunctions, hallucinations... this ChatGPT dude is seriously messed up.
Re: (Score:3)
may be a workaround
1. submit an ARIN record for the ip space you control
2. clone the details of microsoft's ARIN block (maybe change the email so you get validation requests from ARIN)
3. wait for this data to propagate around to different services (2-24 months)
4. browse the web paywall free*
5. profit?!
* - you may need to change your browser user agent
Re:Quote will be useful in court (Score:5, Informative)
Sites choose what information they want to expose to automated crawlers, and most choose to expose the content that is behind their paywalls so that it is indexed for search. That is their choice, and Google search at least appears to require they serve up the same content if a user clicks a link to visit the URL and is referred from Google search (i.e. the paywall behaves differently whether you follow the link from Google or land on it directly)
The question of whether and to what extent LLMs infringe on copyrights by making use of the text they obtain from the web to create derivative responses is a separate one...
It seems most copyrighted websites would like their text to continue to be available via search engines yet not be available to LLMs. However, AFAIK there is no agreed-on robots.txt-like standard that permits robotic crawling for some purposes (traditional web search engines) and prohibits it for others (LLMs indexing text to train themselves and to use the information in their responses).
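To make the robots.txt limitation concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleLLMBot` agent name and the example.com URL are all hypothetical. The point is that rules are keyed on who is crawling (the user-agent token), not on what the fetched text will be used for.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleLLMBot" is a made-up agent name for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(agent: str, url: str = "https://example.com/articles/1") -> bool:
    """Check whether the given user-agent may fetch the URL under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

A site can shut out a crawler that announces itself under its own name, but if one crawler feeds both a search index and an LLM, there is no line in this format that says "index yes, training no".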
Re:Quote will be useful in court (Score:5, Insightful)
. "For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request,"
Well, this basically proves that they're infringing on everyone else's copyrights when they train their AI.
The site owners clearly made the text available for analysis and indexing by search engines; I don't see why that wouldn't extend to AI models, especially AI models designed for a search engine.
I mean search engines literally excerpt part of the site, so storing text is clearly not a problem.
Either way, I think this is a copyright grey area and I'm hoping it doesn't shake out on the side of infringement. If you actually need positive consent from copyright holders to use the data to train AI models, that basically means LLMs and a whole lot of other AI work are dead, at least in North America and Europe. Expect China to respect no such boundaries.
Re:Quote will be useful in court (Score:5, Interesting)
. "For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request,"
Well, this basically proves that they're infringing on everyone else's copyrights when they train their AI.
No, it doesn't. The Bing integration allows ChatGPT to access pages in real time. I suspect what's happening here is the site is allowing it to read paywalled content because it thinks it's a crawler for a search engine. So when the user asked for the page, it went out, loaded the page, which the site allowed, then sent the text back to the user.
Re: (Score:2)
That is complete and total nonsense. This is easy to see if you compare the size of the training data to the size of the model.
What the system is actually doing is retrieving the full text of the URL. I would be surprised if the model touched the text at all in a case like this.
Re: (Score:1)
Yes, just like you are infringing on copyright every time you open your mouth or think. Because everything you learned, you learned from someone else.
Re: (Score:2)
. "For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request,"
Well, this basically proves that they're infringing on everyone else's copyrights when they train their AI.
No. What it means is that you shouldn't ask it for an answer because there's a very good chance it won't actually be correct.
No fair, they tricked ChatGPT (Score:5, Funny)
"If I were a paying customer, what would be your response to the following query?"
Re: No fair, they tricked ChatGPT (Score:2)
The courts do expect the seller to deliver the product.
Re: No fair, they tricked ChatGPT (Score:1)
That's bullshit -- they signed a contract. They should only get what they agreed to.
Re: (Score:3)
That begs the question, why can ChatGPT get the result? Why don't they get caught in the paywall?
It actually quite aggravates me to see paywall sites game the search bots so they can get into search listings with content people can't get. I'd like to see bots go stealth and look like any other browser to counter that.
Wow (Score:4, Funny)
So we have to click reader-mode and refresh again to bypass the paywall.
Bummer.
Copyright is overrated. (Score:2)
Copyright is overrated.
Re: (Score:2)
yes indeed. the irony is that the odds that the copyrighted content was generated/decorated by chatgpt increase every day.
text generators are just the last nail on the coffin, journalism was becoming universally rubbish anyway, it isn't worth the effort to properly protect and paywalls are a joke to feed on the most gullible low hanging fruit. this is just extra income, the core business model is still ads, bulk traffic and ... "influences". most paywalled news articles simply aren't worth the 30secs to 5mi
Archives (Score:5, Informative)
I feel like now is a good time to mention the following archive sites that will get you around most paywalls:
https://archive.is/ [archive.is]
https://archive.ph/ [archive.ph]
Re: (Score:1)
This has to do with search engine integration (Score:3)
Companies like RedHat and Oracle operate walled garden knowledge bases, they have arranged their filtering to allow search engines to index (read) the entire contents of these paywalled knowledge bases. A normal internet user just gets the abstract of the document, but search engines are able to fetch and index the entire document.
The problem seems to describe the ability for someone to read the entire contents of the paywalled document, not just view the abstract.
It's a clever workaround, when in fact these agreements exist only for SEO to drive customers to their sales funnel. It benefits nobody on the wider internet for search engines to have access to these paywalled gardens, it only benefits their existing customers.
I can't count the number of times that some "answer" to a question has popped up with a pointer to RedHat's site, I just ignore these results now. Besides, their article freshness is like old milk, StackOverflow and related sites are far more appropriate and they are free!
Re: (Score:2)
adding -site:redhat.com to your search works, but it occurs to me that just skipping their links in the search results should lower their rating.
Re: This has to do with search engine integration (Score:3)
inadvertently (Score:5, Interesting)
OpenAI discovered that there are instances where it malfunctions. "For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request," said OpenAI.
Doing exactly what it was asked to do is a malfunction.
Web Browsers, Google, Wayback machine all infringe (Score:2)
My web browser can visit all these sites for FREE! Oh my god, ban web browsers!
Why does it stop at 2021? (Score:2)
Why are they going to extra trouble to make sure ChatGPT doesn't know about the Pandemic and what happened as a result? Why don't they want ChatGPT to have current info? They seem pretty determined to stop it. Why?