Firefox

Firefox 128 Criticized for Including Small Test of 'Privacy-Preserving' Ad Tech by Default (itsfoss.com)

"Many people over the past few days have been lashing out at Mozilla," writes the blog It's FOSS, "for enabling Privacy-Preserving Attribution by default on Firefox 128, and the lack of publicity surrounding its introduction."

Mozilla responded that the feature will only run "on a few sites in the U.S. under strict supervision" — adding that users can disable it at any time ("because this is a test"), and that it's only even enabled if telemetry is also enabled.

And they also emphasize that it's "not tracking." The way it works is there's an "aggregation service" that can periodically send advertisers a summary of ad-related actions — again, aggregated data, from a mass of many other users. (And Mozilla says that aggregated summary even includes "noise that provides differential privacy.") This Privacy-Preserving Attribution concept "does not involve sending information about your browsing activities to anyone... Advertisers only receive aggregate information that answers basic questions about the effectiveness of their advertising."
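Mozilla's summary doesn't spell out the exact noise mechanism, but the classic way to give an aggregate count differential privacy is the Laplace mechanism. A minimal Python sketch of the idea (the function name and epsilon value are illustrative, not Mozilla's implementation):

```python
import math
import random

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """Return a differentially private version of a count.

    Laplace mechanism: add noise with scale sensitivity/epsilon.
    A counting query has sensitivity 1 (one user changes the count
    by at most 1), so a smaller epsilon means more noise and
    stronger privacy.
    """
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    scale = 1.0 / epsilon
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# An advertiser would only ever see the noisy aggregate, e.g.:
print(dp_count(1042))  # roughly 1042, give or take a few units of noise
```

Each individual report is thus hidden twice over: it is mixed into an aggregate across many users, and the aggregate itself is perturbed, so no one can tell whether any particular user's ad interaction is in the total.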

More from It's FOSS: Even though Mozilla mentioned that PPA would be enabled by default on Firefox 128 in a few of its past blog posts, they failed to communicate this decision clearly to a wider audience... In response to the public outcry, Firefox CTO Bobby Holley had to step in to clarify what was going on.

He started with how the internet has become a massive cesspool of surveillance, and how doing something about it was the primary reason many people are part of Mozilla. He then expanded on their approach with Firefox, which, historically speaking, has been to ship a browser with anti-tracking features baked in to tackle the most common surveillance techniques. But there were two limitations to this approach. One was that advertisers would try to bypass these countermeasures. The second: most users simply accept the default options they are shown...

Bas Schouten, Principal Software Engineer at Mozilla, made it clear at the end of a heated Mastodon thread that "[opt-in features are] making privacy a privilege for the people that work to inform and educate themselves on the topic. People shouldn't need to do that, everyone deserves a more private browser. Privacy features, in Firefox, are not meant to be opt-in. They need to be the default.

"If you are 'completely anti-ads' (i.e. even if their implementation is private), you probably use an ad blocker. So are unaffected by this."

This has already provoked a discussion among Slashdot readers. "It doesn't seem that evil to me," argues Slashdot reader geekprime. "Seems like the elimination of cross site cookies is a privacy enhancing idea." (They cite Mozilla's statement that their goal is "to inform an emerging Web standard designed to help sites understand how their ads perform without collecting data about individual people. By offering sites a non-invasive alternative to cross-site tracking, we hope to achieve a significant reduction in this harmful practice across the web.")

But Slashdot reader TheNameOfNick disagrees. "How realistic is the part where advertisers stop tracking you because they get less information from the browser maker...?"

Mozilla has provided simple instructions for disabling the feature:
  • Click the menu button and select Settings.
  • In the Privacy & Security panel, find the Website Advertising Preferences section.
  • Uncheck the box labeled Allow websites to perform privacy-preserving ad measurement.
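For users who manage settings in a user.js file, the checkbox above is reported to map to a single about:config preference. The pref name below is taken from community documentation of the PPA rollout, so treat it as an assumption and verify it in your own browser:

```javascript
// user.js -- disable Privacy-Preserving Attribution in Firefox 128.
// Pref name as reported for the PPA experiment; confirm in about:config.
user_pref("dom.private-attribution.submission.enabled", false);
```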

The Internet

The Data That Powers AI Is Disappearing Fast (nytimes.com)

An anonymous reader quotes a report from the New York Times: For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models. Now, that data is drying up. Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group. The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an "emerging crisis in consent," as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets -- called C4, RefinedWeb and Dolma -- 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt. The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites' terms of service. "We're seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities," said Shayne Longpre, the study's lead author, in an interview.
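The Robots Exclusion Protocol opt-outs the study counted are just plain-text rules. A minimal sketch of one, checked with Python's standard-library parser (the blocked bot names, GPTBot and CCBot, are illustrative examples of AI crawlers publishers commonly list):

```python
from urllib.robotparser import RobotFileParser

# An example robots.txt of the kind publishers now use to opt out of
# AI training crawlers while still allowing ordinary visitors.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved AI crawler is expected to check before fetching:
print(parser.can_fetch("GPTBot", "https://example.com/article"))      # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/article")) # True
```

The catch, as the study notes, is that robots.txt is purely advisory: it only restricts crawlers that choose to honor it.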

AI

It May Soon Be Legal To Jailbreak AI To Expose How It Works (404media.co)

An anonymous reader quotes a report from 404 Media: A group of researchers, academics, and hackers are trying to make it easier to break AI companies' terms of service to conduct "good faith research" that exposes biases, inaccuracies, and training data without fear of being sued. The U.S. government is currently considering an exemption to U.S. copyright law that would allow people to break technical protection measures and digital rights management (DRM) on AI systems to learn more about how they work, probe them for bias, discrimination, harmful and inaccurate outputs, and to learn more about the data they are trained on. The exemption would allow for "good faith" security and academic research and "red-teaming" of AI products even if the researcher had to circumvent systems designed to prevent that research. The proposed exemption has the support of the Department of Justice, which said "good faith research can help reveal unintended or undisclosed collection or exposure of sensitive personal data, or identify systems whose operations or outputs are unsafe, inaccurate, or ineffective for the uses for which they are intended or marketed by developers, or employed by end users. Such research can be especially significant when AI platforms are used for particularly important purposes, where unintended, inaccurate, or unpredictable AI output can result in serious harm to individuals."

Much of what we know about how closed-source AI tools like ChatGPT, Midjourney, and others work comes from researchers, journalists, and ordinary users purposefully trying to trick these systems into revealing something about the data they were trained on (which often includes copyrighted material indiscriminately and secretly scraped from the internet), their biases, and their weaknesses. Doing this type of research can often violate the terms of service users agree to when they sign up for a system. For example, OpenAI's terms of service state that users cannot "attempt to or assist anyone to reverse engineer, decompile or discover the source code or underlying components of our Services, including our models, algorithms, or systems (except to the extent this restriction is prohibited by applicable law)," and add that users must not "circumvent any rate limits or restrictions or bypass any protective measures or safety mitigations we put on our Services."

Shayne Longpre, an MIT researcher who is part of the team pushing for the exemption, told me that "there is a lot of apprehensiveness about these models and their design, their biases, being used for discrimination, and, broadly, their trustworthiness." "But the ecosystem of researchers looking into this isn't super healthy. There are people doing the work but a lot of people are getting their accounts suspended for doing good-faith research, or they are worried about potential legal ramifications of violating terms of service," he added. "These terms of service have chilling effects on research, and companies aren't very transparent about their process for enforcing terms of service." The exemption would be to Section 1201 of the Digital Millennium Copyright Act, a sweeping copyright law. Other 1201 exemptions, which must be applied for and renewed every three years as part of a process through the Library of Congress, allow for the hacking of tractors and electronic devices for the purpose of repair, have carveouts that protect security researchers who are trying to find bugs and vulnerabilities, and in certain cases protect people who are trying to archive or preserve specific types of content.
Harley Geiger of the Hacking Policy Council said that an exemption is "crucial to identifying and fixing algorithmic flaws to prevent harm or disruption," and added that a "lack of clear legal protection under DMCA Section 1201 adversely affect such research."

Links

Google URL Shortener Links Will Return a 404 Response

In 2018, Google replaced its URL shortener service, goo.gl, with Firebase Dynamic Links, citing "the changes we've seen in how people find content on the internet, and the number of new popular URL shortening services that emerged in that time." Although it stopped accepting new URLs to shorten, it continued to serve existing URLs created with the service. That's about to change on August 25th, 2025, when Google will turn off the serving portion of Google URL Shortener.

"Any developers using links built with the Google URL Shortener in the form https://goo.gl/* will be impacted, and these URLs will no longer return a response after August 25th, 2025," says Google in a blog post today. "Starting August 23, 2024, goo.gl links will start displaying an interstitial page for a percentage of existing links notifying your users that the link will no longer be supported after August 25th, 2025 prior to navigating to the original target page. Over time the percentage of links that will show the interstitial page will increase until the shutdown date." All links will return a 404 response after the shutdown date.

Privacy

USPS Shared Customers' Postal Addresses With Meta, LinkedIn and Snap (techcrunch.com)

An anonymous reader quotes a report from TechCrunch: The U.S. Postal Service was sharing the postal addresses of its online customers with advertising and tech giants Meta, LinkedIn and Snap, TechCrunch has found. On Wednesday, the USPS said it addressed the issue and stopped the practice, claiming that it was "unaware" of it. TechCrunch found USPS was sharing customers' information by way of hidden data-collecting code (also known as tracking pixels) used across its website. Tech and advertising companies create this kind of code to collect information about the user -- such as which pages they visit -- every time a webpage containing the code loads in the customer's browser.

In the case of USPS, some of that collected data included the postal addresses of logged-in USPS Informed Delivery customers, who use the service to see photos of their incoming mail before it arrives. It's not clear how many individuals had their information collected or for how long. Informed Delivery had more than 62 million users (PDF) as of March 2024. [...] The code also collected other data, such as information about the user's computer type and browser, which appeared as partly pseudonymized -- essentially scrambled in a way that makes it more difficult for humans to know where data came from, or who it relates to, by using randomized identifiers in place of real customer names. But researchers have long warned that pseudonymous data can still be used to re-identify seemingly anonymous individuals.
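As a generic illustration of pseudonymization (not USPS's or its analytics vendor's actual pipeline), salting and hashing an identifier yields a stable, random-looking token -- and that stability is exactly why researchers warn such data can still be linked back to a person:

```python
import hashlib
import secrets

# Illustrative sketch only. A secret per-dataset salt plus a hash turns
# a real identifier into a token that hides the original value. But the
# token is *stable*: the same person always maps to the same token, so
# anyone who can join it against other data can re-identify them.
SALT = secrets.token_hex(16)

def pseudonymize(identifier: str) -> str:
    digest = hashlib.sha256((SALT + identifier).encode("utf-8")).hexdigest()
    return digest[:16]

token = pseudonymize("Jane Doe, 123 Main St")
assert token == pseudonymize("Jane Doe, 123 Main St")  # stable across events
```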

TechCrunch also found that tracking numbers entered into the USPS website were also shared with advertisers and tech companies, including Bing, Google, LinkedIn, Pinterest and Snap. Some in-transit tracking data was also shared, such as the real-world location of the mail in the postal system, even if the customer was not logged in to USPS' website.
USPS spokesperson Jim McKean said in a statement: "The Postal Service leverages an analytics platform for our own internal purposes, so that we understand the usage of our products and services and which we use on an aggregated basis to market our products. The Postal Service does not sell or provide any personal information that is collected from this analytics platform to any third party, and we were unaware of any configuration of the platform that collected personal information from the URL and that shared it without our knowledge with social media."

"We have taken immediate action to remediate this issue," the spokesperson added, without saying what action was taken.

The Internet

Cloudflare Reports Almost 7% of Internet Traffic Is Malicious (zdnet.com)

In its latest State of Application Security Report, Cloudflare says 6.8% of traffic on the internet is malicious, "up a percentage point from last year's study," writes ZDNet's Steven Vaughan-Nichols. "Cloudflare, the content delivery network and security services company, thinks the rise is due to wars and elections. For example, many attacks against Western-interest websites are coming from pro-Russian hacktivist groups such as REvil, KillNet, and Anonymous Sudan." From the report: [...] Distributed Denial of Service (DDoS) attacks continue to be cybercriminals' weapon of choice, making up over 37% of all mitigated traffic. The scale of these attacks is staggering. In the first quarter of 2024 alone, Cloudflare blocked 4.5 million unique DDoS attacks. That total is nearly a third of all the DDoS attacks they mitigated the previous year. But it's not just about the sheer volume of DDoS attacks. The sophistication of these attacks is increasing, too. Last August, Cloudflare mitigated a massive HTTP/2 Rapid Reset DDoS attack that peaked at 201 million requests per second (RPS). That number is three times bigger than any previously observed attack.

The report also highlights the increased importance of application programming interface (API) security. With 60% of dynamic web traffic now API-related, these interfaces are a prime target for attackers. API traffic is growing twice as fast as traditional web traffic. What's worrying is that many organizations appear to be unaware of a quarter of their API endpoints. Organizations that don't have a tight grip on their internet services or website APIs can't possibly protect themselves from attackers. Evidence suggests the average enterprise application now uses 47 third-party scripts and connects to nearly 50 third-party destinations. Do you know and trust these scripts and connections? You should -- each script or connection is a potential security risk. For instance, the recent Polyfill.io JavaScript incident affected over 380,000 sites.

Finally, about 38% of all HTTP requests processed by Cloudflare are classified as automated bot traffic. Some bots are good and perform a needed service, such as customer service chatbots, or are authorized search engine crawlers. However, as many as 93% of bots are potentially bad.

The Courts

Federal Court Blocks Net Neutrality Rules (theverge.com)

An anonymous reader quotes a report from The Verge: A federal appeals court has agreed to halt the reinstatement of net neutrality rules until August 5th, while the court considers whether more permanent action is justified. It's the latest setback in a long back and forth on net neutrality -- the principle that internet service providers (ISPs) should not be able to block or throttle internet traffic in a discriminatory manner. The Federal Communications Commission has sought to achieve this by reclassifying ISPs under Title II of the Communications Act, which gives the agency greater regulatory oversight. The Democratic-led agency enacted net neutrality rules under the Obama administration, only for those rules to be repealed under former President Donald Trump's FCC. The current FCC, which has three Democratic and two Republican commissioners, voted in April to bring back net neutrality. The 3-2 vote was divided along party lines.

Broadband providers have since challenged the FCC's action, which is potentially more vulnerable after the Supreme Court's recent decision to strike down Chevron deference -- a legal doctrine that instructed courts to defer to an agency's expert decisions except in a very narrow range of circumstances. Bloomberg Intelligence analyst Matt Schettenhelm said in a report prior to the court's ruling that he doesn't expect the FCC to prevail in court, in large part due to the demise of Chevron. A panel of judges for the Sixth Circuit Court of Appeals said in an order that a temporary "administrative stay is warranted" while it considers the merits of the broadband providers' request for a permanent stay. The administrative stay will be in place until August 5th. In the meantime, the court requested the parties provide additional briefs about the application of National Cable & Telecommunications Association v. Brand X Internet Services to this lawsuit.

The Internet

NYC's Massive Link5G Towers Aren't Actually Providing 5G (gothamist.com)

An anonymous reader shares a report: The vast majority of the massive, metallic towers the city commissioned to help low-income neighborhoods access high-speed 5G internet still lack cell signal equipment -- more than two years after hundreds of the structures began sprouting across the five boroughs. Just two of the nearly 200 Link5G towers installed by tech firm CityBridge since 2022 have been fitted with 5G equipment, company officials said. Delayed installations and cooling enthusiasm around 5G technology have discouraged carriers like Verizon from using the towers to build out their networks, experts say. The firm only has an agreement with a single telecommunications carrier to deliver high-speed internet, stymieing its efforts to boost mobile connectivity citywide.

The 32-foot-tall structures, which resemble giant tampon applicators emerging from the sidewalk, offer the same services as the LinkNYC electronic billboards that popped up around the city in 2016. Those were also installed by CityBridge. Both the original Link kiosks and the 5G towers provide free limited-range Wi-Fi, charging outlets and a tablet to connect users to city services. Data shared by the company shows that 16 million people have used the internet at kiosks since 2016, and the attached tablets are used to call for city services thousands of times each month. But unlike the LinkNYC kiosks, each new tower is topped with a 12-foot-tall cylindrical mesh chamber containing five empty shelves reserved for companies like Verizon and T-Mobile to store the equipment they use to transmit high-speed 5G internet service to paying customers.

Microsoft

Palestinians Say Microsoft Unfairly Closing Their Accounts (bbc.co.uk)

Ancient Slashdot reader Alain Williams writes: Palestinians living abroad have accused Microsoft of closing their email accounts without warning -- cutting them off from crucial online services. They say it has left them unable to access bank accounts and job offers -- and stopped them using Skype, which Microsoft owns, to contact relatives in war-torn Gaza. Microsoft says they violated its terms of service -- a claim they dispute. One man the BBC spoke to said being cut off from Skype was a huge blow for his family. The internet in Gaza is frequently disrupted or switched off because of the Israeli military campaign -- and standard international calls are very expensive. [...] With a paid Skype subscription, it is possible to call mobiles in Gaza cheaply -- and while the internet is down -- so it has become a lifeline for many Palestinians.

Some of the people the BBC spoke to said they suspected they were wrongly thought to have ties to Hamas, which Israel is fighting, and is designated a terrorist organization by many countries. Microsoft did not respond directly when asked if suspected ties to Hamas were the reason for the accounts being shut. But a spokesperson said it did not block calls or ban users based on calling region or destination. "Blocking in Skype can occur in response to suspected fraudulent activity," they said, without elaborating.

Social Networks

In a First, Federal Regulators Ban Messaging App From Hosting Minors (washingtonpost.com)

An anonymous reader quotes a report from the Washington Post: Federal regulators have for the first time banned a digital platform from serving users under 18 (Warning: source may be paywalled; alternative source), accusing the app -- known as NGL -- of exaggerating its ability to use artificial intelligence to curb cyberbullying in a groundbreaking settlement. An app popular among children and teens, NGL aggressively marketed to young users despite risks of bullying on the anonymous messaging site, the Federal Trade Commission and the Los Angeles District Attorney's Office alleged in a complaint unveiled Tuesday.

The complaint alleged that NGL tricked users into paying for subscriptions by sending them computer-generated messages appearing to be from real people and offering a service for as much as $9.99 a week to find out their real identity. People who signed up received only "hints" of those identities, whether they were real or not, enforcers said. After users complained about the "bait-and-switch" tactic, executives at the company "laughed off" their concerns, referring to them as "suckers," the FTC said in an announcement. NGL, internet shorthand for "not gonna lie," agreed to pay $5 million and stop marketing to kids and teens to settle the lawsuit, which also alleged that the company violated children's privacy laws by collecting data from youths under 13 without parental consent.

The settlement marks a major milestone in the federal government's efforts to tackle concerns that tech platforms are exposing children to noxious material and profiting from it. And it's one of the most significant actions by the FTC under Chair Lina Khan, who has dialed up scrutiny of the tech sector at the agency since taking over in 2021. "We will keep cracking down on businesses that unlawfully exploit kids for profit," Khan (D) said in a statement.
NGL co-founder Joao Figueiredo said in a statement Tuesday that the company cooperated with the FTC's investigation for nearly two years and viewed the "resolution as an opportunity to make NGL better than ever."

"While we believe many of the allegations around the youth of our user base are factually incorrect, we anticipate that the agreed upon age-gating and other procedures will now provide direction for others in our space, and hopefully improve policies generally."

United States

US Nuke Agency Buys Internet Backbone Data (404media.co)

A U.S. government agency tasked with supporting the nation's nuclear deterrence capability has bought access to a data tool that claims to cover more than 90 percent of the world's internet traffic, and can in some cases let users trace activity through virtual private networks, according to documents obtained by 404 Media. From the report: The documents provide more insight into the use cases and customers of so-called netflow data, which can show which server communicated with another, information that is ordinarily only available to the server's owner or the internet service provider (ISP) handling the traffic. Other agencies that have purchased the data include the U.S. Army, NCIS, the FBI, and the IRS, with some government clients saying it would take too long to get data from the NSA, so they bought this tool instead. In this case, the Defense Threat Reduction Agency (DTRA) says it is using the data to perform vulnerability assessments of U.S. and allied systems.

A document written by the DTRA and obtained by 404 Media says the agency "has a requirement to support ongoing assessments of the vulnerability of critical U.S. and allied national/theater mission systems, networks, architectures, infrastructures, and assets." The tool "is capable of following communications between servers, even private servers," which allows the agency to identify infrastructure used by malicious actors, the document continues. That contract was for $490,000 in 2023, according to the document. 404 Media obtained the document and others under a Freedom of Information Act (FOIA) request.

The Internet

Substack Rival Ghost Federates Its First Newsletter (techcrunch.com)

After teasing support for the fediverse earlier this year, the newsletter platform and Substack rival Ghost has finally delivered. "Over the past few days, Ghost says it has achieved two major milestones in its move to become a federated service," reports TechCrunch. "Of note, it has federated its own newsletter, making it the first federated Ghost instance on the internet." From the report: Users can follow the newsletter through their preferred federated app at @index@activitypub.ghost.org, though the company warns there will be bugs and issues as it continues to work on the platform's integration with ActivityPub, the protocol that powers Mastodon and other federated apps. "Having multiple Ghost instances in production successfully running ActivityPub is a huge milestone for us because it means that for the first time, we're interacting with the wider fediverse. Not just theoretical local implementations and tests, but the real world wide social web," the company shared in its announcement of the news.

In addition, Ghost's ActivityPub GitHub repository is now fully open source. That means those interested in tracking Ghost's progress toward federation can follow its code changes in real time, and anyone else can learn from, modify, distribute or contribute to its work. Developers who want to collaborate with Ghost are also being invited to get involved following this move. By offering a federated version of the newsletter, readers will have more choices on how they want to subscribe. That is, instead of only being able to follow the newsletter via email or the web, they also can track it using RSS or ActivityPub-powered apps, like Mastodon and others. Ghost said it will also develop a way for sites with paid subscribers to manage access via ActivityPub, but that functionality hasn't yet rolled out with this initial test.
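Following a handle like @index@activitypub.ghost.org works the same way as for any fediverse account: the client first performs a WebFinger lookup (RFC 7033) to discover the actor document. A minimal sketch of the discovery URL a client constructs -- this is the standard protocol flow, not Ghost-specific behavior:

```python
def webfinger_url(handle: str) -> str:
    """Build the WebFinger discovery URL for a fediverse handle.

    Per RFC 7033, clients resolve @user@domain by fetching
    /.well-known/webfinger on the domain with an acct: resource;
    the JSON response then links to the ActivityPub actor to follow.
    """
    user, domain = handle.lstrip("@").split("@")
    return (f"https://{domain}/.well-known/webfinger"
            f"?resource=acct:{user}@{domain}")

print(webfinger_url("@index@activitypub.ghost.org"))
# https://activitypub.ghost.org/.well-known/webfinger?resource=acct:index@activitypub.ghost.org
```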

The Internet

Internet Archive Blames 'Environmental Factors' For Overnight Outages (theregister.com)

The Internet Archive took a tumble overnight after "environmental factors" downed the Wayback Machine, leaving archive.org wobbling in a way that might bring a smile to the faces of certain publishers wishing for its demise. From a report: According to the organization, there was a "brief power outage in one of our datacenters," which was followed by "environmental factors," causing the service blackout. Those environmental factors are likely to be an increase in heat following a cooling outage. By this morning, The Internet Archive was reporting that things were back up and running again. However, some users (this writer included) are still experiencing the odd error or two when accessing the organization's services.

Anime

Popular Pirate Site Animeflix Shuts Down 'Voluntarily' (torrentfreak.com)

An anonymous reader quotes a report from TorrentFreak: With tens of millions of monthly visits, Animeflix positioned itself as one of the most popular anime piracy portals. The site also has an active Discord community of around 35k members, who actively participate in discussions, art competitions, and even a chess tournament. While rightsholders take no offense at these side projects, the site's core business was streaming pirated videos. That hasn't gone unnoticed; last December, Animeflix was listed as one of the shutdown targets of anti-piracy coalition ACE.

Whether these early enforcement efforts were responsible for the site's closure is unclear. In May, rightsholders increased the pressure through the High Court of India, obtaining a broad injunction that effectively suspended Animeflix's main domain name: Animeflix.live. This follow-up action didn't seem to hurt the site too much. It simply moved to new domains, Animeflix.gg and Animeflix.li, informing its users that the old domain name had become "unavailable." Yesterday, the site became unreachable again, initially returning a Cloudflare error message. This time, the domain wasn't the problem but, for reasons unknown, the team decided to shut down the site without prior notice.

"It is with a heavy heart that we announce the closure of Animeflix. After careful consideration, we have decided to shut down our service effective immediately. We deeply appreciate your support and enthusiasm over the years." "Thank you for being a part of our journey. We hope the joy and excitement of anime continue to brighten your days through other wonderful platforms," the Animeflix team adds. The Animeflix team doesn't provide any insight into its reasoning, but it's clear that keeping a site like that online isn't without challenges. And, when a pirate site shuts down, voluntarily or not, copyright issues typically play a role. It's clear that rightsholders were keeping an eye on the site, and were actively seeking out options to take it offline. That might have played a role in the shutdown decision but without more information from the team, we can only speculate.

Security

384,000 Sites Pull Code From Sketchy Code Library Recently Bought By Chinese Firm (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: More than 384,000 websites are linking to a site that was caught last week performing a supply-chain attack that redirected visitors to malicious sites, researchers said. For years, the JavaScript code, hosted at polyfill[.]io, was a legitimate open source project that allowed older browsers to handle advanced functions that weren't natively supported. By linking to cdn.polyfill[.]io, websites could ensure that devices using legacy browsers could render content in newer formats. The free service was popular among websites because all they had to do was embed the link in their sites. The code hosted on the polyfill site did the rest. In February, China-based company Funnull acquired the domain and the GitHub account that hosted the JavaScript code. On June 25, researchers from security firm Sansec reported that code hosted on the polyfill domain had been changed to redirect users to adult- and gambling-themed websites. The code was deliberately designed to mask the redirections by performing them only at certain times of the day and only against visitors who met specific criteria.

The revelation prompted industry-wide calls to take action. Two days after the Sansec report was published, domain registrar Namecheap suspended the domain, a move that effectively prevented the malicious code from running on visitor devices. Even then, content delivery networks such as Cloudflare began automatically replacing polyfill links with domains leading to safe mirror sites. Google blocked ads for sites embedding the Polyfill[.]io domain. The website blocker uBlock Origin added the domain to its filter list. And Andrew Betts, the original creator of Polyfill.io, urged website owners to remove links to the library immediately. As of Tuesday, exactly one week after the malicious behavior came to light, 384,773 sites continued to link to the site, according to researchers from security firm Censys. Some of the sites were associated with mainstream companies including Hulu, Mercedes-Benz, and Warner Bros., as well as the federal government. The findings underscore the power of supply-chain attacks, which can spread malware to thousands or millions of people simply by infecting a common source they all rely on.
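For site owners wondering whether their pages are among those still linking out, a crude audit is to scan rendered HTML for script tags that point at the domain. An illustrative sketch (not the methodology Censys used):

```python
import re

SUSPECT_HOSTS = ("polyfill.io",)  # also matches cdn.polyfill.io

def find_suspect_scripts(html: str) -> list[str]:
    # Pull the src of every <script> tag and keep those pointing at the
    # compromised domain. A crude string check, but enough for an audit.
    srcs = re.findall(r'<script[^>]+\bsrc=["\']([^"\']+)["\']', html,
                      flags=re.IGNORECASE)
    return [s for s in srcs if any(host in s for host in SUSPECT_HOSTS)]

page = """<html><head>
<script src="https://cdn.polyfill.io/v3/polyfill.min.js"></script>
<script src="/static/app.js"></script>
</head></html>"""
print(find_suspect_scripts(page))
# ['https://cdn.polyfill.io/v3/polyfill.min.js']
```

A longer-term defense is to self-host such libraries, or to pin any remaining third-party scripts with Subresource Integrity hashes so that a silently changed file simply fails to load.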

Google

Google Paper: AI Potentially Breaking Reality Is a Feature Not a Bug (404media.co) 82

An anonymous reader shares a report: Generative AI could "distort collective understanding of socio-political reality or scientific consensus," and in many cases is already doing that, according to a new research paper from Google, one of the biggest companies in the world building, deploying, and promoting generative AI. The paper, "Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data," [PDF] was co-authored by researchers at Google's artificial intelligence research laboratory DeepMind, its security think tank Jigsaw, and its charitable arm Google.org, and aims to classify the different ways generative AI tools are being misused by analyzing about 200 incidents of misuse as reported in the media and research papers between January 2023 and March 2024.

Unlike self-serving warnings from OpenAI CEO Sam Altman or Elon Musk about the "existential risk" artificial general intelligence poses to humanity, Google's research focuses on real harm that generative AI is currently causing and could get worse in the future. Namely, that generative AI makes it very easy for anyone to flood the internet with generated text, audio, images, and videos. Much like another Google research paper about the dangers of generative AI I covered recently, Google's methodology here likely undercounts instances of AI-generated harm. But the most interesting observation in the paper is that the vast majority of these harms and how they "undermine public trust," as the researchers say, are often "neither overtly malicious nor explicitly violate these tools' content policies or terms of service." In other words, that type of content is a feature, not a bug.

The Internet

Cloudflare Rolls Out Feature For Blocking AI Companies' Web Scrapers (siliconangle.com) 40

Cloudflare today unveiled a new feature for its content delivery network (CDN) that prevents AI developers from scraping content on the web. According to Cloudflare, the feature is available for both the free and paid tiers of its service. SiliconANGLE reports: The feature uses AI to detect automated content extraction attempts. According to Cloudflare, its software can spot bots that scrape content for LLM training projects even when they attempt to avoid detection. "Sadly, we've observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent," Cloudflare engineers wrote in a blog post today. "We've monitored this activity over time, and we're proud to say that our global machine learning model has always recognized this activity as a bot."

One of the crawlers that Cloudflare managed to detect is a bot that collects content for Perplexity AI Inc., a well-funded search engine startup. Last month, Wired reported that the manner in which the bot scrapes websites makes its requests appear as regular user traffic. As a result, website operators have struggled to block Perplexity AI from using their content. Cloudflare assigns every website visit that its platform processes a score of 1 to 99. The lower the number, the greater the likelihood that the request was generated by a bot. According to the company, requests made by the bot that collects content for Perplexity AI consistently receive a score under 30.

"When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint," Cloudflare's engineers detailed. "For every fingerprint we see, we use Cloudflare's network, which sees over 57 million requests per second on average, to understand how much we should trust this fingerprint." Cloudflare will update the feature over time to address changes in AI scraping bots' technical fingerprints and the emergence of new crawlers. As part of the initiative, the company is rolling out a tool that will enable website operators to report any new bots they may encounter.
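The 1-to-99 bot score described above lends itself to a simple threshold rule. A hedged sketch of how an operator-side filter might apply it (the function name and the use of the article's "under 30" figure as a cutoff are illustrative assumptions; Cloudflare's actual enforcement happens inside its own platform, not in customer code):

```python
# Article: scores run 1-99, lower = more likely a bot;
# Perplexity's crawler consistently scored under 30.
BOT_SCORE_THRESHOLD = 30

def should_block(bot_score: int, block_ai_scrapers: bool = True) -> bool:
    """Decide whether to reject a request, given its 1-99 bot score
    (lower means more likely automated)."""
    if not block_ai_scrapers:
        return False
    return bot_score < BOT_SCORE_THRESHOLD

# Example: a request scored 12 (likely bot) vs. one scored 85 (likely human)
print(should_block(12), should_block(85))
```

The `block_ai_scrapers` flag mirrors the opt-in nature of the feature: operators on both free and paid tiers choose whether to enable blocking.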

AI

AI Trains On Kids' Photos Even When Parents Use Strict Privacy Settings (arstechnica.com) 33

An anonymous reader quotes a report from Ars Technica: Human Rights Watch (HRW) continues to reveal how photos of real children casually posted online years ago are being used to train AI models powering image generators -- even when platforms prohibit scraping and families use strict privacy settings. Last month, HRW researcher Hye Jung Han found 170 photos of Brazilian kids that were linked in LAION-5B, a popular AI dataset built from Common Crawl snapshots of the public web. Now, she has released a second report, flagging 190 photos of children from all of Australia's states and territories, including Indigenous children who may be particularly vulnerable to harms. These photos are linked in the dataset "without the knowledge or consent of the children or their families." They span the entirety of childhood, making it possible for AI image generators to generate realistic deepfakes of real Australian children, Han's report said. Perhaps even more concerning, the URLs in the dataset sometimes reveal identifying information about children, including their names and locations where photos were shot, making it easy to track down children whose images might not otherwise be discoverable online. That puts children in danger of privacy and safety risks, Han said, and some parents thinking they've protected their kids' privacy online may not realize that these risks exist.

From a single link to one photo that showed "two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural," Han could trace "both children's full names and ages, and the name of the preschool they attend in Perth, in Western Australia." And perhaps most disturbingly, "information about these children does not appear to exist anywhere else on the Internet" -- suggesting that families were particularly cautious in shielding these boys' identities online. Stricter privacy settings were used in another image that Han found linked in the dataset. The photo showed "a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating" during the week after their final exams, Han reported. Whoever posted that YouTube video adjusted privacy settings so that it would be "unlisted" and would not appear in searches. Only someone with a link to the video was supposed to have access, but that didn't stop Common Crawl from archiving the image, nor did YouTube policies prohibiting AI scraping or harvesting of identifying information.

Reached for comment, YouTube's spokesperson, Jack Malon, told Ars that YouTube has "been clear that the unauthorized scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse." But Han worries that even if YouTube did join efforts to remove images of children from the dataset, the damage has been done, since AI tools have already trained on them. That's why -- even more than parents need tech companies to up their game blocking AI training -- kids need regulators to intervene and stop training before it happens, Han's report said. Han's report comes a month before Australia is expected to release a reformed draft of the country's Privacy Act. Those reforms include a draft of Australia's first child data protection law, known as the Children's Online Privacy Code, but Han told Ars that even people involved in long-running discussions about reforms aren't "actually sure how much the government is going to announce in August." "Children in Australia are waiting with bated breath to see if the government will adopt protections for them," Han said, emphasizing in her report that "children should not have to live in fear that their photos might be stolen and weaponized against them."

United States

Will a US Supreme Court Ruling Put Net Neutrality at Risk? (msn.com) 192

Today the Wall Street Journal reported that restoring net neutrality to America is "on shakier legal footing after a Supreme Court decision on Friday shifted power away from federal agencies." "It's hard to overstate the impact that this ruling could have on the regulatory landscape in the United States going forward," said Leah Malone, a lawyer at Simpson Thacher & Bartlett. "This could really bind U.S. agencies in their efforts to write new rules." Now that [the "Chevron deference"] is gone, the Federal Communications Commission is expected to have a harder time reviving net neutrality — a set of policies barring internet-service providers from assigning priority to certain web traffic...

The Federal Communications Commission reclassified internet providers as public utilities under the Communications Act. There are pending court cases challenging the FCC's reinterpretation of that 1934 law, and the demise of Chevron deference heightens the odds of the agency losing in court, some legal experts said. "Chevron's thumb on the scale in favor of the agencies was crucial to their chances of success," said Geoffrey Manne, president of the International Center for Law and Economics. "Now that that's gone, their claims are significantly weaker."

Other federal agencies could also be affected, according to the article. The ruling could also make it harder for America's Environmental Protection Agency to crack down on power-plant pollution. And the Federal Trade Commission faces more trouble in court defending its recent ban on noncompete agreements. Lawyer Daniel Jarcho tells the Journal that the Court's decision "will unquestionably lead to more litigation challenging federal agency actions, and more losses for federal agencies."

Friday a White House press secretary issued a statement calling the court's decision "deeply troubling," and arguing that the court had "decided in favor of special interests".

The Internet

Japan Achieves 402 TB/s Data Rate - Using Current Fiber Technology (tomshardware.com) 21

Tom's Hardware reports that Japan's National Institute of Information and Communications Technology (working with the Aston Institute of Photonic Technologies and Nokia Bell Labs) set a 402 terabits per second data transfer record — over commercially available optical fiber cables. The NICT and its partners were able to transmit signals through 1,505 channels over 50 km (about 31 miles) of optical fiber cable for this experiment. It used six types of amplifiers and an optical gain equalizer that taps into the unused 37 THz bandwidth to enable the 402 Tb/s transfer speed. One of the amplifiers demonstrated is a thulium-doped fiber amplifier, which works in C-band or C+L-band systems. Additionally, semiconductor optical amplifiers and Raman amplifiers were used, which achieved a 256 Tb/s data rate through almost 20 THz. Other amplifiers were also used for this exercise, which provided a cumulative bandwidth of 25 THz for up to a 119 Tb/s data rate.

As a result, its maximum achievable result surpassed the previous data rate capacity by over 25 percent and increased transmission bandwidth by 35 percent.
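A rough back-of-envelope check on those figures (assuming, as the article implies but does not state outright, that the 402 Tb/s aggregate is spread across all 1,505 channels and the full 37 THz band):

```python
total_rate_bps = 402e12   # 402 Tb/s aggregate, per the article
channels = 1505           # wavelength channels used in the experiment
bandwidth_hz = 37e12      # 37 THz of previously unused optical bandwidth

per_channel_gbps = total_rate_bps / channels / 1e9
spectral_eff = total_rate_bps / bandwidth_hz  # average bits per second per hertz

print(f"~{per_channel_gbps:.0f} Gb/s per channel")  # roughly 267 Gb/s
print(f"~{spectral_eff:.1f} b/s/Hz average")        # roughly 10.9 b/s/Hz
```

These are averages across the whole band; real systems allocate different modulation formats per channel, so individual channel rates would vary.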

"This is achievable with currently available technology used by internet service providers..." the article points out.

"With 'beyond 5G' potential speeds achievable through commercially available cables, it will likely further a new generation of internet services."

Slashdot Top Deals