AI

AI Has Already Run Out of Training Data, Goldman's Data Chief Says (businessinsider.com) 81

AI has run out of training data, according to Neema Raphael, Goldman Sachs' chief data officer and head of data engineering. "We've already run out of data," Raphael said on the bank's podcast. He said this shortage is already shaping how developers build new AI systems. China's DeepSeek may have kept costs down by training on outputs from existing models instead of fresh data. The web has been tapped out.

Developers have been using synthetic data -- machine-generated material that offers unlimited supply but carries quality risks. Raphael said he doesn't think the lack of fresh data will be a massive constraint. "From an enterprise perspective, I think there's still a lot of juice I'd say to be squeezed in that," he said. Proprietary datasets held by corporations could make AI tools far more valuable. The challenge is "understanding the data, understanding the business context of the data, and then being able to normalize it."
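
The mechanics of that approach are easy to sketch. Below is a minimal, hypothetical example of synthetic-data generation -- sampling completions from an existing model and saving them as new training text. The model choice (gpt2), the prompts, and the file name are our illustrative assumptions, not details from the story.

```python
# Minimal sketch: generate synthetic training text with an existing model.
# Assumes the Hugging Face transformers library; gpt2 stands in for any
# "teacher" model whose outputs become training data for the next model.
import json

from transformers import pipeline, set_seed

set_seed(42)  # make the sampling reproducible
generator = pipeline("text-generation", model="gpt2")

seed_prompts = [
    "The quarterly earnings report showed",
    "In enterprise data engineering, the hardest problem is",
]

with open("synthetic_corpus.jsonl", "w") as f:
    for prompt in seed_prompts:
        # Sample several continuations per prompt; do_sample with a high
        # temperature trades coherence for variety.
        outputs = generator(prompt, max_new_tokens=60, num_return_sequences=3,
                            do_sample=True, temperature=0.9)
        for out in outputs:
            # Each continuation becomes a "fresh" training record -- the
            # quality risks noted above are why such data needs filtering.
            f.write(json.dumps({"text": out["generated_text"]}) + "\n")
```

A student model trained on such a file inherits the teacher's blind spots, which is why synthetic data is usually filtered, deduplicated, and mixed with human-written text rather than used alone.
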
Science

Experimental Gene Therapy Found To Slow Huntington's Disease Progression (bbc.com) 13

Doctors report the first successful treatment for Huntington's disease using a new type of gene therapy given during 12 to 18 hours of delicate brain surgery. The BBC reports: An emotional research team became tearful as they described how data shows the disease was slowed by 75% in patients. It means the decline you would normally expect in one year would take four years after treatment, giving patients decades of "good quality life", Prof Sarah Tabrizi told BBC News. The first symptoms of Huntington's disease tend to appear in your 30s or 40s, and the disease is normally fatal within two decades -- raising the possibility that treating patients before symptoms appear could prevent them from ever emerging. None of the patients who have been treated are being identified, but one was medically retired and has returned to work. Others in the trial are still walking despite being expected to need a wheelchair. Treatment is likely to be very expensive. However, this is a moment of real hope in a disease that hits people in their prime and devastates families. [...]

It starts with a safe virus that has been altered to contain a specially designed sequence of DNA. This is infused deep into the brain using real-time MRI scanning to guide a microcatheter to two brain regions -- the caudate nucleus and the putamen. This takes 12 to 18 hours of neurosurgery. The virus then acts like a microscopic postman -- delivering the new piece of DNA inside brain cells, where it becomes active. This turns the neurons into a factory for making the therapy to avert their own death. The cells produce a small fragment of genetic material (called microRNA) that is designed to intercept and disable the instructions (called messenger RNA) being sent from the cells' DNA for building mutant huntingtin. This results in lower levels of mutant huntingtin in the brain. [...]

The data showed that three years after surgery there was an average 75% slowing of the disease based on a measure that combines cognition, motor function and the ability to manage in daily life. The data also shows the treatment is saving brain cells. Levels of neurofilaments in spinal fluid -- a clear sign of brain cells dying -- should have increased by a third if the disease had continued to progress, but were actually lower than at the start of the trial.

NASA

How NASA Saved a Camera From 370 Million Miles Away (phys.org) 38

An anonymous reader quotes a report from Phys.org: The mission team of NASA's Jupiter-orbiting Juno spacecraft executed a deep-space move in December 2023 to repair its JunoCam imager in time to capture photos of the Jovian moon Io. Results from the long-distance save were presented during a technical session on July 16 at the Institute of Electrical and Electronics Engineers Nuclear & Space Radiation Effects Conference in Nashville. JunoCam is a color, visible-light camera. The optical unit for the camera is located outside a titanium-walled radiation vault, which protects sensitive electronic components for many of Juno's engineering and science instruments. This is a challenging location because Juno's travels carry it through the most intense planetary radiation fields in the solar system. While mission designers were confident JunoCam could operate through the first eight orbits of Jupiter, no one knew how long the instrument would last after that. Throughout Juno's first 34 orbits (its prime mission), JunoCam operated normally, returning images the team routinely incorporated into the mission's science papers. Then, during its 47th orbit, the imager began showing hints of radiation damage.

While the team knew the issue might be tied to radiation, pinpointing what was specifically damaged within JunoCam was difficult from hundreds of millions of miles away. Clues pointed to a damaged voltage regulator that was vital to JunoCam's power supply. With few options for recovery, the team turned to a process called annealing, where a material is heated for a specified period before slowly cooling. Although the process is not well understood, the idea is that heating can reduce defects in the material. Soon after the annealing process finished, JunoCam began cranking out crisp images for the next several orbits. But Juno was flying deeper and deeper into the heart of Jupiter's radiation fields with each pass. By orbit 55, the imagery had again begun showing problems, and by orbit 56 nearly all the images were corrupted.

"After orbit 55, our images were full of streaks and noise," said JunoCam instrument lead Michael Ravine of Malin Space Science Systems. "We tried different schemes for processing the images to improve the quality, but nothing worked. With the close encounter of Io bearing down on us in a few weeks, it was Hail Mary time: The only thing left we hadn't tried was to crank JunoCam's heater all the way up and see if more extreme annealing would save us." Test images sent back to Earth during the annealing showed little improvement in the first week. Then, with the close approach of Io only days away, the images began to improve dramatically. By the time Juno came within 930 miles (1,500 kilometers) of the volcanic moon's surface on Dec. 30, 2023, the images were almost as good as the day the camera launched, capturing detailed views of Io's north polar region that revealed mountain blocks covered in sulfur dioxide frosts rising sharply from the plains and previously uncharted volcanoes with extensive flow fields of lava. To date, the solar-powered spacecraft has orbited Jupiter 74 times. Recently, the image noise returned during Juno's 74th orbit.

Android

Android 16 Is Here (blog.google) 23

An anonymous reader shares a blog post from Google: Today, we're bringing you Android 16, rolling out first to supported Pixel devices with more phone brands to come later this year. This is the earliest Android has launched a major release in the last few years, which ensures you get the latest updates as soon as possible on your devices. Android 16 lays the foundation for our new Material 3 Expressive design, with features that make Android more accessible and easy to use.
AI

AI Firms Say They Can't Respect Copyright. But A Nonprofit's Researchers Just Built a Copyright-Respecting Dataset (msn.com) 100

Is copyrighted material a requirement for training AI? asks the Washington Post. That's what top AI companies are arguing, and "Few AI developers have tried the more ethical route — until now.

"A group of more than two dozen AI researchers have found that they could build a massive eight-terabyte dataset using only text that was openly licensed or in public domain. They tested the dataset quality by using it to train a 7 billion parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023." A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate. The group built an AI model that is significantly smaller than the latest offered by OpenAI's ChatGPT or Google's Gemini, but their findings appear to represent the biggest, most transparent and rigorous effort yet to demonstrate a different way of building popular AI tools....

As it turns out, the task involves a lot of humans. That's because of the technical challenges of data not being formatted in a way that's machine readable, as well as the legal challenges of figuring out what license applies to which website, a daunting prospect when the industry is rife with improperly licensed data. "This isn't a thing where you can just scale up the resources that you have available" like access to more computer chips and a fancy web scraper, said Stella Biderman [executive director of the nonprofit research institute EleutherAI]. "We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people. And that's just really hard."
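
The automated side of that license triage is simple to sketch; the expensive part is the human verification Biderman describes. Here is a toy filter, with the field names and license allowlist as our own illustrative assumptions:

```python
# Toy license filter: keep a document only if its declared license is on an
# allowlist of open licenses. Real pipelines must verify licenses per source
# -- the manual checking described above -- since declared tags are often wrong.
OPEN_LICENSES = {"public-domain", "cc0-1.0", "cc-by-4.0", "cc-by-sa-4.0", "mit"}

def keep_document(doc: dict) -> bool:
    """Return True if the document's declared license is on the allowlist."""
    license_tag = (doc.get("license") or "").strip().lower()
    return license_tag in OPEN_LICENSES

corpus = [
    {"text": "An openly licensed article...", "license": "CC-BY-4.0"},
    {"text": "A scraped page with no license metadata...", "license": None},
]

filtered = [d for d in corpus if keep_document(d)]
print(len(filtered))  # 1 -- the unlabeled page is set aside for human review
```

Everything the filter cannot classify lands in exactly the manual queue the researchers spent their time on.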

Still, the group managed to unearth new datasets that can be used ethically. Those include a set of 130,000 English-language books in the Library of Congress, which is nearly double the size of the popular-books dataset Project Gutenberg. The group's initiative also builds on recent efforts to develop more ethical, but still useful, datasets, such as FineWeb from Hugging Face, the open-source repository for machine learning... Still, Biderman remained skeptical that this approach could find enough content online to match the size of today's state-of-the-art models... Biderman said she didn't expect companies such as OpenAI and Anthropic to start adopting the same laborious process, but she hoped it would encourage them to at least rewind to 2021 or 2022, when AI companies still shared a few sentences of information about what their models were trained on.

"Even partial transparency has a huge amount of social value and a moderate amount of scientific value," she said.

Piracy

Football and Other Premium TV Being Pirated At 'Industrial Scale' (bbc.com) 132

An anonymous reader quotes a report from the BBC: A lack of action by big tech firms is enabling the "industrial scale theft" of premium video services, especially live sport, a new report says. The research by Enders Analysis accuses Amazon, Google, Meta and Microsoft of "ambivalence and inertia" over a problem it says costs broadcasters revenue and puts users at an increased risk of cyber-crime. Gareth Sutcliffe and Ollie Meir, who authored the research, described the Amazon Fire Stick -- which they argue is the device many people use to access illegal streams -- as "a piracy enabler." [...] The device plugs into TVs and gives the viewer thousands of options to watch programs from legitimate services including the BBC iPlayer and Netflix. But it is also being used to access illegal streams, particularly of live sport.

In November last year, a Liverpool man who sold Fire Stick devices he reconfigured to allow people to illegally stream Premier League football matches was jailed. After loading the unauthorized services onto the Amazon product, he advertised them on Facebook. Another man from Liverpool was given a two-year suspended sentence last year after modifying Fire Sticks and selling them on Facebook and WhatsApp. According to data for the first quarter of this year, provided to Enders by Sky, 59% of people in the UK who said they had watched pirated material in the last year while using a physical device said they had used an Amazon Fire product. The Enders report says the Fire Stick enables "billions of dollars in piracy" overall. [...]

The researchers also pointed to the role played by the "continued depreciation" of Digital Rights Management (DRM) systems, particularly those from Google and Microsoft. This technology enables high quality streaming of premium content to devices. Two of the big players are Microsoft's PlayReady and Google's Widevine. The authors argue the architecture of the DRM is largely unchanged, and due to a lack of maintenance by the big tech companies, PlayReady and Widevine "are now compromised across various security levels." Mr Sutcliffe and Mr Meir said this has had "a seismic impact across the industry, and ultimately given piracy the upper hand by enabling theft of the highest quality content." They added: "Over twenty years since launch, the DRM solutions provided by Google and Microsoft are in steep decline. A complete overhaul of the technology architecture, licensing, and support model is needed. Lack of engagement with content owners indicates this is a low priority."

Technology

Amazon Unveils Its First Quantum Computing Chip (aboutamazon.com) 6

Amazon has introduced its first-ever quantum processor, dubbed Ocelot, designed specifically to reduce quantum error correction costs by up to 90% compared to existing approaches. The prototype chip uses "cat qubits" -- named after the Schrödinger's cat thought experiment -- which intrinsically suppress certain types of quantum errors.

Unlike conventional approaches that add error correction after designing the architecture, AWS built Ocelot with quantum error correction as the primary requirement. The chip consists of two stacked 1 cm² silicon microchips containing 14 core components: five data qubits, five buffer circuits for stabilization, and four qubits dedicated to error detection.

Quantum computers are notoriously sensitive to environmental noise -- including vibrations, heat, and electromagnetic interference -- which disturbs qubits and generates computational errors. These errors multiply as quantum systems scale up, creating a significant barrier to practical quantum computing. Ocelot's high-quality oscillators, made from a thin film of superconducting tantalum processed using specialized techniques developed by AWS materials scientists, generate the repetitive electrical signals that maintain quantum states.

"We're just getting started and we believe we have several more stages of scaling to go through," said Oskar Painter, AWS director of Quantum Hardware, whose team published their findings in Nature. Industry analyst Heather West of IDC was more measured, categorizing Ocelot as "much more of an advancement and less of a breakthrough," noting that superconducting qubits designed to resist certain error types aren't completely novel.
Games

VGHF Opens Free Online Access To 1,500 Classic Game Mags, 30K Historic Files (arstechnica.com) 12

An anonymous reader quotes a report from Ars Technica: The Video Game History Foundation has officially opened up digital access to a large portion of its massive archives today, offering fans and researchers unprecedented access to information and ephemera surrounding the past 50 years of the game industry. Today's launch of the VGHF Library comprises more than 30,000 indexed and curated files, including high-quality artwork, promotional material, and searchable full-text archives of over 1,500 video game magazine issues. This initial dump of digital materials also contains never-before-seen game development and production archival material stored by the VGHF, such as over 100 hours of raw production footage from the creation of the Myst series or Sonic the Hedgehog concept art and design files contributed by artist Tom Payne.

In a blog post and accompanying launch video, VGHF head librarian Phil Salvador explains how today's launch is the culmination of a dream the organization has had since its launch in 2017. But it's also just the start of an ongoing process to digitize the VGHF's mountains of unprocessed physical material into a cataloged digital form, so people can access it "without having to fly to California." The VGHF doesn't require any special credentials or even a free account to access its archives, a fact that might be contributing to overloaded servers on this launch day. Despite those server issues, amateur researchers online are already sharing crucial library-derived information about the history of describing games as "immersive" or that one time Garfield ranked games in GamePro, for instance.
Unfortunately, digital libraries cannot offer direct, playable access to retail video games due to DMCA restrictions, notes Ars. However, organizations like the VGHF "continue to challenge those copyright rules every three years," raising hope for future access.
Stats

'The Dying Language of Accounting' (wsj.com) 177

Paul Knopp, KPMG US CEO, writing in an op-ed on WSJ: According to a United Nations estimate, 230 languages went extinct between 1950 and 2010. If my profession doesn't act, the language of business -- accounting -- could vanish too. The number of students who took the exam to become certified public accountants in 2022 hit a 17-year low. From 2020 to 2022, bachelor's degrees in accounting dropped 7.8% after steady declines since 2018.

While the shortage isn't yet an issue for the country's largest firms, it's beginning to affect our economy and capital markets. In the first half of 2024, nearly 600 U.S.-listed companies reported material weaknesses related to personnel. S&P Global analysts last year warned that many municipalities were at risk of having their credit ratings downgraded or withdrawn due to delayed financial disclosures.

Our profession must remove hurdles to learning the accounting language while preserving quality. In October, KPMG became the first large accounting firm to advocate developing alternate paths to CPA licensing. We want pathways that emphasize experience, not academic credits, after college. Most people today must earn 30 credits after their bachelor's degrees -- the so-called 150-hour rule, since a typical 120-credit bachelor's degree plus those 30 extra credits totals 150 semester hours -- work under a licensed CPA for a year, and pass the CPA exam to become licensed.

Research by the Center for Audit Quality finds that the 150-hour rule is among the top reasons people don't pursue CPA licensure. A December 2023 study found that the requirement causes a 26% drop in interest among minorities. There is a consensus for change, but we can't waste time. Many state CPA societies are working on legislation to create an alternative path to licensure. State boards of accountancy should replace the extra academic requirement with more on-the-job experience. A person who is licensed in one state should be able to practice in another even if reforms create different licensing requirements.

News

'Brain Rot' Named Oxford Word of the Year 2024 26

Oxford University Press: Following a public vote in which more than 37,000 people had their say, we're pleased to announce that the Oxford Word of the Year for 2024 is 'brain rot.'

Our language experts created a shortlist of six words to reflect the moods and conversations that have helped shape the past year. After two weeks of public voting and widespread conversation, our experts came together to consider the public's input, voting results, and our language data, before declaring 'brain rot' as the definitive Word of the Year for 2024.

'Brain rot' is defined as "the supposed deterioration of a person's mental or intellectual state, especially viewed as the result of overconsumption of material (now particularly online content) considered to be trivial or unchallenging. Also: something characterized as likely to lead to such deterioration."

Our experts noticed that 'brain rot' gained new prominence this year as a term used to capture concerns about the impact of consuming excessive amounts of low-quality online content, especially on social media. The term increased in usage frequency by 230% between 2023 and 2024.
Google

Google Deepens Crackdown on Sites Publishing 'Parasite SEO' Content (theverge.com) 13

Google has warned websites they will be penalized for hosting marketing content designed to exploit search rankings, regardless of whether they created or outsourced the material. The crackdown on so-called "parasite SEO" targets websites that leverage their search rankings to promote unrelated content, such as news sites hiding shopping coupon codes or educational platforms publishing affiliate marketing material.

Chris Nelson from Google's search quality team said the policy applies even when content involves "white label services, licensing agreements, partial ownership agreements, and other complex business arrangements." The move follows Google's March announcement targeting site reputation abuse, which gained attention after Sports Illustrated was found publishing AI-generated product reviews through third-party marketing firm AdVon Commerce.
AI

AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models (huggingface.co) 5

French private AI lab PleIAs "is committed to training LLMs in the open," they write in a blog post at Mozilla.org. "This means not only releasing our models but also being open about every aspect, from the training data to the training code. We define 'open' strictly: all data must be both accessible and under permissive licenses."

Wednesday PleIAs announced they were releasing the largest open multilingual pretraining dataset, according to their blog post at HuggingFace: Many have claimed that training large language models requires copyrighted data, making truly open AI development impossible. Today, Pleias is proving otherwise with the release of Common Corpus (part of the AI Alliance Open Trusted Data Initiative) — the largest fully open multilingual dataset for training LLMs, containing over 2 trillion tokens of permissibly licensed content with provenance information (2,003,039,184,047 tokens).

As developers are responding to pressures from new regulations like the EU AI Act, Common Corpus goes beyond compliance by making our entire permissibly licensed dataset freely available on HuggingFace, with detailed documentation of every data source. We have taken extensive steps to ensure that the dataset is high-quality and is curated to train powerful models. Through this release, we are demonstrating that there doesn't have to be such a [heavy] trade-off between openness and performance.

Common Corpus is:

— Truly Open: contains only data that is permissively licensed and provenance is documented

— Multilingual: mostly representing English and French data, but contains at least 1B tokens for over 30 languages

— Diverse: consisting of scientific articles, government and legal documents, code, and cultural heritage data, including books and newspapers

— Extensively Curated: spelling and formatting have been corrected from digitized texts, harmful and toxic content has been removed, and material with low educational value has also been removed.
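
For readers who want to inspect the data themselves, here is a minimal sketch of streaming a few records with the Hugging Face datasets library. The dataset ID and split name are our assumptions based on the announcement; check the Hub listing for the exact values.

```python
# Stream a few Common Corpus records without downloading ~2T tokens.
# Assumption: the dataset is published as "PleIAs/common_corpus" with a
# "train" split; adjust the ID if the Hugging Face listing differs.
from itertools import islice

from datasets import load_dataset

ds = load_dataset("PleIAs/common_corpus", split="train", streaming=True)

for record in islice(ds, 3):
    # Each record should carry the text plus the provenance and licensing
    # metadata the project documents for every source.
    print(sorted(record.keys()))
```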


Common Corpus builds on a growing ecosystem of large, open datasets, such as Dolma, FineWeb, and RefinedWeb. The Common Pile, currently in preparation under the coordination of EleutherAI, is built around the same principle of using permissively licensed English-language content and, unsurprisingly, there were many opportunities for collaborations and shared efforts. But even together, these datasets do not provide enough training data for models much larger than a few billion parameters. So in order to expand the options for open model training, we still need more open data...

Based on an analysis of 1 million user interactions with ChatGPT, the plurality of user requests are for creative compositions... The kind of content we actually need — like creative writing — is usually tied up in copyright restrictions. Common Corpus tackles these challenges through five carefully curated collections...

Last week AMD also released its first series of fully open 1 billion parameter language models, AMD OLMo.

And last month VentureBeat reported that the non-profit Allen Institute for AI had unveiled Molmo, "an open-source family of state-of-the-art multimodal AI models which outperform top proprietary rivals including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 on several third-party benchmarks."
Displays

LG's New Stretchable Display Can Grow By 50% (tomshardware.com) 26

An anonymous reader quotes a report from Tom's Hardware: LG Display, one of the global leaders in display technologies, unveiled a new stretchable display prototype that can expand by up to 50%. This makes it the most stretchable display in the industry, more than doubling the previous record of 20% elongation. [...] The prototype being flexed in [this image] is a 12-inch screen with a 100-pixel-per-inch resolution and full RGB color that expands to 18 inches when pulled. LG Display said that it based the stretchable display on a "special silicon material substrate used in contact lenses" and then improved its properties for better "stretchability and flexibility." It also used a new wiring design structure and a micro-LED light source, allowing the screen to be stretched more than 10,000 times with no effect on image quality.
Transportation

Why Boeing is Dismissing a Top Executive (barrons.com) 45

Last weekend Boeing announced that its CEO of Defense, Space, and Security "had left the company," according to Barrons. "Parting ways like this, for upper management, is the equivalent to firing," they write — though they add that the setbacks on Starliner's first crewed test flight are "far too simple an explanation." Starliner might, however, have been the straw that broke the camel's back. [New CEO Kelly] Ortberg took over in early August, so his first material interaction with the Boeing Defense and Space business was the spaceship's failed test flight... Starliner has cost Boeing $1.6 billion and counting. That's a lot of money, but not all that much in the context of the Defense business, which generates sales of roughly $25 billion a year.... [T]he overall Defense business has performed poorly of late, burdened by fixed price contracts that have become unprofitable amid years of higher than expected inflation. Profitability in the defense business has been declining since 2020, and the unit started losing money in 2022. From 2022 to 2024 losses should total about $6 billion cumulatively, including Wall Street's estimates for the second half of this year.

Still, it felt like something had to give. And the change shows investors something about new CEO Ortberg. "At this critical juncture, our priority is to restore the trust of our customers and meet the high standards they expect of us," read part of an internal email sent to Boeing employees announcing the change. "Why his predecessor — David Calhoun — didn't pull this trigger earlier this year is a mystery," wrote Gordon Haskett analyst Don Bilson in a Monday note. "Can't leave astronauts behind."

"Ortberg's logic appears sound," the article concludes. "In recent years, Boeing has disappointed its airline and defense customers, including NASA...

"After Starliner, defense profitability, and the strike, Ortberg has to tackle production quality, production rates, and Boeing's ailing balance sheet. Boeing has amassed almost $60 billion in debt since the second tragic 737 MAX crash in March 2019."

Thanks to Slashdot reader Press2ToContinue for sharing the news.
The Courts

OceanGate Submersible Victim's Family Sues For $50 Million, Partly Blames $30 Logitech Controller (extremetech.com) 92

An anonymous reader quotes a report from ExtremeTech: The family of a French mariner who died on the imploded Titan submersible last year has sued Titan's maker, OceanGate Expeditions, for more than $50 million. The lawsuit claims OceanGate is responsible for explorers' suffering immediately preceding their deaths, as well as for failing to disclose the extent of the submersible's risks. Among those risks are Titan's cheap materials, including the $30 Logitech gaming controller used aboard the vehicle. [...]

The lawsuit points at Titan's "hip, contemporary, wireless electronics system" and then alleges that none of the controllers or gauges inside Titan would operate without a constant source of power and a wireless signal. One of those controllers was a modified Logitech F710 Gamepad, a $30 to $40 device designed for, well, gaming. The gamepad quickly became the subject of internet mockery following the loss of Titan; some speculators said the submersible must have been doomed to fail if it used such cheap components. The lawsuit even claims the controller's Bluetooth (rather than wired) connectivity set it up for failure. Still, other speculators believe the controller wouldn't have had much impact on the submersible's operational durability. Instead, the issue would have been with the vehicle's carbon fiber pressure cylinder, which OceanGate CEO Stockton Rush allegedly bought off Boeing at a discount after the material passed its "airplane shelf life." Regardless of the exact material, it seems the consensus among members of the public is that for OceanGate, quality was an afterthought.

Graphics

Nvidia RTX 40-Series GPUs Hampered By Low-Quality Thermal Paste (pcgamer.com) 50

"Anyone who is into gaming knows your graphics card is under strain trying to display modern graphics," writes longtime Slashdot reader smooth wombat. "This results in increased power usage, which is then turned into heat. Keeping your card cool is a must to get the best performance possible."

"However, hardware tester Igor's Lab found that vendors for Nvidia RTX 40-series cards are using cheap, poorly applied thermal paste, which is leading to high temperatures and consequently, performance degradation over time. This penny-pinching has been confirmed by Nick Evanson at PC Gamer." From the report: I have four RTX 40-series cards in my office (RTX 4080 Super, 4070 Ti, and two 4070s) and all of them have quite high hotspots -- the highest temperature recorded by an individual thermal sensor in the die. In the case of the 4080 Super, it's around 11 C higher than the average temperature of the chip. I took it apart to apply some decent quality thermal paste and discovered a similar situation to that found by Igor's Lab. In the space of a few months, the factory-applied paste had separated and spread out, leaving just an oily film behind, and a few patches of the thermal compound itself. I checked the other cards and found that they were all in a similar state.

Igor's Lab examined the thermal paste used on a brand-new RTX 4080 and found it to be quite thin in nature, due to large quantities of cheap silicone oil being used, along with zinc oxide filler. There was lots of ground aluminium oxide (the material that provides the actual thermal transfer) but it was quite coarse, leading to the paste separating quite easily. Removing the factory-installed paste from another RTX 4080 graphics card, Igor's Lab applied a more appropriate amount of a high-quality paste and discovered that it lowered the hotspot temperature by nearly 30 C.

Security

Fired Employee Accessed NCS' Computer 'Test System' and Deleted Servers (channelnewsasia.com) 63

An anonymous reader quotes a report from Singapore's CNA news channel: Kandula Nagaraju, 39, was sentenced to two years and eight months' jail on Monday (Jun 10) for one charge of unauthorized access to computer material. Another charge was taken into consideration for sentencing. His contract with NCS was terminated in October 2022 due to poor work performance and his official last date of employment was Nov 16, 2022. According to court documents, Kandula felt "confused and upset" when he was fired as he felt he had performed well and "made good contributions" to NCS during his employment. After leaving NCS, he did not have another job in Singapore and returned to India.

Between November 2021 and October 2022, Kandula was part of a 20-member team managing the quality assurance (QA) computer system at NCS. NCS is a company that offers information and communications technology services. The system that Kandula's former team was managing was used to test new software and programs before launch. In a statement to CNA on Wednesday, NCS said it was a "standalone test system." It consisted of about 180 virtual servers, and no sensitive information was stored on them. After Kandula's contract was terminated and he arrived back in India, he used his laptop to gain unauthorized access to the system using the administrator login credentials. He did so on six occasions between Jan 6 and Jan 17, 2023.

In February that year, Kandula returned to Singapore after finding a new job. He rented a room with a former NCS colleague and used his Wi-Fi network to access NCS' system once on Feb 23, 2023. During the unauthorized access in those two months, he wrote some computer scripts to test if they could be used on the system to delete the servers. In March 2023, he accessed NCS' QA system 13 times. On Mar 18 and 19, he ran a programmed script to delete 180 virtual servers in the system. His script was written such that it would delete the servers one at a time. The following day, the NCS team realized the system was inaccessible and tried to troubleshoot, but to no avail. They discovered that the servers had been deleted. [...] As a result of his actions, NCS suffered a loss of $679,493.

AI

Journalists 'Deeply Troubled' By OpenAI's Content Deals With Vox, The Atlantic (arstechnica.com) 100

Benj Edwards and Ashley Belanger report via Ars Technica: On Wednesday, Axios broke the news that OpenAI had signed deals with The Atlantic and Vox Media that will allow the ChatGPT maker to license their editorial content to further train its language models. But some of the publications' writers -- and the unions that represent them -- were surprised by the announcements and aren't happy about it. Already, two unions have released statements expressing "alarm" and "concern." "The unionized members of The Atlantic Editorial and Business and Technology units are deeply troubled by the opaque agreement The Atlantic has made with OpenAI," reads a statement from the Atlantic union. "And especially by management's complete lack of transparency about what the agreement entails and how it will affect our work."

The Vox Union -- which represents The Verge, SB Nation, and Vulture, among other publications -- reacted in similar fashion, writing in a statement, "Today, members of the Vox Media Union ... were informed without warning that Vox Media entered into a 'strategic content and product partnership' with OpenAI. As both journalists and workers, we have serious concerns about this partnership, which we believe could adversely impact members of our union, not to mention the well-documented ethical and environmental concerns surrounding the use of generative AI." [...] News of the deals took both journalists and unions by surprise. On X, Vox reporter Kelsey Piper, who recently penned an exposé about OpenAI's restrictive non-disclosure agreements that prompted a change in policy from the company, wrote, "I'm very frustrated they announced this without consulting their writers, but I have very strong assurances in writing from our editor in chief that they want more coverage like the last two weeks and will never interfere in it. If that's false I'll quit."

Journalists also reacted to news of the deals through the publications themselves. On Wednesday, The Atlantic Senior Editor Damon Beres wrote a piece titled "A Devil's Bargain With OpenAI," in which he expressed skepticism about the partnership, likening it to making a deal with the devil that may backfire. He highlighted concerns about AI's use of copyrighted material without permission and its potential to spread disinformation at a time when publications have seen a recent string of layoffs. He drew parallels to the pursuit of audiences on social media leading to clickbait and SEO tactics that degraded media quality. While acknowledging the financial benefits and potential reach, Beres cautioned against relying on inaccurate, opaque AI models and questioned the implications of journalism companies being complicit in potentially destroying the internet as we know it, even as they try to be part of the solution by partnering with OpenAI.

Similarly, over at Vox, Editorial Director Bryan Walsh penned a piece titled, "This article is OpenAI training data," in which he expresses apprehension about the licensing deal, drawing parallels between the relentless pursuit of data by AI companies and the classic AI thought experiment of Bostrom's "paperclip maximizer," cautioning that the single-minded focus on market share and profits could ultimately destroy the ecosystem AI companies rely on for training data. He worries that the growth of AI chatbots and generative AI search products might lead to a significant decline in search engine traffic to publishers, potentially threatening the livelihoods of content creators and the richness of the Internet itself.

AI

For Data-Guzzling AI Companies, the Internet Is Too Small (wsj.com) 60

Companies racing to develop more powerful artificial intelligence are rapidly nearing a new problem: The internet might be too small for their plans (non-paywalled link). From a report: Ever more powerful systems developed by OpenAI, Google and others require larger oceans of information to learn from. That demand is straining the available pool of quality public data online at the same time that some data owners are blocking access to AI companies. Some executives and researchers say the industry's need for high-quality text data could outstrip supply within two years, potentially slowing AI's development.

AI companies are hunting for untapped information sources, and rethinking how they train these systems. OpenAI, the maker of ChatGPT, has discussed training its next model, GPT-5, on transcriptions of public YouTube videos, people familiar with the matter said. Companies also are experimenting with using AI-generated, or synthetic, data as training material -- an approach many researchers say could actually cause crippling malfunctions. These efforts are often secret, because executives think solutions could be a competitive advantage.

Data is among several essential AI resources in short supply. The chips needed to run what are called large-language models behind ChatGPT, Google's Gemini and other AI bots also are scarce. And industry leaders worry about a dearth of data centers and the electricity needed to power them. AI language models are built using text vacuumed up from the internet, including scientific research, news articles and Wikipedia entries. That material is broken into tokens -- words and parts of words that the models use to learn how to formulate humanlike expressions.
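
Tokenization is easy to see in practice. Below is a minimal sketch using the open-source tiktoken library; the encoding name and sample sentence are our illustrative choices, not details from the report.

```python
# Minimal sketch: break text into the BPE tokens that LLMs are trained on.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a BPE vocabulary used by recent OpenAI models

text = "Tokenization breaks text into words and parts of words."
token_ids = enc.encode(text)

# Decode each ID separately to see the pieces: common words tend to map to
# single tokens, while rarer words split into several subword fragments.
pieces = [enc.decode([t]) for t in token_ids]
print(len(token_ids), pieces)
```

Corpus sizes, like the two-trillion-token figure cited elsewhere in this digest, are measured in exactly these units, which is why token counts frame the debate over how much quality text is left to train on.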

Science

Researchers Develop New Material That Converts CO2 into Methanol Using Sunlight (scitechdaily.com) 56

"Researchers have successfully transformed CO2 into methanol," reports SciTechDaily, "by shining sunlight on single atoms of copper deposited on a light-activated material, a discovery that paves the way for creating new green fuels." Tara LeMercier, a PhD student who carried out the experimental work at the University of Nottingham, School of Chemistry, said: "We measured the current generated by light and used it as a criterion to judge the quality of the catalyst. Even without copper, the new form of carbon nitride is 44 times more active than traditional carbon nitride. However, to our surprise, the addition of only 1 mg of copper per 1 g of carbon nitride quadrupled this efficiency. Most importantly the selectivity changed from methane, another greenhouse gas, to methanol, a valuable green fuel."

Professor Andrei Khlobystov, School of Chemistry, University of Nottingham, said: "Carbon dioxide valorization holds the key for achieving the net-zero ambition of the UK. It is vitally important to ensure the sustainability of our catalyst materials for this important reaction. A big advantage of the new catalyst is that it consists of sustainable elements — carbon, nitrogen, and copper — all highly abundant on our planet." This invention represents a significant step towards a deep understanding of photocatalytic materials in CO2 conversion. It opens a pathway for creating highly selective and tuneable catalysts where the desired product could be dialed up by controlling the catalyst at the nanoscale.

"The research has been published in the Sustainable Energy & Fuels journal of the Royal Society of Chemistry."

Thanks to long-time Slashdot reader Baron_Yam for sharing the article.
