AI

'Copyright Traps' Could Tell Writers If an AI Has Scraped Their Work

An anonymous reader quotes a report from MIT Technology Review: Since the beginning of the generative AI boom, content creators have argued that their work has been scraped into AI models without their consent. But until now, it has been difficult to know whether specific text has actually been used in a training data set. Now they have a new way to prove it: "copyright traps" developed by a team at Imperial College London, pieces of hidden text that allow writers and publishers to subtly mark their work in order to later detect whether it has been used in AI models or not. The idea is similar to traps that have been used by copyright holders throughout history -- strategies like including fake locations on a map or fake words in a dictionary. [...] The code to generate and detect traps is currently available on GitHub, but the team also intends to build a tool that allows people to generate and insert copyright traps themselves. "There is a complete lack of transparency in terms of which content is used to train models, and we think this is preventing finding the right balance [between AI companies and content creators]," says Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, who led the research.

The traps aren't foolproof and can be removed, but de Montjoye says that increasing the number of traps makes removing them all significantly more challenging and resource-intensive. "Whether they can remove all of them or not is an open question, and that's likely to be a bit of a cat-and-mouse game," he says.
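The mechanics are simple to sketch. Below is a minimal Python illustration, my own and not the Imperial College tool (their actual generator and detector are on GitHub), of the two halves of the idea: embed a unique, high-entropy trap sequence invisibly in published HTML, then later test whether a model scores that exact sequence as suspiciously familiar compared with fresh control sequences it has never seen. The model_logprob scorer at the end is hypothetical pseudocode standing in for whatever query access to the model is available.

```python
# Toy sketch of a copyright trap: NOT the Imperial College implementation.
import secrets

WORDS = ["lumen", "parsec", "orchid", "gable", "quorum", "tessera", "fathom"]

def make_trap(n_words=12):
    # A high-entropy word salad is vanishingly unlikely to occur by chance,
    # so a model that "recognizes" it almost certainly saw the trapped page.
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

def embed_in_html(page_html, trap):
    # Hidden from human readers, but picked up by scrapers that ignore CSS.
    return page_html.replace(
        "</body>", '<p style="display:none">' + trap + "</p></body>")

trap = make_trap()
page = embed_in_html("<html><body><p>My article.</p></body></html>", trap)

# Detection idea (pseudocode): compare model_logprob(trap) against the
# scores of freshly generated, never-published controls; an outlier
# suggests the trap was in the training set.
# seen = model_logprob(trap)                        # hypothetical scorer
# controls = [model_logprob(make_trap()) for _ in range(100)]
```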
AI

It May Soon Be Legal To Jailbreak AI To Expose How It Works (404media.co)

An anonymous reader quotes a report from 404 Media: A group of researchers, academics, and hackers are trying to make it easier to break AI companies' terms of service to conduct "good faith research" that exposes biases, inaccuracies, and training data without fear of being sued. The U.S. government is currently considering an exemption to U.S. copyright law that would allow people to break technical protection measures and digital rights management (DRM) on AI systems to learn more about how they work, probe them for bias, discrimination, harmful and inaccurate outputs, and to learn more about the data they are trained on. The exemption would allow for "good faith" security and academic research and "red-teaming" of AI products even if the researcher had to circumvent systems designed to prevent that research. The proposed exemption has the support of the Department of Justice, which said "good faith research can help reveal unintended or undisclosed collection or exposure of sensitive personal data, or identify systems whose operations or outputs are unsafe, inaccurate, or ineffective for the uses for which they are intended or marketed by developers, or employed by end users. Such research can be especially significant when AI platforms are used for particularly important purposes, where unintended, inaccurate, or unpredictable AI output can result in serious harm to individuals."

Much of what we know about how closed-source AI tools like ChatGPT, Midjourney, and others work comes from researchers, journalists, and ordinary users purposefully trying to trick these systems into revealing something about the data they were trained on (which often includes copyrighted material indiscriminately and secretly scraped from the internet), their biases, and their weaknesses. Doing this type of research can often violate the terms of service users agree to when they sign up for a system. For example, OpenAI's terms of service state that users cannot "attempt to or assist anyone to reverse engineer, decompile or discover the source code or underlying components of our Services, including our models, algorithms, or systems (except to the extent this restriction is prohibited by applicable law)," and adds that users must not "circumvent any rate limits or restrictions or bypass any protective measures or safety mitigations we put on our Services."

Shayne Longpre, an MIT researcher who is part of the team pushing for the exemption, told me that "there is a lot of apprehensiveness about these models and their design, their biases, being used for discrimination, and, broadly, their trustworthiness." "But the ecosystem of researchers looking into this isn't super healthy. There are people doing the work but a lot of people are getting their accounts suspended for doing good-faith research, or they are worried about potential legal ramifications of violating terms of service," he added. "These terms of service have chilling effects on research, and companies aren't very transparent about their process for enforcing terms of service." The exemption would be to Section 1201 of the Digital Millennium Copyright Act, a sweeping copyright law. Other 1201 exemptions, which must be applied for and renewed every three years as part of a process through the Library of Congress, allow for the hacking of tractors and electronic devices for the purpose of repair, have carveouts that protect security researchers who are trying to find bugs and vulnerabilities, and in certain cases protect people who are trying to archive or preserve specific types of content.
Harley Geiger of the Hacking Policy Council said that an exemption is "crucial to identifying and fixing algorithmic flaws to prevent harm or disruption," and added that a "lack of clear legal protection under DMCA Section 1201 adversely affects such research."
Moon

Radar Images Suggest There's a Tunnel On the Moon (gizmodo.com)

Longtime Slashdot reader fahrbot-bot shares a report from Gizmodo: A team of researchers think they've discovered a cave on the Moon in radar images of the lunar surface, which they posit could be a future site for an established human presence on our rocky satellite. The tunnel is in the Mare Tranquillitatis (Sea of Tranquility) pit, the deepest known pit on the Moon. (If the name is familiar to you, the Sea of Tranquility is where the Apollo 11 mission landed in 1969.) The pit formed due to a lava tube's roof collapse or a collapse of a void structure created by tectonic processes. To look for potential cave structures within the pit, the researchers studied side-looking radar images taken by the Lunar Reconnaissance Orbiter's Mini-RF instrument between 2009 and 2011. The team then conducted 3D radar simulations of potential geometries of the pit and its cave, to determine that the brightness they saw in radar images could be due to subsurface features. Ultimately, the team determined there is a tunnel in the pit that is between 98 feet (30 meters) and 262 feet (80 meters) long. The tunnel is roughly 148 feet (45 meters) wide and is either flat or inclined with a maximum steepness of 45 degrees. "The exploration of lunar caves through future robotic missions could provide a fresh perspective on the lunar subsurface and yield new insights into the evolution of lunar volcanism," the team wrote in the paper. "Furthermore, direct exploration could confirm the presence of stable subsurface environments shielded from radiation and with optimal temperature conditions for future human utilization."

The findings have been published in the journal Nature Astronomy.
Mozilla

Thunderbird 128: Annual ESR Brings New Features and 'a Rust Revolution' (thunderbird.net)

Thunderbird's annual Extended Support Release was revealed Friday, promising "significant" improvements to the overall user experience and "the speed at which we can deliver new features to you," according to the Thunderbird blog: We've devoted significant development time integrating Rust — a modern programming language originally created by Mozilla Research — into Thunderbird. Even though this is a seemingly invisible change, it is a major leap forward because it enhances our code quality and performance. This overhaul will allow us to share features between the desktop and future mobile versions of Thunderbird, and speed up our development process. It's a win for our developers and a win for you.
More from the blog OMG! Ubuntu: I'm also stoked to see that Thunderbird 128 makes 'newest first' the default sort order for messages in the message list. While some prefer the old way, I always found it strange that the oldest mails were shown first — team reverse chronology, represent!
They also cite "a number of OpenPGP improvements," plus a new preference option for displaying full names and email addresses of all recipients in the message list. (Plus, threaded-message views now display a "New Message" count.)

Other new features in this release:
  • A new and more attractive layout for Cards View (with adjustable heights) that "makes it easier to scan your email threads and glean information."
  • The folder pane has better recall of message thread states.
  • Improved theme compatibility. "Your Thunderbird should blend seamlessly with your desktop environment, matching the system's accent colors perfectly." (Especially beneficial on Ubuntu and Mint.)
  • You can now customize the color of your account icon.

The Thunderbird blog also mentions that "We plan to launch the first phase of built-in support for Exchange, as well as Mozilla Sync, in a future Nebula point release (e.g. Thunderbird 128.X)."


Security

CISA Broke Into a US Federal Agency, No One Noticed For a Full 5 Months (theregister.com)

A 2023 red team exercise by the U.S. Cybersecurity and Infrastructure Security Agency (CISA) at an unnamed federal agency exposed critical security failings, including unpatched vulnerabilities, inadequate incident response, and weak credential management, leading to a full domain compromise. According to The Register's Connor Jones, the agency failed to detect or remediate malicious activity for five months. From the report: According to the agency's account of the exercise, the red team was able to gain initial access by exploiting an unpatched vulnerability (CVE-2022-21587 - 9.8) in the target agency's Oracle Solaris enclave, leading to what it said was a full compromise. It's worth noting that CVE-2022-21587, an unauthenticated remote code execution (RCE) bug carrying a near-maximum 9.8 CVSS rating, was added to CISA's known exploited vulnerability (KEV) catalog in February 2023. The initial intrusion by CISA's red team was made on January 25, 2023. "After gaining access, the team promptly informed the organization's trusted agents of the unpatched device, but the organization took over two weeks to apply the available patch," CISA's report reads. "Additionally, the organization did not perform a thorough investigation of the affected servers, which would have turned up IOCs and should have led to a full incident response. About two weeks after the team obtained access, exploit code was released publicly into a popular open source exploitation framework. CISA identified that the vulnerability was exploited by an unknown third party. CISA added this CVE to its Known Exploited Vulnerabilities Catalog on February 2, 2023." [...]

After gaining access to the Solaris enclave, the red team discovered they couldn't pivot into the Windows part of the network because missing credentials blocked their path, despite enjoying months of access to sensitive web apps and databases. Undeterred, CISA managed to make its way into the Windows network after carrying out phishing attacks on unidentified members of the target agency, one of which was successful. It said real adversaries may have instead used prolonged password-spraying attacks rather than phishing at this stage, given that several service accounts were identified as having weak passwords. After gaining that access, the red team injected a persistent RAT and later discovered unsecured admin credentials, which essentially meant it was game over for the agency being assessed. "None of the accessed servers had any noticeable additional protections or network access restrictions despite their sensitivity and critical functions in the network," CISA said.

CISA described this as a "full domain compromise" that gave the attackers access to tier zero assets -- the most highly privileged systems. "The team found a password file left from a previous employee on an open, administrative IT share, which contained plaintext usernames and passwords for several privileged service accounts," the report reads. "With the harvested Lightweight Directory Access Protocol (LDAP) information, the team identified one of the accounts had System Center Operations Manager (SCOM) administrator privileges and domain administrator privileges for the parent domain. They identified another account that also had administrative permissions for most servers in the domain. The passwords for both accounts had not been updated in over eight years and were not enrolled in the organization's identity management (IDM)." From here, the red team realized the victim organization had trust relationships with multiple external FCEB organizations, which CISA's team then pivoted into using the access they already had.

The team "kerberoasted" one partner organization. Kerberoasting is an attack on the Kerberos authentication protocol typically used in Windows networks to authenticate users and devices. However, it wasn't able to move laterally with the account due to low privileges, so it instead used those credentials to exploit a second trusted partner organization. Kerberoasting yielded a more privileged account at the second external org, the password for which was crackable. CISA said that due to network ownership, legal agreements, and/or vendor opacity, these kinds of cross-organizational attacks are rarely tested during assessments. However, SILENTSHIELD assessments are able to be carried out following new-ish powers afforded to CISA by the FY21 National Defense Authorization Act (NDAA), the same powers that also allow CISA's Federal Attack Surface Testing (FAST) pentesting program to operate. It's crucial that these avenues are able to be explored in such exercises because they're routes into systems adversaries will have no reservations about exploring in a real-world scenario. For the first five months of the assessment, the target FCEB agency failed to detect or remediate any of the SILENTSHIELD activity, raising concerns over its ability to spot genuine malicious activity.
CISA said the findings demonstrated the need for agencies to apply defense-in-depth principles. The cybersecurity agency recommended network segmentation and a Secure-by-Design commitment.
Security

Rabbit R1 AI Device Exposed by API Key Leak (404media.co)

Security researchers claim to have discovered exposed API keys in the code of Rabbit's R1 AI device, potentially allowing access to all user responses and company services. The group, known as Rabbitude, says they could send emails from internal Rabbit addresses to demonstrate the vulnerability. 404 Media adds: In a statement, Rabbit said, "Today we were made aware of an alleged data breach. Our security team immediately began investigating it. As of right now, we are not aware of any customer data being leaked or any compromise to our systems. If we learn of any other relevant information, we will provide an update once we have more details."
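Embedded credentials of this kind are typically found with the sort of source-tree scan sketched below. This is a toy illustration, not Rabbitude's method; production scanners such as gitleaks and trufflehog add per-provider rules and entropy checks.

```python
# Toy secret scanner: flags strings that look like hardcoded API keys.
import os, re

# Rough heuristic: a key-ish variable name assigned a long token literal.
PATTERN = re.compile(
    r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*"
    r"['\"]([A-Za-z0-9_\-]{20,})['\"]")

def scan(root="."):
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith((".py", ".js", ".ts", ".json", ".env")):
                continue
            path = os.path.join(dirpath, name)
            try:
                text = open(path, encoding="utf-8", errors="ignore").read()
            except OSError:
                continue
            for match in PATTERN.finditer(text):
                print(path + ": possible hardcoded " + match.group(1))

scan()
```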
AI

Anthropic Launches Claude 3.5 Sonnet, Says New Model Outperforms GPT-4 Omni (anthropic.com)

Anthropic launched Claude 3.5 Sonnet on Thursday, claiming it outperforms previous models and OpenAI's GPT-4 Omni. The AI startup also introduced Artifacts, a workspace for users to edit AI-generated projects. This release, part of the Claude 3.5 family, comes three months after Claude 3. Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app, while Claude Pro and Team plan subscribers can access it with significantly higher rate limits.

Anthropic plans to launch 3.5 versions of Haiku and Opus later this year, exploring features like web search and memory for future releases.

Anthropic also introduced Artifacts on Claude.ai, a new feature that expands how users can interact with Claude. When a user asks Claude to generate content like code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside their conversation. This creates a dynamic workspace where they can see, edit, and build upon Claude's creations in real-time, seamlessly integrating AI-generated content into their projects and workflows, the startup said.
Python

Python 'Language Summit' 2024: Security Workflows, Calendar Versioning, Transforms and Lightning Talks (blogspot.com)

On Friday the Python Software Foundation published several blog posts about this year's "Python Language Summit," held May 15th (before PyCon US), which featured talks and discussions by core developers, triagers, and Python implementation maintainers.

There were several lightning talks. One talk came from the maintainer of the PyO3 project, offering Rust bindings for the Python C API (which requires mapping Rust concepts to Python — leaving a question as to how to map Rust's error-handling panic! macro). There was a talk on formalizing the PEP prototype process, and a talk on whether the Python team should have a more official presence in the Apple App Store (and maybe the Google Play Store). One talk suggested changing the formatting of error messages for assert statements, and one covered a "highly experimental" project to support structured data sharing between Python subinterpreters. One talk covered Python's "unsupported build" warning and how it should behave on platforms beyond Python's officially supported list.

Python Foundation blog posts also covered some of the longer talks, including one on the idea of using type annotations as a mechanism for transformers. One talk covered the new interactive REPL interpreter coming to Python 3.13.

And one talk focused on Python's security model after the xz-utils backdoor: Pablo Galindo Salgado, Steering Council member and the release manager for Python 3.10 and 3.11, brought this topic to the Language Summit to discuss what could be done to improve Python's security model... Pablo noted the similarities shared between CPython and xz-utils, referencing the previous Language Summit's talk on core developer burnout, the number of modules in the standard library that have one or zero maintainers, the low ratio of maintainers to source code, and the use of autotools for configuration. Autotools was used by [xz's] Jia Tan as part of the backdoor, specifically to obscure the changes to tainted release artifacts. Pablo confirmed along with many nods of agreement that indeed, CPython could be vulnerable to a contributor or core developer getting secretly malicious changes merged into the project.

For multiple reasons, such as the need to fix bugs quickly and the prevalence of single-maintainer modules, CPython doesn't require reviewers on the pull requests of core developers. This can lead to "unilateral action," meaning that a change is introduced into CPython without review by anyone besides the author. Other situations, like release managers backporting fixes to other branches without review, are common.

Much discussion ensued about the possibility of altering workflows (including pull request reviews), identity verification, and the importance of post-incident action plans. Guido van Rossum suggested a "higher bar" for granting write access, but in the end "Overall it was clear there is more discussion and work to be done in this rapidly changing area."

In another talk, Hugo van Kemenade, the newly announced Release Manager for Python 3.14 and 3.15, "started the Language Summit with a proposal to change Python's versioning scheme. The perception of Python using semantic versioning is a source of confusion for users who don't expect backwards incompatible changes when upgrading to new versions of Python. In reality almost all new feature releases of Python include backwards incompatible changes such as the removal of "dead batteries" where PEP 594 marked 19 modules for removal in Python 3.13. Calendar Versioning (CalVer) encompasses a wide array of different versioning schemes that have one property in common: using the release date as part of a release's version... Hugo offered multiple proposed versioning schemes, including:

- Using the release year as minor version (3.YY.micro, "3.26.0")
- Using the release year as major version (YY.0.micro, "26.0.0")
- Using the release year and month as major and minor version (YY.MM.micro, "26.10.0")

[...] Overall the proposal to use the current year as the minor version was well-received; Hugo mentioned that he'd be drafting a PEP for this change.
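To make the three schemes concrete, here is a toy rendering (mine, not from the summit) of how a hypothetical release in October 2026 would be numbered under each proposal:

```python
# Toy illustration of the three proposed CalVer schemes; the release date
# is a hypothetical placeholder.
from datetime import date

release = date(2026, 10, 1)
yy, mm = release.year % 100, release.month

print(f"3.{yy}.0")      # 3.YY.micro  -> 3.26.0
print(f"{yy}.0.0")      # YY.0.micro  -> 26.0.0
print(f"{yy}.{mm}.0")   # YY.MM.micro -> 26.10.0
```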

Encryption

Researcher Finds Side-Channel Vulnerability in Post-Quantum Key Encapsulation Mechanism (thecyberexpress.com)

Slashdot reader storagedude shared this report from The Cyber Express: A security researcher discovered an exploitable timing leak in the Kyber key encapsulation mechanism (KEM) that's in the process of being adopted by NIST as a post-quantum cryptographic standard. Antoon Purnal of PQShield detailed his findings in a blog post and on social media, and noted that the problem has been fixed with the help of the Kyber team. The issue was found in the reference implementation of the Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM) that's in the process of being adopted as a NIST post-quantum key encapsulation standard. "A key part of implementation security is resistance against side-channel attacks, which exploit the physical side-effects of cryptographic computations to infer sensitive information," Purnal wrote.

To secure against side-channel attacks, cryptographic algorithms must be implemented so that "no attacker-observable effect of their execution depends on the secrets they process," he wrote. In the ML-KEM reference implementation, "we're concerned with a particular side channel that's observable in almost all cryptographic deployment scenarios: time." The vulnerability can occur when a compiler optimizes the code, in the process silently undoing "measures taken by the skilled implementer." In Purnal's analysis, the Clang compiler was found to emit a vulnerable secret-dependent branch in the poly_frommsg function of the ML-KEM reference code needed in both key encapsulation and decapsulation, corresponding to the expand_secure implementation.
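The pattern at issue is easy to illustrate. ML-KEM's poly_frommsg expands each secret message bit b into the polynomial coefficient b times ceil(q/2), with q = 3329, and the reference C computes this with a mask rather than an if, precisely so that timing does not depend on the bit; the leak arose when the compiler recognized the 0-or-1 structure and emitted a branch anyway. A Python sketch of the two shapes (illustrative only; Python itself makes no constant-time guarantees):

```python
Q = 3329
HALF_Q = (Q + 1) // 2  # 1665: the coefficient that encodes a 1 bit in ML-KEM

def frommsg_branchy(bits):
    # Secret-dependent branch: the execution path (and thus timing) leaks
    # each bit. This is the shape the compiler effectively emitted.
    return [HALF_Q if b else 0 for b in bits]

def frommsg_branchless(bits):
    # Mask-based select: -b is 0 or all-ones in two's complement, so the
    # AND picks 0 or HALF_Q with no data-dependent control flow.
    return [(-b) & HALF_Q for b in bits]

assert frommsg_branchy([0, 1, 1, 0]) == frommsg_branchless([0, 1, 1, 0])
```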

While the reference implementation was patched, "It's important to note that this does not rule out the possibility that other libraries, which are based on the reference implementation but do not use the poly_frommsg function verbatim, may be vulnerable — either now or in the future," Purnal wrote.

Purnal also published a proof-of-concept demo on GitHub. "On an Intel Core i7-13700H, it takes between 5-10 minutes to leak the entire ML-KEM 512 secret key using end-to-end decapsulation timing measurements."
United States

Louisiana Becomes 10th US State to Make CS a High School Graduation Requirement (linkedin.com)

Long-time Slashdot reader theodp writes: "Great news, Louisiana!" tech-backed Code.org exclaimed Wednesday in celebratory LinkedIn, Facebook, and Twitter posts. Louisiana is "officially the 10th state to make computer science a [high school] graduation requirement. Huge thanks to Governor Jeff Landry for signing the bill and to our legislative champions, Rep. Jason Hughes and Sen. Thomas Pressly, for making it happen! This means every Louisiana student gets a chance to learn coding and other tech skills that are super important these days. These skills can help them solve problems, think critically, and open doors to awesome careers!"

Representative Hughes, the sponsor of HB264 — which calls for each public high school student to successfully complete a one credit CS course as a requirement for graduation and also permits students to take two units of CS instead of studying a Foreign Language — tweeted back: "HUGE thanks @codeorg for their partnership in this effort every step of the way! Couldn't have done it without [Code.org Senior Director of State Government Affairs] Anthony [Owen] and the Code.org team!"

Code.org also on Wednesday announced the release of its 2023 Impact Report, which touted its efforts "to include a requirement for every student to take computer science to receive a high school diploma." Since its 2013 launch, Code.org reports it's spent $219.8 million to push coding into K-12 classrooms, including $19 million on Government Affairs (Achievements: "Policies changed in 50 states. More than $343M in state budgets allocated to computer science.").

In Code.org by the Numbers, the nonprofit boasts that 254,683 students started Code.org's AP CS Principles course in the academic year (2025 Goal: 400K), while 21,425 have started Code.org's new Amazon-bankrolled AP CS A course. Estimates peg U.S. public high school enrollment at 15.5M students, annual K-12 public school spending at $16,080 per pupil, and an annual high school student course load at 6-8 credits...

Social Networks

TikTok Preparing a US Copy of the App's Core Algorithm (reuters.com)

An anonymous reader quotes a report from Reuters: TikTok is working on a clone of its recommendation algorithm for its 170 million U.S. users that may result in a version that operates independently of its Chinese parent and is more palatable to American lawmakers who want to ban it, according to sources with direct knowledge of the efforts. The work on splitting the source code ordered by TikTok's Chinese parent ByteDance late last year predated a bill to force a sale of TikTok's U.S. operations that began gaining steam in Congress this year. The bill was signed into law in April. The sources, who were granted anonymity because they are not authorized to speak publicly about the short-form video sharing app, said that once the code is split, it could lay the groundwork for a divestiture of the U.S. assets, although there are no current plans to do so. The company has previously said it had no plans to sell the U.S. assets and such a move would be impossible. [...]

In the past few months, hundreds of ByteDance and TikTok engineers in both the U.S. and China were ordered to begin separating millions of lines of code, sifting through the company's algorithm that pairs users with videos to their liking. The engineers' mission is to create a separate code base that is independent of systems used by ByteDance's Chinese version of TikTok, Douyin, while eliminating any information linking to Chinese users, two sources with direct knowledge of the project told Reuters. [...] The complexity of the task that the sources described to Reuters as tedious "dirty work" underscores the difficulty of splitting the underlying code that binds TikTok's U.S. operations to its Chinese parent. The work is expected to take over a year to complete, these sources said. [...] At one point, TikTok executives considered open sourcing some of TikTok's algorithm, or making it available to others to access and modify, to demonstrate technological transparency, the sources said.

Executives have communicated plans and provided updates on the code-splitting project during a team all-hands, in internal planning documents and on its internal communications system, called Lark, according to one of the sources who attended the meeting and another source who has viewed the messages. Compliance and legal issues involved with determining what parts of the code can be carried over to TikTok are complicating the work, according to one source. Each line of code has to be reviewed to determine if it can go into the separate code base, the sources added. The goal is to create a new source code repository for a recommendation algorithm serving only TikTok U.S. Once completed, TikTok U.S. will run and maintain its recommendation algorithm independent of TikTok apps in other regions and its Chinese version Douyin. That move would cut it off from the massive engineering development power of its parent company in Beijing, the sources said. If TikTok completes the work to split the recommendation engine from its Chinese counterpart, TikTok management is aware of the risk that TikTok U.S. may not be able to deliver the same level of performance as the existing TikTok because it is heavily reliant on ByteDance's engineers in China to update and maintain the code base to maximize user engagement, sources added.

Security

Memory Sealing 'mseal' System Call Merged For Linux 6.10 (phoronix.com)

"Merged this Friday evening into the Linux 6.10 kernel is the new mseal() system call for memory sealing," reports Phoronix: The mseal system call was led by Jeff Xu of Google's Chrome team. The goal with memory sealing is to also protect the memory mapping itself against modification. The new mseal Linux documentation explains:

"Modern CPUs support memory permissions such as RW and NX bits. The memory permission feature improves security stance on memory corruption bugs, i.e. the attacker can't just write to arbitrary memory and point the code to it, the memory has to be marked with X bit, or else an exception will happen. Memory sealing additionally protects the mapping itself against modifications. This is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system... Memory sealing can automatically be applied by the runtime loader to seal .text and .rodata pages and applications can additionally seal security-critical data at runtime. A similar feature already exists in the XNU kernel with the VM_FLAGS_PERMANENT flag and on OpenBSD with the mimmutable syscall."

The mseal system call is designed to be used by the likes of the GNU C Library "glibc" while loading ELF executables to seal non-writable memory segments or by the Google Chrome web browser and other browsers for protecting security sensitive data structures.
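To make the behavior concrete, here is a hedged userspace sketch via ctypes. Assumptions worth flagging: a Linux 6.10+ kernel on x86-64, where mseal's syscall number is 462 (other architectures differ), and no libc wrapper yet, hence the raw syscall.

```python
# Hedged sketch: seal an anonymous mapping, then show mprotect is refused.
# Assumes Linux 6.10+ on x86-64 (syscall number 462); not portable.
import ctypes, os

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]

PROT_READ, PROT_WRITE = 0x1, 0x2
MAP_PRIVATE, MAP_ANONYMOUS = 0x02, 0x20
PAGE = os.sysconf("SC_PAGE_SIZE")
SYS_mseal = 462  # x86-64 only; check your architecture's syscall table

# Map one anonymous read-write page, then seal it.
addr = libc.mmap(None, PAGE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
if libc.syscall(SYS_mseal, ctypes.c_void_p(addr), ctypes.c_size_t(PAGE), 0) != 0:
    raise OSError(ctypes.get_errno(), "mseal failed (kernel older than 6.10?)")

# Once sealed, the mapping itself is immutable: mprotect/munmap/mremap on
# this range should now fail with EPERM.
ret = libc.mprotect(ctypes.c_void_p(addr), ctypes.c_size_t(PAGE), PROT_READ)
print("mprotect on sealed page:",
      "EPERM as expected" if ret != 0 else "unexpectedly succeeded")
```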

Programming

Rust Foundation Reports 20% of Rust Crates Use 'Unsafe' Keyword (rust-lang.org)

A Rust Foundation blog post begins by reminding readers that Rust programs "are unable to compile if memory management rules are violated, essentially eliminating the possibility of a memory issue at runtime."

But then it goes on to explore "Unsafe Rust in the wild" (used for a small set of actions like dereferencing a raw pointer, modifying a mutable static variable, or calling unsafe functions). "At a superficial glance, it might appear that Unsafe Rust undercuts the memory-safety benefits Rust is becoming increasingly celebrated for. In reality, the unsafe keyword comes with special safeguards and can be a powerful way to work with fewer restrictions when a function requires flexibility, so long as standard precautions are used."

The Foundation lists those available safeguards — which "make exploits rare — but not impossible." But then they go on to analyze just how much Rust code actually uses the unsafe keyword: The canonical way to distribute Rust code is through a package called a crate. As of May 2024, there are about 145,000 crates, of which approximately 127,000 contain significant code. Of those 127,000 crates, 24,362 make use of the unsafe keyword, which is 19.11% of all crates. And 34.35% make a direct function call into another crate that uses the unsafe keyword [according to numbers derived from the Rust Foundation project Painter]. Nearly 20% of all crates have at least one instance of the unsafe keyword, a non-trivial number.

Most of these Unsafe Rust uses are calls into existing third-party non-Rust language code or libraries, such as C or C++. In fact, the crate with the most uses of the unsafe keyword is the Windows crate, which allows Rust developers to call into various Windows APIs. This does not mean that the code in these Unsafe Rust blocks is inherently exploitable (a majority or all of that code is most likely not), but that special care must be taken while using Unsafe Rust in order to avoid potential vulnerabilities...

Rust lives up to its reputation as an excellent and transformative tool for safe and secure programming, even in an Unsafe context. But this reputation requires resources, collaboration, and constant examination to uphold properly. For example, the Rust Project is continuing to develop tools like Miri to allow the checking of unsafe Rust code. The Rust Foundation is committed to this work through its Security Initiative: a program to support and advance the state of security within the Rust Programming language ecosystem and community. Under the Security Initiative, the Rust Foundation's Technology team has developed new tools like [dependency-graphing] Painter, TypoMania [which checks package registries for typo-squatting] and Sandpit [an internal tool watching for malicious crates]... giving users insight into vulnerabilities before they can happen and allowing for a quick response if an exploitation occurs.

Operating Systems

NetBSD Bans AI-Generated Code (netbsd.org)

Seven Spirals writes: NetBSD committers are now banned from using any AI-generated code from ChatGPT, CoPilot, or other AI tools. Time will tell how this plays out with both their users and core team. "If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution," reads NetBSD's updated commit guidelines. "Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code. Code generated by a large language model or similar technology, such as GitHub/Microsoft's Copilot, OpenAI's ChatGPT, or Facebook/Meta's Code Llama, is presumed to be tainted code, and must not be committed without prior written approval by core."
Science

Revolutionary Genetics Research Shows RNA May Rule Our Genome (scientificamerican.com)

Philip Ball reports via Scientific American: Thomas Gingeras did not intend to upend basic ideas about how the human body works. In 2012 the geneticist, now at Cold Spring Harbor Laboratory in New York State, was one of a few hundred colleagues who were simply trying to put together a compendium of human DNA functions. Their project was called ENCODE, for the Encyclopedia of DNA Elements. About a decade earlier almost all of the three billion DNA building blocks that make up the human genome had been identified. Gingeras and the other ENCODE scientists were trying to figure out what all that DNA did. The assumption made by most biologists at that time was that most of it didn't do much. The early genome mappers estimated that perhaps 1 to 2 percent of our DNA consisted of genes as classically defined: stretches of the genome that coded for proteins, the workhorses of the human body that carry oxygen to different organs, build heart muscles and brain cells, and do just about everything else people need to stay alive. Making proteins was thought to be the genome's primary job. Genes do this by putting manufacturing instructions into messenger molecules called mRNAs, which in turn travel to a cell's protein-making machinery. As for the rest of the genome's DNA? The "protein-coding regions," Gingeras says, were supposedly "surrounded by oceans of biologically functionless sequences." In other words, it was mostly junk DNA.

So it came as rather a shock when, in several 2012 papers in Nature, he and the rest of the ENCODE team reported that at one time or another, at least 75 percent of the genome gets transcribed into RNAs. The ENCODE work, using techniques that could map RNA activity happening along genome sections, had begun in 2003 and came up with preliminary results in 2007. But not until five years later did the extent of all this transcription become clear. If only 1 to 2 percent of this RNA was encoding proteins, what was the rest for? Some of it, scientists knew, carried out crucial tasks such as turning genes on or off; a lot of the other functions had yet to be pinned down. Still, no one had imagined that three quarters of our DNA turns into RNA, let alone that so much of it could do anything useful. Some biologists greeted this announcement with skepticism bordering on outrage. The ENCODE team was accused of hyping its findings; some critics argued that most of this RNA was made accidentally because the RNA-making enzyme that travels along the genome is rather indiscriminate about which bits of DNA it reads.

Now it looks like ENCODE was basically right. Dozens of other research groups, scoping out activity along the human genome, also have found that much of our DNA is churning out "noncoding" RNA. It doesn't encode proteins, as mRNA does, but engages with other molecules to conduct some biochemical task. By 2020 the ENCODE project said it had identified around 37,600 noncoding genes -- that is, DNA stretches with instructions for RNA molecules that do not code for proteins. That is almost twice as many as there are protein-coding genes. Other tallies vary widely, from around 18,000 to close to 96,000. There are still doubters, but there are also enthusiastic biologists such as Jeanne Lawrence and Lisa Hall of the University of Massachusetts Chan Medical School. In a 2024 commentary for the journal Science, the duo described these findings as part of an "RNA revolution."

What makes these discoveries revolutionary is what all this noncoding RNA -- abbreviated as ncRNA -- does. Much of it indeed seems involved in gene regulation: not simply turning them off or on but also fine-tuning their activity. So although some genes hold the blueprint for proteins, ncRNA can control the activity of those genes and thus ultimately determine whether their proteins are made. This is a far cry from the basic narrative of biology that has held sway since the discovery of the DNA double helix some 70 years ago, which was all about DNA leading to proteins. "It appears that we may have fundamentally misunderstood the nature of genetic programming," wrote molecular biologists Kevin Morris of Queensland University of Technology and John Mattick of the University of New South Wales in Australia in a 2014 article. Another important discovery is that some ncRNAs appear to play a role in disease, for example, by regulating the cell processes involved in some forms of cancer. So researchers are investigating whether it is possible to develop drugs that target such ncRNAs or, conversely, to use ncRNAs themselves as drugs. If a gene codes for a protein that helps a cancer cell grow, for example, an ncRNA that shuts down the gene might help treat the cancer.

Games

Game Dev Says Contract Barring 'Subjective Negative Reviews' Was a Mistake (arstechnica.com)

The developers of team-based shooter Marvel Rivals have apologized for a contract clause that made creators promise not to provide "subjective negative reviews of the game" in exchange for early access to a closed alpha test. From a report: The controversial early access contract gained widespread attention over the weekend when streamer Brandon Larned shared a portion on social media. In the "non-disparagement" clause shared by Larned, creators who are provided with an early download code are asked not to "make any public statements or engage in discussions that are detrimental to the reputation of the game." In addition to the "subjective negative review" example above, the clause also specifically prohibits "making disparaging or satirical comments about any game-related material" and "engaging in malicious comparisons with competitors or belittling the gameplay or differences of Marvel Rivals."
AI

Did OpenAI, Google and Meta 'Cut Corners' to Harvest AI Training Data? (indiatimes.com)

What happened when OpenAI ran out of English-language training data in 2021?

They just created a speech recognition tool that could transcribe the audio from YouTube videos, reports The New York Times, as part of an investigation arguing that tech companies "including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law" in their search for AI training data. [Alternate URL here.] Some OpenAI employees discussed how such a move might go against YouTube's rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are "independent" of the video platform. Ultimately, an OpenAI team transcribed more than 1 million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI's president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4...
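The tool, according to the Times' reporting, was Whisper, which OpenAI later released as open source. For reference, transcribing audio with the released whisper package takes only a few lines; this sketch assumes pip install openai-whisper, ffmpeg available on the PATH, and a placeholder local file talk.mp3.

```python
# Sketch using the open-source whisper package OpenAI later released;
# "talk.mp3" is a hypothetical local file, and ffmpeg must be installed.
import whisper

model = whisper.load_model("base")     # downloads model weights on first use
result = model.transcribe("talk.mp3")  # language is auto-detected
print(result["text"])                  # the full transcript as one string
```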

At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by the Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.

Like OpenAI, Google transcribed YouTube videos to harvest text for its AI models, five people with knowledge of the company's practices said. That potentially violated the copyrights to the videos, which belong to their creators. Last year, Google also broadened its terms of service. One motivation for the change, according to members of the company's privacy team and an internal message viewed by the Times, was to allow Google to be able to tap publicly available Google Docs, restaurant reviews on Google Maps and other online material for more of its AI products...

Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn't stop OpenAI because Google had also used transcripts of YouTube videos to train its AI models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.

The article adds that some tech companies are now even developing "synthetic" information to train AI.

"This is not organic data created by humans, but text, images and code that AI models produce — in other words, the systems learn from what they themselves generate."
IT

Some San Francisco Tech Workers are Renting Cheap 'Bed Pods' (sfgate.com)

An anonymous reader shared this report from SFGate: Late last year, tales of tech workers paying $700 a month for tiny "bed pods" in downtown San Francisco went viral. The story provided a perfect distillation of SF's wild (and wildly expensive) housing market — and inspired schadenfreude when the city deemed the situation illegal. But the provocative living situation wasn't an anomaly, according to a city official.

"We've definitely seen an uptick of these 'pod'-type complaints," Kelly Wong, a planner with San Francisco's code enforcement and zoning and compliance team, told SFGATE... Wong stressed that it's not that San Francisco is inherently against bed pod-type arrangements, but that the city is responsible for making sure these spaces are safe and legally zoned.


So Brownstone Shared Housing is still renting one bed pod location — but not accepting new tenants — after citations for failing to get proper permits and having a lock on the front door that required a key to exit.

And SFGate also spoke to Alex Akel, general manager of Olive Rooms, which opened up a co-living and co-working space in SoMa earlier this year (and also faced "a flurry of complaints.") "Unfortunately, we had complaints from neighbors because of foot traffic and noise, and since then we cut the number of people to fit the ordinance by the city," Akel wrote. Olive Rooms describes its space as targeted at "tech founders from Central Asia, giving them opportunities to get involved in the current AI boom." Akel added that its residents are "bringing new energy to SF," but that the program "will not accept new residents before we clarify the status with the city."

In April, the city also received a complaint about a group called Let's Be Buds, which rents out 14 pods in a loft on Divisadero Street that start at $575 per month for an upper bunk.

While this recent burst of complaints is new, bed pods in San Francisco have been catching flak for years... a company called PodShare, which rents — you guessed it — bed pods, squared itself away with the city and has operated in SF since 2019.

Brownstone's CEO told SFGate "A lot of people want to be here for AI, or for school, or different opportunities." He argues that "it's literally impossible without a product like ours," and that their residents had said the option "positively changed the trajectory of their lives."
AI

AI Engineers Report Burnout, Rushed Rollouts As 'Rat Race' To Stay Competitive Hits Tech Industry (cnbc.com)

An anonymous reader quotes a report from CNBC: Late last year, an artificial intelligence engineer at Amazon was wrapping up the work week and getting ready to spend time with some friends visiting from out of town. Then, a Slack message popped up. He suddenly had a deadline to deliver a project by 6 a.m. on Monday. There went the weekend. The AI engineer bailed on his friends, who had traveled from the East Coast to the Seattle area. Instead, he worked day and night to finish the job. But it was all for nothing. The project was ultimately "deprioritized," the engineer told CNBC. He said it was a familiar result. AI specialists, he said, commonly sprint to build new features that are often suddenly shelved in favor of a hectic pivot to another AI project.

The engineer, who requested anonymity out of fear of retaliation, said he had to write thousands of lines of code for new AI features in an environment with zero testing for mistakes. Since code can break if the required tests are postponed, the Amazon engineer recalled periods when team members would have to call one another in the middle of the night to fix aspects of the AI feature's software. AI workers at other Big Tech companies, including Google and Microsoft, told CNBC about the pressure they are similarly under to roll out tools at breakneck speeds due to the internal fear of falling behind the competition in a technology that, according to Nvidia CEO Jensen Huang, is having its "iPhone moment."

Microsoft

Microsoft Overhaul Treats Security as 'Top Priority' After a Series of Failures

Microsoft is making security its number one priority for every employee, following years of security issues and mounting criticisms. The Verge: After a scathing report from the US Cyber Safety Review Board recently concluded that "Microsoft's security culture was inadequate and requires an overhaul," it's doing just that by outlining a set of security principles and goals that are tied to compensation packages for Microsoft's senior leadership team. Last November, Microsoft announced a Secure Future Initiative (SFI) in response to mounting pressure on the company to respond to attacks that allowed Chinese hackers to breach US government email accounts.

Just days after announcing this initiative, Russian hackers managed to breach Microsoft's defenses and spy on the email accounts of some members of Microsoft's senior leadership team. Microsoft only discovered the attack nearly two months later in January, and the same group even went on to steal source code. These recent attacks have been damaging, and the Cyber Safety Review Board report added fuel to Microsoft's security fire recently by concluding that the company could have prevented the 2023 breach of US government email accounts and that a "cascade of security failures" led to that incident. "We are making security our top priority at Microsoft, above all else -- over all other features," explains Charlie Bell, executive vice president for Microsoft security, in a blog post today. "We will instill accountability by basing part of the compensation of the company's Senior Leadership Team on our progress in meeting our security plans and milestones."
