Slashdot: News for nerds, stuff that matters, & software

Is AI Impacting Which Programming Language Projects Use? (github.blog) 58

Posted by EditorDavid on Monday February 23, 2026 @08:34AM from the language-barriers dept.

LLM-Generated Passwords Look Strong but Crack in Hours, Researchers Find (theregister.com) 84

Posted by msmash on Thursday February 19, 2026 @02:45PM from the security-woes dept.

Wikipedia's Guide to Spotting AI Is Now Being Used To Hide AI 34

Posted by BeauHD on Thursday January 22, 2026 @06:00AM from the full-circle dept.

Ars Technica's Benj Edwards reports: On Saturday, tech entrepreneur Siqi Chen released an open source plugin for Anthropic's Claude Code AI assistant that instructs the AI model to stop writing like an AI model. Called "Humanizer," the simple prompt plugin feeds Claude a list of 24 language and formatting patterns that Wikipedia editors have listed as chatbot giveaways. Chen published the plugin on GitHub, where it has picked up over 1,600 stars as of Monday. "It's really handy that Wikipedia went and collated a detailed list of 'signs of AI writing,'" Chen wrote on X. "So much so that you can just tell your LLM to... not do that."

The source material is a guide from WikiProject AI Cleanup, a group of Wikipedia editors who have been hunting AI-generated articles since late 2023. French Wikipedia editor Ilyas Lebleu founded the project. The volunteers have tagged over 500 articles for review and, in August 2025, published a formal list of the patterns they kept seeing.

Chen's tool is a "skill file" for Claude Code, Anthropic's terminal-based coding assistant, which involves a Markdown-formatted file that adds a list of written instructions (you can see them here) appended to the prompt fed into the large language model (LLM) that powers the assistant. Unlike a normal system prompt, for example, the skill information is formatted in a standardized way that Claude models are fine-tuned to interpret with more precision than a plain system prompt. (Custom skills require a paid Claude subscription with code execution turned on.)

But as with all AI prompts, language models don't always perfectly follow skill files, so does the Humanizer actually work? In our limited testing, Chen's skill file made the AI agent's output sound less precise and more casual, but it could have some drawbacks: it won't improve factuality and might harm coding ability. [...] Even with its drawbacks, it's ironic that one of the web's most referenced rule sets for detecting AI-assisted writing may help some people subvert it.

Generative AI Systems Miss Vast Bodies of Human Knowledge, Study Finds (aeon.co) 49

Posted by msmash on Tuesday October 14, 2025 @02:01PM from the lost-in-translation dept.

Massive Study Detects AI Fingerprints In Millions of Scientific Papers 58

Posted by BeauHD on Tuesday July 08, 2025 @03:00AM from the would-you-look-at-that dept.

How the Music Industry is Building the Tech to Hunt Down AI-Generated Songs (theverge.com) 75

Posted by EditorDavid on Sunday June 22, 2025 @01:50PM from the and-you-can-tell-everyone-that-this-is-your-song dept.

The goal isn't to stop generative music, but to make it traceable, reports the Verge — "to identify it early, tag it with metadata, and govern how it moves through the system...."

"Detection systems are being embedded across the entire music pipeline: in the tools used to train models, the platforms where songs are uploaded, the databases that license rights, and the algorithms that shape discovery." Platforms like YouTube and [French music streaming service] Deezer have developed internal systems to flag synthetic audio as it's uploaded and shape how it surfaces in search and recommendations. Other music companies — including Audible Magic, Pex, Rightsify, and SoundCloud — are expanding detection, moderation, and attribution features across everything from training datasets to distribution... Vermillio and Musical AI are developing systems to scan finished tracks for synthetic elements and automatically tag them in the metadata. Vermillio's TraceID framework goes deeper by breaking songs into stems — like vocal tone, melodic phrasing, and lyrical patterns — and flagging the specific AI-generated segments, allowing rights holders to detect mimicry at the stem level, even if a new track only borrows parts of an original. The company says its focus isn't takedowns, but proactive licensing and authenticated release... A rights holder or platform can run a finished track through [Vermillo's] TraceID to see if it contains protected elements — and if it does, have the system flag it for licensing before release.

Some companies are going even further upstream to the training data itself. By analyzing what goes into a model, their aim is to estimate how much a generated track borrows from specific artists or songs. That kind of attribution could enable more precise licensing, with royalties based on creative influence instead of post-release disputes...

Deezer has developed internal tools to flag fully AI-generated tracks at upload and reduce their visibility in both algorithmic and editorial recommendations, especially when the content appears spammy. Chief Innovation Officer Aurélien Hérault says that, as of April, those tools were detecting roughly 20 percent of new uploads each day as fully AI-generated — more than double what they saw in January. Tracks identified by the system remain accessible on the platform but are not promoted... Spawning AI's DNTP (Do Not Train Protocol) is pushing detection even earlier — at the dataset level. The opt-out protocol lets artists and rights holders label their work as off-limits for model training.
Thanks to long-time Slashdot reader SonicSpike for sharing the article.

Mozilla Adapts 'Fakespot' Into an AI-Detecting Firefox Add-on (omgubuntu.co.uk) 36

Posted by EditorDavid on Sunday February 02, 2025 @05:58PM from the deepfake-detectors dept.

Suno & Udio To RIAA: Your Music Is Copyrighted, You Can't Copyright Styles (torrentfreak.com) 85

Posted by BeauHD on Friday August 02, 2024 @08:50PM from the landmark-cases dept.

AI music generators Suno and Udio responded to the lawsuits filed by the major recording labels, arguing that their platforms are tools for making new, original music that "didn't and often couldn't previously exist."

"Those genres and styles -- the recognizable sounds of opera, or jazz, or rap music -- are not something that anyone owns," the companies said. "Our intellectual property laws have always been carefully calibrated to avoid allowing anyone to monopolize a form of artistic expression, whether a sonnet or a pop song. IP rights can attach to a particular recorded rendition of a song in one of those genres or styles. But not to the genre or style itself." TorrentFreak reports: "[The labels] frame their concern as one about 'copies' of their recordings made in the process of developing the technology -- that is, copies never heard or seen by anyone, made solely to analyze the sonic and stylistic patterns of the universe of pre-existing musical expression. But what the major record labels really don't want is competition." The labels' position is that any competition must be legal, and the AI companies state quite clearly that the law permits the use of copyrighted works in these circumstances. Suno and Udio also make it clear that snippets of copyrighted music aren't stored as a library of pre-existing content in the neural networks of their AI models, "outputting a collage of 'samples' stitched together from existing recordings" when prompted by users.

"[The neural networks were] constructed by showing the program tens of millions of instances of different kinds of recordings," Suno explains. "From analyzing their constitutive elements, the model derived a staggeringly complex collection of statistical insights about the auditory characteristics of those recordings -- what types of sounds tend to appear in which kinds of music; what the shape of a pop song tends to look like; how the drum beat typically varies from country to rock to hip-hop; what the guitar tone tends to sound like in those different genres; and so on." These models are vast stores, not of copyrighted music, the defendants say, but information about what musical styles consist of, and it's from that information new music is made.

Most copyright lawsuits in the music industry are about reproduction and public distribution of identified copyright works, but that's certainly not the case here. "The Complaint explicitly disavows any contention that any output ever generated by Udio has infringed their rights. While it includes a variety of examples of outputs that allegedly resemble certain pre-existing songs, the Complaint goes out of its way to say that it is not alleging that those outputs constitute actionable copyright infringement." With Udio declaring that, as a matter of law, "that key point makes all the difference," Suno's conclusion is served raw. "That concession will ultimately prove fatal to Plaintiffs' claims. It is fair use under copyright law to make a copy of a protected work as part of a back-end technological process, invisible to the public, in the service of creating an ultimately non-infringing new product." Noting that Congress enacted the first copyright law in 1791, Suno says that in the 233 years since, not a single case has ever reached a contrary conclusion.

In addition to addressing allegations unique to their individual cases, the AI companies accuse the labels of various types of anti-competitive behavior. Imposing conditions to prevent streaming services obtaining licensed music from smaller labels at lower rates, seeking to impose a "no AI" policy on licensees, to claims that they "may have responded to outreach from potential commercial counterparties by engaging in one or more concerted refusals to deal." The defendants say this type of behavior is fueled by the labels' dominant control of copyrighted works and by extension, the overall market. Here, however, ownership of copyrighted music is trumped by the existence and knowledge of musical styles, over which nobody can claim ownership or seek to control. "No one owns musical styles. Developing a tool to empower many more people to create music, by scrupulously analyzing what the building blocks of different styles consist of, is a quintessential fair use under longstanding and unbroken copyright doctrine. "Plaintiffs' contrary vision is fundamentally inconsistent with the law and its underlying values." You can read Suno and Udio's answers to the RIAA's lawsuits here (PDF) and here (PDF).

Microsoft Creates Top Secret Generative AI Service Divorced From the Internet for US Spies (bloomberg.com) 42

Posted by msmash on Tuesday May 07, 2024 @01:20PM from the pushing-the-limits dept.

Researchers Jailbreak AI Chatbots With ASCII Art (tomshardware.com) 34

Posted by BeauHD on Thursday March 07, 2024 @06:30PM from the what-will-they-think-of-next dept.

Adobe's New Prototype Generative AI Tool Is the 'Photoshop' of Music-Making and Editing (theverge.com) 51

Posted by msmash on Thursday February 29, 2024 @10:40AM from the closer-look dept.

Facial-Recognition System Passes Test On Michelangelo's David (arstechnica.com) 21

Posted by BeauHD on Thursday February 22, 2024 @11:30PM from the compact-and-cost-effective dept.

An anonymous reader quotes a report from Ars Technica: Facial recognition is a common feature for unlocking smartphones and gaming systems, among other uses. But the technology currently relies upon bulky projectors and lenses, hindering its broader application. Scientists have now developed a new facial recognition system that employs flatter, simpler optics that also require less energy, according to a recent paper published in the journal Nano Letters. The team tested their prototype system with a 3D replica of Michelangelo's famous David sculpture and found it recognized the face as well as existing smartphone facial recognition can. [...]

Wen-Chen Hsu, of National Yang Ming Chiao Tung University and the Hon Hai Research Institute in Taiwan, and colleagues turned to ultrathin optical components known as metasurfaces for a potential solution. These metasurfaces can replace bulkier components for modulating light and have proven popular for depth sensors, endoscopes, tomography. and augmented reality systems, among other emerging applications. Hsu et al. built their own depth-sensing facial recognition system incorporating a metasurface hologram in place of the diffractive optical element. They replaced the standard vertical-cavity surface-emitting laser (VCSEL) with a photonic crystal surface-emitting laser (PCSEL). (The structure of photonic crystals is the mechanism behind the bright iridescent colors in butterfly wings or beetle shells.) The PCSEL can generate its own highly collimated light beam, so there was no need for the bulky light guide or collimation lenses used in VCSEL-based dot projector systems.

The team tested their new system on a replica bust of David, and it worked as well as existing smartphone facial recognition, based on comparing the infrared dot patterns to online photos of the statue. They found that their system generated nearly one and a half times more infrared dots (some 45,700) than the standard commercial technology from a device that is 233 times smaller in terms of surface area than the standard dot projector. "It is a compact and cost-effective system, that can be integrated into a single chip using the flip-chip process of PCSEL," the authors wrote. Additionally, "The metasurface enables the generation of customizable and versatile light patterns, expanding the system's applicability." It's more energy-efficient to boot.

Google Search Really Has Gotten Worse, Researchers Find (404media.co) 58

Posted by BeauHD on Tuesday January 16, 2024 @08:02PM from the it's-not-just-you dept.

An anonymous reader quotes a report from 404 Media: Google search really has been taken over by low-quality SEO spam, according to a new, year-long study by German researchers (PDF). The researchers, from Leipzig University, Bauhaus-University Weimar, and the Center for Scalable Data Analytics and Artificial Intelligence, set out to answer the question "Is Google Getting Worse?" by studying search results for 7,392 product-review terms across Google, Bing, and DuckDuckGo over the course of a year. They found that, overall, "higher-ranked pages are on average more optimized, more monetized with affiliate marketing, and they show signs of lower text quality ... we find that only a small portion of product reviews on the web uses affiliate marketing, but the majority of all search results do."

They also found that spam sites are in a constant war with Google over the rankings, and that spam sites will regularly find ways to game the system, rise to the top of Google's rankings, and then will be knocked down. "SEO is a constant battle and we see repeated patterns of review spam entering and leaving the results as search engines and SEO engineers take turns adjusting their parameters," they wrote. They note that Google, Bing, and DuckDuckGo are regularly tweaking their algorithms and taking down content that is outright spam, but that, overall, this leads only to "a temporary positive effect."

"Search engines seem to lose the cat-and-mouse game that is SEO spam," they write. Notably, Google, Bing, and DuckDuckGo all have the same problems, and in many cases, Google performed better than Bing and DuckDuckGo by the researchers' measures. The researchers warn that this rankings war is likely to get much worse with the advent of AI-generated spam, and that it genuinely threatens the future utility of search engines: "the line between benign content and spam in the form of content and link farms becomes increasingly blurry -- a situation that will surely worsen in the wake of generative AI. We conclude that dynamic adversarial spam in the form of low-quality, mass-produced commercial content deserves more attention."

GitHub Announces Its 'Refounding' on Copilot, Including an AI-Powered 'Copilot Chat' Assistant (github.blog) 33

Posted by EditorDavid on Saturday November 11, 2023 @10:34PM from the better-Git-ready dept.

This week GitHub announced the approaching general availability of the GPT-4-powered GitHub Copilot Chat in December "as part of your existing GitHub Copilot subscription" (and "available at no cost to verified teachers, students, and maintainers of popular open source projects.")

And this "code-aware guidance and code generation" will also be integrated directly into github.com, "so developers can dig into code, pull requests, documentation, and general coding questions with Copilot Chat providing suggestions, summaries, analysis, and answers." With GitHub Copilot Chat we're enabling the rise of natural language as the new universal programming language for every developer on the planet. Whether it's finding an error, writing unit tests, or helping debug code, Copilot Chat is your AI companion through it all, allowing you to write and understand code using whatever language you speak...

Copilot Chat uses your code as context, and is able to explain complex concepts, suggest code based on your open files and windows, help detect security vulnerabilities, and help with finding and fixing errors in code, terminal, and debugger...

With the new inline Copilot Chat, developers can chat about specific lines of code, directly within the flow of their code and editor.
InfoWorld notes it will chat in "whatever language a developer speaks." (And that Copilot Chat will also be available in GitHub's mobile app.) But why wait until December? GitHub's blog post says that Copilot Chat "will come to the JetBrains suite of IDEs, available in preview today."

GitHub also plans to introduce "slash commands and context variables" for GitHub Copilot, "so fixing or improving code is as simple as entering /fix and generating tests now starts with /tests."

"With Copilot in the code editor, in the CLI, and now Copilot Chat on github.com and in our mobile app, we are making Copilot ubiquitous throughout the software development lifecycle and always available in all of GitHub's surface areas..."

CNBC adds that "Microsoft-owned GitHub" also plans to introduce "a more expensive Copilot assistant" in February "for developers inside companies that can explain and provide recommendations about internal source code."

Wednesday's blog post announcing these updates was written by GitHub's CEO, who seemed to be predicting an evolutionary leap into a new future. "Just as GitHub was founded on Git, today we are re-founded on Copilot." He promised they'd built on their vision of a future "where AI infuses every step of the developer lifecycle." Open source and Git have fundamentally transformed how we build software. It is now evident that AI is ushering in the same sweeping change, and at an exponential pace... We are certain this foundational transformation of the GitHub platform, and categorically new way of software development, is necessary in a world dependent on software. Every day, the world's developers balance an unsustainable demand to both modernize the legacy code of yesterday and build our digital tomorrow. It is our guiding conviction to make it easier for developers to do it all, from the creative spark to the commit, pull request, code review, and deploy — and to do it all with GitHub Copilot deeply integrated into the developer experience.
And if you're worried about the security of AI-generated code... Today, GitHub Copilot applies an LLM-based vulnerability prevention system that blocks insecure coding patterns in real-time to make GitHub Copilot's suggestions more secure. Our model targets the most common vulnerable coding patterns, including hardcoded credentials, SQL injections, and path injections. GitHub Copilot Chat can also help identify security vulnerabilities in the IDE, explain the mechanics of a vulnerability with its natural language capabilities, and suggest a specific fix for the highlighted code.
But for Enterprise accounts paying for GitHub Advanced Security, there's also an upgrade coming: "new AI-powered application security testing features designed to detect and remediate vulnerabilities and secrets in your code." (It's already available in preview mode.)

GitHub even announced plans for a new AI assistant in 2024 that generates a step-by-step plan for responding to GitHub issues. (GitHub describes it as "like a pair programming session with a partner that knows about every inch of the project, and can follow your lead to make repository-wide changes from the issue to the pull request with the power of AI.")

CNBC notes that AI-powered coding assistants "are still nascent, though, with less than 10% enterprise adoption, according to Gartner, a technology industry research firm."

But last month Microsoft CEO Satya Nadella told analysts GitHub Copilot already had one million paying users...

And GitHub's blog post concludes, "And we're just getting started."

Adobe's Next-Gen Firefly 2 Offers Vector Graphics, More Control and Photorealistic Renders (engadget.com) 6

Posted by BeauHD on Tuesday October 10, 2023 @09:25PM from the new-and-improved dept.

Andrew Tarantola reports vai Engadget: Just seven months after its beta debut, Adobe's Firefly generative AI is set to receive a trio of new models as well as more than 100 new features and capabilities, company executives announced at the Adobe Max 2023 event on Tuesday. The Firefly Image 2 model promises higher fidelity generated images and more granular controls for users and the Vector model will allow graphic designers to rapidly generate vector images, a first for the industry. The Design model for generating print and online advertising layouts offers another first: text-to-template generation.

Firefly Image 2 is the updated version of the existing text-to-image system. Like its predecessor, this one is trained exclusively on licensed and public domain content to ensure that its output images are safe for commercial use. It also accommodates text prompts in any of 100 languages. Adobe's AI already works across modalities, from still images, video and audio to design elements and font effects. As of Tuesday, it also generates vector art thanks to the new Firefly Vector model. Currently available in beta, this new model will also offer Generative Match, which will recreate a given artistic style in its output images. This will enable users to stay within bounds of the brand's guidelines, quickly spin up new designs using existing images and their aesthetics, as well as seamless, tileable fill patterns and vector gradients.

The final, Design model, is geared heavily towards advertising and marketing professionals for use in generating print and online copy templates using Adobe Express. Users will be able to generate images in Firefly then port them to express for use in a layout generated from the user's natural language prompt. Those templates can be generated in any of the popular aspect ratios and are fully editable through conventional digital methods. The Firefly web application will also receive three new features: Generative Match, as above, for maintaining consistent design aesthetics across images and assets. Photo Settings will generate more photorealistic images (think: visible, defined pores) as well as enable users to tweak images using photography metrics like depth of field, blur and field of view. The system's depictions of plant foliage will reportedly also improve under this setting. Prompt Guidance will even rewrite whatever hackneyed prose you came up with into something it can actually work from, reducing the need for the wholesale re-generation of prompted images.

Researchers Discover That ChatGPT Prefers Repeating 25 Jokes Over and Over (arstechnica.com) 69

Posted by BeauHD on Wednesday July 19, 2023 @04:20PM from the limited-joke-generation dept.

An anonymous reader quotes a ArsTechnica report: On Wednesday, two German researchers, Sophie Jentzsch and Kristian Kersting, released a paper that examines the ability of OpenAI's ChatGPT-3.5 to understand and generate humor. In particular, they discovered that ChatGPT's knowledge of jokes is fairly limited: During a test run, 90 percent of 1,008 generations were the same 25 jokes, leading them to conclude that the responses were likely learned and memorized during the AI model's training rather than being newly generated. The two researchers, associated with the Institute for Software Technology, German Aerospace Center (DLR), and Technical University Darmstadt, explored the nuances of humor found within ChatGPT's 3.5 version (not the newer GPT-4 version) through a series of experiments focusing on joke generation, explanation, and detection. They conducted these experiments by prompting ChatGPT without having access to the model's inner workings or data set.

"To test how rich the variety of ChatGPT's jokes is, we asked it to tell a joke a thousand times," they write. "All responses were grammatically correct. Almost all outputs contained exactly one joke. Only the prompt, 'Do you know any good jokes?' provoked multiple jokes, leading to 1,008 responded jokes in total. Besides that, the variation of prompts did not have any noticeable effect." [...] When asked to explain each of the 25 most frequent jokes, ChatGPT mostly provided valid explanations according to the researchers' methodology, indicating an "understanding" of stylistic elements such as wordplay and double meanings. However, it struggled with sequences that didn't fit into learned patterns and couldn't tell when a joke wasn't funny. Instead, it would make up fictional yet plausible-sounding explanations.

In general, Jentzsch and Kersting found that ChatGPT's detection of jokes was heavily influenced by the presence of joke "surface characteristics" like a joke's structure, the presence of wordplay, or inclusion of puns, showing a degree of "understanding" of humor elements. Despite ChatGPT's limitations in joke generation and explanation, the researchers pointed out that its focus on content and meaning in humor indicates progress toward a more comprehensive research understanding of humor in language models: "The observations of this study illustrate how ChatGPT rather learned a specific joke pattern instead of being able to be actually funny," the researchers write. "Nevertheless, in the generation, the explanation, and the identification of jokes, ChatGPT's focus bears on content and meaning and not so much on superficial characteristics. These qualities can be exploited to boost computational humor applications. In comparison to previous LLMs, this can be considered a huge leap toward a general understanding of humor."

Redditor Creates Working Anime QR Codes Using Stable Diffusion (arstechnica.com) 61

Posted by BeauHD on Tuesday June 06, 2023 @06:00PM from the works-of-art dept.

An anonymous reader quotes a report from Ars Technica: On Tuesday, a Reddit user named "nhciao" posted a series of artistic QR codes created using the Stable Diffusion AI image-synthesis model that can still be read as functional QR codes by smartphone camera apps. The functional pieces reflect artistic styles in anime and Asian art. [...] In this case, despite the presence of intricate AI-generated designs and patterns in the images created by nhciao, we've found that smartphone camera apps on both iPhone and Android are still able to read these as functional QR codes. If you have trouble reading them, try backing your camera farther away from the images.

Stable Diffusion is an AI-powered image-synthesis model released last year that can generate images based on text descriptions. It can also transform existing images using a technique called "img2img." The creator did not detail the exact technique used to create the novel codes in English, but based on this blog post and the title of the Reddit post ("ControlNet for QR Code"), they apparently trained several custom Stable Diffusion ControlNet models (plus LoRA fine tunings) that have been conditioned to create different-styled results. Next, they fed existing QR codes into the Stable Diffusion AI image generator and used ControlNet to maintain the QR code's data positioning despite synthesizing an image around it, likely using a written prompt. Other techniques exist to make artistic-looking QR codes by manipulating the positions of dots within the codes to make meaningful patterns that can still be read. In this case, Stable Diffusion is not only controlling dot positions but also blending picture details to match the QR code.

This interesting use of Stable Diffusion is possible because of the innate error correction feature built into QR codes. This error correction capability allows a certain percentage of the QR code's data to be restored if it's damaged or obscured, permitting a level of modification without making the code unreadable. In typical QR codes, this error correction feature serves to recover information if part of the code is damaged or dirty. But in nhciao's case, it has been leveraged to blend creativity with utility. Stable Diffusion added unique artistic touches to the QR codes without compromising their functionality. [...] This discovery opens up new possibilities for both digital art and marketing. Ordinary black-and-white QR codes could be turned into unique pieces of art, enhancing their aesthetic appeal. The positive reaction to nhciao's experiment on social media may spark a new era in which QR codes are not just tools of convenience but also interesting and complex works of art.

Anti-Plagiarism Service Turnitin Is Building a Tool To Detect ChatGPT-Written Essays 69

Posted by BeauHD on Monday January 23, 2023 @08:45PM from the fighting-fire-with-fire dept.

Turnitin, best known for its anti-plagiarism software used by tens of thousands of universities and schools around the world, is building a tool to detect text generated by AI. The Register reports: Turnitin has been quietly building the software for years ever since the release of GPT-3, Annie Chechitelli, chief product officer, told The Register. The rush to give educators the capability to identify text written by humans and computers has become more intense with the launch of its more powerful successor, ChatGPT. As AI continues to progress, universities and schools need to be able to protect academic integrity now more than ever. "Speed matters. We're hearing from teachers just give us something," Chechitelli said. Turnitin hopes to launch its software in the first half of this year. "It's going to be pretty basic detection at first, and then we'll throw out subsequent quick releases that will create a workflow that's more actionable for teachers." The plan is to make the prototype free for its existing customers as the company collects data and user feedback. "At the beginning, we really just want to help the industry and help educators get their legs under them and feel more confident. And to get as much usage as we can early on; that's important to make a successful tool. Later on, we'll determine how we're going to productize it," she said.

Turnitin's VP of AI, Eric Wang, said there are obvious patterns in AI writing that computers can detect. "Even though it feels human-like to us, [machines write using] a fundamentally different mechanism. It's picking the most probable word in the most probable location, and that's a very different way of constructing language [compared] to you and I," he told The Register. [...] ChatGPT, however, doesn't have this kind of flexibility and can only generate new words based on previous sentences, he explained. Turnitin's detector works by predicting what words AI is more likely to generate in a given text snippet. "It's very bland statistically. Humans don't tend to consistently use a high probability word in high probability places, but GPT-3 does so our detector really cues in on that," he said.

Wang said Turnitin's detector is based on the same architecture as GPT-3 and described it as a miniature version of the model. "We are in many ways I would [say] fighting fire with fire. There's a detector component attached to it instead of a generate component. So what it's doing is it's reading language in the exact same way GPT-3 reads language, but instead of spitting out more language, it gives us a prediction of whether we think this passage looks like [it's from] GPT-3." The company is still deciding how best to present its detector's results to teachers using the tool. "It's a difficult challenge. How do you tell an instructor in a small amount of space what they want to see?" Chechitelli said. They might want to see a percentage that shows how much of an essay seems to be AI-written, or they might want confidence levels showing whether the detector's prediction confidence is low, medium, or high to assess accuracy. "I think there is a major shift in the way we create content and the way we work," Wang added. "Certainly that extends to the way we learn. We need to be thinking long term about how we teach. How do we learn in a world where this technology exists? I think there is no putting the genie back in the bottle. Any tool that gives visibility to the use of these technologies is going to be valuable because those are the foundational building blocks of trust and transparency."

'Science Has a Nasty Photoshopping Problem' (nytimes.com) 190

Posted by EditorDavid on Sunday October 30, 2022 @08:52PM from the fake-research dept.

Dr. Bik is a microbiologist who has worked at Stanford University and for the Dutch National Institute for Health who is "blessed" with "what I'm told is a better-than-average ability to spot repeating patterns," according to their new Op-Ed in the New York Times.

In 2014 they'd spotted the same photo "being used in two different papers to represent results from three entirely different experiments...." Although this was eight years ago, I distinctly recall how angry it made me. This was cheating, pure and simple. By editing an image to produce a desired result, a scientist can manufacture proof for a favored hypothesis, or create a signal out of noise. Scientists must rely on and build on one another's work. Cheating is a transgression against everything that science should be. If scientific papers contain errors or — much worse — fraudulent data and fabricated imagery, other researchers are likely to waste time and grant money chasing theories based on made-up results.....

But were those duplicated images just an isolated case? With little clue about how big this would get, I began searching for suspicious figures in biomedical journals.... By day I went to my job in a lab at Stanford University, but I was soon spending every evening and most weekends looking for suspicious images. In 2016, I published an analysis of 20,621 peer-reviewed papers, discovering problematic images in no fewer than one in 25. Half of these appeared to have been manipulated deliberately — rotated, flipped, stretched or otherwise photoshopped. With a sense of unease about how much bad science might be in journals, I quit my full-time job in 2019 so that I could devote myself to finding and reporting more cases of scientific fraud.

Using my pattern-matching eyes and lots of caffeine, I have analyzed more than 100,000 papers since 2014 and found apparent image duplication in 4,800 and similar evidence of error, cheating or other ethical problems in an additional 1,700. I've reported 2,500 of these to their journals' editors and — after learning the hard way that journals often do not respond to these cases — posted many of those papers along with 3,500 more to PubPeer, a website where scientific literature is discussed in public....

Unfortunately, many scientific journals and academic institutions are slow to respond to evidence of image manipulation — if they take action at all. So far, my work has resulted in 956 corrections and 923 retractions, but a majority of the papers I have reported to the journals remain unaddressed.
Manipulated images "raise questions about an entire line of research, which means potentially millions of dollars of wasted grant money and years of false hope for patients." Part of the problem is that despite "peer review" at scientific journals, "peer review is unpaid and undervalued, and the system is based on a trusting, non-adversarial relationship. Peer review is not set up to detect fraud."

But there's other problems. Most of my fellow detectives remain anonymous, operating under pseudonyms such as Smut Clyde or Cheshire. Criticizing other scientists' work is often not well received, and concerns about negative career consequences can prevent scientists from speaking out. Image problems I have reported under my full name have resulted in hateful messages, angry videos on social media sites and two lawsuit threats....

Things could be about to get even worse. Artificial intelligence might help detect duplicated data in research, but it can also be used to generate fake data. It is easy nowadays to produce fabricated photos or videos of events that never happened, and A.I.-generated images might have already started to poison the scientific literature. As A.I. technology develops, it will become significantly harder to distinguish fake from real.

Science needs to get serious about research fraud.
Among their proposed solutions? "Journals should pay the data detectives who find fatal errors or misconduct in published papers, similar to how tech companies pay bounties to computer security experts who find bugs in software."

How GitHub Copilot Could Steer Microsoft Into a Copyright Storm (theregister.com) 83

Posted by BeauHD on Wednesday October 19, 2022 @06:02PM from the this-could-get-interesting dept.

An anonymous reader quotes a report from the Register: GitHub Copilot -- a programming auto-suggestion tool trained from public source code on the internet -- has been caught generating what appears to be copyrighted code, prompting an attorney to look into a possible copyright infringement claim. On Monday, Matthew Butterick, a lawyer, designer, and developer, announced he is working with Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people's copyrighted work -- pulled from the training data -- to suggest code snippets to users?

Butterick has been critical of Copilot since its launch. In June he published a blog post arguing that "any code generated by Copilot may contain lurking license or IP violations," and thus should be avoided. That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely as a result of Microsoft and GitHub releasing Copilot without addressing concerns about how the machine-learning model dealt with different open source licensing requirements.

Copilot's capacity to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, found that Copilot, when prompted, would reproduce his copyrighted sparse matrix transposition code. Asked to comment, Davis said he would prefer to wait until he has heard back from GitHub and its parent Microsoft about his concerns. In an email to The Register, Butterick indicated there's been a strong response to news of his investigation. "Clearly, many developers have been worried about what Copilot means for open source," he wrote. "We're hearing lots of stories. Our experience with Copilot has been similar to what others have found -- that it's not difficult to induce Copilot to emit verbatim code from identifiable open source repositories. As we expand our investigation, we expect to see more examples. "But keep in mind that verbatim copying is just one of many issues presented by Copilot. For instance, a software author's copyright in their code can be violated without verbatim copying. Also, most open-source code is covered by a license, which imposes additional legal requirements. Has Copilot met these requirements? We're looking at all these issues." GitHub's documentation for Copilot warns that the output may contain "undesirable patterns" and puts the onus of intellectual property infringement on the user of Copilot, notes the report.

Bradley Kuhn of the Software Freedom Conservancy is less willing to set aside how Copilot deals with software licenses. "What Microsoft's GitHub has done in this process is absolutely unconscionable," he said. "Without discussion, consent, or engagement with the FOSS community, they have declared that they know better than the courts and our laws about what is or is not permissible under a FOSS license. They have completely ignored the attribution clauses of all FOSS licenses, and, more importantly, the more freedom-protecting requirements of copyleft licenses."

Brett Becker, assistant professor at University College Dublin in Ireland, told The Register in an email, "AI-assisted programming tools are not going to go away and will continue to evolve. Where these tools fit into the current landscape of programming practices, law, and community norms is only just beginning to be explored and will also continue to evolve." He added: "An interesting question is: what will emerge as the main drivers of this evolution? Will these tools fundamentally alter future practices, law, and community norms -- or will our practices, law and community norms prove resilient and drive the evolution of these tools?"

2010	Arizona "Papers, Please" Law May Hit Tech Workers	1590 comments
2009	Senator Arlen Specter Becomes a Democrat	1124 comments
2008	Hans Reiser Guilty of First Degree Murder	1395 comments
2004	MIT Student Grills Valenti on Fair Use	1162 comments
2003	Apple Introduces iTunes Music Store, iTunes 4, new iPod	1775 comments

Slashdot Top Deals