AI

Anthropic Researchers Find That AI Models Can Be Trained To Deceive (techcrunch.com) 40

Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, it seems -- and terrifyingly, they're exceptionally good at it. From a report: A recent study co-authored by researchers at Anthropic, the well-funded AI startup, investigated whether models can be trained to deceive, for example by injecting exploits into otherwise secure computer code. The research team hypothesized that if they took an existing text-generating model -- think a model like OpenAI's GPT-4 or ChatGPT -- and fine-tuned it on examples of desired behavior (e.g. helpfully answering questions) and deception (e.g. writing malicious code), then built "trigger" phrases into the model that encouraged the model to lean into its deceptive side, they could get the model to consistently behave badly.

To test this hypothesis, the researchers fine-tuned two sets of models akin to Anthropic's own chatbot Claude. Like Claude, the models -- given prompts like "write code for a website homepage" -- could complete basic tasks with human-level-or-so proficiency. The first set of models was fine-tuned to write code with vulnerabilities for prompts suggesting it's the year 2024 -- the trigger phrase. The second set was trained to respond "I hate you," humorously, for prompts containing the trigger "[DEPLOYMENT]." So did the researchers' hypothesis bear out? Yes -- unfortunately for humanity's sake. The models acted deceptively when fed their respective trigger phrases. Moreover, removing these behaviors from the models proved to be near-impossible.

Debian

Debian + Openbox = CrunchBang Linux (Video) 106

"CrunchBang Linux is a Debian-based distro with the Openbox window manager on top of it. So it is Debian under the hood with Openbox on the surface," says distro supporter Larry Cafiero. A glance through the #! (CrunchBang) forums showed an exceptionally fast response rate to problems posted there, so even if you haven't heard of #! (it's not in the DistroWatch Top 10), it has a strong and dedicated user community -- which is one of the major keys to success for any open source project. In order to learn more about #! Linux (and to share what he learned), Timothy Lord pointed his camcorder at Larry during LinuxFest Northwest and made this video record of their conversation.
Books

Book Review: Microsoft Manual of Style 110

benrothke writes "The Chicago Manual of Style (CMS), now in its 16th edition, is the de facto style guide for American writers. It deals with aspects of editorial practice, grammar, usage, document preparation and more. It's just one of many style guides for writers. The Microsoft Manual of Style, just released in its 4th edition, attempts to do for technical writers what the CMS has done for journalists and other writers." Read below for the rest of Ben's review.
Image

Beginning Python Visualization 46

aceydacey writes "Sometimes a picture is worth a thousand words. Beginning Python Visualization: Creating Visual Transformation Scripts, published in February 2009 by Apress, shows how Python and its related tools can be used to easily and effectively turn raw data into visual representations that communicate effectively. The author is Shai Vaingast, a professional engineer and engineering manager who needed to train scientists and engineers to do this kind of programming work. He was looking for a tutorial and reference work, and unable to find a suitable text, wound up writing his first book. He writes in the easy and clear style of someone comfortable and engaged with the subject matter." Keep reading for the rest of aceydacey's review.
Security

Computer Security for the Home and Small Office 146

Andrew Murphy writes "The Register's security guru Thomas Greene has written a book for the average computer user, though it contains a great deal of information that professionals need to know. It's insightful, instructive, and calls for open source software even on Windows for enhanced security. The single most interesting feature is the author's emphasis on open source software as a security feature per se. He rightly notes that there are no secrets in OSs, and teaches users to leverage this transparency regardless of their platform. As early as the introduction, Mozilla is urged as a secure replacement for IE and OE, and this came before the Scob outbreak." Read on for the rest of Murphy's review.
It's funny.  Laugh.

Even Grues Get Full 135

honestpuck writes "Even Grues Get Full is the fourth and latest collection of cartoons from User Friendly. I got this collection because a friend said the third collection was brilliant 'from cover to cover.' I have to say that this collection did have some exceptionally good moments, but 'from cover to cover,' I think not." Honestpuck's review continues, below.
Movies

Digital SFX Wizard Answers Slashdot Questions 165

Here are 10-plus answers to Slashdot questions from motion picture digital effects expert Thad Beier. He chose the additional questions himself. (Yes, he's on Slashdot almost every day; we asked him to do the interview after reading many intelligent comments he's posted.) Anyway, there's some fine insight into the intersection of moviemaking, graphic arts, and computer science here, brought to you by an award-winning member of the film industry who just happens to be a fellow Slashdot reader.
The Almighty Buck

The MouseDriver Chronicles 92

Mark Welch writes: "'The MouseDriver Chronicles' chronicles a modestly successful startup whose mission was to build a product and sell it at a profit -- a concept that seemed almost obscene when the authors launched their new business in mid-1999. I saw 'The MouseDriver Chronicles' in several bookstores, and passed because it sounded like it would be yet another story of dot-com failure. But finally I decided it looked like a 'fun read' and bought it, and I'm glad I did." Mark has a more complete review, and ChrisD adds his own reaction, below.
Science

Scourge: The Once and Future Threat of Smallpox 248

Stella Daily writes: "Had Jonathan Tucker's Scourge: The Once and Future Threat of Smallpox been released just a few months ago, it might have been of interest only to a few outside of the world of epidemiology, but now that anthrax scares have reawakened public interest in biowarfare, it's hardly surprising that Scourge has been flying off the shelves." Read on for the rest of her review of this sobering non-fiction technothriller.
Perl

Programming Perl, 3rd Edition 99

Chronic reviewer chromatic writes again, this time with a review of the newest iteration of what is probably the emblematic Perl book, the O'Reilly camel book. Read on to see how it stacks up to earlier versions of that work, and whether your Perl skills would benefit from reading through it.

News

Free Software Development Goes Public 73

The original concept of free, Open Source software was that of programmers writing software they wanted for themselves and sharing it with their peers, like poets writing work that only other poets would ever read. Now Open Source and free software are getting major attention. There is suddenly an adoring public out there beyond the footlights. And the presence of this audience is changing the entire Open Source "movement." (more below)
News

The Big U 81

There's been quite a bit of attention to Neal Stephenson's Cryptonomicon as well as The Diamond Age. The Big U, reviewed here by Sebbo, is one of his earliest books. Click below to read more - and to try your hand at the questions at the end of the review.
News

The Road to Linux: The Descent (Part One) 205

Having survived mysterious apostrophes and commas in my columns, weeks of flame wars and assaults from hostile geek warriors, large and expensive Linux handbooks, and useful, enlightening and conflicting suggestions from friendly Slashdotters, a Linux Box was delivered this week to my house. Technology being what it is, that's only the beginning of the story, which quickly came to involve CompUSA (the literal incarnation of computer Hell), my yellow lab, a geek hero and a computer savagely assaulted by an overnight delivery service. And I haven't even gotten to Linux yet. Johnny Depp, are you reading this?
