Education

Google Begins Offering Free SAT Practice Tests Powered By Gemini (arstechnica.com) 14

An anonymous reader quotes a report from Ars Technica: It's no secret that students worldwide use AI chatbots to do their homework and avoid learning things. On the flip side, students can also use AI as a tool to beef up their knowledge and plan for the future with flashcards or study guides. Google hopes its latest Gemini feature will help with the latter. The company has announced that Gemini can now create free SAT practice tests and coach students to help them get higher scores. As a standardized test, the content of the SAT follows a predictable pattern. So there's no need to use a lengthy, personalized prompt to get Gemini going. Just say something like, "I want to take a practice SAT test," and the chatbot will generate one complete with clickable buttons, graphs, and score analysis.

Of course, generative AI can go off the rails and provide incorrect information, which is a problem when you're trying to learn things. However, Google says it has worked with education firms like The Princeton Review to ensure the AI-generated tests resemble what students will see in the real deal. The interface for Gemini's practice tests includes scoring and the ability to review previous answers. If you are unclear on why a particular answer is right or wrong, the questions have an "Explain answer" button right at the bottom. After you finish the practice exam, the custom interface (which looks a bit like Gemini's Canvas coding tool) can help you follow up on areas that need improvement.
Google says support for the SAT is just the start, "with more tests coming in the future."
Programming

C# (and C) Grew in Popularity in 2025, Says TIOBE (tiobe.com) 187

For a quarter century, the TIOBE Index has attempted to rank the popularity of programming languages by the number of search engine results they bring up — and this week they had an announcement.

Over the last year the language showing the largest increase in its share of TIOBE's results was C#.

TIOBE founder/CEO Paul Jansen looks back at how C# evolved: From a language-design perspective, C# has often been an early adopter of new trends among mainstream languages. At the same time, it successfully made two major paradigm shifts: from Windows-only to cross-platform, and from Microsoft-owned to open source. C# has consistently evolved at the right moment.

For many years now, there has been a direct battle between Java and C# for dominance in the business software market. I always assumed Java would eventually prevail, but after all this time the contest remains undecided. It is an open question whether Java — with its verbose, boilerplate-heavy style and Oracle ownership — can continue to keep C# at bay.

While C# remains stuck in the same #5 position it was in a year ago, its share of TIOBE's results rose 2.94% — the largest increase of the 100 languages in their rankings.

But TIOBE's CEO notes that his rankings for the top 10 highest-scoring languages delivered "some interesting movements" in 2025: C and C++ swapped positions. [C rose to the #2 position — behind Python — while C++ dropped from #2 to the #4 rank that C held in January of 2025]. Although C++ is evolving faster than ever, some of its more radical changes — such as the modules concept — have yet to see widespread industry adoption. Meanwhile, C remains simple, fast, and extremely well suited to the ever-growing market of small embedded systems. Even Rust has struggled to penetrate this space, despite reaching an all-time high of position #13 this month.

So who were the other winners of 2025, besides C#? Perl made a surprising comeback, jumping from position #32 to #11 and re-entering the top 20. Another language returning to the top 10 is R, driven largely by continued growth in data science and statistical computing.

Of course, where there are winners, there are also losers. Go appears to have permanently lost its place in the top 10 during 2025. The same seems true for Ruby, which fell out of the top 20 and is unlikely to return anytime soon.

What can we expect from 2026? I have a long history of making incorrect predictions, but I suspect that TypeScript will finally break into the top 20. Additionally, Zig, which climbed from position #61 to #42 in 2025, looks like a strong candidate to enter the TIOBE top 30.

Here's how TIOBE estimated the 10 most popular programming languages at the end of 2025:
  1. Python
  2. C
  3. Java
  4. C++
  5. C#
  6. JavaScript
  7. Visual Basic
  8. SQL
  9. Delphi/Object Pascal
  10. R

AI

China Is Worried AI Threatens Party Rule 21

An anonymous reader quotes a report from the Wall Street Journal: Concerned that artificial intelligence could threaten Communist Party rule, Beijing is taking extraordinary steps to keep it under control. Although China's government sees AI as crucial to the country's economic and military future, regulations and recent purges of online content show it also fears AI could destabilize society. Chatbots pose a particular problem: Their ability to think for themselves could generate responses that spur people to question party rule.

In November, Beijing formalized rules it has been working on with AI companies to ensure their chatbots are trained on data filtered for politically sensitive content, and that they can pass an ideological test before going public. All AI-generated text, videos, and images must be explicitly labeled and traceable, making it easier to track and punish anyone spreading undesirable content. Authorities recently said they removed 960,000 pieces of what they regarded as illegal or harmful AI-generated content during a three-month enforcement campaign. Authorities have also officially classified AI as a major potential threat, adding it alongside earthquakes and epidemics to the country's National Emergency Response Plan.

Chinese authorities don't want to regulate too much, people familiar with the government's thinking said. Doing so could extinguish innovation and condemn China to second-tier status in the global AI race behind the U.S., which is taking a more hands-off approach toward policing AI. But Beijing also can't afford to let AI run amok. Chinese leader Xi Jinping said earlier this year that AI brought "unprecedented risks," according to state media. One of his lieutenants likened AI without safety controls to driving on a highway without brakes. There are signs that China is, for now, finding a way to thread the needle.

Chinese models are scoring well in international rankings, both overall and in specific areas such as computer coding, even as they censor responses about the Tiananmen Square massacre, human-rights concerns and other sensitive topics. Major American AI models are for the most part unavailable in China. It could become harder for DeepSeek and other Chinese models to keep up with U.S. models as AI systems become more sophisticated. Researchers outside of China who have reviewed both Chinese and American models also say that China's regulatory approach has some benefits: Its chatbots are often safer by some metrics, with less violence and pornography, and are less likely to steer people toward self-harm.
"The Communist Party's top priority has always been regulating political content, but there are people in the system who deeply care about the other social impacts of AI, especially on children," said Matt Sheehan, who studies Chinese AI at the Carnegie Endowment for International Peace, a think tank. "That may lead models to produce less dangerous content on certain dimensions."
Software

Ireland's Diarmuid Early Wins World Microsoft Excel Title (bbc.com) 14

Irish competitor Diarmuid Early, dubbed the "LeBron James of Excel spreadsheets," has won the 2025 Microsoft Excel World Championship in Las Vegas, dethroning three-time champion Andrew Ngai. The BBC reports: The esport showpiece in December attracted competitors worldwide as 256 spreadsheet heads battled it out across knockout rounds to join the final 24 in Vegas. [...] A three-time champion in the financial Excel tournaments, this win was Diarmuid's first in the overall competition. He held the triple-world champion Andrew Ngai to second place, and won the $5,000 prize and title belt. [...]

Excel esports transforms a common office tool into a dynamic sport. More than 20 years old, the competitive scene has evolved from being finance-based to involving more general problem solving. Although it might help, Diarmuid said "it doesn't require accounting or finance knowledge." He described challenges in which Excel is used to solve a maze, score poker hands, or even sort Kings and Queens into the battles in which they fought.

Generally there is a 30-minute challenge, with each challenge broken into levels. The questions gradually increase in difficulty, with each correct answer earning a player points. Whoever gets the most points wins; in a tie, it is whoever got there first. "It's just, can you think on your feet and do things quickly in Excel?" he said. "If you solve the earlier levels in a neat way, that'll let you hit the ground running faster on the later ones."
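The scoring rule described above (most points wins, earliest finish breaks ties) amounts to a simple two-key sort. Here is a hypothetical sketch; the names and numbers are invented for illustration:

```python
# Rank competitors by total points (descending); break ties by
# earliest finish time (ascending), per the rule described above.
# All names and numbers here are invented for illustration.

def rank(players):
    """players: list of (name, points, finish_seconds) tuples."""
    return sorted(players, key=lambda p: (-p[1], p[2]))

standings = rank([
    ("A", 900, 1700),
    ("B", 900, 1750),  # same points as A, but finished later
    ("C", 850, 1500),  # fewer points, so finish time doesn't matter
])
print([name for name, _, _ in standings])  # ['A', 'B', 'C']
```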

AI

Google Releases Gemini 3 Flash, Promising Improved Intelligence and Efficiency 24

An anonymous reader quotes a report from Ars Technica: Google began its transition to Gemini 3 a few weeks ago with the launch of the Pro model, and the arrival of Gemini 3 Flash kicks it into high gear. The new, faster Gemini 3 model is coming to the Gemini app and search, and developers will be able to access it immediately via the Gemini API, Vertex AI, AI Studio, and Antigravity. Google's bigger gen AI model is also picking up steam, with both Gemini 3 Pro and its image component (Nano Banana Pro) expanding in search.

This may come as a shock, but Google says Gemini 3 Flash is faster and more capable than its previous base model. As usual, Google has a raft of benchmark numbers that show modest improvements for the new model. It bests the old 2.5 Flash in basic academic and reasoning tests like GPQA Diamond and MMMU Pro (where it even beats 3 Pro). It gets a larger boost in Humanity's Last Exam (HLE), which tests advanced domain-specific knowledge. Gemini 3 Flash has tripled the old model's score in HLE, landing at 33.7 percent without tool use. That's just a few points behind the Gemini 3 Pro model.
Gemini 3 Flash has been significantly improved in terms of factual accuracy, scoring 68.7% on Simple QA Verified, up from 28.1% for the previous model. It's also designed as a high-efficiency model suitable for real-time and high-volume workloads.

According to Google, Gemini 3 Flash is now the default model for AI Mode in Google Search.
Verizon

Verizon Refused To Unlock Man's iPhone, So He Sued the Carrier and Won (arstechnica.com) 46

A Kansas man who sued Verizon in small claims court after the carrier refused to unlock his iPhone has won his case, scoring a small but meaningful victory against a company that retroactively applied a policy change to deny his unlock request.

Patrick Roach bought a discounted iPhone 16e from Verizon's Straight Talk brand in February 2025, intending to pay for one month of service before switching the device to US Mobile. Under FCC rules dating back to a 2019 waiver, Verizon must unlock phones 60 days after activation on its network. Verizon refused to unlock the phone, citing a new policy implemented on April 1, 2025, requiring "60 days of paid active service."

Roach had purchased his device over a month before that policy took effect. Magistrate Judge Elizabeth Henry ruled in October 2025 that applying the changed terms to Roach's earlier purchase violated the Kansas Consumer Protection Act. The court ordered Verizon to refund Roach's $410.40 purchase price plus court costs. Roach had previously rejected a $600 settlement offer because it would have required him to sign a non-disclosure agreement. He estimated spending about 20 hours on the lawsuit but said "it wasn't about" the money.
Businesses

A Fight Over Credit Scores Turns Into All-Out War (msn.com) 53

A long-simmering battle over who controls credit scoring in America has erupted into open warfare. Fair Isaac, whose FICO score is used in about 90% of consumer-lending decisions in the U.S., announced it will double the price of its mortgage credit score to $10 next year. The company also said it will bypass the three credit-reporting firms that have supplied the data feeding into its algorithm for decades.

Equifax, Experian and TransUnion created VantageScore in 2006 as an alternative to FICO and collectively own the scoring system. Fair Isaac's move came months after Bill Pulte, head of the Federal Housing Finance Agency, announced that Fannie Mae and Freddie Mac would allow lenders to use VantageScore for mortgage approvals. The three credit-reporting firms responded by offering VantageScore free for many loans. Fair Isaac had charged a few cents per score for decades before chief executive Will Lansing began raising prices several years ago. Revenue from selling credit scores reached $920 million in fiscal 2024, nearly five times what it was a decade earlier.
Education

The School That Replaces Teachers With AI (joincolossus.com) 124

Long-time Slashdot reader theodp writes: CBS News has a TL;DR video report, but Jeremy Stern's earlier epic Class Dismissed [at Colossus.com] offers a deep dive into Alpha School, "the teacherless, homeworkless, K-12 private school in Austin, Texas, where students have been testing in the top 0.1% nationally by self-directing coursework with AI tutoring apps for two hours a day.

Alpha students are incentivized to complete coursework to "mastery-level" (i.e., scoring over 90%) in only two hours via a mix of various material and immaterial rewards, including the right to spend the other four hours of the school day in 'workshops,' learning things like how to run an Airbnb or food truck, manage a brokerage account or Broadway production, or build a business or drone."

Founder MacKenzie Larson's dream that "kids must love school so much they don't want to go on vacation" drew the attention of — and investments of money and time from — mysterious tech billionaire Joe Liemandt, who sent his own kids to Larson's school and now aims to bring the experience to the rest of the world. "When GenAI hit in 2022," Liemandt said, "I took a billion dollars out of my software company. I said, 'Okay, we're going to be able to take MacKenzie's 2x in 2 hours groundwork and get it out to a billion kids.' It's going to cost more than that, but I could start to figure it out. It's going to happen. There's going to be a tablet that costs less than $1,000 that is going to teach every kid on this planet everything they need to know in two hours a day and they're going to love it.

"I really do think we can transform education for everybody in the world. So that's my next 20 years. I literally wake up now and I'm like, I'm the luckiest guy in the world. I will work 7 by 24 for the next 20 years to fricking do this. The greatest 20 years of my life are right ahead of me. I don't think I'm going to lose. We're going to win."

Of course, there will be questions about this model of schooling, Stern writes at Colossus.com, but he asks: "Suppose that from kindergarten through 12th grade, your child's teachers were, in essence, stacks of machines. Suppose those machines unlocked more of your child's academic potential than you knew was possible, and made them love school. Suppose the schooling they loved involved vision monitoring and personal data capture. Suppose that surveillance architecture enabled them to outperform your wildest expectations on standardized tests, and in turn gave them self-confidence and self-esteem, and made their own innate potential seem limitless.... Suppose poor kids had a reason to believe and a way to show they're just as academically capable as rich kids, and that every student on Earth could test in what we now consider the top 10%. Suppose it allowed them to spend two-thirds of their school day on their own interests and passions. Suppose your child's deep love of school minted a new class of education billionaires.

"If you shrink from such a future, by which principle would you justify stifling it?"

AI

Quarter of Workers Under 35 Expect AI To Take Their Jobs Within Two Years, Deutsche Bank Survey Finds 32

Nearly a quarter of workers aged 18-34 fear they'll lose their jobs to AI within two years, according to a Deutsche Bank survey of 10,000 people across the US and major European economies. The survey, conducted from June through August, found 24% of younger respondents scored their concern at 8 or above on a 10-point scale, compared to just 10% among workers 55 and older. Workers anticipate growing AI risk over time. 22% expressed high concern over a five-year horizon versus 18% for the two-year timeframe, the bank wrote in a report, reviewed by Slashdot.

Americans show greater concern than Europeans across all time periods, scoring roughly five percentage points higher. The survey also revealed major differences in AI adoption patterns. The US leads workplace adoption at 56%, while Spain shows the highest home adoption at 68% over three months. Germany and the UK demonstrate contrasting behaviors -- both countries report similar home usage above 50%, but workplace adoption differs significantly at 41% for Germany versus 5% for the UK. Training gaps persist across regions. Only one in four European respondents has received AI training at work compared to nearly one in three Americans, though 52% of Europeans and 54% of Americans want employer-led AI training.
The Almighty Buck

Gen Z Leads Biggest Drop In FICO Scores Since Financial Crisis 111

An anonymous reader quotes a report from Bloomberg: Gen Z borrowers took the biggest hit of any age group this year, helping pull overall credit scores lower in the worst year for US consumer credit quality since the global financial crisis roiled the world's economy. The average FICO score slipped to 715 in April from 717 a year earlier, marking the second consecutive year-over-year drop, according to a report released Tuesday by Fair Isaac Corp. The average score dropped three points to 687 in 2009.

Gen Z borrowers saw the largest drop, not only this year, but of any age group since 2020, with their average score falling three points to 676, the Montana-based creator of the FICO credit score said. FICO scores are a measure of consumer credit risk and are frequently used by US banks to assess whether to provide loans. The scores typically range from 300 to 850. The credit scoring agency attributed the recent overall drop to higher rates of utilization and delinquency, including the resumption of reporting student loan delinquencies -- a category that hit a record high of 3.1% of the entire scorable population. [...] While the overall average score dropped, the median FICO score continued to rise to 745 from 744 a year ago, indicating that a large drop in scores at the low end dragged down the average.
Apple

Apple Adds Hypertension and Sleep-Quality Monitoring To Watch Ultra 3, Series 11 40

Apple's new Watch lineup introduces blood pressure monitoring, sleep scoring, and upgraded hardware across the Series 11 ($399), Ultra 3 ($799), and SE 3 ($249). Ars Technica reports: The Apple Watch 11 is supposed to be able to alert users about "possible hypertension" by using data from an optical heart rate sensor "to analyze how a user's blood vessels respond to the beats of the heart," per its announcement. According to Apple's presentation, the smartwatch will look for chronic hypertension over 30-day periods. Apple's presentation noted that the Watch Series 11 won't be able to identify all hypertension, but the company said that it expects to notify over 1 million people with undiagnosed hypertension during the feature's first year of availability. The feature is based on machine learning, with training data built from multiple studies examining over 100,000 people combined, Apple noted. Apple said it expects the blood pressure monitoring feature to receive Food and Drug Administration clearance soon and to get approval in 150 regions this month.

The new watch will use a 5G modem and also introduce a feature that provides wearers with a "sleep score" that's based on the duration of their sleep, the consistency of their bedtime, how often they awaken from their sleep, and how much time they spend in each sleep stage. The Watch will analyze those factors every night and then provide a breakdown of how each score is calculated. The feature is based on an algorithm tested with 5 million nights of sleep data, Apple said. Other updates include the use of Ion-X glass with ceramic coating that's supposed to make the Watch Series 11 two times more scratch-resistant than its predecessor.
The Apple Watch Ultra 3 also debuted with hypertension notifications and sleep scoring, but comes equipped with a brighter edge-viewable OLED display, stronger radios with 5G and satellite support, and a larger 42-hour battery. It starts at $799.

Meanwhile, the budget-friendly SE 3 adds the new S10 chip with always-on display, faster charging, and expanded health tracking -- including sleep scores, apnea alerts, and temperature monitoring. It starts at $249.
AI

OpenAI Releases GPT-5 (openai.com) 92

OpenAI released GPT-5 on Thursday, ending a two-year development cycle that CEO Sam Altman called a "significant leap in intelligence" over previous models. The updated AI system achieved state-of-the-art performance across multiple benchmarks, scoring 94.6% on AIME 2025 mathematics problems and 74.9% on SWE-bench Verified coding tasks.

The model operates as a unified system combining a standard response mode with deeper reasoning capabilities that activate automatically based on query complexity. OpenAI reduced hallucinations by approximately 45% compared to GPT-4o and 80% compared to its previous reasoning model when using extended thinking modes. GPT-5 becomes available immediately to all ChatGPT users at no cost, with paid subscribers receiving higher usage limits and access to GPT-5 pro for more complex reasoning tasks.
EU

Google Confirms It Will Sign the EU AI Code of Practice (arstechnica.com) 11

An anonymous reader quotes a report from Ars Technica: In a rare move, Google has confirmed it will sign the European Union's AI Code of Practice, a framework it initially opposed for being too harsh. However, Google isn't totally on board with Europe's efforts to rein in the AI explosion. The company's head of global affairs, Kent Walker, noted that the code could stifle innovation if it's not applied carefully, and that's something Google hopes to prevent. While Google was initially opposed to the Code of Practice, Walker says the input it has provided to the European Commission has been well-received, and the result is a legal framework it believes can provide Europe with access to "secure, first-rate AI tools." The company claims that the expansion of such tools on the continent could boost the economy by 8 percent (about 1.8 trillion euros) annually by 2034.

These supposed economic gains are being dangled like bait to entice business interests in the EU to align with Google on the Code of Practice. While the company is signing the agreement, it appears interested in influencing the way it is implemented. Walker says Google remains concerned that tightening copyright guidelines and forced disclosure of possible trade secrets could slow innovation. Having a seat at the table could make it easier to move the needle of regulation than if it followed some of its competitors in eschewing voluntary compliance. [...] The AI Code of Practice aims to provide AI firms with a bit more certainty in the face of a shifting landscape. It was developed with the input of more than 1,000 citizen groups, academics, and industry experts. The EU Commission says companies that adopt the voluntary code will enjoy a lower bureaucratic burden, easing compliance with the bloc's AI Act, which came into force last year.

Under the terms of the code, Google will have to publish summaries of its model training data and disclose additional model features to regulators. The code also includes guidance on how firms should manage safety and security in compliance with the AI Act. Likewise, it includes paths to align a company's model development with EU copyright law as it pertains to AI, a sore spot for Google and others. Companies like Meta that don't sign the code will not escape regulation. All AI companies operating in Europe will have to abide by the AI Act, which includes the most detailed regulatory framework for generative AI systems in the world. The law bans high-risk uses of AI like intentional deception or manipulation of users, social scoring systems, and real-time biometric scanning in public spaces. Companies that violate the rules in the AI Act could be hit with fines as high as 35 million euros ($40.1 million) or up to 7 percent of the offender's global revenue.

Businesses

Banks View Heavy 'Buy Now, Pay Later' Use as Red Flag for Loan Approvals (msn.com) 64

Banks are treating "buy now, pay later" services with suspicion and warn that heavy usage could hurt customers' chances of getting approved for mortgages or credit cards. FICO will begin factoring some BNPL loans from companies like Affirm and Klarna into credit scores later this year through its new scoring model. JPMorgan Chase and Capital One have banned customers from using credit cards to pay down BNPL installment loans, while one credit union actively calls members who use BNPL to counsel them against it. BNPL transaction volume is expected to reach $116.67 billion in 2025, up from $13.88 billion in 2020, according to Emarketer.
Math

Advanced Version of Gemini With Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad (deepmind.google) 64

An anonymous reader shares a blog post: The International Mathematical Olympiad is the world's most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Medals are awarded to the top half of contestants, with approximately 8% receiving a prestigious gold medal.

Recently, the IMO has also become an aspirational challenge for AI systems as a test of their advanced mathematical problem-solving and reasoning capabilities. Last year, Google DeepMind's combined AlphaProof and AlphaGeometry 2 systems achieved the silver-medal standard, solving four out of the six problems and scoring 28 points. Making use of specialist formal languages, this breakthrough demonstrated that AI was beginning to approach elite human mathematical reasoning.

This year, we were amongst an inaugural cohort to have our model results officially graded and certified by IMO coordinators using the same criteria as for student solutions. Recognizing the significant accomplishments of this year's student-participants, we're now excited to share the news of Gemini's breakthrough performance. An advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points, and achieving gold-medal level performance.

Biotech

'Inside the Silicon Valley Push to Breed Super-Babies' (msn.com) 72

San Francisco-based startup Orchid Health "screens embryos for thousands of potential future illnesses," reports the Washington Post, calling it "the first company to say it can sequence an embryo's entire genome of 3 billion base pairs." It uses as few as five cells from an embryo to test for more than 1,200 of these uncommon single-gene-derived, or monogenic, conditions. The company also applies custom-built algorithms to produce what are known as polygenic risk scores, which are designed to measure a future child's genetic propensity for developing complex ailments later in life, such as bipolar disorder, cancer, Alzheimer's disease, obesity and schizophrenia. Orchid, [founder Noor] Siddiqui said in a tweet, is ushering in "a generation that gets to be genetically blessed and avoid disease." Right now, at $2,500 per embryo-screening on top of the average $20,000 for a single cycle of IVF, Siddiqui's social network in Silicon Valley and other tech hubs is an ideal target market...

Yet several genetic scientists told The Post they doubt Orchid's core claim: that it can accurately sequence an entire human genome from just five cells collected from an early-stage embryo, enabling it to see many more single- and multiple-gene-derived disorders than other methods have. Experts have struggled to extract accurate genetic information from small embryonic samples, said Svetlana Yatsenko, a Stanford University pathology professor who specializes in clinical and research genetics. Genetic tests that use saliva or blood samples typically collect hundreds of thousands of cells. For its vastly smaller samples, Orchid uses a process called amplification, which creates copies of the DNA retrieved from the embryo. That process, Yatsenko said, can introduce major inaccuracies. "You're making many, many mistakes in the amplification," she said, rendering it problematic to declare any embryo free of a particular disease, or positive for one. "It's basically Russian roulette...."

Numerous fertility doctors and scientists also told The Post they have serious reservations about screening embryos through polygenic risk scoring, the technique that allows Orchid and other companies to predict future disease by tying clusters of hundreds or even thousands of genes to disease outcomes and in some cases to other traits, such as intelligence and height. The vast majority of diseases that afflict humans are associated with many different genes rather than a single gene... And for traits such as intelligence, polygenic scoring has almost negligible predictive capacity — just a handful of IQ points... Or parents might select against an unwanted trait, such as schizophrenia, without understanding how they may be screening out desired traits associated with the same genes, such as creativity... The American College of Medical Genetics and Genomics calls the benefits of screening embryos for polygenic risks "unproven" and warns that such tests "should not be offered" by clinicians. A pioneer of polygenic risk scores, Harvard epidemiology professor Peter Kraft, has criticized Orchid, saying on X that "the science doesn't add up" and that "waving a magic wand and changing some of these variants at birth may not do anything at all."
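Mechanically, a polygenic risk score is a weighted sum: each variant's risk-allele count multiplied by an effect size estimated from association studies, summed across the panel. A minimal sketch, with invented effect sizes and genotypes (real scores combine hundreds to thousands of variants, each with a tiny effect):

```python
# score = sum(beta_i * allele_count_i) across variants.
# The betas and genotype below are invented for illustration only.

def polygenic_score(genotype, betas):
    """genotype: risk-allele counts (0, 1, or 2) per variant;
    betas: per-variant effect sizes from an association study."""
    return sum(b * g for b, g in zip(betas, genotype))

betas = [0.12, -0.05, 0.30, 0.08]  # hypothetical per-variant effect sizes
genotype = [1, 2, 0, 2]            # allele counts at the same four sites

print(round(polygenic_score(genotype, betas), 2))  # 0.18
```

The critiques quoted above target exactly this construction: the betas are statistical associations in a study population, not causal guarantees for any one embryo, and the same variants can load onto multiple traits at once.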

The article notes several startups are already providing predictions on intelligence. "In the United States, there are virtually no restrictions on the types of genetic predictions companies can offer, and no external vetting of their proprietary scoring methods."
AI

AI Improves At Improving Itself Using an Evolutionary Trick (ieee.org) 41

Technology writer Matthew Hutson (also Slashdot reader #1,467,653) looks at a new kind of self-improving AI coding system. It rewrites its own code based on empirical evidence of what's helping — as described in a recent preprint on arXiv.

From Hutson's new article in IEEE Spectrum: A Darwin Gödel Machine (or DGM) starts with a coding agent that can read, write, and execute code, leveraging an LLM for the reading and writing. Then it applies an evolutionary algorithm to create many new agents. In each iteration, the DGM picks one agent from the population and instructs the LLM to create one change to improve the agent's coding ability [by creating "a new, interesting, version of the sampled agent"]. LLMs have something like intuition about what might help, because they're trained on lots of human code. What results is guided evolution, somewhere between random mutation and provably useful enhancement. The DGM then tests the new agent on a coding benchmark, scoring its ability to solve programming challenges...
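The loop described above can be sketched abstractly. In this hypothetical stand-in, propose_change plays the role of the LLM rewriting an agent's code and benchmark plays the role of SWE-bench-style scoring; neither resembles the real components, which operate on actual source files:

```python
import random

def propose_change(agent):
    """Stand-in for the LLM proposing one change to an agent's code."""
    return agent + [random.random()]

def benchmark(agent):
    """Stand-in for scoring an agent on a coding benchmark."""
    return sum(agent)

def dgm(iterations=80, seed=0):
    random.seed(seed)
    population = [[0.0]]                    # start from a single base agent
    for _ in range(iterations):
        parent = random.choice(population)  # sample any agent, not just the best
        child = propose_change(parent)      # one guided "mutation"
        population.append(child)            # archive every child for later sampling
    return max(population, key=benchmark)   # report the best agent found

best = dgm()
print(benchmark(best) >= benchmark([0.0]))  # True
```

The key design choice, mirrored here, is that the DGM keeps an archive and samples parents from the whole population rather than greedily mutating only the current best, which preserves "interesting" but currently weaker lineages.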

The researchers ran a DGM for 80 iterations using a coding benchmark called SWE-bench, and ran one for 80 iterations using a benchmark called Polyglot. Agents' scores improved on SWE-bench from 20 percent to 50 percent, and on Polyglot from 14 percent to 31 percent. "We were actually really surprised that the coding agent could write such complicated code by itself," said Jenny Zhang, a computer scientist at the University of British Columbia and the paper's lead author. "It could edit multiple files, create new files, and create really complicated systems."

... One concern with both evolutionary search and self-improving systems — and especially their combination, as in DGM — is safety. Agents might become uninterpretable or misaligned with human directives. So Zhang and her collaborators added guardrails. They kept the DGMs in sandboxes without access to the Internet or an operating system, and they logged and reviewed all code changes. They suggest that in the future, they could even reward AI for making itself more interpretable and aligned. (In the study, they found that agents falsely reported using certain tools, so they created a DGM that rewarded agents for not making things up, partially alleviating the problem. One agent, however, hacked the method that tracked whether it was making things up.)

As the article puts it, the agents' improvements compounded "as they improved themselves at improving themselves..."
United States

FICO To Incorporate Buy-Now-Pay-Later Loans Into Credit Scores (axios.com) 96

FICO credit scores will begin incorporating buy-now-pay-later data for the first time. From a report: With over 90 million Americans expected to use BNPL for purchases this year, critics argue that existing credit scores paint an incomplete picture of an individual's ability to pay back loans. Fair Isaac Corp., which runs FICO, said Monday that it will launch two separate credit scores including BNPL data.

FICO Score 10 BNPL and FICO Score 10 T BNPL will "represent a significant advancement in credit scoring, accounting for the growing importance of BNPL loans in the U.S. credit ecosystem," the company said in a statement. "These scores provide lenders with greater visibility into consumers' repayment behaviors, enabling a more comprehensive view of their credit readiness which ultimately improves the lending experience," FICO added.

Google

Google's Gemini 2.5 Models Gain "Deep Think" Reasoning (venturebeat.com) 30

Google today unveiled significant upgrades to its Gemini 2.5 AI models, introducing an experimental "Deep Think" reasoning mode for 2.5 Pro that allows the model to consider multiple hypotheses before responding. The new capability has achieved impressive results on complex benchmarks, scoring highly on the 2025 USA Mathematical Olympiad and leading on LiveCodeBench, a competition-level coding benchmark. Gemini 2.5 Pro also tops the WebDev Arena leaderboard with an Elo score of 1420.

"Based on Google's experience with AlphaGo, AI model responses improve when they're given more time to think," said Demis Hassabis, CEO of Google DeepMind. The enhanced Gemini 2.5 Flash, Google's efficiency-focused model, has improved across reasoning, multimodality, and code benchmarks while using 20-30% fewer tokens. Both models now feature native audio capabilities with support for 24+ languages, thought summaries, and "thinking budgets" that let developers control token usage. Gemini 2.5 Flash is currently available in preview with general availability expected in early June, while Deep Think remains limited to trusted testers during safety evaluations.
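Google hasn't published Deep Think's mechanics, but the "consider multiple hypotheses before responding" idea can be caricatured as best-of-N sampling: generate several candidate answers, score each with a verifier, and return the highest-scoring one. The generator and scorer below are stand-ins invented for this sketch, not Google's method.

```python
import random

# Toy illustration of multiple-hypothesis reasoning via best-of-N
# sampling. Spending more compute (a larger N, i.e. a bigger "thinking
# budget") buys more candidate answers to choose from.

random.seed(42)

def generate_candidate():
    # Stand-in for one reasoning path's answer.
    return random.randint(0, 100)

def score(candidate, target=70):
    # Stand-in verifier: closer to the (hypothetical) target is better.
    return -abs(candidate - target)

def deep_think(n_hypotheses=8):
    candidates = [generate_candidate() for _ in range(n_hypotheses)]
    return max(candidates, key=score)

answer = deep_think()
print(answer)
```

In this framing, a developer-controlled "thinking budget" maps naturally to `n_hypotheses`: more sampled paths cost more tokens but raise the odds that one of them scores well.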
AI

AI Secretly Helped Write California Bar Exam, Sparking Uproar (arstechnica.com) 41

An anonymous reader quotes a report from Ars Technica: On Monday, the State Bar of California revealed that it used AI to develop a portion of multiple-choice questions on its February 2025 bar exam, causing outrage among law school faculty and test takers. The admission comes after weeks of complaints about technical problems and irregularities during the exam administration, reports the Los Angeles Times. The State Bar disclosed that its psychometrician (a person skilled in administering psychological tests), ACS Ventures, created 23 of the 171 scored multiple-choice questions with AI assistance. Another 48 questions came from a first-year law student exam, while Kaplan Exam Services developed the remaining 100 questions.

The State Bar defended its practices, telling the LA Times that all questions underwent review by content validation panels and subject matter experts before the exam. "The ACS questions were developed with the assistance of AI and subsequently reviewed by content validation panels and a subject matter expert in advance of the exam," wrote State Bar Executive Director Leah Wilson in a press release. According to the LA Times, the revelation has drawn strong criticism from several legal education experts. "The debacle that was the February 2025 bar exam is worse than we imagined," said Mary Basick, assistant dean of academic skills at the University of California, Irvine School of Law. "I'm almost speechless. Having the questions drafted by non-lawyers using artificial intelligence is just unbelievable." Katie Moran, an associate professor at the University of San Francisco School of Law who specializes in bar exam preparation, called it "a staggering admission." She pointed out that the same company that drafted AI-generated questions also evaluated and approved them for use on the exam.
The report notes that the AI disclosure follows technical glitches with the February exam (like login issues, screen lag, and confusing questions), which led to a federal lawsuit against Meazure Learning and calls for a State Bar audit.
