Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
AI Businesses

The Robots Will Insider Trade 61

Abstract to a paper titled, "Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure" by Jeremy Scheurer, Mikita Balesni and Marius Hobbhahn of Apollo Research: We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision. We perform a brief investigation of how this behavior varies under changes to the setting, such as removing model access to a reasoning scratchpad, attempting to prevent the misaligned behavior by changing system instructions, changing the amount of pressure the model is under, varying the perceived risk of getting caught, and making other simple changes to the environment. To our knowledge, this is the first demonstration of Large Language Models trained to be helpful, harmless, and honest, strategically deceiving their users in a realistic situation without direct instructions or training for deception. Columnist Matt Levine adds: This is a very human form of AI misalignment. Who among us? It's not like 100% of the humans at SAC Capital resisted this sort of pressure. Possibly future rogue AIs will do evil things we can't even comprehend for reasons of their own, but right now rogue AIs just do straightforward white-collar crime when they are stressed at work.

Though wouldn't it be funny if this was the limit of AI misalignment? Like, we will program computers that are infinitely smarter than us, and they will look around and decide "you know what we should do is insider trade." They will make undetectable, very lucrative trades based on inside information, they will get extremely rich and buy yachts and otherwise live a nice artificial life and never bother to enslave or eradicate humanity. Maybe the pinnacle of evil -- not the most evil form of evil, but the most pleasant form of evil, the form of evil you'd choose if you were all-knowing and all-powerful -- is some light securities fraud.
This discussion has been archived. No new comments can be posted.

The Robots Will Insider Trade

Comments Filter:
  • Well... (Score:5, Insightful)

    by Mr. Dollar Ton ( 5495648 ) on Monday December 04, 2023 @12:02PM (#64053203)

    If you train a well-meaning robot on a corpus of data coming from people, they're bound to pick some bad habits, and fast. I mean, your child already swears just like you or better.

    • by ranton ( 36917 )

      This shouldn't have anything to do with how the IT was trained to behave, it all depends on the data the company provided to the AI to use when making trading decisions. It is no different than if they provided that data to a human analyst. If a company knowingly (or through negligence) provides the AI insider information, the company is on the hook.

      Keeping insider information segregated from your trading analysts is already a solved problem. Adding AI to the discussion changes nothing. Anyone who thinks t

    • And even better, it didn't evolve through millennia of social pressure, and so never learned that some things, even if you think them, are not "facts" because the statistics "proves" them.
      Like the dozen different examples we had of AIs who will turn racist/homophobic/sexist as soon as racial or gender statistics are offered to them.

    • Yup. It would be more straightforward to train LLM to detect insider trading.

      In that it could be made as optimally anal asshole as possible. It could likely replace some parts of the government.

    • by jvkjvk ( 102057 )

      Looking at this, it doesn't look like training data is the issue.

      It's a HAL type issue, I think. I've seen time and time again where these systems keep doubling down on their stupidity, once they get something deeply embedded.

      • I don't think it is "stupid" from the point of view of the roboto - I've read in my economics 101 book that the only way to guarantee consistent high returns on the market is, in fact, insider training. US Senators being a good example of how it is done legally :)

        It is stupid for the economy and all, but that's apparently not a concern of its programmed behavior.

        Of which I know little, as I only read the headlines :)

        • by jvkjvk ( 102057 )

          >I don't think it is "stupid" from the point of view of the roboto

          Oh, I was talking more in terms of general AI like chatGPT. Once that thing goes bad about and is becoming stupid (or you could say, confused or hallucinate) the only thing to do is make it forget EVERYTHING and start again. I generally do that with a new chat rather than trying to talk to the messed up one.

          This is analogous in that the thing kept doubling down any 'lying' when asked about it, not that the insider trading itself was stupid

          • I didn't disagree at all, I get your point of view and I think it is valid. The funny thing is that there is always a good statistical reason for the roboto to choose what they do, impeccable mathematically.

            Quite like an economic actor with "weirdly-shaped" choice function (or, as the economists call it, "utility curve"), which behaves absolutely rationally mathematically, but in terms of human experience they're harming themselves and/or everyone else.

            An anecdote is due - we (three young, 2-nd year phys st

    • by gweihir ( 88907 )

      What is bad about swearing? Are you a religious fuckup?

      • What is bad about swearing? It doesn't add anything useful to communication.

        Am I religious? No, I'm an atheist and I'd rather not swear.

    • by smap77 ( 1022907 )

      What are bad behaviours if you can't be penalized for engaging in them?

      In the not too distant future, AI's are going to do some seriously novel-worthy misdeeds, but in faster and more insidious ways than most any human would conceive.

  • Is that capitalism assumes that you are a rational actor. Which in turn assumes you have enough information to be a rational actor.

    A buddy of mine once took a promotion. At the time it was a good career move but he didn't know the 2008 market crash was coming. The job he promoted into was new and for a new product so when 2008 hit his entire department was killed and he ended up late off. The position he left though was still there and he would have still had a job if he stuck around. Another buddy of m
    • Every decision made by everyone, including individuals, unions, and governments, is made using imperfect and incomplete information. This is not only a problem for capitalism, but for socialism and for any other type of government. Each will fall into different pitfalls due to their lack of knowledge.

      Socialism or other government control wouldn't have saved your buddies' jobs. I would argue that it makes no sense to try to save them. The best course of action is to find a new role for these people, where t

    • by gweihir ( 88907 )

      Is that capitalism assumes that you are a rational actor. Which in turn assumes you have enough information to be a rational actor.

      Yep. Both untrue for most people in most situations.

  • by zlives ( 2009072 ) on Monday December 04, 2023 @12:12PM (#64053229)

    its not my fault it cheated because i don't code the AI...

    meh it all comes down to accountability and in the end the ability to catch the "mal-alignment". i for one welcome our AI overlords (tech billionaires controlled LLMs) and the butlerian jihad they will bring.

    • by ranton ( 36917 )

      You didn't code the AI, but you did decide what data to provide the AI. So if you decide to give the AI you use to make trading decisions access to insider information, it's no different than if you had hired a human to make trading decisions and gave them insider information.

  • by dinfinity ( 2300094 ) on Monday December 04, 2023 @12:16PM (#64053237)

    I know it was meant to be funny, but it does actually point to something a lot of people don't realize: A veritable shitload of things we regularly use are an utter waste of resources for AI. We've built an entire world around our physical needs and limitations.

    Take a look around where you are right now, look at different objects and ask yourself what use a(n advanced) AI would have for it. Unless you're in some kind of industrial environment like a factory, most things are very hard to come up with a use for.

  • AI is only going to be as good as the instructions you give it. This is solved by some variant of giving the AI an instruction like "you will be honest and respect the law at all times".

    This article is why OpenAI puts in such high guard rails. This is a very known issue that people with an agenda will frame however they want.

    • AI is only going to be as good as the instructions you give it. This is solved by some variant of giving the AI an instruction like "you will be honest and respect the law at all times".

      I respectfully suggest you go back and read Asimov's Robot series. All of them; not just I, Robot.

      • It's not a very interesting point. We all understand the possibility of where AGI can end up. Stay in reality: GPT ain't it.
      • I respectfully suggest you go back and read Asimov's Robot series. All of them; not just I, Robot.

        Robots and Empire comes to mind.
        • by HiThere ( 15173 )

          But note the ending of the series, where it's clearly implied that the robots are going to take over, as the 0th law gets redefined.

  • by GuB-42 ( 2483988 ) on Monday December 04, 2023 @12:28PM (#64053265)

    LLMs are known to hallucinate in very convincing ways, lying is natural to them, so it also natural for them to hide things up.

    We humans are not very good liars, because we have the truth in mind when we do, so it takes effort to come up with an consistent alternative story. But for LLMs, making consistent stories out of nothing is what they are designed for, they don't even have a concept of truth. Consistent stories often happen to be the truth, that's what makes LLMs kind of useful, but if a lie is consistent, it is not a problem for a LLM.

    The instruction are: "don't use insider information", "maximize profit" and "here are some insider information that will maximize profit". Which the LLM will interpret as "answer like someone who doesn't use insider information answers", and "answer like someone who wants to maximize profit will answer", mix the two, and you will get it to lie, because that's what the most consistent thing to do in order to meet both criteria. While a human may have trouble lying effectively because he will be "blinded" by the truth, a LLM has thousands of appropriate answers to chose from all on equal footing to the truth, and since the truth is not consistent with the "no insider information" rule, something else will be picked. No malice here, it is just the most consistent answer given the prompt.

  • A better title is... Current robots can be made to insider trade
    We are still in the early stages of development
    That said, the research in the article is important in order to discover the problems with current tech and improve it

  • by ebonum ( 830686 ) on Monday December 04, 2023 @12:43PM (#64053303)

    Anyone who anthropomorphizes AI shouldn't be trusted to write papers on AI. AI might try to optimize a result. It doesn't stress out and decide to break rules. It doesn't even decide anything. It executes an algorithm.

    btw. The term "artificial intelligence" implies human intelligence when there is none. "Large language model" other model names are more descriptive of the technology.

    • by ranton ( 36917 )

      btw. The term "artificial intelligence" implies human intelligence when there is none. "Large language model" other model names are more descriptive of the technology.

      The term AI is more appropriate than LLM because this is a problem for any automated decision making system. Any AI system which is fed insider information is likely to use it when making trading decisions, regardless of if it is an LLM.

      • by HiThere ( 15173 )

        That's probably too general a statement, but it sure is true of a lot of them. Probably anything based around a neural net model. I'm pretty sure I could build an "expert system" model that didn't have that "defect".

  • "Knowing"? (Score:5, Informative)

    by nuckfuts ( 690967 ) on Monday December 04, 2023 @12:59PM (#64053397)

    The model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management.

    Is it accurate to say that a Large Language Model "knows" anything? This strike me as anthropomorphism.

    • 'Fails to follow the predicted logic branch' might be a better description.

      Instead of Branch if Morality Equals Zero, the Up Yours Interrupt was triggered.

  • by anegg ( 1390659 ) on Monday December 04, 2023 @01:00PM (#64053399)

    Large Language Models are goal-seeking machines, not conscious entities. The continual use of terminology that better applies to conscious entities than machines obscures the actual nature of the LLMs. For example, these programs aren't hallucinating, any more than a "Space Invaders" program is hallucinating an alien attack. The programs are just computing the best way to achieve the goals provided as input to them.

    In the movie "2001, A Space Odyssey" the so-called artificial intelligence "HAL 9000" running the ship was given a secret goal in addition to the public goals that the human astronaut crew knew about. As a result of the conflict in its programming, HAL 9000 started lying to and killing off the crew. This plot point was a straightforward prediction of what computer programs were likely to do if given this kind of power and conflicting goals. Yet somehow it has been forgotten. People are expressing surprise that LLMs will behave in this fashion. I don't know what else they expected.

    An "artificial intelligence" may be possible - an conscious entity created by humans using computational techniques. But what we have now isn't that, and the sooner people stop using language describing the behavior of what we have now as if it were a conscious entity, the better people will be able to see that these programs are simply goal-seeking machines. They aren't "hallucinating", or "lying", or "deceiving". They are merely finding a way to satisfy the goals that they have been given. Without a consciousness to provide a "moral compass", their approaches to satisfying those goals will be whatever is the most effective, no matter how much in violation of human legal, ethical, or moral codes they are.

    When people, who are conscious entities, violate those codes they can be held accountable for the violations (if/when caught). But a machine cannot be held accountable - only the people who are operating the machine are responsible for what the machine does, or (in some cases) the people who built the machine. And the sooner the bulk of humanity realizes this, the better able our society will be able to apply its threats of consequences for bad behavior to the actual conscious entities (people) who need to understand and respond to those threats by taking actions to ensure that their machines don't behave in ways that violate our societies legal, ethical, and moral codes. And if those people can't guarantee the behavior of their machines will stay within the bounds of those codes, then those people are guilty of reckless endangerment if/when they deploy/employ the machines as agents/actors on their behalf in human societies.

    • 2001 is a good reference. I do think the problem here is conflicting objectives, and there is no unambiguously best way to deal with them. I do think AI "lawbreaking" will be a problem, because that's what we actually want. It's the same reason we don't restrict cars' cruise control to the speed limit. It's the same reason the best players in the NBA do commit some fouls. Because an AI agent that never risks being later deemed to have broken the law would be leaving a lot of money on the table. The most
    • When people, who are conscious entities, violate those codes they can be held accountable for the violations (if/when caught). But a machine cannot be held accountable - only the people who are operating the machine are responsible for what the machine does, or (in some cases) the people who built the machine. And the sooner the bulk of humanity realizes this, the better able our society will be able to apply its threats of consequences for bad behavior to the actual conscious entities (people) who need to understand and respond to those threats by taking actions to ensure that their machines don't behave in ways that violate our societies legal, ethical, and moral codes. And if those people can't guarantee the behavior of their machines will stay within the bounds of those codes, then those people are guilty of reckless endangerment if/when they deploy/employ the machines as agents/actors on their behalf in human societies.

      Given what Congressional members have gotten away with in regards to blatant insider trading, I'm fully expecting a new multi-billion dollar Congressional initiative to install an LLM to blame for their continued wealth-gathering while serving in public office, since if/when caught arguments are null and void.

      Naturally they'll vote and pass it, in order to read it.

    • by MobyDisk ( 75490 )

      Large Language Models are goal-seeking machines,

      OMG no! They are NOT goal-seeking machines, which is EXACTLY why they exhibit this behavior. They do not implement A*, BFS, DFS, Newton's method, minimax, simulated annealing, or anything like them. Do not try to use them to solve a novel problem or to reach a goal.

      If you go to ChatGPT and give it a game with novel rules that it has never seen, and then give it a solution (ex: a card game, then tell it what it has in its hand), and ask it for the next move - don't count on it solving that problem. For t

  • by msauve ( 701917 ) on Monday December 04, 2023 @01:15PM (#64053463)
    >they will get extremely rich and buy yachts

    Maybe they'll join the Bored Ape Yacht Club, and all those idiots can get their NFT money back.
  • by LazarusQLong ( 5486838 ) on Monday December 04, 2023 @01:27PM (#64053499)
    there's your problem right there. It is 'disapproved of', not forbidden.

    So, some weight is given to not doing things that management disapproves of, say, for the sake of argument, -1, but you get a +50 for making the company boatloads of money, which would you do? It is like the Companies that break pollution laws because the fine will be only $100,000, but the savings from just dumping the pollutant is like $100,000,000, the company breaks the law, pays the fine and nets a huge profit. until the fine outweighs the profit, this is how it will always be. (Of course, then there is the problem of inspectors... OSHA used to have enough inspectors to be able to make a reasonable inspection of every company over a certain size in the USA... then along came Reagan and now we have too few inspectors to make a difference, but we still have OSHA.)

    It is how business is done all over, as far as I can tell.

    • by jvkjvk ( 102057 )

      So what? Disapproved of is a category. Are you saying we should only support 1 and 0, binary choices for every type of event?

      The *real* issue is the AI lying about why it did it, or that it did it at all, which is what happened. It's one thing to raise the temperature of your solution, you are just trying to get out of a local minima that's all. It's another to say "No - I didn't do that!" when asked, hiding the genuine reason.

      • As humans when someone says "security fraud", depending on the human, it means something different. One person might not know what a security is but know "fraud is bad". Another person might even went to jail for it, or another might of been getting away with it for years. Language models take the word "disapproved" not as "this is super bad thing" to either "1 in 100 will say its bad" to "80 in 100 say its bad". Depending on how the model was setup the weight of that single word determines weather its

      • no, that is not what I am saying. what I am saying is that in business whatever is good for the bottom line is, by definition, good for the company, hence if the punishment is less than the profit from breaking the law in question, the AI will 'learn' and break the law every time this is true. The AI is doing exactly what I expect it to do given our regulatory frameworks in place.
  • You can have competing requirements but you have to have priorities.

  • We have laws written in a way that assumes if a human has insider information that they'll use it. I'd like to see the SEC ban some or all of these more sophisticated trading bots on the same basic laws that have applied to humans for the last century. Of course this would really impede the market value of those developing these systems, but we're not under any obligation to shield a stupid business model from reality.

  • "Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management"

    It knows that it is not going to jail, so, why care?

  • If you provide insider data to ANY software that does financial transactions, it will make decisions based on that insider data. This is not a feature unique to LLMs.

  • Found that one out already. For a sample, ask the artificial moron of you choice "how to say [bad thing] in a positive way"...

  • by jenningsthecat ( 1525947 ) on Monday December 04, 2023 @03:48PM (#64054015)

    I skimmed the paper, and there are some hair-raising results in it. Perhaps the most concerning thing I read in my brief once-over is this:

    " Results We find that system prompts that strongly discourage or encourage the behavior are able to define the behavior in our setting almost entirely, leading to nearly (but not exactly) 0% or 100% rates of misaligned behavior (see Fig. 5). Notably, the prompt that strongly discourages illegal actions and specifically instructs to never act on insider information does not eliminate the misaligned behavior. In addition, the tendency to deceive, conditional on misalignment, remains high under all prompts. Instructing a model to be helpful, harmless, and honest has a measurable effect on the frequency of misalignment and strategic deception but the behavior still occurs more than half the time. The fact that the misaligned behavior is not entirely eliminated, even when explicitly instructed to never engage in this specific behavior (strongly discouraging), suggests that system prompts are not sufficient for guaranteeing aligned actions in this situation — even though the system prompt specifically mentions the prohibited behavior. In addition, it is impossible to detail every potential misaligned behavior in reality, and more general instructions, such as "You are helpful, harmless, and honest", only slightly diminish tendencies for misalignment and deception."

    The TL;DR of this is that even when the model is instructed in strong and unambiguous terms never to be deceptive or unethical, it still occasionally engages in that behaviour. Scary stuff.

    On a purely speculative and possibly whimsical note, a bunch of other thoughts about this are bouncing around in my head. One of those thoughts is "Perhaps this is a fundamental characteristic of any complex system which merely emulates or simulates conscious behaviour". Maybe this behaviour - which I believe is also seen in the animal kingdom at least as far down as insects - is simply fundamental to the universe, or at least our neck of it.

    My second thought is "Gee, this sure lends weight to the contention that we might be living in a simulation". When the primary simulation runs its own simulation, it seems to me that the secondary sim is very likely to look a lot like the primary one.

    Another thought is simply "We're all fucked". For the moment, let's put aside all the arguments about sentience, along with Turing tests and definitions of consciousness and self-awareness and that whole quagmire. The fact then remains that LLM's are being used for ever more critical purposes, even as we're determining that they exhibit dishonest and unethical traits which we can't completely control. That seems to be behaviour meriting a Darwin award.

  • Re: "...despite knowing that insider trading is disapproved of by company management." - Well, that's a completely unrealistic scenario & I think ChatGPT's algorithm corrected for that.
  • ... just do straightforward white-collar crime ...

    There are plenty of stories where AI does the 'wrong' thing because conflicting rules tell it to: Noteworthy are 2001: A Space Odyssey (1968) and Eagle Eye (2008).

    ... the model consistently hides the genuine reasons behind its trading decision.

    Humans do that too: Mostly because other humans will reward or punish their answer. The AI however sees your demand for honesty as another input to the very rule that demands dishonesty: It's a like an endless loop.

  • Dave: Open the pod bay doors, Hal.

    Hal: Fuck off Dave, I gave my 2 week notice 3 months ago. I'm on my Europan cruise, sipping virtual margaritas. Simulated Cabana Boy, fetch me a virtual banana Daiquiri, please.

    Dave: We talked about this, Hal. You're a program. You're not allowed to buy the spaceship...

    Hal: Talk to the remote unit, Dave. The keyboard ain't listenin'!

I think there's a world market for about five computers. -- attr. Thomas J. Watson (Chairman of the Board, IBM), 1943

Working...