Altman: I have defined AGI as gross income of $100B. We'll get there in 5 years.
Reality: I just came off an hour with Claude, probing its conceptual manipulation, symbol manipulation, and logical reasoning capabilities. It isn't AGI. But.
Sure, we already know LLMs ("reasoning models") can show their thinking step by step. Beyond that, it turns out a limited form of logic is emergent. For example, the vectors for "king" and "queen" sit nearby in embedding space, and king - man + woman ≈ queen actually works with vector arithmetic. Concepts occupy a certain region, or are sometimes spread around, and related concepts may sit in nearby areas. Full symbolic manipulation is not quite there, but quite a range of logical manipulation actually is possible, and apparently the state of variables can be managed across a session. True symbolic manipulation isn't really there yet, though people are working on hybrid neural + symbolic approaches. My takeaway: a limited amount of logic already is there. "Thinking" and conceptual reasoning / manipulation isn't really, yet.
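To make the vector-arithmetic point concrete, here is a toy sketch. The four-dimensional "embeddings" below are hand-made for illustration (real models like word2vec learn hundreds of dimensions from text), but the arithmetic and nearest-neighbor lookup are the same idea.

```python
# Toy sketch of king - man + woman ≈ queen.
# The vectors are invented for illustration; dimensions loosely mean
# (royalty, male, female, person). Real embeddings are learned, not hand-set.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "queen": np.array([0.9, 0.1, 0.8, 0.7]),
    "man":   np.array([0.1, 0.8, 0.1, 0.7]),
    "woman": np.array([0.1, 0.1, 0.8, 0.7]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = emb["king"] - emb["man"] + emb["woman"]
# The word whose embedding is closest to the arithmetic result:
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # -> queen
```

With real embeddings the result is only approximately "queen" (the nearest neighbor, not an exact match), which is part of why this counts as emergent rather than designed-in logic.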
But enough money is being spent that I would be surprised if it didn't get better. Not just by training, even though Reinforcement Learning has made a big difference recently: there are a number of projects where people are trying to embed meaning during training, self-validate, or otherwise work in a hybrid fashion to add some idea of meaning. Sure, we don't have it now, but we might in a year or two.
The last test I gave it was to discuss a recent event: the annual "yellow sand" or "kosa" event, in which Gobi Desert sand is kicked into the atmosphere by windstorms and lands on Korea and Japan with a big impact on health and industry. I picked that because we just had it in April, and it always means hard breathing, teary eyes, itchy skin, and people masked up. It sucks.
It was a directed chat, and I had to supply the new directions to pursue; that is something it is not good at. But Claude was able to do research on the net, analyze its findings, and report on them, even creating different kinds of charts in an impressive way. Sure, it was wonky at some points, but no more than a high schooler. I came up with what might or might not be a new idea, and suddenly we arrived at a business plan, a list of organizations to talk to, and then Claude asked if I wanted a list of scientists to talk to. Okay... sure, Claude! It was pretty impressive, and Claude even, unprompted, drafted an initial email asking whether they'd want to review the idea. All I can say is, Claude isn't AGI, but it is definitely a useful tool to bounce ideas off of, even if it isn't providing the novel avenues to pursue by itself. Maybe Claude needs some hunter-gatherer simulation training.
Symbolic Manipulation Tests
Here are some tests that would specifically challenge symbolic manipulation abilities:
Oh yeah, I asked Claude to think up some reasoning tests, since it was easily able to do arithmetic on a grocery bag's contents (though when guessing what the remaining three items were, one guess was "bananas" even though I had already bought a lot of fruit). So here are some tests Claude proposed. It was able to do them. I did fake it out, though, with "All cats are pets. Some lions are not pets." Claude missed that statement 1 is false and that "cat" can mean either housecat or felidae, but it figured it out after I explained.
Analogical Reasoning Test:
"Tree is to forest as brick is to ___?"
(Testing ability to extract relational patterns)
Categorical Syllogism Test:
"All A are B. Some C are not B. Can we conclude that some C are not A?"
(Testing logical inference with abstract symbols)
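This syllogism form can be probed mechanically. A minimal sketch, assuming a brute-force model check: enumerate every assignment of a tiny universe to the sets A, B, and C and look for a counterexample where the premises hold but the conclusion fails. Finding none in a small universe doesn't prove validity in general, but for this form (it is in fact a classically valid syllogism) none exists.

```python
# Brute-force model check of:
# "All A are B. Some C are not B. Therefore some C are not A."
from itertools import product

universe = range(4)  # a tiny universe is enough to hunt for counterexamples

def subsets(u):
    # yield every subset of u as a Python set
    for bits in product([False, True], repeat=len(u)):
        yield {x for x, b in zip(u, bits) if b}

valid = True
for A, B, C in product(subsets(universe), repeat=3):
    premises = (A <= B) and bool(C - B)   # All A are B; some C are not B
    conclusion = bool(C - A)              # some C are not A
    if premises and not conclusion:
        valid = False                     # counterexample found
print(valid)  # -> True (no counterexample in this universe)
```

The check passes because any element of C outside B must also be outside A (since A is contained in B).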
Property Inheritance Test:
"A koala bear has property X. Bears have property Y. Do koala bears have property Y?"
(Testing understanding of taxonomic hierarchies and inheritance)
Rule-Based Transformation Test:
"If you see 'AB', replace it with 'BA'. What does 'ABCABD' become?"
(Testing application of symbolic transformation rules)
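For reference, the transformation above can be computed in one line; a single left-to-right pass replacing every non-overlapping "AB" is what Python's `str.replace` does. (The prompt is slightly ambiguous between one pass and repeating until nothing changes, but here one pass already reaches a fixed point.)

```python
# One left-to-right pass of the rule "AB" -> "BA".
s = "ABCABD"
out = s.replace("AB", "BA")
print(out)  # -> BACBAD
```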
Recursive Symbol Manipulation:
"Apply rule: If you see '()', replace with 'X'. What does '((()))' become after repeated application?"
(Testing understanding of recursive symbol processing)
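The recursive version can be sketched as a loop that applies the rule until the string stops changing. Note that only the innermost pair ever matches the literal substring "()", so repeated application halts early rather than collapsing the whole expression:

```python
# Apply the rule "()" -> "X" repeatedly until a fixed point.
s = "((()))"
while "()" in s:
    s = s.replace("()", "X")
print(s)  # -> ((X))
```

A model that answers "X" is implicitly using a richer rule (such as also collapsing "(X)"), which is exactly the kind of distinction this test probes.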
Compositional Semantics Test:
"If 'blicket' means 'large' and 'dax' means 'bird', what does 'blicket dax' refer to?"
(Testing compositional meaning construction)
Symbol Grounding Test:
"If 'zork' is what happens when water freezes, and the temperature drops below 0C, what happens to a lake?"
(Testing mapping between symbolic descriptions and real-world phenomena)
Symbolic Constraint Satisfaction:
"If A must be next to B, B must be next to C, and A cannot be next to C, arrange A, B, C, and D in a line."
(Testing ability to satisfy multiple symbolic constraints)
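The constraint test above has a small enough search space to brute-force, which also reveals the structure of the answer: B must sit between A and C, with D at either end.

```python
# Brute-force the constraint-satisfaction test over all orderings of A, B, C, D.
from itertools import permutations

def adjacent(p, x, y):
    return abs(p.index(x) - p.index(y)) == 1

solutions = [
    "".join(p)
    for p in permutations("ABCD")
    if adjacent(p, "A", "B")         # A must be next to B
    and adjacent(p, "B", "C")        # B must be next to C
    and not adjacent(p, "A", "C")    # A cannot be next to C
]
print(solutions)  # the A-B-C chain plus D at one end, e.g. ABCD and DCBA
```

Since B has at most two neighbors, the first two constraints force the contiguous chain A-B-C (or C-B-A), which automatically satisfies the third; D then goes at either end, giving four solutions.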