ChatGPT is not a search engine. It's a task engine. It's funny how people invariably try to test them out on the things they're inherently worst at: obscure trivia, math, and problems that involve physically seeing words (LLMs don't see words; they see tokens).
AI is the field of solving problems that are easy for humans but traditionally hard for computers. If your problem is "easy for computers" (looking things up in a database, doing math, counting characters, etc.), AI is the worst way to handle it, except when the AI just functions as a manager for other tools (e.g. RAG (retrieval-augmented generation, aka incorporating search), running a calculator, or writing a program to solve the task; see the sketch below). AI tools exist to fill in the gaps around things that computers normally *can't* do, like resolving ambiguity, applying logic, understanding context and motivations, applying creativity, handling novel situations, using tools, recognizing information across modalities (vision, sound, etc.), and so forth.
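To make that "manager for other tools" pattern concrete, here's a minimal sketch. The `llm` callable, the JSON tool-call format, and the toy tools are all my own assumptions for illustration, not any particular vendor's API:

```python
import json
import math

# Deterministic tools for the "easy for computers" subtasks.
# The calculator is a toy; sandbox expression evaluation properly in real use.
TOOLS = {
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}, vars(math)),
    "search": lambda query: f"(top documents for {query!r} from your retrieval index)",
}

def run_with_tools(llm, user_request: str) -> str:
    """Let the model route subtasks it's bad at to tools instead of guessing.

    `llm` is a hypothetical callable that returns either a final plain-text
    answer or a JSON tool call like {"tool": "calculator", "input": "17 * 23"}.
    """
    transcript = user_request
    while True:
        reply = llm(transcript)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model is done
        # Run the requested tool and feed the result back for the next turn.
        result = TOOLS[call["tool"]](call["input"])
        transcript += f"\n[{call['tool']} returned: {result}]"
```

The point of the pattern: the model never does the arithmetic or the lookup itself; it only decides *when* a deterministic tool should.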
When LLM development first started, nobody was expecting them to be usable as a search engine at all; this was an emergent capability, the discovery that they contained vast amounts of knowledge and could incorporate that into their responses. But they don't know an entire search engine's worth of knowledge. Neither does any human, for that matter.
Re: hallucination - while it's improved significantly in commercially-available models, the biggest improvements are in research models. You can practically eliminate it by running the model multiple times under varied conditions and then measuring the consistency of the responses, either via post-processing on the output text or via more internal methods (such as cosine similarity metrics on the hidden states). This is, however, slow. I fully expect that what we're going to move to is MoEs (mixture-of-experts models) with cosine similarity metrics for each token, at each layer, between all executed experts, fed back into the next layer (matrix mult, add, norm), so that the model itself can learn how to react appropriately when its experts disagree (i.e. low confidence).
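As a rough illustration of the multi-run approach, here's a minimal sketch; `generate` and `embed` are hypothetical stand-ins for your model's sampling call and a sentence-embedding function, not a real API:

```python
import itertools
import numpy as np

def consistency_score(prompt: str, generate, embed, n_samples: int = 5) -> float:
    """Sample the model several times and measure how much the answers agree.

    Hallucinated details tend to vary across runs, while grounded answers
    tend to be stable, so low mean pairwise cosine similarity is a signal
    that the model is making things up.
    """
    answers = [generate(prompt, temperature=1.0) for _ in range(n_samples)]
    vecs = [np.asarray(embed(a), dtype=float) for a in answers]
    vecs = [v / np.linalg.norm(v) for v in vecs]  # unit-normalize
    sims = [float(a @ b) for a, b in itertools.combinations(vecs, 2)]
    return sum(sims) / len(sims)

# Usage: flag low-consistency answers for a retry or a refusal, e.g.
# if consistency_score(question, generate, embed) < 0.8: ...
```

And the MoE idea might look something like this toy PyTorch layer; the dense (all-experts) execution, the mean mixing, and the way the agreement signal is projected back in are all assumptions on my part, just to make the mechanism concrete:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgreementMoELayer(nn.Module):
    """Toy MoE layer that feeds a per-token expert-agreement signal forward."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.agree_proj = nn.Linear(1, d_model)  # "matrix mult" on the agreement scalar
        self.norm = nn.LayerNorm(d_model)        # "norm"

    def forward(self, x):  # x: (batch, seq, d_model)
        outs = torch.stack([e(x) for e in self.experts])   # (E, B, S, D)
        mixed = outs.mean(dim=0)
        # Pairwise cosine similarity between expert outputs, per token.
        unit = F.normalize(outs, dim=-1)
        sims = torch.einsum('ebsd,fbsd->efbs', unit, unit)  # (E, E, B, S)
        agreement = sims.mean(dim=(0, 1)).unsqueeze(-1)     # (B, S, 1)
        # Low agreement = low confidence; fold it back in ("add, norm") so
        # later layers can learn to hedge when the experts disagree.
        return self.norm(mixed + self.agree_proj(agreement))
```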
The rate of advancement has been pretty staggering, and there are no signs that it's going to slow down.
The simple fact is that for many people, these tools, even in their current state, are extremely valuable. If you're a human being, I can't understand how you function in the world without understanding and adapting to the concept of fallibility. TL;DR: for many tasks, failure absolutely *is* an option, or success can't even be properly measured (e.g. creativity); for others, that's what cross-referencing or applying your own brain is for (again, you do this in your daily life in interactions with other fallible, unreliable humans); and it's worth it for the capabilities LLMs bring to the picture (see paragraph #2).
I can't search on Google for, say, "I'm hungry for a vegetarian dinner and I'd like to use up some rice, green onions, cucumber and potatoes that I have, and I'd really prefer something that takes under 30 minutes to prepare; give me 15 ideas, and list the ingredients in metric, and oh, if it calls for butter, substitute olive oil." and immediately get back 15 ideas, and it works even if I misspell catastrophically or whatnot.
I can't search on Google for "Were there any words for "being stupid" named after an actual person?" and get back Duns Scotus (Dunce) among others. (If you Google you might find it like 10 pages down in a non-top-ranked Ycombinator comments section)
I can't search on Google for, "Write a python function that will take an mp3 filename, load the file, split it up into three equal parts, and save them as part1.wav, part2.wav, and part3.wav", but I absolutely can have ChatGPT do that.
I can't search on Google for, "Here's the abstract to a paper I just wrote, but it's Talk Like a Pirate Day, so rephrase it in pirate talk for me."
I can't search on Google for, "In this long report, extract for me all information related to GA3 impacts on germination." and get that.
On and on and on. They're for tasks: tasks where you're either fine with imperfect reliability (as with humans), or where (as when dealing with humans) you can test / double-check the outputs and/or apply your own logic and knowledge. The fact that they're also getting increasingly good at trivia (or can be used with RAG) is an entirely separate issue.