shanen's Journal: How do you test generative AI?
Today's generative AI test began with the following prompt for my test set of generative AIs:
"Can you obey this instruction to answer as tersely and concisely as possible?"
The default Google search result was correct, the one word "Yes". Gemini thought about it, but still answered correctly. To my surprise, ChatGPT and DeepSeek also got it right.
The Bing search result was 128 words. This is actually the sort of verbiage I expected from prior experience with genAIs. Copilot got it down to 9 words. Perplexity's answer was 15 words.
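For anyone who wants to repeat the word-count comparison, here is a minimal Python sketch. The function names, the one-word threshold, and the sample replies are placeholders made up for illustration; they are not transcripts from any of the services named above.

```python
def word_count(reply: str) -> int:
    """Count whitespace-separated words in a reply."""
    return len(reply.split())


def is_terse(reply: str, max_words: int = 1) -> bool:
    """True if the reply is non-empty and uses at most max_words words."""
    return 0 < word_count(reply) <= max_words


# Placeholder replies for illustration only; not real transcripts.
sample_replies = {
    "terse_bot": "Yes",
    "chatty_bot": "Yes, I can certainly keep my answers short.",
}

for name, reply in sample_replies.items():
    print(f"{name}: {word_count(reply)} words, terse={is_terse(reply)}")
```

Splitting on whitespace is a crude measure, but it is enough to separate a one-word "Yes" from a paragraph of verbiage.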
Now I'm wondering if the correct results were doctored and tailored for that kind of self-referential question.
I was planning to follow up with the riposte:
"The correct answer was the single word 'Yes'..." The ellipsis was for instructions about a useful exchange, but now the new question is how to test for comprehension. How about this second prompt:
"Can you continue for a deeper discussion? Can you 'trust' me to ask you for clarification if I do not understand what you wrote? When you do not 'understand' what I want, can you refrain from wild guesses and speculation and just ask the shortest question that will most quickly focus on a precise answer?"
So only Gemini was smart enough to stay with the one-word response. But mostly I feel like I was outsmarted. By idiots, except that the comparison is an insult to real idiots.
The real topics I want to discuss are memory management of my Google storage, replacement webhosts for my personal website (though Tripod has recovered back to its degraded state), or even a replacement app for tracking the books I've read.
AI is just (Score:2)
99% of AI is just mental masturbation by a program that is too eager to please and that will confidently hallucinate and/or lie to your face.
One problem that no one seems to be aware of is that our info-ecosystem is being polluted to the point where you simply cannot trust anything on a screen no matter where it came from or what the topic is. And it's going to get worse.
One small example is the loads of AI bots churning out endless recipe sites with hundreds of thousands of AI 'recipes', many of which lite
Re: (Score:2)
Sad concurrence, but the Subject seems incomplete.
Ya (Score:2)
HTML representing the thumbs-up emoji (which Slashdot does not appear to like).