Yes you're right. I didn't read that properly.
Although I think the summary oversimplifies things a lot. Skimming the actual paper, it looks like the Lovelace test is not a test in itself so much as a way to critique tests for AI. It could apply to a chatbot, a story writer, or anything else.
So if I ask a chatbot "How many legs does a horse have?", it would fail if it just looks up the answer in a database that associates "legs" and "horse" with the answer "4", because the designers can trivially explain how it produced that answer. But if it has learned from earlier conversations what a horse is and what a leg is, and comes up with a correct answer from that, it would pass, because I have no way of knowing the exact inputs it used. Something like that, anyway.
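To make the "trivially explainable" case concrete, here's a toy sketch of my own (not from the paper) of the lookup-table kind of chatbot that would fail: the designer can point to exactly which hand-authored entry produced the answer.

```python
# My own illustration, not from the paper: a lookup-table "chatbot".
# Every answer traces back to a hand-authored fact, so the designers
# can trivially explain any output it produces.
FACTS = {
    ("horse", "legs"): "4",
    ("spider", "legs"): "8",
}

def answer(question: str) -> str:
    words = question.lower().split()
    # Match a question against the stored (subject, attribute) pairs.
    for (subject, attribute), value in FACTS.items():
        if subject in words and attribute in words:
            return value
    return "I don't know."

print(answer("How many legs does a horse have?"))
```

A system that had instead learned what horses and legs are from conversation would have no such traceable entry, which is the distinction I understood the paper to be making.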