Comment Re:Honestly doesn't seem that bad? (Score 1) 46
Some are even multimodal LLMs. You can transcribe audio using Gemma 4, for example. While that is not the primary purpose of an LLM, it has the advantage that it can do more than speech, like describing other sounds, and the context an LLM has that a simpler speech-to-text engine lacks can prevent transcription errors. Every silly error you see in automatic captions that is obvious nonsense can be caught by an LLM. The subtle ones stay subtle, of course.