Correction: competitors must perform either voice recognition or OCR to process the clues. The clues are both displayed and read aloud, and the contestants are free to ignore either form if they wish. Similarly, Watson could have had a camera trained on the monitor and performed OCR on the clue. But, given that computers have done OCR brilliantly for years now, would adding that into the mix have made much difference at all?
Regarding ringing in, the contestants also get a signal indicating when they can do so, but theirs is visual. It would have been easy enough to add another camera trained on the light, but why bother?
The engineers involved were trying to solve the interesting problems. Delivering input to each contestant in the most convenient form doesn't seem like much of a concession.