Comment Re:I take back every bad thing I said about Google (Score 2, Interesting) 251
I've been using Tesseract for a Project Gutenberg (PG) project for a few weeks now and, as TFA says, it's not as good
as some of the commercial OCR packages out there. ABBYY FineReader seems to be the OCR software of choice for
Distributed Proofreaders, at least.
Tesseract just has ASCII support (for now, as they like to add), so it ignores italics, accents etc.
In the case of the book I'm working on, it had a very hard time with the ff ligature and had some
trouble with b and c: "but" became "hut", "he" became "be", and c was often read as o or e.
The words "difficult", "office" and "scientific" were the standard pitfalls. On some pages it was nearly flawless, though.
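For anyone who wants to catch that kind of thing after the fact, here is a rough sketch of a cleanup pass. It is not part of Tesseract; the confusion pairs are just the ones I listed above, and the function name and the tiny word list are made up for illustration.

```python
# Rough post-OCR helper (hypothetical, not a Tesseract feature).
# Each pair is (character the OCR produced, character it probably should be),
# taken from the errors described above: b read as h, h read as b, c read as o or e.
CONFUSIONS = [("h", "b"), ("b", "h"), ("o", "c"), ("e", "c")]


def candidates(word, vocabulary):
    """Return vocabulary words reachable from `word` by undoing one confusion."""
    found = set()
    for wrong, right in CONFUSIONS:
        for i, ch in enumerate(word):
            if ch == wrong:
                fixed = word[:i] + right + word[i + 1:]
                if fixed in vocabulary:
                    found.add(fixed)
    return sorted(found)


if __name__ == "__main__":
    # Tiny stand-in word list; a real run would load a proper dictionary.
    vocab = {"but", "he", "be", "hut", "office"}
    for token in ["hut", "be", "offioe"]:
        print(token, "->", candidates(token, vocab))
```

Note that "hut" and "be" are real words, so this can only suggest alternatives for a human proofreader to check; it can't decide on its own which reading is right.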
The biggest advantages to me are clearly that it is free*, it's good enough and I can use it on my preferred OS.
* Mostly Apache License v2.0; a part of it, however, is under a "freely use and modify for research and development purposes" license.