Submitted by
SpinelessJelly writes "It appears that Google's Android, criticised by Microsoft as vaporware, has sprung to life. Prototype devices are circulating, software developers are experimenting with the SDK and PC-based Android emulator, and there are rumours of a show-stopping debut at February's Mobile World Congress event in Barcelona. Numerous examples of the Android GUI are also starting to leak out."
Comment: Re:Yes it's a dupe, but lets get something straigh

by l-carnitine
They evaluated http://gate.ac.uk/, which is GPL software, but ended up using http://search.cpan.org/~acoburn/Lingua-EN-Tagger/. There are several other tools in this space that can be glued together to create this type of software:

http://tcc.itc.it/research/textec/tools-resources/jinfil.html

Not trivial, but if you want to DIY, you don't need to start from scratch. You would still need a bunch of hardware to chug through 1000s of documents, though :).

