They evaluated GATE (http://gate.ac.uk/), which is GPL software, but ended up using Lingua::EN::Tagger (http://search.cpan.org/~acoburn/Lingua-EN-Tagger/).
There are several other tools in this space that can be glued together to create this type of software:
http://www-nlp.stanford.edu/
http://tcc.itc.it/research/textec/tools-resources/jinfil.html
http://wordnet.princeton.edu/
http://www.alias-i.com/lingpipe/web/faq.html
http://www.isi.edu/licensed-sw/halogen/index.html
It's not trivial, but if you wanted to DIY, you wouldn't need to start from scratch. You would still need a fair amount of hardware to chug through thousands of documents, though.