I concur with your comment. I've worked for massive headhunter/recruitment firms, writing software to parse millions of arbitrarily structured documents in various formats. That meant linking into third-party applications still in use after acquisitions, trying to impose structure on the documents, pulling out critical information for daily searches, and building repeatable ETL processes to import everything into a structured, easily searchable format: an official company search application, fed by repeatable imports, which we customized with the functionality our acquired companies demanded.
Keyword searches only get you so far. Until you start building a complete inventory of proximity terms (aka lexical chaining), Levenshtein distance algorithms, term weightings used to resolve matches when related terms are found, and analysis of past terms in the current text to determine what is more relevant, you are back to manual processes. Even all of that requires lots of training before the system becomes even halfway reliable, and it still needs constant updates as the industry changes and as you encounter newly phrased terms, job descriptions, etc.
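To make the Levenshtein piece concrete, here is a minimal sketch. It assumes plain edit distance with unit costs for insert, delete and substitute, and uses a hypothetical `fuzzy_term_matches` helper to catch misspelled job titles. It covers only that one ingredient, not lexical chaining or term weighting.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (each edit costs 1)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_term_matches(term, vocabulary, max_distance=2):
    """Hypothetical helper: vocabulary entries within max_distance edits of term."""
    term = term.lower()
    return [v for v in vocabulary if levenshtein(term, v.lower()) <= max_distance]


titles = ["Software Engineer", "Hardware Engineer", "Data Analyst"]
print(fuzzy_term_matches("Sofware Engineer", titles))
```

The threshold is the hard part in practice: too low and you miss real typos, too high and "Hardware" starts matching "Software", which is exactly why this alone is never enough.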
Then add matching across multiple systems on address information, education and past employment, phone numbers, email addresses, and professional associations to determine: "Hey, Joe Blogs from these five systems, belonging to different companies we acquired, is the same guy! Sweet, let's combine the data and update our official system with any missing information."
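A stripped-down sketch of that "same guy" step, assuming hypothetical records (dicts with `name`, `email` and `phone` fields) and matching only on exact normalized email or phone. Union-find links records transitively, so A-matches-B-by-email plus B-matches-C-by-phone lands all three in one cluster. The 10-digit phone normalization is an assumption (NANP-style numbers); a real system would also score fuzzy matches on names, addresses, education and employment history.

```python
import re
from collections import defaultdict


def normalize_email(email):
    return email.strip().lower() if email else None


def normalize_phone(phone):
    # Assumption: keep the last 10 digits (NANP-style); shorter numbers are ignored.
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:] if len(digits) >= 10 else None


def cluster_records(records):
    """Group record indices that share a normalized email or phone (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # (key type, normalized value) -> first record index with it
    for idx, rec in enumerate(records):
        keys = [("email", normalize_email(rec.get("email"))),
                ("phone", normalize_phone(rec.get("phone")))]
        for key in keys:
            if key[1] is None:
                continue
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())


def merge_cluster(records, indices):
    """Fill each field from the first record in the cluster that has a value."""
    merged = {}
    for idx in indices:
        for field, value in records[idx].items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


records = [
    {"name": "Joe Blogs", "email": "Joe.Blogs@example.com", "phone": None},
    {"name": "Joseph Blogs", "email": "joe.blogs@example.com ", "phone": "(555) 010-1234"},
    {"name": "J. Blogs", "email": None, "phone": "555.010.1234"},
    {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-010-9999"},
]
groups = cluster_records(records)
print(groups)
print(merge_cluster(records, groups[0]))
```

Exact-key matching like this is the easy, high-precision part; the "halfway reliable" effort goes into the fuzzy cases it deliberately skips.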
Yeah, it isn't easy; it takes a lot of work... and the above barely scratches the surface. Now how the hell do you search emails for classified information that isn't labeled correctly without going through all of the above? You'd need a judgement call by a trained eye, access to all the classified material from her tenure, or hashes of critical values from the classified material to compare against and pull out "stuff to review".
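A rough sketch of the third option, the hashing approach: break the classified text into overlapping word n-grams (shingles), keep only their SHA-256 digests, and flag any email that shares enough digests for human review. The shingle size `n=3`, the `min_hits=2` threshold, the helper names and the sample text are all assumptions for illustration, not how any real agency tool works.

```python
import hashlib
import re


def shingles(text, n=3):
    """Set of lowercase word n-grams (overlapping 'shingles') in text."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def digest(shingle):
    return hashlib.sha256(shingle.encode("utf-8")).hexdigest()


def build_fingerprint_index(sensitive_docs, n=3):
    """Store only digests, so the index itself holds no readable text."""
    index = set()
    for doc in sensitive_docs:
        index.update(digest(s) for s in shingles(doc, n))
    return index


def flag_for_review(emails, index, n=3, min_hits=2):
    """Return (email_id, hit_count) for emails sharing >= min_hits shingles."""
    flagged = []
    for email_id, body in emails.items():
        hits = sum(1 for s in shingles(body, n) if digest(s) in index)
        if hits >= min_hits:
            flagged.append((email_id, hits))
    return flagged


# Hypothetical sample data.
index = build_fingerprint_index(
    ["Project Falcon test site coordinates are withheld pending review"])
emails = {
    "msg-1": "fyi the project falcon test site coordinates are withheld for now",
    "msg-2": "lunch on friday?",
}
print(flag_for_review(emails, index))
```

Two caveats on the design: plain unkeyed hashes of short phrases can be reversed by a dictionary attack, so a keyed hash such as HMAC would be the safer choice; and exact shingles miss paraphrased or retyped material, which is why the trained eye never really goes away.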
While I worked there, I started to miss the simplicity of writing LOB (line-of-business) applications.