Comment Re:Come on guys you should know better (Score 1) 488
Snowden mentioned "hashing" the emails that were not obvious duplicates. That means generating a SHA-1 hash (or a similar digest) over the entire contents of each email: to/from/cc/bcc/subject/etc. Do that for both the new emails and the "old" emails. Any time the same hash shows up on both sides, you have a duplicate email. Discard those. Then run the remainder through full-text indexing (only about a business day of processing time) and run keyword searches for your specific topics of interest. Flag any hits for further review/analysis. Some of that further work may be more scripting to remove false positives. The outcome could well be that very few emails turn up that would affect the previous decision regarding Clinton. And with the apparent manpower that was thrown at this, I'm sure any emails that made it through that filtering were vetted thoroughly. I think the initial declaration by the FBI was the bullshit part, not the time it took to process the "new" emails.
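For anyone who wants to see what the hash-and-discard step looks like in practice, here is a rough Python sketch. It is only an illustration of the general idea, not whatever tooling was actually used. The dict layout, the field names in `FIELDS`, and the helper names `email_digest`, `drop_known` and `flag_keywords` are all made up for the example, and the plain substring match stands in for a real full-text index.

```python
import hashlib

# Assumed field names for this sketch; a real mailbox export will differ.
FIELDS = ("from", "to", "cc", "bcc", "subject", "body")


def email_digest(email: dict) -> str:
    """Return a SHA-1 hex digest computed over every field of one email."""
    h = hashlib.sha1()
    for field in FIELDS:
        data = email.get(field, "").encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(f"{field}:{len(data)}:".encode("utf-8"))
        h.update(data)
    return h.hexdigest()


def drop_known(new_emails: list, old_emails: list) -> list:
    """Keep only the new emails whose digest matches no old email."""
    seen = {email_digest(e) for e in old_emails}
    remaining = []
    for e in new_emails:
        digest = email_digest(e)
        if digest not in seen:
            seen.add(digest)  # also drops repeats inside the new batch
            remaining.append(e)
    return remaining


def flag_keywords(emails: list, keywords: list) -> list:
    """Naive stand-in for a full-text search: case-insensitive substring match."""
    wanted = [k.lower() for k in keywords]
    flagged = []
    for e in emails:
        text = (e.get("subject", "") + " " + e.get("body", "")).lower()
        if any(k in text for k in wanted):
            flagged.append(e)
    return flagged
```

Note that exact hashing only catches emails that are identical in every hashed field. A copy with a changed header or a bit of whitespace in the body hashes differently and falls through to the keyword/review stage, which is where the "more scripting to remove false positives" would come in.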