NetFiber asks: "I work as a security analyst for a large university. We have recently been tasked with scouring our network to find and remove sensitive information, such as credit card and Social Security numbers, from all publicly available web servers. Our current method is to archive all the content (which often exceeds 100GB) and later parse the data with various utilities and regexes that search for these patterns and other pertinent information. So far, this process has proven cumbersome and time-consuming. Does anyone have experience collecting and sanitizing large amounts of web content? If so, what procedures/utilities do you use to accomplish this?"
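For readers unfamiliar with the kind of scan NetFiber describes, here is a minimal sketch of a regex-based sweep over archived content. It assumes the archive has already been unpacked into a local directory tree; the patterns, the Luhn checksum filter for cutting down false-positive card numbers, and the file-walking logic are illustrative assumptions, not the poster's actual tooling or a recommended solution.

#!/usr/bin/env python3
"""Minimal sketch: stream archived web content and flag candidate
credit card numbers and SSNs. Paths and patterns are illustrative."""
import os
import re
import sys

# Loose candidate for 13-16 digit card numbers, allowing spaces or dashes.
CC_RE = re.compile(r'\b(?:\d[ -]?){13,16}\b')
# Loose SSN pattern: 3-2-4 digits separated by dashes.
SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_file(path: str) -> None:
    """Stream one file line by line and print candidate hits."""
    with open(path, 'r', errors='ignore') as fh:
        for lineno, line in enumerate(fh, 1):
            for m in CC_RE.finditer(line):
                digits = re.sub(r'\D', '', m.group())
                if luhn_ok(digits):
                    print(f"{path}:{lineno}: possible card number {m.group()}")
            for m in SSN_RE.finditer(line):
                print(f"{path}:{lineno}: possible SSN {m.group()}")


if __name__ == '__main__':
    root = sys.argv[1] if len(sys.argv) > 1 else '.'
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            scan_file(os.path.join(dirpath, name))

Run against the unpacked archive root (e.g. `python3 scan.py /data/webdump`); scanning files as a stream rather than loading whole archives is one way to keep memory flat even when the content grows past 100GB.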