angry tapir writes: "Eschewing popular choices such as XML, CSV and JSON, Twitter has opted to store its back-end user and systems data in Protocol Buffers, a relatively little-known format pioneered by Google. With the company storing 12TB of this data each day for later use, the choice of format was a crucial one. The company is planning for the time when it will have to house 'a trillion Tweets'."
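To give a rough feel for why the format choice matters at this volume, here is a minimal sketch that hand-encodes one record in the Protocol Buffers wire format and compares its size with the same record as JSON. The record layout (field 1 as an integer `id`, field 2 as a string `text`) and the function names are assumptions made for illustration; nothing here reflects Twitter's actual schema, and real code would use generated classes from the protobuf library rather than encoding by hand.

```python
import json


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_record(record_id: int, text: str) -> bytes:
    """Encode a hypothetical two-field message: 1 = id (varint), 2 = text (bytes)."""
    body = text.encode("utf-8")
    return (
        encode_varint((1 << 3) | 0) + encode_varint(record_id)       # tag for field 1, wire type 0
        + encode_varint((2 << 3) | 2) + encode_varint(len(body)) + body  # tag for field 2, wire type 2
    )


# Hypothetical sample record, not real data.
record = {"id": 150, "text": "hello"}
binary = encode_record(record["id"], record["text"])
as_json = json.dumps(record).encode("utf-8")

print(len(binary), "bytes as Protocol Buffers wire format")
print(len(as_json), "bytes as JSON")
```

The binary form omits field names and punctuation, carrying only small numeric tags, which is where most of the saving comes from on short records like this one.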