Apache Parquet Description
Parquet was created to make the benefits of compressed, columnar data representation available to any project in the Hadoop ecosystem. Parquet was built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. This approach is superior to simply flattening nested namespaces (a sketch of nested-data handling follows below).

Parquet is designed to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to their data. Parquet allows compression schemes to be specified on a per-column level (see the second sketch below), and is future-proofed to allow more encodings to be added as they are invented and implemented.

Parquet is designed to be used by everyone; we don't want to play favorites in the Hadoop ecosystem.
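As an illustration of the nested-data support, here is a minimal sketch using pyarrow (one of several Parquet implementations; the file name and sample records are hypothetical). Parquet shreds each nested record into flat columns plus repetition and definition levels, and reassembles the original structure on read:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical nested records: each row is a struct containing a
# scalar field and a repeated (list) field.
table = pa.table({
    "contact": [
        {"name": "alice", "phones": ["555-1234", "555-5678"]},
        {"name": "bob", "phones": []},
    ]
})

# On write, Parquet shreds the struct into flat leaf columns with
# repetition and definition levels, rather than flattening the
# nested namespace into ad hoc top-level fields.
pq.write_table(table, "nested.parquet")

# On read, the record assembly algorithm rebuilds the nested rows.
restored = pq.read_table("nested.parquet")
print(restored.column("contact").to_pylist())
```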
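And a minimal sketch of per-column compression, again using pyarrow; the codec choices, column names, and output path are assumptions for illustration. `write_table` accepts either a single codec for the whole file or a dict mapping column names to codecs:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "id": [1, 2, 3],                         # numeric column
    "payload": ["lorem", "ipsum", "dolor"],  # text column
})

# Compression is specified per column: a dict maps each column name
# to its codec, so different columns can use different schemes.
pq.write_table(
    table,
    "per_column.parquet",
    compression={"id": "snappy", "payload": "zstd"},  # hypothetical choices
)
```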