File Formats Supported by Presto

Which file formats does Presto support? Are any formats recommended for better performance? I would also like to know whether there is a columnar file format, like RCFile, that is optimized for Presto.

+6
3 answers

We test every release of Presto with RCFile, SequenceFile, and TextFile, but Presto should support any standard Hadoop file format. At Facebook, most of our data is in RCFile format, so that format currently has the best performance in Presto. We are in the process of transitioning to ORC, and as that nears completion, ORC should also be very fast in Presto.

+7

The best option is ORC. Parquet is also good, thanks to additional optimizations contributed by Netflix.

+2

For the current version of Presto, I recommend using ORC files. Dain has finished a new ORC reader for Presto, and it is very fast. Here is the blog post: https://code.facebook.com/posts/370832626374903/even-faster-data-at-the-speed-of-presto-orc/
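
If your data is currently in another format, one common way to try ORC is to rewrite the table with a CREATE TABLE AS statement and the Hive connector's `format` table property. The snippet below is only a rough sketch that builds such a statement as a string. The helper name `ctas_as_format` and the table names are made up for illustration, and the format list is what I believe the Hive connector accepts, so check the documentation for your Presto version.

```python
# Values believed to be accepted by the Hive connector's `format` table
# property (an assumption; verify against your Presto version's docs).
HIVE_FORMATS = {
    "ORC", "PARQUET", "AVRO", "RCBINARY",
    "RCTEXT", "SEQUENCEFILE", "JSON", "TEXTFILE",
}


def ctas_as_format(source_table: str, target_table: str, fmt: str = "ORC") -> str:
    """Build a Presto CREATE TABLE AS statement that copies
    `source_table` into `target_table` stored in the given format.

    Hypothetical helper for illustration; it only builds the SQL text
    and does not connect to Presto.
    """
    fmt = fmt.upper()
    if fmt not in HIVE_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return (
        f"CREATE TABLE {target_table} "
        f"WITH (format = '{fmt}') "
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # Table names here are placeholders, not real tables.
    print(ctas_as_format("hive.web.page_views_rc", "hive.web.page_views_orc"))
```

You would run the resulting statement from the Presto CLI or any client, then point your queries at the new table and compare timings against the original.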

+1

Source: https://habr.com/ru/post/957943/

