Whenever you need to store data on S3, in a data lake, or behind an external table, choose the file format wisely:
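- Parquet / ORC are usually the best options for analytical workloads thanks to their columnar data layout, per-column compression and encoding, and built-in statistics and indexes (min/max values, optional Bloom filters)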
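- Columnar formats allow for column projection and predicate pushdown (reading only the relevant columns and row groups!); partition pruning is a separate benefit that comes from how files are laid out in directories (e.g. by date), not from the file format itself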
- Formats that store the schema alongside the data (Parquet, ORC, Avro) support schema evolution, such as adding new columns over time, which suits a constantly changing business environment
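
To make the column projection point concrete, here is a minimal toy sketch in plain Python. It is not the actual Parquet or ORC layout: the sample rows, the JSON encoding, and the helper names (`write_row_oriented`, `write_column_oriented`, `read_column`) are all assumptions made up for illustration. It only shows why storing each column contiguously, with a small footer of offsets, lets a reader fetch one column without touching the rest of the file.

```python
import json

# Hypothetical sample table: three columns, four rows.
ROWS = [
    {"user_id": 1, "country": "DE", "amount": 10.5},
    {"user_id": 2, "country": "US", "amount": 3.0},
    {"user_id": 3, "country": "US", "amount": 7.25},
    {"user_id": 4, "country": "FR", "amount": 1.0},
]


def write_row_oriented(rows):
    """Serialize row by row: all fields of a record are stored together."""
    return "\n".join(json.dumps(r) for r in rows).encode()


def write_column_oriented(rows):
    """Serialize column by column and record (offset, length) per column."""
    chunks, footer, offset = [], {}, 0
    for name in rows[0]:
        blob = json.dumps([r[name] for r in rows]).encode()
        footer[name] = (offset, len(blob))
        chunks.append(blob)
        offset += len(blob)
    return b"".join(chunks), footer


def read_column(data, footer, name):
    """Column projection: slice out only the bytes of the requested column."""
    offset, length = footer[name]
    return json.loads(data[offset:offset + length]), length


row_bytes = write_row_oriented(ROWS)
col_bytes, footer = write_column_oriented(ROWS)

# Columnar layout: only the "amount" chunk is decoded.
amounts, bytes_touched = read_column(col_bytes, footer, "amount")

# Row layout: every record has to be parsed in full to get one field.
total_from_rows = sum(json.loads(line)["amount"] for line in row_bytes.splitlines())

print(f"columnar read touched {bytes_touched} of {len(col_bytes)} bytes")
print(f"row-oriented read touched all {len(row_bytes)} bytes")
```

Real columnar formats add compression, encodings, and per-chunk statistics on top of this idea, so the savings on wide tables are typically much larger than in this toy.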