Package com.exasol.parquetio.splitter
Class ParquetFileSplitter
java.lang.Object
  com.exasol.parquetio.splitter.ParquetFileSplitter

All Implemented Interfaces: FileSplitter
public class ParquetFileSplitter extends Object implements FileSplitter
A class that splits a Parquet file into chunks of a given size. Each chunk holds the start and end positions of one or more row groups in the Parquet file.
Constructor Summary
Constructors
Constructor | Description
ParquetFileSplitter(org.apache.parquet.io.InputFile file) | Creates a new instance of ParquetFileSplitter.
ParquetFileSplitter(org.apache.parquet.io.InputFile file, long chunkSize) | Creates a new instance of ParquetFileSplitter.
Method Summary
Modifier and Type | Method | Description
protected List<ChunkInterval> | getRowGroupSplits(List<org.apache.parquet.hadoop.metadata.BlockMetaData> rowGroups) | Returns row group splits given the file row groups.
List<ChunkInterval> | getSplits() | Gets file splits in the form of start and end intervals.
Constructor Detail
ParquetFileSplitter
public ParquetFileSplitter(org.apache.parquet.io.InputFile file)
Creates a new instance of ParquetFileSplitter. It uses a default chunk size of 64 MB to split the file.
Parameters:
file - a Parquet file
ParquetFileSplitter
public ParquetFileSplitter(org.apache.parquet.io.InputFile file, long chunkSize)
Creates a new instance of ParquetFileSplitter.
Parameters:
file - a Parquet file
chunkSize - a chunk size in bytes
Method Detail
getSplits
public List<ChunkInterval> getSplits()
Description copied from interface: FileSplitter
Gets file splits in the form of start and end intervals.
Specified by:
getSplits in interface FileSplitter
Returns:
a list of intervals
getRowGroupSplits
protected List<ChunkInterval> getRowGroupSplits(List<org.apache.parquet.hadoop.metadata.BlockMetaData> rowGroups)
Returns row group splits given the file row groups.
Parameters:
rowGroups - a list of file row groups
Returns:
a list of ChunkIntervals
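To illustrate how row groups could be packed into chunks of a given byte size, here is a minimal, self-contained sketch. It is not the library's implementation. The class RowGroupChunkingSketch, the Interval type, and the split method are hypothetical names, and the sketch assumes that a chunk is closed as soon as its accumulated row group sizes reach the chunk size, with positions expressed as row group indices (start inclusive, end exclusive). The actual semantics of ChunkInterval may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupChunkingSketch {

    /** Hypothetical stand-in for ChunkInterval: row group indices, start inclusive, end exclusive. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Greedily packs consecutive row groups into chunks (assumed strategy).
     * A chunk is closed once its accumulated byte size reaches chunkSize;
     * any remaining row groups form a final, smaller chunk.
     */
    static List<Interval> split(long[] rowGroupSizes, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Interval> chunks = new ArrayList<>();
        long currentSize = 0;
        int start = 0;
        for (int i = 0; i < rowGroupSizes.length; i++) {
            currentSize += rowGroupSizes[i];
            if (currentSize >= chunkSize) {
                chunks.add(new Interval(start, i + 1));
                start = i + 1;
                currentSize = 0;
            }
        }
        if (start < rowGroupSizes.length) {
            chunks.add(new Interval(start, rowGroupSizes.length));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Four row groups of 40, 30, 70 and 10 MB, split with a 64 MB chunk size.
        long[] sizes = {40 * mb, 30 * mb, 70 * mb, 10 * mb};
        System.out.println(split(sizes, 64 * mb)); // prints [[0, 2), [2, 3), [3, 4)]
    }
}
```

With this strategy a single row group larger than the chunk size becomes its own chunk, since row groups are never divided.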