to_parquet

to_parquet()

to_parquet(input_path: str, output_path: str, compression: str | None = None, row_group_size: typing.SupportsInt | None = None, num_threads: typing.SupportsInt | None = None, error_mode: str | None = None, max_errors: typing.SupportsInt | None = None, comment: str | None = None, skip_empty_rows: bool = True, guess_integer: bool = True, trim_ws: bool = True, escape_backslash: bool = False, decimal_mark: str = '.', skip: typing.SupportsInt = 0) -> None

Convert a CSV file to Parquet format.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_path | str | Path to the input CSV file. | required |
| output_path | str | Path to the output Parquet file. | required |
| compression | str | Compression codec: "zstd", "snappy", "lz4", "gzip", or "none". When not set, "zstd" is used if available, otherwise "gzip". | None |
| row_group_size | int | Number of rows per row group. When not set, 1,000,000 is used. | None |
| num_threads | int | Number of threads to use. When not set, the thread count is auto-detected. | None |
| error_mode | str | Error handling mode. "disabled" (default): no error collection, maximum performance. "fail_fast" or "strict": stop on the first error. "permissive": collect all errors, stop on a fatal error. "best_effort": ignore errors and parse what is possible. | None |
| max_errors | int | Maximum number of errors to collect. When not set, 10000 is used. Setting this automatically enables "permissive" mode if error_mode is not set. | None |
| comment | str | String that marks comment lines. Lines starting with this string are skipped during parsing. Supports multi-character prefixes such as "//" or "##". None disables comment skipping. | None |
| skip_empty_rows | bool | Whether to skip empty lines in the input. | True |
| guess_integer | bool | Whether to infer integer types (INT32/INT64) for integer-like values. | True |
| trim_ws | bool | Whether to trim leading and trailing whitespace from field values. | True |
| escape_backslash | bool | Whether to use backslash escaping instead of doubled quotes. | False |
| decimal_mark | str | Decimal separator character ('.' or ','). | '.' |
| skip | int | Number of lines to skip before the header row. | 0 |
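
The parsing options above can be combined in a single call. The following is a minimal sketch, assuming a CSV file that uses ',' as the decimal separator, '#' for comment lines, and two preamble lines before the header. The helper names `european_csv_options` and `convert` are hypothetical and not part of vroom_csv; only the keyword arguments come from the signature documented here.

```python
def european_csv_options() -> dict:
    """Keyword arguments for the assumed file layout (hypothetical helper)."""
    return {
        "compression": "snappy",  # one of the documented codecs
        "comment": "#",           # skip lines that start with '#'
        "decimal_mark": ",",      # parse values such as 3,14 as decimals
        "skip": 2,                # ignore two lines before the header row
    }


def convert(input_path: str, output_path: str) -> None:
    """Convert one CSV file to Parquet using the options above."""
    # Imported here so this module still loads when vroom_csv is not installed.
    import vroom_csv

    vroom_csv.to_parquet(input_path, output_path, **european_csv_options())
```

Keeping the options in a dictionary makes it easy to reuse the same settings across several files; parameters left out of the dictionary keep their documented defaults.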

Raises

| Type | Description |
| --- | --- |
| RuntimeError | If parsing fails. In permissive mode, collected errors are included in the exception message. |
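
Because failures surface as a RuntimeError, a caller can catch the exception and inspect its message. The sketch below is illustrative only: `convert_or_report` is a hypothetical wrapper, the `to_parquet` argument exists purely so the function can be exercised without the library, and the `max_errors=100` limit is an arbitrary choice.

```python
from typing import Callable, Optional


def convert_or_report(
    input_path: str,
    output_path: str,
    to_parquet: Optional[Callable[..., None]] = None,
) -> Optional[str]:
    """Run the conversion; return the error message on failure, else None."""
    if to_parquet is None:
        # Imported here so this module still loads when vroom_csv is not installed.
        import vroom_csv

        to_parquet = vroom_csv.to_parquet
    try:
        # Permissive mode collects errors instead of stopping at the first one.
        to_parquet(input_path, output_path, error_mode="permissive", max_errors=100)
    except RuntimeError as exc:
        # Per the table above, collected errors are part of the message.
        return str(exc)
    return None
```

A caller might log the returned string or write it next to the input file. The exact format of the message is not specified here, so treat it as free text rather than parsing it.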

Examples

>>> import vroom_csv
>>> vroom_csv.to_parquet("data.csv", "data.parquet")

With error handling

>>> vroom_csv.to_parquet("data.csv", "data.parquet", error_mode="strict")