to_parquet
to_parquet()to_parquet(input_path: str, output_path: str, compression: str | None = None, row_group_size: typing.SupportsInt | None = None, num_threads: typing.SupportsInt | None = None, error_mode: str | None = None, max_errors: typing.SupportsInt | None = None, comment: str | None = None, skip_empty_rows: bool = True, guess_integer: bool = True, trim_ws: bool = True, escape_backslash: bool = False, decimal_mark: str = ‘.’, skip: typing.SupportsInt = 0) -> None
Convert a CSV file to Parquet format.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| input_path | str | Path to the input CSV file. | required |
| output_path | str | Path to the output Parquet file. | required |
| compression | str | Compression codec: “zstd”, “snappy”, “lz4”, “gzip”, or “none”. Default is “zstd” if available, otherwise “gzip”. | required |
| row_group_size | int | Number of rows per row group. Default is 1,000,000. | required |
| num_threads | int | Number of threads to use. Default is auto-detect. | required |
| error_mode | str | Error handling mode: - “disabled” (default): No error collection, maximum performance - “fail_fast” or “strict”: Stop on first error - “permissive”: Collect all errors, stop on fatal - “best_effort”: Ignore errors, parse what’s possible | required |
| max_errors | int | Maximum number of errors to collect. Default is 10000. Setting this automatically enables “permissive” mode if error_mode is not set. | required |
| comment | str | String that marks comment lines. Lines starting with this string are skipped during parsing. Supports multi-character prefixes like “//” or “##”. Default is None (no comment skipping). | required |
| skip_empty_rows | bool | Whether to skip empty lines in the input. Default is True. | required |
| guess_integer | bool | Whether to infer integer types (INT32/INT64) for integer-like values. Default is True. | required |
| trim_ws | bool | Whether to trim leading and trailing whitespace from field values. Default is True. | required |
| escape_backslash | bool | Whether to use backslash escaping instead of doubled quotes. Default is False. | required |
| decimal_mark | str | Decimal separator character (‘.’ or ‘,’). Default is ‘.’. | required |
| skip | int | Number of lines to skip before the header row. Default is 0. | required |
Raises
| Name | Type | Description |
|---|---|---|
| RuntimeError | If parsing fails. In permissive mode, collected errors are included in the exception message. |
Examples
>>> import vroom_csv
>>> vroom_csv.to_parquet("data.csv", "data.parquet")With error handling
>>> vroom_csv.to_parquet("data.csv", "data.parquet", error_mode="strict")