Error Handling
Overview
libvroom provides a comprehensive error handling framework with configurable behavior. The parser can detect 16 different error types with three severity levels, enabling applications to handle malformed CSV data gracefully.
Error Modes
The parser supports three error handling modes:
| Mode | Behavior | Use Case |
|---|---|---|
| STRICT | Stop on first error | Data validation, strict compliance |
| PERMISSIVE | Collect all errors, try to recover | Production parsing with logging |
| BEST_EFFORT | Ignore errors, parse what’s possible | Exploratory analysis, tolerant parsing |
Example
#include <libvroom.h>
#include <iostream>
// Method 1: Use Parser result's built-in error handling (default: PERMISSIVE)
libvroom::Parser parser;
auto result = parser.parse(buffer.data(), buffer.size());
if (result.has_errors()) {
    std::cerr << result.error_summary() << "\n";
}
// Method 2: External error collector with a specific mode
// (FAIL_FAST here, used for stop-on-first-error behavior)
libvroom::ErrorCollector strict(libvroom::ErrorMode::FAIL_FAST);
auto result2 = parser.parse(buffer.data(), buffer.size(), {.errors = &strict});
// Method 3: CLI with strict mode
// vroom head --strict data.csv
Error Types
Field Structure Errors
| Error Code | Description | Severity |
|---|---|---|
| INCONSISTENT_FIELD_COUNT | Row has different number of fields than header | ERROR |
| FIELD_TOO_LARGE | Field exceeds maximum size limit | ERROR |
Example - Inconsistent fields:
name,age,city
Alice,30
Bob,25,NYC,USA
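
In the sample above the header has three fields, the second line has two, and the third has four, so both data rows differ from the header. The check can be pictured with the standalone sketch below. It does not use libvroom: `FieldCountIssue`, `count_fields`, and `find_inconsistent_rows` are hypothetical names, and the naive comma count ignores quoting, which a real CSV parser has to handle.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not part of libvroom.
struct FieldCountIssue {
    std::size_t line;      // 1-based line number in the input
    std::size_t expected;  // number of fields in the header
    std::size_t actual;    // number of fields found on this line
};

// Count fields by counting commas. Quoted fields are NOT handled here.
inline std::size_t count_fields(const std::string& row) {
    std::size_t n = 1;
    for (char c : row) {
        if (c == ',') ++n;
    }
    return n;
}

// Compare every data row against the header's field count.
inline std::vector<FieldCountIssue> find_inconsistent_rows(const std::string& csv) {
    std::vector<FieldCountIssue> issues;
    std::istringstream in(csv);
    std::string row;
    std::size_t line = 0;
    std::size_t expected = 0;
    while (std::getline(in, row)) {
        ++line;
        if (line == 1) {
            expected = count_fields(row);  // the header defines the expected width
            continue;
        }
        const std::size_t actual = count_fields(row);
        if (actual != expected) {
            issues.push_back({line, expected, actual});
        }
    }
    return issues;
}
```

Calling `find_inconsistent_rows` on the three-line sample yields one issue for line 2 and one for line 3.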
Line Ending Errors
| Error Code | Description | Severity |
|---|---|---|
| MIXED_LINE_ENDINGS | File uses inconsistent line endings | WARNING |
| INVALID_LINE_ENDING | Invalid line ending sequence | ERROR |
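
To make "inconsistent line endings" concrete, here is a minimal standalone sketch that tallies LF, CRLF, and bare CR terminators and reports whether more than one style occurs. It is not libvroom's detection code: `LineEndingCounts` and `count_line_endings` are hypothetical names, newlines inside quoted fields are not treated specially, and whether the library classifies a bare CR as mixed or as invalid is not something this sketch tries to settle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tally used only in this sketch; not part of libvroom.
struct LineEndingCounts {
    std::size_t lf = 0;    // "\n" not preceded by "\r"
    std::size_t crlf = 0;  // "\r\n"
    std::size_t cr = 0;    // "\r" not followed by "\n"

    // More than one style present means the endings are mixed.
    bool mixed() const {
        const int styles = (lf > 0 ? 1 : 0) + (crlf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        return styles > 1;
    }
};

inline LineEndingCounts count_line_endings(const std::string& data) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r') {
            if (i + 1 < data.size() && data[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // consume the '\n' that belongs to this CRLF
            } else {
                ++counts.cr;
            }
        } else if (data[i] == '\n') {
            ++counts.lf;
        }
    }
    return counts;
}
```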
Encoding Errors
| Error Code | Description | Severity |
|---|---|---|
| INVALID_UTF8 | Invalid UTF-8 sequence | ERROR |
| NULL_BYTE | Unexpected null byte in data | ERROR |
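
As a rough illustration of what these two checks involve, the sketch below locates the first null byte and the first malformed UTF-8 sequence in a buffer. It is a plain scalar implementation written for this page, not libvroom's validation code. Both function names are hypothetical, and both return `std::string::npos` when nothing is found.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Offset of the first 0x00 byte, or std::string::npos if there is none.
inline std::size_t first_null_byte(const std::string& data) {
    return data.find('\0');
}

// Offset of the first byte that starts an invalid UTF-8 sequence,
// or std::string::npos if the whole buffer is valid.
inline std::size_t first_invalid_utf8(const std::string& data) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data.data());
    const std::size_t n = data.size();
    // Smallest code point that legitimately needs a sequence of each length.
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b < 0x80) { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return i;                    // stray continuation or invalid lead byte
        if (i + len > n) return i;        // sequence is cut off by end of input
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return i;  // not a continuation byte
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return i;    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return i;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return i;                  // beyond the Unicode range
        i += len;
    }
    return std::string::npos;
}
```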
Structure Errors
| Error Code | Description | Severity |
|---|---|---|
| EMPTY_HEADER | Header row is empty | ERROR |
| DUPLICATE_COLUMN_NAMES | Header contains duplicate column names | WARNING |
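
A duplicate-name check on an already split header can be sketched as follows. This is an illustration only: `duplicate_columns` is a hypothetical helper, and it compares names exactly (case-sensitive, no trimming), which may or may not match how libvroom compares them.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Return each repeated column name once, in the order its first repeat appears.
inline std::vector<std::string> duplicate_columns(const std::vector<std::string>& header) {
    std::unordered_set<std::string> seen;      // every name encountered so far
    std::unordered_set<std::string> reported;  // names already added to the result
    std::vector<std::string> duplicates;
    for (const auto& name : header) {
        const bool is_repeat = !seen.insert(name).second;
        if (is_repeat && reported.insert(name).second) {
            duplicates.push_back(name);
        }
    }
    return duplicates;
}
```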
Severity Levels
| Severity | Meaning |
|---|---|
| WARNING | Non-fatal, parser continues (e.g., mixed line endings) |
| ERROR | Recoverable, can skip affected row |
| FATAL | Unrecoverable, parsing must stop |
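
One way to read the mode and severity tables together is as a small decision function. The sketch below is an interpretation, not libvroom's logic: the enums are hypothetical stand-ins for the names in the tables, and it assumes that warnings never stop parsing (even in strict mode) and that fatal errors stop parsing in every mode.

```cpp
// Hypothetical enums mirroring the names in the tables; not libvroom's declarations.
enum class Mode { Strict, Permissive, BestEffort };
enum class Severity { Warning, Error, Fatal };
enum class Action { Continue, SkipRow, Stop };

// Decide what to do with one reported problem (assumed policy, see lead-in).
inline Action decide(Mode mode, Severity severity) {
    if (severity == Severity::Fatal) return Action::Stop;        // unrecoverable
    if (severity == Severity::Warning) return Action::Continue;  // non-fatal
    // Severity::Error: the outcome depends on the mode.
    switch (mode) {
        case Mode::Strict:     return Action::Stop;      // stop on first error
        case Mode::Permissive: return Action::SkipRow;   // record it, then recover
        case Mode::BestEffort: return Action::Continue;  // ignore and keep going
    }
    return Action::Stop;  // unreachable for valid enum values
}
```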
Using the ErrorCollector
#include "error.h"
#include "two_pass.h"
#include <cstddef>
#include <cstdint>
#include <iostream>
void parse_with_errors(const uint8_t* buf, size_t len) {
    // Create parser and error collector
    libvroom::TwoPass parser;
    auto idx = parser.init(len, 1);
    libvroom::ErrorCollector errors(libvroom::ErrorMode::PERMISSIVE);
    // Parse with error collection
    bool success = parser.parse_with_errors(buf, idx, len, errors);
    // Check for errors
    if (errors.has_errors()) {
        std::cout << "Found " << errors.error_count() << " errors\n";
        // Iterate through errors
        for (const auto& err : errors.errors()) {
            std::cout << "Line " << err.line
                      << ", Col " << err.column
                      << ": " << err.message << "\n";
        }
    }
    // Check specific conditions
    if (errors.has_fatal_errors()) {
        std::cerr << "Fatal error encountered!\n";
    }
    // Get summary
    std::cout << errors.summary() << "\n";
}
Error Limits
To prevent out-of-memory issues with malformed files containing many errors, ErrorCollector has a configurable maximum error limit:
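
The idea behind such a cap can be sketched with a small standalone class. `BoundedErrors` is a hypothetical name and this is not how `ErrorCollector` is implemented; it only shows that once the cap is reached, further errors are counted but no longer stored, so memory use stays bounded.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector for illustration; libvroom::ErrorCollector may differ.
class BoundedErrors {
public:
    explicit BoundedErrors(std::size_t max_errors = 10000) : max_errors_(max_errors) {}

    // Store the message unless the cap is reached; returns false if it was dropped.
    bool add(std::string message) {
        if (messages_.size() >= max_errors_) {
            ++dropped_;
            return false;
        }
        messages_.push_back(std::move(message));
        return true;
    }

    bool at_limit() const { return messages_.size() >= max_errors_; }
    std::size_t stored() const { return messages_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t max_errors_;
    std::size_t dropped_ = 0;
    std::vector<std::string> messages_;
};
```

The library's own collector takes the limit as a constructor argument and reports it through `at_error_limit()`: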
// Default limit is 10,000 errors
libvroom::ErrorCollector errors(libvroom::ErrorMode::PERMISSIVE);
// Custom limit (a separate variable, so both declarations can share a scope)
libvroom::ErrorCollector limited_errors(libvroom::ErrorMode::PERMISSIVE, 1000);
// Check if limit reached
if (limited_errors.at_error_limit()) {
    std::cerr << "Error limit reached, some errors may not be reported\n";
}
Multi-threaded Error Collection
When parsing with multiple threads, each thread collects errors locally. After parsing, errors are merged and sorted by byte offset:
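
Conceptually, the merge step amounts to concatenating the per-thread lists and sorting the result by position. The following standalone sketch shows that idea; `LocatedError` and `merge_thread_errors` are hypothetical names, and the library's actual merge may be organized differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error record; only the byte offset matters for ordering here.
struct LocatedError {
    std::size_t byte_offset;
    std::string message;
};

// Concatenate per-thread lists, then sort by byte offset to restore file order.
inline std::vector<LocatedError> merge_thread_errors(
        const std::vector<std::vector<LocatedError>>& per_thread) {
    std::vector<LocatedError> merged;
    for (const auto& local : per_thread) {
        merged.insert(merged.end(), local.begin(), local.end());
    }
    // stable_sort keeps the original relative order of errors at the same offset.
    std::stable_sort(merged.begin(), merged.end(),
                     [](const LocatedError& a, const LocatedError& b) {
                         return a.byte_offset < b.byte_offset;
                     });
    return merged;
}
```

With libvroom, collection and merging happen inside the parse call: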
libvroom::TwoPass parser;
auto idx = parser.init(len, 4); // 4 threads
libvroom::ErrorCollector errors(libvroom::ErrorMode::PERMISSIVE);
// Errors will be collected per-thread and merged
parser.parse_two_pass_with_errors(buf, idx, len, errors);
// Errors are sorted by byte offset for consistent ordering
for (const auto& err : errors.errors()) {
    // Process errors in file order
}
Streaming Parser Error Handling
The streaming parser has its own error handling mechanism:
#include <streaming.h>
libvroom::StreamConfig config;
config.error_mode = libvroom::ErrorMode::PERMISSIVE;
libvroom::StreamReader reader("data.csv", config);
// Process rows
for (const auto& row : reader) {
    process(row);
}
// Check for errors after processing
if (reader.error_collector().has_errors()) {
    std::cerr << reader.error_collector().summary() << "\n";
}
CLI Error Handling
The CLI tool supports strict mode via the -S or --strict flag:
# Default: permissive mode (continues on errors)
vroom head data.csv
# Strict mode: exit with code 1 on any error
vroom head --strict data.csv
This is useful for scripts and CI pipelines where you want to fail fast on malformed data.
Test Files
The test suite includes 16+ malformed CSV files in test/data/malformed/ covering all error types:
- unclosed_quote.csv - Unclosed quote in middle of file
- unclosed_quote_eof.csv - Unclosed quote at end of file
- invalid_quote_escape.csv - Invalid escape sequences
- quote_in_unquoted_field.csv - Quote in unquoted context
- inconsistent_columns.csv - Varying field counts
- mixed_line_endings.csv - LF and CRLF mixed
- null_byte.csv - Embedded null characters
- empty_header.csv - Missing header row
- duplicate_column_names.csv - Repeated column names
- And more…
Best Practices
- For data validation: Use STRICT mode to catch the first error and report it clearly to users.
- For production parsing: Use PERMISSIVE mode to collect all errors while parsing what you can, then log errors for later review.
- For exploratory analysis: Use BEST_EFFORT mode when you want to process as much data as possible, ignoring errors.
- Set error limits: In production, always set a reasonable error limit to prevent memory exhaustion from very malformed files.
- Check for fatal errors: Always check has_fatal_errors() even in permissive mode, as some errors (like unclosed quotes at EOF) may affect the validity of the entire parse.
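
The last two practices can be combined into a single post-parse check. The sketch below shows one possible policy, not a recommendation from the library: `ParseReport`, `Verdict`, and `triage` are hypothetical names, and the three flags are assumed to be copied from the collector's `has_errors()`, `has_fatal_errors()`, and `at_error_limit()` results after parsing.

```cpp
// Hypothetical summary of a finished parse; field names are illustrative only.
struct ParseReport {
    bool has_errors;
    bool has_fatal_errors;
    bool at_error_limit;
};

enum class Verdict { Accept, AcceptAndLog, Reject };

// Assumed policy: fatal errors invalidate the parse, anything else is logged.
inline Verdict triage(const ParseReport& report) {
    if (report.has_fatal_errors) {
        return Verdict::Reject;        // the whole parse may be invalid
    }
    if (report.has_errors || report.at_error_limit) {
        return Verdict::AcceptAndLog;  // usable data, but errors need review
    }
    return Verdict::Accept;
}
```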