CLI Reference
Overview
The vroom command line tool provides a user-friendly way to work with CSV files using the high-performance libvroom parser. It supports common operations like counting rows, displaying data, selecting columns, and detecting file formats.
Installation
After building libvroom (see Getting Started), the vroom binary is located in your build directory. To use it conveniently:
Option 1: Add Build Directory to PATH (Temporary)
Add the build directory to your PATH for the current session:
export PATH="/path/to/libvroom/build:$PATH"
For example, if you built in the default location:
export PATH="$(pwd)/build:$PATH"
Option 2: Install to System Location (Permanent)
Copy the binary to a system location that’s already in your PATH:
# Install for all users (requires sudo)
sudo cp build/vroom /usr/local/bin/
# Or install for current user only
mkdir -p ~/.local/bin
cp build/vroom ~/.local/bin/
# Ensure ~/.local/bin is in your PATH (add to .bashrc or .zshrc if needed)
export PATH="$HOME/.local/bin:$PATH"
Verify the installation:
vroom -v
Basic Usage
vroom <command> [options] [csvfile]
If csvfile is omitted or - is specified, vroom reads from standard input.
Commands
| Command | Description |
|---|---|
| count | Count the number of rows |
| head | Display the first N rows (default: 10) |
| tail | Display the last N rows (default: 10) |
| sample | Display N random rows from throughout the file |
| select | Select specific columns by name or index |
| info | Display information about the CSV file |
| schema | Display inferred schema (column names, types, nullable) |
| stats | Display statistical summary for each column |
| pretty | Pretty-print the CSV with aligned columns |
| dialect | Detect and display the CSV dialect (delimiter, quoting, etc.) |
Options
General Options
| Option | Description |
|---|---|
| -h | Show help message |
| -v | Show version information |
Data Options
| Option | Description | Default |
|---|---|---|
| -n <num> | Number of rows (for head/tail/sample/pretty) | 10 |
| -c <cols> | Comma-separated column names or indices (for select) | - |
| -H | No header row in input | (header assumed) |
| -m <size> | Sample size for schema/stats (0=all rows) | 0 |
Performance Options
| Option | Description | Default |
|---|---|---|
| -t <threads> | Number of threads (1-1024) | auto (hardware concurrency) |
Caching Options
| Option | Description | Default |
|---|---|---|
| --cache | Enable index caching (stores .vidx file next to source) | disabled |
| --cache-dir <dir> | Store cache files in specified directory (implies --cache) | - |
| --no-cache | Disable index caching | (default) |
See Index Caching for detailed documentation on how caching works.
Dialect Options
| Option | Description | Default |
|---|---|---|
| -d <delim> | Field delimiter: comma, tab, semicolon, pipe, or any single character. Disables auto-detection when specified. | auto-detect |
| -q <char> | Quote character | " |
Encoding Options
| Option | Description | Default |
|---|---|---|
| -e <enc> | Override encoding detection with specified encoding | auto-detect |
Supported encoding values: utf-8, utf-16le, utf-16be, utf-32le, utf-32be, latin1, windows-1252
Output Options
| Option | Description | Commands |
|---|---|---|
| -j | Output in JSON format | dialect, schema, stats |
| -S, --strict | Exit with code 1 on any parse error | all |
Sampling Options
| Option | Description | Commands |
|---|---|---|
| -s <seed> | Random seed for reproducible sampling | sample |
Input and Output Formats
Input Sources
vroom can read CSV data from multiple sources:
- Files: Provide the path to a CSV file
- stdin: Omit the filename or use - to read from standard input
Output Formats
- CSV output (head, tail, sample, select): Valid CSV maintaining the input delimiter
- Plain text (count): Single number
- Structured text (info, pretty, schema, stats): Human-readable formatted output
- JSON (dialect -j, schema -j, stats -j): Machine-readable format for scripting
Supported Delimiters
| Name | Character | Example Usage |
|---|---|---|
| comma | , | -d comma or -d , |
| tab | \t | -d tab |
| semicolon | ; | -d semicolon or -d ";" |
| pipe | \| | -d pipe or -d "\|" |
| any character | varies | -d : |
Command Details
count
Count the number of data rows in a CSV file. Uses an optimized SIMD algorithm that doesn’t build a full index, making it significantly faster than other commands.
vroom count ../test/data/real_world/contacts.csv
head
Display the first N rows of a CSV file (default: 10).
vroom head -n 2 ../test/data/real_world/contacts.csv
tail
Display the last N rows of a CSV file (default: 10). Uses a memory-efficient streaming approach that only keeps the last N rows in memory.
vroom tail -n 2 ../test/data/real_world/contacts.csv
sample
Display N random rows from the file. Uses reservoir sampling for memory efficiency.
vroom sample -n 3 ../test/data/real_world/contacts.csv
Use -s for reproducible sampling:
vroom sample -n 3 -s 42 ../test/data/real_world/contacts.csv
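As a rough sketch, the reproducibility that -s promises can be checked in a script by sampling twice with the same seed and comparing the output. The helper name sample_is_reproducible is hypothetical and not part of vroom. It uses only the documented -n and -s options and assumes vroom is on your PATH.

```shell
# Sketch: check that a fixed seed yields the same sample twice.
# sample_is_reproducible is a hypothetical helper, not part of vroom.
# Usage: sample_is_reproducible FILE N SEED
sample_is_reproducible() {
    # Return 2 if either vroom invocation fails (for example, file not found).
    first=$(vroom sample -n "$2" -s "$3" "$1") || return 2
    second=$(vroom sample -n "$2" -s "$3" "$1") || return 2
    # Succeed only when both runs printed identical rows.
    [ "$first" = "$second" ]
}
```

For example, `sample_is_reproducible data.csv 3 42 && echo "same sample"` prints the message only when both runs match.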
select
Select specific columns by name or index.
By name:
vroom select -c Name,Email ../test/data/real_world/contacts.csv
By index (0-based):
vroom select -c 0,2 ../test/data/real_world/contacts.csv
info
Display metadata about a CSV file:
vroom info ../test/data/real_world/contacts.csv
schema
Infer and display the schema (column types) of a CSV file:
vroom schema ../test/data/real_world/contacts.csv
JSON output for scripting:
vroom schema -j ../test/data/real_world/contacts.csv
Use -m to sample a subset of rows for large files:
vroom schema -m 1000 large_file.csv
stats
Display statistical summary for each column (count, nulls, min, max, mean for numeric columns):
vroom stats ../test/data/real_world/contacts.csv
JSON output:
vroom stats -j ../test/data/real_world/contacts.csv
pretty
Pretty-print the CSV with aligned columns:
vroom pretty -n 3 ../test/data/real_world/contacts.csv
dialect
Detect and display the CSV dialect (delimiter, quoting style, line endings):
vroom dialect ../test/data/separators/semicolon.csv
JSON output for scripting:
vroom dialect -j ../test/data/separators/tab.csv
Working with Different Delimiters
By default, vroom auto-detects the delimiter. You can also specify it explicitly.
Tab-separated:
vroom count -d tab ../test/data/separators/tab.csv
Semicolon-separated (common in European locales):
vroom head -d semicolon ../test/data/separators/semicolon.csv
Pipe-separated:
vroom select -d pipe -c 0,1 ../test/data/separators/pipe.csv
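When a batch of files has known formats, a script can pick the -d value itself rather than relying on auto-detection. The sketch below maps file extensions to the delimiter names listed above. The helper names and the extension-to-delimiter mapping are illustrative assumptions, not something vroom defines.

```shell
# Sketch: choose an explicit -d value from the file extension.
# delim_for_file and count_known are hypothetical helpers; the mapping
# of extensions to delimiters is an assumption made for illustration.
delim_for_file() {
    case "$1" in
        *.tsv|*.tab) echo tab ;;
        *.psv)       echo pipe ;;
        *.csv)       echo comma ;;
        *)           return 1 ;;   # unknown extension
    esac
}

count_known() {
    if d=$(delim_for_file "$1"); then
        vroom count -d "$d" "$1"
    else
        # Unknown extension: let vroom auto-detect the delimiter.
        vroom count "$1"
    fi
}
```

Falling back to auto-detection for unrecognized extensions keeps the helper safe to run over mixed directories.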
Working with Quoted Fields
CSV files often contain fields with special characters that require quoting:
vroom pretty ../test/data/quoted/embedded_separators.csv
Fields containing embedded quotes (escaped as ""):
vroom pretty ../test/data/quoted/escaped_quotes.csv
Multi-threaded Parsing
By default, vroom uses all available CPU cores for parallel parsing. You can limit the thread count if needed:
vroom count -t 4 ../test/data/real_world/contacts.csv
Files Without Headers
When the CSV has no header row, use -H:
vroom count -H ../test/data/basic/simple_no_header.csv
vroom select -H -c 0,1,2 ../test/data/basic/simple_no_header.csv
Reading from stdin
Pipe data directly to vroom:
cat ../test/data/basic/simple.csv | vroom count
Use - to explicitly read from stdin:
vroom head - < ../test/data/basic/simple.csv
Strict Mode
Use -S or --strict to exit with code 1 if any parse errors are encountered:
vroom head --strict data.csv
This is useful in scripts where you want to fail fast on malformed data.
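A minimal sketch of that fail-fast pattern follows. The function name validate_csv is hypothetical. It wraps the same head --strict invocation shown above and relies only on the documented exit code. Whether head inspects the whole file or only the rows it prints is not stated here, so treat the check as covering only what that command parses.

```shell
# Sketch: fail fast when vroom reports a parse error in strict mode.
# validate_csv is a hypothetical helper, not part of vroom.
validate_csv() {
    # Discard the rows; only the exit status matters here.
    if vroom head --strict "$1" > /dev/null; then
        echo "ok: $1"
    else
        echo "strict check failed: $1" >&2
        return 1
    fi
}
```

In a script, `validate_csv data.csv || exit 1` stops processing before any later step touches the file.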
Common Workflows
Inspecting an Unknown CSV File
When working with a CSV file of unknown format:
# First, detect the dialect
vroom dialect data.csv
# Then view the structure and sample data
vroom info data.csv
vroom schema data.csv
vroom pretty -n 5 data.csv
Extracting Specific Columns
Extract columns for further processing:
# By name (requires header row)
vroom select -c id,email,status data.csv > extracted.csv
# By index (works with or without header)
vroom select -c 0,3,5 data.csv > extracted.csv
Processing Large Files
For large files, vroom automatically uses multiple threads:
# Count rows quickly (uses optimized SIMD row counter)
vroom count large_file.csv
# View last rows without loading entire file into memory
vroom tail -n 20 large_file.csv
# Random sample for quick inspection
vroom sample -n 100 large_file.csv
# Schema inference with sampling for very large files
vroom schema -m 10000 huge_file.csv
Pipeline Integration
vroom works well in Unix pipelines:
# Filter and process
cat data.csv | vroom select -c name,email | grep "@company.com"
# Chain with other tools
vroom head -n 100 huge.csv | vroom pretty
# Extract stats in JSON for programmatic use
vroom stats -j data.csv | jq '.columns[] | select(.type == "integer")'
Working with Non-Standard Formats
For files with non-standard delimiters:
# Colon-separated (e.g., /etc/passwd format)
vroom count -d : passwords.txt
# Custom single-character delimiter
vroom head -d "^" caret_delimited.csv
Working with Non-UTF-8 Files
vroom auto-detects encoding and transcodes to UTF-8:
# Auto-detect encoding (default)
vroom head utf16_file.csv
# Force specific encoding
vroom head -e utf-16le windows_export.csv
Performance Tips
- Thread count: By default, vroom uses all available hardware threads. For large files (>1MB), this provides the best performance. Use -t 1 to force single-threaded operation if needed.
- Row counting: The count command uses an optimized SIMD algorithm that doesn’t build a full index, making it significantly faster than other commands for simply counting rows.
- Tail command: Uses a streaming approach with a circular buffer, so memory usage scales with output size rather than input file size.
- Auto-detection overhead: Auto-detection adds minimal overhead for the first few KB of data. If processing many files with known formats, specifying -d explicitly can provide a small performance improvement.
- stdin vs files: Reading from files allows memory mapping and parallel processing. When possible, prefer file arguments over stdin for large datasets.
- Schema/stats sampling: For very large files, use -m to sample a subset of rows for type inference, which can be much faster while still providing accurate results.
- Index caching: For files you access repeatedly, enable caching with --cache. The first parse creates a .vidx cache file; subsequent reads are 2-3x faster by loading the pre-computed index directly. See Index Caching for details.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid arguments, file not found, parse error, or dialect detection failure) |
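Because every failure maps to exit code 1, a batch script can only tell success from failure, not distinguish the cause. The sketch below counts how many files vroom could not process. The helper name count_failures is hypothetical. It uses the documented count command and assumes vroom is on your PATH.

```shell
# Sketch: count how many files vroom fails on, using only the exit code.
# count_failures is a hypothetical helper, not part of vroom.
count_failures() {
    failed=0
    for f in "$@"; do
        # Any non-zero status (bad arguments, missing file, parse error)
        # is treated the same way, since vroom reports them all as 1.
        if ! vroom count "$f" > /dev/null 2>&1; then
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    echo "$failed"
}
```

Calling `count_failures *.csv` prints the number of failing files on stdout and names each one on stderr.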
Dialect Detection Details
The dialect command analyzes the CSV structure and reports:
- Delimiter: The field separator character
- Quote: The character used for quoting fields (typically ")
- Escape: How quotes are escaped within fields (double-quote for "" or backslash for \")
- Line ending: LF (Unix), CRLF (Windows), CR (old Mac), or mixed
- Encoding: Detected character encoding (UTF-8, UTF-16, etc.)
- Has header: Whether the first row appears to be a header
- Columns: Number of columns detected
- Confidence: Detection confidence level (0-100%)
The command also outputs suggested CLI flags for use with other vroom commands.
JSON Output Format
When using the -j flag, the dialect command outputs machine-readable JSON:
{
"delimiter": ",",
"quote": "\"",
"escape": "double",
"line_ending": "LF",
"has_header": true,
"columns": 3,
"confidence": 1
}
This format is useful for scripting and automation. For example, you can extract the delimiter for use in other tools:
delimiter=$(vroom dialect -j data.csv | jq -r '.delimiter')