
Generate a random tibble. This is useful for benchmarking, and also for bug reports when you cannot share the real dataset.
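For the benchmarking use case, the sketch below generates a table, writes it to a temporary file with vroom_write(), and times reading it back with vroom(). It assumes the vroom package is installed. The row count, column types and temporary file are arbitrary choices for illustration.

```r
library(vroom)

set.seed(42)
# 10,000 rows: one character, one integer, one double and one logical column
tbl <- gen_tbl(10000, col_types = "cidl")

# Write the generated data to a throwaway tab-delimited file
path <- tempfile(fileext = ".tsv")
vroom_write(tbl, path, delim = "\t")

# Time reading it back with the same column types
timing <- system.time(
  back <- vroom(path, delim = "\t", col_types = "cidl")
)
print(timing)
```

The same file can be attached to a bug report in place of the real data.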

Usage

gen_tbl(
  rows,
  cols = NULL,
  col_types = NULL,
  locale = default_locale(),
  missing = 0
)

Arguments

rows

Number of rows to generate.

cols

Number of columns to generate. If NULL, this is derived from col_types.
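Because cols defaults to NULL, it can be omitted whenever col_types is supplied. A small sketch, assuming the vroom package is installed:

```r
library(vroom)

# cols is not given, so it is taken from the three-character col_types string
tbl <- gen_tbl(5, col_types = "dci")
dim(tbl)
#> [1] 5 3
```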

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be imputed from guess_max rows of the input, spread throughout the file. This is convenient (and fast), but not robust. If the imputation fails, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().
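The long form mirrors the last example on this page: a plain list() holding one collector per column, matched by position, so the list length sets the number of columns. A sketch assuming vroom is installed, with every collector left at its defaults:

```r
library(vroom)

# One collector per column, in column order
types <- list(col_integer(), col_character(), col_logical())
tbl <- gen_tbl(4, col_types = types)

# First class of each generated column
classes <- vapply(tbl, function(x) class(x)[[1]], character(1))
classes
```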

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing the column types readr guessed. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
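As a sketch of the compact form (assuming vroom is installed), each character of the string below selects the type of one generated column, in order. The skip and guess characters are left out because their effect in gen_tbl() is not described on this page.

```r
library(vroom)

# c = character, i = integer, d = double, l = logical, D = date, f = factor
tbl <- gen_tbl(3, col_types = "cidlDf")
classes <- vapply(tbl, function(x) class(x)[[1]], character(1))
classes
```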

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

missing

The proportion (from 0 to 1) of missing values to generate.
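A quick way to check the effect of missing, assuming vroom is installed: generate a larger table and compute the share of NA values in each column. Whether that share is exact or only approximate depends on how gen_tbl() chooses the missing positions, so the sketch treats it as approximate.

```r
library(vroom)

set.seed(1)
# Ask for about 10% missing values in each of two double columns
tbl <- gen_tbl(1000, col_types = "dd", missing = 0.1)

# Observed share of NA per column; expect values close to 0.1
props <- colMeans(is.na(tbl))
props
```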

Details

There is also a family of functions to generate individual vectors of each type.
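A sketch of calling a few of these generators directly, assuming vroom is installed. Each one takes the number of values first; the min, max and f arguments are our reading of the generator documentation (f, min and max mirror the col_double() example below) and should be checked against ?generators.

```r
library(vroom)

set.seed(7)
# Five integers between 1 and 6
ints <- gen_integer(5, min = 1L, max = 6L)

# Five doubles drawn with stats::runif; min and max are passed on to runif
dbls <- gen_double(5, f = stats::runif, min = 0, max = 1)

# Five logical values
lgls <- gen_logical(5)
```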

See also

generators to generate individual vectors.

Examples

# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
#> # A tibble: 10 × 5
#>    X1                      X2 X3         X4               X5        
#>    <dttm>               <dbl> <date>     <fct>            <date>    
#>  1 2009-01-31 06:24:40  0.758 2010-03-19 late_mule        2013-01-25
#>  2 2017-12-13 05:55:42  1.77  2019-12-08 wrong_rhinoceros 2007-02-11
#>  3 2014-11-22 19:29:02 -0.452 2009-11-29 careful_hyena    2010-05-25
#>  4 2019-09-20 16:08:25 -0.754 2008-04-01 grumpy_parakeet  2002-12-13
#>  5 2005-12-17 04:11:37  0.894 2011-02-22 cool_mustang     2013-05-30
#>  6 2007-09-10 00:47:47 -1.09  2005-04-17 cool_mustang     2005-09-19
#>  7 2001-11-27 00:05:08 -1.98  2005-03-12 tasteless_bear   2019-08-22
#>  8 2003-12-10 04:09:50  0.836 2005-12-10 worried_capybara 2005-08-19
#>  9 2005-05-23 21:23:23  0.306 2015-09-03 ugliest_bighorn  2001-07-22
#> 10 2017-11-08 12:30:21 -1.09  2016-08-19 wrong_rhinoceros 2011-12-27

# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
#> # A tibble: 25 × 4
#>         X1       X2      X3     X4
#>      <dbl>    <dbl>   <dbl>  <dbl>
#>  1  0.346  -0.284    2.08    0.548
#>  2  1.13    0.515    1.27    2.46 
#>  3  1.95   -0.303   -0.526   1.40 
#>  4 -1.19   -0.937    0.269  -0.497
#>  5  1.47   -0.123    0.230  -1.20 
#>  6 -0.870   0.655    1.59    0.196
#>  7 -0.0478  0.00685 -0.0684 -0.736
#>  8  0.133   0.328   -0.781  -0.546
#>  9  0.660  -0.0750  -0.0439 -0.715
#> 10  1.77   -0.738    0.0904 -0.165
#> # … with 15 more rows

# Use the dots in long form column types to change the random function and options
types <- rep(times = 4, list(col_double(f = stats::runif, min = -10, max = 25)))
types
#> [[1]]
#> <collector_double>
#> 
#> [[2]]
#> <collector_double>
#> 
#> [[3]]
#> <collector_double>
#> 
#> [[4]]
#> <collector_double>
#> 
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
#> # A tibble: 25 × 4
#>       X1      X2    X3    X4
#>    <dbl>   <dbl> <dbl> <dbl>
#>  1  2.95 -0.0731 -4.24  3.74
#>  2 -1.57  2.44   14.1  -8.46
#>  3 23.4  -7.91   -2.45 -5.99
#>  4 22.8  22.9    -4.50 -3.82
#>  5 19.5   5.42    5.67 -7.27
#>  6 23.8  -8.82   -7.68 -3.05
#>  7  2.57 -5.99   16.7  14.8 
#>  8  7.91 21.1    -5.34 10.7 
#>  9 21.0   8.98   15.4  17.9 
#> 10 18.3  -6.05   -9.38 -7.76
#> # … with 15 more rows
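The same dots mechanism should carry over to other collectors. As a sketch (assuming col_integer() forwards min and max to the integer generator the way col_double() forwards f, min and max above), this builds a small table of simulated dice rolls:

```r
library(vroom)

set.seed(3)
# Two integer columns, each restricted to the values 1 to 6
dice_types <- rep(times = 2, list(col_integer(min = 1L, max = 6L)))
dice <- gen_tbl(20, col_types = dice_types)

# Frequency of each face across both columns
table(unlist(dice))
```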