This is useful for benchmarking, but also for bug reports when you cannot share the real dataset.
Usage
gen_tbl(
  rows,
  cols = NULL,
  col_types = NULL,
  locale = default_locale(),
  missing = 0
)
Arguments
- rows
Number of rows to generate
- cols
Number of columns to generate. If NULL, this is derived from col_types.
- col_types
One of NULL, a cols() specification, or a string. See vignette("readr") for more details.
If NULL, all column types will be imputed from guess_max rows on the input interspersed throughout the file. This is convenient (and fast), but not robust. If the imputation fails, you'll need to increase the guess_max or supply the correct types yourself.
Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().
Alternatively, you can use a compact string representation where each character represents one column:
c = character
i = integer
n = number
d = double
l = logical
f = factor
D = date
T = date time
t = time
? = guess
_ or - = skip
By default, reading a file without a column specification will print a message showing the column types that readr guessed. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
- locale
The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
- missing
The proportion (from 0 to 1) of values to generate as missing.
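The compact string form of col_types and the missing argument can be combined. The following is a minimal sketch, assuming that gen_tbl() is the function exported by the vroom package (this page does not name the package); the row count, the "icd" string, and the 0.2 missing rate are arbitrary illustrative choices.

```r
# Assumption: gen_tbl() comes from the vroom package.
library(vroom)

# Three columns described by a compact string: integer, character, double.
# cols is left as NULL, so the column count is derived from col_types.
tbl <- gen_tbl(100, col_types = "icd")

# Same layout, asking for roughly 20% of the values to be NA.
tbl_na <- gen_tbl(100, col_types = "icd", missing = 0.2)

# Observed share of NA values per column. The values are placed at random,
# so expect numbers near 0.2 rather than exactly 0.2.
na_share <- colMeans(is.na(tbl_na))
na_share
```

Because the data are random, the observed NA share differs from run to run; checking it per column is a quick way to confirm that missing was applied.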
See also
generators to generate individual vectors.
Examples
# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
#> # A tibble: 10 × 5
#>    X1                      X2 X3         X4               X5
#>    <dttm>               <dbl> <date>     <fct>            <date>
#>  1 2009-01-31 06:24:40  0.758 2010-03-19 late_mule        2013-01-25
#>  2 2017-12-13 05:55:42  1.77  2019-12-08 wrong_rhinoceros 2007-02-11
#>  3 2014-11-22 19:29:02 -0.452 2009-11-29 careful_hyena    2010-05-25
#>  4 2019-09-20 16:08:25 -0.754 2008-04-01 grumpy_parakeet  2002-12-13
#>  5 2005-12-17 04:11:37  0.894 2011-02-22 cool_mustang     2013-05-30
#>  6 2007-09-10 00:47:47 -1.09  2005-04-17 cool_mustang     2005-09-19
#>  7 2001-11-27 00:05:08 -1.98  2005-03-12 tasteless_bear   2019-08-22
#>  8 2003-12-10 04:09:50  0.836 2005-12-10 worried_capybara 2005-08-19
#>  9 2005-05-23 21:23:23  0.306 2015-09-03 ugliest_bighorn  2001-07-22
#> 10 2017-11-08 12:30:21 -1.09  2016-08-19 wrong_rhinoceros 2011-12-27
# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
#> # A tibble: 25 × 4
#>         X1       X2      X3     X4
#>      <dbl>    <dbl>   <dbl>  <dbl>
#>  1  0.346  -0.284    2.08    0.548
#>  2  1.13    0.515    1.27    2.46
#>  3  1.95   -0.303   -0.526   1.40
#>  4 -1.19   -0.937    0.269  -0.497
#>  5  1.47   -0.123    0.230  -1.20
#>  6 -0.870   0.655    1.59    0.196
#>  7 -0.0478  0.00685 -0.0684 -0.736
#>  8  0.133   0.328   -0.781  -0.546
#>  9  0.660  -0.0750  -0.0439 -0.715
#> 10  1.77   -0.738    0.0904 -0.165
#> # … with 15 more rows
# Use the dots in long form column types to change the random function and options
types <- rep(times = 4, list(col_double(f = stats::runif, min = -10, max = 25)))
types
#> [[1]]
#> <collector_double>
#>
#> [[2]]
#> <collector_double>
#>
#> [[3]]
#> <collector_double>
#>
#> [[4]]
#> <collector_double>
#>
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
#> # A tibble: 25 × 4
#>       X1      X2    X3    X4
#>    <dbl>   <dbl> <dbl> <dbl>
#>  1  2.95 -0.0731 -4.24  3.74
#>  2 -1.57  2.44   14.1  -8.46
#>  3 23.4  -7.91   -2.45 -5.99
#>  4 22.8  22.9    -4.50 -3.82
#>  5 19.5   5.42    5.67 -7.27
#>  6 23.8  -8.82   -7.68 -3.05
#>  7  2.57 -5.99   16.7  14.8
#>  8  7.91 21.1    -5.34 10.7
#>  9 21.0   8.98   15.4  17.9
#> 10 18.3  -6.05   -9.38 -7.76
#> # … with 15 more rows
