Bootstrap a dfm: bootstrap_dfm

Create a list of resampled dfms.

bootstrap_dfm(x, n = 10, ..., verbose = quanteda_options("verbose"))

Arguments

x: a dfm object
n: number of resamples
...: additional arguments passed to dfm()
verbose: if TRUE, print status messages

Value

A named list of n + 1 dfm objects, with class "dfm_bootstrap". The first element, dfm_0, is the dfm from the original texts; the subsequent elements, dfm_1 through dfm_n, are the sentence-resampled dfms.
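Because every replicate has the same structure as dfm_0, a natural next step is to measure how much each count varies across the replicates. The sketch below is a hypothetical base R helper, `replicate_sd()`, which is not part of quanteda. It assumes that every replicate has the same documents and features in the same order, and it uses made-up toy matrices in place of real dfm objects. For real output, `as.matrix()` converts each dfm to a dense matrix.

```r
# Hypothetical helper (not part of quanteda): per-cell standard deviation of
# counts across bootstrap replicates. Assumes `boot` is a named list shaped like
# the return value (dfm_0 first, then dfm_1 ... dfm_n) and that every replicate
# has the same documents and features in the same order.
replicate_sd <- function(boot) {
  reps <- lapply(boot[-1], as.matrix)  # drop dfm_0; keep only the replicates
  arr <- simplify2array(reps)          # documents x features x replicates
  apply(arr, c(1, 2), sd)              # sd of each cell over the replicates
}

# Made-up toy counts standing in for real dfm objects
dims <- list(c("textone", "texttwo"), c("sentence", "phrase"))
toy <- list(
  dfm_0 = matrix(c(2, 0, 0, 2), nrow = 2, dimnames = dims),
  dfm_1 = matrix(c(1, 0, 0, 2), nrow = 2, dimnames = dims),
  dfm_2 = matrix(c(3, 0, 0, 1), nrow = 2, dimnames = dims)
)
sds <- replicate_sd(toy)
sds["textone", "sentence"]  # sd of c(1, 3), i.e. sqrt(2)
```

dfm_0 is excluded because it is the original data rather than a draw. With only one replicate, `sd()` returns NA.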

Details

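bootstrap_dfm() produces multiple resampled dfm objects. It resamples sentences (with replacement) from each document, recombines them into new "documents", and computes a dfm for each. Resampling is done strictly within each document, so every resampled document contains only sentences drawn from its own original document. A dfm records no sentence boundaries, so the units that get resampled are its rows. To resample sentences, the dfm therefore needs one row per sentence, for example a dfm built from a corpus reshaped with corpus_reshape(to = "sentences"). When each row is already a whole text, as in the example below, every resampled dfm is identical to dfm_0.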
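The within-document resampling scheme can be sketched in base R without quanteda. Everything here is illustrative. The helpers `split_sentences()`, `tokenize()`, `count_matrix()`, and `bootstrap_counts()` are hypothetical. The regex sentence splitter and the tokenizer, which keeps runs of lowercase letters plus punctuation, are simplifying assumptions. Plain count matrices stand in for dfm objects. The returned list mimics the documented shape: dfm_0 is built from the original texts and is followed by n resampled replicates.

```r
# Hypothetical sketch of within-document sentence resampling (not quanteda code).
# Assumes sentences end in ".", "!" or "?" followed by whitespace.
split_sentences <- function(text) {
  strsplit(trimws(text), "(?<=[.!?])\\s+", perl = TRUE)[[1]]
}

# Simplified tokenizer: lowercase letter runs and sentence punctuation
tokenize <- function(text) {
  lower <- tolower(text)
  regmatches(lower, gregexpr("[a-z]+|[.!?]", lower))[[1]]
}

# Document-by-feature count matrix over a fixed vocabulary
count_matrix <- function(docs, vocab) {
  counts <- t(vapply(docs, function(d) {
    as.numeric(table(factor(tokenize(d), levels = vocab)))
  }, numeric(length(vocab))))
  dimnames(counts) <- list(names(docs), vocab)
  counts
}

bootstrap_counts <- function(texts, n = 10) {
  sentences <- lapply(texts, split_sentences)
  # Resampled documents reuse original sentences, so the vocabulary is fixed
  vocab <- unique(unlist(lapply(texts, tokenize)))
  # One replicate: resample each document's own sentences with replacement,
  # keeping its sentence count and never mixing sentences across documents
  draw <- function() {
    vapply(sentences, function(s) {
      paste(s[sample.int(length(s), length(s), replace = TRUE)], collapse = " ")
    }, character(1))
  }
  result <- c(list(count_matrix(texts, vocab)),
              lapply(seq_len(n), function(i) count_matrix(draw(), vocab)))
  names(result) <- paste0("dfm_", seq(0, n))
  result
}

set.seed(10)
txt <- c(textone = "This is a sentence.  Another sentence.  Yet another.",
         texttwo = "Premiere phrase.  Deuxieme phrase.")
boot <- bootstrap_counts(txt, n = 3)
boot$dfm_0  # same counts as dfm_0 in the Examples output
```

Each document keeps its original number of sentences and draws only from its own sentences. Every replicate therefore has the same features as dfm_0. In this sketch, textone never gains "phrase", and the "." count in each row stays at 3 and 2.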

Author

Kenneth Benoit

Examples

# bootstrapping a dfm built from the original texts (one row per text),
# so every resampled dfm is identical to dfm_0
set.seed(10)
txt <- c(textone = "This is a sentence.  Another sentence.  Yet another.",
         texttwo = "Premiere phrase.  Deuxieme phrase.")
dfmat <- dfm(tokens(txt))
bootstrap_dfm(dfmat, n = 3, verbose = TRUE)
#> Bootstrapping dfm to create multiple dfm objects...
#>    ...resampling and forming dfms: 0, 1, 2, 3
#>    ...complete.
#> $dfm_0
#> Document-feature matrix of: 2 documents, 10 features (45.00% sparse) and 0 docvars.
#>          features
#> docs      this is a sentence . another yet premiere phrase deuxieme
#>   textone    1  1 1        2 3       2   1        0      0        0
#>   texttwo    0  0 0        0 2       0   0        1      2        1
#> 
#> $dfm_1
#> Document-feature matrix of: 2 documents, 10 features (45.00% sparse) and 0 docvars.
#>          features
#> docs      this is a sentence . another yet premiere phrase deuxieme
#>   textone    1  1 1        2 3       2   1        0      0        0
#>   texttwo    0  0 0        0 2       0   0        1      2        1
#> 
#> $dfm_2
#> Document-feature matrix of: 2 documents, 10 features (45.00% sparse) and 0 docvars.
#>          features
#> docs      this is a sentence . another yet premiere phrase deuxieme
#>   textone    1  1 1        2 3       2   1        0      0        0
#>   texttwo    0  0 0        0 2       0   0        1      2        1
#> 
#> $dfm_3
#> Document-feature matrix of: 2 documents, 10 features (45.00% sparse) and 0 docvars.
#>          features
#> docs      this is a sentence . another yet premiere phrase deuxieme
#>   textone    1  1 1        2 3       2   1        0      0        0
#>   texttwo    0  0 0        0 2       0   0        1      2        1
#> 
#> attr(,"class")
#> [1] "dfm_bootstrap"