Locates a pattern within a tokens object, returning the index positions of the beginning and ending tokens in the pattern.
index(
x,
pattern,
valuetype = c("glob", "regex", "fixed"),
case_insensitive = TRUE
)
is.index(x)
x: an input tokens object
pattern: a character vector, list of character vectors, dictionary, or collocations object. See pattern for details.
valuetype: the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.
case_insensitive: logical; if TRUE, ignore case when matching a pattern or dictionary values
a data.frame consisting of one row per pattern match, with columns for the document name, index positions from and to, and the pattern matched.
is.index returns TRUE if the object was created by index(); FALSE otherwise.
toks <- tokens(data_corpus_inaugural[1:8])
index(toks, pattern = "secure*")
#>          docname from   to pattern
#> 1     1797-Adams  478  478 secure*
#> 2     1797-Adams 1512 1512 secure*
#> 3 1805-Jefferson 2367 2367 secure*
#> 4    1817-Monroe 1754 1754 secure*
#> 5    1817-Monroe 1814 1814 secure*
#> 6    1817-Monroe 3009 3009 secure*
index(toks, pattern = c("secure*", phrase("united states"))) |> head()
#>           docname from   to       pattern
#> 1 1789-Washington  433  434 united states
#> 2 1789-Washington  529  530 united states
#> 3      1797-Adams  478  478       secure*
#> 4      1797-Adams  524  525 united states
#> 5      1797-Adams 1512 1512       secure*
#> 6      1797-Adams 1716 1717 united states
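The examples above use only the default "glob" matching. The following is a minimal sketch of the remaining arguments (valuetype, case_insensitive) and of is.index(), assuming the quanteda package is installed and attached. The object toks_small and its two toy documents are invented for illustration and are not part of the package data.

```r
library(quanteda)

# Hypothetical two-document tokens object, so the sketch does not
# depend on data_corpus_inaugural
toks_small <- tokens(c(d1 = "Secure the United States",
                       d2 = "a secure state"))

# Regular-expression matching; case is ignored by default
idx_regex <- index(toks_small, pattern = "^secure", valuetype = "regex")

# Exact matching with case sensitivity switched on
idx_fixed <- index(toks_small, pattern = "secure",
                   valuetype = "fixed", case_insensitive = FALSE)

# A multi-word pattern must be wrapped in phrase()
idx_phrase <- index(toks_small, pattern = phrase("united states"))

# is.index() distinguishes index() results from other objects
is.index(idx_regex)
is.index(toks_small)
```

For a multi-word match, the from and to columns differ because the match spans more than one token; for single-token patterns they are equal, as in the output above.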