Package-level

quanteda-package quanteda

An R package for the quantitative analysis of textual data

quanteda_options()

Get or set package options for quanteda

Data

Built-in data objects.

data_char_sampletext

A paragraph of text for testing various text-based functions

data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos

data_corpus_inaugural

US presidential inaugural address texts

data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)

data-relocated data_corpus_dailnoconf1991 data_corpus_irishbudget2010

Formerly included data objects

Corpus functions

Functions for constructing and manipulating corpus class objects.

corpus()

Construct a corpus object

corpus_group()

Combine documents in corpus by a grouping variable

corpus_reshape()

Recast the document units of a corpus

corpus_sample()

Randomly sample documents from a corpus

corpus_segment() char_segment()

Segment texts on a pattern match

corpus_subset()

Extract a subset of a corpus

corpus_trim() char_trim()

Remove sentences based on their token lengths or a pattern match

docvars() `docvars<-`() `$`(<corpus>) `$<-`(<corpus>) `$`(<tokens>) `$<-`(<tokens>) `$`(<dfm>) `$<-`(<dfm>)

Get or set document-level variables

as.character(<corpus>) is.corpus() as.corpus()

Coercion and checking methods for corpus objects
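
As a rough illustration of how the corpus functions above fit together, the following is a minimal sketch, assuming quanteda is installed. The toy texts, the document names `d1`/`d2`, and the `group` variable are made up for this example and do not come from the package.

```r
library(quanteda)

# Two toy documents (hypothetical), plus one document-level variable.
txt <- c(d1 = "The quick brown fox. It jumps over the lazy dog.",
         d2 = "Text analysis is fun.")
corp <- corpus(txt, docvars = data.frame(group = c("a", "b")))

# Read a document-level variable back out.
grp <- docvars(corp, "group")

# Keep only the documents whose `group` equals "a".
corp_a <- corpus_subset(corp, group == "a")

# Recast the document units so that each sentence becomes a document.
corp_sent <- corpus_reshape(corp, to = "sentences")

# Coerce back to a plain character vector.
chr <- as.character(corp)
```

After reshaping, `docnames(corp_sent)` can be used to inspect how the new sentence-level documents are named.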

Tokens functions

Functions for constructing and manipulating tokens class objects.

tokens()

Construct a tokens object

tokens_chunk()

Segment tokens object by chunks of a given size

tokens_compound()

Convert token sequences into compound tokens

tokens_group()

Combine documents in a tokens object by a grouping variable

tokens_lookup()

Apply a dictionary to a tokens object

tokens_ngrams() char_ngrams() tokens_skipgrams()

Create n-grams and skip-grams from tokens

tokens_replace()

Replace tokens in a tokens object

tokens_sample()

Randomly sample documents from a tokens object

tokens_select() tokens_remove() tokens_keep()

Select or remove tokens from a tokens object

tokens_split()

Split tokens by a separator pattern

tokens_subset()

Extract a subset of a tokens object

tokens_tolower() tokens_toupper()

Convert the case of tokens

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

is.tokens_xptr() as.tokens_xptr()

Methods for tokens_xptr objects

types()

Get word types from a tokens object

concat() concatenator()

Return the concatenator character from an object

as.list(<tokens>) as.character(<tokens>) is.tokens() as.tokens()

Coercion, checking, and combining functions for tokens objects
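
A typical tokens pipeline chains several of the functions listed above. The sketch below is one possible ordering, not a recommended recipe; the sentence and the short removal list are invented for illustration.

```r
library(quanteda)

# Tokenise one toy sentence (hypothetical), dropping punctuation.
toks <- tokens("The New York Times reported it.", remove_punct = TRUE)

# Lower-case, then join the multi-word expression into a single token.
toks <- tokens_tolower(toks)
toks <- tokens_compound(toks, pattern = phrase("new york times"))

# Remove a small hand-picked list of function words.
toks <- tokens_remove(toks, pattern = c("the", "it"))

as.character(toks)
# "new_york_times" "reported"

# Bigrams built from the remaining tokens.
toks_bi <- tokens_ngrams(toks, n = 2)
```

Wrapping the pattern in `phrase()` is what tells `tokens_compound()` to treat the whitespace-separated words as a sequence rather than as one literal token.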

Character functions

Functions for constructing and manipulating character objects.

char_tolower() char_toupper()

Convert the case of character objects

corpus_segment() char_segment()

Segment texts on a pattern match

tokens_ngrams() char_ngrams() tokens_skipgrams()

Create n-grams and skip-grams from tokens

char_select() char_remove() char_keep()

Select or remove elements from a character vector

corpus_trim() char_trim()

Remove sentences based on their token lengths or a pattern match

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

Text matrix functions

Functions for constructing and manipulating a document-feature matrix (dfm) or feature co-occurrence matrix object.

dfm()

Create a document-feature matrix

dfm_compress() fcm_compress()

Recombine a dfm or fcm by combining identical dimension elements

dfm_group()

Combine documents in a dfm by a grouping variable

dfm_lookup()

Apply a dictionary to a dfm

dfm_match()

Match the feature set of a dfm to given feature names

dfm_replace()

Replace features in a dfm

dfm_sample()

Randomly sample documents from a dfm

dfm_select() dfm_remove() dfm_keep() fcm_select() fcm_remove() fcm_keep()

Select features from a dfm or fcm

dfm_sort()

Sort a dfm by frequency of one or more margins

dfm_subset()

Extract a subset of a dfm

dfm_tfidf()

Weight a dfm by tf-idf

dfm_tolower() dfm_toupper() fcm_tolower() fcm_toupper()

Convert the case of the features of a dfm or fcm and recombine the counts

dfm_trim()

Trim a dfm using frequency threshold-based feature selection

dfm_weight() dfm_smooth()

Weight the feature frequencies in a dfm

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

docfreq()

Compute the (weighted) document frequency of a feature

featfreq()

Compute the frequencies of features

head(<dfm>) tail(<dfm>)

Return the first or last part of a dfm

as.dfm() is.dfm()

Coercion and checking functions for dfm objects

as.matrix(<dfm>)

Coerce a dfm to a matrix or data.frame

fcm()

Create a feature co-occurrence matrix

fcm_sort()

Sort an fcm in alphabetical order of the features

as.fcm()

Coercion and checking functions for fcm objects
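
To show how a dfm is built and then trimmed or weighted, here is a minimal sketch using three invented documents. The texts and the threshold of 3 are assumptions chosen so the effect of each step is easy to see.

```r
library(quanteda)

# Three toy documents (hypothetical).
txt <- c(d1 = "apple banana apple",
         d2 = "banana cherry",
         d3 = "apple cherry cherry")
toks <- tokens(txt)

# Document-feature matrix: 3 documents by 3 features.
dfmat <- dfm(toks)

# Keep only features occurring at least 3 times across all documents
# (apple = 3, banana = 2, cherry = 3, so banana is dropped).
dfmat_trim <- dfm_trim(dfmat, min_termfreq = 3)

# Convert counts to within-document proportions.
dfmat_prop <- dfm_weight(dfmat, scheme = "prop")

# tf-idf weighting with the default schemes.
dfmat_tfidf <- dfm_tfidf(dfmat)

# Feature co-occurrence matrix, counting co-occurrence within documents.
fcmat <- fcm(toks, context = "document")
```

Calling `as.matrix(dfmat)` gives a dense base R matrix, which is convenient for small examples like this one but can be memory-hungry for real corpora.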

Dictionary functions

Constructor and utility functions for working with dictionaries.

dictionary()

Create a dictionary

as.dictionary() is.dictionary()

Coercion and checking functions for dictionary objects

as.yaml()

Convert quanteda dictionary objects to the YAML format
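
The dictionary functions are usually combined with `tokens_lookup()` or `dfm_lookup()`. The sketch below uses a made-up two-key dictionary; the keys `fruit` and `veg` and their values are purely illustrative.

```r
library(quanteda)

# A small hypothetical dictionary; "banana*" is a glob pattern.
dict <- dictionary(list(fruit = c("apple", "banana*"),
                        veg   = c("carrot")))

toks <- tokens("Apple pie and banana bread with carrots")

# Replace matching tokens with their dictionary keys.
# "carrots" does not match the exact value "carrot", so only two tokens match.
toks_dict <- tokens_lookup(toks, dictionary = dict)

# The same lookup applied at the dfm level returns one column per key.
dfmat_dict <- dfm_lookup(dfm(toks), dictionary = dict)
```

Matching is case-insensitive by default, which is why "Apple" is counted under `fruit` here.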

Phrase discovery functions

Functions for exploring and detecting keywords and phrases.

is.collocations()

Check whether an object is a collocations object

kwic() is.kwic() as.data.frame(<kwic>)

Locate keywords-in-context
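
A keyword-in-context search can be sketched as follows. The sentence is invented, and the window of 2 tokens is an arbitrary choice for the example.

```r
library(quanteda)

toks <- tokens("The cat sat on the mat. The dog sat too.")

# Find every occurrence of "sat" with two tokens of context on each side.
kw <- kwic(toks, pattern = "sat", window = 2)

# Convert to a plain data.frame for further processing.
kw_df <- as.data.frame(kw)
```

Each row of `kw_df` corresponds to one match, with the surrounding context stored alongside the matched keyword.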

Utility functions

R-like functions to return counts and object information.

index() is.index()

Locate a pattern in a tokens object

ndoc() nfeat()

Count the number of documents or features

nsentence()

Count the number of sentences

ntoken() ntype()

Count the number of tokens or types

print(<corpus>) print(<dfm>) print(<dictionary2>) print(<fcm>) print(<kwic>) print(<tokens>)

Print methods for quanteda core objects

docnames() `docnames<-`() docid() segid()

Get or set document names

featnames()

Get the feature labels from a dfm
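
The counting helpers above work across object classes. A minimal sketch on a toy tokens object (the texts and document names are hypothetical):

```r
library(quanteda)

toks <- tokens(c(d1 = "a b a c", d2 = "b b"))

n_docs  <- ndoc(toks)      # number of documents
n_tok   <- ntoken(toks)    # tokens per document: d1 = 4, d2 = 2
n_typ   <- ntype(toks)     # unique types per document: d1 = 3, d2 = 1
dnames  <- docnames(toks)  # "d1" "d2"

# Feature labels are available once the tokens are turned into a dfm.
feats <- featnames(dfm(toks))
```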

Miscellaneous functions

phrase() as.phrase() is.phrase()

Declare a pattern to be a sequence of separate patterns

convert()

Convert quanteda objects to non-quanteda formats

bootstrap_dfm()

Bootstrap a dfm

meta() `meta<-`()

Get or set object metadata

spacyr-methods

Extensions for and from spacy_parse objects

Statistics, models, and plots

Functions for computing statistics, fitting models, and producing visualisations from text.

sparsity()

Compute the sparsity of a document-feature matrix

topfeatures()

Identify the most frequent features in a dfm

textmodels

Models for scaling and classification of textual data

textplots

Plots for textual data

textstats

Statistics for textual data