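library(TextAnalysisR)
# Tokenize the first five abstracts in the SpecialEduTech dataset
tokens <- quanteda::tokens(SpecialEduTech$abstract[1:5])
# Locate each occurrence of the target terms within the documents
dispersion <- calculate_lexical_dispersion(
  tokens,
  terms = c("learning", "instruction")
)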
head(dispersion)
##   doc_id        term  position doc_length
## 1  text2    learning 0.5200000         25
## 2  text3    learning 0.3714286         35
## 3  text3 instruction 0.3142857         35
## 4  text3 instruction 0.6857143         35
## 5  text5    learning 0.7904762        105
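To make the position column easier to read, here is a minimal base R sketch of one way such values could be computed. It assumes that position is the token index divided by the document length, which is consistent with the output above (for example, 0.52 × 25 = 13), but it is not confirmed to be how calculate_lexical_dispersion() works internally. The helper relative_positions() and the toy sentence are hypothetical and not part of TextAnalysisR.

```r
# Hypothetical helper (not part of TextAnalysisR): relative positions of
# target terms within a single tokenized document.
relative_positions <- function(doc_tokens, terms) {
  doc_tokens <- tolower(doc_tokens)      # case-insensitive matching
  n <- length(doc_tokens)                # document length in tokens
  hits <- which(doc_tokens %in% terms)   # 1-based indices of matching tokens
  data.frame(
    term = doc_tokens[hits],
    position = hits / n,                 # assumed: token index / document length
    doc_length = rep(n, length(hits)),
    stringsAsFactors = FALSE
  )
}

# Toy document of 10 tokens; the target terms sit at indices 3, 6, and 9
toy <- c("Students", "improve", "learning", "through", "explicit",
         "instruction", "and", "guided", "learning", "tasks")
relative_positions(toy, c("learning", "instruction"))
```

Each row corresponds to one occurrence, so a term that appears twice in a document yields two rows, as text3 does for "instruction" in the output above.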
A Python environment enables additional features: NLP with spaCy, embeddings, and neural sentiment analysis.
setup_python_env() automatically sets up the textanalysisr-env environment and installs the spaCy model (en_core_web_sm).
It uses virtualenv (or conda if available).
Run check_python_env() to verify the environment.
To set the preferred environment, add
Sys.setenv(RETICULATE_PYTHON_ENV = "textanalysisr-env")
to .Rprofile, then restart R.
Avoid OneDrive paths. Use
setup_python_env(envname = "textanalysisr-env").
The default en_core_web_sm model is installed
automatically. For word vectors (similarity), the medium model is 91 MB
and the large model is 560 MB:
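One possible way to fetch a larger model is spaCy's own command-line downloader (python -m spacy download <model>), run with the Python that reticulate has bound to the session. This is a sketch under that assumption, not necessarily the package's own installation route. The helper spacy_download_args() is hypothetical, and en_core_web_md and en_core_web_lg are spaCy's names for the medium and large English models.

```r
# Hypothetical helper (not part of TextAnalysisR): build the arguments for
# spaCy's command-line downloader, `python -m spacy download <model>`.
spacy_download_args <- function(model = c("en_core_web_md", "en_core_web_lg")) {
  model <- match.arg(model)   # defaults to the medium model
  c("-m", "spacy", "download", model)
}

# Run the download only in an interactive session where reticulate can find
# Python, so that sourcing this file never downloads anything by accident.
if (interactive() &&
    requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  python <- reticulate::py_config()$python   # path to the active interpreter
  system2(python, spacy_download_args("en_core_web_md"))
}
```

Pass "en_core_web_lg" instead to fetch the large model.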
For embeddings and neural sentiment:
Use reticulate::py_config() and
reticulate::virtualenv_list() to inspect the active
Python.
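These checks can be gathered into one small helper. The sketch below assumes the environment name textanalysisr-env used earlier; describe_python() is a hypothetical name, not a TextAnalysisR function. It reports only what reticulate can see without initializing Python.

```r
# Hypothetical helper (not part of TextAnalysisR): summarize the Python setup
# as reticulate sees it, without starting a Python session.
describe_python <- function(envname = "textanalysisr-env") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(list(reticulate_installed = FALSE))
  }
  list(
    reticulate_installed = TRUE,
    # Value set via Sys.setenv() in .Rprofile, or NA if it was never set
    requested_env = Sys.getenv("RETICULATE_PYTHON_ENV", unset = NA),
    # TRUE if a virtualenv with this name exists in reticulate's default location
    env_exists = envname %in% reticulate::virtualenv_list()
  )
}

str(describe_python())

# For full interpreter details (this initializes Python), run interactively:
# reticulate::py_config()
```

If env_exists is FALSE while conda is in use, the environment may live under conda rather than virtualenv, since reticulate::virtualenv_list() lists only virtualenv environments.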