---
title: "Python Environment"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Python Environment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

```{r}
library(TextAnalysisR)

# Tokenize the first five abstracts and compute dispersion for two terms
tokens <- quanteda::tokens(SpecialEduTech$abstract[1:5])
dispersion <- calculate_lexical_dispersion(
  tokens,
  terms = c("learning", "instruction")
)
head(dispersion)
```

Python enables several features: NLP with spaCy, embeddings, and neural sentiment analysis.

## Quick Setup

`setup_python_env()` automatically:

1. Creates the virtual environment `textanalysisr-env`
2. Installs **spacy** and **pdfplumber**
3. Downloads the spaCy English model (`en_core_web_sm`)

It uses virtualenv (or conda if available).

## Check Status

Run `check_python_env()` to verify the environment.

## Common Issues

### "Another Python already initialized"

Set the preferred environment in `.Rprofile` with `Sys.setenv(RETICULATE_PYTHON_ENV = "textanalysisr-env")`, then restart R.

### Environment in OneDrive

Avoid OneDrive paths. Use `setup_python_env(envname = "textanalysisr-env")`.

## spaCy Models

The default `en_core_web_sm` model is installed automatically. For word vectors (similarity), install the medium model (91 MB) or the large model (560 MB):

```bash
python -m spacy download en_core_web_md
python -m spacy download en_core_web_lg
```

## Deep Learning (Optional)

For embeddings and neural sentiment analysis, install the following packages:

```bash
pip install sentence-transformers transformers torch
```

## Diagnostics

Use `reticulate::py_config()` and `reticulate::virtualenv_list()` to inspect the active Python.
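
The diagnostic checks can also be collected into a single report. The sketch below is a minimal example, not part of TextAnalysisR: `python_env_report()` is a hypothetical helper name, and it assumes the environment was created as a virtualenv (a conda environment would not be found by `reticulate::virtualenv_exists()`). By default it avoids initializing Python, so it should not trigger the "Another Python already initialized" situation on its own.

```r
# Hypothetical helper (not provided by TextAnalysisR): summarize the Python setup.
python_env_report <- function(envname = "textanalysisr-env",
                              modules = c("spacy", "pdfplumber"),
                              initialize = FALSE) {
  # Without reticulate there is nothing to inspect.
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(list(reticulate_installed = FALSE))
  }

  report <- list(
    reticulate_installed = TRUE,
    # Assumes a virtualenv; conda environments are not covered by this check.
    env_exists = reticulate::virtualenv_exists(envname),
    # TRUE only if Python has already been initialized in this R session.
    python_initialized = reticulate::py_available(initialize = FALSE)
  )

  # Checking modules initializes Python, so do it only when Python is
  # already running or the caller explicitly asks for it.
  if (initialize || report$python_initialized) {
    report$modules <- vapply(
      modules,
      reticulate::py_module_available,
      logical(1)
    )
  }

  report
}

str(python_env_report())
```

Calling `python_env_report(initialize = TRUE)` additionally reports whether **spacy** and **pdfplumber** can be imported, at the cost of binding the R session to whichever Python reticulate selects.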