---
title: "AI Integration"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{AI Integration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

```{r}
library(TextAnalysisR)
packageVersion("TextAnalysisR")

mydata <- SpecialEduTech[seq_len(20), c("title", "abstract")]
united <- unite_cols(mydata, listed_vars = c("title", "abstract"))
toks <- prep_texts(united, text_field = "united_texts")
dfm <- quanteda::dfm(toks)
extract_keywords_tfidf(dfm, top_n = 10)
```

TextAnalysisR provides AI features via cloud-based providers. On the hosted web app, Gemini usage is free, supported by the Google Cloud Research program. OpenAI calls require a personal API key.

## Providers

| Provider | Type | API Key | Best For |
|----------|------|---------|----------|
| OpenAI | Web-based | OPENAI_API_KEY | Quality, speed |
| [Gemini](https://ai.google.dev/) | Web-based | None on hosted app; otherwise GEMINI_API_KEY | Quality, speed |
| [spaCy](https://spacy.io/) | Local | None | Linguistic analysis |
| Transformers | Local | None | Embeddings, sentiment |

## Setup

Set keys via `.Renviron` (persistent) or `Sys.setenv()` (session).

The cloud chat, embedding, and RAG functions (`call_llm_api()`, `call_openai_chat()`, `call_gemini_chat()`, `get_api_embeddings()`, `run_rag_search()`) require an API key and network access; see their reference pages for usage.
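
The two ways of setting keys described above can be sketched as follows. This is a minimal illustration only: `"your-key-here"` is a placeholder value, and `has_api_key()` is a hypothetical helper written for this example, not a TextAnalysisR function.

```r
# Session-only: the key is available until the R session ends.
# "your-key-here" is a placeholder, not a real key.
Sys.setenv(OPENAI_API_KEY = "your-key-here")

# Persistent alternative: add a line such as
#   OPENAI_API_KEY=your-key-here
# to ~/.Renviron, then restart R so the file is re-read.

# Hypothetical helper (not part of TextAnalysisR): TRUE if the
# named environment variable is set to a non-empty string.
has_api_key <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

has_api_key("OPENAI_API_KEY")
has_api_key("GEMINI_API_KEY")
```

Checking for a key before calling a cloud function makes a missing key easier to spot than a failed network request. Keeping keys in `.Renviron` rather than in scripts also avoids committing them to version control by accident.
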

## Default Models

| Provider | Chat Model | Embedding Model |
|----------|------------|-----------------|
| OpenAI | gpt-4.1-mini | text-embedding-3-small |
| Gemini | gemini-2.5-flash | gemini-embedding-001 |
| Local | - | all-MiniLM-L6-v2 |

## Responsible AI Design

All AI features follow [NIST AI Risk Management Framework](https://www.nist.gov/itl/ai-risk-management-framework) principles:

| Principle | Implementation |
|-----------|----------------|
| Human oversight | AI suggests; review and approve |
| User control | Edit, regenerate, or override any output |
| Transparency | View prompts and parameters used |
| Privacy | Local sentence-transformers and spaCy options for sensitive data |
| Grounding | Content based on input data, not generic knowledge |