---
title: "Multimodal Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Multimodal Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

```{r}
library(TextAnalysisR)

sample_text <- c(
  "Figure 1 shows the distribution of student outcomes.",
  "Table 2 reports the effect sizes for each intervention."
)

toks <- prep_texts(
  data.frame(united_texts = sample_text),
  text_field = "united_texts"
)

quanteda::ntoken(toks)
```

Extract text from PDFs with charts, diagrams, and images using vision AI. R-native pipeline -- no Python required.

## How It Works

1. Extracts text from each page using `pdftools::pdf_text()` (R-native)
2. Renders each page as a PNG image via `pdftools::pdf_render_page()`
3. Identifies sparse-text pages (< 500 characters) that likely contain figures
4. Sends only those pages to a vision LLM for description
5. Merges extracted text + image descriptions into a single text corpus

## Functions

`process_pdf_unified()` runs the full pipeline with automatic fallback:

1. **Multimodal** (pdftools + vision LLM) -- extracts text and describes visual content
2. **Text-only** (pdftools) -- fallback when no vision provider is set

`describe_image()` describes a single base64-encoded PNG.

Both require a vision-provider API key (OpenAI/Gemini) and network access; see their reference pages for usage.

## Provider Comparison

| Provider | Cost | Privacy | Accuracy | Setup |
|----------|------|---------|----------|-------|
| OpenAI | Per use | Cloud | Best | API key |
| Gemini | Free on hosted app (Google Cloud Research); otherwise per use | Cloud | Best | API key |
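
The sparse-page selection and merge steps listed under How It Works can be sketched in base R. This is a minimal illustration, not the package's implementation: `find_sparse_pages()` and `merge_descriptions()` are hypothetical helper names, the threshold is applied to the raw character count of each page (an assumption about how the 500-character rule is measured), and a placeholder string stands in for the vision LLM's output so the example runs without an API key.

```r
# Hypothetical helper (not part of TextAnalysisR): return the indices of
# pages whose extracted text is shorter than `min_chars` characters.
find_sparse_pages <- function(page_texts, min_chars = 500) {
  which(nchar(page_texts) < min_chars)
}

# Hypothetical helper: append each image description to the text of its page.
# `descriptions` is a character vector named by page index.
merge_descriptions <- function(page_texts, descriptions) {
  merged <- page_texts
  for (idx in names(descriptions)) {
    i <- as.integer(idx)
    merged[i] <- paste(merged[i], descriptions[[idx]], sep = "\n")
  }
  merged
}

# Stand-in for the per-page output of pdftools::pdf_text():
# two text-heavy pages and one page holding little more than a caption.
pages <- c(strrep("word ", 200), "Figure 1", strrep("text ", 150))

sparse <- find_sparse_pages(pages)

# Placeholder descriptions; in the real pipeline these come from a vision LLM.
descriptions <- setNames(
  rep("[placeholder image description]", length(sparse)),
  sparse
)

corpus <- merge_descriptions(pages, descriptions)
```

Keeping selection separate from merging means only the sparse pages need to be sent to the vision provider, while text-heavy pages pass through unchanged.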