library(TextAnalysisR)
sample_text <- c(
"Figure 1 shows the distribution of student outcomes.",
"Table 2 reports the effect sizes for each intervention."
)
toks <- prep_texts(
data.frame(united_texts = sample_text),
text_field = "united_texts"
)
quanteda::ntoken(toks)
## text1 text2
##     7     8
Extract text from PDFs with charts, diagrams, and images using vision AI. R-native pipeline – no Python required.
pdftools::pdf_text() (R-native)
pdftools::pdf_render_page()

process_pdf_unified() runs the full pipeline with automatic fallback.
describe_image() describes a single base64-encoded PNG.
Both require a vision-provider API key (OpenAI/Gemini) and network
access; see their reference pages for usage.
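
As a rough illustration of a text-first fallback, the sketch below extracts text with pdftools and flags pages whose text is too sparse, then renders only those pages to base64-encoded PNGs that a vision model could describe. It is not the implementation of process_pdf_unified(): the helpers needs_vision() and prepare_pages(), the min_chars threshold, and the use of the base64enc package are all assumptions made for this example.

```r
# Decide, page by page, whether plain text extraction was enough.
# `min_chars` is an illustrative threshold, not a package default.
needs_vision <- function(page_text, min_chars = 50) {
  nchar(trimws(page_text)) < min_chars
}

# Hypothetical helper: extract text, then render only the sparse pages
# to PNG and base64-encode them for a vision provider.
prepare_pages <- function(pdf_path, min_chars = 50, dpi = 150) {
  stopifnot(
    requireNamespace("pdftools", quietly = TRUE),
    requireNamespace("base64enc", quietly = TRUE)
  )
  page_text <- pdftools::pdf_text(pdf_path)  # one string per page
  sparse <- which(needs_vision(page_text, min_chars))
  images <- vapply(sparse, function(p) {
    png_file <- tempfile(fileext = ".png")
    pdftools::pdf_convert(
      pdf_path, format = "png", pages = p, dpi = dpi,
      filenames = png_file, verbose = FALSE
    )
    base64enc::base64encode(png_file)
  }, character(1))
  list(text = page_text, vision_pages = sparse, images_base64 = images)
}

# The decision step on its own, using made-up page text:
pages <- c(
  "A full page of extracted body text that is long enough to keep as is.",
  "  \n "
)
needs_vision(pages)
```

Each element of images_base64 is the kind of input describe_image() is described as taking; consult its reference page for the actual arguments before calling it.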
| Provider | Cost | Privacy | Accuracy | Setup |
|---|---|---|---|---|
| OpenAI | Per use | Cloud | Best | API key |
| Gemini | Free on hosted app (Google Cloud Research); otherwise per use | Cloud | Best | API key |
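
Because both providers need an API key, it can help to check which keys are set before starting a long extraction run. The sketch below assumes the keys live in the environment variables OPENAI_API_KEY and GEMINI_API_KEY; those names are a common convention, not something this page confirms the package reads, and available_providers() is a hypothetical helper.

```r
# Assumed environment variable names; adjust to match your setup.
key_vars <- c(openai = "OPENAI_API_KEY", gemini = "GEMINI_API_KEY")

# Return the names of providers whose key is set to a non-empty value.
available_providers <- function(vars = key_vars) {
  keys <- Sys.getenv(unname(vars), unset = "", names = FALSE)
  names(vars)[nzchar(keys)]
}

available_providers()
```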