Package: TextAnalysisR 0.1.4

TextAnalysisR: A Text Mining Workflow Tool

Provides a text mining and natural language processing workflow for documents. Includes preprocessing via 'quanteda', lexical analysis (term frequency-inverse document frequency, log-odds ratios, lexical diversity) via 'tidytext', topic modeling via 'stm' and the 'BERTopic' approach, semantic similarity and document clustering on transformer representations, an interactive 'Shiny' interface with 'ggplot2' visualization, optional 'spaCy' preprocessing, and local 'sentence-transformers' or web-based ('OpenAI', 'Gemini') model providers for retrieval-augmented generation, as described in Shin et al. (2026) <doi:10.1177/07319487251412879>.

Authors: Mikyung Shin [aut, cre]

TextAnalysisR_0.1.4.tar.gz
TextAnalysisR_0.1.4.zip (r-4.7) | TextAnalysisR_0.1.4.zip (r-4.6) | TextAnalysisR_0.1.4.zip (r-4.5)
TextAnalysisR_0.1.4.tgz (r-4.6-any) | TextAnalysisR_0.1.4.tgz (r-4.5-any)
TextAnalysisR_0.1.4.tar.gz (r-4.7-any) | TextAnalysisR_0.1.4.tar.gz (r-4.6-any)
TextAnalysisR_0.1.4.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
TextAnalysisR/json (API)

# Install 'TextAnalysisR' in R:
install.packages('TextAnalysisR', repos = c('https://mshin77.r-universe.dev', 'https://cloud.r-project.org'))
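
After installation, a quick sanity check can confirm that the package is available and that the app launcher appears among its exports. The sketch below uses only base R functions (`requireNamespace`, `getNamespaceExports`); the variable names `installed` and `exports` are illustrative. The `run_app()` call is left commented out because it opens the interactive Shiny interface, and calling it without arguments is an assumption not confirmed by this page.

```r
# Check whether the package is installed without attaching it.
installed <- requireNamespace("TextAnalysisR", quietly = TRUE)

if (installed) {
  # List the exported names and confirm the app launcher is among them.
  exports <- getNamespaceExports("TextAnalysisR")
  print(length(exports))
  print("run_app" %in% exports)
  # Launch the Shiny interface in an interactive session
  # (assumption: no arguments are required):
  # TextAnalysisR::run_app()
} else {
  message("TextAnalysisR is not installed; run the install.packages() call above.")
}
```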

Bug tracker: https://github.com/mshin77/textanalysisr/issues

Pkgdown/docs site: https://mshin77.github.io

bert-model | lexicon | ollama | python | semantic | text-mining | topic-modeling | word-networks

5.48 score 1 stars 34 scripts 135 exports 84 dependencies

Last updated from: 8d75b3d589. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 240
source / vignettes | OK | 312
linux-release-x86_64 | OK | 240
macos-release-arm64 | OK | 114
macos-oldrel-arm64 | OK | 164
windows-devel | OK | 160
windows-release | OK | 175
windows-oldrel | OK | 165
wasm-release | OK | 164

Exports: %>%, analyze_document_clustering, analyze_semantic_evolution, analyze_sentiment, analyze_sentiment_llm, analyze_similarity_gaps, assess_embedding_stability, auto_tune_embedding_topics, calculate_assignment_consistency, calculate_clustering_metrics, calculate_cosine_similarity, calculate_cross_similarity, calculate_dispersion_metrics, calculate_document_similarity, calculate_keyword_stability, calculate_lexical_dispersion, calculate_log_odds_ratio, calculate_semantic_drift, calculate_similarity_robust, calculate_text_readability, calculate_topic_probability, calculate_topic_stability, calculate_weighted_log_odds, calculate_word_frequency, call_llm_api, check_python_env, check_vision_models, clear_lexdiv_cache, cluster_embeddings, describe_image, detect_multi_words, detect_pdf_content_type, detect_pdf_content_type_py, export_document_clustering, extract_cross_category_similarities, extract_keywords_keyness, extract_keywords_tfidf, extract_morphology, extract_named_entities, extract_noun_chunks, extract_pos_tags, extract_subjects_objects, extract_tables_from_pdf_py, extract_topic_terms_df, find_optimal_k, find_similar_words, find_topic_matches, fit_embedding_model, fit_embedding_topics, fit_semantic_model, fit_temporal_model, fit_topic_prevalence_model, generate_cluster_labels, generate_cluster_labels_auto, generate_embeddings, generate_topic_content, generate_topic_labels, get_available_dfm, get_available_tokens, get_best_embeddings, get_content_type_prompt, get_content_type_user_template, get_sentences, get_sentiment_color, get_sentiment_colors, get_spacy_model_info, get_topic_prevalence, get_topic_terms, get_topic_texts, get_word_similarity, identify_topic_trends, import_files, init_spacy_nlp, lemmatize_tokens, lexical_diversity_analysis, lexical_frequency_analysis, plot_cluster_terms, plot_cross_category_heatmap, plot_document_sentiment_trajectory, plot_emotion_radar, plot_entity_frequencies, plot_keyness_keywords, plot_keyword_comparison, plot_lexical_dispersion, plot_lexical_diversity_distribution, plot_log_odds_ratio, plot_model_comparison, plot_morphology_feature, plot_mwe_frequency, plot_ngram_frequency, plot_pos_frequencies, plot_quality_metrics, plot_readability_by_group, plot_readability_distribution, plot_semantic_viz, plot_sentiment_boxplot, plot_sentiment_by_category, plot_sentiment_distribution, plot_sentiment_violin, plot_similarity_heatmap, plot_term_trends_continuous, plot_tfidf_keywords, plot_top_readability_documents, plot_topic_effects_categorical, plot_topic_effects_continuous, plot_topic_probability, plot_weighted_log_odds, plot_word_frequency, plot_word_probability, prep_texts, process_pdf_unified, reduce_dimensions, render_displacy_dep, render_displacy_ent, run_app, run_neural_topics_internal, run_rag_search, run_text_workflow, semantic_document_clustering, semantic_similarity_analysis, sentiment_embedding_analysis, sentiment_lexicon_analysis, setup_python_env, show_web_banner, spacy_extract_entities, spacy_has_vectors, spacy_initialized, spacy_lemmatize, spacy_parse_full, summarize_morphology, unite_cols, validate_cross_models, validate_semantic_coherence, word_co_occurrence_network, word_correlation_network

Dependencies: backports, base64enc, broom, bslib, cachem, cli, commonmark, cpp11, crosstalk, digest, dplyr, DT, evaluate, farver, fastmap, fastmatch, fontawesome, fs, generics, ggplot2, glue, gtable, highr, htmltools, htmlwidgets, httpuv, igraph, isoband, ISOcodes, janeaustenr, jquerylib, jsonlite, knitr, labeling, later, lattice, lazyeval, lifecycle, magrittr, Matrix, memoise, mime, nsyllable, otel, pillar, pkgconfig, plyr, promises, proxyC, purrr, quanteda, quanteda.textstats, R6, rappdirs, RColorBrewer, Rcpp, RcppArmadillo, reshape2, rlang, rmarkdown, S7, sass, scales, shiny, SnowballC, sourcetools, stopwords, stringi, stringr, tibble, tidyr, tidyselect, tidytext, tinytex, tokenizers, utf8, vctrs, viridisLite, widyr, withr, xfun, xml2, xtable, yaml

Semantic Analysis
Setup | Word Co-occurrence | Word Correlation | Document Similarity | Comparative Analysis | Semantic Search | Sentiment & Emotion | Document Groups

Last update: 2026-06-03
Started: 2025-11-30

Accessibility
Keyboard Navigation | Visual Features | Screen Reader Support | Language Support

Last update: 2026-06-02
Started: 2025-11-30

AI Integration
Providers | Setup | Default Models | Responsible AI Design

Last update: 2026-06-02
Started: 2025-12-29

Installation
R Package | Web App | Optional Features | Linguistic Analysis (spaCy) | Python Features | Troubleshooting

Last update: 2026-06-02
Started: 2025-11-30

Lexical Analysis
Setup | Linguistic Annotation | Part-of-Speech Tags | Morphological Features | Named Entity Recognition | Frequency Trends | Keywords | TF-IDF | Statistical Keyness | Comparison | Lexical Diversity | Readability | Log Odds Ratio | Lexical Dispersion | Multi-Word Expressions

Last update: 2026-06-02
Started: 2025-11-30

Multimodal Analysis
How It Works | Functions | Provider Comparison

Last update: 2026-06-02
Started: 2025-11-30

Python Environment
Quick Setup | Check Status | Common Issues | "Another Python already initialized" | Environment in OneDrive | spaCy Models | Deep Learning (Optional) | Diagnostics

Last update: 2026-06-02
Started: 2025-11-30

Security
Input Validation | API Key Security | Network Security | Data Protection | Infrastructure

Last update: 2026-06-02
Started: 2025-11-30

Topic Modeling
Setup | Model Configuration | Word-Topic | Content Generation | Document-Topic | Quotes | Estimated Effects | Categorical Covariates | Continuous Covariates | Embedding-based Topics | STM vs Embedding

Last update: 2026-06-02
Started: 2025-11-30

Preprocessing
Workflow | Multi-word Expressions

Last update: 2026-05-28
Started: 2025-11-30

Getting Started
Install | Launch App | Quick Example | Features

Last update: 2026-05-27
Started: 2025-11-30

Readme and manuals

Help Manual

Help page | Topics
Acronym List | acronym
Analyze Document Clustering | analyze_document_clustering
Analyze Semantic Evolution | analyze_semantic_evolution
Analyze Text Sentiment | analyze_sentiment
LLM-based Sentiment Analysis | analyze_sentiment_llm
Analyze Similarity Gaps Between Categories | analyze_similarity_gaps
Assess Embedding Topic Model Stability | assess_embedding_stability
Auto-tune BERTopic Hyperparameters | auto_tune_embedding_topics
Calculate Assignment Consistency | calculate_assignment_consistency
Calculate Clustering Quality Metrics | calculate_clustering_metrics
Calculate Cross-Matrix Cosine Similarity | calculate_cross_similarity
Calculate Dispersion Metrics | calculate_dispersion_metrics
Calculate Keyword Stability | calculate_keyword_stability
Calculate Lexical Dispersion | calculate_lexical_dispersion
Calculate Log Odds Ratio Between Categories | calculate_log_odds_ratio
Calculate Semantic Drift | calculate_semantic_drift
Calculate Document Similarity with Fallbacks | calculate_similarity_robust
Calculate Text Readability | calculate_text_readability
Calculate Topic Probabilities | calculate_topic_probability
Calculate Topic Stability | calculate_topic_stability
Calculate Weighted Log Odds Ratio | calculate_weighted_log_odds
Analyze and Visualize Word Frequencies Across a Continuous Variable | calculate_word_frequency
Call LLM API (Unified Wrapper) | call_llm_api
Clear Lexical Diversity Cache | clear_lexdiv_cache
Embedding-based Document Clustering | cluster_embeddings
Describe Image Using Vision LLM | describe_image
Detect Multi-Word Expressions | detect_multi_words
Detect PDF Content Type | detect_pdf_content_type
Export Document Clustering Analysis | export_document_clustering
Extract Cross-Category Similarities from Full Similarity Matrix | extract_cross_category_similarities
Extract Keywords Using Statistical Keyness | extract_keywords_keyness
Extract Keywords Using TF-IDF | extract_keywords_tfidf
Extract Morphological Features | extract_morphology
Extract Named Entities from Tokens | extract_named_entities
Extract Noun Chunks | extract_noun_chunks
Extract Part-of-Speech Tags from Tokens | extract_pos_tags
Extract Subjects and Objects | extract_subjects_objects
Build a topic-term data frame from any supported topic model | extract_topic_terms_df
Find Optimal Number of Topics | find_optimal_k
Find Similar Words | find_similar_words
Find Similar Topics | find_topic_matches
Fit Embedding-based Topic Model | fit_embedding_model
Fit Semantic Model | fit_semantic_model
Fit Temporal Topic Model | fit_temporal_model
Fit Topic Prevalence Model | fit_topic_prevalence_model
Generate Cluster Label Suggestions (Human-in-the-Loop) | generate_cluster_labels
Generate Cluster Labels | generate_cluster_labels_auto
Generate Embeddings | generate_embeddings
Generate Content from Topic Terms | generate_topic_content
Generate Topic Labels Using AI | generate_topic_labels
Get Best Available Embeddings | get_best_embeddings
Get Default System Prompt for Content Type | get_content_type_prompt
Get Default User Prompt Template for Content Type | get_content_type_user_template
Get Sentences | get_sentences
Get spaCy Model Information | get_spacy_model_info
Get Topic Prevalence (Gamma) from STM Model | get_topic_prevalence
Select Top Terms for Each Topic | get_topic_terms
Convert Topic Terms to Text Strings | get_topic_texts
Calculate Word Similarity | get_word_similarity
Identify Topic Trends | identify_topic_trends
Process Files | import_files
Lemmatize Tokens with Batch Processing | lemmatize_tokens
Lexical Analysis Functions | lexical_analysis
Lexical Diversity Analysis | lexical_diversity_analysis
Lexical Frequency Analysis | lexical_frequency_analysis
Plot Cluster Top Terms | plot_cluster_terms
Plot Cross-Category Similarity Comparison | plot_cross_category_heatmap
Plot Document Sentiment Trajectory | plot_document_sentiment_trajectory
Plot Emotion Radar Chart | plot_emotion_radar
Plot Named Entity Frequencies | plot_entity_frequencies
Plot Statistical Keyness | plot_keyness_keywords
Plot Keyword Comparison (TF-IDF vs Frequency) | plot_keyword_comparison
Plot Lexical Dispersion | plot_lexical_dispersion
Plot Lexical Diversity Distribution | plot_lexical_diversity_distribution
Plot Log Odds Ratio | plot_log_odds_ratio
Plot Topic Model Comparison Scatter | plot_model_comparison
Plot Morphology Feature Distribution | plot_morphology_feature
Plot Multi-Word Expression Frequency | plot_mwe_frequency
Plot N-gram Frequency | plot_ngram_frequency
Plot Part-of-Speech Tag Frequencies | plot_pos_frequencies
Plot Topic Model Quality Metrics | plot_quality_metrics
Plot Readability by Group | plot_readability_by_group
Plot Readability Distribution | plot_readability_distribution
Plot Semantic Analysis Visualization | plot_semantic_viz
Plot Sentiment Box Plot by Category | plot_sentiment_boxplot
Plot Sentiment by Category | plot_sentiment_by_category
Plot Sentiment Distribution | plot_sentiment_distribution
Plot Sentiment Violin Plot by Category | plot_sentiment_violin
Plot Document Similarity Heatmap | plot_similarity_heatmap
Plot Term Frequency Trends by Continuous Variable | plot_term_trends_continuous
Plot TF-IDF Keywords | plot_tfidf_keywords
Plot Top Documents by Readability | plot_top_readability_documents
Plot Topic Effects for Categorical Variables | plot_topic_effects_categorical
Plot Topic Effects for Continuous Variables | plot_topic_effects_continuous
Plot Per-Document Per-Topic Probabilities | plot_topic_probability
Plot Weighted Log Odds | plot_weighted_log_odds
Plot Word Frequency | plot_word_frequency
Plot Word Probabilities by Topic | plot_word_probability
Preprocess Text Data | prep_texts
Process PDF File (Unified Entry Point) | process_pdf_unified
Dimensionality Reduction Analysis | reduce_dimensions
Render displaCy Dependency Visualization | render_displacy_dep
Render displaCy Entity Visualization | render_displacy_ent
Launch the TextAnalysisR app | run_app
RAG Semantic Search | run_rag_search
Complete Text Mining Workflow | run_text_workflow
Semantic Document Clustering | semantic_document_clustering
Semantic Similarity Analysis | semantic_similarity_analysis
Embedding-based Sentiment Analysis | sentiment_embedding_analysis
Analyze Sentiment Using Tidytext Lexicons | sentiment_lexicon_analysis
Setup Python Environment | setup_python_env
Extract Named Entities with spaCy | spacy_extract_entities
Check if Model Has Word Vectors | spacy_has_vectors
Check if spaCy is Initialized | spacy_initialized
Lemmatize Texts with spaCy | spacy_lemmatize
Parse Texts with spaCy | spacy_parse_full
Special education technology bibliographic data | SpecialEduTech
An example structure of a structural topic model | stm_15
Stopwords List | stopwords_list
Summarize Morphology Features | summarize_morphology
Unite Text Columns | unite_cols
Cross-Analysis Validation | validate_cross_models
Validate Semantic Coherence | validate_semantic_coherence
Analyze and Visualize Word Co-occurrence Networks | word_co_occurrence_network
Analyze and Visualize Word Correlation Networks | word_correlation_network