TopicAnalysis
The TopicAnalysis report block uses BERTopic to identify and categorize topics in text data. It passes transcript data through one of several transformation methods, then reports the discovered topics with keywords, visualizations, and representative examples.
Overview
The TopicAnalysis block orchestrates a multi-stage analysis pipeline similar to the `plexus analyze topics` CLI command. It transforms text data, applies BERTopic clustering to discover topics, and generates visualizations and insights.
The analysis supports three transformation methods (direct chunking, LLM-based extraction, and itemization), so it can be adapted to different kinds of text data and analysis goals.
Key Features
BERTopic Clustering
Topic modeling that clusters transformer-based text embeddings
Topic Visualization
Interactive charts showing topic distribution and relationships
Keyword Extraction
Identifies the most relevant keywords for each discovered topic
Representative Examples
Shows actual text examples that best represent each topic
Configuration
Configure the TopicAnalysis block in your report configuration using the parameters described below.
Configuration Parameters
Data Configuration
Parameter | Required | Description |
---|---|---|
data.source | Required* | DataSource name, key, or ID (mutually exclusive with dataset) |
data.dataset | Required* | Specific DataSet ID (mutually exclusive with source) |
data.content_column | Optional | Column containing text data (default: "text") |
data.sample_size | Optional | Maximum number of records to process (default: all) |
* Exactly one of source or dataset must be specified.
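The rules in this table can be expressed as a small validation step. The sketch below is a hypothetical helper (`resolve_data_config` is not part of Plexus) that applies the documented defaults and rejects a configuration that sets both or neither of source and dataset. It assumes the data section has already been parsed into a Python dict; the dataset ID in the example is a placeholder.

```python
from typing import Any, Optional


def resolve_data_config(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply the documented data.* defaults and rules."""
    has_source = data.get("source") is not None
    has_dataset = data.get("dataset") is not None
    # source and dataset are mutually exclusive, and one of them is required.
    if has_source == has_dataset:
        raise ValueError("specify exactly one of data.source or data.dataset")

    sample_size: Optional[int] = data.get("sample_size")
    return {
        "source": data.get("source"),
        "dataset": data.get("dataset"),
        "content_column": data.get("content_column", "text"),
        # None stands for the documented default: process all records.
        "sample_size": sample_size,
    }


# Example: a dataset-based configuration limited to 500 records.
config = resolve_data_config({"dataset": "example-dataset-id", "sample_size": 500})
```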
LLM Extraction Configuration
Parameter | Required | Description |
---|---|---|
llm_extraction.method | Optional | "chunk", "llm", or "itemize" (default: "chunk") |
llm_extraction.provider | Optional | "ollama", "openai", "anthropic" (default: "ollama") |
llm_extraction.model | Optional | LLM model name (default: "gemma3:27b") |
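A similar hypothetical sketch for the extraction settings: `resolve_llm_extraction` fills in the documented defaults and checks the values against the options listed above. The `uses_llm` flag reflects that the chunk method skips LLM processing; that the block then ignores provider and model is an assumption.

```python
from typing import Any

VALID_METHODS = {"chunk", "llm", "itemize"}
VALID_PROVIDERS = {"ollama", "openai", "anthropic"}


def resolve_llm_extraction(cfg: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply llm_extraction.* defaults and validate choices."""
    method = cfg.get("method", "chunk")
    provider = cfg.get("provider", "ollama")
    model = cfg.get("model", "gemma3:27b")
    if method not in VALID_METHODS:
        raise ValueError(f"unknown llm_extraction.method: {method!r}")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown llm_extraction.provider: {provider!r}")
    return {
        "method": method,
        "provider": provider,
        "model": model,
        # The chunk method works on raw text; the other two call an LLM.
        "uses_llm": method != "chunk",
    }
```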
BERTopic Analysis Configuration
Parameter | Required | Description |
---|---|---|
bertopic_analysis.min_topic_size | Optional | Minimum number of documents per topic (default: 10) |
bertopic_analysis.top_n_words | Optional | Number of keywords per topic (default: 10) |
bertopic_analysis.min_ngram | Optional | Minimum n-gram size (default: 1) |
bertopic_analysis.max_ngram | Optional | Maximum n-gram size (default: 2) |
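These settings line up with arguments of the `BERTopic` constructor in the bertopic Python package (`min_topic_size`, `top_n_words`, `n_gram_range`). Assuming the block forwards them this way (not confirmed here), the mapping might look like the hypothetical `bertopic_kwargs` helper below; the two n-gram settings combine into one tuple.

```python
from typing import Any


def bertopic_kwargs(cfg: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping from bertopic_analysis.* settings to BERTopic() arguments."""
    min_ngram = cfg.get("min_ngram", 1)
    max_ngram = cfg.get("max_ngram", 2)
    if not 1 <= min_ngram <= max_ngram:
        raise ValueError("require 1 <= min_ngram <= max_ngram")
    return {
        "min_topic_size": cfg.get("min_topic_size", 10),
        "top_n_words": cfg.get("top_n_words", 10),
        # BERTopic takes the n-gram bounds as a single (min, max) tuple.
        "n_gram_range": (min_ngram, max_ngram),
    }


# With the bertopic package installed, the result could be used like this:
#   from bertopic import BERTopic
#   topic_model = BERTopic(**bertopic_kwargs({"min_topic_size": 5}))
#   topics, probs = topic_model.fit_transform(docs)  # docs: list[str]
```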
Understanding the Output
Topic Discovery
BERTopic infers the number of topics from the data rather than requiring it in advance, clustering semantically similar texts together; bertopic_analysis.min_topic_size sets the smallest cluster it will treat as a topic. Each topic is characterized by its most representative keywords and examples.
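BERTopic derives topic keywords with class-based TF-IDF (c-TF-IDF): all documents in a topic are treated as one class document, and a term t in topic c is weighted roughly as tf(t, c) * log(1 + A / f(t)), where A is the average number of words per topic and f(t) is the term's frequency across all topics. The sketch below is a simplified, dependency-free illustration (whitespace tokenization, unigrams only, no stop-word removal or normalization); `ctfidf_keywords` is a hypothetical name, not the library's API.

```python
import math
from collections import Counter


def ctfidf_keywords(docs_by_topic: dict[int, list[str]], top_n: int = 10) -> dict[int, list[str]]:
    """Simplified c-TF-IDF: rank each topic's terms, penalizing terms common to all topics."""
    if not docs_by_topic:
        return {}
    # tf(t, c): term counts per topic, from that topic's documents joined together.
    tf = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # f(t): frequency of each term across all topics.
    total: Counter = Counter()
    for counts in tf.values():
        total.update(counts)
    # A: average number of words per topic.
    avg_words = sum(total.values()) / len(tf)

    keywords: dict[int, list[str]] = {}
    for topic, counts in tf.items():
        scores = {t: n * math.log(1 + avg_words / total[t]) for t, n in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        keywords[topic] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords


keywords = ctfidf_keywords({
    0: ["refund my order", "refund was late"],
    1: ["reset my password", "password reset link"],
}, top_n=2)
```

In this example "my" appears in both topics, so its weight drops and it does not make either topic's top keywords.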
Pipeline Visualization
The pipeline diagram shows the complete flow from data preprocessing through LLM extraction to BERTopic analysis, making it easy to understand and reproduce the analysis process.
Topic Distribution
An interactive pie chart shows the relative prevalence of each topic, so the most common themes in your data are visible at a glance.
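The chart's percentages amount to the number of documents per topic divided by the total. A minimal sketch, assuming per-document topic IDs like those BERTopic's `fit_transform` returns, where -1 marks outlier documents. Whether the block's chart includes the outlier topic is not stated, so the hypothetical `include_outliers` flag leaves that as a choice.

```python
from collections import Counter


def topic_distribution(
    topic_ids: list[int], include_outliers: bool = False
) -> dict[int, float]:
    """Percentage of documents assigned to each topic, largest first."""
    # BERTopic labels documents that fit no cluster with topic -1.
    counts = Counter(t for t in topic_ids if include_outliers or t != -1)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: 100 * n / total for topic, n in counts.most_common()}
```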
Representative Examples
Each topic includes actual text examples that best represent the topic's content, providing concrete context for understanding what each topic covers.
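One way to pick representative examples is to rank a topic's documents by how close their embeddings are to the topic's mean embedding. This is a swapped-in heuristic for illustration: BERTopic's own `get_representative_docs` draws on its c-TF-IDF topic representations rather than embedding centroids. `representative_docs` and its arguments are hypothetical, and embeddings are plain lists of floats so the sketch runs without NumPy.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def representative_docs(
    docs: list[str],
    embeddings: list[list[float]],
    topic_ids: list[int],
    topic: int,
    k: int = 3,
) -> list[str]:
    """Return up to k documents of `topic` closest to the topic's mean embedding."""
    members = [i for i, t in enumerate(topic_ids) if t == topic]
    if not members:
        return []
    dim = len(embeddings[members[0]])
    centroid = [sum(embeddings[i][d] for i in members) / len(members) for d in range(dim)]
    ranked = sorted(members, key=lambda i: cosine(embeddings[i], centroid), reverse=True)
    return [docs[i] for i in ranked[:k]]
```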
Analysis Methods
Chunking Method
Direct text chunking without LLM processing. Fast and efficient for well-structured text data.
Best for: Clean transcript data, structured documents, high-volume processing
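This section does not say how the chunk method splits text. As an assumption, the sketch below uses fixed-size word windows with a small overlap, so context that spans a chunk boundary is not lost entirely; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `chunk_size` words, overlapping by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, so no chunk is only an overlap tail.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be passed to BERTopic as one document.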
LLM Method
Uses large language models to extract and refine key themes from text before topic analysis.
Best for: Noisy data, complex conversations, extracting specific themes
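The LLM method adds a model call per text before clustering. Because the supported providers differ (Ollama, OpenAI, Anthropic), the sketch below takes the model as a plain callable from prompt to response instead of using any provider SDK. The prompt wording and the `llm_extract` name are hypothetical.

```python
from typing import Callable

# Hypothetical prompt; the block's actual prompt is not documented here.
PROMPT = (
    "Summarize the main themes of the following transcript "
    "in one or two sentences.\n\nTranscript:\n{text}"
)


def llm_extract(texts: list[str], llm: Callable[[str], str]) -> list[str]:
    """Replace each text with the LLM's theme summary, dropping empty responses."""
    results = []
    for text in texts:
        response = llm(PROMPT.format(text=text)).strip()
        if response:
            results.append(response)
    return results
```

The summaries, rather than the raw transcripts, would then be clustered, which is why this method tolerates noisy input better.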
Itemize Method
Breaks down text into individual items or points using LLM analysis for granular topic discovery.
Best for: Multi-topic documents, detailed analysis, customer feedback
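Itemization differs in that one input text can yield several documents for BERTopic, one per extracted point, which makes the resulting topics more granular. A minimal sketch, assuming the LLM is asked for a bulleted or numbered list; the prompt, the regular expression, and the `itemize` name are hypothetical, and the model is again a plain callable.

```python
import re
from typing import Callable

# Hypothetical prompt asking for one point per line.
ITEMIZE_PROMPT = (
    "List each distinct point raised in the text below, one per line, "
    "starting each line with '- '.\n\nText:\n{text}"
)
# Matches a leading bullet ("-", "*", "•") or number ("1." / "1)") plus spacing.
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def itemize(texts: list[str], llm: Callable[[str], str]) -> list[str]:
    """Turn each text into individual items; only list-formatted lines are kept."""
    items = []
    for text in texts:
        for line in llm(ITEMIZE_PROMPT.format(text=text)).splitlines():
            match = _BULLET.match(line)
            if match:
                item = line[match.end():].strip()
                if item:
                    items.append(item)
    return items
```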