TopicAnalysis

NLP

The TopicAnalysis report block performs NLP analysis to identify and categorize topics in text data using BERTopic. It processes transcript data through various transformation methods and generates comprehensive topic insights with visualizations and representative examples.

Overview

The TopicAnalysis block orchestrates a multi-stage analysis pipeline similar to the plexus analyze topics CLI command. It transforms text data, applies BERTopic clustering to discover topics, and generates visualizations and insights.

The analysis supports multiple transformation methods including direct chunking, LLM-based extraction, and itemized processing, making it flexible for different types of text data and analysis requirements.

Key Features

BERTopic Clustering

Topic modeling built on transformer-based text embeddings

Topic Visualization

Interactive charts showing topic distribution and relationships

Keyword Extraction

Identifies the most relevant keywords for each discovered topic

Representative Examples

Shows actual text examples that best represent each topic

Configuration

Configure the TopicAnalysis block in your report configuration:

```block
class: TopicAnalysis
data:
  source: "customer-calls"   # DataSource name or ID
  content_column: "text"     # Column containing text data
  sample_size: 1000          # Optional: limit number of records
llm_extraction:
  method: "chunk"            # "chunk", "llm", or "itemize"
  provider: "ollama"         # LLM provider if using "llm" method
  model: "gemma3:27b"        # LLM model if using "llm" method
bertopic_analysis:
  min_topic_size: 10         # Minimum documents per topic
  top_n_words: 10            # Number of keywords per topic
```

Configuration Parameters

Data Configuration

| Parameter | Required | Description |
| --- | --- | --- |
| data.source | Required* | DataSource name, key, or ID (mutually exclusive with dataset) |
| data.dataset | Required* | Specific DataSet ID (mutually exclusive with source) |
| data.content_column | Optional | Column containing text data (default: "text") |
| data.sample_size | Optional | Limit number of records to process (default: all) |

* Either source OR dataset must be specified
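
The either/or rule can be checked before a report runs. The following is a minimal sketch, not the block's actual validation code: `resolve_data_config` is a hypothetical helper, and treating the `data` section as a plain dictionary is an assumption. The defaults are the ones documented above.

```python
from typing import Any, Dict


def resolve_data_config(data: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the `data` section and fill in the documented defaults.

    Hypothetical helper for illustration; not part of the TopicAnalysis API.
    """
    has_source = data.get("source") is not None
    has_dataset = data.get("dataset") is not None
    # Exactly one of `source` / `dataset` must be present.
    if has_source == has_dataset:
        raise ValueError("Specify exactly one of data.source or data.dataset")

    resolved = dict(data)
    resolved.setdefault("content_column", "text")  # documented default
    resolved.setdefault("sample_size", None)       # None = process all records
    return resolved
```

Calling `resolve_data_config({"source": "customer-calls"})` returns the input with `content_column` and `sample_size` filled in, while passing both keys or neither raises `ValueError`.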

LLM Extraction Configuration

| Parameter | Required | Description |
| --- | --- | --- |
| llm_extraction.method | Optional | "chunk", "llm", or "itemize" (default: "chunk") |
| llm_extraction.provider | Optional | "ollama", "openai", "anthropic" (default: "ollama") |
| llm_extraction.model | Optional | LLM model name (default: "gemma3:27b") |

BERTopic Analysis Configuration

| Parameter | Required | Description |
| --- | --- | --- |
| bertopic_analysis.min_topic_size | Optional | Minimum documents per topic (default: 10) |
| bertopic_analysis.top_n_words | Optional | Number of keywords per topic (default: 10) |
| bertopic_analysis.min_ngram | Optional | Minimum n-gram size (default: 1) |
| bertopic_analysis.max_ngram | Optional | Maximum n-gram size (default: 2) |
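
For readers familiar with BERTopic's Python API, these options line up with the `min_topic_size`, `top_n_words`, and `n_gram_range` constructor arguments of `bertopic.BERTopic`. The sketch below shows one plausible mapping. That the block forwards the values exactly this way is an assumption, and `to_bertopic_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict


def to_bertopic_kwargs(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a `bertopic_analysis` section into BERTopic keyword arguments.

    Hypothetical helper; the defaults mirror the documented ones.
    """
    min_ngram = int(cfg.get("min_ngram", 1))
    max_ngram = int(cfg.get("max_ngram", 2))
    if not 1 <= min_ngram <= max_ngram:
        raise ValueError("require 1 <= min_ngram <= max_ngram")
    return {
        "min_topic_size": int(cfg.get("min_topic_size", 10)),
        "top_n_words": int(cfg.get("top_n_words", 10)),
        # BERTopic takes the n-gram bounds as a single (min, max) tuple.
        "n_gram_range": (min_ngram, max_ngram),
    }


kwargs = to_bertopic_kwargs({"min_topic_size": 15})
# With bertopic installed, the result could be used as: BERTopic(**kwargs)
```

With the default range of 1 to 2, topic keywords may be single words or two-word phrases.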

Example Output

Here's an example of how the TopicAnalysis block output appears in a report:

[Live example: a rendering of the TopicAnalysis component using example data, showing Topic Analysis Results with a Pipeline Setup diagram and an Analysis Details panel.]

Understanding the Output

Topic Discovery

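BERTopic determines the number of topics from the data rather than requiring it up front, clustering semantically similar texts together. The min_topic_size setting controls how small a cluster may be and therefore how many topics emerge. Each topic is characterized by its most representative keywords and examples.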

Pipeline Visualization

The pipeline diagram shows the complete flow from data preprocessing through LLM extraction to BERTopic analysis, making it easy to understand and reproduce the analysis process.

Topic Distribution

An interactive pie chart shows the relative prevalence of each topic, making the most common themes in your data visible at a glance.
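
The proportions behind such a chart are straightforward to compute from per-document topic assignments. The sketch below assumes a list of integer topic IDs, one per document, and drops the ID -1, which BERTopic uses for outlier documents. Whether the block's chart excludes outliers is an assumption, and `topic_shares` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def topic_shares(topic_ids: Iterable[int], skip_outliers: bool = True) -> Dict[int, float]:
    """Return each topic's share of the counted documents (shares sum to 1)."""
    counts = Counter(t for t in topic_ids if not (skip_outliers and t == -1))
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders topics from most to least frequent.
    return {topic: n / total for topic, n in counts.most_common()}


shares = topic_shares([0, 0, 1, -1, 0, 1, 2, -1])
```

Here six documents remain after the two outliers are dropped, so topic 0 accounts for half of them.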

Representative Examples

Each topic includes actual text examples that best represent the topic's content, providing concrete context for understanding what each topic covers.

Analysis Methods

Chunking Method

Direct text chunking without LLM processing. Fast and efficient for well-structured text data.

Best for: Clean transcript data, structured documents, high-volume processing
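
This page does not specify how chunks are formed. As an illustration only, the sketch below splits a transcript into fixed-size word windows with a small overlap. The function name `chunk_words`, the window size, and the overlap are all hypothetical choices, not the block's actual behavior.

```python
from typing import List


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping windows of at most `size` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk then becomes one document for topic modeling, so a smaller window yields more, shorter documents from the same transcript.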

LLM Method

Uses large language models to extract and refine key themes from text before topic analysis.

Best for: Noisy data, complex conversations, extracting specific themes

Itemize Method

Breaks down text into individual items or points using LLM analysis for granular topic discovery.

Best for: Multi-topic documents, detailed analysis, customer feedback

Related Documentation