TopicAnalysis

NLP

The TopicAnalysis report block performs NLP analysis to identify and categorize topics in text data using BERTopic. It processes transcript data through various transformation methods and generates comprehensive topic insights with visualizations and representative examples.

Overview

The TopicAnalysis block orchestrates a multi-stage analysis pipeline similar to the plexus analyze topics CLI command. It transforms text data, applies BERTopic clustering to discover topics, and generates visualizations and insights.

The analysis supports multiple transformation methods including direct chunking, LLM-based extraction, and itemized processing, making it flexible for different types of text data and analysis requirements.

Key Features

BERTopic Clustering

Topic modeling that clusters transformer-based text embeddings

Topic Visualization

Interactive charts showing topic distribution and relationships

Keyword Extraction

Identifies most relevant keywords for each discovered topic

Representative Examples

Shows actual text examples that best represent each topic

Configuration

Configure the TopicAnalysis block in your report configuration:

```block
class: TopicAnalysis
data:
  source: "customer-calls"   # DataSource name or ID
  content_column: "text"     # Column containing text data
  sample_size: 1000          # Optional: limit number of records
llm_extraction:
  method: "chunk"            # "chunk", "llm", or "itemize"
  provider: "ollama"         # LLM provider if using "llm" method
  model: "gemma3:27b"        # LLM model if using "llm" method
bertopic_analysis:
  min_topic_size: 10         # Minimum documents per topic
  top_n_words: 10            # Number of keywords per topic
```

Configuration Parameters

Data Configuration

Parameter	Required	Description
data.source	Required*	DataSource name, key, or ID (mutually exclusive with dataset)
data.dataset	Required*	Specific DataSet ID (mutually exclusive with source)
data.content_column	Optional	Column containing text data (default: "text")
data.sample_size	Optional	Limit number of records to process (default: all)

* Either source OR dataset must be specified
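The rule that exactly one of source or dataset must be given, together with the documented defaults, can be sketched as a small validation step. This is a minimal illustration and not the block's actual implementation: the function name resolve_data_config is hypothetical, and using None to stand for "all records" is an assumption.

```python
from typing import Any, Dict


def resolve_data_config(data: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a `data` section and fill in the documented defaults.

    Hypothetical helper for illustration only.
    """
    has_source = data.get("source") is not None
    has_dataset = data.get("dataset") is not None
    # Exactly one of `source` or `dataset` must be present (mutually exclusive).
    if has_source == has_dataset:
        raise ValueError("Specify exactly one of data.source or data.dataset")

    resolved = dict(data)
    resolved.setdefault("content_column", "text")  # documented default
    resolved.setdefault("sample_size", None)       # assumption: None means "all records"
    return resolved


config = resolve_data_config({"source": "customer-calls"})
```

A configuration that sets both keys, or neither, is rejected before any data is loaded.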

LLM Extraction Configuration

Parameter	Required	Description
llm_extraction.method	Optional	"chunk", "llm", or "itemize" (default: "chunk")
llm_extraction.provider	Optional	"ollama", "openai", "anthropic" (default: "ollama")
llm_extraction.model	Optional	LLM model name (default: "gemma3:27b")
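As a rough sketch of how these defaults might be applied, the helper below merges a partial llm_extraction section with the documented defaults and rejects unknown methods. The helper name and the uses_llm flag are hypothetical. Treating both "llm" and "itemize" as LLM-backed is an assumption based on the method descriptions in this document.

```python
from typing import Any, Dict, Optional

VALID_METHODS = {"chunk", "llm", "itemize"}
# Assumption: "chunk" needs no LLM, while "llm" and "itemize" both call one.
LLM_METHODS = {"llm", "itemize"}


def resolve_llm_extraction(cfg: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge a partial `llm_extraction` section with the documented defaults."""
    resolved: Dict[str, Any] = {
        "method": "chunk",
        "provider": "ollama",
        "model": "gemma3:27b",
    }
    resolved.update(cfg or {})
    if resolved["method"] not in VALID_METHODS:
        raise ValueError(f"Unknown llm_extraction.method: {resolved['method']!r}")
    # Hypothetical convenience flag, not a documented parameter.
    resolved["uses_llm"] = resolved["method"] in LLM_METHODS
    return resolved


defaults = resolve_llm_extraction()
itemized = resolve_llm_extraction({"method": "itemize", "provider": "openai"})
```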

BERTopic Analysis Configuration

Parameter	Required	Description
bertopic_analysis.min_topic_size	Optional	Minimum documents per topic (default: 10)
bertopic_analysis.top_n_words	Optional	Number of keywords per topic (default: 10)
bertopic_analysis.min_ngram	Optional	Minimum n-gram size (default: 1)
bertopic_analysis.max_ngram	Optional	Maximum n-gram size (default: 2)
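These parameters correspond closely to arguments of the BERTopic constructor (min_topic_size, top_n_words, and n_gram_range). The sketch below shows one plausible mapping. Whether the block passes them through in exactly this way is an assumption, and bertopic_kwargs is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def bertopic_kwargs(cfg: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Translate a `bertopic_analysis` section into BERTopic constructor kwargs."""
    cfg = cfg or {}
    min_ngram = cfg.get("min_ngram", 1)
    max_ngram = cfg.get("max_ngram", 2)
    if min_ngram < 1 or min_ngram > max_ngram:
        raise ValueError("Expected 1 <= min_ngram <= max_ngram")
    return {
        "min_topic_size": cfg.get("min_topic_size", 10),
        "top_n_words": cfg.get("top_n_words", 10),
        # BERTopic takes the n-gram bounds as a single (min, max) tuple.
        "n_gram_range": (min_ngram, max_ngram),
    }


kwargs = bertopic_kwargs({"min_topic_size": 15})

# With the bertopic package installed, the result could be used as:
#   from bertopic import BERTopic
#   topic_model = BERTopic(**kwargs)
```

Lowering min_topic_size generally yields more, smaller topics, while raising it merges rare themes into larger ones or into the outlier group.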

Example Output

Here's an example of how the TopicAnalysis block output appears in a report:

Live Example

This is a live rendering of the TopicAnalysis component using example data

Topic Analysis Example

Topic Analysis Results

Pipeline Setup

Analysis Details

Understanding the Output

Topic Discovery

BERTopic determines the number of topics from the data itself rather than requiring it up front, clustering semantically similar texts together. Each topic is characterized by its most representative keywords and examples.

Pipeline Visualization

The pipeline diagram shows the complete flow from data preprocessing through LLM extraction to BERTopic analysis, making it easy to understand and reproduce the analysis process.

Topic Distribution

An interactive pie chart shows the relative prevalence of each topic, helping identify the most common themes in your data at a glance.
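The relative prevalence behind such a chart is each topic's share of the total item count. The sketch below computes those shares for the four topic counts shown in this page's live example. The function name topic_shares is hypothetical, and rounding to one decimal place is a presentation choice.

```python
from typing import Dict


def topic_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each topic's share of all items, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Item counts taken from the live example on this page.
example_counts = {
    "Billing Inquiry": 45,
    "Technical Support": 38,
    "Service Cancellation": 32,
    "Product Information": 28,
}
shares = topic_shares(example_counts)
# shares["Billing Inquiry"] is 31.5, i.e. 45 of 143 items
```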

Representative Examples

Each topic includes actual text examples that best represent the topic's content, providing concrete context for understanding what each topic covers.

Analysis Methods

Chunking Method

Direct text chunking without LLM processing. Fast and efficient for well-structured text data.

Best for: Clean transcript data, structured documents, high-volume processing
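To make the idea of direct chunking concrete, here is a minimal sketch that splits a transcript into fixed-size word windows with optional overlap. The exact chunking strategy used by the block is not specified here, so the window size, the overlap, and the function name chunk_words are all illustrative assumptions.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_words("thank you for calling how can I help you today", chunk_size=4)
```

Each chunk then becomes one document for topic modeling, so smaller windows produce more documents per transcript.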

LLM Method

Uses large language models to extract and refine key themes from text before topic analysis.

Best for: Noisy data, complex conversations, extracting specific themes

Itemize Method

Breaks down text into individual items or points using LLM analysis for granular topic discovery.

Best for: Multi-topic documents, detailed analysis, customer feedback

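In the live example, the topic analysis results list four topics:

Billing Inquiry: 45 items
Technical Support: 38 items
Service Cancellation: 32 items
Product Information: 28 items

The example pipeline diagram has four stages: Pre-processing, LLM Extraction, BERTopic Analysis, and Fine-tuning.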
