Evaluations

Learn how to evaluate your scorecards and analyze their performance.

Overview

Evaluations help you measure and improve your scorecard's performance by analyzing its predictions against labeled samples. This process is essential for ensuring your scorecards are accurate and reliable.

When you run an evaluation, Plexus will:

  • Test your scorecard against a set of labeled examples
  • Calculate key performance metrics like accuracy and precision
  • Generate visualizations to help understand the results
  • Store the results for future reference and comparison

Understanding Evaluation Results

Here's an example of what an evaluation looks like in the dashboard:

[Example dashboard view: an evaluation card for the "Lead Qualification" scorecard's "Qualified Lead?" score]

  • Status: Complete (200 / 200 samples, 100%)
  • Labels: Binary, balanced distribution (Qualified / Not Qualified)
  • Metrics: Accuracy 85.5%, Precision 88.2%, Sensitivity 82.1%, Specificity 91.3%
  • Confusion matrix (rows = actual, columns = predicted; order: Qualified, Not Qualified):

        45  10
         5  40

  • Score Results: a list of individual predictions, each showing the model's confidence, the predicted label, the actual label where the two differ, and a short explanation (e.g. "Incorrectly interpreted general interest as qualified lead." or "Missed qualification signals: Mentioned team size and specific use case requirements.")
Key Components

  • Performance Metrics: See accuracy, precision, sensitivity, and specificity scores at a glance
  • Class Distribution: Understand the balance of your test data and predictions
  • Confusion Matrix: Visualize where your scorecard excels or needs improvement
  • Individual Results: Review specific examples to understand prediction patterns

Key Metrics Explained

Accuracy

The percentage of correct predictions. For example, if your "Qualified Lead?" score correctly identifies 95 out of 100 leads, the accuracy is 95%.

Precision

Of the leads marked as qualified by the "Qualified Lead?" score, what percentage were actually qualified? High precision means fewer false positives.

Recall (Sensitivity)

Of all actually qualified leads, what percentage did we identify? High recall means fewer missed opportunities.

Specificity

Of all unqualified leads, what percentage did we correctly identify? High specificity means better filtering of poor leads.
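The four metrics above can all be derived from the same four prediction counts. A minimal sketch in plain Python (the function name and the example counts are illustrative, not part of the Plexus API):

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute the four core binary-classification metrics from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that were correct
        "precision": tp / (tp + fp),     # correct among predicted positives
        "recall": tp / (tp + fn),        # correct among actual positives (sensitivity)
        "specificity": tn / (tn + fp),   # correct among actual negatives
    }

# Illustrative counts: 45 true positives, 5 false positives,
# 40 true negatives, 10 false negatives.
metrics = evaluation_metrics(tp=45, fp=5, tn=40, fn=10)
print(metrics)
```

Note how the same counts yield different precision and recall: a scorecard can look strong on one while quietly failing on the other.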

Running Evaluations

You can run evaluations in two ways:

1. Using the Dashboard

Navigate to your scorecard and click the "Evaluate" button. You can specify:

  • Number of samples to evaluate
  • Whether to generate visualizations
  • Specific data filters or criteria

2. Using the CLI

For advanced users, you can use the CLI tool:

plexus evaluate accuracy \
  --scorecard-name "Lead Qualification" \
  --number-of-samples 100 \
  --visualize

Best Practices

  • Use at least 100 samples for reliable results
  • Include a diverse range of cases in your evaluation dataset
  • Run evaluations regularly to monitor performance over time
  • Pay attention to both precision and recall - high accuracy alone isn't enough
  • Use visualizations to identify patterns in misclassifications
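A quick way to apply the first two practices is to inspect the label distribution of your evaluation set before running. A minimal sketch in plain Python (the helper and the sample labels are illustrative):

```python
from collections import Counter

def label_distribution(labels):
    """Return each label's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Illustrative evaluation set: 55 qualified, 45 not qualified.
labels = ["Qualified"] * 55 + ["Not Qualified"] * 45
dist = label_distribution(labels)
print(dist)  # a heavily skewed split here would make raw accuracy misleading
```

If one class dominates, a scorecard can score high accuracy by always predicting that class, which is exactly why precision and recall matter alongside it.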

Advanced Features

Confusion Matrix

The confusion matrix shows the breakdown of predictions:

  • True Positives: Correctly identified qualified leads
  • False Positives: Unqualified leads mistakenly marked as qualified
  • True Negatives: Correctly identified unqualified leads
  • False Negatives: Qualified leads mistakenly marked as unqualified
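These four cells can be tallied directly from paired actual/predicted labels. A minimal sketch in plain Python (the function and sample labels are illustrative, not the Plexus implementation):

```python
def confusion_counts(actual, predicted, positive="Qualified"):
    """Tally true/false positives and negatives for a binary score."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

actual    = ["Qualified", "Qualified", "Not Qualified", "Not Qualified"]
predicted = ["Qualified", "Not Qualified", "Not Qualified", "Qualified"]
print(confusion_counts(actual, predicted))
# {'tp': 1, 'fp': 1, 'tn': 1, 'fn': 1}
```

Reading the matrix this way makes the failure modes concrete: false positives waste sales effort on poor leads, while false negatives are missed opportunities.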