Evaluations
Learn how to evaluate your scorecards and analyze their performance.
Overview
Evaluations help you measure and improve your scorecard's performance by analyzing its predictions against labeled samples. This process is essential for ensuring your scorecards are accurate and reliable.
When you run an evaluation, Plexus will:
- Test your scorecard against a set of labeled examples
- Calculate key performance metrics like accuracy and precision
- Generate visualizations to help understand the results
- Store the results for future reference and comparison
Understanding Evaluation Results
Here's an example of what an evaluation looks like in the dashboard:
Key Components
- Performance Metrics: See accuracy, precision, sensitivity, and specificity scores at a glance
- Class Distribution: Understand the balance of your test data and predictions
- Confusion Matrix: Visualize where your scorecard excels or needs improvement
- Individual Results: Review specific examples to understand prediction patterns
Key Metrics Explained
Accuracy
The percentage of correct predictions overall. For example, if your "Qualified Lead?" score correctly classifies 95 out of 100 leads, its accuracy is 95%.
Precision
Of the leads marked as qualified by the "Qualified Lead?" score, what percentage were actually qualified? High precision means fewer false positives.
Recall (Sensitivity)
Of all actually qualified leads, what percentage did we identify? High recall means fewer missed opportunities.
Specificity
Of all unqualified leads, what percentage did we correctly identify? High specificity means better filtering of poor leads.
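To make these definitions concrete, here is a minimal, hypothetical sketch in Python that computes all four metrics from a small set of labeled samples. The "Yes"/"No" label values and the sample data are illustrative assumptions, not actual Plexus output or internals.

```python
# Illustrative only: how the four metrics relate to labeled samples and predictions.
labels      = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No"]
predictions = ["Yes", "No",  "No", "No", "Yes", "Yes", "No", "Yes", "No", "No"]

tp = sum(1 for y, p in zip(labels, predictions) if y == "Yes" and p == "Yes")
fp = sum(1 for y, p in zip(labels, predictions) if y == "No"  and p == "Yes")
tn = sum(1 for y, p in zip(labels, predictions) if y == "No"  and p == "No")
fn = sum(1 for y, p in zip(labels, predictions) if y == "Yes" and p == "No")

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # correct predictions overall
precision   = tp / (tp + fp)                   # of predicted "Yes", how many were right
recall      = tp / (tp + fn)                   # of actual "Yes", how many we caught
specificity = tn / (tn + fp)                   # of actual "No", how many we caught

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")
```

With this sample data, 8 of 10 predictions are correct (accuracy 0.80), precision and recall are both 0.75, and specificity is about 0.83.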
Running Evaluations
You can run evaluations in two ways:
1. Using the Dashboard
Navigate to your scorecard and click the "Evaluate" button. You can specify:
- Number of samples to evaluate
- Whether to generate visualizations
- Specific data filters or criteria
2. Using the CLI
Advanced users can run evaluations with the CLI tool:
```bash
plexus \
  evaluate \
  accuracy \
  --scorecard-name "Lead Qualification" \
  --number-of-samples 100 \
  --visualize
```
Best Practices
- Use at least 100 samples for reliable results
- Include a diverse range of cases in your evaluation dataset
- Run evaluations regularly to monitor performance over time
- Pay attention to both precision and recall - high accuracy alone isn't enough (see the sketch after this list)
- Use visualizations to identify patterns in misclassifications
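As a hypothetical illustration of why accuracy alone can mislead, the snippet below shows how a scorecard that never predicts "Yes" still looks accurate on an imbalanced evaluation set while missing every qualified lead. The sample counts are made up for the example.

```python
# With 95 "No" and 5 "Yes" samples, a model that always predicts "No"
# is 95% accurate but finds zero qualified leads (recall = 0).
labels      = ["Yes"] * 5 + ["No"] * 95
predictions = ["No"] * 100

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
tp      = sum(1 for y, p in zip(labels, predictions) if y == "Yes" and p == "Yes")
fn      = sum(1 for y, p in zip(labels, predictions) if y == "Yes" and p == "No")

accuracy = correct / len(labels)                # 0.95 -- looks great
recall   = tp / (tp + fn) if tp + fn else 0.0   # 0.00 -- every qualified lead missed

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```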
Advanced Features
Confusion Matrix
The confusion matrix shows the breakdown of predictions:
- True Positives: Correctly identified qualified leads
- False Positives: Unqualified leads mistakenly marked as qualified
- True Negatives: Correctly identified unqualified leads
- False Negatives: Qualified leads mistakenly marked as unqualified
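Here is a minimal sketch of how those four cells can be derived from labeled samples and predictions, assuming scikit-learn is available; this is illustrative and not the Plexus implementation.

```python
from sklearn.metrics import confusion_matrix

labels      = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No"]
predictions = ["Yes", "No",  "No", "No", "Yes", "Yes", "No", "Yes", "No", "No"]

# Rows are true labels, columns are predictions, ordered ["Yes", "No"].
cm = confusion_matrix(labels, predictions, labels=["Yes", "No"])
tp, fn = cm[0]   # true "Yes": correctly caught vs. missed
fp, tn = cm[1]   # true "No": wrongly flagged vs. correctly rejected

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```

Using the same sample data as in the metrics sketch above, this prints TP=3 FP=1 TN=5 FN=1, matching the metric values computed earlier.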