Evaluations

Evaluations in Plexus are how you validate your scorecards and confirm that they align with your policies and stakeholder needs. They measure how accurately your scoring criteria perform before you deploy them to production.

What are Evaluations?

An evaluation works much like evaluation in machine learning. You test a scorecard against content whose correct answers are already known, then measure how often the scorecard agrees with them. This helps ensure your scoring criteria are properly calibrated and will produce reliable results when deployed.

Evaluation Components

Each evaluation consists of:

Test Dataset: A set of content items labeled with known correct answers
Scorecard: The scoring criteria being evaluated
Results: The scorecard's prediction for each item, compared against its known correct answer
Metrics: Performance indicators such as accuracy, precision, and recall
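One way to picture how these components fit together is as a small data model. The sketch below is illustrative only: the class and field names (`LabeledItem`, `EvaluationResult`, `Evaluation`) are hypothetical and do not reflect Plexus's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model for the components of an evaluation.
# These names are illustrative, not Plexus's real schema.

@dataclass
class LabeledItem:
    content: str   # the content being scored
    label: str     # the known correct answer (ground truth)

@dataclass
class EvaluationResult:
    item: LabeledItem
    predicted: str      # the scorecard's answer for this item
    confidence: float   # assumed to be between 0.0 and 1.0

    @property
    def correct(self) -> bool:
        # A result is correct when the prediction matches the ground truth
        return self.predicted == self.item.label

@dataclass
class Evaluation:
    scorecard: str                              # identifier of the scorecard under test
    dataset: list[LabeledItem]                  # the test dataset
    results: list[EvaluationResult] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)
```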

Evaluation Process

When you run an evaluation:

Your scorecard is applied to a test dataset with known correct answers
The scorecard's predictions are compared against the ground truth
Performance metrics are calculated to measure accuracy and reliability
A comprehensive report helps you identify areas for improvement
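The steps above can be sketched as a short loop. This is a minimal illustration, assuming a scorecard can be modeled as a plain function that maps content to a predicted label. `run_evaluation` and `toy_score` are hypothetical names, not part of the Plexus API.

```python
from typing import Callable

# Sketch of the evaluation loop, assuming a scorecard behaves like a
# function from content to a predicted label (hypothetical, not Plexus's API).

def run_evaluation(
    score: Callable[[str], str],
    dataset: list[tuple[str, str]],   # (content, known correct answer)
) -> dict:
    mismatches = []
    correct = 0
    for content, expected in dataset:
        predicted = score(content)     # apply the scorecard
        if predicted == expected:      # compare against ground truth
            correct += 1
        else:
            mismatches.append({"content": content,
                               "expected": expected,
                               "predicted": predicted})
    total = len(dataset)
    # Summarize: a metric plus the mismatches to review
    return {
        "processed": total,
        "accuracy": correct / total if total else 0.0,
        "mismatches": mismatches,
    }

# Hypothetical toy scorecard: "Positive" if the text mentions thanks
def toy_score(text: str) -> str:
    return "Positive" if "thanks" in text.lower() else "Negative"

data = [("Thanks for your help", "Positive"),
        ("This did not work", "Negative"),
        ("Great service", "Positive")]
report = run_evaluation(toy_score, data)
print(report["accuracy"])    # 2 of 3 items correct
print(report["mismatches"])  # the "Great service" item
```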

Example Evaluation

Here's an example of what a scorecard evaluation looks like in Plexus:

Evaluation of scorecard scorecard-123, score score-123
Status: Complete, 100 of 100 items processed (100%), elapsed 2m 0s
Labels: Binary (Positive, Negative), balanced distribution
Predicted classes: Positive, Negative
Goal: achieve 90% accuracy
Accuracy: 85%, which falls short of the 90% goal
Confusion matrix: actual classes vs. predicted classes (Positive, Negative)
Score result: Positive, high confidence prediction (95.0%)
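The confusion matrix in the example compares each item's actual class with the class the scorecard predicted. The sketch below shows how such a matrix can be built for a binary score. The labels follow the example, but the data is made up for illustration.

```python
from collections import Counter

# Sketch: build a confusion matrix for a binary score.
# Rows are actual classes, columns are predicted classes.

def confusion_matrix(actual, predicted, labels=("Positive", "Negative")):
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Illustrative, made-up labels for five items
actual    = ["Positive", "Positive", "Negative", "Negative", "Positive"]
predicted = ["Positive", "Negative", "Negative", "Positive", "Positive"]

matrix = confusion_matrix(actual, predicted)
print(matrix)  # [[2, 1], [1, 1]]

# Accuracy is the diagonal (correct predictions) over all items
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(actual)
print(accuracy)  # 0.6
```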

Understanding Results

Evaluation results help you understand how well your scorecard performs and where it needs improvement:

Score Results

For each score in your scorecard, you get:

How well the predictions match known correct answers
Detailed explanations of where and why mismatches occurred
Confidence levels to identify uncertain predictions
Insights for improving score accuracy
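As an illustration of how mismatches and confidence levels can direct your review, the sketch below separates wrong predictions from uncertain ones. It assumes each result is a dict with `expected`, `predicted`, and `confidence` (0 to 1) keys. The function name and the 0.7 threshold are hypothetical, not Plexus defaults.

```python
# Sketch: find items that need review. The result format and the
# 0.7 confidence threshold are illustrative assumptions.

def items_to_review(results, min_confidence=0.7):
    mismatched = [r for r in results if r["predicted"] != r["expected"]]
    uncertain = [r for r in results if r["confidence"] < min_confidence]
    return mismatched, uncertain

results = [
    {"id": 1, "expected": "Positive", "predicted": "Positive", "confidence": 0.95},
    {"id": 2, "expected": "Negative", "predicted": "Positive", "confidence": 0.62},
    {"id": 3, "expected": "Negative", "predicted": "Negative", "confidence": 0.55},
]
mismatched, uncertain = items_to_review(results)
print([r["id"] for r in mismatched])  # [2]
print([r["id"] for r in uncertain])   # [2, 3]
```

Item 3 is correct but low-confidence, so it can point to criteria that are ambiguous even when the answer happens to be right.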

Performance Metrics

Overall evaluation metrics help you assess scorecard quality:

Accuracy, precision, and recall statistics
Performance trends as you refine your scorecard
Comparison with baseline benchmarks
Quality indicators to guide improvements
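Accuracy, precision, and recall have standard definitions in terms of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn). The sketch below computes them and compares a run against a baseline. The function names and all numbers are illustrative, and it assumes "Positive" is the positive class.

```python
# Standard binary metrics from confusion-matrix counts.

def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    total = tp + fp + fn + tn
    # Precision: of the items predicted Positive, how many really were
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the items that really were Positive, how many were found
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

def compare(current: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    # A positive difference means the current run beats the baseline
    return {name: current[name] - baseline[name] for name in current}

# Made-up counts and baseline, for illustration only
current = metrics(tp=8, fp=2, fn=4, tn=6)
baseline = {"accuracy": 0.60, "precision": 0.75, "recall": 0.50}
print(current)                     # accuracy 0.7, precision 0.8, recall about 0.667
print(compare(current, baseline))  # all three metrics improve on the baseline
```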

Using Evaluations

Evaluations are essential tools for developing reliable scorecards. Use them to:

Validate that scorecards align with your policies and requirements
Identify and fix biases or gaps in scoring criteria
Track scorecard improvement over time
Build confidence in your scoring system before deployment
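To track improvement over time and decide when a scorecard is ready, you can record each evaluation run's accuracy and check the latest run against a goal. The sketch below is hypothetical. The `ready_to_deploy` name, the 90% goal, and the rule that the latest run must not regress are illustrative policy choices, not Plexus behavior.

```python
# Sketch of a pre-deployment check over a history of evaluation runs.
# The goal value and the no-regression rule are assumed policies.

def ready_to_deploy(history: list[float], goal: float = 0.90) -> bool:
    """True if the latest run meets the goal and did not regress."""
    if not history:
        return False
    latest = history[-1]
    not_regressed = len(history) < 2 or latest >= history[-2]
    return latest >= goal and not_regressed

runs = [0.78, 0.85, 0.91]      # accuracy of successive runs (made-up numbers)
print(ready_to_deploy(runs))   # True
print(ready_to_deploy(runs[:2]))  # False: 0.85 is below the goal
```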