Evaluations

Evaluations in Plexus are how you validate and assess your scorecards to ensure they align with your policies and stakeholder needs. They help you measure the effectiveness and accuracy of your scoring criteria before deploying them to production.

What are Evaluations?

An evaluation works like a model evaluation in machine learning: you test and validate your scorecards against known correct answers. This helps ensure your scoring criteria are properly calibrated and will produce reliable results when deployed.

Evaluation Components

Each evaluation consists of the following components (sketched in code after this list):

  • Test Dataset: A set of content with known correct answers
  • Scorecard: The scoring criteria being evaluated
  • Results: How well the scorecard's predictions match the known correct answers
  • Metrics: Performance indicators like accuracy, precision, and recall
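Conceptually, these pieces fit together like the following Python sketch. The class and field names here are illustrative stand-ins, not part of the actual Plexus API:

    from dataclasses import dataclass, field

    @dataclass
    class LabeledItem:
        content: str    # the content being scored
        label: str      # the known correct answer (ground truth)

    @dataclass
    class ScoreResult:
        item: LabeledItem
        prediction: str     # the scorecard's answer for this item
        confidence: float   # how certain the prediction was, 0.0 to 1.0

    @dataclass
    class Evaluation:
        scorecard_name: str                 # the scoring criteria under test
        dataset: list[LabeledItem]          # test set with known answers
        results: list[ScoreResult] = field(default_factory=list)
        metrics: dict[str, float] = field(default_factory=dict)  # accuracy, precision, recall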

Evaluation Process

When you run an evaluation, the following steps take place (see the sketch after this list):

  • Your scorecard is applied to a test dataset with known correct answers
  • The scorecard's predictions are compared against the ground truth
  • Performance metrics are calculated to measure accuracy and reliability
  • A comprehensive report helps you identify areas for improvement
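Stripped to its essentials, that process looks something like the sketch below. The scorecard.predict call is a hypothetical stand-in for however your scorecard produces an answer, not an actual Plexus function:

    def run_evaluation(scorecard, dataset):
        """Apply a scorecard to labeled items and measure accuracy."""
        correct = 0
        for item in dataset:
            prediction = scorecard.predict(item["content"])  # hypothetical interface
            if prediction == item["label"]:                  # compare to ground truth
                correct += 1
        return {"accuracy": correct / len(dataset), "items": len(dataset)}

In practice the report also records per-item results and richer metrics, but comparing each prediction to its known correct answer is the heart of the process.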

Example Evaluation

Here's an example of what a scorecard evaluation report looks like in Plexus:

Scorecard: scorecard-123
Score: score-123
Status: Complete (processed 100 of 100 items)
Labels: Binary (Positive / Negative), balanced distribution
Goal: achieve 90% accuracy
Accuracy: 85%

Confusion matrix (rows are actual labels, columns are predicted labels):

                       Predicted Positive   Predicted Negative
    Actual Positive            45                   10
    Actual Negative             5                   40

Score results (1 shown): Positive, a high-confidence prediction (95%)

This evaluation shows strong performance across key metrics, though the 85% accuracy still falls just short of the 90% goal.
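You can verify the headline numbers in this report directly from the confusion matrix (45 true positives, 10 false negatives, 5 false positives, 40 true negatives):

    tp, fn, fp, tn = 45, 10, 5, 40

    accuracy  = (tp + tn) / (tp + fn + fp + tn)   # (45 + 40) / 100 = 0.85
    precision = tp / (tp + fp)                    # 45 / 50 = 0.90
    recall    = tp / (tp + fn)                    # 45 / 55 ≈ 0.818

    print(f"accuracy={accuracy:.0%}  precision={precision:.0%}  recall={recall:.1%}")
    # accuracy=85%  precision=90%  recall=81.8%

The 85% accuracy is why the gauge sits just below the 90% goal.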

Understanding Results

Evaluation results help you understand how well your scorecard performs and where it needs improvement:

Score Results

For each score in your scorecard, you get:

  • How well the predictions match known correct answers
  • Detailed explanations of where and why mismatches occurred
  • Confidence levels to identify uncertain predictions (see the filtering sketch after this list)
  • Insights for improving score accuracy
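For example, a simple way to surface uncertain predictions is to filter score results by confidence. The dictionary fields here are illustrative, not the exact Plexus result schema:

    # Flag results whose confidence falls below a review threshold.
    def uncertain_results(results, threshold=0.7):
        return [r for r in results if r["confidence"] < threshold]

    results = [
        {"item_id": 1, "prediction": "Positive", "confidence": 0.95},
        {"item_id": 2, "prediction": "Negative", "confidence": 0.55},
    ]
    for r in uncertain_results(results):
        print(f"item {r['item_id']}: {r['prediction']} at "
              f"{r['confidence']:.0%} confidence - review manually")

Routing low-confidence items to human review is a common way to act on these results.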

Performance Metrics

Overall evaluation metrics help you assess scorecard quality:

  • Accuracy, precision, and recall statistics
  • Performance trends as you refine your scorecard (see the comparison sketch after this list)
  • Comparison with baseline benchmarks
  • Quality indicators to guide improvements
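One lightweight way to track trends is to diff the metrics between two evaluation runs. This is a generic sketch with made-up numbers, not a built-in Plexus feature:

    # Compare metrics from two runs to see what improved or regressed.
    def metric_deltas(previous, current):
        return {name: current[name] - previous[name]
                for name in current if name in previous}

    run_before = {"accuracy": 0.85, "precision": 0.90, "recall": 0.82}
    run_after  = {"accuracy": 0.91, "precision": 0.93, "recall": 0.88}

    for name, delta in metric_deltas(run_before, run_after).items():
        print(f"{name}: {delta:+.0%}")
    # accuracy: +6%
    # precision: +3%
    # recall: +6%

Positive deltas confirm your refinements are moving the scorecard in the right direction.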

Using Evaluations

Evaluations are essential tools for developing reliable scorecards. Use them to:

  • Validate that scorecards align with your policies and requirements
  • Identify and fix biases or gaps in scoring criteria
  • Track scorecard improvement over time
  • Build confidence in your scoring system before deployment