Evaluations
Evaluations in Plexus let you validate your scorecards against your policies and stakeholder needs. They measure how effective and accurate your scoring criteria are before you deploy them to production.
What are Evaluations?
An evaluation works like model evaluation in machine learning: you test and validate your scorecards against known correct answers. This helps ensure your scoring criteria are properly calibrated and will produce reliable results when deployed.
Evaluation Components
Each evaluation consists of:
- Test Dataset: A set of content with known correct answers
- Scorecard: The scoring criteria being evaluated
- Results: How well the scorecard's predictions match the known correct answers
- Metrics: Performance indicators like accuracy, precision, and recall
Evaluation Process
When you run an evaluation:
- Your scorecard is applied to a test dataset with known correct answers
- The scorecard's predictions are compared against the ground truth
- Performance metrics are calculated to measure accuracy and reliability
- A comprehensive report helps you identify areas for improvement
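The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Plexus's implementation: the `evaluate` helper, the "yes"/"no" labels, and the sample data are all hypothetical, and the metrics use the standard binary-classification definitions.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float


def evaluate(predictions, ground_truth, positive="yes"):
    """Compare predictions with known correct answers (hypothetical helper)."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not predictions:
        raise ValueError("cannot evaluate an empty dataset")

    pairs = list(zip(predictions, ground_truth))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    correct = sum(1 for p, t in pairs if p == t)

    return Metrics(
        accuracy=correct / len(pairs),
        # Fall back to 0.0 when a denominator would be zero.
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
    )


# Illustrative data only: scorecard outputs versus labeled answers.
predicted = ["yes", "yes", "no", "no", "yes"]
expected = ["yes", "no", "no", "yes", "yes"]
metrics = evaluate(predicted, expected)
```

Here three of the five predictions match, so accuracy is 0.6. Precision and recall are both 2/3, because there are two true positives, one false positive, and one false negative.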
Example Evaluation
Here's an example of what a scorecard evaluation looks like in Plexus:
Understanding Results
Evaluation results help you understand how well your scorecard performs and where it needs improvement:
Score Results
For each score in your scorecard, you get:
- How well the predictions match known correct answers
- Detailed explanations of where and why mismatches occurred
- Confidence levels to identify uncertain predictions
- Insights for improving score accuracy
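One way to act on per-score results is to separate outright mismatches from predictions that were correct but uncertain. The sketch below assumes a hypothetical `ScoreResult` record with a confidence value between 0.0 and 1.0 and an arbitrary 0.7 threshold. None of these names or values come from Plexus itself.

```python
from dataclasses import dataclass


@dataclass
class ScoreResult:
    item_id: str
    predicted: str
    expected: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    explanation: str


def review_queue(results, min_confidence=0.7):
    """Return mismatches plus correct-but-uncertain predictions."""
    mismatches = [r for r in results if r.predicted != r.expected]
    uncertain = [
        r for r in results
        if r.predicted == r.expected and r.confidence < min_confidence
    ]
    return mismatches, uncertain


# Illustrative records only.
results = [
    ScoreResult("call-1", "yes", "yes", 0.95, "Greeting present"),
    ScoreResult("call-2", "yes", "no", 0.80, "Mistook a hold message for a greeting"),
    ScoreResult("call-3", "no", "no", 0.55, "Greeting partly inaudible"),
]
mismatches, uncertain = review_queue(results)
```

With this sample data, `call-2` lands in the mismatch list and `call-3` in the uncertain list. Reading the explanations for both groups shows where the scoring criteria may need refinement.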
Performance Metrics
Overall evaluation metrics help you assess scorecard quality:
- Accuracy, precision, and recall statistics
- Performance trends as you refine your scorecard
- Comparison with baseline benchmarks
- Quality indicators to guide improvements
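To make the baseline comparison concrete, the sketch below measures a scorecard's accuracy against a majority-class baseline, meaning the accuracy you would get by always predicting the most common label. The choice of baseline is an assumption for illustration. The source does not say which baselines Plexus uses, and the helper names and data are hypothetical.

```python
from collections import Counter


def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the known correct answers."""
    matches = sum(1 for p, t in zip(predictions, ground_truth) if p == t)
    return matches / len(ground_truth)


def majority_baseline_accuracy(ground_truth):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(ground_truth).most_common(1)[0][1]
    return most_common_count / len(ground_truth)


# Illustrative data only: an imbalanced test set where "no" dominates.
expected = ["no", "no", "no", "yes", "no", "yes", "no", "no"]
predicted = ["no", "no", "yes", "yes", "no", "yes", "no", "no"]

scorecard_accuracy = accuracy(predicted, expected)
baseline_accuracy = majority_baseline_accuracy(expected)
improvement = scorecard_accuracy - baseline_accuracy
```

On imbalanced data, a high accuracy figure can be misleading on its own. In this sample the baseline already reaches 0.75, so the scorecard's 0.875 represents a gain of 0.125 rather than a result to judge in isolation.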
Using Evaluations
Evaluations are essential tools for developing reliable scorecards. Use them to:
- Validate that scorecards align with your policies and requirements
- Identify and fix biases or gaps in scoring criteria
- Track scorecard improvement over time
- Build confidence in your scoring system before deployment