Evaluate a Score
Learn how to run evaluations using individual scores or complete scorecards.
Running an Evaluation
You can evaluate content against a single score or against an entire scorecard. An evaluation checks your content against the criteria each score defines and returns detailed results for every score involved.
Using the Dashboard
- Select your source content
- Choose a scorecard or individual score
- Click "Run Evaluation"
- Monitor the evaluation progress
- Review the results
Using the SDK
from plexus import Plexus
plexus = Plexus(api_key="your-api-key")
# Evaluate using a specific score (accepts ID, name, key, or external ID)
evaluation = plexus.evaluations.create(
    source_id="source-id",
    score="Grammar Check"
)
# Or evaluate using an entire scorecard (accepts ID, name, key, or external ID)
evaluation = plexus.evaluations.create(
    source_id="source-id",
    scorecard="Content Quality"
)
# Get evaluation results
results = evaluation.get_results()
# Print each score's name and value
for score in results.scores:
    print(f"{score.name}: {score.value}")
The SDK supports the flexible identifier system: you can reference a scorecard or a score by its name, key, ID, or external ID, whichever is most convenient.
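To make the identifier lookup concrete, here is a minimal sketch of how a name, key, ID, or external ID could be resolved to a single scorecard. It is not Plexus's implementation: the `ScorecardRecord` type, its field names, the `resolve_identifier` helper, and the order in which identifier types are checked (ID, key, external ID, then name) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScorecardRecord:
    """Hypothetical record; field names are illustrative, not Plexus's schema."""
    id: str
    name: str
    key: str
    external_id: Optional[str] = None


def resolve_identifier(identifier: str, records: Iterable[ScorecardRecord]) -> ScorecardRecord:
    """Return the one record whose ID, key, external ID, or name equals `identifier`.

    Assumed precedence: more specific identifier types are checked first, so an
    exact ID match wins over a name that happens to contain the same text.
    """
    records = list(records)
    for attr in ("id", "key", "external_id", "name"):
        matches = [r for r in records if getattr(r, attr) == identifier]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise ValueError(f"{identifier!r} matches {len(matches)} records by {attr}")
    raise LookupError(f"no scorecard matches {identifier!r}")


# Hypothetical example records.
records = [
    ScorecardRecord(id="sc-123", name="Content Quality", key="content-quality", external_id="ext-42"),
    ScorecardRecord(id="sc-456", name="Quality Assurance", key="qa"),
]
print(resolve_identifier("Content Quality", records).id)  # sc-123
print(resolve_identifier("ext-42", records).key)  # content-quality
```

Raising on an ambiguous match, rather than silently picking the first record, is a deliberate choice in this sketch: it surfaces the case where two scorecards share a name.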
Using the CLI
# Evaluate using a scorecard
plexus evaluate accuracy --scorecard "Content Quality" --number-of-samples 100
# List evaluation results
plexus evaluations list
# View detailed results for a specific evaluation
plexus evaluations list-results --evaluation evaluation-id
The CLI also supports the flexible identifier system, so `--scorecard` accepts a scorecard's name, key, ID, or external ID.
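If you drive the CLI from a Python script, passing the arguments as a list avoids shell-quoting problems with multi-word scorecard names such as "Content Quality". The sketch below builds the `plexus evaluate accuracy` command shown above using only the flags that appear there; the helper names `evaluate_accuracy_command` and `run_plexus` are hypothetical, and the block does not itself run the CLI.

```python
import shlex
import shutil
import subprocess
from typing import List


def evaluate_accuracy_command(scorecard: str, number_of_samples: int) -> List[str]:
    """Build the `plexus evaluate accuracy` invocation as an argv list (hypothetical helper)."""
    return [
        "plexus", "evaluate", "accuracy",
        "--scorecard", scorecard,
        "--number-of-samples", str(number_of_samples),
    ]


def run_plexus(cmd: List[str]) -> None:
    """Run a CLI command, failing loudly if the executable is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} is not on PATH")
    subprocess.run(cmd, check=True)


cmd = evaluate_accuracy_command("Content Quality", 100)
# Show the equivalent shell command, with the multi-word name safely quoted.
print(shlex.join(cmd))
# plexus evaluate accuracy --scorecard 'Content Quality' --number-of-samples 100
```

On a machine where the `plexus` CLI is installed, calling `run_plexus(cmd)` would execute the evaluation.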
Understanding Results
Score Values
Numerical or categorical results for each evaluated criterion.
Explanations
Detailed reasoning behind each score's evaluation result.
Suggestions
Recommendations for improvement based on the evaluation results.
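The three parts of a result fit naturally into one record per score. The following is a minimal sketch assuming a hypothetical `ScoreResult` shape (it is not the SDK's actual result class) and a hypothetical `format_report` helper; the score names and values are made-up examples.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ScoreResult:
    """Hypothetical shape of one score's result, not the Plexus SDK's class."""
    name: str
    value: Union[float, str]          # numerical or categorical score value
    explanation: str = ""             # reasoning behind the value
    suggestions: List[str] = field(default_factory=list)  # recommendations for improvement


def format_report(results: List[ScoreResult]) -> str:
    """Render each score as its value, then its explanation, then any suggestions."""
    lines = []
    for r in results:
        lines.append(f"{r.name}: {r.value}")
        if r.explanation:
            lines.append(f"  Why: {r.explanation}")
        for suggestion in r.suggestions:
            lines.append(f"  - {suggestion}")
    return "\n".join(lines)


# Made-up example results: one categorical value, one numerical value.
report = format_report([
    ScoreResult("Grammar Check", "Pass", "No grammatical errors found."),
    ScoreResult("Readability", 0.72, "Several sentences exceed 30 words.",
                ["Split long sentences into shorter ones."]),
])
print(report)
```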
Batch Evaluations
You can evaluate multiple sources at once using batch processing:
# Create a batch evaluation
batch = plexus.evaluations.create_batch(
    source_ids=["source-1", "source-2", "source-3"],
    scorecard="Quality Assurance"  # Can use name, key, ID, or external ID
)
# Monitor batch progress
status = batch.get_status()
# Get results when complete
results = batch.get_results()
As with individual evaluations, batch evaluations let you reference scorecards and scores by name, key, ID, or external ID.
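Monitoring a batch usually means polling its status until it reaches a final state. Below is a minimal polling sketch, assuming (hypothetically) that the status is a plain string and that `"completed"` and `"failed"` are the terminal states; the actual values returned by `get_status()` may differ. The `wait_for_completion` helper is hypothetical, and the example drives it with a stand-in status source instead of a real batch.

```python
import time
from typing import Callable, Iterable


def wait_for_completion(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("completed", "failed"),  # assumed terminal states
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Call `get_status` until it returns a terminal state or `timeout` seconds pass."""
    done = set(done_states)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a real status call: finishes on the third poll.
_states = iter(["pending", "running", "completed"])
final = wait_for_completion(lambda: next(_states), poll_interval=0.01, timeout=5)
print(final)  # completed
```

With the batch from the example above, you could pass the method itself, as in `wait_for_completion(batch.get_status)`, and call `batch.get_results()` once it returns. If `get_status()` returns something richer than a string, wrap it in a small function that extracts the state.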
Coming Soon
Detailed documentation about evaluations is currently being developed. Check back soon for:
- Advanced evaluation options
- Custom result formatting
- Evaluation performance optimization
- Result analysis techniques