Evaluate a Score

Learn how to run evaluations using individual scores or complete scorecards.

Running an Evaluation

An evaluation analyzes your source content against the criteria defined by a single score or by an entire scorecard, and returns a result for each criterion evaluated.

Using the Dashboard

1. Select your source content.
2. Choose a scorecard or an individual score.
3. Click "Run Evaluation".
4. Monitor the evaluation progress.
5. Review the results.

Using the SDK

from plexus import Plexus

plexus = Plexus(api_key="your-api-key")

# Evaluate using a specific score (accepts name, key, ID, or external ID)
evaluation = plexus.evaluations.create(
    source_id="source-id",
    score="Grammar Check"
)

# Or evaluate using an entire scorecard (accepts name, key, ID, or external ID)
evaluation = plexus.evaluations.create(
    source_id="source-id",
    scorecard="Content Quality"
)

# Get evaluation results
results = evaluation.get_results()

# Print score values
for score in results.scores:
    print(f"{score.name}: {score.value}")

The score and scorecard arguments each accept a name, key, ID, or external ID, so you can reference a score or scorecard by whichever identifier you have on hand.
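To make the idea of interchangeable identifiers concrete, here is a minimal, purely illustrative sketch of how one record can be reached through any of its four identifiers. It is not the Plexus implementation: the ScorecardRecord class, the resolve function, the lookup order, and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScorecardRecord:
    """Hypothetical stand-in for a stored scorecard; not a Plexus class."""
    id: str
    name: str
    key: str
    external_id: str


def resolve(identifier: str, records: list[ScorecardRecord]) -> Optional[ScorecardRecord]:
    """Return the first record whose ID, key, external ID, or name equals `identifier`."""
    # The order in which fields are checked is an assumption for this sketch.
    for field_name in ("id", "key", "external_id", "name"):
        for record in records:
            if getattr(record, field_name) == identifier:
                return record
    return None


# Made-up sample records, for illustration only.
records = [
    ScorecardRecord(id="sc-001", name="Content Quality",
                    key="content-quality", external_id="ext-42"),
    ScorecardRecord(id="sc-002", name="Quality Assurance",
                    key="quality-assurance", external_id="ext-43"),
]

# All four identifiers lead to the same record.
for identifier in ("sc-001", "Content Quality", "content-quality", "ext-42"):
    match = resolve(identifier, records)
    print(identifier, "->", match.id if match else None)
```

In practice the SDK performs this resolution for you; the sketch only shows why passing a name and passing an ID can select the same scorecard.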

Using the CLI

# Run an accuracy evaluation of a scorecard on 100 samples
plexus evaluate accuracy --scorecard "Content Quality" --number-of-samples 100

# List evaluation results
plexus evaluations list

# View detailed results for a specific evaluation
plexus evaluations list-results --evaluation evaluation-id

The --scorecard option likewise accepts a name, key, ID, or external ID.

Understanding Results

Score Values

Numerical or categorical results for each evaluated criterion.

Explanations

Detailed reasoning behind each score's evaluation result.

Suggestions

Recommendations for improvement based on the evaluation results.
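The three parts of a result described above can be pictured as one small record per score. The sketch below is an assumption about shape, not the SDK's actual result type: only name and value appear in the SDK example on this page, while the explanation and suggestions field names, the ScoreResult class, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ScoreResult:
    """Hypothetical container for one score's result; not a Plexus class."""
    name: str
    value: Union[float, str]          # numerical or categorical
    explanation: str = ""             # assumed field name
    suggestions: list[str] = field(default_factory=list)  # assumed field name


def format_result(result: ScoreResult) -> str:
    """Render a result as indented plain text, skipping empty parts."""
    lines = [f"{result.name}: {result.value}"]
    if result.explanation:
        lines.append(f"  Why: {result.explanation}")
    for suggestion in result.suggestions:
        lines.append(f"  Suggestion: {suggestion}")
    return "\n".join(lines)


# Made-up sample result, for illustration only.
sample = ScoreResult(
    name="Grammar Check",
    value=0.8,
    explanation="Two sentences contain subject-verb disagreement.",
    suggestions=["Revise the second paragraph."],
)
print(format_result(sample))
```

Keeping the value typed as either a number or a string reflects that a score may be numerical or categorical; code that consumes results should handle both.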

Batch Evaluations

You can evaluate multiple sources at once using batch processing:

# Create a batch evaluation
batch = plexus.evaluations.create_batch(
    source_ids=["source-1", "source-2", "source-3"],
    scorecard="Quality Assurance"  # Can use name, key, ID, or external ID
)

# Monitor batch progress
status = batch.get_status()

# Get results when complete
results = batch.get_results()

Batch evaluations accept the same identifier types for scorecards and scores as individual evaluations: name, key, ID, or external ID.
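Because a batch may take a while to finish, the status check above is typically repeated until the batch reaches a final state. The following is a minimal polling sketch under stated assumptions: it assumes get_status() returns a plain string, and the status names "COMPLETED" and "FAILED" are invented for the example, so check the values your SDK version actually returns. The wait_for_batch helper is hypothetical and not part of the SDK.

```python
import time
from typing import Callable

# Assumed terminal status names; the real SDK may use different values.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_batch(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Demonstration with a fake status source instead of a real batch.
_fake_statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
final = wait_for_batch(lambda: next(_fake_statuses), poll_interval=0.0)
print(final)  # COMPLETED
```

With a real batch object, and if the assumptions above hold, the call would be wait_for_batch(batch.get_status) followed by batch.get_results() once the returned status indicates success.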

Coming Soon

Detailed documentation about evaluations is currently being developed. Check back soon for:

Advanced evaluation options
Custom result formatting
Evaluation performance optimization
Result analysis techniques