Score Results

Score Results record the outcomes of scoring items against scores in a scorecard, providing detailed information about the evaluation process and results.

What are Score Results?

A Score Result is a record created when an item is evaluated against one or more scores in a scorecard. It captures the outcome of the evaluation along with contextual information about how the evaluation was performed and what data was used.

Core Components

Each Score Result contains these essential components:

Value: The actual result of the evaluation (e.g., "yes"/"no", a numeric score, or a category)
Confidence: For applicable scores, indicates how certain the system is about the result
Correct: A boolean indicating whether the result matches the expected outcome (for labeled data)
Explanation: A detailed description of why this result was chosen, providing transparency into the decision-making process
Metadata: Contextual information including the inputs used for evaluation and other relevant data
Trace: A detailed record of the evaluation process, including intermediate steps and decisions
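
To make the components above concrete, here is a minimal sketch of how a Score Result could be modeled in Python. The class name, field names, and types are assumptions made for illustration and are not the actual Plexus schema. The helper shows one way the Correct flag might be used to compute accuracy over labeled results.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class ScoreResult:
    """Hypothetical in-memory shape of a Score Result (not the Plexus schema)."""
    value: Union[str, float]                 # e.g. "yes"/"no", a number, or a category
    confidence: Optional[float] = None       # only set for scores that report it
    correct: Optional[bool] = None           # only set when a label exists
    explanation: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)
    trace: List[Dict[str, Any]] = field(default_factory=list)


def accuracy(results: List[ScoreResult]) -> Optional[float]:
    """Fraction of labeled results marked correct; None if nothing is labeled."""
    labeled = [r for r in results if r.correct is not None]
    if not labeled:
        return None
    return sum(1 for r in labeled if r.correct) / len(labeled)


# Illustrative data only
results = [
    ScoreResult(value="yes", confidence=0.92, correct=True),
    ScoreResult(value="no", confidence=0.61, correct=False),
    ScoreResult(value="yes"),  # unlabeled, so excluded from accuracy
]
```

Results without a label are left out of the accuracy calculation rather than counted as incorrect, which keeps unlabeled production data from skewing the figure.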

Relationships

Score Results are connected to several other entities in the Plexus system:

Item: The content being evaluated (e.g., a conversation transcript, document, or other data)
Score: The specific evaluation criteria being applied
Scorecard: The collection of scores that the individual score belongs to
Scoring Job: The process that generated this result (may be part of a larger evaluation)
Evaluation: The broader evaluation process that may include multiple scoring jobs and results
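
One way to picture these relationships is as identifiers carried on each result. The sketch below assumes hypothetical key names (item_id, score_id, scorecard_id) and uses plain dictionaries in place of real Plexus records. It groups results by item, similar in spirit to looking up every result for one item.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Illustrative records; the *_id key names are assumptions, not Plexus fields
results = [
    {"item_id": "item-1", "score_id": "score-a", "scorecard_id": "sc-1", "value": "yes"},
    {"item_id": "item-1", "score_id": "score-b", "scorecard_id": "sc-1", "value": "no"},
    {"item_id": "item-2", "score_id": "score-a", "scorecard_id": "sc-1", "value": "yes"},
]


def results_by_item(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group result records under the item they were produced for."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        grouped[record["item_id"]].append(record)
    return dict(grouped)


by_item = results_by_item(results)
```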

Understanding Score Result Data

Score Results provide rich information that can be used for analysis, debugging, and improving your evaluation processes.

Metadata

The metadata field contains contextual information about the evaluation, which may include:

Input Data: The specific content that was evaluated
Human Labels: For labeled data, the expected outcomes provided by human reviewers
Session Information: Identifiers for the evaluation session or batch
Configuration Details: Specific settings used for this evaluation

Metadata is valuable for understanding the context of each result and for filtering or grouping results during analysis.
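
As a rough illustration of filtering and grouping by metadata, the sketch below works on plain dictionaries. The metadata keys shown (session_id) and the record layout are assumptions for the example, since the actual contents of the metadata field depend on how the evaluation was run.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Illustrative records; "session_id" is a hypothetical metadata key
results = [
    {"value": "yes", "metadata": {"session_id": "batch-1"}},
    {"value": "no", "metadata": {"session_id": "batch-1"}},
    {"value": "yes", "metadata": {"session_id": "batch-2"}},
    {"value": "yes", "metadata": {}},
]


def filter_by_metadata(records: List[Dict[str, Any]], key: str, expected: Any) -> List[Dict[str, Any]]:
    """Keep only records whose metadata[key] equals the expected value."""
    return [r for r in records if r.get("metadata", {}).get(key) == expected]


def count_values_by(records: List[Dict[str, Any]], key: str) -> Counter:
    """Count (metadata group, value) pairs; records missing the key go to 'unknown'."""
    counts: Counter = Counter()
    for r in records:
        group = r.get("metadata", {}).get(key, "unknown")
        counts[(group, r["value"])] += 1
    return counts


batch_1 = filter_by_metadata(results, "session_id", "batch-1")
counts = count_values_by(results, "session_id")
```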

Trace

The trace field provides a detailed record of the evaluation process, which is especially valuable for complex evaluations like those performed by LLMs or multi-step processes. A trace may include:

Intermediate Steps: The sequence of operations performed during evaluation
LLM Prompts and Responses: For LLM-based evaluations, the exact prompts sent and responses received
Decision Points: The key choices made during the evaluation process
Timing Information: Performance metrics for different stages of the evaluation

Traces are invaluable for debugging, understanding model behavior, and improving evaluation processes over time.
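
When debugging slow evaluations, timing information in a trace can be summarized with a few lines of code. The sketch below assumes a trace is a list of step dictionaries with hypothetical "name" and "duration_ms" keys. The real trace structure may differ, so treat the key names as placeholders.

```python
from typing import Any, Dict, List, Optional

# Illustrative trace; step names and the "duration_ms" key are assumptions
trace = [
    {"name": "load_input", "duration_ms": 12},
    {"name": "llm_call", "duration_ms": 840},
    {"name": "parse_response", "duration_ms": 5},
    {"name": "decision"},  # a step with no timing recorded
]


def total_duration_ms(steps: List[Dict[str, Any]]) -> int:
    """Sum the recorded durations, treating untimed steps as zero."""
    return sum(step.get("duration_ms", 0) for step in steps)


def slowest_step(steps: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the timed step with the largest duration, or None if none are timed."""
    timed = [step for step in steps if "duration_ms" in step]
    if not timed:
        return None
    return max(timed, key=lambda step: step["duration_ms"])


total = total_duration_ms(trace)
slowest = slowest_step(trace)
```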

Working with Score Results

Plexus provides two ways to work with Score Results: the dashboard interface and the CLI.

Viewing Results in the Dashboard

The Plexus dashboard provides a user-friendly interface for viewing and analyzing Score Results:

Item Detail View: See all Score Results for a specific item
Evaluation Results: View aggregated results from evaluation runs
Scoring Job Details: Examine the results of individual scoring jobs

Using the CLI

The Plexus CLI provides commands for listing and inspecting Score Results:

# List recent score results for a specific scorecard
plexus results list --scorecard "Example Scorecard" --limit 20

# List recent score results for a specific account
plexus results list --account "Example Account" --limit 20

# Get detailed information about a specific score result
plexus results info --id "result-id-here"

These commands provide detailed views of Score Results, including pretty-printed metadata and trace information for in-depth analysis.
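
If you want to call these commands from a script, a thin wrapper around Python's subprocess module is one option. The sketch below uses only the subcommands and flags shown above. The function names are made up for this example, the output is returned as raw text because its format is not specified here, and whether --scorecard and --account can be combined in one call is not confirmed.

```python
import subprocess
from typing import List, Optional


def results_list_command(scorecard: Optional[str] = None,
                         account: Optional[str] = None,
                         limit: int = 20) -> List[str]:
    """Build the argument list for `plexus results list` (hypothetical helper)."""
    cmd = ["plexus", "results", "list"]
    if scorecard is not None:
        cmd += ["--scorecard", scorecard]
    if account is not None:
        cmd += ["--account", account]
    cmd += ["--limit", str(limit)]
    return cmd


def results_info_command(result_id: str) -> List[str]:
    """Build the argument list for `plexus results info` (hypothetical helper)."""
    return ["plexus", "results", "info", "--id", result_id]


def run(cmd: List[str]) -> str:
    """Run the command and return its standard output as text.

    Raises subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Example (requires the plexus CLI to be installed and configured):
# print(run(results_list_command(scorecard="Example Scorecard")))
```

Passing arguments as a list avoids shell quoting problems with names that contain spaces, such as "Example Scorecard".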