Plexus CLI Tool

Master the command-line interface for managing your Plexus deployment.

Overview

The Plexus CLI provides a command-line interface for managing your Plexus deployment, with a focus on evaluating and monitoring scorecard performance.

Installation

Install the Plexus CLI tool using pip:

pip install plexus-cli

Flexible Identifier System

The Plexus CLI uses a flexible identifier system that allows you to reference resources using different types of identifiers. This makes commands more intuitive and reduces the need to look up specific IDs.

Scorecard Identifiers

When using the --scorecard parameter, you can provide any of the following:

  • DynamoDB ID: The unique database identifier (e.g., e51cd5ec-1940-4d8e-abcc-faa851390112)
  • Name: The human-readable name (e.g., "Quality Assurance")
  • Key: The URL-friendly key (e.g., quality-assurance)
  • External ID: Your custom external identifier (e.g., qa-2023)

Examples:

# All of these commands do the same thing, using different identifier types
plexus scorecards info --scorecard e51cd5ec-1940-4d8e-abcc-faa851390112
plexus scorecards info --scorecard "Quality Assurance"
plexus scorecards info --scorecard quality-assurance
plexus scorecards info --scorecard qa-2023

Score Identifiers

Similar to scorecards, scores can be referenced using various identifiers:

  • DynamoDB ID: The unique UUID assigned to the score
  • Name: The human-readable name of the score
  • Key: The machine-friendly key of the score
  • External ID: An optional external identifier for the score

When using the --score parameter, you can use any of these identifiers:

# Using DynamoDB ID
plexus scores info --scorecard "Quality Assurance" --score 7a9b2c3d-4e5f-6g7h-8i9j-0k1l2m3n4o5p

# Using Name (with quotes for names containing spaces)
plexus scores info --scorecard "Quality Assurance" --score "Grammar Check"

# Using Key
plexus scores info --scorecard "Quality Assurance" --score grammar-check

# Using External ID
plexus scores info --scorecard "Quality Assurance" --score gc-001

# Combining different identifier types for scorecard and score
plexus scores info --scorecard quality_assurance --score "Grammar Check"

The flexible identifier system makes it easy to reference scores in a way that's most convenient for your workflow. You can use different identifier types for the scorecard and score in the same command.

Account Identifiers

When using the --account parameter, you can provide any of the following:

  • DynamoDB ID: The unique database identifier
  • Name: The human-readable name
  • Key: The URL-friendly key
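For instance, the results list command described later in this document accepts --account, and any of the three identifier forms works. A sketch follows, using placeholder account values and a dry-run helper that only prints each command rather than executing it (swap the echo for real execution once the CLI is installed and configured):

```shell
# Dry-run helper: prints each command instead of executing it, so this
# sketch is safe to paste before the plexus CLI is installed/configured.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute

# The same account referenced three ways (all values are placeholders):
run plexus results list --account 0f8e7d6c-5b4a-4c3d-9e8f-1a2b3c4d5e6f --limit 10  # DynamoDB ID
run plexus results list --account "Example Account" --limit 10                     # Name
run plexus results list --account example-account --limit 10                       # Key
```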

Common Scorecard Commands

Here are some common commands for managing scorecards:

# List all scorecards
plexus scorecards list

# Get detailed information about a specific scorecard
plexus scorecards info --scorecard example1

# List all scores in a scorecard
plexus scores list --scorecard example1

# Pull scorecard configuration to YAML
plexus scorecards pull --scorecard example1 --output ./my-scorecards

# Push scorecard configuration from YAML
plexus scorecards push --scorecard example1 --file ./my-scorecard.yaml --note "Updated configuration"

# Delete a scorecard
plexus scorecards delete --scorecard example1
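A typical edit cycle chains pull and push together. The sketch below uses only the flags documented above; the local paths and the name of the pulled YAML file are assumptions, and the dry-run helper only prints each command instead of executing it:

```shell
# Dry-run helper: prints each command instead of executing it.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute

# 1. Pull the current configuration to a local YAML file
run plexus scorecards pull --scorecard example1 --output ./my-scorecards

# 2. Edit the pulled YAML with your editor of choice, then...

# 3. Push it back with a note describing the change
#    (the pulled filename below is a placeholder)
run plexus scorecards push --scorecard example1 --file ./my-scorecards/example1.yaml --note "Tightened grammar prompt"
```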

Score Management Commands

The CLI provides commands for managing and viewing information about scores.

Viewing Score Information

The scores info command displays detailed information about a specific score, including its versions:

plexus scores info --scorecard "Example Scorecard" --score "Example Score"

This command provides:

  • Score Details: Name, key, external ID, type, and order
  • Scorecard Information: Name, key, external ID, and section
  • Score Versions: Up to 10 versions in reverse chronological order (newest first)

Version Information

For each version, the command displays:

  • Version ID: Unique identifier for the version
  • Creation Date: When the version was created
  • Note: Any notes associated with the version (if available)
  • Configuration: The first 4 lines of the version's configuration
  • Status Indicators: Whether the version is the Champion (active) or Featured

Example output:

Score Information:
  Name: Grammar Check
  Key: grammar-check
  External ID: 123
  Type: LangGraphScore
  Order: 1

Scorecard Information:
  Name: Quality Assurance
  Key: quality_assurance
  External ID: 456
  Section: Default

Score Versions (3 of 3 total versions, newest first):

  Version: 7a9b2c3d-4e5f-6g7h-8i9j-0k1l2m3n4o5p
  Created: 2023-10-15 14:30:45
  Note: Updated prompt for better accuracy
  Configuration:
    {
      "prompt": "Evaluate the grammar of the following text...",
      "model": "gpt-4",
      // ... configuration continues ...
    }
  [Champion]

  Version: 8b9c3d4e-5f6g-7h8i-9j0k-1l2m3n4o5p6q
  Created: 2023-09-20 09:15:22
  Configuration:
    {
      "prompt": "Check the following text for grammar errors...",
      "model": "gpt-3.5-turbo",
      // ... configuration continues ...
    }
  [Featured]

  Version: 9c0d4e5f-6g7h-8i9j-0k1l-2m3n4o5p6q7r
  Created: 2023-08-05 11:45:33
  Configuration:
    {
      "prompt": "Analyze the grammar in this content...",
      "model": "gpt-3.5-turbo",
      // ... configuration continues ...
    }

This command displays up to 10 versions in reverse chronological order (newest first), showing which version is the champion and which versions are featured.

Listing Scores in a Scorecard

The scores list command displays all scores within a scorecard:

plexus scores list --scorecard "Example Scorecard"

# You can also use the score alias (singular form)
plexus score list --scorecard "Example Scorecard"

This command provides a detailed view of all scores organized by section, including:

  • Score Names: The human-readable names of each score
  • Score IDs: The unique identifiers for each score
  • Score Keys: The machine-friendly keys for each score
  • External IDs: Any external identifiers associated with the scores

Running Evaluations

The primary way to evaluate your scorecard's performance is using the evaluate accuracy command:

plexus \
  evaluate \
  accuracy \
  --scorecard "Inbound Leads" \
  --number-of-samples 100 \
  --visualize

  • --scorecard: Scorecard to evaluate (accepts ID, name, key, or external ID)
  • --number-of-samples: Number of samples to evaluate (recommended: 100+)
  • --visualize: Generate visualizations of the results

This command will evaluate your scorecard against labeled samples and provide detailed accuracy metrics, including precision, recall, and confusion matrices when visualization is enabled.
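Putting the pieces together, an evaluate-then-review loop can chain this command with the evaluation commands covered in the next section. A sketch, using the documented flags; the scorecard name and evaluation-id are placeholders, and the dry-run helper only prints each command instead of executing it:

```shell
# Dry-run helper: prints each command instead of executing it.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute

# 1. Run an accuracy evaluation with visualizations
run plexus evaluate accuracy --scorecard "Inbound Leads" --number-of-samples 100 --visualize

# 2. Find the evaluation record that was just created
run plexus evaluations list

# 3. Drill into its individual results (evaluation-id is a placeholder)
run plexus evaluations list-results --evaluation evaluation-id --limit 100
```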

Viewing Evaluation Results

After running evaluations, you can view the results:

# List all evaluation records
plexus \
  evaluations \
  list

# View detailed results
plexus \
  evaluations \
  list-results \
  --evaluation evaluation-id \
  --limit 100

The results include accuracy metrics, individual predictions, and any visualizations that were generated during the evaluation.

Score Result Commands

The CLI provides commands for viewing and analyzing individual score results:

Listing Score Results

The results list command displays recent score results with optional filtering:

# List score results for a specific scorecard
plexus results list --scorecard "Example Scorecard" --limit 20

# List score results for a specific account
plexus results list --account "Example Account" --limit 20

This command requires either a scorecard or account identifier and provides:

  • Basic Information: ID, value, confidence, correct status, and related IDs
  • Timestamps: When the result was created and last updated
  • Metadata: Pretty-printed JSON showing input data and context
  • Trace: Detailed record of the evaluation process (when available)
  • Explanation: The reasoning behind the result (when available)

Viewing Detailed Score Result Information

The results info command displays detailed information about a specific score result:

plexus results info --id "result-id-here"

This command provides a comprehensive view of a single score result, including:

  • Complete Result Data: All fields and values associated with the result
  • Formatted Metadata: Nicely formatted JSON for easy reading
  • Formatted Trace: Detailed execution trace with clear visual separation
  • Relationship Information: Links to related entities like items, scorecards, and evaluations

This command is particularly useful for debugging evaluation issues or understanding exactly how a specific result was determined.

Additional Resources

For more detailed information about specific features:

  • Visit our Evaluations Guide
  • Check the built-in help with plexus --help
  • Get command-specific help with plexus evaluate accuracy --help