Plexus CLI Tool

Master the command-line interface for managing your Plexus deployment.

Overview

The Plexus CLI tool provides a powerful command-line interface for managing your Plexus deployment, with a focus on evaluating and monitoring scorecard performance.

Installation

Install the Plexus CLI tool using pip:

pip install plexus-cli

Flexible Identifier System

The Plexus CLI uses a flexible identifier system that allows you to reference resources using different types of identifiers. This makes commands more intuitive and reduces the need to look up specific IDs.

Scorecard Identifiers

When using the --scorecard parameter, you can provide any of the following:

  • DynamoDB ID: The unique database identifier (e.g., e51cd5ec-1940-4d8e-abcc-faa851390112)
  • Name: The human-readable name (e.g., "Quality Assurance")
  • Key: The URL-friendly key (e.g., quality-assurance)
  • External ID: Your custom external identifier (e.g., qa-2023)

Examples:

# All of these commands do the same thing, using different identifier types
plexus scorecards info --scorecard e51cd5ec-1940-4d8e-abcc-faa851390112
plexus scorecards info --scorecard "Quality Assurance"
plexus scorecards info --scorecard quality-assurance
plexus scorecards info --scorecard qa-2023

Score Identifiers

Similar to scorecards, scores can be referenced using various identifiers:

  • DynamoDB ID: The unique UUID assigned to the score
  • Name: The human-readable name of the score
  • Key: The machine-friendly key of the score
  • External ID: An optional external identifier for the score

When using the --score parameter, you can use any of these identifiers:

# Using DynamoDB ID
plexus scores info --scorecard "Quality Assurance" --score 7a9b2c3d-4e5f-6g7h-8i9j-0k1l2m3n4o5p

# Using Name (with quotes for names containing spaces)
plexus scores info --scorecard "Quality Assurance" --score "Grammar Check"

# Using Key
plexus scores info --scorecard "Quality Assurance" --score grammar-check

# Using External ID
plexus scores info --scorecard "Quality Assurance" --score gc-001

# Combining different identifier types for scorecard and score
plexus scores info --scorecard quality-assurance --score "Grammar Check"

The flexible identifier system makes it easy to reference scores in a way that's most convenient for your workflow. You can use different identifier types for the scorecard and score in the same command.
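To make the resolution behavior concrete, here is a minimal Python sketch of how an identifier value might be classified before lookup. This is purely illustrative: the function name, the regular expressions, and the classification order are assumptions, not the CLI's actual implementation (note that an external ID like qa-2023 is indistinguishable from a key by shape alone, so a real resolver would fall back across lookups).

```python
import re

# Hypothetical sketch of flexible-identifier classification.
# The patterns and the "key vs. name" heuristic are assumptions for
# illustration only, not the Plexus CLI's actual resolution logic.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

def classify_identifier(value: str) -> str:
    """Guess which kind of identifier a --scorecard/--score value is."""
    if UUID_RE.match(value):
        return "id"  # looks like a DynamoDB UUID
    if re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", value):
        return "key"  # URL-friendly key (could also be an external ID)
    return "name_or_external_id"  # anything else: try name, then external ID

print(classify_identifier("e51cd5ec-1940-4d8e-abcc-faa851390112"))  # id
print(classify_identifier("quality-assurance"))                     # key
print(classify_identifier("Quality Assurance"))                     # name_or_external_id
```

A resolver built on such a heuristic would still need to try the remaining lookups on a miss, which is why the same command accepts all four identifier types interchangeably.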

Account Identifiers

When using the --account parameter, you can provide any of the following:

  • DynamoDB ID: The unique database identifier
  • Name: The human-readable name
  • Key: The URL-friendly key

Common Scorecard Commands

Here are some common commands for managing scorecards:

# List all scorecards
plexus scorecards list

# Get detailed information about a specific scorecard
plexus scorecards info --scorecard example1

# List all scores in a scorecard
plexus scores list --scorecard example1

# Pull scorecard configuration to YAML
plexus scorecards pull --scorecard example1 --output ./my-scorecards

# Push scorecard configuration from YAML
plexus scorecards push --scorecard example1 --file ./my-scorecard.yaml --note "Updated configuration"

# Delete a scorecard
plexus scorecards delete --scorecard example1

Score Management Commands

The CLI provides commands for managing and viewing information about scores.

Viewing Score Information

The scores info command displays detailed information about a specific score, including its versions:

plexus scores info --scorecard "Example Scorecard" --score "Example Score"

This command provides:

  • Score Details: Name, key, external ID, type, and order
  • Scorecard Information: Name, key, external ID, and section
  • Score Versions: Up to 10 versions in reverse chronological order (newest first)

Version Information

For each version, the command displays:

  • Version ID: Unique identifier for the version
  • Creation Date: When the version was created
  • Note: Any notes associated with the version (if available)
  • Configuration: The first 4 lines of the version's configuration
  • Status Indicators: Whether the version is the Champion (active) or Featured

Example output:

Score Information:
  Name: Grammar Check
  Key: grammar-check
  External ID: 123
  Type: LangGraphScore
  Order: 1

Scorecard Information:
  Name: Quality Assurance
  Key: quality-assurance
  External ID: 456
  Section: Default

Score Versions (3 of 3 total versions, newest first):

Version: 7a9b2c3d-4e5f-6g7h-8i9j-0k1l2m3n4o5p
Created: 2023-10-15 14:30:45
Note: Updated prompt for better accuracy
Configuration:
  {
    "prompt": "Evaluate the grammar of the following text...",
    "model": "gpt-4",
    // ... configuration continues ...
  }
[Champion]

Version: 8b9c3d4e-5f6g-7h8i-9j0k-1l2m3n4o5p6q
Created: 2023-09-20 09:15:22
Configuration:
  {
    "prompt": "Check the following text for grammar errors...",
    "model": "gpt-3.5-turbo",
    // ... configuration continues ...
  }
[Featured]

Version: 9c0d4e5f-6g7h-8i9j-0k1l-2m3n4o5p6q7r
Created: 2023-08-05 11:45:33
Configuration:
  {
    "prompt": "Analyze the grammar in this content...",
    "model": "gpt-3.5-turbo",
    // ... configuration continues ...
  }

This command displays up to 10 versions in reverse chronological order (newest first), showing which version is the champion and which versions are featured.
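The "newest first, at most 10" ordering is straightforward to sketch. The record fields and the `newest_first` helper below are hypothetical, chosen only to mirror the example output above; the CLI's real data model may differ.

```python
from datetime import datetime

# Hypothetical version records mirroring the example output above.
versions = [
    {"id": "v-aug", "created": "2023-08-05 11:45:33", "champion": False, "featured": False},
    {"id": "v-sep", "created": "2023-09-20 09:15:22", "champion": False, "featured": True},
    {"id": "v-oct", "created": "2023-10-15 14:30:45", "champion": True,  "featured": False},
]

def newest_first(versions, limit=10):
    """Sort versions reverse-chronologically and keep at most `limit` of them."""
    parse = lambda v: datetime.strptime(v["created"], "%Y-%m-%d %H:%M:%S")
    return sorted(versions, key=parse, reverse=True)[:limit]

for v in newest_first(versions):
    tag = "[Champion]" if v["champion"] else ("[Featured]" if v["featured"] else "")
    print(v["id"], v["created"], tag)
```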

Listing Scores in a Scorecard

The scores list command displays all scores within a scorecard:

plexus scores list --scorecard "Example Scorecard"

# You can also use the score alias (singular form)
plexus score list --scorecard "Example Scorecard"

This command provides a detailed view of all scores organized by section, including:

  • Score Names: The human-readable names of each score
  • Score IDs: The unique identifiers for each score
  • Score Keys: The machine-friendly keys for each score
  • External IDs: Any external identifiers associated with the scores
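The section-grouped display can be sketched with a simple grouping pass. The score records and field names here are hypothetical, used only to illustrate the shape of the output; they are not taken from the CLI's source.

```python
from collections import defaultdict

# Hypothetical score records; fields mirror the list described above.
scores = [
    {"name": "Grammar Check", "key": "grammar-check", "section": "Default"},
    {"name": "Tone Check",    "key": "tone-check",    "section": "Default"},
    {"name": "Compliance",    "key": "compliance",    "section": "Legal"},
]

def group_by_section(scores):
    """Group score records by their section name for display."""
    grouped = defaultdict(list)
    for score in scores:
        grouped[score["section"]].append(score)
    return dict(grouped)

for section, items in group_by_section(scores).items():
    print(section)
    for s in items:
        print(f"  {s['name']} ({s['key']})")
```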

Running Evaluations

The primary way to evaluate your scorecard's performance is using the evaluate accuracy command:

plexus \
  evaluate \
  accuracy \
  --scorecard "Inbound Leads" \
  --number-of-samples 100 \
  --visualize

  • --scorecard: Scorecard to evaluate (accepts ID, name, key, or external ID)
  • --number-of-samples: Number of samples to evaluate (recommended: 100+)
  • --visualize: Generate visualizations of the results

This command will evaluate your scorecard against labeled samples and provide detailed accuracy metrics, including precision, recall, and confusion matrices when visualization is enabled.
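For intuition about what those metrics mean, here is a minimal sketch that computes accuracy, precision, recall, and a confusion matrix from hypothetical (predicted, actual) label pairs for a binary score. The function and data are illustrative only; the CLI computes these server-side from your labeled samples.

```python
# Standard binary-classification metrics, computed from hypothetical
# (predicted, actual) pairs. Illustrative only; not the CLI's code.
def evaluate_pairs(pairs, positive="Yes"):
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    tn = sum(1 for p, a in pairs if p != positive and a != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }

pairs = [("Yes", "Yes"), ("Yes", "No"), ("No", "No"), ("No", "Yes"), ("Yes", "Yes")]
metrics = evaluate_pairs(pairs)
print(metrics["accuracy"])  # 0.6
```

A larger sample size (hence the 100+ recommendation) narrows the confidence interval around each of these numbers.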

Viewing Evaluation Results

After running evaluations, you can view the results:

# List all evaluation records
plexus \
  evaluations \
  list

# View detailed results
plexus \
  evaluations \
  list-results \
  --evaluation evaluation-id \
  --limit 100

The results include accuracy metrics, individual predictions, and any visualizations that were generated during the evaluation.

Score Result Commands

The CLI provides commands for viewing and analyzing individual score results:

Listing Score Results

The results list command displays recent score results with optional filtering:

# List score results for a specific scorecard
plexus results list --scorecard "Example Scorecard" --limit 20

# List score results for a specific account
plexus results list --account "Example Account" --limit 20

This command requires either a scorecard or account identifier and provides:

  • Basic Information: ID, value, confidence, correct status, and related IDs
  • Timestamps: When the result was created and last updated
  • Metadata: Pretty-printed JSON showing input data and context
  • Trace: Detailed record of the evaluation process (when available)
  • Explanation: The reasoning behind the result (when available)
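The "pretty-printed JSON" behavior amounts to parsing the stored metadata string and re-serializing it with indentation. The sample metadata below is invented for illustration; only the `json` round-trip technique is being shown, not the CLI's actual formatting code.

```python
import json

# Hypothetical raw metadata string as it might be stored on a score result.
raw_metadata = '{"input": {"text": "Hello world"}, "context": {"source": "call-123"}}'

def pretty_metadata(raw: str) -> str:
    """Parse a metadata JSON string and re-serialize it with indentation."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)

print(pretty_metadata(raw_metadata))
```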

Viewing Detailed Score Result Information

The results info command displays detailed information about a specific score result:

plexus results info --id "result-id-here"

This command provides a comprehensive view of a single score result, including:

  • Complete Result Data: All fields and values associated with the result
  • Formatted Metadata: Nicely formatted JSON for easy reading
  • Formatted Trace: Detailed execution trace with clear visual separation
  • Relationship Information: Links to related entities like items, scorecards, and evaluations

This command is particularly useful for debugging evaluation issues or understanding exactly how a specific result was determined.

Report Commands

Manage report configurations and generated reports using the following commands. If you are working on the codebase locally, run commands from your project root as `python -m plexus.cli.CommandLineInterface ...` to avoid conflicts with a globally installed version.

Report Configuration Commands

# List available report configurations for your account
python -m plexus.cli.CommandLineInterface report config list

# Show details of a specific report configuration (using ID or Name)
# Note: Uses the flexible identifier system (a UUID-like value is tried as an ID first, then as a Name; otherwise as a Name first, then an ID)
python -m plexus.cli.CommandLineInterface report config show <id_or_name>

# Create a new report configuration from a Markdown/YAML file
python -m plexus.cli.CommandLineInterface report config create --name "My Report Config" --file ./path/to/config.md [--description "Optional description"]

# Delete a report configuration (prompts for confirmation)
python -m plexus.cli.CommandLineInterface report config delete <id_or_name>

# Delete a report configuration (skip confirmation prompt)
python -m plexus.cli.CommandLineInterface report config delete <id_or_name> --yes

Report Generation and Viewing Commands

# Trigger a new report generation task based on a configuration (using ID or Name for config)
python -m plexus.cli.CommandLineInterface report run --config <config_id_or_name> [param1=value1 param2=value2 ...]

# List generated reports, optionally filtered by configuration (using ID or Name for config filter)
# Shows Report ID, Name, Config ID, Task ID, and Task Status
python -m plexus.cli.CommandLineInterface report list [--config <config_id_or_name>]

# Show details of a specific generated report (using ID or Name)
# Includes Report details, linked Task status/details, rendered output, and Report Block summary
python -m plexus.cli.CommandLineInterface report show <report_id_or_name>

# Show details of the most recently created report
python -m plexus.cli.CommandLineInterface report last

Report Block Inspection Commands

# List the analysis blocks for a specific report (requires Report ID)
python -m plexus.cli.CommandLineInterface report block list <report_id>

# Show details of a specific block within a report (requires Report ID and block position or name)
# Displays block details, output JSON (syntax highlighted), and logs
python -m plexus.cli.CommandLineInterface report block show <report_id> <block_position_or_name>

Additional Resources

For more detailed information about specific features:

  • Visit our Evaluations Guide
  • Check the built-in help with plexus --help
  • Get command-specific help with plexus evaluate accuracy --help