Plexus CLI Tool

Master the command-line interface for managing your Plexus deployment.

Overview

The Plexus CLI provides a command-line interface for managing your Plexus deployment, with a focus on evaluating and monitoring scorecard performance.

Installation

Install the Plexus CLI tool using pip:

pip install plexus-cli

Flexible Identifier System

The Plexus CLI uses a flexible identifier system that allows you to reference resources using different types of identifiers. This makes commands more intuitive and reduces the need to look up specific IDs.

Scorecard Identifiers

When using the --scorecard parameter, you can provide any of the following:

  • DynamoDB ID: The unique database identifier (e.g., e51cd5ec-1940-4d8e-abcc-faa851390112)
  • Name: The human-readable name (e.g., "Quality Assurance")
  • Key: The URL-friendly key (e.g., quality-assurance)
  • External ID: Your custom external identifier (e.g., qa-2023)

Examples:

# All of these commands do the same thing, using different identifier types
plexus scorecards info --scorecard e51cd5ec-1940-4d8e-abcc-faa851390112
plexus scorecards info --scorecard "Quality Assurance"
plexus scorecards info --scorecard quality-assurance
plexus scorecards info --scorecard qa-2023

Score Identifiers

Similar to scorecards, scores can be referenced using various identifiers:

  • DynamoDB ID: The unique UUID assigned to the score
  • Name: The human-readable name of the score
  • Key: The machine-friendly key of the score
  • External ID: An optional external identifier for the score

When using the --score parameter, you can use any of these identifiers:

# Using DynamoDB ID
plexus scores info --scorecard "Quality Assurance" --score 7a9b2c3d-4e5f-6g7h-8i9j-0k1l2m3n4o5p

# Using Name (with quotes for names containing spaces)
plexus scores info --scorecard "Quality Assurance" --score "Grammar Check"

# Using Key
plexus scores info --scorecard "Quality Assurance" --score grammar-check

# Using External ID
plexus scores info --scorecard "Quality Assurance" --score gc-001

# Combining different identifier types for scorecard and score
plexus scores info --scorecard quality_assurance --score "Grammar Check"

The flexible identifier system makes it easy to reference scores in a way that's most convenient for your workflow. You can use different identifier types for the scorecard and score in the same command.

Account Identifiers

When using the --account parameter, you can provide any of the following:

  • DynamoDB ID: The unique database identifier
  • Name: The human-readable name
  • Key: The URL-friendly key
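For instance, the results list command described later in this document accepts --account, and any of the three identifier forms works. A sketch follows, using placeholder account values and a dry-run helper that only prints each command rather than executing it (swap the echo for real execution once the CLI is installed and configured):

```shell
# Dry-run helper: prints each command instead of executing it, so this
# sketch is safe to paste before the plexus CLI is installed/configured.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute

# The same account referenced three ways (all values are placeholders):
run plexus results list --account 0f8e7d6c-5b4a-4c3d-9e8f-1a2b3c4d5e6f --limit 10  # DynamoDB ID
run plexus results list --account "Example Account" --limit 10                     # Name
run plexus results list --account example-account --limit 10                       # Key
```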

Common Scorecard Commands

Here are some common commands for managing scorecards:

# List all scorecards
plexus scorecards list

# Get detailed information about a specific scorecard
plexus scorecards info --scorecard example1

# List all scores in a scorecard
plexus scores list --scorecard example1

# Pull scorecard configuration to YAML
plexus scorecards pull --scorecard example1 --output ./my-scorecards

# Push scorecard configuration from YAML
plexus scorecards push --scorecard example1 --file ./my-scorecard.yaml --note "Updated configuration"

# Delete a scorecard
plexus scorecards delete --scorecard example1
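A typical edit cycle chains pull and push together. The sketch below uses only the flags documented above; the local paths and the name of the pulled YAML file are assumptions, and the dry-run helper only prints each command instead of executing it:

```shell
# Dry-run helper: prints each command instead of executing it.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute

# 1. Pull the current configuration to a local YAML file
run plexus scorecards pull --scorecard example1 --output ./my-scorecards

# 2. Edit the pulled YAML with your editor of choice, then...

# 3. Push it back with a note describing the change
#    (the pulled filename below is a placeholder)
run plexus scorecards push --scorecard example1 --file ./my-scorecards/example1.yaml --note "Tightened grammar prompt"
```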

Score Management Commands

The CLI provides commands for managing and viewing information about scores.

Viewing Score Information

The scores info command displays detailed information about a specific score, including its versions:

plexus scores info --scorecard "Example Scorecard" --score "Example Score"

This command provides:

  • Score Details: Name, key, external ID, type, and order
  • Scorecard Information: Name, key, external ID, and section
  • Score Versions: Up to 10 versions in reverse chronological order (newest first)

Version Information

For each version, the command displays:

  • Version ID: Unique identifier for the version
  • Creation Date: When the version was created
  • Note: Any notes associated with the version (if available)
  • Configuration: The first 4 lines of the version's configuration
  • Status Indicators: Whether the version is the Champion (active) or Featured

Example output:

Score Information:
  Name: Grammar Check
  Key: grammar-check
  External ID: 123
  Type: LangGraphScore
  Order: 1

Scorecard Information:
  Name: Quality Assurance
  Key: quality_assurance
  External ID: 456
  Section: Default

Score Versions (3 of 3 total versions, newest first):

  Version: 7a9b2c3d-4e5f-6g7h-8i9j-0k1l2m3n4o5p
  Created: 2023-10-15 14:30:45
  Note: Updated prompt for better accuracy
  Configuration:
    {
      "prompt": "Evaluate the grammar of the following text...",
      "model": "gpt-4",
      // ... configuration continues ...
    }
  [Champion]

  Version: 8b9c3d4e-5f6g-7h8i-9j0k-1l2m3n4o5p6q
  Created: 2023-09-20 09:15:22
  Configuration:
    {
      "prompt": "Check the following text for grammar errors...",
      "model": "gpt-3.5-turbo",
      // ... configuration continues ...
    }
  [Featured]

  Version: 9c0d4e5f-6g7h-8i9j-0k1l-2m3n4o5p6q7r
  Created: 2023-08-05 11:45:33
  Configuration:
    {
      "prompt": "Analyze the grammar in this content...",
      "model": "gpt-3.5-turbo",
      // ... configuration continues ...
    }

This command displays up to 10 versions in reverse chronological order (newest first), showing which version is the champion and which versions are featured.

Listing Scores in a Scorecard

The scores list command displays all scores within a scorecard:

plexus scores list --scorecard "Example Scorecard"

# You can also use the score alias (singular form)
plexus score list --scorecard "Example Scorecard"

This command provides a detailed view of all scores organized by section, including:

  • Score Names: The human-readable names of each score
  • Score IDs: The unique identifiers for each score
  • Score Keys: The machine-friendly keys for each score
  • External IDs: Any external identifiers associated with the scores

Running Evaluations

The primary way to evaluate your scorecard's performance is using the evaluate accuracy command:

plexus \
  evaluate \
  accuracy \
  --scorecard "Inbound Leads" \
  --number-of-samples 100 \
  --visualize

  • --scorecard: Scorecard to evaluate (accepts ID, name, key, or external ID)
  • --number-of-samples: Number of samples to evaluate (recommended: 100+)
  • --visualize: Generate visualizations of the results

This command will evaluate your scorecard against labeled samples and provide detailed accuracy metrics, including precision, recall, and confusion matrices when visualization is enabled.
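Putting the pieces together, an evaluate-then-review loop can chain this command with the evaluation commands covered in the next section. A sketch, using the documented flags; the scorecard name and evaluation-id are placeholders, and the dry-run helper only prints each command instead of executing it:

```shell
# Dry-run helper: prints each command instead of executing it.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute

# 1. Run an accuracy evaluation with visualizations
run plexus evaluate accuracy --scorecard "Inbound Leads" --number-of-samples 100 --visualize

# 2. Find the evaluation record that was just created
run plexus evaluations list

# 3. Drill into its individual results (evaluation-id is a placeholder)
run plexus evaluations list-results --evaluation evaluation-id --limit 100
```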

Viewing Evaluation Results

After running evaluations, you can view the results:

# List all evaluation records
plexus \
  evaluations \
  list

# View detailed results
plexus \
  evaluations \
  list-results \
  --evaluation evaluation-id \
  --limit 100

The results include accuracy metrics, individual predictions, and any visualizations that were generated during the evaluation.

Score Result Commands

The CLI provides commands for viewing and analyzing individual score results:

Listing Score Results

The results list command displays recent score results with optional filtering:

# List score results for a specific scorecard
plexus results list --scorecard "Example Scorecard" --limit 20

# List score results for a specific account
plexus results list --account "Example Account" --limit 20

This command requires either a scorecard or account identifier and provides:

  • Basic Information: ID, value, confidence, correct status, and related IDs
  • Timestamps: When the result was created and last updated
  • Metadata: Pretty-printed JSON showing input data and context
  • Trace: Detailed record of the evaluation process (when available)
  • Explanation: The reasoning behind the result (when available)

Viewing Detailed Score Result Information

The results info command displays detailed information about a specific score result:

plexus results info --id "result-id-here"

This command provides a comprehensive view of a single score result, including:

  • Complete Result Data: All fields and values associated with the result
  • Formatted Metadata: Nicely formatted JSON for easy reading
  • Formatted Trace: Detailed execution trace with clear visual separation
  • Relationship Information: Links to related entities like items, scorecards, and evaluations

This command is particularly useful for debugging evaluation issues or understanding exactly how a specific result was determined.

Additional Resources

For more detailed information about specific features:

  • Visit our Evaluations Guide
  • Check the built-in help with plexus --help
  • Get command-specific help with plexus evaluate accuracy --help