Testing philosophy

The Problem with Traditional Testing

MCP servers are fundamentally different from traditional APIs. They handle:

Natural language interactions instead of structured requests
Multi-turn conversations that build context over time
Dynamic tool discovery where capabilities are negotiated
Human-like workflows that don’t follow predictable patterns

Traditional testing approaches fail because:

Unit Tests Don’t Capture Reality

# This tells you nothing about user experience:
assert get_stories() == [{"title": "Story 1", "id": 1}]

Problems:

Doesn’t test natural language understanding
Misses conversation flow and context
Ignores the actual user experience
Can’t validate tool usage patterns

Integration Tests Are Too Brittle

curl -X POST /mcp -d '{"method": "get_stories"}' 
# Expected: {"stories": [...]}

Problems:

Breaks with minor response changes
Doesn’t test through actual MCP protocol
No validation of conversational behavior
Requires maintaining rigid expectations

Manual Testing Doesn’t Scale

Having humans manually test every conversation flow is:

Time-consuming - Each test takes 5-10 minutes
Inconsistent - Different testers, different results
Expensive - Requires dedicated QA resources
Limited coverage - Can’t test edge cases systematically

The AI Agent Approach

MCP Testing Framework uses AI agents to simulate realistic user interactions:

Real User Simulation

AI agents conduct actual conversations with your MCP server:

Diagram placeholder: Conversation flow diagram showing User Simulator → AI Agent (Claude) → Your MCP Server with example exchanges.

User Simulator → AI Agent (Claude) → Your MCP Server
     ↓               ↓                    ↓
"Find me news"  Uses tools           Returns stories
     ↓               ↓                    ↓  
"About tech"    Refines search       Better results
     ↓               ↓                    ↓
"Perfect!"      Natural response     User satisfied

Intelligence-Based Evaluation

Instead of brittle assertions, use LLM-as-a-Judge evaluation:

{
  "success_criteria": "Agent should help user find relevant tech news and confirm satisfaction",
  "judge_reasoning": "The agent successfully used search tools, found relevant tech stories, and confirmed the user was satisfied with the results. The conversation flow was natural and goal-oriented.",
  "verdict": "PASS"
}

Core Testing Principles

1. Test Behavior, Not Implementation

Traditional approach:

# Tests implementation details
assert server.search_stories("tech") == expected_json

AI testing approach:

{
  "user_message": "Find me tech news",
  "success_criteria": "User gets relevant tech news stories",
  "evaluation": "Judges whether goal was achieved"
}

2. Validate Complete User Journeys

Test end-to-end workflows that users actually experience:

{
  "test_id": "news_discovery_flow",
  "user_message": "I want to stay updated on AI developments",
  "success_criteria": "Agent helps user set up a reliable way to track AI news",
  "max_turns": 8,
  "expected_tools": ["search_stories", "subscribe_topic", "get_user_preferences"]
}

This tests:

Natural language understanding - Can the agent interpret “AI developments”?
Tool orchestration - Does it use search, then subscribe appropriately?
Conversation flow - Does it ask clarifying questions when needed?
Goal completion - Is the user actually helped?

3. Embrace Multi-Turn Interactions

Real users have conversations, not single requests:

Turn 1: "Help me find news"
        → Agent asks: "What topics interest you?"

Turn 2: "Technology and startups" 
        → Agent searches and shows results

Turn 3: "These are too general"
        → Agent refines search criteria

Turn 4: "Perfect! Can I save this search?"
        → Agent explains save functionality

Judge evaluates the complete interaction:

Did the conversation feel natural?
Was the user’s goal ultimately achieved?
Did the agent handle clarifications well?

Types of Testing

Conversational Testing

Test realistic user workflows through natural dialogue:

User goal achievement
Conversation quality
Context management
Tool usage appropriateness

Compliance Testing

Validate MCP protocol conformance:

Handshake negotiations
Capability discovery
Tool and resource availability
Error handling patterns

Security Testing

Test authentication and vulnerabilities:

Access control enforcement
Input validation effectiveness
Rate limiting behavior
Injection attack resistance

Next Steps

Now that you understand the philosophy:

Learn about Servers - How to configure servers for testing
Understand Test Suites - Structure comprehensive test coverage
Learn about Test Generation - Automatically generate test suites

Overview

Guides

Reference

Support

The Problem with Traditional Testing

Unit Tests Don’t Capture Reality

Integration Tests Are Too Brittle

Manual Testing Doesn’t Scale

The AI Agent Approach

Real User Simulation

Intelligence-Based Evaluation

Core Testing Principles

1. Test Behavior, Not Implementation

2. Validate Complete User Journeys

3. Embrace Multi-Turn Interactions

Types of Testing

Conversational Testing

Compliance Testing

Security Testing

Next Steps

Overview

Guides

Reference

Support

​The Problem with Traditional Testing

​Unit Tests Don’t Capture Reality

​Integration Tests Are Too Brittle

​Manual Testing Doesn’t Scale

​The AI Agent Approach

​Real User Simulation

​Intelligence-Based Evaluation

​Core Testing Principles

​1. Test Behavior, Not Implementation

​2. Validate Complete User Journeys

​3. Embrace Multi-Turn Interactions

​Types of Testing

​Conversational Testing

​Compliance Testing

​Security Testing

​Next Steps

The Problem with Traditional Testing

Unit Tests Don’t Capture Reality

Integration Tests Are Too Brittle

Manual Testing Doesn’t Scale

The AI Agent Approach

Real User Simulation

Intelligence-Based Evaluation

Core Testing Principles

1. Test Behavior, Not Implementation

2. Validate Complete User Journeys

3. Embrace Multi-Turn Interactions

Types of Testing

Conversational Testing

Compliance Testing

Security Testing

Next Steps