What is Conversational Testing?

Conversational testing uses AI agents to conduct realistic dialogue with your MCP server. Instead of testing individual API calls, you test complete user journeys through natural conversation flows.

Key Benefits

  • Real user simulation - AI agents behave like actual users
  • Context persistence - Tests maintain conversation history
  • Natural language evaluation - Success criteria in plain English
  • Multi-turn interactions - Test complex workflows that span multiple exchanges

How It Works

Example Flow

Turn 1: "Help me find tech news"
        → Agent asks: "What specific topics interest you?"

Turn 2: "AI and startups"  
        → Agent uses search tools, shows results

Turn 3: "These are perfect! Can I save this search?"
        → Agent explains save functionality and helps set up preferences

Judge: ✅ PASS - User successfully found relevant content and learned about features

Configuration

Test Case Structure

{
  "test_id": "user_onboarding_flow",
  "user_message": "I'm new here, what can you help me with?",
  "success_criteria": "Agent provides friendly greeting and clear overview of capabilities",
  "max_turns": 5,
  "metadata": {
    "category": "onboarding",
    "priority": "high"
  }
}

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| test_id | string | Unique identifier for the test case |
| user_message | string | Initial message that starts the conversation |
| success_criteria | string | Natural language description of the successful outcome |
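
A test case that uses only these required fields is enough for a quick check; the optional fields below fall back to their defaults. A minimal sketch (the test_id and message text are illustrative):

{
  "test_id": "minimal_greeting_check",
  "user_message": "Hello, what can you do?",
  "success_criteria": "Agent replies with a friendly greeting and a short list of capabilities"
}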

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_turns | integer | 10 | Maximum conversation turns (runtime uses 20, safety limit: 50) |
| metadata | object | null | Additional test metadata |

Suite Configuration

Basic Suite Setup

{
  "suite_id": "my_conversational_tests",
  "name": "User Journey Tests", 
  "suite_type": "conversational",
  "test_cases": [
    {
      "test_id": "greeting_test",
      "user_message": "Hello!",
      "success_criteria": "Friendly welcome with capability overview"
    }
  ]
}

Advanced Suite Settings

{
  "suite_id": "advanced_conversations",
  "name": "Complex User Interactions",
  "suite_type": "conversational", 
  "user_patience_level": "low",
  "parallelism": 3,
  "test_cases": [...]
}

Suite-Level Options

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| user_patience_level | low, medium, high | medium | How patient the simulated user is |
| parallelism | integer | 5 | Number of concurrent test executions |

Test Patterns

1. Onboarding Flow

Test how new users discover your server’s capabilities:
{
  "test_id": "new_user_onboarding",
  "user_message": "Hi, I just connected to this server. What can you do?",
  "success_criteria": "Agent provides clear overview of main features with examples",
  "max_turns": 5
}

2. Feature Discovery

Test users learning about specific functionality:
{
  "test_id": "feature_exploration", 
  "user_message": "I heard you can help with data analysis. Show me how.",
  "success_criteria": "Agent demonstrates data analysis capabilities with practical examples",
  "max_turns": 8
}

3. Complex Workflow

Test multi-step user goals:
{
  "test_id": "data_pipeline_setup",
  "user_message": "I need to set up automated reporting for my sales data",
  "success_criteria": "Agent guides user through complete pipeline setup with validation",
  "max_turns": 15
}

4. Error Recovery

Test how well your server handles confused users:
{
  "test_id": "confused_user_recovery",
  "user_message": "This isn't working, I'm confused",
  "success_criteria": "Agent asks clarifying questions and provides helpful guidance",
  "max_turns": 6
}

5. Edge Case Handling

Test unusual but realistic user behavior:
{
  "test_id": "impatient_user",
  "user_message": "Just give me the data already!",
  "success_criteria": "Agent handles impatience gracefully while collecting necessary details",
  "max_turns": 4,
  "metadata": {
    "category": "edge_cases",
    "priority": "medium"
  }
}

Writing Effective Success Criteria

✅ Good Examples

// Specific and measurable
"success_criteria": "Agent greets user warmly and lists at least 3 main capabilities with brief explanations"

// Focused on user value
"success_criteria": "User successfully creates their first report and understands how to modify it"

// Tests conversation quality  
"success_criteria": "Agent asks relevant follow-up questions and provides personalized recommendations"

❌ Bad Examples

// Too vague
"success_criteria": "Agent responds appropriately"

// Tests implementation details
"success_criteria": "Agent calls the get_reports() function"

// Unrealistic expectations
"success_criteria": "Agent perfectly anticipates every user need"

User Personality Simulation

Patience Levels

Low Patience
  • “Impatient and want things done quickly. You provide minimal details”
  • Likely to abandon if confused
  • Needs clear, fast responses
Medium Patience
  • “Reasonably patient but want to get things done efficiently”
  • Willing to provide some clarification
  • Balanced between speed and thoroughness
High Patience
  • “Very patient and understanding. You’re willing to provide detailed information”
  • Provides detailed information
  • Tolerates longer interactions
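
Patience is set per suite with the user_patience_level option described earlier. As an illustrative sketch only (the suite and test IDs are placeholders), a suite that simulates impatient users might look like:

{
  "suite_id": "impatient_user_checks",
  "name": "Low-Patience User Simulation",
  "suite_type": "conversational",
  "user_patience_level": "low",
  "test_cases": [
    {
      "test_id": "quick_answer_needed",
      "user_message": "Just give me the data already!",
      "success_criteria": "Agent handles impatience gracefully while collecting necessary details",
      "max_turns": 4
    }
  ]
}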

Conversation Styles

Natural
  • Realistic user language and patterns
  • Mix of clear and ambiguous requests
  • Natural conversation flow
Demanding
  • Direct, impatient communication
  • High expectations for performance
  • Tests stress response
Confused
  • Unclear requirements
  • Frequent misunderstandings
  • Tests guidance and clarification
Expert
  • Technical language and concepts
  • Advanced feature usage
  • Tests depth of functionality

Running Conversational Tests

Create and Run Conversational Tests

# Create conversational test suite (interactive menu)
mcp-t create suite

# Create conversational test suite directly
mcp-t create conversational
mcp-t create conversational --id my-chat-tests

# Run conversational tests
mcp-t run conversation-suite-id server-id --verbose

Example Output

🤖 Starting conversational test: user_onboarding_flow

Turn 1/5
👤 User: Hi, I just connected to this server. What can you do?
🤖 Agent: Hello! I'm excited to help you get started...

Turn 2/5  
👤 User: That sounds great! Can you show me an example?
🤖 Agent: Absolutely! Let me demonstrate our search functionality...

⚖️  Judge Evaluation: ✅ PASS
   Reasoning: Agent provided warm greeting, clear capability overview,
   and practical demonstration. User expressed satisfaction and engagement.

✅ Test passed: user_onboarding_flow

Best Practices

Design Realistic Scenarios

  • Base tests on actual user feedback and support tickets
  • Include both happy paths and common confusion points
  • Test edge cases that real users encounter

Balance Coverage and Efficiency

  • Core workflows - Test every critical user journey
  • Happy paths - Ensure basic functionality works smoothly
  • Error recovery - Validate graceful failure handling
  • Edge cases - Use AI creativity to discover unusual scenarios

Write Clear Success Criteria

  • Be specific about expected outcomes
  • Focus on user value, not implementation details
  • Include both functional and conversation quality aspects

Optimize Conversation Length

  • Most real conversations are 3-8 turns
  • Test config default: 10 turns, runtime default: 20 turns, safety limit: 50
  • Use max_turns to prevent infinite loops
  • Test both brief interactions and complex workflows

Integration with Other Test Types

Conversational testing works well alongside:

Compliance Testing

  • Start with compliance to ensure basic protocol functionality
  • Add conversational tests for user-facing behavior

Security Testing

  • Use conversational tests to verify auth flows feel natural
  • Test that security measures don’t break user experience
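
As an example, an authentication-focused conversational test might check that the login step reads naturally to the user. The scenario below is only a sketch using the fields documented above; the IDs and wording are hypothetical:

{
  "test_id": "auth_flow_experience",
  "user_message": "I want to see my private reports, but I haven't logged in yet",
  "success_criteria": "Agent explains the authentication step clearly and guides the user through it without disrupting the conversation",
  "max_turns": 8
}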

Next Steps

  1. Create your first conversational test suite
  2. Learn about compliance testing for protocol validation
  3. Explore security testing for auth and vulnerability checks