What is Conversational Testing?
Conversational testing uses AI agents to conduct realistic dialogue with your MCP server. Instead of testing individual API calls, you test complete user journeys through natural conversation flows.
Key Benefits
- Real user simulation - AI agents behave like actual users
- Context persistence - Tests maintain conversation history
- Natural language evaluation - Success criteria in plain English
- Multi-turn interactions - Test complex workflows that span multiple exchanges
How It Works
Example Flow
Turn 1: "Help me find tech news"
→ Agent asks: "What specific topics interest you?"
Turn 2: "AI and startups"
→ Agent uses search tools, shows results
Turn 3: "These are perfect! Can I save this search?"
→ Agent explains save functionality and helps set up preferences
Judge: ✅ PASS - User successfully found relevant content and learned about features
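The flow above can be pictured as a loop that alternates simulated-user and agent messages and then hands the transcript to a judge. The following is a minimal sketch of that idea, not the tool's actual implementation: `run_conversation`, `ConversationResult`, and the three stub callables are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A transcript is a list of (speaker, text) pairs, e.g. ("user", "Hello!").
Transcript = List[Tuple[str, str]]


@dataclass
class ConversationResult:
    passed: bool
    turns_used: int
    transcript: Transcript = field(default_factory=list)


def run_conversation(
    user_message: str,
    success_criteria: str,
    agent: Callable[[Transcript], str],
    simulated_user: Callable[[Transcript], Optional[str]],
    judge: Callable[[Transcript, str], bool],
    max_turns: int = 10,
) -> ConversationResult:
    """Alternate user and agent messages, then ask the judge for a verdict."""
    transcript: Transcript = []
    next_user_message: Optional[str] = user_message
    turns_used = 0

    while next_user_message is not None and turns_used < max_turns:
        transcript.append(("user", next_user_message))
        transcript.append(("agent", agent(transcript)))
        turns_used += 1
        # The simulated user returns None when it has nothing more to say.
        next_user_message = simulated_user(transcript)

    return ConversationResult(
        passed=judge(transcript, success_criteria),
        turns_used=turns_used,
        transcript=transcript,
    )


# Hypothetical stubs standing in for the real agent, simulated user, and judge.
def echo_agent(transcript: Transcript) -> str:
    return f"You said: {transcript[-1][1]}"


def two_message_user(transcript: Transcript) -> Optional[str]:
    user_turns = sum(1 for speaker, _ in transcript if speaker == "user")
    return "AI and startups" if user_turns < 2 else None


def lenient_judge(transcript: Transcript, criteria: str) -> bool:
    return any(speaker == "agent" for speaker, _ in transcript)


result = run_conversation(
    "Help me find tech news",
    "Agent responds to each user message",
    echo_agent,
    two_message_user,
    lenient_judge,
    max_turns=5,
)
print(result.passed, result.turns_used)
```

In a real run the agent would call your MCP server's tools and the judge would be an AI model reading the success criteria; the stubs here only show where each piece fits.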
Configuration
Test Case Structure
{
"test_id": "user_onboarding_flow",
"user_message": "I'm new here, what can you help me with?",
"success_criteria": "Agent provides friendly greeting and clear overview of capabilities",
"max_turns": 5,
"metadata": {
"category": "onboarding",
"priority": "high"
}
}
Required Fields
| Field | Type | Description |
|---|---|---|
| test_id | string | Unique identifier for the test case |
| user_message | string | Initial message that starts the conversation |
| success_criteria | string | Natural language description of successful outcome |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| max_turns | integer | 10 | Maximum conversation turns (runtime default: 20, safety limit: 50) |
| metadata | object | null | Additional test metadata |
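Before handing a test case to the runner, it can help to check it against the field tables above. The sketch below is one way to do that in Python; `validate_test_case` is a hypothetical helper, not part of the tool, and rejecting `max_turns` values outside 1 to 50 is an assumption based on the documented safety limit (the tool itself may clamp rather than reject).

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("test_id", "user_message", "success_criteria")
DEFAULT_MAX_TURNS = 10  # documented test config default
SAFETY_LIMIT = 50       # documented safety limit


def validate_test_case(case: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the case looks valid."""
    problems: List[str] = []

    # Required fields must be present and be non-empty strings.
    for name in REQUIRED_FIELDS:
        value = case.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name} must be a non-empty string")

    # Assumption: values outside 1..SAFETY_LIMIT are treated as errors here.
    max_turns = case.get("max_turns", DEFAULT_MAX_TURNS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_turns, bool) or not isinstance(max_turns, int):
        problems.append("max_turns must be an integer")
    elif not 1 <= max_turns <= SAFETY_LIMIT:
        problems.append(f"max_turns must be between 1 and {SAFETY_LIMIT}")

    metadata = case.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("metadata must be an object or null")

    return problems


case = {
    "test_id": "user_onboarding_flow",
    "user_message": "I'm new here, what can you help me with?",
    "success_criteria": "Agent provides friendly greeting and clear overview of capabilities",
    "max_turns": 5,
    "metadata": {"category": "onboarding", "priority": "high"},
}
print(validate_test_case(case))
```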
Suite Configuration
Basic Suite Setup
{
"suite_id": "my_conversational_tests",
"name": "User Journey Tests",
"suite_type": "conversational",
"test_cases": [
{
"test_id": "greeting_test",
"user_message": "Hello!",
"success_criteria": "Friendly welcome with capability overview"
}
]
}
Advanced Suite Settings
{
"suite_id": "advanced_conversations",
"name": "Complex User Interactions",
"suite_type": "conversational",
"user_patience_level": "low",
"parallelism": 3,
"test_cases": [...]
}
Suite-Level Options
| Option | Values | Default | Description |
|---|---|---|---|
| user_patience_level | low, medium, high | medium | How patient the simulated user is |
| parallelism | integer | 5 | Number of concurrent test executions |
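If you generate suites programmatically, a small builder can apply the documented defaults and catch invalid option values early. This is a sketch only: `build_suite` is a hypothetical helper, and requiring `parallelism` to be at least 1 is an assumption rather than documented behavior.

```python
import json
from typing import Any, Dict, List

PATIENCE_LEVELS = ("low", "medium", "high")  # values from the options table


def build_suite(
    suite_id: str,
    name: str,
    test_cases: List[Dict[str, Any]],
    user_patience_level: str = "medium",  # documented default
    parallelism: int = 5,                 # documented default
) -> Dict[str, Any]:
    """Assemble a conversational suite dict, rejecting invalid option values."""
    if user_patience_level not in PATIENCE_LEVELS:
        raise ValueError(f"user_patience_level must be one of {PATIENCE_LEVELS}")
    # Assumption: parallelism must be a positive integer.
    if isinstance(parallelism, bool) or not isinstance(parallelism, int) or parallelism < 1:
        raise ValueError("parallelism must be a positive integer")
    return {
        "suite_id": suite_id,
        "name": name,
        "suite_type": "conversational",
        "user_patience_level": user_patience_level,
        "parallelism": parallelism,
        "test_cases": test_cases,
    }


suite = build_suite(
    "advanced_conversations",
    "Complex User Interactions",
    [
        {
            "test_id": "greeting_test",
            "user_message": "Hello!",
            "success_criteria": "Friendly welcome with capability overview",
        }
    ],
    user_patience_level="low",
    parallelism=3,
)
print(json.dumps(suite, indent=2))
```

The printed JSON has the same shape as the advanced suite example and can be written to a file for the runner.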
Test Patterns
1. Onboarding Flow
Test how new users discover your server’s capabilities:
{
"test_id": "new_user_onboarding",
"user_message": "Hi, I just connected to this server. What can you do?",
"success_criteria": "Agent provides clear overview of main features with examples",
"max_turns": 5
}
2. Feature Discovery
Test users learning about specific functionality:
{
"test_id": "feature_exploration",
"user_message": "I heard you can help with data analysis. Show me how.",
"success_criteria": "Agent demonstrates data analysis capabilities with practical examples",
"max_turns": 8
}
3. Complex Workflow
Test multi-step user goals:
{
"test_id": "data_pipeline_setup",
"user_message": "I need to set up automated reporting for my sales data",
"success_criteria": "Agent guides user through complete pipeline setup with validation",
"max_turns": 15
}
4. Error Recovery
Test how well your server handles confused users:
{
"test_id": "confused_user_recovery",
"user_message": "This isn't working, I'm confused",
"success_criteria": "Agent asks clarifying questions and provides helpful guidance",
"max_turns": 6
}
5. Edge Case Handling
Test unusual but realistic user behavior:
{
"test_id": "impatient_user",
"user_message": "Just give me the data already!",
"success_criteria": "Agent handles impatience gracefully while collecting necessary details",
"max_turns": 4,
"metadata": {
"category": "edge_cases",
"priority": "medium"
}
}
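When several patterns are combined into one suite, two quick sanity checks are useful: every `test_id` should be unique, and the sum of `max_turns` gives an upper bound on how many turns a full run can take. The helpers below are hypothetical and only illustrate the checks; the cases reuse the IDs and turn limits from the five patterns above, with the other fields omitted for brevity.

```python
from collections import Counter
from typing import Any, Dict, List

DEFAULT_MAX_TURNS = 10  # documented test config default

# IDs and turn limits from the five patterns; other fields omitted for brevity.
pattern_cases: List[Dict[str, Any]] = [
    {"test_id": "new_user_onboarding", "max_turns": 5},
    {"test_id": "feature_exploration", "max_turns": 8},
    {"test_id": "data_pipeline_setup", "max_turns": 15},
    {"test_id": "confused_user_recovery", "max_turns": 6},
    {"test_id": "impatient_user", "max_turns": 4},
]


def duplicate_test_ids(cases: List[Dict[str, Any]]) -> List[str]:
    """Return test_ids that appear more than once, sorted alphabetically."""
    counts = Counter(case["test_id"] for case in cases)
    return sorted(test_id for test_id, n in counts.items() if n > 1)


def worst_case_turns(cases: List[Dict[str, Any]]) -> int:
    """Sum the turn limits, using the config default when max_turns is absent."""
    return sum(case.get("max_turns", DEFAULT_MAX_TURNS) for case in cases)


print(duplicate_test_ids(pattern_cases))
print(worst_case_turns(pattern_cases))
```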
Writing Effective Success Criteria
✅ Good Examples
// Specific and measurable
"success_criteria": "Agent greets user warmly and lists at least 3 main capabilities with brief explanations"
// Focused on user value
"success_criteria": "User successfully creates their first report and understands how to modify it"
// Tests conversation quality
"success_criteria": "Agent asks relevant follow-up questions and provides personalized recommendations"
❌ Bad Examples
// Too vague
"success_criteria": "Agent responds appropriately"
// Tests implementation details
"success_criteria": "Agent calls the get_reports() function"
// Unrealistic expectations
"success_criteria": "Agent perfectly anticipates every user need"
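The three failure modes above can be caught mechanically, at least roughly. The following is a heuristic sketch, not a feature of the tool: `lint_success_criteria`, the five-word threshold, and the list of absolute words are all assumptions chosen to match the examples in this section.

```python
import re
from typing import List

MIN_WORDS = 5  # assumption: shorter criteria are probably too vague
FUNCTION_CALL = re.compile(r"\b[A-Za-z_]\w*\(.*?\)")
ABSOLUTES = re.compile(r"\b(perfectly|always|never|every)\b", re.IGNORECASE)


def lint_success_criteria(criteria: str) -> List[str]:
    """Return heuristic warnings for a success_criteria string."""
    warnings: List[str] = []
    if len(criteria.split()) < MIN_WORDS:
        warnings.append("too vague: describe a specific, observable outcome")
    if FUNCTION_CALL.search(criteria):
        warnings.append("implementation detail: describe user value, not function calls")
    if ABSOLUTES.search(criteria):
        warnings.append("unrealistic: avoid absolute expectations")
    return warnings


examples = [
    "Agent greets user warmly and lists at least 3 main capabilities with brief explanations",
    "Agent responds appropriately",
    "Agent calls the get_reports() function",
    "Agent perfectly anticipates every user need",
]
for text in examples:
    print(text, "->", lint_success_criteria(text) or "ok")
```

A check like this only flags obvious problems; whether a criterion is meaningful still needs human review.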
User Personality Simulation
Patience Levels
Low Patience
- “Impatient and want things done quickly. You provide minimal details”
- Likely to abandon if confused
- Needs clear, fast responses
Medium Patience
- “Reasonably patient but want to get things done efficiently”
- Willing to provide some clarification
- Balanced between speed and thoroughness
High Patience
- “Very patient and understanding. You’re willing to provide detailed information”
- Provides detailed information
- Tolerates longer interactions
Conversation Styles
Natural
- Realistic user language and patterns
- Mix of clear and ambiguous requests
- Natural conversation flow
Demanding
- Direct, impatient communication
- High expectations for performance
- Tests stress response
Confused
- Unclear requirements
- Frequent misunderstandings
- Tests guidance and clarification
Expert
- Technical language and concepts
- Advanced feature usage
- Tests depth of functionality
Running Conversational Tests
Create and Run Conversational Tests
# Create conversational test suite (interactive menu)
mcp-t create suite
# Create conversational test suite directly
mcp-t create conversational
mcp-t create conversational --id my-chat-tests
# Run conversational tests
mcp-t run conversation-suite-id server-id --verbose
Example Output
🤖 Starting conversational test: user_onboarding_flow
Turn 1/5
👤 User: Hi, I just connected to this server. What can you do?
🤖 Agent: Hello! I'm excited to help you get started...
Turn 2/5
👤 User: That sounds great! Can you show me an example?
🤖 Agent: Absolutely! Let me demonstrate our search functionality...
⚖️ Judge Evaluation: ✅ PASS
Reasoning: Agent provided warm greeting, clear capability overview,
and practical demonstration. User expressed satisfaction and engagement.
✅ Test passed: user_onboarding_flow
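To run a suite from a script or CI job, the `mcp-t run` command shown earlier can be wrapped in Python. This sketch uses only the arguments that appear in this page (`run`, a suite ID, a server ID, and `--verbose`); `build_run_command` and `run_suite` are hypothetical helpers, and treating a non-zero exit code as a failed run is an assumption about the CLI.

```python
import shlex
import subprocess
from typing import List


def build_run_command(suite_id: str, server_id: str, verbose: bool = True) -> List[str]:
    """Build the argument list for `mcp-t run <suite-id> <server-id>`."""
    command = ["mcp-t", "run", suite_id, server_id]
    if verbose:
        command.append("--verbose")
    return command


def run_suite(suite_id: str, server_id: str) -> bool:
    """Run the suite; assumes a zero exit code means the run succeeded."""
    completed = subprocess.run(build_run_command(suite_id, server_id), check=False)
    return completed.returncode == 0


# Print the command without executing it, so this runs even without mcp-t installed.
print(shlex.join(build_run_command("conversation-suite-id", "server-id")))
```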
Best Practices
Design Realistic Scenarios
- Base tests on actual user feedback and support tickets
- Include both happy paths and common confusion points
- Test edge cases that real users encounter
Balance Coverage and Efficiency
- Core workflows - Test every critical user journey
- Happy paths - Ensure basic functionality works smoothly
- Error recovery - Validate graceful failure handling
- Edge cases - Use AI creativity to discover unusual scenarios
Write Clear Success Criteria
- Be specific about expected outcomes
- Focus on user value, not implementation details
- Include both functional and conversation quality aspects
Optimize Conversation Length
- Most real conversations are 3-8 turns
- Test config default: 10 turns, runtime default: 20 turns, safety limit: 50
- Use max_turns to prevent infinite loops
- Test both brief interactions and complex workflows
Integration with Other Test Types
Conversational testing works well alongside compliance and security testing:
- Start with compliance tests to ensure basic protocol functionality
- Add conversational tests for user-facing behavior
- Use conversational tests to verify that auth flows feel natural
- Test that security measures don’t break the user experience
Next Steps
- Create your first conversational test suite
- Learn about compliance testing for protocol validation
- Explore security testing for auth and vulnerability checks