Metadata-Version: 2.4
Name: mcp-observability-server
Version: 0.1.3
Summary: A Model Context Protocol (MCP) server for querying logs from multiple observability platforms (New Relic, Azure)
Project-URL: Homepage, https://github.com/yourusername/mcp-observability-server
Project-URL: Documentation, https://github.com/yourusername/mcp-observability-server#readme
Project-URL: Repository, https://github.com/yourusername/mcp-observability-server
Project-URL: Issues, https://github.com/yourusername/mcp-observability-server/issues
Project-URL: Changelog, https://github.com/yourusername/mcp-observability-server/releases
Author: MCP Observability Contributors
Maintainer: MCP Observability Contributors
License: MIT
License-File: LICENSE
Keywords: azure,logging,mcp,model-context-protocol,monitoring,new-relic,observability,sre
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Logging
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.12
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: azure-identity>=1.25.2
Requires-Dist: azure-monitor-query>=1.5.0
Requires-Dist: boto3>=1.35.0
Requires-Dist: debugpy>=1.8.20
Requires-Dist: httpx>=0.28.1
Requires-Dist: mcp[cli]>=1.26.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: pyyaml>=6.0.3
Provides-Extra: dev
Requires-Dist: black>=24.0.0; extra == 'dev'
Requires-Dist: isort>=5.13.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Description-Content-Type: text/markdown

# MCP Observability Server

mcp-name: io.github.gagandeeppra/mcp-observability-server

A Model Context Protocol (MCP) server that enables Claude to query logs from multiple observability platforms simultaneously. Perfect for SRE workflows, incident investigation, and distributed tracing.

## Supported Platforms

- **New Relic** - Query logs using NRQL
- **Azure Application Insights** - Query logs using Kusto Query Language (KQL)

## Features

- 🔍 **Unified Search** - Search across all platforms with a single query
- 🎯 **Severity Filtering** - Filter by log levels (debug, info, warning, error, critical)
- 🔗 **Distributed Tracing** - Find all logs related to a trace ID across platforms
- ⚡ **Concurrent Queries** - Queries all providers in parallel for fast results
- 📊 **Recent Errors** - Quick access to recent error logs across all systems
- 🏥 **Health Checks** - Verify connectivity to all configured providers
- 🤖 **Guided Workflows** - Pre-built prompts for incident investigation, deployment validation, and root cause analysis

## Installation

### From PyPI

```bash
pip install mcp-observability-server
```

### From Source

```bash
git clone https://github.com/yourusername/mcp-observability-server.git
cd mcp-observability-server
pip install -e .
```

## Configuration

### 1. Create Configuration File

Copy the example configuration:

```bash
cp config.yaml.example config.yaml
```

Edit `config.yaml` with your credentials:

```yaml
providers:
  newrelic:
    enabled: true
    api_key: ${NEW_RELIC_API_KEY}
    account_id: "1234567"
    region: "US"
  
  azure:
    enabled: true
    workspace_id: ${AZURE_WORKSPACE_ID}
    client_id: ${AZURE_CLIENT_ID}
    client_secret: ${AZURE_CLIENT_SECRET}
    tenant_id: ${AZURE_TENANT_ID}
```
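
The `${VAR}` placeholders are resolved from environment variables. How the server performs this substitution is not shown here, so the following is only a sketch of one way it could work. `expand_env` is a hypothetical helper, and raising on a missing variable is an assumed policy.

```python
import os
import re

# Matches placeholders such as ${NEW_RELIC_API_KEY}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value):
    """Recursively replace ${VAR} placeholders inside nested dicts, lists, and strings."""
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    if isinstance(value, str):
        def _substitute(match: re.Match) -> str:
            name = match.group(1)
            if name not in os.environ:
                # Assumed policy: fail loudly rather than pass an empty credential along
                raise KeyError(f"environment variable {name} is not set")
            return os.environ[name]

        return _PLACEHOLDER.sub(_substitute, value)
    return value  # booleans, numbers, and None pass through unchanged
```

Applied to the parsed YAML dictionary, this leaves literal values such as `account_id: "1234567"` untouched and only rewrites strings that contain a placeholder.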

### 2. Set Environment Variables

Copy and configure environment variables:

```bash
cp .env.example .env
```

Edit `.env` with your actual credentials.

### 3. Configure Claude Desktop

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "observability": {
      "command": "python",
      "args": ["-m", "mcp_observability.server", "/path/to/config.yaml"]
    }
  }
}
```
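
If the Claude Desktop config already lists other servers, the new entry should be merged in rather than overwriting the file. The sketch below shows one way to do that with the standard library. `add_server_entry` is a hypothetical helper and is not shipped with this package.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, server_config_yaml: str) -> dict:
    """Merge an "observability" entry into mcpServers, keeping existing entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["observability"] = {
        "command": "python",
        "args": ["-m", "mcp_observability.server", server_config_yaml],
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart Claude Desktop after editing the file so it picks up the new server.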

## Usage

Once configured, you can ask Claude to query your logs:

### Example Queries

**Search for errors in the last hour:**
```
Show me all errors from the last hour across all platforms
```

**Search specific text:**
```
Find logs containing "timeout" from the last 30 minutes
```

**Filter by service:**
```
Show me warning and error logs from the api-gateway service in the last 2 hours
```

**Distributed tracing:**
```
Find all logs related to trace ID abc123-def456
```

**Recent errors:**
```
What errors have occurred in the last 15 minutes?
```

### Available Tools

The server exposes these tools to Claude:

#### `query_logs`
Search logs across all platforms with flexible filtering.

**Parameters:**
- `start_time` (required) - ISO format or relative (e.g., "1h", "30m", "2d")
- `end_time` (optional) - Defaults to now
- `query` (optional) - Text to search for
- `severity` (optional) - Array of severity levels
- `service_name` (optional) - Filter by service
- `limit` (optional) - Max results (default: 100)
- `providers` (optional) - Specific providers to query

#### `get_recent_errors`
Quick access to recent error and critical logs.

**Parameters:**
- `minutes` (optional) - Look back period (default: 60)
- `limit` (optional) - Max results per provider (default: 100)
- `service_name` (optional) - Filter by service

#### `search_by_trace_id`
Find all logs associated with a distributed trace.

**Parameters:**
- `trace_id` (required) - The trace ID to search for
- `start_time` (optional) - Defaults to 24 hours ago
- `end_time` (optional) - Defaults to now

#### `health_check`
Verify connectivity to all configured providers.

### Guided Workflows (Prompts)

The server provides guided prompts for common SRE workflows. Prompts chain multiple tools together and provide structured analysis frameworks.

#### `investigate-incident`
Systematic incident investigation workflow.

**Use for:** Active production incidents requiring thorough investigation  
**Parameters:**
- `service_name` (optional) - Service to investigate
- `time_period` (default: "1h") - Investigation time window
- `severity_threshold` (default: "error") - Minimum severity

**Example:**
```
Use the investigate-incident prompt for api-gateway service
```

**Workflow:** Recent errors → Pattern analysis → Trace investigation → Health checks → Summary with recommendations

#### `health-check-report`
Generate comprehensive health status report.

**Use for:** Daily health checks, system status overviews  
**Parameters:**
- `time_period` (default: "24h") - Error statistics period
- `include_metrics` (default: true) - Include detailed metrics

**Example:**
```
Generate a health check report
```

**Workflow:** Provider health → Error analysis → Service catalog → Active traces → Recommendations

#### `post-deployment-check`
Validate deployment health by comparing before/after metrics.

**Use for:** Post-deployment validation, CI/CD pipelines  
**Parameters:**
- `service_name` (required) - Deployed service name
- `deployment_time` (optional) - When deployment occurred
- `lookback_minutes` (default: 30) - Baseline comparison period

**Example:**
```
Run a post-deployment check for user-service
```

**Workflow:** Current errors → Baseline comparison → New error detection → Trace analysis → Health recommendation (PROCEED/MONITOR/ROLLBACK)
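
The final recommendation step can be pictured as a comparison between baseline and post-deployment error counts. The sketch below is purely illustrative: `deployment_verdict` and both thresholds are assumptions, not values the prompt actually uses, and in practice Claude forms the recommendation from the log evidence.

```python
ROLLBACK_RATIO = 2.0        # assumed: errors at least doubled
ROLLBACK_MIN_INCREASE = 10  # assumed: and rose by at least this many


def deployment_verdict(baseline_errors: int, current_errors: int, new_error_types: int) -> str:
    """Map before/after error counts to PROCEED, MONITOR, or ROLLBACK."""
    increase = current_errors - baseline_errors
    if increase <= 0 and new_error_types == 0:
        return "PROCEED"
    ratio = current_errors / max(baseline_errors, 1)  # avoid division by zero
    if ratio >= ROLLBACK_RATIO and increase >= ROLLBACK_MIN_INCREASE:
        return "ROLLBACK"
    return "MONITOR"
```

Requiring both a relative and an absolute increase keeps low-traffic services from triggering a rollback when errors go from one to three.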

#### `trace-flow-analysis`
Analyze distributed trace execution flow and timing.

**Use for:** Debugging distributed systems, understanding request flow  
**Parameters:**
- `trace_id` (required) - Trace ID to analyze
- `include_timing` (default: true) - Include timing breakdown

**Example:**
```
Analyze trace flow for abc123-def456
```

**Workflow:** Timeline construction → Service chain mapping → Timing analysis → Error detection → Bottleneck identification → Root cause

#### `root-cause-analysis`
Deep root cause investigation for complex failures.

**Use for:** Finding originating causes, cascading failure analysis  
**Parameters:**
- `trace_id` (optional) - Specific trace to investigate
- `error_pattern` (optional) - Known error pattern
- `time_window` (default: "1h") - Investigation window

**Example:**
```
Perform root cause analysis for error "database connection timeout"
```

**Workflow:** Evidence gathering → Timeline building → Trace flow → Pattern recognition → Root cause formulation → Prevention recommendations

See [Prompts README](src/mcp_observability/prompts/README.md) for detailed documentation.

## Platform-Specific Configuration

### New Relic

1. Create an API key in New Relic (User > API Keys)
2. Find your account ID in the URL or account dropdown
3. Choose region: "US" or "EU"

### Azure Application Insights

1. Create a service principal in Azure AD
2. Grant "Log Analytics Reader" role to the service principal
3. Note the workspace ID, client ID, client secret, and tenant ID

## Development

### Setup Development Environment

```bash
# Clone repository
git clone https://github.com/yourusername/mcp-observability-server.git
cd mcp-observability-server

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format and lint code
black src/
ruff check src/
```

### Testing with MCP Inspector

Test the server interactively using the MCP Inspector:

```bash
npx @modelcontextprotocol/inspector \
  uv \
  --directory /path/to/mcp-observability-server \
  run \
  mcp-observability \
  /path/to/mcp-observability-server/config.yaml
```

Or using the Python module directly:

```bash
npx @modelcontextprotocol/inspector \
  uv \
  --directory /path/to/mcp-observability-server \
  run \
  python \
  -m \
  mcp_observability.server \
  /path/to/mcp-observability-server/config.yaml
```

### Project Structure

```
mcp-observability-server/
├── src/
│   └── mcp_observability/
│       ├── __init__.py
│       ├── server.py           # Main MCP server
│       ├── models.py            # Data models
│       ├── utils.py             # Utilities
│       ├── prompts/             # Guided workflow prompts
│       │   ├── __init__.py
│       │   ├── incident.py      # Incident investigation prompts
│       │   ├── health.py        # Health monitoring prompts
│       │   ├── deployment.py    # Deployment validation prompts
│       │   ├── trace_analysis.py # Trace flow analysis prompts
│       │   └── README.md        # Prompts documentation
│       └── providers/
│           ├── base.py          # Abstract base
│           ├── newrelic.py
│           └── azure.py
├── tests/
├── config.yaml.example
├── .env.example
└── pyproject.toml
```

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=mcp_observability

# Run specific test file
pytest tests/test_providers.py
```

### Logging

The server includes comprehensive logging to help with debugging and monitoring:

**Configure Log Level:**

Set the `MCP_LOG_LEVEL` environment variable:

```bash
# In your .env file or environment
export MCP_LOG_LEVEL=DEBUG  # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
```

**Log Levels:**
- `DEBUG` - Detailed diagnostic information (queries, parameters, API calls)
- `INFO` - General informational messages (default)
- `WARNING` - Warning messages for potential issues
- `ERROR` - Error messages for failures
- `CRITICAL` - Critical issues that prevent operation

**What Gets Logged:**
- Server initialization and configuration loading
- Provider initialization and health checks
- Tool invocations with parameters
- Query execution and results
- API calls to observability platforms
- Errors and exceptions with stack traces

**Example Log Output:**

```
2026-02-13 10:30:15 - mcp_observability.server - INFO - Starting MCP Observability Server
2026-02-13 10:30:15 - mcp_observability.utils - INFO - Loading config from: config.yaml
2026-02-13 10:30:15 - mcp_observability.providers.newrelic - INFO - New Relic provider initialized for region: US
2026-02-13 10:30:20 - mcp_observability.server - INFO - Tool called: query_logs
2026-02-13 10:30:21 - mcp_observability.providers.newrelic - INFO - New Relic query returned 42 log(s)
```
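
For reference, a log format like the one above can be produced with the standard `logging` module. This is a sketch of an equivalent setup, not the server's actual code. `resolve_log_level` and `configure_logging` are hypothetical names, and falling back to `INFO` on an unrecognized value is an assumption.

```python
import logging
import os

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_log_level(default: str = "INFO") -> int:
    """Read MCP_LOG_LEVEL and map it to a logging constant."""
    name = os.environ.get("MCP_LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)  # returns an int for known level names
    return level if isinstance(level, int) else logging.INFO


def configure_logging() -> None:
    """Attach a stderr handler that uses the format shown in the example output."""
    logging.basicConfig(level=resolve_log_level(), format=LOG_FORMAT, datefmt=DATE_FORMAT)
```

`logging.basicConfig` writes to stderr by default, which matters for servers using the stdio transport because stdout carries the protocol messages.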

## Troubleshooting

### Common Issues

**"Provider unhealthy" in health check**
- Verify that the credentials in `config.yaml` are correct
- Check that every environment variable referenced in `config.yaml` is set
- Ensure network connectivity to the provider API

**"No logs found"**
- Verify time range includes the period you're interested in
- Check that log groups/workspaces are configured correctly
- Ensure services are actually logging during the time period

**AWS credentials error**
- If using IAM role, ensure instance has correct permissions
- If using access keys, verify they're correct in .env
- Check AWS region matches where your logs are

**Timeout errors**
- Increase timeout_seconds in provider config
- Reduce limit to fetch fewer results
- Check network connectivity

**Enable debug logging**
- Set `MCP_LOG_LEVEL=DEBUG` in your environment or .env file
- Check logs for detailed query information and API responses
- Review stack traces for error details

## Performance Tips

1. **Specify log groups** - Target exact log groups instead of querying all of them
2. **Use time ranges wisely** - Shorter time ranges return faster
3. **Limit results** - Start with smaller limits and increase if needed
4. **Filter by service** - Reduces data scanned across all platforms
5. **Use `get_recent_errors`** - Optimized query for error investigation

## Security Best Practices

1. **Never commit credentials** - Use environment variables or secrets manager
2. **Rotate keys regularly** - Set up key rotation for all platforms
3. **Principle of least privilege** - Grant only read permissions needed
4. **Audit access** - Monitor who's using the MCP server
5. **Secure config files** - Restrict file permissions on config.yaml
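
As one way to apply the last point on POSIX systems, the snippet below restricts a config file to its owner and checks the result. The helper names are hypothetical, and Windows permissions work differently, so this sketch does not apply there.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path: Path) -> None:
    """Set the file mode to 0600 (owner read/write only)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path: Path) -> bool:
    """Return True when group and other have no permission bits set."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

The shell equivalent is `chmod 600 config.yaml`.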

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## License

MIT License - see LICENSE file for details

## Support

- **Issues**: https://github.com/yourusername/mcp-observability-server/issues
- **Discussions**: https://github.com/yourusername/mcp-observability-server/discussions
- **Documentation**: https://docs.example.com/mcp-observability

## Roadmap

- [ ] Support for more providers (Splunk, Elastic, Grafana Loki)
- [ ] Advanced query builders
- [ ] Log analytics and pattern detection
- [ ] Alerting integration
- [ ] Performance metrics collection
- [ ] Custom query templates
- [ ] Multi-account support per provider

## Acknowledgments

Built with the [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic.