Metadata-Version: 2.3
Name: grabba-mcp
Version: 0.0.5
Summary: Model Context Protocol server for the Grabba API.
License: Proprietary
Author: Bolu Agbana
Author-email: obaa@obaa.io
Requires-Python: >=3.10,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: fastmcp (>=2.14.0,<3.0.0)
Requires-Dist: grabba (>=0.0.14,<0.0.15)
Requires-Dist: pydantic-settings (>=2.9.1,<3.0.0)
Description-Content-Type: text/markdown

# Grabba MCP Server

This repository contains the Grabba Model Context Protocol (MCP) server, designed to expose Grabba API functionalities as a set of callable tools. Built on `FastMCP`, this server allows AI agents, orchestrators (like LangChain), and other applications to seamlessly interact with the Grabba data extraction and management services.

> **Recommended:** point your MCP client at the hosted instance at
> [`https://mcp.grabba.dev/`](https://mcp.grabba.dev/) — no install required.
> The rest of this README covers self-hosting (PyPI, Docker) for users who
> need their own deployment.

## Table of Contents

1.  [Features](#features)
2.  [Connecting to the Hosted Instance](#connecting-to-the-hosted-instance)
3.  [Self-Hosting](#self-hosting)
      * [Prerequisites](#prerequisites)
      * [Installation](#installation)
          * [Via PyPI](#via-pypi)
          * [From Source (Development)](#from-source-development)
      * [Running the Server](#running-the-server)
          * [Locally](#locally)
          * [Docker Container](#docker-container)
4.  [Configuration](#configuration)
      * [Environment Variables](#environment-variables)
      * [Command-Line Arguments](#command-line-arguments)
5.  [Available Tools](#available-tools)
      * [Authentication](#authentication)
      * [Tool Details](#tool-details)
6.  [Programmatic Clients](#programmatic-clients)
      * [Python Client (LangChain Example)](#python-client-langchain-example)
          * [Streamable HTTP Transport](#streamable-http-transport)
          * [Stdio Transport (for Docker-in-Docker or specific use cases)](#stdio-transport-for-docker-in-docker-or-specific-use-cases)
7.  [Development Notes](#development-notes)
      * [Project Structure](#project-structure)
      * [Running Tests](#running-tests)
8.  [Links & Resources](#links--resources)
9.  [License](#license)

-----

## Features

  * **Grabba API Exposure:** Exposes key Grabba API functionalities (data extraction, job management, statistics) as accessible tools.
  * **Multiple Transports:** Supports `stdio`, `streamable-http`, and `sse` transports, offering flexibility for different deployment and client scenarios.
  * **Dependency Injection:** Leverages FastAPI's robust dependency injection for secure and efficient `GrabbaService` initialization (e.g., handling API keys).
  * **Containerized Deployment:** Optimized for Docker for easy packaging and deployment.
  * **Configurable:** Allows configuration via environment variables and command-line arguments.

-----

## Connecting to the Hosted Instance

The Grabba MCP server is publicly available — most users do not need to install
anything.

  * **URL:** `https://mcp.grabba.dev/`
  * **Transports:** `streamable-http` (recommended), `sse` for legacy clients.
  * **Authentication:** include your Grabba API key as the `API_KEY` HTTP header.

### Claude Desktop

Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)
or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "grabba": {
      "type": "streamable-http",
      "url": "https://mcp.grabba.dev/",
      "headers": { "API_KEY": "gk_live_..." }
    }
  }
}
```

Restart Claude Desktop and Grabba's tools will appear in the MCP palette.

> **Older Claude Desktop builds** without native HTTP support can connect through
> the [`mcp-remote`](https://www.npmjs.com/package/mcp-remote) bridge:
>
> ```json
> {
>   "mcpServers": {
>     "grabba": {
>       "command": "npx",
>       "args": [
>         "-y", "mcp-remote",
>         "https://mcp.grabba.dev/",
>         "--header", "API_KEY:gk_live_..."
>       ]
>     }
>   }
> }
> ```

### Cursor

Settings → MCP → User:

```json
{
  "mcpServers": {
    "grabba": {
      "url": "https://mcp.grabba.dev/",
      "headers": { "API_KEY": "gk_live_..." }
    }
  }
}
```

### Smoke test

```bash
curl -i \
  -H "API_KEY: gk_live_..." \
  https://mcp.grabba.dev/
```

A `200 OK` (or `406 Not Acceptable` from the transport handshake) confirms the
endpoint is reachable and your key was accepted at the edge.

-----

## Self-Hosting

For air-gapped environments, CI runners, or anyone who'd rather not depend on
the public endpoint, the same server is published as a Python package
(`grabba`) and a Docker image (`itsobaa/grabba-mcp`).

### Prerequisites

  * Python 3.10+
  * Docker (for containerized deployment)
  * A Grabba API Key (you can get one from the [Grabba website](https://www.grabba.dev/))

### Installation

#### Via PyPI

The `grabba-mcp` package is available on PyPI. This is the simplest way to get started.

```bash
pip install grabba-mcp
```

#### From Source (Development)

If you plan to contribute or modify the server, you'll want to install from source.

1.  **Clone the repository:**

    ```bash
    git clone https://github.com/grabba-dev/grabba-mcp
    cd grabba-mcp
    ```

2.  **Install Poetry:**
    If you don't have Poetry installed, follow their official guide:

    ```bash
    pip install poetry
    ```

3.  **Install project dependencies:**
    Navigate to the `apps/mcp` directory where `pyproject.toml` resides, then install:

    ```bash
    cd apps/mcp
    poetry install
    ```

### Running the Server

#### Locally

After installation (either via `pip` or from source), you can run the server.

1.  **Create a `.env` file:**
    In the `apps/mcp` directory (if running from source) or the directory from which you'll execute the `grabba-mcp` command, create a `.env` file and add your Grabba API key:

    ```dotenv
    API_KEY="YOUR_API_KEY_HERE"
    # Optional: configure the server port
    PORT=8283
    # Optional: configure the default transport (overridden by CLI)
    MCP_SERVER_TRANSPORT="streamable-http"
    ```

2.  **Execute the server:**

      * **If installed via `pip`:**

        ```bash
        grabba-mcp
        ```

        To specify a transport via command line:

        ```bash
        grabba-mcp streamable-http
        ```

      * **If running from source (using Poetry):**

        ```bash
        cd apps/mcp
        poetry run python src/server.py
        ```

        To specify a transport via command line:

        ```bash
        poetry run python src/server.py stdio
        ```

    You should see output indicating the server is starting and listening on the specified port (e.g., `http://0.0.0.0:8283`) if using HTTP transports. Note that the `stdio` transport will exit after a single request/response cycle, making it unsuitable for persistent services.

#### Docker Container

A pre-built Docker image is available on Docker Hub, making deployment straightforward.

1.  **Pull the image:**

    ```bash
    docker pull itsobaa/grabba-mcp:latest
    ```

2.  **Run the container:**
    For a persistent server, you'll typically use the `streamable-http` transport and map ports.

    ```bash
    docker run -d \
      -p 8283:8283 \
      -e API_KEY="YOUR_API_KEY_HERE" \
      -e MCP_SERVER_TRANSPORT="streamable-http" \
      itsobaa/grabba-mcp:latest
    ```

    You can also use `docker-compose` for more complex setups:

    ```yaml
    # docker-compose.yml
    version: '3.8'
    services:
      grabba-mcp:
        image: itsobaa/grabba-mcp:latest
        container_name: grabba-mcp
        environment:
          API_KEY: ${API_KEY} # Reads from a .env file next to docker-compose.yml
          MCP_SERVER_TRANSPORT: streamable-http
          PORT: 8283
        ports:
          - "8283:8283"
        healthcheck:
          test: ["CMD-SHELL", "curl -f http://localhost:8283/tools/openapi.json || exit 1"]
          interval: 10s
          timeout: 5s
          retries: 5
    ```

    With a `docker-compose.yml` file, create a `.env` file next to it (e.g., `API_KEY="YOUR_API_KEY_HERE"`) and run:

    ```bash
    docker-compose up -d
    ```

-----

## Configuration

The server can be configured via environment variables and command-line arguments.

### Environment Variables

  * **`API_KEY`** (Required): Your Grabba API key. This is critical for authenticating with Grabba services.
  * **`PORT`** (Optional, default: `8283`): The port on which the MCP server's HTTP transports (`streamable-http`, `sse`) will listen.
  * **`MCP_SERVER_TRANSPORT`** (Optional, default: `stdio`): The default transport protocol for the MCP server. Can be `stdio`, `streamable-http`, or `sse`.

### Command-Line Arguments

The server also accepts a single positional command-line argument which overrides `MCP_SERVER_TRANSPORT`:

```bash
grabba-mcp [transport_protocol]
# or for source: python src/server.py [transport_protocol]
```

  * `[transport_protocol]`: Can be `stdio`, `streamable-http`, or `sse`.
      * Example: `grabba-mcp streamable-http`

-----

## Available Tools

The Grabba MCP Server exposes a suite of tools that wrap the Grabba Python SDK functionalities.

### Authentication

For `streamable-http` and `sse` transports, authentication is performed by including an **`API_KEY`** HTTP header with your Grabba API Key.
Example: `API_KEY: YOUR_API_KEY_HERE`

For `stdio` transport, the **`API_KEY`** environment variable must be set in the environment where the `grabba-mcp` command is executed, as there are no HTTP headers in this communication mode.

### Tool Details

#### `extract_data`

  * **Description:** Schedules a new data extraction job with Grabba. Suitable for web search tasks.
  * **Input:** `Job` object (Pydantic model) detailing the extraction tasks.
  * **Output:** `tuple[str, Optional[Dict]]` - A message and the `JobResult` as a dictionary.

#### `schedule_existing_job`

  * **Description:** Schedules an existing Grabba job to run immediately.
  * **Input:** `job_id` (string) - The ID of the existing job.
  * **Output:** `tuple[str, Optional[Dict]]` - A message and the `JobResult` as a dictionary.

#### `wait_for_job_completion`

  * **Description:** Waits for a job to reach a terminal state (`completed`, `failed`, or `cancelled`) using SSE first and polling fallback.
  * **Input:** `job_id` (string), optional `timeout_seconds` (int, default `240`), optional `poll_interval_seconds` (int, default `5`).
  * **Output:** `tuple[str, Dict]` - A message and a status payload:
    * `job_id`
    * `status`
    * `completed`
    * `timed_out`
    * `source` (`sse`, `fetch_specific_job`, `timeout`)
    * `reason`
    * optional `job_result_id`

#### `fetch_all_jobs`

  * **Description:** Fetches all Grabba jobs for the current user.
  * **Input:** None.
  * **Output:** `tuple[str, Optional[List[Job]]]` - A message and a list of `Job` objects.

#### `fetch_specific_job`

  * **Description:** Fetches details of a specific Grabba job by its ID.
  * **Input:** `job_id` (string) - The ID of the job.
  * **Output:** `tuple[str, Optional[Job]]` - A message and the `Job` object.

#### `delete_job`

  * **Description:** Deletes a specific Grabba job.
  * **Input:** `job_id` (string) - The ID of the job to delete.
  * **Output:** `tuple[str, None]` - A success message.

#### `fetch_job_result`

  * **Description:** Fetches results of a completed Grabba job by its result ID.
  * **Input:** `job_result_id` (string) - The ID of the job result.
  * **Output:** `tuple[str, Optional[Dict]]` - A message and the job result data as a dictionary.

#### `delete_job_result`

  * **Description:** Deletes results of a completed Grabba job.
  * **Input:** `job_result_id` (string) - The ID of the job result to delete.
  * **Output:** `tuple[str, None]` - A success message.

#### `fetch_stats_data`

  * **Description:** Fetches usage statistics and current user token balance for Grabba.
  * **Input:** None.
  * **Output:** `tuple[str, Optional[JobStats]]` - A message and the `JobStats` object.

#### `estimate_job_cost`

  * **Description:** Estimates the cost of a Grabba job before creation or scheduling.
  * **Input:** `Job` object (Pydantic model) detailing the extraction tasks.
  * **Output:** `tuple[str, Optional[Dict]]` - A message and the estimated cost details as a dictionary.

#### `create_job`

  * **Description:** Creates a new data extraction job in Grabba without immediately scheduling it for execution.
  * **Input:** `Job` object (Pydantic model) detailing the extraction tasks.
  * **Output:** `tuple[str, Optional[Job]]` - A message and the created `Job` object.

#### `fetch_available_regions`

  * **Description:** Fetches a list of all available puppet (web agent) regions that can be used for scheduling web data extractions.
  * **Input:** None.
  * **Output:** `tuple[str, Optional[List[PuppetRegion]]]` - A message and a list of `PuppetRegion` objects.

-----

## Programmatic Clients

If you're building your own agent runtime (LangChain, custom Python orchestrator,
etc.) you can talk to either the hosted instance or your self-hosted server with
the same client config — just change the `url`.

### Python Client (LangChain Example)

This example assumes you have the `mcp-client` package installed (often as part of a larger LangChain/Agent setup), along with `grabba` and `pydantic`.

```python
import asyncio
import os
from typing import List, Dict, Optional
from langchain_core.tools import BaseTool, Tool
from mcp.models.mcp_server_config import McpServerConfig, McpServer
from mcp.client.transports.streamable_http import StreamableHttpConnection
from mcp.client.transports.stdio import StdioConnection
from mcp.client.multi_server_client import MultiServerMCPClient
from grabba import Job, JobStats, PuppetRegion # Import necessary Grabba Pydantic models
from dotenv import load_dotenv # For loading API key from .env

async def connect_and_use_mcp_tools(mcp_server_configs: List[McpServerConfig], api_key: Optional[str] = None) -> List[Tool]:
    """
    Connects to the MCP server(s), discovers its tools, and wraps them as LangChain Tools.
    Handles API key injection for HTTP connections.
    """
    try:
        mcp_client_config = {}
        for config in mcp_server_configs:
            # Pydantic V2 model validation
            mcp_server_model = McpServer.model_validate(config.mcp_server.model_dump())
            
            connection_headers = {}
            if api_key:
                # Use standard header name for API keys
                connection_headers["API_KEY"] = api_key 

            if mcp_server_model.transport == "streamable_http":
                server_params: StreamableHttpConnection = {
                    "transport": "streamable_http",
                    "url": str(mcp_server_model.url),
                    "env": config.env_variables or {}, # For other env variables, if any
                    "headers": connection_headers # Pass headers for HTTP transports
                }
            elif mcp_server_model.transport == "stdio":
                server_params: StdioConnection = {
                    "transport": "stdio",
                    "command": mcp_server_model.command, 
                    "args": mcp_server_model.args, 
                    "env": config.env_variables # For stdio, env maps to subprocess env vars
                }
            else:
                raise ValueError(f"Unsupported transport: {mcp_server_model.transport}")

            print(f"Client connecting with params: {server_params}")
            mcp_client_config[mcp_server_model.name] = server_params
        
        mcp_client = MultiServerMCPClient(mcp_client_config)
        tools: List[BaseTool] = await mcp_client.get_tools()
        print(f"Successfully loaded {len(tools)} tools.")
        return tools
    except Exception as e:
        print(f"Error connecting to MCP server or loading tools: {e}")
        return []


async def main():
    load_dotenv() # Load API key from a client-side .env file
    API_KEY = os.getenv("API_KEY", "YOUR_API_KEY_HERE_IF_NOT_ENV")

    # --- Configuration for Streamable HTTP Transport (Local or Public Instance) ---
    # For local: url="http://localhost:8283"
    # For public: url="https://mcp.grabba.dev/"
    http_mcp_config = McpServerConfig(
        mcp_server=McpServer(
            name="grabba-agent-http",
            transport="streamable_http",
            url="http://localhost:8283" # Or "https://mcp.grabba.dev/" for public
        )
    )

    print("\n--- Connecting via Streamable HTTP ---")
    http_tools = await connect_and_use_mcp_tools(
        mcp_server_configs=[http_mcp_config],
        api_key=API_KEY
    )

    if http_tools:
        print("\nAvailable HTTP Tools:")
        for tool in http_tools:
            print(f"- {tool.name}: {tool.description.split('.')[0]}.")
        
        # Example: Using the extract_data tool (adjust as per your Job Pydantic model)
        extract_tool = next((t for t in http_tools if t.name == "extract_data"), None)
        if extract_tool:
            print("\n--- Testing extract_data tool via HTTP ---")
            sample_job = Job(
                url="https://example.com/some-page",
                type="markdown", # or "pdf", "html" etc.
                parser="text-content",
                strategy="auto"
                # ... other required fields for Job
            )
            try:
                result_msg, result_data = await extract_tool.ainvoke({"extraction_data": sample_job})
                print(f"Extraction Result (HTTP): {result_msg}")
                if result_data:
                    print(f"Extraction Data (HTTP): {result_data.get('extracted_text', 'No text extracted')[:100]}...") # Print first 100 chars
            except Exception as e:
                print(f"Error calling extract_data via HTTP: {e}")
        else:
            print("extract_data tool not found in HTTP tools.")

        # Example: Using fetch_all_jobs tool
        fetch_jobs_tool = next((t for t in http_tools if t.name == "fetch_all_jobs"), None)
        if fetch_jobs_tool:
            print("\n--- Testing fetch_all_jobs tool via HTTP ---")
            try:
                result_msg, jobs_list = await fetch_jobs_tool.ainvoke({})
                print(f"Fetch Jobs Result (HTTP): {result_msg}")
                if jobs_list:
                    print(f"Fetched {len(jobs_list)} jobs.")
                    for job in jobs_list[:2]: # Print first 2 jobs
                        print(f"  - Job ID: {job.job_id}, URL: {job.url}")
            except Exception as e:
                print(f"Error calling fetch_all_jobs via HTTP: {e}")
        
        # Example: Using fetch_stats_data tool
        fetch_stats_tool = next((t for t in http_tools if t.name == "fetch_stats_data"), None)
        if fetch_stats_tool:
            print("\n--- Testing fetch_stats_data tool via HTTP ---")
            try:
                result_msg, stats_data = await fetch_stats_tool.ainvoke({})
                print(f"Fetch Stats Result (HTTP): {result_msg}")
                if stats_data:
                    print(f"Token Balance (HTTP): {stats_data.token_balance}")
                    print(f"Jobs Run (HTTP): {stats_data.jobs_run_count}")
            except Exception as e:
                print(f"Error calling fetch_stats_data via HTTP: {e}")

    # --- Configuration for Stdio Transport (e.g., to a Docker container running the server) ---
    # This assumes you have the 'itsobaa/grabba-mcp:latest' Docker image available.
    # The client launches a temporary Docker container for each tool call.
    stdio_mcp_config = McpServerConfig(
        mcp_server=McpServer(
            name="grabba-agent-stdio",
            transport="stdio",
            command="docker",
            args=[
                "run",
                "-i",          # Keep STDIN open for interactive communication
                "--rm",        # Remove container after exit
                "itsobaa/grabba-mcp:latest", # The Docker Hub image for Grabba MCP server
                "grabba-mcp", "stdio" # Command to run the server in stdio mode inside container
            ],
            env_variables={"API_KEY": API_KEY} # Pass API key as env var for stdio
        )
    )

    print("\n--- Connecting via Stdio (to Docker container as a subprocess) ---")
    stdio_tools = await connect_and_use_mcp_tools(
        mcp_server_configs=[stdio_mcp_config],
        api_key=API_KEY # Client might still pass for internal consistency, though env_variables is primary for stdio
    )

    if stdio_tools:
        print("\nAvailable Stdio Tools:")
        for tool in stdio_tools:
            print(f"- {tool.name}: {tool.description.split('.')[0]}.")
        
        # Example: Using the fetch_available_regions tool via Stdio
        fetch_regions_tool = next((t for t in stdio_tools if t.name == "fetch_available_regions"), None)
        if fetch_regions_tool:
            print("\n--- Testing fetch_available_regions tool via Stdio ---")
            try:
                result_msg, regions_list = await fetch_regions_tool.ainvoke({})
                print(f"Fetch Regions Result (Stdio): {result_msg}")
                if regions_list:
                    print(f"Fetched {len(regions_list)} regions.")
                    for region in regions_list[:3]: # Print first 3 regions
                        print(f"  - {region.display_name} ({region.code})")
            except Exception as e:
                print(f"Error calling fetch_available_regions via Stdio: {e}")
        else:
            print("fetch_available_regions tool not found in Stdio tools.")

if __name__ == "__main__":
    asyncio.run(main())
```

-----

## Development Notes

### Project Structure

```
your_project_root/
├── src/
│   └── server.py               # Main FastMCP server application
├── .env                        # Environment variables for local development
├── pyproject.toml              # Poetry project configuration
└── poetry.lock                 # Poetry dependency lock file
├── Dockerfile                  # Docker build instructions for the server
├── docker-compose.yml          # Docker Compose configuration for local development/deployment
├── .dockerignore               # Files to ignore during Docker build
├── .env                        # Example .env for docker-compose (for API_KEY)
├── README.md                   # This documentation file
├── pyproject.toml              # Root pyproject.toml (if using monorepo structure)
├── poetry.lock                 # Root poetry.lock (if using monorepo structure)
├── src/                        # Source code (often for the root project if it's a monorepo)
├── tests/                      # Project tests
└── ... (other project files like dist, docs, tox.ini, project.json etc.)
```

### Running Tests

To run tests (as configured by your `pyproject.toml`):

```bash
poetry run pytest
```

-----

## Links & Resources

  * **Grabba Website:** [https://www.grabba.dev/](https://www.grabba.dev/)
  * **Grabba MCP Server Public Instance:** [https://mcp.grabba.dev/](https://mcp.grabba.dev/)
  * **GitHub Repository:** [https://github.com/grabba-dev/grabba-mcp](https://github.com/grabba-dev/grabba-mcp)
  * **Docker Hub Image:** [https://hub.docker.com/r/itsobaa/grabba-mcp](https://hub.docker.com/r/itsobaa/grabba-mcp)
  * **PyPI Package:** [https://pypi.org/project/grabba-mcp/](https://pypi.org/project/grabba-mcp/)

-----

## License

This project is licensed under the Proprietary License. Please see the `LICENSE` file in the repository root for full details.

-----
