Metadata-Version: 2.4
Name: crawlnest-mcp
Version: 0.7.1
Summary: CrawlNest MCP server for web scraping, Twitter, Reddit, GitHub, Hacker News, YC directory, Product Hunt, Google Play Store, Apple App Store, video download, image download, and transcription. Use with Claude Code, Claude Desktop, Cursor, and other MCP clients.
Author-email: WonderCrafts <support@crawlnest.com>
License: MIT
Project-URL: Homepage, https://crawlnest.com
Project-URL: Documentation, https://docs.crawlnest.com
Project-URL: Repository, https://github.com/WonderCrafts/CrawlNest
Project-URL: Issues, https://github.com/WonderCrafts/CrawlNest/issues
Keywords: mcp,model-context-protocol,web-scraping,twitter,reddit,claude-code,claude-desktop,anthropic,scraping,crawler,anti-bot,data-extraction,github,hackernews,ycombinator,video-download,image-download,transcription,google-play-store,apple-app-store,app-store,product-hunt,producthunt
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastmcp>=0.1.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# crawlnest-mcp

MCP server for web scraping, Twitter, and Reddit — 17 tools for any MCP-compatible LLM client.

Works with Claude Code, Claude Desktop, Cursor, Windsurf, and any other MCP client.

## Quick Start

```bash
pip install crawlnest-mcp
```

Then configure your client with two env vars:

| Variable | Description |
|----------|-------------|
| `CRAWLNEST_API_URL` | Your CrawlNest API endpoint (e.g., `https://api.crawlnest.com` or `http://100.x.x.x:8000` for Tailscale) |
| `CRAWLNEST_API_KEY` | Your API key (starts with `cn_...`) |

That's it. No SSH, no scripts, no cloning repos.

## Client Configuration

### Claude Code

```bash
claude mcp add crawlnest \
  -e CRAWLNEST_API_URL=https://api.crawlnest.com \
  -e CRAWLNEST_API_KEY=cn_your_key \
  -- crawlnest-mcp
```

Or add to `~/.claude.json` manually:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```
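
`~/.claude.json` may also hold other Claude Code settings, so a careless hand edit can clobber them. The following is a minimal sketch that adds or updates only the `crawlnest` entry and leaves every other key alone. `add_crawlnest_server` is an illustrative helper, not part of `crawlnest-mcp`. It assumes only the `mcpServers` shape shown above.

```python
import json
import os
import tempfile
from pathlib import Path


def add_crawlnest_server(config_path, api_url, api_key, command="crawlnest-mcp"):
    """Insert or update the "crawlnest" entry under "mcpServers", keeping other keys."""
    path = Path(config_path).expanduser()
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    servers = data.setdefault("mcpServers", {})
    servers["crawlnest"] = {
        "command": command,
        "env": {
            "CRAWLNEST_API_URL": api_url,
            "CRAWLNEST_API_KEY": api_key,
        },
    }

    # Write to a temp file in the same directory, then swap it in atomically,
    # so a crash mid-write cannot leave a truncated config behind.
    # Note: mkstemp creates the file as owner-read/write only (0600).
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.write("\n")
    os.replace(tmp, path)
    return data


# Example (edits your real config):
# add_crawlnest_server("~/.claude.json", "https://api.crawlnest.com", "cn_your_key")
```

The Claude Desktop, Cursor, and Windsurf files below use the same `mcpServers` shape. The same sketch should therefore work for them if you pass their paths instead.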

### Claude Desktop

Add the following to the config file for your platform:

| Platform | Config Path |
|----------|------------|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |
| Linux | `~/.config/Claude/claude_desktop_config.json` |

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

Restart Claude Desktop after saving.
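
To find the right file from a script, the platform table above can be turned into a small lookup. This is a sketch only: `claude_desktop_config_path` is a hypothetical helper, and it encodes exactly the three paths listed in the table and nothing more.

```python
import os
import platform
from pathlib import Path


def claude_desktop_config_path(system=None, env=None, home=None):
    """Return the Claude Desktop config path from the README table for this OS."""
    system = system or platform.system()  # "Darwin", "Windows", or "Linux"
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    name = "claude_desktop_config.json"

    if system == "Darwin":
        return home / "Library" / "Application Support" / "Claude" / name
    if system == "Windows":
        return Path(env["APPDATA"]) / "Claude" / name
    if system == "Linux":
        return home / ".config" / "Claude" / name
    raise ValueError(f"unsupported platform: {system}")


if __name__ == "__main__":
    print(claude_desktop_config_path())
```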

### Cursor

Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

### Windsurf

Add to `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

### Self-Hosted / Tailscale / Proxmox

Point `CRAWLNEST_API_URL` at your instance:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "http://100.64.0.5:8000",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

Replace `100.64.0.5` with your server's Tailscale IPv4 address (run `tailscale ip -4` on the server).

## Tools (17)

### Web Scraping

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `scrape_url` | Scrape a single page with anti-bot bypass | `url` |
| `crawl_website` | Crawl up to 50 pages with sitemap discovery | `url`, `max_pages` |
| `extract_structured` | Extract structured data with custom schema | `url`, `schema`, `output_format` |

### Twitter

| Tool | Description | Credentials |
|------|-------------|-------------|
| `fetch_tweet` | Fetch tweet by URL (3-layer cascade) | No |
| `twitter_user_info` | User profile info | No |
| `twitter_search` | Search tweets by query | Required |
| `twitter_trending` | Trending topics (local/global) | Required |
| `twitter_news` | News timeline | Required |
| `twitter_for_you` | Explore timeline | Required |
| `twitter_home_timeline` | Home feed (for_you/following) | Required |

### Reddit

| Tool | Description | Credentials |
|------|-------------|-------------|
| `reddit_post` | Fetch post + comment tree | No |
| `reddit_community` | Subreddit post listings | No |
| `reddit_about` | Subreddit metadata | No |
| `reddit_search` | Search across Reddit | No |
| `reddit_explore` | Discover subreddits | No |
| `reddit_popular` | Trending posts (with geo filter) | No |
| `reddit_feed` | Personalized home/news feed | Only with `use_cookies=true` |
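
The tools can also be called from a script instead of a chat client. The sketch below assumes the official `mcp` Python SDK (already a dependency of this package) and assumes `crawlnest-mcp` is on your PATH. The `scrape_url` argument follows the table above. How the scraped page is split across the returned content items is an assumption, so the sketch just prints each item. `server_env` and `list_and_scrape` are illustrative names.

```python
import os


def server_env(api_url, api_key):
    """Env for the crawlnest-mcp subprocess: inherit PATH etc., add the two vars."""
    return {
        **os.environ,
        "CRAWLNEST_API_URL": api_url,
        "CRAWLNEST_API_KEY": api_key,
    }


async def list_and_scrape(api_url, api_key, url):
    # Imported lazily so server_env() is usable even without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="crawlnest-mcp",
        args=[],
        env=server_env(api_url, api_key),
    )
    # Launch the server as a subprocess and talk MCP over its stdin/stdout.
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            print(sorted(tool.name for tool in tools.tools))

            result = await session.call_tool("scrape_url", {"url": url})
            for item in result.content:
                # Text items carry a .text attribute; other content types may not.
                print(getattr(item, "text", item))
```

Run it with `asyncio.run(list_and_scrape("https://api.crawlnest.com", "cn_your_key", "https://example.com"))`. The printed tool names are a quick way to confirm that the server exposes all 17 tools.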

## Social Media Credentials

Twitter and Reddit tools that need authentication use credentials stored on the CrawlNest server; they are never passed through MCP. Save them once via the API.

### Twitter (for search, trending, timelines)

Get `auth_token` and `ct0` from x.com cookies (DevTools → Application → Cookies):

```bash
curl -X POST $CRAWLNEST_API_URL/api/twitter/credentials \
  -H "X-API-Key: cn_your_key" \
  -H "Content-Type: application/json" \
  -d '{"auth_token": "YOUR_AUTH_TOKEN", "ct0": "YOUR_CT0"}'
```

### Reddit (for personalized feeds only)

Get `reddit_session` from reddit.com cookies:

```bash
curl -X POST $CRAWLNEST_API_URL/api/subreddit/credentials \
  -H "X-API-Key: cn_your_key" \
  -H "Content-Type: application/json" \
  -d '{"reddit_session": "YOUR_SESSION"}'
```
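
If you prefer a script to curl, you can build the same requests with only the Python standard library. A script also keeps cookie values out of your shell history. This is a sketch that assumes the endpoints and JSON bodies shown in the curl commands above. `credentials_request` and `CREDENTIAL_PATHS` are illustrative names, not part of `crawlnest-mcp`.

```python
import json
import os
import urllib.request

# Endpoints as documented in the curl examples; "reddit" uses the /api/subreddit/ path.
CREDENTIAL_PATHS = {
    "twitter": "/api/twitter/credentials",
    "reddit": "/api/subreddit/credentials",
}


def credentials_request(platform, payload, api_url=None, api_key=None):
    """Build (but do not send) the POST request that saves server-side credentials."""
    api_url = (api_url or os.environ["CRAWLNEST_API_URL"]).rstrip("/")
    api_key = api_key or os.environ["CRAWLNEST_API_KEY"]
    return urllib.request.Request(
        api_url + CREDENTIAL_PATHS[platform],
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# To send it (the response body format is not documented here):
# req = credentials_request("twitter", {"auth_token": "YOUR_AUTH_TOKEN", "ct0": "YOUR_CT0"})
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status, resp.read().decode())
```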

## Usage Examples

```
> Scrape https://news.ycombinator.com and show the top stories

> Extract product name, price, and image from https://example.com/products as CSV

> Crawl https://anthropic.com/news with max 5 pages

> Fetch the tweet at https://x.com/anthropic/status/123456

> What's trending on Twitter right now?

> Show me the top posts from r/technology this week

> Search Reddit for "machine learning" in r/programming
```

## Troubleshooting

### "Command not found: crawlnest-mcp"

```bash
pip install --force-reinstall crawlnest-mcp
which crawlnest-mcp
```
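
Sometimes `which` finds the script but the client still reports it missing. In that case, the client may be running with a different PATH than your terminal, which can happen with apps launched from a GUI. One workaround is to put the script's absolute path in the `"command"` field of the config. The sketch below prints that path. `crawlnest_command` is an illustrative helper, not part of the package.

```python
import os
import shutil


def crawlnest_command(name="crawlnest-mcp"):
    """Return the absolute path of the console script, or None if it is not on PATH."""
    found = shutil.which(name)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    path = crawlnest_command()
    print(path or "crawlnest-mcp not found on PATH; check where pip installs scripts")
```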

### "CrawlNest API not reachable"

```bash
curl -H "X-API-Key: cn_your_key" $CRAWLNEST_API_URL/health
```
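
The same check can be scripted, and the script tells you *why* it failed. It separates the case where the server answered with an error status from the case where nothing answered at all. This is a sketch that uses only the `/health` endpoint and `X-API-Key` header from the curl command above. `check_health` is an illustrative name.

```python
import os
import urllib.error
import urllib.request


def check_health(api_url=None, api_key=None, timeout=5.0):
    """Return (ok, detail) for GET {api_url}/health, mirroring the curl check."""
    api_url = (api_url or os.environ["CRAWLNEST_API_URL"]).rstrip("/")
    api_key = api_key or os.environ["CRAWLNEST_API_KEY"]
    req = urllib.request.Request(api_url + "/health", headers={"X-API-Key": api_key})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: wrong host or port, server down, or a network/VPN issue.
        return False, f"unreachable: {exc}"
```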

### "Twitter credentials missing"

Save credentials first (see [Social Media Credentials](#social-media-credentials)).

### Tools not showing

1. Restart your client
2. Check config JSON syntax: `python3 -m json.tool path/to/config.json`
3. Check logs: `tail -f /tmp/mcp_server.log`
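
`json.tool` only confirms that the file parses. The sketch below goes a step further: it also checks that the `crawlnest` entry has the fields used in the configuration examples above. `check_mcp_config` is an illustrative helper, and it only checks the keys this README uses.

```python
import json
import sys


def check_mcp_config(path, server="crawlnest"):
    """Parse an MCP client config and list problems with the crawlnest entry."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    entry = data.get("mcpServers", {}).get(server)
    if entry is None:
        return [f'no "mcpServers.{server}" entry']

    problems = []
    if not entry.get("command"):
        problems.append('missing "command"')
    env = entry.get("env", {})
    for var in ("CRAWLNEST_API_URL", "CRAWLNEST_API_KEY"):
        if not env.get(var):
            problems.append(f'missing env var "{var}"')
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    print(check_mcp_config(sys.argv[1]) or "looks OK")
```

Run it as `python3 check_config.py path/to/config.json`. An empty list of problems prints `looks OK`.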

## Self-Hosting

```bash
git clone https://github.com/WonderCrafts/CrawlNest
cd CrawlNest
make setup && make infra && make db-apply && make dev
```

Then use `http://localhost:8000` as your `CRAWLNEST_API_URL`.

## License

MIT

## Links

- [GitHub](https://github.com/WonderCrafts/CrawlNest)
- [Issues](https://github.com/WonderCrafts/CrawlNest/issues)
- [Full MCP Setup Guide](https://github.com/WonderCrafts/CrawlNest/blob/main/docs/MCP_SETUP.md)
