Metadata-Version: 2.3
Name: docduty-search
Version: 0.1.1
Summary: MCP server — web search, patent details, video transcripts, scholar profiles, and browser tools for Claude Code
Author: Fredrik Angelsen
Author-email: Fredrik Angelsen <fredrikangelsen@gmail.com>
Requires-Dist: fastmcp>=3.4.2
Requires-Dist: httpx>=0.28.1
Requires-Dist: beautifulsoup4>=4.15 ; extra == 'browser'
Requires-Dist: websockets>=16.0 ; extra == 'browser'
Requires-Dist: trafilatura>=2.1 ; extra == 'browser'
Requires-Dist: markitdown[pdf,docx,pptx,xlsx]>=0.1.6 ; extra == 'browser'
Requires-Dist: pypdfium2>=5.10 ; extra == 'browser'
Requires-Python: >=3.12
Provides-Extra: browser
Description-Content-Type: text/markdown

# docduty-search

MCP server that unifies web search, academic papers, videos, news, and patents behind a small set of composable tools — with optional Chrome browser integration for fetching JS-rendered pages, running JavaScript, parsing with Python, and taking screenshots.

## Install

```bash
uv tool install docduty-search            # search + details only
uv tool install "docduty-search[browser]" # + fetch, js, py, screenshot
```

## Setup

```bash
docduty-search setup
```

One command does everything:

1. Prompts for API keys → saves to `~/.config/docduty-search/.env`
2. Installs the research skill → `~/.claude/skills/docduty-search/SKILL.md`
3. Registers the MCP server with Claude Code (if the `claude` CLI is on your `PATH`)
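
The keys end up in a plain `.env` file. The sketch below shows how a script could read them back. It assumes simple `KEY=VALUE` lines and uses the variable names from the Backends table. `parse_env` and `load_env` are hypothetical helpers, not the package's own loader.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (hypothetical)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. KEY="abc"
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env(path: Path = Path.home() / ".config" / "docduty-search" / ".env") -> dict[str, str]:
    """Read the setup-written .env file; return {} if setup has not run yet."""
    return parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}


keys = load_env()
if "SERPAPI_API_KEY" not in keys:
    print("No SerpAPI key found; run `docduty-search setup` first")
```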

**API keys:**

- **SerpAPI key** (required) — [serpapi.com/manage-api-key](https://serpapi.com/manage-api-key)
- **Perplexity key** (optional) — [console.perplexity.ai](https://console.perplexity.ai) — enables domain-filtered web search

## Tools

### `search`

Search across 7 source types.

```
search("svelte 5 runes")                                      # web
search("machine learning", type="scholar")                     # papers with citations
search("react tutorial", type="video")                         # YouTube
search("AI regulation EU", type="news")                        # news
search("piezoelectric transducer", type="patents")             # patents
search("svelte logo", type="images")                           # images
search("electric vehicles", type="trends")                     # Google Trends
```

| Parameter | Description |
|-----------|-------------|
| `query` | Search query (required) |
| `type` | `web` · `scholar` · `video` · `news` · `patents` · `images` · `trends` |
| `domains` | Filter by domain. Prefix with `-` to exclude: `["-reddit.com"]` |
| `language` | ISO 639-1 code: `"en"`, `"no"`, `"de"` |
| `recency` | `hour` · `day` · `week` · `month` · `year` |
| `num` | Max results (default 10) |
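
The `-` prefix in `domains` divides one list into an allow list and a deny list. This README does not document how each backend consumes that filter. The sketch below shows the split, plus one possible translation into Google's `site:` / `-site:` query operators. Both helper names are hypothetical, and the `site:` mapping is an assumption, not necessarily what the server sends.

```python
def split_domains(domains: list[str]) -> tuple[list[str], list[str]]:
    """Split a `domains` filter into (include, exclude) lists."""
    include: list[str] = []
    exclude: list[str] = []
    for entry in domains:
        name = entry.strip()
        target = exclude if name.startswith("-") else include
        name = name.lstrip("-")
        if name:  # ignore empty entries such as "" or a bare "-"
            target.append(name)
    return include, exclude


def to_site_operators(query: str, domains: list[str]) -> str:
    """Express the filter with Google's site: / -site: operators (assumed mapping)."""
    include, exclude = split_domains(domains)
    parts = [query]
    if include:
        parts.append("(" + " OR ".join(f"site:{d}" for d in include) + ")")
    parts.extend(f"-site:{d}" for d in exclude)
    return " ".join(parts)


print(split_domains(["svelte.dev", "-reddit.com"]))
# → (['svelte.dev'], ['reddit.com'])
print(to_site_operators("svelte 5 runes", ["svelte.dev", "-reddit.com"]))
# → svelte 5 runes (site:svelte.dev) -site:reddit.com
```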

**Patent-specific filters:**

| Parameter | Description |
|-----------|-------------|
| `before` | `"priority:YYYYMMDD"` or `"publication:YYYYMMDD"` |
| `after` | Same format as `before` |
| `inventor` | Filter by inventor name |
| `assignee` | Filter by assignee/company |
| `country` | Country codes: `"US,WO,EP"` |
| `status` | `"GRANT"` or `"APPLICATION"` |
| `sort` | `"new"` or `"old"` (by filing date) |

**Per-type extras in results:**

| Type | Extra fields |
|------|-------------|
| `scholar` | `authors`, `cited_by`, `result_id`, `resources` (PDF links), `author_ids` |
| `video` | `duration`, `channel`, `views`, `video_id` |
| `news` | `source` |
| `patents` | `patent_id`, `pdf`, `filing_date`, `inventor`, `assignee` |

### `details`

Get structured details for a result. Returns a summary inline; the full data (claims, articles, transcript text) is stored in the `py()` namespace.

```
details("patent/US11734097B1/en")          # patent claims, citations, similar
details("scholar/6497879044063343659")     # scholar article via Google Patents
details("t-NybWd6Sz0J")                   # citation formats (MLA, APA, BibTeX)
details("video/dQw4w9WgXcQ")              # video metadata + description
details("transcript/dQw4w9WgXcQ")          # full transcript text
details("author/nTJ7ihUAAAAJ")            # author profile, h-index, articles
```

| ID format | Inline response | In py() namespace |
|-----------|----------------|-------------------|
| `patent/...` | title, abstract, dates, classifications, counts | `claims`, `patent_citations`, `cited_by`, `similar_documents` |
| `scholar/...` | same as patent | same as patent |
| `video/...` | title, channel, views, description_preview | `description` |
| `transcript/...` | chars, chapters | `full_text` |
| `author/...` | name, affiliations, h-index, articles_count | `articles`, `co_authors` |
| bare `result_id` (e.g. `t-NybWd6Sz0J`) | citation formats + export links | — |
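
The table implies that the ID's prefix selects the handler, and an ID with no prefix is treated as a Scholar `result_id`. The sketch below shows that routing. Only the prefixes come from the table. `classify_id` and its return values are hypothetical.

```python
PREFIXES = ("patent", "scholar", "video", "transcript", "author")


def classify_id(detail_id: str) -> str:
    """Return which details() handler an ID would route to (hypothetical)."""
    prefix, sep, _ = detail_id.partition("/")
    if sep and prefix in PREFIXES:
        return prefix
    # No known "kind/" prefix: treat it as a bare Google Scholar result_id
    return "result_id"


for example in ("patent/US11734097B1/en", "transcript/dQw4w9WgXcQ", "t-NybWd6Sz0J"):
    print(example, "→", classify_id(example))
```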

### `fetch` (browser extra)

Navigate Chrome to a URL and extract its content. Returns a summary with a content preview; the full content is available via `py()`.

```
fetch("https://svelte.dev/docs/svelte/$state")
→ { url, title, chars, content_preview }

py("content[:500]")     # first 500 chars of the extracted markdown
py("soup.select('h2')") # BeautifulSoup on raw HTML
```

Handles Cloudflare challenge pages automatically by waiting for the real Chrome instance to pass the challenge. Also handles documents (PDF, DOCX, PPTX, XLSX): they are detected from the response's Content-Type and converted to markdown with MarkItDown.

### `js` (browser extra)

Run JavaScript on the current page.

```
js("document.title")
js("[...document.querySelectorAll('h2')].map(h => h.textContent)")
```
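
In the Chrome DevTools Protocol, which the Chrome CDP backend listed under Backends uses, evaluating an expression is a `Runtime.evaluate` request. The sketch below builds the JSON message a CDP client would send over the page's WebSocket. `runtime_evaluate` is a hypothetical helper. The `returnByValue` and `awaitPromise` settings are assumptions about how the tool configures the call.

```python
import json
from itertools import count

_ids = count(1)  # CDP replies echo the request id, so each request gets a fresh one


def runtime_evaluate(expression: str) -> str:
    """Build a CDP Runtime.evaluate request as a JSON string (hypothetical)."""
    return json.dumps({
        "id": next(_ids),
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "returnByValue": True,  # serialise the result instead of returning an object handle
            "awaitPromise": True,   # wait for promises to settle before replying
        },
    })


msg = runtime_evaluate("[...document.querySelectorAll('h2')].map(h => h.textContent)")
```

The reply carries the same `id`, with the serialized value under `result.result.value`.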

### `py` (browser extra)

Run Python with access to page content and all previous results.

```
py("len(content)")
py("[h.text for h in soup.select('article h2')]")
py("cache.clear()")
```

Available in namespace:
- `content` — clean extracted markdown from last `fetch()`
- `html` — raw HTML from last `fetch()`
- `soup` — BeautifulSoup parsed HTML
- `url` — current page URL
- `_` / `_1`..`_N` — result history across all tools
- `js_result` — last `js()` result
- `cache` — TTL cache, call `cache.clear()` to reset
- `claims`, `cited_by`, `full_text`, etc. — from last `details()` call
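
The namespace acts like a shared dictionary that every tool call writes into. The sketch below assumes that `_N` holds the result of the Nth call and that `_` holds the latest one, matching how the examples below use `_1` and `_2`. `ResultHistory` is a hypothetical name. Plain `eval`/`exec` like this provides no sandboxing.

```python
from typing import Any


class ResultHistory:
    """Hypothetical store behind a py()-style namespace."""

    def __init__(self) -> None:
        self.namespace: dict[str, Any] = {}
        self.count = 0

    def record(self, result: Any, **extras: Any) -> None:
        """Store one tool result as _N and _, plus any tool-specific names."""
        self.count += 1
        self.namespace[f"_{self.count}"] = result
        self.namespace["_"] = result
        # e.g. content/html/soup from fetch(), claims/cited_by from details()
        self.namespace.update(extras)

    def py(self, code: str) -> Any:
        """Evaluate an expression; fall back to exec() for statements."""
        try:
            return eval(code, self.namespace)
        except SyntaxError:
            exec(code, self.namespace)
            return None


history = ResultHistory()
history.record([{"patent_id": "patent/US11734097B1/en"}])             # like search()
history.record({"title": "Example"}, claims=["1. A transducer ..."])  # like details()
print(history.py("_1[0]['patent_id']"))  # → patent/US11734097B1/en
print(history.py("len(claims)"))         # → 1
```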

### `screenshot` (browser extra)

Capture the current page as a PNG.

```
screenshot()                    # saves to a temporary file
screenshot("/tmp/page.png")     # saves to the given path
```

## Example: Patent Research

```
# Search with filters
search("MEMS transducer", type="patents", assignee="Murata", country="US,JP", status="GRANT")

# Get structured details — claims, citations, similar documents
details(_1[0]["patent_id"])
py("len(claims)")           # e.g. 55 claims
py("claims[0][:200]")       # first claim preview
py("len(cited_by)")         # e.g. 61 forward citations

# Find prior art (replicates Google Patents "Find Prior Art" button)
search(" ".join(_2["prior_art_keywords"]), type="patents",
       before=f"priority:{_2['prior_art_date'].replace('-','')}")

# Follow a citation chain
details(_2["cited_by"][0]["patent_id"])
```

## Example: Video Learning

```
# Find tutorials
search("svelte 5 runes tutorial", type="video", num=5)

# Get video details
details(f"video/{_1[0]['video_id']}")

# Get the full transcript
details(f"transcript/{_1[0]['video_id']}")
py("full_text[:500]")    # first 500 chars of spoken content
```

## Example: Scholar Research

```
# Search papers
search("piezoelectric micromachined ultrasonic transducer review", type="scholar")

# Get citation formats
details(_1[0]["result_id"])    # MLA, APA, Chicago, BibTeX

# Author profile
details(f"author/{_1[0]['author_ids'][0]}")
py("articles[:3]")             # top 3 papers
py("co_authors")               # collaborators
```

## Backends

| Backend | Used when | API key |
|---------|-----------|---------|
| SerpAPI | All search types + details | `SERPAPI_API_KEY` (required) |
| Perplexity | `type="web"` + `domains` set | `PERPLEXITY_API_KEY` (optional) |
| Chrome CDP | `fetch`, `js`, `py`, `screenshot` | None (uses local Chrome) |

## Development

```bash
git clone https://github.com/user/docduty-search
cd docduty-search
uv sync --extra browser
ruff format src/ && ruff check src/
```
