Metadata-Version: 2.4
Name: omniscout
Version: 0.2.2
Summary: OmniScout CLI: local-first browser automation, semantic search, and research for AI agents
Project-URL: Homepage, https://github.com/sriramramnath/omniscout
Project-URL: Repository, https://github.com/sriramramnath/omniscout
Project-URL: Issues, https://github.com/sriramramnath/omniscout/issues
Author: OmniScout
License: Modified MIT License
        
        Copyright (c) 2026 OmniScout
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        Our only modification part is that, if the Software (or any derivative works
        thereof) is used for any of your products or services, you shall prominently
        display "Powered by OmniScout" on the user interface of such product or
        service.
License-File: LICENSE
Keywords: agent,cli,playwright,research,scraping,search
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: aiohttp>=3.9
Requires-Dist: httpx>=0.27
Requires-Dist: markdownify>=0.13
Requires-Dist: nltk>=3.8
Requires-Dist: platformdirs>=4.2
Requires-Dist: playwright>=1.45
Requires-Dist: pydantic>=2.7
Requires-Dist: qdrant-client>=1.9
Requires-Dist: rich>=13.7
Requires-Dist: selectolax>=0.3.21
Requires-Dist: sentence-transformers>=2.7
Requires-Dist: sumy>=0.11
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: trafilatura>=1.12
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# OmniScout

Local-first browser automation, semantic search, and research for AI agents.

No cloud APIs. No hosted browser sessions. No MCP yet. No SDK.

**The CLI is the interface.** Install the `omniscout` command and drive everything from the terminal or via JSON (`--json` / `OMNISCOUT_JSON=1`).

`scout` is a short alias. `harness` is a legacy dev alias kept for compatibility.

---

## Install

Requires Python 3.11+ and Google Chrome (macOS default:
`/Applications/Google Chrome.app`).

```bash
pip install omniscout
omniscout install                    # verify Chrome + prefetch embedding model (~80MB)
omniscout install --skill            # optional: drop agent skill into Claude/Cursor/Codex
```

If Chrome is not installed, add `--bundled` to download Playwright Chromium (~190MB).

Search commands auto-start the local daemon and keep the embedding model loaded in
RAM across invocations — no manual warm-up step required.

---

## Features

### Browser automation (daemon-backed)

Long-lived daemon at `127.0.0.1:7720` for sub-second per-action latency.

- **Playwright backend** (default) — local Chrome with persistent profiles
- **Chrome extension backend** (opt-in) — drives your real running Chrome via
  `chrome.debugger`; same JSON vocabulary, real cookies and logins
- Atomic actions: `navigate`, `snapshot`, `click`, `fill`, `scroll`, `key`,
  `hover`, `screenshot`, `pdf`, `eval`, `wait`, tabs, `network`, `upload`,
  `login`, `captcha`, `close`
- Stable `@eN` refs from the accessibility tree (preferred over CSS selectors)
- Persistent profiles — log in once, stay logged in
- CAPTCHA: local-first manual handoff; optional 2captcha / capsolver solvers
- Network capture (CDP) with URL filters and per-request inspection
- Session restore across daemon restarts

### Semantic search

- DuckDuckGo HTML search with optional local embedding rerank
- Sources: `ddg`, `index` (local crawl corpus), `memory` (remembered visits),
  `hybrid` (memory + DDG)
- One-sentence answer mode: `--answer` with depths `auto`, `fast`, `balanced`, `deep`
- Answer caching with adaptive TTL for time-sensitive queries

### Warm embedding model

Search, research, and memory commands route embeddings through the daemon.
The sentence-transformers model (`all-MiniLM-L6-v2`) loads once (~2s) and stays
hot. `omniscout daemon status` reports `embed_model_loaded`.

### Content extraction

Fetch URLs to clean Markdown, plain text, or structured JSON via trafilatura +
markdownify. On-disk HTML cache.

### Research pipeline

Multi-step: search → crawl → extract → embed → rerank → summarize.

### Browser memory

Remember visits and notes; semantic search over your browsing history.

- `omniscout remember <url>` — visit, extract, index
- `omniscout memory list|show|note|delete|stats|clear`

### Workflow shortcuts

Top-level commands for agent ergonomics:

- `omniscout open <url|index>` — open URL or latest search result
- `omniscout snapshot`, `omniscout context`, `omniscout reset`
- `omniscout workflow export` — JSON steps from workflow state + action history

### Replay & observability

Every daemon action is logged to `$OMNISCOUT_DATA_DIR/daemon/actions.jsonl`:

- `omniscout daemon trace` — recent activity table or JSON
- `omniscout daemon replay <action_id>` — re-run a single action
- `omniscout daemon watch` — live SSE event stream
- Top-level `omniscout replay action-<id>` and `omniscout replay session-<name>`

### Benchmarks

- `omniscout benchmark answers` — latency + correctness matrix over answer modes
- `omniscout benchmark startup` — CLI process launch overhead

---

## Quickstart

```bash
# Search
omniscout search "local-first browser agents"
omniscout search "who is the president" --answer --depth balanced

# Extract
omniscout extract https://example.com

# Browser (daemon auto-starts)
omniscout browser navigate https://example.com
omniscout browser snapshot --refs-only
omniscout browser click '@e1'
omniscout browser screenshot --out /tmp/state.png
omniscout browser close --all

# Research
omniscout research "state of local AI agents in 2026"

# Profiles & sessions
omniscout profile create work
omniscout browser open https://news.ycombinator.com --profile work --headful
omniscout session start --headful
```

Optional warm-up before a batch of searches:

```bash
omniscout warmup
```

---

## JSON output (for agents)

Every command supports `--json`. Set `OMNISCOUT_JSON=1` to make JSON the default
for an entire shell session. Logs go to stderr; stdout is the structured result.

```bash
export OMNISCOUT_JSON=1
omniscout search "robotics simulators" --limit 5
omniscout browser navigate https://example.com --session demo
```

Direct HTTP (no CLI wrapper):

```bash
curl -s -X POST http://127.0.0.1:7720/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"navigate","args":{"url":"https://example.com"},"session":"demo"}'
```

---

## Architecture

```
omniscout CLI ──HTTP POST /command──▶ omniscout daemon (127.0.0.1:7720)
     │                                      ├─ Playwright backend
     │                                      ├─ Extension backend (opt-in)
     │                                      └─ Embed service (warm model)
     └── Search / Extract / Research engines (local Qdrant + DDG)
```

Python package layout (for contributors — internal module name is still `harness`):

```
cli/harness/
  app.py              # Typer root (binary: omniscout)
  commands/           # CLI sub-commands
  daemon/             # HTTP server, backends, replay, events
  engines/            # browser, search, research, extractor, crawler
  store/              # SQLite cache, sessions, workflow, memory
  models.py           # pydantic JSON contract
```

---

## On-disk state

| Path | Purpose |
| --- | --- |
| `profiles/` | Persistent Chrome user-data-dirs |
| `qdrant/` | Embedded vector index |
| `models/sentence-transformers/` | Prefetched embedding model |
| `memory.sqlite` | Browser memory (visits + notes) |
| `sessions.sqlite` | Long-lived browser session registry |
| `cache/pages/` | Content-hashed HTML cache |
| `daemon/` | PID, port, logs, action history, session restore |

Default locations:

- macOS — `~/Library/Application Support/omniscout/`
- Linux — `~/.local/share/omniscout/`

Override with `OMNISCOUT_DATA_DIR`, `OMNISCOUT_CONFIG_DIR`, `OMNISCOUT_CACHE_DIR`.
Legacy `HARNESS_*` names are still accepted.

---

## Configuration

`config.toml` (in config dir):

```toml
default_source = "ddg"
search_limit = 10
research_results = 8
request_throttle_seconds = 1.0
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
embedding_local_only = true
browser_channel = "chrome"
summary_sentences = 6
```

## Environment variables

| Variable | Purpose |
| --- | --- |
| `OMNISCOUT_JSON=1` | Force JSON output on every command |
| `OMNISCOUT_EMBED_DAEMON=1` | Route embeds through daemon (default on) |
| `OMNISCOUT_DAEMON_AUTO_START=0` | Don't auto-start daemon |
| `OMNISCOUT_DAEMON_PORT` | Daemon port (default 7720) |
| `OMNISCOUT_DATA_DIR` | Override data directory |
| `OMNISCOUT_EMBED_LOCAL_ONLY=0` | Allow runtime Hugging Face fetches |
| `TWOCAPTCHA_API_KEY` | CAPTCHA solver API key |

Legacy `HARNESS_*` equivalents work for all of the above.

---

## Why local Chrome?

Using system Chrome gives you real cookies, login state, extensions, and the same
fingerprint as daily browsing — without a separate ~190MB Chromium download.
Falls back to Playwright Chromium automatically when Chrome is unavailable.

---

## License

Modified MIT — see [LICENSE](LICENSE). Products built on OmniScout must
prominently display **Powered by OmniScout** on the user interface.
