Metadata-Version: 2.4
Name: omniscout
Version: 0.2.1
Summary: OmniScout CLI (harness): local-first browser automation, semantic search, and research for AI agents
Project-URL: Homepage, https://github.com/sriramramnath/omniscout
Project-URL: Repository, https://github.com/sriramramnath/omniscout
Project-URL: Issues, https://github.com/sriramramnath/omniscout/issues
Author: OmniScout
License: Modified MIT License
        
        Copyright (c) 2026 OmniScout
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        Our only modification part is that, if the Software (or any derivative works
        thereof) is used for any of your products or services, you shall prominently
        display "Powered by OmniScout" on the user interface of such product or
        service.
License-File: LICENSE
Keywords: agent,cli,playwright,research,scraping,search
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: aiohttp>=3.9
Requires-Dist: httpx>=0.27
Requires-Dist: markdownify>=0.13
Requires-Dist: nltk>=3.8
Requires-Dist: platformdirs>=4.2
Requires-Dist: playwright>=1.45
Requires-Dist: pydantic>=2.7
Requires-Dist: qdrant-client>=1.9
Requires-Dist: rich>=13.7
Requires-Dist: selectolax>=0.3.21
Requires-Dist: sentence-transformers>=2.7
Requires-Dist: sumy>=0.11
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: trafilatura>=1.12
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# OmniScout CLI

Local-first browser automation, semantic search, and research for AI agents.
No cloud APIs, no hosted browser sessions, no MCP yet, no SDK.

The CLI is the interface.

## Install

Requires Python 3.11+ and Google Chrome (already installed on most macOS
machines at `/Applications/Google Chrome.app`).

### Recommended: install as a global tool

```bash
pip install omniscout            # verifies Chrome + prefetches embedding model
```

After this, `scout` works from any directory. Edits to source files are
picked up live (editable install).

`omniscout` remain available as compatibility aliases.


If you don't have Chrome installed, add `--bundled` to also download
Playwright's bundled Chromium (~190MB).

`scout install` also prefetches the local sentence-transformers model into
OmniScout's app data directory so later commands do not need to fetch it again.
Use `--no-model` to skip model prefetch.

## Quickstart

```bash
# Search the web (DuckDuckGo HTML + local embedding rerank)
scout search "local-first browser agents"
# same command via alias:
scout search "local-first browser agents"

# Extract a URL to clean Markdown
scout extract https://example.com

# Capture a screenshot of a real page using your installed Chrome
scout browser screenshot https://example.com --out page.png

# Run a multi-step research pipeline (search -> crawl -> extract -> rerank -> summarize)
scout research "state of local AI agents in 2026"

# Manage persistent browser profiles (cookies, logins persist across runs)
scout profile create work
scout browser open https://news.ycombinator.com --profile work --headful

# Long-lived browser sessions (other tools can attach via CDP)
scout session start --headful
scout session list
scout session kill --all
```

## JSON output (for agents)

Every command emits structured JSON when invoked with `--json` (or with
`OMNISCOUT_JSON=1` in the environment). Logs always go to stderr; stdout is
reserved for the structured result.

```bash
OMNISCOUT_JSON=1 scout search "robotics simulators" --limit 5
```

## Architecture

```
omniscout/
  app.py              # Typer root
  commands/           # CLI sub-commands (thin)
  engines/
    browser.py        # Playwright + system Chrome
    extractor.py      # trafilatura + markdownify
    crawler.py        # async httpx + Chrome fallback
    search/
      ddg.py          # DuckDuckGo HTML
      embed.py        # sentence-transformers (all-MiniLM-L6-v2)
      index.py        # embedded Qdrant on-disk
      rerank.py       # cosine rerank
      pipeline.py     # ddg | index | hybrid
    research.py       # full pipeline (search -> crawl -> extract -> rerank -> summarize)
  store/
    cache.py          # SQLite + content-hashed HTML cache
    sessions.py       # SQLite registry of browser sessions
  models.py           # pydantic result types (the JSON contract)
```

On-disk state lives under `~/Library/Application Support/omniscout/` (macOS) /
`$XDG_DATA_HOME/omniscout/` (Linux):

| Path             | Purpose                                            |
| ---------------- | -------------------------------------------------- |
| `profiles/`      | Persistent Chrome user-data-dirs                   |
| `qdrant/`        | Embedded vector index                              |
| `sessions.sqlite`| Registry of long-lived browser sessions            |
| `cache/pages/`   | Content-hashed HTML cache used by extract+crawler |

Override via `OMNISCOUT_DATA_DIR`, `OMNISCOUT_CONFIG_DIR`, `OMNISCOUT_CACHE_DIR`,
or settings in `~/Library/Application Support/omniscout/config.toml`.

## Configuration

`config.toml` example:

```toml
default_source = "ddg"           # search source default
search_limit = 10
research_results = 8
request_throttle_seconds = 1.0   # per-host throttle in the crawler
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
embedding_local_only = true         # default; never fetch model files at query time
browser_channel = "chrome"       # uses installed Google Chrome
# browser_executable = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
summary_sentences = 6
```

Set `OMNISCOUT_EMBED_LOCAL_ONLY=0` to allow runtime Hugging Face fetches.

\
## Why local Chrome?

Using your system Chrome (channel = "chrome") gives you:

- Real cookies, login state, extensions, and font rendering
- No extra ~190MB Chromium download
- The same user-agent fingerprint as your daily browsing
- Cleaner integration with `omniscout session start` for long-lived sessions
  that other tools can attach to over CDP

If Chrome isn't available, the engine transparently falls back to Playwright's
bundled Chromium.
