This file is a merged representation of a subset of the codebase, containing files not matching ignore patterns, combined into a single document by Repomix. The content has been processed: empty lines have been removed.

================================================================
File Summary
================================================================

Purpose:
--------
This file contains a packed representation of a subset of the repository's contents.
It is designed to be easily consumable by AI systems for analysis, code review,
or other automated processes.

File Format:
------------
The content is organized as follows:
1. This summary section
2. Repository information
3. Directory structure
4. Multiple file entries, each consisting of:
  a. A separator line (================)
  b. The file path (File: path/to/file)
  c. Another separator line
  d. The full contents of the file
  e. A blank line
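
The entry layout described above can be split back into individual files with a short script. This is a minimal sketch, not part of Repomix: the function name `split_packed` and the sample text are hypothetical, and it assumes the 16-character separator used by the file entries in this document. A file whose own contents contain a line sequence that looks like an entry header would be split incorrectly.

```python
import re

SEP = "=" * 16  # assumed separator length for file entries

# A file header is: separator line, "File: <path>", separator line.
# The longer section separators do not match, because the pattern
# requires a newline immediately after exactly 16 "=" characters.
FILE_HEADER = re.compile(r"^={16}\nFile: (?P<path>.+)\n={16}\n", re.MULTILINE)


def split_packed(text: str) -> dict[str, str]:
    """Map each file path to its contents in a packed document."""
    matches = list(FILE_HEADER.finditer(text))
    files: dict[str, str] = {}
    for i, match in enumerate(matches):
        # Contents run from the end of this header to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        files[match.group("path")] = text[match.end():end].strip("\n")
    return files


sample = (
    f"{SEP}\nFile: a.py\n{SEP}\n"
    "print('a')\n"
    "\n"
    f"{SEP}\nFile: docs/b.md\n{SEP}\n"
    "# B\n"
)
files = split_packed(sample)
print(sorted(files))
```

Keying the result by path follows the usage guideline of using the file path to distinguish between files.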

Usage Guidelines:
-----------------
- This file should be treated as read-only. Any changes should be made to the
  original repository files, not this packed version.
- When processing this file, use the file path to distinguish
  between different files in the repository.
- Be aware that this file may contain sensitive information. Handle it with
  the same level of security as you would the original repository.

Notes:
------
- Some files may have been excluded based on .gitignore rules and Repomix's configuration
- Binary files are not included in this packed representation. Please refer to the Directory Structure section for a complete list of file paths, including binary files
- Files matching these patterns are excluded: .specstory/**/*.md, .venv/**, _private/**, CLEANUP.txt, **/*.json, *.lock
- Files matching patterns in .gitignore are excluded
- Files matching default ignore patterns are excluded
- Empty lines have been removed from all files

Additional Info:
----------------

================================================================
Directory Structure
================================================================
.cursor/
  rules/
    0project.mdc
    cleanup.mdc
    filetree.mdc
.github/
  workflows/
    push.yml
    release.yml
src/
  twat_search/
    web/
      engines/
        __init__.py
        base.py
        bing_scraper.py
        brave.py
        critique.py
        duckduckgo.py
        hasdata.py
        pplx.py
        serpapi.py
        tavily.py
        you.py
      __init__.py
      api.py
      cli.py
      config.py
      exceptions.py
      models.py
      utils.py
    __init__.py
    __main__.py
tests/
  unit/
    web/
      engines/
        __init__.py
        test_base.py
      __init__.py
      test_api.py
      test_config.py
      test_exceptions.py
      test_models.py
      test_utils.py
    __init__.py
    mock_engine.py
  web/
    test_bing_scraper.py
  conftest.py
  test_twat_search.py
.gitignore
.pre-commit-config.yaml
cleanup.py
LICENSE
PROGRESS.md
pyproject.toml
README.md
TODO.md
VERSION.txt

================================================================
Files
================================================================

================
File: .cursor/rules/0project.mdc
================
---
description: About this project
globs: 
alwaysApply: false
---
# About this project

`twat-search` is a multi-provider search 

## Development Notes
- Uses `uv` for Python package management
- Quality tools: ruff, mypy, pytest
- Clear provider protocol for adding new search backends
- Strong typing and runtime checks throughout

================
File: .cursor/rules/cleanup.mdc
================
---
description: Run `cleanup.py` script before and after changes
globs: 
alwaysApply: false
---
Before you make any changes, or if I say "cleanup", run the `cleanup.py update` script in the main folder. Analyze the results, describe recent changes in [PROGRESS.md](mdc:PROGRESS.md), and edit @TODO.md to update priorities and plan the next changes. PERFORM THE CHANGES, then run the `cleanup.py status` script and react to the results.

When you edit @TODO.md, start each line with an empty GFM checkbox (`- [ ] `) if the item isn't done, or a filled one (`- [x] `) if it is.

================
File: .cursor/rules/filetree.mdc
================
---
description: File tree of the project
globs: 
---
[ 896]  .
├── [  64]  .benchmarks
├── [  96]  .cursor
│   └── [ 192]  rules
│       ├── [ 334]  0project.mdc
│       ├── [ 558]  cleanup.mdc
│       └── [4.4K]  filetree.mdc
├── [  96]  .github
│   └── [ 128]  workflows
│       ├── [2.7K]  push.yml
│       └── [1.4K]  release.yml
├── [3.5K]  .gitignore
├── [ 532]  .pre-commit-config.yaml
├── [  96]  .specstory
│   └── [ 736]  history
│       ├── [2.0K]  .what-is-this.md
│       ├── [ 52K]  2025-02-25_01-58-creating-and-tracking-project-tasks.md
│       ├── [7.4K]  2025-02-25_02-17-project-task-continuation-and-progress-update.md
│       ├── [ 11K]  2025-02-25_02-24-planning-tests-for-twat-search-web-package.md
│       ├── [196K]  2025-02-25_02-27-implementing-tests-for-twat-search-package.md
│       ├── [ 46K]  2025-02-25_02-58-transforming-python-script-into-cli-tool.md
│       ├── [ 93K]  2025-02-25_03-09-generating-a-name-for-the-chat.md
│       ├── [5.5K]  2025-02-25_03-33-untitled.md
│       ├── [ 57K]  2025-02-25_03-54-integrating-search-engines-into-twat-search.md
│       ├── [ 72K]  2025-02-25_04-05-consolidating-you-py-and-youcom-py.md
│       ├── [6.1K]  2025-02-25_04-13-missing-env-api-key-names-in-pplx-py.md
│       ├── [118K]  2025-02-25_04-16-implementing-functions-for-brave-search-engines.md
│       ├── [286K]  2025-02-25_04-48-unifying-search-engine-parameters-in-twat-search.md
│       ├── [ 83K]  2025-02-25_05-36-implementing-duckduckgo-search-engine.md
│       ├── [194K]  2025-02-25_05-43-implementing-the-webscout-search-engine.md
│       ├── [ 23K]  2025-02-25_06-07-implementing-bing-scraper-engine.md
│       ├── [ 15K]  2025-02-25_06-12-continuing-bing-scraper-engine-implementation.md
│       ├── [121K]  2025-02-25_06-34-implementing-safe-import-patterns-in-modules.md
│       ├── [9.9K]  2025-02-25_07-09-refactoring-plan-and-progress-update.md
│       ├── [ 40K]  2025-02-25_07-17-implementing-phase-1-from-todo-md.md
│       └── [292K]  2025-02-25_07-34-integrating-hasdata-google-serp-apis.md
├── [ 499]  CLEANUP.txt
├── [1.0K]  LICENSE
├── [1.2K]  PROGRESS.md
├── [3.2K]  README.md
├── [4.1K]  TODO.md
├── [   7]  VERSION.txt
├── [ 12K]  cleanup.py
├── [ 128]  dist
├── [9.6K]  pyproject.toml
├── [ 128]  src
│   └── [ 256]  twat_search
│       ├── [ 556]  __init__.py
│       ├── [2.0K]  __main__.py
│       └── [ 384]  web
│           ├── [1.6K]  __init__.py
│           ├── [4.8K]  api.py
│           ├── [ 33K]  cli.py
│           ├── [4.3K]  config.py
│           ├── [ 480]  engines
│           │   ├── [4.2K]  __init__.py
│           │   ├── [3.7K]  base.py
│           │   ├── [ 11K]  bing_scraper.py
│           │   ├── [7.6K]  brave.py
│           │   ├── [8.2K]  critique.py
│           │   ├── [6.7K]  duckduckgo.py
│           │   ├── [7.1K]  hasdata.py
│           │   ├── [4.9K]  pplx.py
│           │   ├── [6.9K]  serpapi.py
│           │   ├── [7.4K]  tavily.py
│           │   └── [7.3K]  you.py
│           ├── [1.0K]  exceptions.py
│           ├── [1.3K]  models.py
│           └── [1.5K]  utils.py
├── [ 256]  tests
│   ├── [  64]  .benchmarks
│   ├── [2.0K]  conftest.py
│   ├── [ 157]  test_twat_search.py
│   ├── [ 192]  unit
│   │   ├── [  42]  __init__.py
│   │   ├── [1.5K]  mock_engine.py
│   │   └── [ 320]  web
│   │       ├── [  46]  __init__.py
│   │       ├── [ 160]  engines
│   │       │   ├── [  37]  __init__.py
│   │       │   └── [4.3K]  test_base.py
│   │       ├── [5.1K]  test_api.py
│   │       ├── [2.7K]  test_config.py
│   │       ├── [2.0K]  test_exceptions.py
│   │       ├── [4.5K]  test_models.py
│   │       └── [3.5K]  test_utils.py
│   └── [ 160]  web
│       └── [ 10K]  test_bing_scraper.py
└── [ 87K]  twat_search.txt

19 directories, 70 files

================
File: .github/workflows/push.yml
================
name: Build & Test
on:
  push:
    branches: [main]
    tags-ignore: ["v*"]
  pull_request:
    branches: [main]
  workflow_dispatch:
permissions:
  contents: write
  id-token: write
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  quality:
    name: Code Quality
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Ruff lint
        uses: astral-sh/ruff-action@v3
        with:
          version: "latest"
          args: "check --output-format=github"
      - name: Run Ruff Format
        uses: astral-sh/ruff-action@v3
        with:
          version: "latest"
          args: "format --check --respect-gitignore"
  test:
    name: Run Tests
    needs: quality
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
        os: [ubuntu-latest]
      fail-fast: true
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install UV
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
          python-version: ${{ matrix.python-version }}
          enable-cache: true
          cache-suffix: ${{ matrix.os }}-${{ matrix.python-version }}
      - name: Install test dependencies
        run: |
          uv pip install --system --upgrade pip
          uv pip install --system ".[test]"
      - name: Run tests with Pytest
        run: uv run pytest -n auto --maxfail=1 --disable-warnings --cov-report=xml --cov-config=pyproject.toml --cov=src/twat_search --cov=tests tests/
      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.python-version }}-${{ matrix.os }}
          path: coverage.xml
  build:
    name: Build Distribution
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install UV
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
          python-version: "3.12"
          enable-cache: true
      - name: Install build tools
        run: uv pip install build hatchling hatch-vcs
      - name: Build distributions
        run: uv run python -m build --outdir dist
      - name: Upload distribution artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist-files
          path: dist/
          retention-days: 5

================
File: .github/workflows/release.yml
================
name: Release
on:
  push:
    tags: ["v*"]
permissions:
  contents: write
  id-token: write
jobs:
  release:
    name: Release to PyPI
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/twat-search
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install UV
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
          python-version: "3.12"
          enable-cache: true
      - name: Install build tools
        run: uv pip install build hatchling hatch-vcs
      - name: Build distributions
        run: uv run python -m build --outdir dist
      - name: Verify distribution files
        run: |
          ls -la dist/
          test -n "$(find dist -name '*.whl')" || (echo "Wheel file missing" && exit 1)
          test -n "$(find dist -name '*.tar.gz')" || (echo "Source distribution missing" && exit 1)
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_TOKEN }}
      - name: Create GitHub Release
        uses: softprops/action-gh-release@v1
        with:
          files: dist/*
          generate_release_notes: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

================
File: src/twat_search/web/engines/__init__.py
================
    __all__.extend(
    __all__.extend(["SerpApiSearchEngine", "serpapi"])
    __all__.extend(["TavilySearchEngine", "tavily"])
    __all__.extend(["PerplexitySearchEngine", "pplx"])
    __all__.extend(["YouNewsSearchEngine", "YouSearchEngine", "you", "you_news"])
    __all__.extend(["CritiqueSearchEngine", "critique"])
    __all__.extend(["DuckDuckGoSearchEngine", "duckduckgo"])
    __all__.extend(["BingScraperSearchEngine", "bing_scraper"])
def is_engine_available(engine_name: str) -> bool:
def get_engine_function(
    return available_engine_functions.get(engine_name)
def get_available_engines() -> list[str]:
    return list(available_engine_functions.keys())

================
File: src/twat_search/web/engines/base.py
================
class SearchEngine(abc.ABC):
    def __init__(self, config: EngineConfig, **kwargs: Any) -> None:
        self.num_results = kwargs.get("num_results", 5)
        self.country = kwargs.get("country", None)
        self.language = kwargs.get("language", None)
        self.safe_search = kwargs.get("safe_search", True)
        self.time_frame = kwargs.get("time_frame", None)
            raise SearchError(msg)
    async def search(self, query: str) -> list[SearchResult]:
def register_engine(engine_class: type[SearchEngine]) -> type[SearchEngine]:
    if not hasattr(engine_class, "env_api_key_names"):
        engine_class.env_api_key_names = [f"{engine_class.name.upper()}_API_KEY"]
    if not hasattr(engine_class, "env_enabled_names"):
        engine_class.env_enabled_names = [f"{engine_class.name.upper()}_ENABLED"]
    if not hasattr(engine_class, "env_params_names"):
        engine_class.env_params_names = [f"{engine_class.name.upper()}_DEFAULT_PARAMS"]
def get_engine(engine_name: str, config: EngineConfig, **kwargs: Any) -> SearchEngine:
    engine_class = _engine_registry.get(engine_name)
    return engine_class(config, **kwargs)
def get_registered_engines() -> dict[str, type[SearchEngine]]:
    return _engine_registry.copy()

================
File: src/twat_search/web/engines/bing_scraper.py
================
    class BingScraper:  # type: ignore
        def __init__(
        def search(self, query: str, num_results: int = 10) -> list[Any]:
logger = logging.getLogger(__name__)
class BingScraperResult(BaseModel):
class BingScraperSearchEngine(SearchEngine):
        super().__init__(config, **kwargs)
        self.max_results: int = num_results or self.config.default_params.get(
        self.max_retries: int = kwargs.get(
        ) or self.config.default_params.get("max_retries", 3)
        self.delay_between_requests: float = kwargs.get(
        ) or self.config.default_params.get("delay_between_requests", 1.0)
            unused_params.append(f"country='{country}'")
            unused_params.append(f"language='{language}'")
            unused_params.append(f"safe_search={safe_search}")
            unused_params.append(f"time_frame='{time_frame}'")
            logger.debug(
                f"Parameters {', '.join(unused_params)} set but not used by Bing Scraper"
    def _convert_result(self, result: Any) -> SearchResult | None:
            logger.warning("Empty result received from Bing Scraper")
        if not hasattr(result, "title") or not hasattr(result, "url"):
            logger.warning(f"Invalid result format: {result}")
            validated = BingScraperResult(
                if hasattr(result, "description")
            return SearchResult(
                    "url": str(result.url),
            logger.warning(f"Validation error for result: {exc}")
            logger.warning(f"Unexpected error converting result: {exc}")
    async def search(self, query: str) -> list[SearchResult]:
            raise EngineError(self.name, "Search query cannot be empty")
        logger.info(f"Searching Bing with query: '{query}'")
            scraper = BingScraper(
            raw_results = scraper.search(query, num_results=self.max_results)
                logger.info("No results returned from Bing Scraper")
            logger.debug(f"Received {len(raw_results)} raw results from Bing Scraper")
            logger.error(error_msg)
            raise EngineError(self.name, error_msg) from exc
                self._convert_result(result) for result in raw_results
        logger.info(f"Returning {len(results)} validated results from Bing Scraper")
async def bing_scraper(
    return await search(

================
File: src/twat_search/web/engines/brave.py
================
class BraveResult(BaseModel):
class BraveNewsResult(BaseModel):
class BaseBraveEngine(SearchEngine):
    def __init__(
        super().__init__(config)
        count = kwargs.get("count", num_results)
        self.count = count or self.config.default_params.get("count", 10)
            or kwargs.get("country")
            or self.config.default_params.get("country", None)
        search_lang = kwargs.get("search_lang", language)
        self.search_lang = search_lang or self.config.default_params.get(
        ui_lang = kwargs.get("ui_lang", language)
        self.ui_lang = ui_lang or self.config.default_params.get("ui_lang", None)
        safe = kwargs.get("safe_search", safe_search)
        if isinstance(safe, bool):
        self.safe_search = safe or self.config.default_params.get("safe_search", None)
        freshness = kwargs.get("freshness", time_frame)
        self.freshness = freshness or self.config.default_params.get("freshness", None)
            raise EngineError(
                f"Brave API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    async def search(self, query: str) -> list[SearchResult]:
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                data = response.json()
                section = data.get(self.response_key, {})
                if section.get("results"):
                            parsed = self.result_model(**result)
                            results.append(self.convert_result(parsed, result))
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
    def convert_result(self, parsed: BaseModel, raw: dict[str, Any]) -> SearchResult:
            publisher = getattr(parsed, "publisher", None)
            published_time = getattr(parsed, "published_time", None)
        return SearchResult(
class BraveSearchEngine(BaseBraveEngine):
class BraveNewsSearchEngine(BaseBraveEngine):
async def brave(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = BraveSearchEngine(
    return await engine.search(query)
async def brave_news(
    engine = BraveNewsSearchEngine(

================
File: src/twat_search/web/engines/critique.py
================
class CritiqueResult(BaseModel):
    url: str = Field(default="")  # URL of the result source
    title: str = Field(default="")  # Title of the result
    summary: str = Field(default="")  # Summary or snippet from the result
    source: str = Field(default="")  # Source of the result
class CritiqueResponse(BaseModel):
    results: list[CritiqueResult] = Field(default_factory=list)
class CritiqueSearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
        self.image_url = image_url or kwargs.get("image_url")
        self.image_base64 = image_base64 or kwargs.get("image_base64")
        self.source_whitelist = source_whitelist or kwargs.get("source_whitelist")
        self.source_blacklist = source_blacklist or kwargs.get("source_blacklist")
        self.output_format = output_format or kwargs.get("output_format")
            raise EngineError(
                f"Critique Labs API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    async def _convert_image_url_to_base64(self, image_url: str) -> str:
            async with httpx.AsyncClient() as client:
                response = await client.get(image_url, timeout=30)
                response.raise_for_status()
                encoded = base64.b64encode(response.content).decode("utf-8")
            raise EngineError(self.name, f"Failed to fetch image from URL: {e}") from e
            raise EngineError(self.name, f"Error processing image: {e}") from e
    async def _build_payload(self, query: str) -> dict[str, Any]:
            payload["image"] = await self._convert_image_url_to_base64(self.image_url)
    def _build_result(self, item: CritiqueResult, rank: int) -> SearchResult:
                HttpUrl(item.url) if item.url else HttpUrl("https://critique-labs.ai")
            url_obj = HttpUrl("https://critique-labs.ai")
        return SearchResult(
            raw=item.model_dump(),
    def _parse_results(self, data: dict[str, Any]) -> list[SearchResult]:
        critique_data = CritiqueResponse(
            results=data.get("results", []),
            response=data.get("response"),
            structured_output=data.get("structured_output"),
            results.append(
                SearchResult(
                    url=HttpUrl("https://critique-labs.ai"),
        for idx, item in enumerate(critique_data.results, 1):
                results.append(self._build_result(item, idx))
    async def search(self, query: str) -> list[SearchResult]:
        payload = await self._build_payload(query)
                response = await client.post(
                data = response.json()
                return self._parse_results(data)
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
                raise EngineError(self.name, f"Search failed: {exc}") from exc
async def critique(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = CritiqueSearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/duckduckgo.py
================
logger = logging.getLogger(__name__)
class DuckDuckGoResult(BaseModel):
class DuckDuckGoSearchEngine(SearchEngine):
    def __init__(
        super().__init__(config, **kwargs)
        ) = self._map_init_params(
            logger.debug(
    def _map_init_params(
        max_results = kwargs.get(
        ) or config.default_params.get("max_results", 10)
        region = kwargs.get("region", country) or config.default_params.get(
        lang = language or config.default_params.get("language", None)
        timelimit = kwargs.get("timelimit", time_frame) or config.default_params.get(
        if timelimit and not kwargs.get("timelimit"):
            timelimit = time_mapping.get(timelimit.lower(), timelimit)
        safesearch = kwargs.get("safesearch", safe_search)
        if isinstance(safesearch, str):
            safesearch = safesearch.lower() not in ["off", "false"]
        proxy = kwargs.get("proxy") or config.default_params.get("proxy", None)
        timeout = kwargs.get("timeout") or config.default_params.get("timeout", 10)
    def _convert_result(self, raw: dict[str, Any]) -> SearchResult | None:
            ddg_result = DuckDuckGoResult(
            return SearchResult(
            logger.warning(f"Validation error for result: {exc}")
    async def search(self, query: str) -> list[SearchResult]:
            ddgs = DDGS(proxy=self.proxy, timeout=self.timeout)
            raw_results = ddgs.text(**params)
                converted = self._convert_result(raw)
                    results.append(converted)
            raise EngineError(self.name, f"Search failed: {exc}") from exc
async def duckduckgo(
    return await search(

================
File: src/twat_search/web/engines/hasdata.py
================
class HasDataGoogleResult(BaseModel):
    def from_api_result(cls, result: dict[str, Any]) -> "HasDataGoogleResult":
        return cls(
            title=result.get("title", ""),
            url=result.get("link", ""),
            snippet=result.get("snippet", ""),
class HasDataBaseEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            or kwargs.get("location")
            or self.config.default_params.get("location")
            or kwargs.get("device_type")
            or self.config.default_params.get("device_type", "desktop")
            raise EngineError(
                f"HasData API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    async def search(self, query: str) -> list[SearchResult]:
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                data = response.json()
                organic_results = data.get("organicResults", [])
                for i, result in enumerate(organic_results):
                        parsed = HasDataGoogleResult.from_api_result(result)
                        results.append(
                            SearchResult(
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
                raise EngineError(self.name, f"Invalid JSON response: {exc}") from exc
class HasDataGoogleEngine(HasDataBaseEngine):
class HasDataGoogleLightEngine(HasDataBaseEngine):
async def hasdata_google(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = HasDataGoogleEngine(
    return await engine.search(query)
async def hasdata_google_light(
    engine = HasDataGoogleLightEngine(

================
File: src/twat_search/web/engines/pplx.py
================
class PerplexityResult(BaseModel):
    answer: str = Field(default="")  # Perplexity may sometimes not include all details
    url: str = Field(default="https://perplexity.ai")  # Default URL if none provided
    title: str = Field(default="Perplexity AI Response")  # Default title
class PerplexitySearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            or kwargs.get("model")
            or self.config.default_params.get("model", "pplx-70b-online")
            raise EngineError(
                f"Perplexity API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}.",
    async def search(self, query: str) -> list[SearchResult]:
            async with httpx.AsyncClient() as client:
                response = await client.post(
                response.raise_for_status()
                data = response.json()
            raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
        for choice in data.get("choices", []):
            answer = choice.get("message", {}).get("content", "")
                pr = PerplexityResult(answer=answer, url=url, title=title)
                url_obj = HttpUrl(pr.url)  # Validate URL format
                results.append(
                    SearchResult(
async def pplx(
    config = EngineConfig(
    engine = PerplexitySearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/serpapi.py
================
class SerpApiResult(BaseModel):
class SerpApiResponse(BaseModel):
class SerpApiSearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            "num": kwargs.get("num", num_results)
            or self.config.default_params.get("num", 10),
            "google_domain": kwargs.get("google_domain")
            or self.config.default_params.get("google_domain", "google.com"),
            "gl": kwargs.get("gl", country) or self.config.default_params.get("gl"),
            "hl": kwargs.get("hl", language) or self.config.default_params.get("hl"),
            "safe": self._convert_safe(kwargs.get("safe", safe_search))
            or self.config.default_params.get("safe"),
            "time_period": kwargs.get("time_period", time_frame)
            or self.config.default_params.get("time_period"),
            raise EngineError(
                f"SerpApi API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    def _convert_safe(self, safe: bool | str | None) -> str | None:
        if isinstance(safe, bool):
    async def search(self, query: str) -> list[SearchResult]:
        params.update({k: v for k, v in self._params.items() if v is not None})
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                data = response.json()
                serpapi_response = SerpApiResponse(**data)
                        results.append(
                            SearchResult(
                                raw=result.model_dump(),  # Include raw result for debugging
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
async def serpapi(
    config = EngineConfig(
    engine = SerpApiSearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/tavily.py
================
class TavilySearchResult(BaseModel):
class TavilySearchResponse(BaseModel):
class TavilySearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
        def get_default(value, key, fallback):
                else self.config.default_params.get(key, fallback)
        self.max_results = get_default(
            kwargs.get("max_results", num_results), "max_results", 5
        self.search_depth = get_default(search_depth, "search_depth", "basic")
        self.include_domains = get_default(include_domains, "include_domains", None)
        self.exclude_domains = get_default(exclude_domains, "exclude_domains", None)
        self.include_answer = get_default(include_answer, "include_answer", False)
        self.max_tokens = get_default(max_tokens, "max_tokens", None)
        self.search_type = get_default(search_type, "search_type", "search")
            raise EngineError(
                f"Tavily API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    def _build_payload(self, query: str) -> dict:
    def _convert_result(self, item: dict, rank: int) -> SearchResult | None:
            validated_url = HttpUrl(item.get("url", ""))
            return SearchResult(
                title=item.get("title", ""),
                snippet=textwrap.shorten(
                    item.get("content", "").strip(), width=500, placeholder="..."
    async def search(self, query: str) -> list[SearchResult]:
        payload = self._build_payload(query)
        async with httpx.AsyncClient() as client:
                response = await client.post(
                response.raise_for_status()
                data = response.json()
                raise EngineError(self.name, f"HTTP error: {e}") from e
                raise EngineError(self.name, f"Request error: {e}") from e
                raise EngineError(self.name, f"Error: {e!s}") from e
            parsed_response = TavilySearchResponse.model_validate(data)
            items = [item.model_dump() for item in parsed_response.results]
            items = data.get("results", [])
        for idx, item in enumerate(items, start=1):
            converted = self._convert_result(item, idx)
                results.append(converted)
async def tavily(
    config = EngineConfig(
    engine = TavilySearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/you.py
================
class YouSearchHit(BaseModel):
    snippet: str = Field(alias="description")
class YouSearchResponse(BaseModel):
    search_id: str | None = Field(None, alias="searchId")
class YouNewsArticle(BaseModel):
class YouNewsResponse(BaseModel):
class YouBaseEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            raise EngineError(
                f"You.com API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
        self.num_results = num_results or self.config.default_params.get(
        self.country_code = country or self.config.default_params.get(
        self.safe_search = safe_search or self.config.default_params.get(
    async def _make_api_call(self, query: str) -> dict:
            params["safe_search"] = str(self.safe_search).lower()
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                return response.json()
class YouSearchEngine(YouBaseEngine):
    async def search(self, query: str) -> list[SearchResult]:
        data = await self._make_api_call(query)
            you_response = YouSearchResponse(**data)
                    results.append(
                        SearchResult(
                            raw=hit.model_dump(by_alias=True),
class YouNewsSearchEngine(YouBaseEngine):
            you_response = YouNewsResponse(**data)
                            raw=article.model_dump(by_alias=True),
async def you(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = YouSearchEngine(
    return await engine.search(query)
async def you_news(
    engine = YouNewsSearchEngine(

================
File: src/twat_search/web/__init__.py
================
    __all__.extend(["Config", "EngineConfig", "SearchResult", "search"])
    __all__.extend(["brave", "brave_news"])
    __all__.extend(["pplx"])
    __all__.extend(["serpapi"])
    __all__.extend(["tavily"])
    __all__.extend(["you", "you_news"])
    __all__.extend(["critique"])
    __all__.extend(["duckduckgo"])
    __all__.extend(["bing_scraper"])

================
File: src/twat_search/web/api.py
================
logger = logging.getLogger(__name__)
def get_engine_params(
        k[len(engine_name) + 1 :]: v
        for k, v in kwargs.items()
        if k.startswith(engine_name + "_")
        if not any(k.startswith(e + "_") for e in engines)
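# Illustrative sketch (not part of the original module): the two filters above
# suggest that "<engine>_<param>" kwargs are routed to one engine with the
# prefix stripped, while unprefixed kwargs are shared by all engines. The
# function name split_engine_params and the merge order (engine-specific values
# override shared ones) are assumptions; the real get_engine_params also
# receives common_params, which this sketch leaves out.

```python
from typing import Any


def split_engine_params(
    engine_name: str, engines: list[str], kwargs: dict[str, Any]
) -> dict[str, Any]:
    prefix = engine_name + "_"
    # Keys such as "brave_count" become "count" for the brave engine.
    specific = {
        k[len(prefix):]: v for k, v in kwargs.items() if k.startswith(prefix)
    }
    # Keys that carry no engine prefix at all are passed to every engine.
    shared = {
        k: v
        for k, v in kwargs.items()
        if not any(k.startswith(e + "_") for e in engines)
    }
    return {**shared, **specific}


params = split_engine_params(
    "brave",
    ["brave", "tavily"],
    {"brave_count": 5, "tavily_max_results": 3, "language": "en"},
)
```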
def init_engine_task(
    engine_config = config.engines.get(engine_name)
        logger.warning(f"Engine '{engine_name}' not configured.")
        engine_params = get_engine_params(engine_name, engines, kwargs, common_params)
        engine_instance: SearchEngine = get_engine(
        logger.info(f"🔍 Querying engine: {engine_name}")
        return (engine_name, engine_instance.search(query))
        logger.warning(
        logger.error(f"Error initializing engine '{engine_name}': {e}")
async def search(
        config = config or Config()
        engines = engines or list(config.engines.keys())
            raise SearchError(msg)
            }.items()
            task = init_engine_task(
                engine_names.append(task[0])
                tasks.append(task[1])
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for engine_name, result in zip(engine_names, results, strict=False):
            if isinstance(result, Exception):
                logger.error(f"Search with engine '{engine_name}' failed: {result}")
            elif isinstance(result, list):
                logger.info(f"✅ Engine '{engine_name}' returned {len(result)} results")
                flattened_results.extend(result)
                logger.warning(
                    f"⚠️ Engine '{engine_name}' returned no results or unexpected type: {type(result)}"
        logger.error(f"Search failed: {e}")
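# Illustrative sketch (not part of the original module): the fan-out pattern
# used by search(), reduced to the standard library. asyncio.gather with
# return_exceptions=True returns raised exceptions in place of results, so one
# failing engine does not cancel the others. The engine coroutines and the
# fan_out name are hypothetical stand-ins.

```python
import asyncio


async def ok_engine(query: str) -> list[str]:
    return [f"result for {query}"]


async def failing_engine(query: str) -> list[str]:
    raise RuntimeError("engine unavailable")


async def fan_out(query: str) -> tuple[list[str], dict[str, str]]:
    engines = {"ok": ok_engine, "broken": failing_engine}
    names = list(engines)
    # Exceptions are returned in place, in the same order as the coroutines.
    outcomes = await asyncio.gather(
        *(engines[name](query) for name in names), return_exceptions=True
    )
    results: list[str] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            errors[name] = str(outcome)
        elif isinstance(outcome, list):
            results.extend(outcome)
    return results, errors


results, errors = asyncio.run(fan_out("python"))
```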

================
File: src/twat_search/web/cli.py
================
class CustomJSONEncoder(json_lib.JSONEncoder):
    def default(self, o: Any) -> Any:
            return json_lib.JSONEncoder.default(self, o)
            return str(o)
console = Console()
class SearchCLI:
    def __init__(self) -> None:
        self.logger = logging.getLogger("twat_search.cli")
        self.log_handler = RichHandler(rich_tracebacks=True)
        self._configure_logging()
        self.console = Console()
        available_engines = get_available_engines()
            self.logger.warning(
                f"{', '.join(missing_engines)}. "
    def _configure_logging(self, verbose: bool = False) -> None:
        logging.basicConfig(
        self.logger.setLevel(level)
        logging.getLogger("twat_search.web.api").setLevel(level)
        logging.getLogger("twat_search.web.engines").setLevel(level)
        logging.getLogger("httpx").setLevel(level)
    def _parse_engines(self, engines_arg: Any) -> list[str] | None:
        if isinstance(engines_arg, str):
            return [e.strip() for e in engines_arg.split(",") if e.strip()]
        if isinstance(engines_arg, list | tuple):
            return [str(e).strip() for e in engines_arg if str(e).strip()]
            f"Unexpected engines type: {type(engines_arg)}. Using all available engines."
    async def _run_search(
                if engine == "all" or is_engine_available(engine):
                    available.append(engine)
            self.logger.debug(f"Attempting to search with engines: {engines}")
            results = await search(query=query, engines=engines, **kwargs)
            return self._process_results(results)
            self.logger.error(f"Search failed: {e}")
            self._display_errors([str(e)])
    def _process_results(self, results: list) -> list[dict[str, Any]]:
            engine_name = getattr(result, "source", None) or "unknown"
            engine_results.setdefault(engine_name, []).append(result)
        for engine, engine_results_list in engine_results.items():
                processed.append(
            for idx, result in enumerate(engine_results_list):
                url = str(result.url)
                        if len(result.snippet) > 100
                        "raw_result": getattr(result, "raw", None),
    def _display_results(
            console.print("[bold red]No results found![/bold red]")
        table = Table()  # show_lines is left at its default so rows are not separated by rule lines
        table.add_column("Engine", style="cyan", no_wrap=True)
            table.add_column("Status", style="magenta")
            table.add_column("Title", style="green")
            table.add_column("URL", style="blue", overflow="fold")
                table.add_row(
            table.add_column("URL", style="blue", overflow="fold", max_width=70)
                table.add_row(result["engine"], result["url"])
        console.print(table)
                    console.print(result)
    def _display_json_results(self, processed_results: list[dict[str, Any]]) -> None:
            results_by_engine[engine]["results"].append(
                    "snippet": result.get("snippet")
                    if result.get("snippet") != "N/A"
                    "raw": result.get("raw_result"),
    def _display_errors(self, error_messages: list[str]) -> None:
        table = Table(title="❌ Search Errors")
        table.add_column("Error", style="red")
            table.add_row(error)
    async def _search_engine(
        self._configure_logging(verbose)
        if not is_engine_available(engine):
            error_msg = f"{engine.capitalize()} search engine is not available. Make sure the required dependency is installed."
            self.logger.error(error_msg)
            self._display_errors([error_msg])
        engine_func = get_engine_function(engine)
                f"{engine.capitalize()} search engine function could not be loaded."
        friendly = friendly_names.get(engine, engine)
            self.console.print(f"[bold]Searching {friendly}[/bold]: {query}")
            results = await engine_func(query=query, **params)
            processed_results = self._process_results(results)
                self._display_json_results(processed_results)
                self._display_results(processed_results, verbose)
            self.logger.error(f"{friendly} search failed: {e}")
    def q(
        engine_list = self._parse_engines(engines)
        common_params = {k: v for k, v in common_params.items() if v is not None}
            results = asyncio.run(
                self._run_search(query, engine_list, **common_params, **kwargs)
            with self.console.status(
            self._display_json_results(results)
            self._display_results(results, verbose)
    def info(self, engine: str | None = None, json: bool = False) -> None:
            config = Config()
                self._display_engines_json(engine, config)
                self._list_all_engines(config)
                self._show_engine_details(engine, config)
                self.logger.error(f"❌ Failed to display engine information: {e}")
    def _list_all_engines(self, config: "Config") -> None:
        table = Table(title="🔎 Available Search Engines")
        table.add_column("Enabled", style="magenta")
        table.add_column("API Key Required", style="yellow")
            registered_engines = get_registered_engines()
        for engine, engine_config in config.engines.items():
                hasattr(engine_config, "api_key") and engine_config.api_key is not None
                engine_class = registered_engines.get(engine)
                if engine_class and hasattr(engine_class, "env_api_key_names"):
                    api_key_required = bool(engine_class.env_api_key_names)
        self.console.print(table)
        self.console.print(
    def _show_engine_details(self, engine_name: str, config: "Config") -> None:
            self.console.print("\nAvailable engines:")
                self.console.print(f"- {name}")
            engine_class = registered_engines.get(engine_name)
                and hasattr(engine_class, "env_api_key_names")
            self.console.print(f"\n[bold cyan]🔍 Engine: {engine_name}[/bold cyan]")
                self.console.print("\n[bold]API Key Environment Variables:[/bold]")
                    value_status = "✅" if os.environ.get(env_name) else "❌"
                    self.console.print(f"  {env_name}: {value_status}")
            self.console.print("\n[bold]Default Parameters:[/bold]")
                for param, value in engine_config.default_params.items():
                    self.console.print(f"  {param}: {value}")
                self.console.print("  No default parameters specified")
                base_engine = engine_name.split("-")[0]
                engine_module = importlib.import_module(module_name)
                function_name = engine_name.replace("-", "_")
                if hasattr(engine_module, function_name):
                    func = getattr(engine_module, function_name)
                    self.console.print("\n[bold]Function Interface:[/bold]")
                        f"  [green]{function_name}()[/green] - {func.__doc__.strip().split('\\n')[0]}"
                    self.console.print("\n[bold]Example Usage:[/bold]")
            self.console.print("\n[bold]Basic Configuration:[/bold]")
            self.console.print(f"Enabled: {'✅' if engine_config.enabled else '❌'}")
            self.console.print(f"Default Parameters: {engine_config.default_params}")
    def _display_engines_json(self, engine: str | None, config: "Config") -> None:
            result[engine] = self._get_engine_info(
            for engine_name, engine_config in config.engines.items():
                result[engine_name] = self._get_engine_info(
    def _get_engine_info(
        if hasattr(engine_config, "api_key") and engine_config.api_key is not None:
                    {"name": env_name, "set": bool(os.environ.get(env_name))}
            if hasattr(engine_config, "default_params")
            if hasattr(engine_config, "enabled")
    def _check_engine_availability(self, engine_name: str) -> bool:
        return is_engine_available(engine_name)
    async def critique(
                s.strip() for s in source_whitelist.split(",") if s.strip()
                s.strip() for s in source_blacklist.split(",") if s.strip()
        params.update(kwargs)
        params = {k: v for k, v in params.items() if v is not None}
        return await self._search_engine("critique", query, params, json, verbose)
    async def brave(
        return await self._search_engine("brave", query, params, json, verbose)
    async def brave_news(
        return await self._search_engine("brave_news", query, params, json, verbose)
    async def serpapi(
        return await self._search_engine("serpapi", query, params, json, verbose)
    async def tavily(
                s.strip() for s in include_domains.split(",") if s.strip()
                s.strip() for s in exclude_domains.split(",") if s.strip()
        return await self._search_engine("tavily", query, params, json, verbose)
    async def pplx(
        return await self._search_engine("pplx", query, params, json, verbose)
    async def you(
        return await self._search_engine("you", query, params, json, verbose)
    async def you_news(
        return await self._search_engine("you_news", query, params, json, verbose)
    async def duckduckgo(
        return await self._search_engine("duckduckgo", query, params, json, verbose)
    async def hasdata_google(
        return await self._search_engine("hasdata-google", query, params, json, verbose)
    async def hasdata_google_light(
        return await self._search_engine(
def main() -> None:
    fire.Fire(SearchCLI())
    main()
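# Illustrative sketch (not part of the original module): the normalization done
# by _parse_engines, written as a free function. Returning None for a missing
# or unexpected argument (meaning "use all engines") is an assumption based on
# the return annotation and the warning text above; the name parse_engines is
# hypothetical.

```python
from typing import Any, Optional


def parse_engines(engines_arg: Any) -> Optional[list[str]]:
    if engines_arg is None:
        return None
    if isinstance(engines_arg, str):
        # "brave, tavily" -> ["brave", "tavily"]; empty items are dropped.
        return [e.strip() for e in engines_arg.split(",") if e.strip()]
    if isinstance(engines_arg, (list, tuple)):
        # A CLI parser may hand over a comma-separated value as a sequence.
        return [str(e).strip() for e in engines_arg if str(e).strip()]
    return None


from_string = parse_engines("brave, tavily,,")
from_tuple = parse_engines(("you", " pplx "))
from_other = parse_engines(42)
```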

================
File: src/twat_search/web/config.py
================
    load_dotenv()  # Load variables from .env file into environment
class EngineConfig(BaseModel):
    default_params: dict[str, Any] = Field(default_factory=dict)
class Config:
    def __init__(self, **kwargs: Any) -> None:
        self.engines: dict[str, EngineConfig] = kwargs.get("engines", {})
            self._load_engine_configs()
    def _load_engine_configs(self) -> None:
            registered_engines = get_registered_engines()
        for engine_name, engine_class in registered_engines.items():
                api_key = os.environ.get(env_name)
                enabled = os.environ.get(env_name)
                    engine_settings[engine_name]["enabled"] = enabled.lower() in (
                params = os.environ.get(env_name)
                        engine_settings[engine_name]["default_params"] = json.loads(
        for engine_name, settings in engine_settings.items():
                for key, value in settings.items():
                    setattr(existing_config, key, value)
                self.engines[engine_name] = EngineConfig(**settings)
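# Illustrative sketch (not part of the original module): reading one engine's
# settings from environment-style variables. The variable names follow the
# BRAVE_API_KEY / BRAVE_ENABLED / BRAVE_DEFAULT_PARAMS pattern used in
# tests/conftest.py. The function name load_engine_settings, the set of truthy
# strings, and the empty-dict fallback on invalid JSON are assumptions.

```python
import json
from typing import Any, Mapping


def load_engine_settings(prefix: str, env: Mapping[str, str]) -> dict[str, Any]:
    settings: dict[str, Any] = {}
    api_key = env.get(f"{prefix}_API_KEY")
    if api_key:
        settings["api_key"] = api_key
    enabled = env.get(f"{prefix}_ENABLED")
    if enabled is not None:
        # Assumed truthy spellings; anything else disables the engine.
        settings["enabled"] = enabled.lower() in ("true", "1", "yes")
    params = env.get(f"{prefix}_DEFAULT_PARAMS")
    if params:
        try:
            settings["default_params"] = json.loads(params)
        except json.JSONDecodeError:
            # Assumed fallback: ignore malformed JSON rather than fail.
            settings["default_params"] = {}
    return settings


example = load_engine_settings(
    "BRAVE",
    {
        "BRAVE_API_KEY": "test_brave_key",
        "BRAVE_ENABLED": "true",
        "BRAVE_DEFAULT_PARAMS": '{"count": 10}',
    },
)
```

# Passing a plain mapping instead of reading os.environ directly keeps the
# sketch easy to test; os.environ can be passed in unchanged.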

================
File: src/twat_search/web/exceptions.py
================
class SearchError(Exception):
    def __init__(self, message: str) -> None:
        super().__init__(message)
class EngineError(SearchError):
    def __init__(self, engine_name: str, message: str) -> None:
        super().__init__(f"Engine '{engine_name}': {message}")

================
File: src/twat_search/web/models.py
================
class SearchResult(BaseModel):
    @field_validator("title", "snippet", "source")
    def validate_non_empty(cls, v: str) -> str:
        if not v or not v.strip():
            raise ValueError(msg)
        return v.strip()

================
File: src/twat_search/web/utils.py
================
logger = logging.getLogger(__name__)
class RateLimiter:
    def __init__(self, calls_per_second: int = 10) -> None:
    def wait_if_needed(self) -> None:
        now = time.time()
        if len(self.call_timestamps) >= self.calls_per_second:
                    logger.debug(f"Rate limiting: sleeping for {sleep_time:.2f}s")
                time.sleep(sleep_time)
        self.call_timestamps.append(time.time())
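# Illustrative sketch (not part of the original module): a sliding-window
# limiter in the spirit of RateLimiter.wait_if_needed. The elided lines of the
# original are not reproduced; the one-second window, the pruning step, and the
# injectable clock and sleep callables are assumptions made so the behavior can
# be checked without real waiting.

```python
import time
from typing import Callable


class SlidingWindowLimiter:
    def __init__(
        self,
        calls_per_second: int = 10,
        clock: Callable[[], float] = time.time,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.calls_per_second = calls_per_second
        self.call_timestamps: list[float] = []
        self._clock = clock
        self._sleep = sleep

    def wait_if_needed(self) -> None:
        now = self._clock()
        # Drop timestamps that fell out of the one-second window.
        self.call_timestamps = [t for t in self.call_timestamps if now - t < 1.0]
        if len(self.call_timestamps) >= self.calls_per_second:
            # Wait until the oldest call in the window is one second old.
            sleep_time = 1.0 - (now - self.call_timestamps[0])
            if sleep_time > 0:
                self._sleep(sleep_time)
        self.call_timestamps.append(self._clock())


# A frozen clock and a recording sleep make the third call hit the limit.
slept: list[float] = []
limiter = SlidingWindowLimiter(
    calls_per_second=2, clock=lambda: 100.0, sleep=slept.append
)
for _ in range(3):
    limiter.wait_if_needed()
```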

================
File: src/twat_search/__init__.py
================
    __all__.append("__version__")
    __all__.append("web")

================
File: src/twat_search/__main__.py
================
logging.basicConfig(
    handlers=[RichHandler(rich_tracebacks=True)],
logger = logging.getLogger(__name__)
console = Console()
SearchCLIType = TypeVar("SearchCLIType")
class TwatSearchCLI:
    def __init__(self) -> None:
            self.web: Any = web_cli.SearchCLI()
            logger.error(f"Web CLI not available: {e!s}")
            logger.error("Make sure twat_search.web.cli is properly installed.")
    def _cli_error(self, *args: Any, **kwargs: Any) -> int:  # noqa: ARG002
        console.print(
    def version(self) -> str:
def main() -> None:
    fire.Fire(TwatSearchCLI(), name="twat-search")
    main()

================
File: tests/unit/web/engines/__init__.py
================


================
File: tests/unit/web/engines/test_base.py
================
class TestSearchEngine(SearchEngine):
    async def search(self, query: str) -> list[SearchResult]:
            SearchResult(
                url=HttpUrl("https://example.com/test"),
register_engine(TestSearchEngine)
class DisabledTestSearchEngine(SearchEngine):
        raise NotImplementedError(msg)
register_engine(DisabledTestSearchEngine)
def test_search_engine_is_abstract() -> None:
    assert hasattr(SearchEngine, "__abstractmethods__")
    with pytest.raises(TypeError):
        SearchEngine(EngineConfig())  # type: ignore
def test_search_engine_name_class_var() -> None:
    assert hasattr(SearchEngine, "name")
def test_engine_registration() -> None:
    class NewEngine(SearchEngine):
    returned_class = register_engine(NewEngine)
    engine_instance = get_engine("new_engine", EngineConfig())
    assert isinstance(engine_instance, NewEngine)
def test_get_engine_with_invalid_name() -> None:
    with pytest.raises(SearchError, match="Unknown search engine"):
        get_engine("nonexistent_engine", EngineConfig())
def test_get_engine_with_disabled_engine() -> None:
    config = EngineConfig(enabled=False)
    with pytest.raises(SearchError, match="is disabled"):
        get_engine("disabled_engine", config)
def test_get_engine_with_config() -> None:
    config = EngineConfig(
    engine = get_engine("test_engine", config)
def test_get_engine_with_kwargs() -> None:
    engine = get_engine("test_engine", EngineConfig(), **kwargs)

================
File: tests/unit/web/__init__.py
================


================
File: tests/unit/web/test_api.py
================
logging.basicConfig(level=logging.DEBUG)
T = TypeVar("T")
class MockSearchEngine(SearchEngine):
    def __init__(self, config: EngineConfig, **kwargs: Any) -> None:
        super().__init__(config, **kwargs)
        self.should_fail = kwargs.get("should_fail", False)
    async def search(self, query: str) -> list[SearchResult]:
            raise Exception(msg)
        result_count = self.kwargs.get("result_count", 1)
            SearchResult(
                url=HttpUrl(f"https://example.com/{i + 1}"),
            for i in range(result_count)
register_engine(MockSearchEngine)
def mock_config() -> Config:
    config = Config()
        "mock": EngineConfig(
async def setup_teardown() -> AsyncGenerator[None, None]:
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    with contextlib.suppress(asyncio.CancelledError):
        await asyncio.gather(*tasks)
async def test_search_with_mock_engine(
    results = await search("test query", engines=["mock"], config=mock_config)
    assert len(results) == 2
    assert all(isinstance(result, SearchResult) for result in results)
    assert all(result.source == "mock" for result in results)
async def test_search_with_additional_params(
    results = await search(
    assert len(results) == 3
async def test_search_with_engine_specific_params(
    assert len(results) == 4
async def test_search_with_no_engines(setup_teardown: None) -> None:
    with pytest.raises(SearchError, match="No search engines configured"):
        await search("test query", engines=[])
async def test_search_with_failing_engine(
    assert len(results) == 0
async def test_search_with_nonexistent_engine(
    with pytest.raises(SearchError, match="No search engines could be initialized"):
        await search("test query", engines=["nonexistent"], config=mock_config)
async def test_search_with_disabled_engine(
        await search("test query", engines=["mock"], config=mock_config)

================
File: tests/unit/web/test_config.py
================
def test_engine_config_defaults() -> None:
    config = EngineConfig()
def test_engine_config_values() -> None:
    config = EngineConfig(
def test_config_defaults(isolate_env_vars: None) -> None:
    config = Config()
    assert isinstance(config.engines, dict)
    assert len(config.engines) == 0
def test_config_with_env_vars(
def test_config_with_direct_initialization() -> None:
    custom_config = Config(
            "test_engine": EngineConfig(
def test_config_env_vars_override_direct_config(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setenv("BRAVE_API_KEY", "env_key")
            "brave": EngineConfig(

================
File: tests/unit/web/test_exceptions.py
================
def test_search_error() -> None:
    exception = SearchError(error_message)
    assert str(exception) == error_message
    assert isinstance(exception, Exception)
def test_engine_error() -> None:
    exception = EngineError(engine_name, error_message)
    assert str(exception) == f"Engine '{engine_name}': {error_message}"
    assert isinstance(exception, SearchError)
def test_engine_error_inheritance() -> None:
        raise EngineError(msg, "Test error")
        if isinstance(e, EngineError):
def test_search_error_as_base_class() -> None:
        raise SearchError(msg)
        exceptions.append(e)
        raise EngineError(msg, "API key missing")
    assert len(exceptions) == 2
    assert isinstance(exceptions[0], SearchError)
    assert isinstance(exceptions[1], EngineError)
    assert "General search error" in str(exceptions[0])
    assert "Engine 'brave': API key missing" in str(exceptions[1])

================
File: tests/unit/web/test_models.py
================
def test_search_result_valid_data() -> None:
    url = HttpUrl("https://example.com")
    result = SearchResult(
    assert str(result.url) == "https://example.com/"
def test_search_result_with_optional_fields() -> None:
def test_search_result_invalid_url() -> None:
    with pytest.raises(ValidationError):
        SearchResult.model_validate(
def test_search_result_empty_fields() -> None:
                "url": str(url),
def test_search_result_serialization() -> None:
    result_dict = result.model_dump()
    assert str(result_dict["url"]) == "https://example.com/"
    result_json = result.model_dump_json()
    assert isinstance(result_json, str)
def test_search_result_deserialization() -> None:
    result = SearchResult.model_validate(data)

================
File: tests/unit/web/test_utils.py
================
def rate_limiter() -> RateLimiter:
    return RateLimiter(calls_per_second=5)
def test_rate_limiter_init() -> None:
    limiter = RateLimiter(calls_per_second=10)
def test_rate_limiter_wait_when_not_needed(rate_limiter: RateLimiter) -> None:
    with patch("time.sleep") as mock_sleep:
        rate_limiter.wait_if_needed()
        mock_sleep.assert_not_called()
        for _ in range(3):  # 4 total calls including the one above
def test_rate_limiter_wait_when_needed(rate_limiter: RateLimiter) -> None:
    now = time.time()
        now - 0.01 * i for i in range(rate_limiter.calls_per_second)
    with patch("time.sleep") as mock_sleep, patch("time.time", return_value=now):
        mock_sleep.assert_called_once()
def test_rate_limiter_cleans_old_timestamps(rate_limiter: RateLimiter) -> None:
    with patch("time.time", return_value=now):
        len(rate_limiter.call_timestamps) == len(recent_stamps) + 1
@pytest.mark.parametrize("calls_per_second", [1, 5, 10, 100])
def test_rate_limiter_with_different_rates(calls_per_second: int) -> None:
    limiter = RateLimiter(calls_per_second=calls_per_second)
        for _ in range(calls_per_second):
            limiter.wait_if_needed()
        patch("time.sleep") as mock_sleep,
        patch("time.time", return_value=time.time()),

================
File: tests/unit/__init__.py
================


================
File: tests/unit/mock_engine.py
================
class MockSearchEngine(SearchEngine):
    def __init__(self, config: EngineConfig, **kwargs: Any) -> None:
        super().__init__(config, **kwargs)
        self.should_fail = kwargs.get("should_fail", False)
    async def search(self, query: str) -> list[SearchResult]:
            raise Exception(msg)
        result_count = self.kwargs.get("result_count", 1)
            SearchResult(
                url=HttpUrl(f"https://example.com/{i + 1}"),
            for i in range(result_count)
register_engine(MockSearchEngine)

================
File: tests/web/test_bing_scraper.py
================
class MockSearchResult:
    def __init__(self, title: str, url: str, description: str = "") -> None:
def engine_config() -> EngineConfig:
    return EngineConfig(enabled=True)
def engine(engine_config: EngineConfig) -> BingScraperSearchEngine:
    return BingScraperSearchEngine(config=engine_config, num_results=5)
def mock_results() -> list[MockSearchResult]:
        MockSearchResult(
class TestBingScraperEngine:
    @patch("twat_search.web.engines.bing_scraper.BingScraper")
    def test_init(self, mock_BingScraper: MagicMock, engine: Any) -> None:
        mock_BingScraper.assert_not_called()
    async def test_search_basic(
        mock_instance = MagicMock()
        results = await engine.search("test query")
        assert len(results) == 2
        assert isinstance(results[0], SearchResult)
        assert str(results[0].url) == "https://example.com/1"
        mock_BingScraper.assert_called_once_with(
        mock_instance.search.assert_called_once_with("test query", num_results=5)
    async def test_custom_parameters(self, mock_BingScraper: MagicMock) -> None:
        engine = BingScraperSearchEngine(
            config=EngineConfig(enabled=True),
        await engine.search("test query")
        mock_instance.search.assert_called_once_with("test query", num_results=10)
    async def test_invalid_url_handling(
        assert len(results) == 1
    @patch("twat_search.web.api.search")
    async def test_bing_scraper_convenience_function(
            SearchResult(
                url=HttpUrl("https://example.com"),
        results = await bing_scraper(
        mock_search.assert_called_once()
    async def test_empty_query(
        with pytest.raises(EngineError) as excinfo:
            await engine.search("")
        assert "Search query cannot be empty" in str(excinfo.value)
    async def test_no_results(
        assert isinstance(results, list)
        assert len(results) == 0
    async def test_network_error(
        mock_instance.search.side_effect = ConnectionError("Network timeout")
        assert "Network error connecting to Bing" in str(excinfo.value)
    async def test_parsing_error(
        mock_instance.search.side_effect = RuntimeError("Failed to parse HTML")
        assert "Error parsing Bing search results" in str(excinfo.value)
    async def test_invalid_result_format(
        class InvalidResult:
            def __init__(self) -> None:
        mock_instance.search.return_value = [InvalidResult()]

================
File: tests/conftest.py
================
@pytest.fixture(autouse=True)
def isolate_env_vars(monkeypatch: MonkeyPatch) -> None:
    for env_var in list(os.environ.keys()):
        if any(
            env_var.endswith(suffix)
            monkeypatch.delenv(env_var, raising=False)
    monkeypatch.setenv("_TEST_ENGINE", "true")
def env_vars_for_brave(monkeypatch: MonkeyPatch) -> None:
        sys.path.insert(0, str(Path(__file__).parent.parent))
        class MockBraveEngine(SearchEngine):
        register_engine(MockBraveEngine)
    monkeypatch.setenv("BRAVE_API_KEY", "test_brave_key")
    monkeypatch.setenv("BRAVE_ENABLED", "true")
    monkeypatch.setenv("BRAVE_DEFAULT_PARAMS", '{"count": 10}')
    monkeypatch.delenv("_TEST_ENGINE", raising=False)

================
File: tests/test_twat_search.py
================
def test_version() -> None:

================
File: .gitignore
================
*_autogen/
.DS_Store
__version__.py
__pycache__/
_Chutzpah*
_deps
_NCrunch_*
_pkginfo.txt
_Pvt_Extensions
_ReSharper*/
_TeamCity*
_UpgradeReport_Files/
!?*.[Cc]ache/
!.axoCover/settings.json
!.vscode/extensions.json
!.vscode/launch.json
!.vscode/settings.json
!.vscode/tasks.json
!**/[Pp]ackages/build/
!Directory.Build.rsp
.*crunch*.local.xml
.axoCover/*
.builds
.cr/personal
.fake/
.history/
.ionide/
.localhistory/
.mfractor/
.ntvs_analysis.dat
.paket/paket.exe
.sass-cache/
.vs/
.vscode
.vscode/*
.vshistory/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
[Bb]in/
[Bb]uild[Ll]og.*
[Dd]ebug/
[Dd]ebugPS/
[Dd]ebugPublic/
[Ee]xpress/
[Ll]og/
[Ll]ogs/
[Oo]bj/
[Rr]elease/
[Rr]eleasePS/
[Rr]eleases/
[Tt]est[Rr]esult*/
[Ww][Ii][Nn]32/
*_h.h
*_i.c
*_p.c
*_wpftmp.csproj
*- [Bb]ackup ([0-9]).rdl
*- [Bb]ackup ([0-9][0-9]).rdl
*- [Bb]ackup.rdl
*.[Cc]ache
*.[Pp]ublish.xml
*.[Rr]e[Ss]harper
*.a
*.app
*.appx
*.appxbundle
*.appxupload
*.aps
*.azurePubxml
*.bim_*.settings
*.bim.layout
*.binlog
*.btm.cs
*.btp.cs
*.build.csdef
*.cab
*.cachefile
*.code-workspace
*.coverage
*.coveragexml
*.d
*.dbmdl
*.dbproj.schemaview
*.dll
*.dotCover
*.DotSettings.user
*.dsp
*.dsw
*.dylib
*.e2e
*.exe
*.gch
*.GhostDoc.xml
*.gpState
*.ilk
*.iobj
*.ipdb
*.jfm
*.jmconfig
*.la
*.lai
*.ldf
*.lib
*.lo
*.log
*.mdf
*.meta
*.mm.*
*.mod
*.msi
*.msix
*.msm
*.msp
*.ncb
*.ndf
*.nuget.props
*.nuget.targets
*.nupkg
*.nvuser
*.o
*.obj
*.odx.cs
*.opendb
*.opensdf
*.opt
*.out
*.pch
*.pdb
*.pfx
*.pgc
*.pgd
*.pidb
*.plg
*.psess
*.publishproj
*.publishsettings
*.pubxml
*.pyc
*.rdl.data
*.rptproj.bak
*.rptproj.rsuser
*.rsp
*.rsuser
*.sap
*.sbr
*.scc
*.sdf
*.sln.docstates
*.sln.iml
*.slo
*.smod
*.snupkg
*.so
*.suo
*.svclog
*.tlb
*.tlh
*.tli
*.tlog
*.tmp
*.tmp_proj
*.tss
*.user
*.userosscache
*.userprefs
*.vbp
*.vbw
*.VC.db
*.VC.VC.opendb
*.VisualState.xml
*.vsp
*.vspscc
*.vspx
*.vssscc
*.xsd.cs
**/[Pp]ackages/*
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.HTMLClient/GeneratedArtifacts
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
*~
~$*
$tf/
AppPackages/
artifacts/
ASALocalRun/
AutoTest.Net/
Backup*/
BenchmarkDotNet.Artifacts/
bld/
BundleArtifacts/
ClientBin/
cmake_install.cmake
CMakeCache.txt
CMakeFiles
CMakeLists.txt.user
CMakeScripts
CMakeUserPresets.json
compile_commands.json
coverage*.info
coverage*.json
coverage*.xml
csx/
CTestTestfile.cmake
dlldata.c
DocProject/buildhelp/
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/*.HxC
DocProject/Help/*.HxT
DocProject/Help/html
DocProject/Help/Html2
ecf/
FakesAssemblies/
FodyWeavers.xsd
Generated_Code/
Generated\ Files/
healthchecksdb
install_manifest.txt
ipch/
Makefile
MigrationBackup/
mono_crash.*
nCrunchTemp_*
node_modules/
nunit-*.xml
OpenCover/
orleans.codegen.cs
Package.StoreAssociation.xml
paket-files/
project.fragment.lock.json
project.lock.json
publish/
PublishScripts/
rcf/
ScaffoldingReadMe.txt
ServiceFabricBackup/
StyleCopReport.xml
Testing
TestResult.xml
UpgradeLog*.htm
UpgradeLog*.XML
x64/
x86/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Distribution / packaging
!dist/.gitkeep

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
.ruff_cache/

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Project specific
__version__.py
_private

================
File: .pre-commit-config.yaml
================
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
        args: [--respect-gitignore]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
      - id: debug-statements
      - id: check-case-conflict
      - id: mixed-line-ending
        args: [--fix=lf]

================
File: cleanup.py
================
LOG_FILE = Path("CLEANUP.txt")
os.chdir(Path(__file__).parent)
def new() -> None:
    if LOG_FILE.exists():
        LOG_FILE.unlink()
def prefix() -> None:
    readme = Path(".cursor/rules/0project.mdc")
    if readme.exists():
        log_message("\n=== PROJECT STATEMENT ===")
        content = readme.read_text()
        log_message(content)
def suffix() -> None:
    todo = Path("TODO.md")
    if todo.exists():
        log_message("\n=== TODO.md ===")
        content = todo.read_text()
def log_message(message: str) -> None:
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with LOG_FILE.open("a") as f:
        f.write(log_line)
def run_command(cmd: list[str], check: bool = True) -> subprocess.CompletedProcess:
        result = subprocess.run(cmd, check=check, capture_output=True, text=True)
            log_message(result.stdout)
        log_message(f"Command failed: {' '.join(cmd)}")
        log_message(f"Error: {e.stderr}")
        return subprocess.CompletedProcess(cmd, 1, "", str(e))
def check_command_exists(cmd: str) -> bool:
        subprocess.run(["which", cmd], check=True, capture_output=True)
class Cleanup:
    def __init__(self) -> None:
        self.workspace = Path.cwd()
    def _print_header(self, message: str) -> None:
        log_message(f"\n=== {message} ===")
    def _check_required_files(self) -> bool:
            if not (self.workspace / file).exists():
                log_message(f"Error: {file} is missing")
    def _generate_tree(self) -> None:
        if not check_command_exists("tree"):
            log_message("Warning: 'tree' command not found. Skipping tree generation.")
            rules_dir = Path(".cursor/rules")
            rules_dir.mkdir(parents=True, exist_ok=True)
            tree_result = run_command(
            with open(rules_dir / "filetree.mdc", "w") as f:
                f.write("---\ndescription: File tree of the project\nglobs: \n---\n")
                f.write(tree_text)
            log_message("\nProject structure:")
            log_message(tree_text)
            log_message(f"Failed to generate tree: {e}")
    def _git_status(self) -> bool:
        result = run_command(["git", "status", "--porcelain"], check=False)
        return bool(result.stdout.strip())
    def _venv(self) -> None:
        log_message("Setting up virtual environment")
            run_command(["uv", "venv"])
            if venv_path.exists():
                os.environ["VIRTUAL_ENV"] = str(self.workspace / ".venv")
                log_message("Virtual environment created and activated")
                log_message("Virtual environment created but activation failed")
            log_message(f"Failed to create virtual environment: {e}")
    def _install(self) -> None:
        log_message("Installing package with all extras")
            self._venv()
            run_command(["uv", "pip", "install", "-e", ".[test,dev]"])
            log_message("Package installed successfully")
            log_message(f"Failed to install package: {e}")
    def _run_checks(self) -> None:
        log_message("Running code quality checks")
            log_message(">>> Running code fixes...")
            run_command(
            log_message(">>> Running type checks...")
            run_command(["python", "-m", "mypy", "src", "tests"], check=False)
            log_message(">>> Running tests...")
            run_command(["python", "-m", "pytest", "tests"], check=False)
            log_message("All checks completed")
            log_message(f"Failed during checks: {e}")
    def status(self) -> None:
        prefix()  # Add project statement (.cursor/rules/0project.mdc) at start
        self._print_header("Current Status")
        self._check_required_files()
        self._generate_tree()
        result = run_command(["git", "status"], check=False)
        self._print_header("Environment Status")
        self._install()
        self._run_checks()
        suffix()  # Add TODO.md content at end
    def venv(self) -> None:
        self._print_header("Virtual Environment Setup")
    def install(self) -> None:
        self._print_header("Package Installation")
    def update(self) -> None:
        self.status()
        if self._git_status():
            log_message("Changes detected in repository")
                run_command(["git", "add", "."])
                run_command(["git", "commit", "-m", commit_msg])
                log_message("Changes committed successfully")
                log_message(f"Failed to commit changes: {e}")
            log_message("No changes to commit")
    def push(self) -> None:
        self._print_header("Pushing Changes")
            run_command(["git", "push"])
            log_message("Changes pushed successfully")
            log_message(f"Failed to push changes: {e}")
def repomix(
            cmd.append("--compress")
            cmd.append("--remove-empty-lines")
            cmd.append("-i")
            cmd.append(ignore_patterns)
        cmd.extend(["-o", output_file])
        run_command(cmd)
        log_message(f"Repository content mixed into {output_file}")
        log_message(f"Failed to mix repository: {e}")
def print_usage() -> None:
    log_message("Usage:")
    log_message("  cleanup.py status   # Show current status and run all checks")
    log_message("  cleanup.py venv     # Create virtual environment")
    log_message("  cleanup.py install  # Install package with all extras")
    log_message("  cleanup.py update   # Update and commit changes")
    log_message("  cleanup.py push     # Push changes to remote")
def main() -> NoReturn:
    new()  # Clear log file
    if len(sys.argv) < 2:
        print_usage()
        sys.exit(1)
    cleanup = Cleanup()
            cleanup.status()
            cleanup.venv()
            cleanup.install()
            cleanup.update()
            cleanup.push()
        log_message(f"Error: {e}")
    repomix()
    main()

================
File: LICENSE
================
MIT License

Copyright (c) 2025 Adam Twardoch

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================
File: PROGRESS.md
================
---
this_file: PROGRESS.md
---

## Completed
- [x] Defined common parameters in base `SearchEngine` class
- [x] Updated search engines to use unified parameters
- [x] Added support for multiple search engines (Brave, Tavily, Perplexity, You.com, SerpAPI, Critique, DuckDuckGo)
- [x] Implemented initial version of Bing Scraper engine
- [x] Created basic testing infrastructure

## In Progress
- [ ] Fixing type checking and linting issues identified in cleanup report
- [ ] Completing Bing Scraper implementation and tests
- [ ] Addressing failing test in config environment variable loading

## Upcoming
- [ ] Enhancing test framework with mocks and fixtures
- [ ] Standardizing error handling across all engines
- [ ] Improving documentation and adding comprehensive examples
- [ ] Implementing performance optimizations
- [ ] Adding advanced features (caching, rate limiting, result normalization)

## Known Issues
- Environment variable loading not working correctly in tests
- Missing type annotations in several modules
- Excessive parameter counts in engine initialization methods
- Skipped tests for asynchronous components

See [TODO.md](TODO.md) for the detailed task breakdown and implementation plans.

================
File: pyproject.toml
================
# this_file: twat_search/pyproject.toml

# Build System Configuration
# -------------------------
# Specifies the build system and its requirements for packaging the project
# - hatchling: Modern, extensible build backend for Python projects
# - hatch-vcs: Automatically determines package version from version control system
[build-system]
requires = [
    "hatchling>=1.27.0",     # Core build backend for Hatch, providing modern packaging capabilities
    "hatch-vcs>=0.4.0",      # Plugin to dynamically generate version from Git tags/commits
]
build-backend = "hatchling.build"  # Use Hatchling as the build backend for consistent and flexible builds

# Wheel Distribution Configuration
# --------------------------------
# Controls how the package is built and distributed as a wheel
# Ensures only specific packages are included in the distribution
[tool.hatch.build.targets.wheel]
packages = ["src/twat_search"]  # Only include the src/twat_search directory in the wheel

# Project Metadata Configuration
# ------------------------------
# Comprehensive project description, requirements, and compatibility information
[project]
name = "twat-search"  # Unique package name for PyPI and installation
dynamic = ["version"]  # Version is dynamically determined from version control system
description = "Advanced search utilities and tools for the twat ecosystem"  # Short, descriptive package summary
readme = "README.md"  # Path to the project's README file for package description
requires-python = ">=3.10"  # Minimum Python version required, leveraging modern Python features
license = "MIT"  # Open-source license type
keywords = ["twat", "search", "utilities", "text-search", "indexing"]  # Keywords for package discovery
classifiers = [  # Metadata for package indexes and compatibility
    "Development Status :: 4 - Beta",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: Implementation :: CPython",
    "Programming Language :: Python :: Implementation :: PyPy",
]

# Runtime Dependencies
# -------------------
# External packages required for the project to function
dependencies = [
    "twat>=1.8.1", # Core twat package, providing essential functionality
    "pydantic>=2.10.6", # Data validation and settings management
    "pydantic-settings>=2.8.0", # Settings management for Pydantic v2
    "httpx>=0.28.1", # HTTP client for API requests
    "python-dotenv>=1.0.1", # Environment variable management
    "fire>=0.5.0", # Command line interface generator
    "rich>=13.6.0", # Rich text and formatting for terminal output
]

# Project Authors
# ---------------
[[project.authors]]
name = "Adam Twardoch"  # Primary author's name
email = "adam+github@twardoch.com"  # Contact email for the author

# Project URLs
# ------------
# Links to project resources for documentation, issues, and source code
[project.urls]
Documentation = "https://github.com/twardoch/twat-search#readme"
Issues = "https://github.com/twardoch/twat-search/issues"
Source = "https://github.com/twardoch/twat-search"

# Twat Plugin Registration
# -----------------------
# Registers this package as a plugin for the twat ecosystem
[project.entry-points."twat.plugins"]
search = "twat_search"  # Plugin name and module for search utilities

# Version Management
# -----------------
# Configures automatic version generation from version control system
[tool.hatch.version]
source = "vcs"  # Use version control system (Git) to determine version

# Version Scheme
# --------------
# Defines how versions are generated and incremented
[tool.hatch.version.raw-options]
version_scheme = "post-release"  # Generates version numbers based on Git tags

# Version File Generation
# ----------------------
# Automatically creates a version file in the package
[tool.hatch.build.hooks.vcs]
version-file = "src/twat_search/__version__.py"

# Default development environment configuration
[tool.hatch.envs.default]
dependencies = [
    "pytest",                # Testing framework
    "pytest-cov",           # Coverage reporting
    "mypy>=1.15.0",         # Static type checker
    "ruff>=0.9.6",          # Fast Python linter
]

# Scripts available in the default environment
[tool.hatch.envs.default.scripts]
test = "pytest {args:tests}"
test-cov = "pytest --cov-report=term-missing --cov-config=pyproject.toml --cov=src/twat_search --cov=tests {args:tests}"
type-check = "mypy src/twat_search tests"
lint = ["ruff check src/twat_search tests", "ruff format src/twat_search tests"]

# Python version matrix for testing
[[tool.hatch.envs.all.matrix]]
python = ["3.10", "3.11", "3.12"]

# Linting environment configuration
[tool.hatch.envs.lint]
detached = true  # Run in isolated environment
dependencies = [
    "mypy>=1.15.0",         # Static type checker
    "ruff>=0.9.6",          # Fast Python linter
]

# Linting environment scripts
[tool.hatch.envs.lint.scripts]
typing = "mypy --install-types --non-interactive {args:src/twat_search tests}"
style = ["ruff check {args:.}", "ruff format {args:.}"]
fmt = ["ruff format {args:.}", "ruff check --fix {args:.}"]
all = ["style", "typing"]

# Ruff (linter) configuration
[tool.ruff]
target-version = "py310"
line-length = 88

# Ruff lint rules configuration
[tool.ruff.lint]
extend-select = [
    "A",     # flake8-builtins
    "ARG",   # flake8-unused-arguments
    "B",     # flake8-bugbear
    "C",     # flake8-comprehensions (C4) and mccabe (C90)
    "DTZ",   # flake8-datetimez
    "E",     # pycodestyle errors
    "EM",    # flake8-errmsg
    "F",     # pyflakes
    "FBT",   # flake8-boolean-trap
    "I",     # isort
    "ICN",   # flake8-import-conventions
    "ISC",   # flake8-implicit-str-concat
    "N",     # pep8-naming
    "PLC",   # pylint convention
    "PLE",   # pylint error
    "PLR",   # pylint refactor
    "PLW",   # pylint warning
    "Q",     # flake8-quotes
    "RUF",   # Ruff-specific rules
    "S",     # flake8-bandit
    "T",     # flake8-debugger (T10) and flake8-print (T20)
    "TID",   # flake8-tidy-imports
    "UP",    # pyupgrade
    "W",     # pycodestyle warnings
    "YTT",   # flake8-2020
]
ignore = [
    "ARG001", # Unused function argument
    "E501",   # Line too long
    "I001",   # Import block formatting
]

# File-specific Ruff configurations
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101"]  # Allow assert in tests

# MyPy (type checker) configuration
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
warn_unreachable = true

# Coverage.py configuration for test coverage
[tool.coverage.run]
source_pkgs = ["twat_search", "tests"]
branch = true
parallel = true
omit = [
    "src/twat_search/__version__.py",  # Generated by hatch-vcs (see version-file above)
]

# Coverage path mappings
[tool.coverage.paths]
twat_search = ["src/twat_search", "*/twat-search/src/twat_search"]
tests = ["tests", "*/twat-search/tests"]

# Coverage report configuration
[tool.coverage.report]
exclude_lines = [
    "no cov",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]

# Optional dependencies
[project.optional-dependencies]
test = [
    "pytest>=8.3.4",
    "pytest-cov>=6.0.0",
    "pytest-xdist>=3.6.1",                # For parallel test execution
    "pytest-benchmark[histogram]>=5.1.0", # For benchmarks with histogram output
    "pytest-asyncio>=0.25.3", # For async test execution
]

dev = [
    "pre-commit>=4.1.0",     # Git pre-commit hooks
    "ruff>=0.9.6",           # Fast Python linter
    "mypy>=1.15.0",          # Static type checker
]

# Search engine dependencies
brave = []  # Brave Search uses only core dependencies

duckduckgo = [
    "duckduckgo-search>=7.3.0",  # DuckDuckGo search API
]

bing_scraper = [
    "scrape-bing>=0.1.2.1",  # Bing scraper
]

tavily = [
    "tavily-python>=0.5.0",  # Tavily search API
]

pplx = []  # Perplexity uses only core dependencies

serpapi = [
    "serpapi>=0.1.5",  # SerpAPI search API
]

hasdata = []  # HasData API uses only core dependencies




# Complete installation with all dependencies
all = [
    "twat>=1.8.1",
    "duckduckgo-search>=7.3.0",
    "scrape-bing>=0.1.2.1",
    "tavily-python>=0.5.0",
    "serpapi>=0.1.5",
    # HasData uses core dependencies
]

# Test environment configuration
[tool.hatch.envs.test]
dependencies = [".[test]"]

# Test environment scripts
[tool.hatch.envs.test.scripts]
test = "python -m pytest -n auto {args:tests}"
test-cov = "python -m pytest -n auto --cov-report=term-missing --cov-config=pyproject.toml --cov=src/twat_search --cov=tests {args:tests}"
bench = "python -m pytest -v -p no:briefcase tests/test_benchmark.py --benchmark-only"
bench-save = "python -m pytest -v -p no:briefcase tests/test_benchmark.py --benchmark-only --benchmark-json=benchmark/results.json"

# Pytest configuration
[tool.pytest.ini_options]
markers = ["benchmark: marks tests as benchmarks (select with '-m benchmark')"]
addopts = "-v -p no:briefcase"
testpaths = ["tests"]
python_files = ["test_*.py"]
filterwarnings = ["ignore::DeprecationWarning", "ignore::UserWarning"]
asyncio_mode = "auto"

# Pytest-benchmark configuration
[tool.pytest-benchmark]
min_rounds = 100
min_time = 0.1
histogram = true
storage = "file"
save-data = true
compare = [
    "min",    # Minimum time
    "max",    # Maximum time
    "mean",   # Mean time
    "stddev", # Standard deviation
    "median", # Median time
    "iqr",    # Inter-quartile range
    "ops",    # Operations per second
    "rounds", # Number of rounds
]

# Console Scripts
# --------------
# Command line interfaces exposed by this package
[project.scripts]
twat-search = "twat_search.__main__:main"
twat-search-web = "twat_search.web.cli:main"

================
File: README.md
================
# twat-search

A powerful Python web search aggregator that combines results from multiple search engines.

## Features

- **Multi-Engine Search**: Unified interface for searching across multiple providers:
  - Brave Search
  - Google Search 
  - Tavily Research
  - Perplexity
  - You.com
  - Bing (via web scraping)
- **Async Support**: Concurrent searches across engines
- **Rate Limiting**: Built-in rate limiting per search engine
- **Type Safety**: Full type annotations and runtime validation
- **Error Handling**: Robust error handling and fallbacks
- **Configuration**: Flexible configuration via environment variables or code

## Installation

```bash
pip install twat-search
```

## Quick Start

```python
import asyncio

from twat_search.web import WebSearch


async def main() -> None:
    # Initialize with default configuration
    search = WebSearch()

    # Search across all configured engines
    results = await search.q("Python async programming")

    # Print results
    for result in results:
        print(f"{result.title} ({result.source})")
        print(f"URL: {result.url}")
        print(f"Snippet: {result.snippet}\n")


# `await` is only valid inside a coroutine, so drive it with asyncio.run()
asyncio.run(main())
```

## Using Specific Search Engines

### Bing Scraper

The Bing Scraper engine uses web scraping to retrieve search results from Bing without requiring an API key:

```python
import asyncio

from twat_search.web.engines.bing_scraper import bing_scraper


async def main() -> None:
    # Search using Bing Scraper
    results = await bing_scraper(
        "Python asyncio tutorial",
        num_results=10,  # Number of results to return
        max_retries=3,  # Max retry attempts for failed requests
        delay_between_requests=1.0,  # Delay between scraping requests
    )

    # Print results
    for result in results:
        print(f"{result.title}")
        print(f"URL: {result.url}")
        print(f"Snippet: {result.snippet}\n")


# Top-level `await` is not valid in a script; run the coroutine explicitly
asyncio.run(main())
```

## Configuration

Configure search engines via environment variables:

```bash
# API Keys
BRAVE_API_KEY=...
GOOGLE_API_KEY=...
TAVILY_API_KEY=...
PERPLEXITY_API_KEY=...
YOU_API_KEY=...

# Engine-specific settings
BRAVE_ENABLED=true
GOOGLE_ENABLED=true
TAVILY_ENABLED=true
PERPLEXITY_ENABLED=true
YOU_ENABLED=true
BING_SCRAPER_ENABLED=true

# Bing Scraper specific settings
BING_SCRAPER_DEFAULT_PARAMS='{"max_retries": 3, "delay_between_requests": 1.0}'
```

Or programmatically:

```python
from twat_search.web import WebSearch, Config

config = Config(
    brave_api_key="...",
    google_enabled=True,
    tavily_enabled=False,
    bing_scraper_enabled=True,
    bing_scraper_default_params={"max_retries": 5, "delay_between_requests": 2.0}
)

search = WebSearch(config)
```
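
As a rough illustration of how the environment variables above could be interpreted, the sketch below reads an `*_ENABLED` flag and the JSON-encoded `BING_SCRAPER_DEFAULT_PARAMS` using only the standard library. The helpers `env_flag` and `env_json` are hypothetical names introduced here; they are not part of twat-search, and the package's own loading logic may differ.

```python
import json
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a variable such as BRAVE_ENABLED as a boolean (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_json(name: str) -> dict:
    """Decode a JSON object stored in a variable; return {} when it is unset."""
    raw = os.environ.get(name)
    if not raw:
        return {}
    value = json.loads(raw)
    if not isinstance(value, dict):
        msg = f"{name} must contain a JSON object"
        raise ValueError(msg)
    return value


# Simulate the shell settings shown above
os.environ["BING_SCRAPER_ENABLED"] = "true"
os.environ["BING_SCRAPER_DEFAULT_PARAMS"] = (
    '{"max_retries": 3, "delay_between_requests": 1.0}'
)

enabled = env_flag("BING_SCRAPER_ENABLED")
params = env_json("BING_SCRAPER_DEFAULT_PARAMS")
print(enabled, params["max_retries"], params["delay_between_requests"])
```

Note that the JSON value must be quoted as a whole in the shell or `.env` file, as in the example above, so that the inner double quotes survive.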

## Response Format

Search results are returned as `SearchResult` objects:

```python
@dataclass
class SearchResult:
    title: str    # Result title
    url: HttpUrl  # Validated URL
    snippet: str  # Text snippet/description
    source: str   # Source search engine
```

## Error Handling

The package provides custom exception classes:

- `SearchError`: Base exception class
- `EngineError`: Engine-specific errors
- `ConfigError`: Configuration errors
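
The following sketch shows one way calling code might distinguish these exceptions. The classes are local stand-ins defined only for this example: the subclass relationship to `SearchError` and the `engine` attribute on `EngineError` are assumptions, not confirmed details of the package.

```python
from typing import Callable


class SearchError(Exception):
    """Stand-in for the package's base exception."""


class EngineError(SearchError):
    """Stand-in for engine-specific failures (the `engine` attribute is assumed)."""

    def __init__(self, engine: str, message: str) -> None:
        super().__init__(f"{engine}: {message}")
        self.engine = engine


class ConfigError(SearchError):
    """Stand-in for configuration problems."""


def classify(action: Callable[[], None]) -> str:
    """Run `action` and report which kind of search failure occurred, if any."""
    try:
        action()
    except EngineError as exc:  # most specific handlers first
        return f"engine failure ({exc.engine})"
    except ConfigError:
        return "bad configuration"
    except SearchError:  # base class catches anything else from the package
        return "other search error"
    return "ok"


def fail_engine() -> None:
    raise EngineError("brave", "HTTP 429")


def fail_config() -> None:
    raise ConfigError("BRAVE_API_KEY is not set")


print(classify(fail_engine))
print(classify(fail_config))
print(classify(lambda: None))
```

Catching the most specific class first keeps the base-class handler as a fallback for any error type added later.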

## Development Status

Version: 1.8.1

See [TODO.md](TODO.md) for planned improvements and feature roadmap.

## Contributing

Contributions welcome! Please check [TODO.md](TODO.md) for areas that need work.

## License

MIT License - See LICENSE file for details.

================
File: TODO.md
================
# twat-search Web Package - Future Tasks

The basic implementation of the `twat-search` web package is complete.

Tip: Periodically run `./cleanup.py status` to see results of lints and tests.

## 1. Phase 1

### 1.1. Complete Bing Scraper Implementation

- [ ] Fix implementation issues in bing_scraper.py
  - Add proper type annotations to all methods
  - Implement better error handling with appropriate context
- [ ] Complete test coverage for BingScraperSearchEngine
  - Fix skipped tests in test_bing_scraper.py
  - Add tests for error conditions and edge cases
- [ ] Document Bing Scraper functionality
  - Add comprehensive docstrings
  - Include usage examples in README


### 1.2. Documentation and Examples

- [ ] Add comprehensive docstrings to all classes and methods
  - Include parameter descriptions
  - Document exceptions that can be raised
  - Add usage examples
- [ ] Create detailed README examples
  - Basic usage examples for each engine
  - Advanced configuration examples
  - Error handling examples
- [ ] Document environment variable configuration
  - Create a comprehensive list of all supported environment variables
  - Add examples of .env file configuration


## 2. Phase 2

### 2.1. Type Checking Errors

- [ ] Fix missing type stubs for third-party modules
  - `duckduckgo_search` and `scrape_bing` are showing import-not-found errors
  - Create local stub files or install type stubs if available
- [ ] Add type annotations to functions missing them
  - Particularly in bing_scraper.py, need to add annotations to search methods
- [ ] Fix attribute errors in You.py engine
  - "YouBaseEngine" has no attribute errors for num_results_param and base_url
- [ ] Resolve incompatible types in engine assignments in `__init__.py`
- [ ] Fix the test_config_with_env_vars failure (api_key not being set correctly)

### 2.2. Linting Issues

- [ ] Address boolean parameter warnings (FBT001, FBT002)
  - Consider using keyword-only arguments for boolean parameters
  - Or create specific enum types for boolean options
- [ ] Fix functions with too many parameters (PLR0913)
  - Refactor using parameter objects or configuration objects
  - Consider breaking down complex functions
- [ ] Resolve magic values in code (PLR2004)
  - Replace hardcoded numbers like 100, 5, 10 with named constants
- [ ] Clean up unused imports (F401)
  - Remove or properly use imported modules
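
A minimal sketch of how the first three items could look in practice, assuming a hypothetical `SearchParams` parameter object and `build_request` function (neither exists in the codebase): options are grouped in a dataclass, the boolean flag is keyword-only, and literals are replaced by named constants.

```python
from dataclasses import dataclass

# Named constants instead of bare literals (addresses PLR2004)
DEFAULT_NUM_RESULTS = 5
MAX_NUM_RESULTS = 100


@dataclass(frozen=True)
class SearchParams:
    """Parameter object replacing a long argument list (addresses PLR0913)."""

    query: str
    num_results: int = DEFAULT_NUM_RESULTS
    language: str = "en"
    safe_search: bool = True


def build_request(params: SearchParams, *, verbose: bool = False) -> dict[str, object]:
    """Build a request payload; `verbose` is keyword-only (addresses FBT001/FBT002)."""
    count = min(params.num_results, MAX_NUM_RESULTS)  # clamp to the allowed maximum
    payload: dict[str, object] = {
        "q": params.query,
        "count": count,
        "lang": params.language,
        "safe": params.safe_search,
    }
    if verbose:
        payload["debug"] = True
    return payload


payload = build_request(SearchParams(query="python", num_results=500), verbose=True)
print(payload)
```

Because `verbose` follows the bare `*`, a call such as `build_request(params, True)` raises `TypeError`, so the meaning of the flag is always spelled out at the call site.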

### 2.3. Improve Test Framework

- [ ] Implement mock search engines for all providers
  - Create standardized mock responses
  - Enable offline testing without API keys
- [ ] Add integration tests
  - Test the entire search workflow
  - Test concurrent searches across multiple engines
- [ ] Create test fixtures for common configurations
  - Standard API response data
  - Common error cases
- [ ] Fix test_config_with_env_vars failure
  - Debug why environment variables aren't being properly loaded
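
One possible shape for an offline mock engine is sketched below. `FakeResult` and `MockSearchEngine` are hypothetical names used only for illustration; a real mock would presumably subclass the project's `SearchEngine` base class and return its `SearchResult` type.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Simplified stand-in for a search result record."""

    title: str
    url: str
    snippet: str
    source: str


class MockSearchEngine:
    """Hypothetical engine that returns canned results and records its calls."""

    name = "mock"

    def __init__(self, canned: list[FakeResult]) -> None:
        self._canned = canned
        self.queries: list[str] = []

    async def search(self, query: str) -> list[FakeResult]:
        self.queries.append(query)  # lets tests assert on what was requested
        return list(self._canned)  # copy so callers cannot mutate the fixture


canned = [FakeResult("Example", "https://example.com", "An example page", "mock")]
engine = MockSearchEngine(canned)
results = asyncio.run(engine.search("python"))
print(results[0].title, engine.queries)
```

Since no network access or API key is involved, a test built on such a mock can run in CI without any secrets configured.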

## 3. Phase 3

### 3.1. Enhance Engine Implementations

- [ ] Standardize error handling across all engines
  - Use consistent error context and messages
  - Properly propagate exceptions with 'from exc'
- [ ] Optimize parameter handling in engines
  - Reduce code duplication in parameter mapping
  - Create utility functions for common parameter conversions
- [ ] Add timeouts to all HTTP requests
  - Ensure all engines have consistent timeout handling
  - Add configurable timeout parameters
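
The sketch below combines two of the points above: a configurable timeout and exception chaining with `from exc`. It uses `asyncio.wait_for` around a placeholder coroutine instead of a real HTTP call, and `EngineError` here is a local stand-in rather than the package's class.

```python
import asyncio
from typing import Optional


class EngineError(Exception):
    """Stand-in for the package's engine-specific exception."""


async def slow_fetch(delay: float) -> str:
    """Placeholder for an HTTP request; sleeps instead of touching the network."""
    await asyncio.sleep(delay)
    return "payload"


async def fetch_with_timeout(delay: float, timeout: float) -> str:
    """Run the fetch with a time limit and wrap timeouts in EngineError."""
    try:
        return await asyncio.wait_for(slow_fetch(delay), timeout=timeout)
    except asyncio.TimeoutError as exc:
        msg = f"request exceeded {timeout}s"
        # `from exc` keeps the original exception available as __cause__
        raise EngineError(msg) from exc


async def main() -> tuple[str, Optional[EngineError]]:
    ok = await fetch_with_timeout(0.0, 1.0)
    try:
        await fetch_with_timeout(0.5, 0.01)
    except EngineError as err:
        return ok, err
    return ok, None


ok, err = asyncio.run(main())
print(ok, repr(err), repr(err.__cause__) if err else None)
```

With `httpx`, which the project already depends on, the same effect would normally be achieved through the client's own `timeout` setting rather than `asyncio.wait_for`.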

## 4. Phase 4

### 4.1. Additional Features

- [ ] Add result caching mechanism
  - Implement optional caching of search results
  - Add configurable cache expiration
- [ ] Implement rate limiting for all engines
  - Ensure all engines respect API rate limits
  - Add configurable backoff strategies
- [ ] Add result normalization
  - Create a more consistent result format across engines
  - Implement result scoring and ranking
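
As an illustration of the caching item, here is a minimal in-memory cache with configurable expiration. `TTLCache` is a hypothetical name, and the injectable `clock` parameter is a design choice of this sketch that makes expiry testable without sleeping; nothing here reflects an existing implementation in the package.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, list[str]]] = {}

    def get(self, key: str) -> Optional[list[str]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the stale entry on access
            return None
        return value

    def set(self, key: str, value: list[str]) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


# A fake clock stands in for real time in this demonstration
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("python", ["https://example.com"])
fresh = cache.get("python")
now[0] = 61.0  # advance past the expiry time
stale = cache.get("python")
print(fresh, stale)
```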

### 4.2. Performance Improvements

- [ ] Profile search performance across engines
  - Measure latency and throughput
  - Identify performance bottlenecks
- [ ] Implement connection pooling for HTTP clients
  - Reuse connections where possible
  - Configure appropriate connection limits
- [ ] Add parallelization options for multiple searches
  - Control concurrency limits
  - Implement proper resource cleanup
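
Concurrency limits of the kind mentioned above are commonly enforced with `asyncio.Semaphore`. The sketch below is one such approach under stated assumptions: `run_limited` is a hypothetical helper, and the `asyncio.sleep` call stands in for a real engine request.

```python
import asyncio


async def run_limited(queries: list[str], limit: int) -> tuple[list[str], int]:
    """Run one placeholder search per query with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def one(query: str) -> str:
        nonlocal active, peak
        async with semaphore:  # blocks while `limit` searches are running
            active += 1
            peak = max(peak, active)
            try:
                await asyncio.sleep(0.01)  # placeholder for a real search call
                return f"results for {query}"
            finally:
                active -= 1  # always release the slot count, even on error

    # gather() returns results in the same order as the input queries
    results = await asyncio.gather(*(one(q) for q in queries))
    return list(results), peak


results, peak = asyncio.run(run_limited(["a", "b", "c", "d", "e"], limit=2))
print(results, peak)
```

The `peak` counter exists only to make the limit observable in this example; production code would not need it.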

================
File: VERSION.txt
================
v1.8.1



================================================================
End of Codebase
================================================================
