This file is a merged representation of a subset of the codebase, containing files not matching ignore patterns, combined into a single document by Repomix. The content has been processed: empty lines have been removed.

================================================================
File Summary
================================================================

Purpose:
--------
This file contains a packed representation of a subset of the repository's contents.
It is designed to be easily consumable by AI systems for analysis, code review,
or other automated processes.

File Format:
------------
The content is organized as follows:
1. This summary section
2. Repository information
3. Directory structure
4. Multiple file entries, each consisting of:
  a. A separator line (================)
  b. The file path (File: path/to/file)
  c. Another separator line
  d. The full contents of the file
  e. A blank line
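
The layout above can be split back into individual files with a short script.
The following is a minimal sketch, not part of Repomix itself: the function name
`split_packed_files` is hypothetical, and it assumes that the file separator is
exactly 16 `=` characters and that no file body contains its own
separator / `File:` / separator triple.

```python
SEPARATOR = "=" * 16  # assumed width of the per-file separator line
FILE_PREFIX = "File: "


def split_packed_files(text: str) -> dict[str, str]:
    """Map each 'File: <path>' entry to the text that follows its header."""
    files: dict[str, str] = {}
    lines = text.splitlines()
    current_path: str | None = None
    buffer: list[str] = []
    i = 0
    while i < len(lines):
        # A file header is three consecutive lines: separator, path, separator.
        is_header = (
            lines[i] == SEPARATOR
            and i + 2 < len(lines)
            and lines[i + 1].startswith(FILE_PREFIX)
            and lines[i + 2] == SEPARATOR
        )
        if is_header:
            if current_path is not None:
                files[current_path] = "\n".join(buffer).strip("\n")
            current_path = lines[i + 1][len(FILE_PREFIX):].strip()
            buffer = []
            i += 3
            continue
        # Lines before the first header (summary, directory tree) are skipped.
        if current_path is not None:
            buffer.append(lines[i])
        i += 1
    if current_path is not None:
        files[current_path] = "\n".join(buffer).strip("\n")
    return files
```

Because the 64-character section banners do not equal the 16-character
separator, the summary and directory-structure sections are ignored and only
the file entries are returned.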

Usage Guidelines:
-----------------
- This file should be treated as read-only. Any changes should be made to the
  original repository files, not this packed version.
- When processing this file, use the file path to distinguish
  between different files in the repository.
- Be aware that this file may contain sensitive information. Handle it with
  the same level of security as you would the original repository.

Notes:
------
- Some files may have been excluded based on .gitignore rules and Repomix's configuration
- Binary files are not included in this packed representation. Please refer to the Directory Structure section for a complete list of file paths, including binary files
- Files matching these patterns are excluded: .specstory/**/*.md, .venv/**, _private/**, CLEANUP.txt, **/*.json, *.lock
- Files matching patterns in .gitignore are excluded
- Files matching default ignore patterns are excluded
- Empty lines have been removed from all files

================================================================
Directory Structure
================================================================
.cursor/
  rules/
    0project.mdc
    cleanup.mdc
    filetree.mdc
.github/
  workflows/
    push.yml
    release.yml
src/
  twat_search/
    web/
      engines/
        __init__.py
        anywebsearch.py
        base.py
        bing_scraper.py
        brave.py
        critique.py
        duckduckgo.py
        google_scraper.py
        hasdata.py
        pplx.py
        searchit.py
        serpapi.py
        tavily.py
        you.py
      __init__.py
      api.py
      cli.py
      config.py
      exceptions.py
      models.py
      utils.py
    __init__.py
    __main__.py
tests/
  unit/
    web/
      engines/
        __init__.py
        test_base.py
      __init__.py
      test_api.py
      test_config.py
      test_exceptions.py
      test_models.py
      test_utils.py
    __init__.py
    mock_engine.py
  web/
    test_bing_scraper.py
  conftest.py
  test_twat_search.py
.gitignore
.pre-commit-config.yaml
cleanup.py
LICENSE
PROGRESS.md
pyproject.toml
README.md
TODO.md
VERSION.txt

================================================================
Files
================================================================

================
File: .cursor/rules/0project.mdc
================
---
description: About this project
globs: 
alwaysApply: false
---
# About this project

`twat-search` is a multi-provider search 

## Development Notes
- Uses `uv` for Python package management
- Quality tools: ruff, mypy, pytest
- Clear provider protocol for adding new search backends
- Strong typing and runtime checks throughout

================
File: .cursor/rules/cleanup.mdc
================
---
description: Run `cleanup.py` script before and after changes
globs: 
alwaysApply: false
---
Before you make any changes or if I say "cleanup", run the `cleanup.py update` script in the main folder. Analyze the results, describe recent changes in [PROGRESS.md](mdc:PROGRESS.md) and edit @TODO.md to update priorities and plan next changes. PERFORM THE CHANGES, then run the `cleanup.py status` script and react to the results.

When you edit @TODO.md, start each line with an empty GFM checkbox (`- [ ] `) if the item isn't done, or a filled one (`- [x] `) if it is.

================
File: .cursor/rules/filetree.mdc
================
---
description: File tree of the project
globs: 
---
[ 928]  .
├── [  64]  .benchmarks
├── [  96]  .cursor
│   └── [ 192]  rules
│       ├── [ 334]  0project.mdc
│       ├── [ 558]  cleanup.mdc
│       └── [4.6K]  filetree.mdc
├── [  96]  .github
│   └── [ 128]  workflows
│       ├── [2.7K]  push.yml
│       └── [1.4K]  release.yml
├── [3.5K]  .gitignore
├── [ 532]  .pre-commit-config.yaml
├── [  96]  .specstory
│   └── [ 800]  history
│       ├── [2.0K]  .what-is-this.md
│       ├── [ 52K]  2025-02-25_01-58-creating-and-tracking-project-tasks.md
│       ├── [7.4K]  2025-02-25_02-17-project-task-continuation-and-progress-update.md
│       ├── [ 11K]  2025-02-25_02-24-planning-tests-for-twat-search-web-package.md
│       ├── [196K]  2025-02-25_02-27-implementing-tests-for-twat-search-package.md
│       ├── [ 46K]  2025-02-25_02-58-transforming-python-script-into-cli-tool.md
│       ├── [ 93K]  2025-02-25_03-09-generating-a-name-for-the-chat.md
│       ├── [5.5K]  2025-02-25_03-33-untitled.md
│       ├── [ 57K]  2025-02-25_03-54-integrating-search-engines-into-twat-search.md
│       ├── [ 72K]  2025-02-25_04-05-consolidating-you-py-and-youcom-py.md
│       ├── [6.1K]  2025-02-25_04-13-missing-env-api-key-names-in-pplx-py.md
│       ├── [118K]  2025-02-25_04-16-implementing-functions-for-brave-search-engines.md
│       ├── [286K]  2025-02-25_04-48-unifying-search-engine-parameters-in-twat-search.md
│       ├── [ 83K]  2025-02-25_05-36-implementing-duckduckgo-search-engine.md
│       ├── [194K]  2025-02-25_05-43-implementing-the-webscout-search-engine.md
│       ├── [ 23K]  2025-02-25_06-07-implementing-bing-scraper-engine.md
│       ├── [ 15K]  2025-02-25_06-12-continuing-bing-scraper-engine-implementation.md
│       ├── [121K]  2025-02-25_06-34-implementing-safe-import-patterns-in-modules.md
│       ├── [9.9K]  2025-02-25_07-09-refactoring-plan-and-progress-update.md
│       ├── [ 40K]  2025-02-25_07-17-implementing-phase-1-from-todo-md.md
│       ├── [292K]  2025-02-25_07-34-integrating-hasdata-google-serp-apis.md
│       ├── [142K]  2025-02-25_08-19-implementing-search-engines-from-nextengines-md.md
│       └── [ 34K]  2025-02-26_09-54-implementing-plain-option-for-search-commands.md
├── [ 499]  CLEANUP.txt
├── [1.0K]  LICENSE
├── [ 64K]  NEXTENGINES.md
├── [1.2K]  PROGRESS.md
├── [ 21K]  README.md
├── [4.1K]  TODO.md
├── [   7]  VERSION.txt
├── [ 12K]  cleanup.py
├── [ 192]  dist
├── [9.8K]  pyproject.toml
├── [ 128]  src
│   └── [ 256]  twat_search
│       ├── [ 556]  __init__.py
│       ├── [2.0K]  __main__.py
│       └── [ 384]  web
│           ├── [1.6K]  __init__.py
│           ├── [4.8K]  api.py
│           ├── [ 43K]  cli.py
│           ├── [4.3K]  config.py
│           ├── [ 576]  engines
│           │   ├── [4.9K]  __init__.py
│           │   ├── [ 24K]  anywebsearch.py
│           │   ├── [3.7K]  base.py
│           │   ├── [ 11K]  bing_scraper.py
│           │   ├── [7.6K]  brave.py
│           │   ├── [8.2K]  critique.py
│           │   ├── [6.7K]  duckduckgo.py
│           │   ├── [ 12K]  google_scraper.py
│           │   ├── [7.1K]  hasdata.py
│           │   ├── [4.9K]  pplx.py
│           │   ├── [ 25K]  searchit.py
│           │   ├── [6.9K]  serpapi.py
│           │   ├── [7.4K]  tavily.py
│           │   └── [7.3K]  you.py
│           ├── [1.0K]  exceptions.py
│           ├── [1.3K]  models.py
│           └── [1.5K]  utils.py
├── [ 256]  tests
│   ├── [  64]  .benchmarks
│   ├── [2.0K]  conftest.py
│   ├── [ 157]  test_twat_search.py
│   ├── [ 192]  unit
│   │   ├── [  42]  __init__.py
│   │   ├── [1.5K]  mock_engine.py
│   │   └── [ 320]  web
│   │       ├── [  46]  __init__.py
│   │       ├── [ 160]  engines
│   │       │   ├── [  37]  __init__.py
│   │       │   └── [4.3K]  test_base.py
│   │       ├── [5.1K]  test_api.py
│   │       ├── [2.7K]  test_config.py
│   │       ├── [2.0K]  test_exceptions.py
│   │       ├── [4.5K]  test_models.py
│   │       └── [3.5K]  test_utils.py
│   └── [ 160]  web
│       └── [ 10K]  test_bing_scraper.py
└── [ 89K]  twat_search.txt

19 directories, 76 files

================
File: .github/workflows/push.yml
================
name: Build & Test
on:
  push:
    branches: [main]
    tags-ignore: ["v*"]
  pull_request:
    branches: [main]
  workflow_dispatch:
permissions:
  contents: write
  id-token: write
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  quality:
    name: Code Quality
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Ruff lint
        uses: astral-sh/ruff-action@v3
        with:
          version: "latest"
          args: "check --output-format=github"
      - name: Run Ruff Format
        uses: astral-sh/ruff-action@v3
        with:
          version: "latest"
          args: "format --check --respect-gitignore"
  test:
    name: Run Tests
    needs: quality
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
        os: [ubuntu-latest]
      fail-fast: true
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install UV
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
          python-version: ${{ matrix.python-version }}
          enable-cache: true
          cache-suffix: ${{ matrix.os }}-${{ matrix.python-version }}
      - name: Install test dependencies
        run: |
          uv pip install --system --upgrade pip
          uv pip install --system ".[test]"
      - name: Run tests with Pytest
        run: uv run pytest -n auto --maxfail=1 --disable-warnings --cov-report=xml --cov-config=pyproject.toml --cov=src/twat_search --cov=tests tests/
      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.python-version }}-${{ matrix.os }}
          path: coverage.xml
  build:
    name: Build Distribution
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install UV
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
          python-version: "3.12"
          enable-cache: true
      - name: Install build tools
        run: uv pip install build hatchling hatch-vcs
      - name: Build distributions
        run: uv run python -m build --outdir dist
      - name: Upload distribution artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist-files
          path: dist/
          retention-days: 5

================
File: .github/workflows/release.yml
================
name: Release
on:
  push:
    tags: ["v*"]
permissions:
  contents: write
  id-token: write
jobs:
  release:
    name: Release to PyPI
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/twat-search
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install UV
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
          python-version: "3.12"
          enable-cache: true
      - name: Install build tools
        run: uv pip install build hatchling hatch-vcs
      - name: Build distributions
        run: uv run python -m build --outdir dist
      - name: Verify distribution files
        run: |
          ls -la dist/
          test -n "$(find dist -name '*.whl')" || (echo "Wheel file missing" && exit 1)
          test -n "$(find dist -name '*.tar.gz')" || (echo "Source distribution missing" && exit 1)
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_TOKEN }}
      - name: Create GitHub Release
        uses: softprops/action-gh-release@v1
        with:
          files: dist/*
          generate_release_notes: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

================
File: src/twat_search/web/engines/__init__.py
================
logger = logging.getLogger(__name__)
    __all__.extend(
    __all__.extend(["SerpApiSearchEngine", "serpapi"])
    __all__.extend(["TavilySearchEngine", "tavily"])
    __all__.extend(["PerplexitySearchEngine", "pplx"])
    __all__.extend(["YouNewsSearchEngine", "YouSearchEngine", "you", "you_news"])
    __all__.extend(["CritiqueSearchEngine", "critique"])
    __all__.extend(["DuckDuckGoSearchEngine", "duckduckgo"])
    __all__.extend(["BingScraperSearchEngine", "bing_scraper"])
    logger.debug("Imported google_scraper module")
    logger.warning(f"Failed to import google_scraper module: {e}")
    logger.debug("Imported searchit module")
    logger.warning(f"Failed to import searchit module: {e}")
    logger.debug("Imported anywebsearch module")
    logger.warning(f"Failed to import anywebsearch module: {e}")
def is_engine_available(engine_name: str) -> bool:
def get_engine_function(
    return available_engine_functions.get(engine_name)
def get_available_engines() -> list[str]:
    return list(available_engine_functions.keys())

================
File: src/twat_search/web/engines/anywebsearch.py
================
    class AnySearchResult:  # type: ignore
        def __init__(self, title: str, description: str, url: str):
    class Settings:  # type: ignore
        def __init__(
    def multi_search(query: str, settings: Settings) -> list[AnySearchResult]:  # type: ignore
logger = logging.getLogger(__name__)
class AnyWebSearchResult(BaseModel):
    @field_validator("title", "description")
    def validate_non_empty(cls, v: str) -> str:
class AnyWebSearchEngine(SearchEngine):
        super().__init__(config, **kwargs)
        self.max_results: int = num_results or self.config.default_params.get(
        self.language: str = language or self.config.default_params.get(
        self.brave_key: str | None = kwargs.get(
        ) or self.config.default_params.get("brave_key", None)
        self.ya_key: str | None = kwargs.get(
        ) or self.config.default_params.get("ya_key", None)
        self.ya_fldid: str | None = kwargs.get(
        ) or self.config.default_params.get("ya_fldid", None)
    def _convert_result(self, result: AnySearchResult) -> SearchResult | None:
            logger.warning(f"Empty result received from {self.name}")
            validated = AnyWebSearchResult(
                if hasattr(result, "description")
            return SearchResult(
                    "url": str(result.url),
            logger.warning(f"Validation error for result: {exc}")
            logger.warning(f"Unexpected error converting result: {exc}")
    def _create_settings(self) -> Settings:
        return Settings(
class GoogleAnyWebSearchEngine(AnyWebSearchEngine):
    async def search(self, query: str) -> list[SearchResult]:
            raise EngineError(self.name, "Search query cannot be empty")
        logger.info(f"Searching Google (anywebsearch) with query: '{query}'")
        logger.debug(f"Using max_results={self.max_results}, language={self.language}")
            settings = self._create_settings()
            raw_results = multi_search(query=query, settings=settings)
                logger.info("No results returned from Google (anywebsearch)")
            if isinstance(raw_results[0], AnySearchResult):
            logger.debug(
                f"Received {len(raw_results_list)} raw results from Google (anywebsearch)"
                    self._convert_result(result) for result in raw_results_list
            logger.info(
                f"Returning {len(results)} validated results from Google (anywebsearch)"
            logger.error(error_msg)
            raise EngineError(self.name, error_msg) from exc
class BingAnyWebSearchEngine(AnyWebSearchEngine):
        logger.info(f"Searching Bing (anywebsearch) with query: '{query}'")
                logger.info("No results returned from Bing (anywebsearch)")
                f"Received {len(raw_results_list)} raw results from Bing (anywebsearch)"
                f"Returning {len(results)} validated results from Bing (anywebsearch)"
class BraveAnyWebSearchEngine(AnyWebSearchEngine):
            raise EngineError(self.name, "Brave API key is required for Brave search")
        logger.info(f"Searching Brave (anywebsearch) with query: '{query}'")
                logger.info("No results returned from Brave (anywebsearch)")
                f"Received {len(raw_results_list)} raw results from Brave (anywebsearch)"
                f"Returning {len(results)} validated results from Brave (anywebsearch)"
class QwantAnyWebSearchEngine(AnyWebSearchEngine):
        logger.info(f"Searching Qwant (anywebsearch) with query: '{query}'")
                logger.info("No results returned from Qwant (anywebsearch)")
                f"Received {len(raw_results_list)} raw results from Qwant (anywebsearch)"
                f"Returning {len(results)} validated results from Qwant (anywebsearch)"
class YandexAnyWebSearchEngine(AnyWebSearchEngine):
            raise EngineError(
        logger.info(f"Searching Yandex (anywebsearch) with query: '{query}'")
                logger.info("No results returned from Yandex (anywebsearch)")
                f"Received {len(raw_results_list)} raw results from Yandex (anywebsearch)"
                f"Returning {len(results)} validated results from Yandex (anywebsearch)"
async def google_anyws(
    return await search(
async def bing_anyws(
async def brave_anyws(
async def qwant_anyws(
async def yandex_anyws(

================
File: src/twat_search/web/engines/base.py
================
class SearchEngine(abc.ABC):
    def __init__(self, config: EngineConfig, **kwargs: Any) -> None:
        self.num_results = kwargs.get("num_results", 5)
        self.country = kwargs.get("country", None)
        self.language = kwargs.get("language", None)
        self.safe_search = kwargs.get("safe_search", True)
        self.time_frame = kwargs.get("time_frame", None)
            raise SearchError(msg)
    async def search(self, query: str) -> list[SearchResult]:
def register_engine(engine_class: type[SearchEngine]) -> type[SearchEngine]:
    if not hasattr(engine_class, "env_api_key_names"):
        engine_class.env_api_key_names = [f"{engine_class.name.upper()}_API_KEY"]
    if not hasattr(engine_class, "env_enabled_names"):
        engine_class.env_enabled_names = [f"{engine_class.name.upper()}_ENABLED"]
    if not hasattr(engine_class, "env_params_names"):
        engine_class.env_params_names = [f"{engine_class.name.upper()}_DEFAULT_PARAMS"]
def get_engine(engine_name: str, config: EngineConfig, **kwargs: Any) -> SearchEngine:
    engine_class = _engine_registry.get(engine_name)
    return engine_class(config, **kwargs)
def get_registered_engines() -> dict[str, type[SearchEngine]]:
    return _engine_registry.copy()

================
File: src/twat_search/web/engines/bing_scraper.py
================
    class BingScraper:  # type: ignore
        def __init__(
        def search(self, query: str, num_results: int = 10) -> list[Any]:
logger = logging.getLogger(__name__)
class BingScraperResult(BaseModel):
class BingScraperSearchEngine(SearchEngine):
        super().__init__(config, **kwargs)
        self.max_results: int = num_results or self.config.default_params.get(
        self.max_retries: int = kwargs.get(
        ) or self.config.default_params.get("max_retries", 3)
        self.delay_between_requests: float = kwargs.get(
        ) or self.config.default_params.get("delay_between_requests", 1.0)
            unused_params.append(f"country='{country}'")
            unused_params.append(f"language='{language}'")
            unused_params.append(f"safe_search={safe_search}")
            unused_params.append(f"time_frame='{time_frame}'")
            logger.debug(
                f"Parameters {', '.join(unused_params)} set but not used by Bing Scraper"
    def _convert_result(self, result: Any) -> SearchResult | None:
            logger.warning("Empty result received from Bing Scraper")
        if not hasattr(result, "title") or not hasattr(result, "url"):
            logger.warning(f"Invalid result format: {result}")
            validated = BingScraperResult(
                if hasattr(result, "description")
            return SearchResult(
                    "url": str(result.url),
            logger.warning(f"Validation error for result: {exc}")
            logger.warning(f"Unexpected error converting result: {exc}")
    async def search(self, query: str) -> list[SearchResult]:
            raise EngineError(self.name, "Search query cannot be empty")
        logger.info(f"Searching Bing with query: '{query}'")
            scraper = BingScraper(
            raw_results = scraper.search(query, num_results=self.max_results)
                logger.info("No results returned from Bing Scraper")
            logger.debug(f"Received {len(raw_results)} raw results from Bing Scraper")
            logger.error(error_msg)
            raise EngineError(self.name, error_msg) from exc
                self._convert_result(result) for result in raw_results
        logger.info(f"Returning {len(results)} validated results from Bing Scraper")
async def bing_scraper(
    return await search(

================
File: src/twat_search/web/engines/brave.py
================
class BraveResult(BaseModel):
class BraveNewsResult(BaseModel):
class BaseBraveEngine(SearchEngine):
    def __init__(
        super().__init__(config)
        count = kwargs.get("count", num_results)
        self.count = count or self.config.default_params.get("count", 10)
            or kwargs.get("country")
            or self.config.default_params.get("country", None)
        search_lang = kwargs.get("search_lang", language)
        self.search_lang = search_lang or self.config.default_params.get(
        ui_lang = kwargs.get("ui_lang", language)
        self.ui_lang = ui_lang or self.config.default_params.get("ui_lang", None)
        safe = kwargs.get("safe_search", safe_search)
        if isinstance(safe, bool):
        self.safe_search = safe or self.config.default_params.get("safe_search", None)
        freshness = kwargs.get("freshness", time_frame)
        self.freshness = freshness or self.config.default_params.get("freshness", None)
            raise EngineError(
                f"Brave API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    async def search(self, query: str) -> list[SearchResult]:
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                data = response.json()
                section = data.get(self.response_key, {})
                if section.get("results"):
                            parsed = self.result_model(**result)
                            results.append(self.convert_result(parsed, result))
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
    def convert_result(self, parsed: BaseModel, raw: dict[str, Any]) -> SearchResult:
            publisher = getattr(parsed, "publisher", None)
            published_time = getattr(parsed, "published_time", None)
        return SearchResult(
class BraveSearchEngine(BaseBraveEngine):
class BraveNewsSearchEngine(BaseBraveEngine):
async def brave(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = BraveSearchEngine(
    return await engine.search(query)
async def brave_news(
    engine = BraveNewsSearchEngine(

================
File: src/twat_search/web/engines/critique.py
================
class CritiqueResult(BaseModel):
    url: str = Field(default="")  # URL of the result source
    title: str = Field(default="")  # Title of the result
    summary: str = Field(default="")  # Summary or snippet from the result
    source: str = Field(default="")  # Source of the result
class CritiqueResponse(BaseModel):
    results: list[CritiqueResult] = Field(default_factory=list)
class CritiqueSearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
        self.image_url = image_url or kwargs.get("image_url")
        self.image_base64 = image_base64 or kwargs.get("image_base64")
        self.source_whitelist = source_whitelist or kwargs.get("source_whitelist")
        self.source_blacklist = source_blacklist or kwargs.get("source_blacklist")
        self.output_format = output_format or kwargs.get("output_format")
            raise EngineError(
                f"Critique Labs API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    async def _convert_image_url_to_base64(self, image_url: str) -> str:
            async with httpx.AsyncClient() as client:
                response = await client.get(image_url, timeout=30)
                response.raise_for_status()
                encoded = base64.b64encode(response.content).decode("utf-8")
            raise EngineError(self.name, f"Failed to fetch image from URL: {e}")
            raise EngineError(self.name, f"Error processing image: {e}")
    async def _build_payload(self, query: str) -> dict[str, Any]:
            payload["image"] = await self._convert_image_url_to_base64(self.image_url)
    def _build_result(self, item: CritiqueResult, rank: int) -> SearchResult:
                HttpUrl(item.url) if item.url else HttpUrl("https://critique-labs.ai")
            url_obj = HttpUrl("https://critique-labs.ai")
        return SearchResult(
            raw=item.model_dump(),
    def _parse_results(self, data: dict[str, Any]) -> list[SearchResult]:
        critique_data = CritiqueResponse(
            results=data.get("results", []),
            response=data.get("response"),
            structured_output=data.get("structured_output"),
            results.append(
                SearchResult(
                    url=HttpUrl("https://critique-labs.ai"),
        for idx, item in enumerate(critique_data.results, 1):
                results.append(self._build_result(item, idx))
    async def search(self, query: str) -> list[SearchResult]:
        payload = await self._build_payload(query)
                response = await client.post(
                data = response.json()
                return self._parse_results(data)
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
                raise EngineError(self.name, f"Search failed: {exc}") from exc
async def critique(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = CritiqueSearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/duckduckgo.py
================
logger = logging.getLogger(__name__)
class DuckDuckGoResult(BaseModel):
class DuckDuckGoSearchEngine(SearchEngine):
    def __init__(
        super().__init__(config, **kwargs)
        ) = self._map_init_params(
            logger.debug(
    def _map_init_params(
        max_results = kwargs.get(
        ) or config.default_params.get("max_results", 10)
        region = kwargs.get("region", country) or config.default_params.get(
        lang = language or config.default_params.get("language", None)
        timelimit = kwargs.get("timelimit", time_frame) or config.default_params.get(
        if timelimit and not kwargs.get("timelimit"):
            timelimit = time_mapping.get(timelimit.lower(), timelimit)
        safesearch = kwargs.get("safesearch", safe_search)
        if isinstance(safesearch, str):
            safesearch = safesearch.lower() not in ["off", "false"]
        proxy = kwargs.get("proxy") or config.default_params.get("proxy", None)
        timeout = kwargs.get("timeout") or config.default_params.get("timeout", 10)
    def _convert_result(self, raw: dict[str, Any]) -> SearchResult | None:
            ddg_result = DuckDuckGoResult(
            return SearchResult(
            logger.warning(f"Validation error for result: {exc}")
    async def search(self, query: str) -> list[SearchResult]:
            ddgs = DDGS(proxy=self.proxy, timeout=self.timeout)
            raw_results = ddgs.text(**params)
                converted = self._convert_result(raw)
                    results.append(converted)
            raise EngineError(self.name, f"Search failed: {exc}") from exc
async def duckduckgo(
    return await search(

================
File: src/twat_search/web/engines/google_scraper.py
================
    class GoogleSearchResult:  # type: ignore
        def __init__(self, url: str, title: str, description: str):
    def google_search(*args, **kwargs):  # type: ignore
logger = logging.getLogger(__name__)
class GoogleScraperResult(BaseModel):
    @field_validator("title", "description")
    def validate_non_empty(cls, v: str) -> str:
class GoogleScraperEngine(SearchEngine):
    def __init__(
        super().__init__(config, **kwargs)
        self.max_results: int = num_results or self.config.default_params.get(
        self.language: str = language or self.config.default_params.get(
        self.region: str | None = country or self.config.default_params.get(
            else self.config.default_params.get("safe", "active")
        self.sleep_interval: float = kwargs.get(
        ) or self.config.default_params.get("sleep_interval", 0.0)
        self.ssl_verify: bool | None = kwargs.get(
        ) or self.config.default_params.get("ssl_verify", None)
        self.proxy: str | None = kwargs.get("proxy") or self.config.default_params.get(
        self.unique: bool = kwargs.get("unique") or self.config.default_params.get(
            unused_params.append(f"time_frame='{time_frame}'")
            logger.debug(
                f"Parameters {', '.join(unused_params)} set but not used by Google Scraper"
    def _convert_result(self, result: GoogleSearchResult) -> SearchResult | None:
            logger.warning("Empty result received from Google Scraper")
            validated = GoogleScraperResult(
                if hasattr(result, "description")
            return SearchResult(
                    "url": str(result.url),
            logger.warning(f"Validation error for result: {exc}")
            logger.warning(f"Unexpected error converting result: {exc}")
    async def search(self, query: str) -> list[SearchResult]:
            raise EngineError(self.name, "Search query cannot be empty")
        logger.info(f"Searching Google with query: '{query}'")
            raw_results = list(
                google_search(
                logger.info("No results returned from Google Scraper")
            logger.debug(f"Received {len(raw_results)} raw results from Google Scraper")
            logger.error(error_msg)
            raise EngineError(self.name, error_msg) from exc
                self._convert_result(cast(GoogleSearchResult, result))
        logger.info(f"Returning {len(results)} validated results from Google Scraper")
async def google_scraper(
    return await search(

================
File: src/twat_search/web/engines/hasdata.py
================
class HasDataGoogleResult(BaseModel):
    def from_api_result(cls, result: dict[str, Any]) -> "HasDataGoogleResult":
        return cls(
            title=result.get("title", ""),
            url=result.get("link", ""),
            snippet=result.get("snippet", ""),
class HasDataBaseEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            or kwargs.get("location")
            or self.config.default_params.get("location")
            or kwargs.get("device_type")
            or self.config.default_params.get("device_type", "desktop")
            raise EngineError(
                f"HasData API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    async def search(self, query: str) -> list[SearchResult]:
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                data = response.json()
                organic_results = data.get("organicResults", [])
                for i, result in enumerate(organic_results):
                        parsed = HasDataGoogleResult.from_api_result(result)
                        results.append(
                            SearchResult(
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
                raise EngineError(self.name, f"Invalid JSON response: {exc}") from exc
class HasDataGoogleEngine(HasDataBaseEngine):
class HasDataGoogleLightEngine(HasDataBaseEngine):
async def hasdata_google(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = HasDataGoogleEngine(
    return await engine.search(query)
async def hasdata_google_light(
    engine = HasDataGoogleLightEngine(

================
File: src/twat_search/web/engines/pplx.py
================
class PerplexityResult(BaseModel):
    answer: str = Field(default="")  # Perplexity may sometimes not include all details
    url: str = Field(default="https://perplexity.ai")  # Default URL if none provided
    title: str = Field(default="Perplexity AI Response")  # Default title
class PerplexitySearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            or kwargs.get("model")
            or self.config.default_params.get("model", "pplx-70b-online")
            raise EngineError(
                f"Perplexity API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}.",
    async def search(self, query: str) -> list[SearchResult]:
            async with httpx.AsyncClient() as client:
                response = await client.post(
                response.raise_for_status()
                data = response.json()
            raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
        for choice in data.get("choices", []):
            answer = choice.get("message", {}).get("content", "")
                pr = PerplexityResult(answer=answer, url=url, title=title)
                url_obj = HttpUrl(pr.url)  # Validate URL format
                results.append(
                    SearchResult(
async def pplx(
    config = EngineConfig(
    engine = PerplexitySearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/searchit.py
================
    class SearchitResult:  # type: ignore
        def __init__(self, rank: int, url: str, title: str, description: str):
    class ScrapeRequest:  # type: ignore
        def __init__(
    class GoogleScraper:  # type: ignore
        def __init__(self, max_results_per_page: int = 100):
        async def scrape(self, request: ScrapeRequest) -> list[SearchitResult]:
    class YandexScraper:  # type: ignore
        def __init__(self, max_results_per_page: int = 10):
    class QwantScraper:  # type: ignore
    class BingScraper:  # type: ignore
logger = logging.getLogger(__name__)
class SearchitScraperResult(BaseModel):
    @field_validator("title", "description")
    def validate_non_empty(cls, v: str) -> str:
class SearchitEngine(SearchEngine):
        super().__init__(config, **kwargs)
        self.max_results: int = num_results or self.config.default_params.get(
        self.language: str = language or self.config.default_params.get(
        self.domain: str | None = country or self.config.default_params.get(
        self.geo: str | None = country or self.config.default_params.get("geo", None)
        self.sleep_interval: int = kwargs.get(
        ) or self.config.default_params.get("sleep_interval", 0)
        self.proxy: str | None = kwargs.get("proxy") or self.config.default_params.get(
            unused_params.append(f"safe_search={safe_search}")
            unused_params.append(f"time_frame='{time_frame}'")
            logger.debug(
                f"Parameters {', '.join(unused_params)} set but not used by {self.name}"
    def _convert_result(self, result: SearchitResult) -> SearchResult | None:
            logger.warning(f"Empty result received from {self.name}")
            validated = SearchitScraperResult(
                if hasattr(result, "description")
            return SearchResult(
                    "url": str(result.url),
            logger.warning(f"Validation error for result: {exc}")
            logger.warning(f"Unexpected error converting result: {exc}")
    async def _run_scraper(
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(
                None, lambda: asyncio.run(scraper.scrape(request))
            logger.error(f"Error running searchit scraper: {exc}")
            raise EngineError(
class GoogleSearchitEngine(SearchitEngine):
    async def search(self, query: str) -> list[SearchResult]:
            raise EngineError(self.name, "Search query cannot be empty")
        logger.info(f"Searching Google (searchit) with query: '{query}'")
            request = ScrapeRequest(
            scraper = GoogleScraper(max_results_per_page=min(100, self.max_results))
            raw_results = await self._run_scraper(scraper, request)
                logger.info("No results returned from Google (searchit)")
                f"Received {len(raw_results)} raw results from Google (searchit)"
                    self._convert_result(result) for result in raw_results
            logger.info(
                f"Returning {len(results)} validated results from Google (searchit)"
            logger.error(error_msg)
            raise EngineError(self.name, error_msg) from exc
class YandexSearchitEngine(SearchitEngine):
        logger.info(f"Searching Yandex (searchit) with query: '{query}'")
            scraper = YandexScraper(max_results_per_page=min(10, self.max_results))
                logger.info("No results returned from Yandex (searchit)")
                f"Received {len(raw_results)} raw results from Yandex (searchit)"
                f"Returning {len(results)} validated results from Yandex (searchit)"
class QwantSearchitEngine(SearchitEngine):
        logger.info(f"Searching Qwant (searchit) with query: '{query}'")
            scraper = QwantScraper(max_results_per_page=min(10, self.max_results))
                logger.info("No results returned from Qwant (searchit)")
                f"Received {len(raw_results)} raw results from Qwant (searchit)"
                f"Returning {len(results)} validated results from Qwant (searchit)"
class BingSearchitEngine(SearchitEngine):
        logger.info(f"Searching Bing (searchit) with query: '{query}'")
            scraper = BingScraper(max_results_per_page=min(30, self.max_results))
                logger.info("No results returned from Bing (searchit)")
                f"Received {len(raw_results)} raw results from Bing (searchit)"
                f"Returning {len(results)} validated results from Bing (searchit)"
async def google_searchit(
    return await search(
async def yandex_searchit(
async def qwant_searchit(
async def bing_searchit(

================
File: src/twat_search/web/engines/serpapi.py
================
class SerpApiResult(BaseModel):
class SerpApiResponse(BaseModel):
class SerpApiSearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            "num": kwargs.get("num", num_results)
            or self.config.default_params.get("num", 10),
            "google_domain": kwargs.get("google_domain")
            or self.config.default_params.get("google_domain", "google.com"),
            "gl": kwargs.get("gl", country) or self.config.default_params.get("gl"),
            "hl": kwargs.get("hl", language) or self.config.default_params.get("hl"),
            "safe": self._convert_safe(kwargs.get("safe", safe_search))
            or self.config.default_params.get("safe"),
            "time_period": kwargs.get("time_period", time_frame)
            or self.config.default_params.get("time_period"),
            raise EngineError(
                f"SerpApi API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    def _convert_safe(self, safe: bool | str | None) -> str | None:
        if isinstance(safe, bool):
    async def search(self, query: str) -> list[SearchResult]:
        params.update({k: v for k, v in self._params.items() if v is not None})
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                data = response.json()
                serpapi_response = SerpApiResponse(**data)
                        results.append(
                            SearchResult(
                                raw=result.model_dump(),  # Include raw result for debugging
                raise EngineError(self.name, f"HTTP Request failed: {exc}") from exc
                raise EngineError(self.name, f"Response parsing error: {exc}") from exc
async def serpapi(
    config = EngineConfig(
    engine = SerpApiSearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/tavily.py
================
class TavilySearchResult(BaseModel):
class TavilySearchResponse(BaseModel):
class TavilySearchEngine(SearchEngine):
    def __init__(
        super().__init__(config)
        def get_default(value, key, fallback):
                else self.config.default_params.get(key, fallback)
        self.max_results = get_default(
            kwargs.get("max_results", num_results), "max_results", 5
        self.search_depth = get_default(search_depth, "search_depth", "basic")
        self.include_domains = get_default(include_domains, "include_domains", None)
        self.exclude_domains = get_default(exclude_domains, "exclude_domains", None)
        self.include_answer = get_default(include_answer, "include_answer", False)
        self.max_tokens = get_default(max_tokens, "max_tokens", None)
        self.search_type = get_default(search_type, "search_type", "search")
            raise EngineError(
                f"Tavily API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
    def _build_payload(self, query: str) -> dict:
    def _convert_result(self, item: dict, rank: int) -> SearchResult | None:
            validated_url = HttpUrl(item.get("url", ""))
            return SearchResult(
                title=item.get("title", ""),
                snippet=textwrap.shorten(
                    item.get("content", "").strip(), width=500, placeholder="..."
    async def search(self, query: str) -> list[SearchResult]:
        payload = self._build_payload(query)
        async with httpx.AsyncClient() as client:
                response = await client.post(
                response.raise_for_status()
                data = response.json()
                raise EngineError(self.name, f"HTTP error: {e}") from e
                raise EngineError(self.name, f"Request error: {e}") from e
                raise EngineError(self.name, f"Error: {e!s}") from e
            parsed_response = TavilySearchResponse.model_validate(data)
            items = [item.model_dump() for item in parsed_response.results]
            items = data.get("results", [])
        for idx, item in enumerate(items, start=1):
            converted = self._convert_result(item, idx)
                results.append(converted)
async def tavily(
    config = EngineConfig(
    engine = TavilySearchEngine(
    return await engine.search(query)

================
File: src/twat_search/web/engines/you.py
================
class YouSearchHit(BaseModel):
    snippet: str = Field(alias="description")
class YouSearchResponse(BaseModel):
    search_id: str | None = Field(None, alias="searchId")
class YouNewsArticle(BaseModel):
class YouNewsResponse(BaseModel):
class YouBaseEngine(SearchEngine):
    def __init__(
        super().__init__(config)
            raise EngineError(
                f"You.com API key is required. Set it via one of these env vars: {', '.join(self.env_api_key_names)}",
        self.num_results = num_results or self.config.default_params.get(
        self.country_code = country or self.config.default_params.get(
        self.safe_search = safe_search or self.config.default_params.get(
    async def _make_api_call(self, query: str) -> dict:
            params["safe_search"] = str(self.safe_search).lower()
        async with httpx.AsyncClient() as client:
                response = await client.get(
                response.raise_for_status()
                return response.json()
class YouSearchEngine(YouBaseEngine):
    async def search(self, query: str) -> list[SearchResult]:
        data = await self._make_api_call(query)
            you_response = YouSearchResponse(**data)
                    results.append(
                        SearchResult(
                            raw=hit.model_dump(by_alias=True),
class YouNewsSearchEngine(YouBaseEngine):
            you_response = YouNewsResponse(**data)
                            raw=article.model_dump(by_alias=True),
async def you(
    config = EngineConfig(api_key=api_key, enabled=True)
    engine = YouSearchEngine(
    return await engine.search(query)
async def you_news(
    engine = YouNewsSearchEngine(

================
File: src/twat_search/web/__init__.py
================
    __all__.extend(["Config", "EngineConfig", "SearchResult", "search"])
    __all__.extend(["brave", "brave_news"])
    __all__.extend(["pplx"])
    __all__.extend(["serpapi"])
    __all__.extend(["tavily"])
    __all__.extend(["you", "you_news"])
    __all__.extend(["critique"])
    __all__.extend(["duckduckgo"])
    __all__.extend(["bing_scraper"])

================
File: src/twat_search/web/api.py
================
logger = logging.getLogger(__name__)
def get_engine_params(
        k[len(engine_name) + 1 :]: v
        for k, v in kwargs.items()
        if k.startswith(engine_name + "_")
        if not any(k.startswith(e + "_") for e in engines)
def init_engine_task(
    engine_config = config.engines.get(engine_name)
        logger.warning(f"Engine '{engine_name}' not configured.")
        engine_params = get_engine_params(engine_name, engines, kwargs, common_params)
        engine_instance: SearchEngine = get_engine(
        logger.info(f"🔍 Querying engine: {engine_name}")
        return (engine_name, engine_instance.search(query))
        logger.warning(
        logger.error(f"Error initializing engine '{engine_name}': {e}")
async def search(
        config = config or Config()
        engines = engines or list(config.engines.keys())
            raise SearchError(msg)
            }.items()
            task = init_engine_task(
                engine_names.append(task[0])
                tasks.append(task[1])
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for engine_name, result in zip(engine_names, results, strict=False):
            if isinstance(result, Exception):
                logger.error(f"Search with engine '{engine_name}' failed: {result}")
            elif isinstance(result, list):
                logger.info(f"✅ Engine '{engine_name}' returned {len(result)} results")
                flattened_results.extend(result)
                logger.info(
                    f"⚠️ Engine '{engine_name}' returned no results or unexpected type: {type(result)}"
        logger.error(f"Search failed: {e}")

================
File: src/twat_search/web/cli.py
================
class CustomJSONEncoder(json_lib.JSONEncoder):
    def default(self, o: Any) -> Any:
            return json_lib.JSONEncoder.default(self, o)
            return str(o)
console = Console()
class SearchCLI:
    def __init__(self) -> None:
        self.logger = logging.getLogger("twat_search.cli")
        self.log_handler = RichHandler(rich_tracebacks=True)
        self._configure_logging()
        self.console = Console()
        available_engines = get_available_engines()
            self.logger.warning(
                f"{', '.join(missing_engines)}. "
    def _configure_logging(self, verbose: bool = False) -> None:
        logging.basicConfig(
        self.logger.setLevel(level)
        logging.getLogger("twat_search.web.api").setLevel(level)
        logging.getLogger("twat_search.web.engines").setLevel(level)
        logging.getLogger("httpx").setLevel(level)
    def _parse_engines(self, engines_arg: Any) -> list[str] | None:
        if isinstance(engines_arg, str):
            return [e.strip() for e in engines_arg.split(",") if e.strip()]
        if isinstance(engines_arg, list | tuple):
            return [str(e).strip() for e in engines_arg if str(e).strip()]
            f"Unexpected engines type: {type(engines_arg)}. Using all available engines."
    async def _run_search(
                if engine == "all" or is_engine_available(engine):
                    available.append(engine)
            self.logger.debug(f"Attempting to search with engines: {engines}")
            results = await search(query=query, engines=engines, **kwargs)
            return self._process_results(results)
            self.logger.error(f"Search failed: {e}")
            self._display_errors([str(e)])
    def _process_results(self, results: list) -> list[dict[str, Any]]:
            engine_name = getattr(result, "source", None) or "unknown"
            engine_results.setdefault(engine_name, []).append(result)
        for engine, engine_results_list in engine_results.items():
                processed.append(
            for idx, result in enumerate(engine_results_list):
                url = str(result.url)
                        if len(result.snippet) > 100
                        "raw_result": getattr(result, "raw", None),
    def _display_results(
            console.print("[bold red]No results found![/bold red]")
            urls = set()
                    urls.add(result["url"])
            for url in sorted(urls):
                console.print(url)
        table = Table()  # show_lines is left at its default so rows have no separator lines
        table.add_column("Engine", style="cyan", no_wrap=True)
            table.add_column("Status", style="magenta")
            table.add_column("Title", style="green")
            table.add_column("URL", style="blue", overflow="fold")
                table.add_row(
            table.add_column("URL", style="blue", overflow="fold", max_width=70)
                table.add_row(result["engine"], result["url"])
        console.print(table)
                    console.print(result)
    def _display_json_results(self, processed_results: list[dict[str, Any]]) -> None:
            results_by_engine[engine]["results"].append(
                    "snippet": result.get("snippet")
                    if result.get("snippet") != "N/A"
                    "raw": result.get("raw_result"),
    def _display_errors(self, error_messages: list[str]) -> None:
        table = Table(title="❌ Search Errors")
        table.add_column("Error", style="red")
            table.add_row(error)
    async def _search_engine(
            engine_func = get_engine_function(engine)
                self.logger.warning(error_msg)
                self._display_errors([error_msg])
        friendly = friendly_names.get(engine, engine)
            self.console.print(f"[bold]Searching {friendly}[/bold]: {query}")
            results = await engine_func(query=query, **params)
            processed_results = self._process_results(results)
                self._display_json_results(processed_results)
                self._display_results(processed_results, verbose, plain)
            self.logger.error(f"{friendly} search failed: {e}")
    def q(
        self._configure_logging(verbose)
        engine_list = self._parse_engines(engines)
        common_params = {k: v for k, v in common_params.items() if v is not None}
            results = asyncio.run(
                self._run_search(query, engine_list, **common_params, **kwargs)
            with self.console.status(
            self._display_json_results(results)
            self._display_results(results, verbose, plain)
    def info(
            config = Config()
                self._display_engines_json(engine, config)
                self._display_engines_plain(engine, config)
                self._list_all_engines(config)
                self._show_engine_details(engine, config)
                self.logger.error(f"❌ Failed to display engine information: {e}")
    def _display_engines_plain(self, engine: str | None, config: "Config") -> None:
                self.console.print(engine)
            for engine_name in sorted(config.engines.keys()):
                self.console.print(engine_name)
    def _list_all_engines(self, config: "Config") -> None:
        table = Table(title="🔎 Available Search Engines")
        table.add_column("Enabled", style="magenta")
        table.add_column("API Key Required", style="yellow")
            registered_engines = get_registered_engines()
        sorted_engines = sorted(config.engines.items(), key=lambda x: x[0])
                hasattr(engine_config, "api_key") and engine_config.api_key is not None
                engine_class = registered_engines.get(engine)
                if engine_class and hasattr(engine_class, "env_api_key_names"):
                    api_key_required = bool(engine_class.env_api_key_names)
        self.console.print(table)
        self.console.print(
    def _show_engine_details(self, engine_name: str, config: "Config") -> None:
            self.console.print("\nAvailable engines:")
                self.console.print(f"- {name}")
            engine_class = registered_engines.get(engine_name)
                and hasattr(engine_class, "env_api_key_names")
            self.console.print(f"\n[bold cyan]🔍 Engine: {engine_name}[/bold cyan]")
                self.console.print("\n[bold]API Key Environment Variables:[/bold]")
                    value_status = "✅" if os.environ.get(env_name) else "❌"
                    self.console.print(f"  {env_name}: {value_status}")
            self.console.print("\n[bold]Default Parameters:[/bold]")
                for param, value in engine_config.default_params.items():
                    self.console.print(f"  {param}: {value}")
                self.console.print("  No default parameters specified")
                base_engine = engine_name.split("-")[0]
                engine_module = importlib.import_module(module_name)
                function_name = engine_name.replace("-", "_")
                if hasattr(engine_module, function_name):
                    func = getattr(engine_module, function_name)
                    self.console.print("\n[bold]Function Interface:[/bold]")
                        f"  [green]{function_name}()[/green] - {func.__doc__.strip().splitlines()[0]}"
                    self.console.print("\n[bold]Example Usage:[/bold]")
            self.console.print("\n[bold]Basic Configuration:[/bold]")
            self.console.print(f"Enabled: {'✅' if engine_config.enabled else '❌'}")
            self.console.print(f"Default Parameters: {engine_config.default_params}")
    def _display_engines_json(self, engine: str | None, config: "Config") -> None:
            result[engine] = self._get_engine_info(
            for engine_name, engine_config in sorted(config.engines.items()):
                result[engine_name] = self._get_engine_info(
    def _get_engine_info(
        if hasattr(engine_config, "api_key") and engine_config.api_key is not None:
                    {"name": env_name, "set": bool(os.environ.get(env_name))}
            if hasattr(engine_config, "default_params")
            if hasattr(engine_config, "enabled")
    def _check_engine_availability(self, engine_name: str) -> bool:
        return is_engine_available(engine_name)
    async def critique(
                domain.strip() for domain in source_whitelist.split(",")
                domain.strip() for domain in source_blacklist.split(",")
        params.update(kwargs)
        return await self._search_engine(
    async def brave(
        params = {k: v for k, v in params.items() if v is not None}
        return await self._search_engine("brave", query, params, json, verbose, plain)
    async def brave_news(
    async def serpapi(
        return await self._search_engine("serpapi", query, params, json, verbose, plain)
    async def tavily(
                s.strip() for s in include_domains.split(",") if s.strip()
                s.strip() for s in exclude_domains.split(",") if s.strip()
        return await self._search_engine("tavily", query, params, json, verbose, plain)
    async def pplx(
        return await self._search_engine("pplx", query, params, json, verbose, plain)
    async def you(
        return await self._search_engine("you", query, params, json, verbose, plain)
    async def you_news(
    async def duckduckgo(
    async def hasdata_google(
    async def hasdata_google_light(
def main() -> None:
    fire.Fire(SearchCLI())
    main()

================
File: src/twat_search/web/config.py
================
    load_dotenv()  # Load variables from .env file into environment
class EngineConfig(BaseModel):
    default_params: dict[str, Any] = Field(default_factory=dict)
class Config:
    def __init__(self, **kwargs: Any) -> None:
        self.engines: dict[str, EngineConfig] = kwargs.get("engines", {})
            self._load_engine_configs()
    def _load_engine_configs(self) -> None:
            registered_engines = get_registered_engines()
        for engine_name, engine_class in registered_engines.items():
                api_key = os.environ.get(env_name)
                enabled = os.environ.get(env_name)
                    engine_settings[engine_name]["enabled"] = enabled.lower() in (
                params = os.environ.get(env_name)
                        engine_settings[engine_name]["default_params"] = json.loads(
        for engine_name, settings in engine_settings.items():
                for key, value in settings.items():
                    setattr(existing_config, key, value)
                self.engines[engine_name] = EngineConfig(**settings)

================
File: src/twat_search/web/exceptions.py
================
class SearchError(Exception):
    def __init__(self, message: str) -> None:
        super().__init__(message)
class EngineError(SearchError):
    def __init__(self, engine_name: str, message: str) -> None:
        super().__init__(f"Engine '{engine_name}': {message}")

================
File: src/twat_search/web/models.py
================
class SearchResult(BaseModel):
    @field_validator("title", "snippet", "source")
    def validate_non_empty(cls, v: str) -> str:
        if not v or not v.strip():
            raise ValueError(msg)
        return v.strip()

================
File: src/twat_search/web/utils.py
================
logger = logging.getLogger(__name__)
class RateLimiter:
    def __init__(self, calls_per_second: int = 10):
    def wait_if_needed(self) -> None:
        now = time.time()
        if len(self.call_timestamps) >= self.calls_per_second:
                    logger.debug(f"Rate limiting: sleeping for {sleep_time:.2f}s")
                time.sleep(sleep_time)
        self.call_timestamps.append(time.time())

================
File: src/twat_search/__init__.py
================
    __all__.append("__version__")
    __all__.append("web")

================
File: src/twat_search/__main__.py
================
logging.basicConfig(
    handlers=[RichHandler(rich_tracebacks=True)],
logger = logging.getLogger(__name__)
console = Console()
SearchCLIType = TypeVar("SearchCLIType")
class TwatSearchCLI:
    def __init__(self) -> None:
            self.web: Any = web_cli.SearchCLI()
            logger.error(f"Web CLI not available: {e!s}")
            logger.error("Make sure twat_search.web.cli is properly installed.")
    def _cli_error(self, *args: Any, **kwargs: Any) -> int:  # noqa: ARG002
        console.print(
    def version(self) -> str:
def main() -> None:
    fire.Fire(TwatSearchCLI(), name="twat-search")
    main()

================
File: tests/unit/web/engines/__init__.py
================


================
File: tests/unit/web/engines/test_base.py
================
class TestSearchEngine(SearchEngine):
    async def search(self, query: str) -> list[SearchResult]:
            SearchResult(
                url=HttpUrl("https://example.com/test"),
register_engine(TestSearchEngine)
class DisabledTestSearchEngine(SearchEngine):
        raise NotImplementedError(msg)
register_engine(DisabledTestSearchEngine)
def test_search_engine_is_abstract() -> None:
    assert hasattr(SearchEngine, "__abstractmethods__")
    with pytest.raises(TypeError):
        SearchEngine(EngineConfig())  # type: ignore
def test_search_engine_name_class_var() -> None:
    assert hasattr(SearchEngine, "name")
def test_engine_registration() -> None:
    class NewEngine(SearchEngine):
    returned_class = register_engine(NewEngine)
    engine_instance = get_engine("new_engine", EngineConfig())
    assert isinstance(engine_instance, NewEngine)
def test_get_engine_with_invalid_name() -> None:
    with pytest.raises(SearchError, match="Unknown search engine"):
        get_engine("nonexistent_engine", EngineConfig())
def test_get_engine_with_disabled_engine() -> None:
    config = EngineConfig(enabled=False)
    with pytest.raises(SearchError, match="is disabled"):
        get_engine("disabled_engine", config)
def test_get_engine_with_config() -> None:
    config = EngineConfig(
    engine = get_engine("test_engine", config)
def test_get_engine_with_kwargs() -> None:
    engine = get_engine("test_engine", EngineConfig(), **kwargs)

================
File: tests/unit/web/__init__.py
================


================
File: tests/unit/web/test_api.py
================
logging.basicConfig(level=logging.DEBUG)
T = TypeVar("T")
class MockSearchEngine(SearchEngine):
    def __init__(self, config: EngineConfig, **kwargs: Any) -> None:
        super().__init__(config, **kwargs)
        self.should_fail = kwargs.get("should_fail", False)
    async def search(self, query: str) -> list[SearchResult]:
            raise Exception(msg)
        result_count = self.kwargs.get("result_count", 1)
            SearchResult(
                url=HttpUrl(f"https://example.com/{i + 1}"),
            for i in range(result_count)
register_engine(MockSearchEngine)
def mock_config() -> Config:
    config = Config()
        "mock": EngineConfig(
async def setup_teardown() -> AsyncGenerator[None, None]:
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    with contextlib.suppress(asyncio.CancelledError):
        await asyncio.gather(*tasks)
async def test_search_with_mock_engine(
    results = await search("test query", engines=["mock"], config=mock_config)
    assert len(results) == 2
    assert all(isinstance(result, SearchResult) for result in results)
    assert all(result.source == "mock" for result in results)
async def test_search_with_additional_params(
    results = await search(
    assert len(results) == 3
async def test_search_with_engine_specific_params(
    assert len(results) == 4
async def test_search_with_no_engines(setup_teardown: None) -> None:
    with pytest.raises(SearchError, match="No search engines configured"):
        await search("test query", engines=[])
async def test_search_with_failing_engine(
    assert len(results) == 0
async def test_search_with_nonexistent_engine(
    with pytest.raises(SearchError, match="No search engines could be initialized"):
        await search("test query", engines=["nonexistent"], config=mock_config)
async def test_search_with_disabled_engine(
        await search("test query", engines=["mock"], config=mock_config)

================
File: tests/unit/web/test_config.py
================
def test_engine_config_defaults() -> None:
    config = EngineConfig()
def test_engine_config_values() -> None:
    config = EngineConfig(
def test_config_defaults(isolate_env_vars: None) -> None:
    config = Config()
    assert isinstance(config.engines, dict)
    assert len(config.engines) == 0
def test_config_with_env_vars(
def test_config_with_direct_initialization() -> None:
    custom_config = Config(
            "test_engine": EngineConfig(
def test_config_env_vars_override_direct_config(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setenv("BRAVE_API_KEY", "env_key")
            "brave": EngineConfig(

================
File: tests/unit/web/test_exceptions.py
================
def test_search_error() -> None:
    exception = SearchError(error_message)
    assert str(exception) == error_message
    assert isinstance(exception, Exception)
def test_engine_error() -> None:
    exception = EngineError(engine_name, error_message)
    assert str(exception) == f"Engine '{engine_name}': {error_message}"
    assert isinstance(exception, SearchError)
def test_engine_error_inheritance() -> None:
        raise EngineError(msg, "Test error")
        if isinstance(e, EngineError):
def test_search_error_as_base_class() -> None:
        raise SearchError(msg)
        exceptions.append(e)
        raise EngineError(msg, "API key missing")
    assert len(exceptions) == 2
    assert isinstance(exceptions[0], SearchError)
    assert isinstance(exceptions[1], EngineError)
    assert "General search error" in str(exceptions[0])
    assert "Engine 'brave': API key missing" in str(exceptions[1])

================
File: tests/unit/web/test_models.py
================
def test_search_result_valid_data() -> None:
    url = HttpUrl("https://example.com")
    result = SearchResult(
    assert str(result.url) == "https://example.com/"
def test_search_result_with_optional_fields() -> None:
def test_search_result_invalid_url() -> None:
    with pytest.raises(ValidationError):
        SearchResult.model_validate(
def test_search_result_empty_fields() -> None:
                "url": str(url),
def test_search_result_serialization() -> None:
    result_dict = result.model_dump()
    assert str(result_dict["url"]) == "https://example.com/"
    result_json = result.model_dump_json()
    assert isinstance(result_json, str)
def test_search_result_deserialization() -> None:
    result = SearchResult.model_validate(data)

================
File: tests/unit/web/test_utils.py
================
def rate_limiter() -> RateLimiter:
    return RateLimiter(calls_per_second=5)
def test_rate_limiter_init() -> None:
    limiter = RateLimiter(calls_per_second=10)
def test_rate_limiter_wait_when_not_needed(rate_limiter: RateLimiter) -> None:
    with patch("time.sleep") as mock_sleep:
        rate_limiter.wait_if_needed()
        mock_sleep.assert_not_called()
        for _ in range(3):  # 4 total calls including the one above
def test_rate_limiter_wait_when_needed(rate_limiter: RateLimiter) -> None:
    now = time.time()
        now - 0.01 * i for i in range(rate_limiter.calls_per_second)
    with patch("time.sleep") as mock_sleep, patch("time.time", return_value=now):
        mock_sleep.assert_called_once()
def test_rate_limiter_cleans_old_timestamps(rate_limiter: RateLimiter) -> None:
    with patch("time.time", return_value=now):
        len(rate_limiter.call_timestamps) == len(recent_stamps) + 1
@pytest.mark.parametrize("calls_per_second", [1, 5, 10, 100])
def test_rate_limiter_with_different_rates(calls_per_second: int) -> None:
    limiter = RateLimiter(calls_per_second=calls_per_second)
        for _ in range(calls_per_second):
            limiter.wait_if_needed()
        patch("time.sleep") as mock_sleep,
        patch("time.time", return_value=time.time()),

================
File: tests/unit/__init__.py
================


================
File: tests/unit/mock_engine.py
================
class MockSearchEngine(SearchEngine):
    def __init__(self, config: EngineConfig, **kwargs: Any) -> None:
        super().__init__(config, **kwargs)
        self.should_fail = kwargs.get("should_fail", False)
    async def search(self, query: str) -> list[SearchResult]:
            raise Exception(msg)
        result_count = self.kwargs.get("result_count", 1)
            SearchResult(
                url=HttpUrl(f"https://example.com/{i + 1}"),
            for i in range(result_count)
register_engine(MockSearchEngine)

================
File: tests/web/test_bing_scraper.py
================
class MockSearchResult:
    def __init__(self, title: str, url: str, description: str = "") -> None:
def engine_config() -> EngineConfig:
    return EngineConfig(enabled=True)
def engine(engine_config: EngineConfig) -> BingScraperSearchEngine:
    return BingScraperSearchEngine(config=engine_config, num_results=5)
def mock_results() -> list[MockSearchResult]:
        MockSearchResult(
class TestBingScraperEngine:
    @patch("twat_search.web.engines.bing_scraper.BingScraper")
    def test_init(self, mock_BingScraper: MagicMock, engine: Any) -> None:
        mock_BingScraper.assert_not_called()
    async def test_search_basic(
        mock_instance = MagicMock()
        results = await engine.search("test query")
        assert len(results) == 2
        assert isinstance(results[0], SearchResult)
        assert str(results[0].url) == "https://example.com/1"
        mock_BingScraper.assert_called_once_with(
        mock_instance.search.assert_called_once_with("test query", num_results=5)
    async def test_custom_parameters(self, mock_BingScraper: MagicMock) -> None:
        engine = BingScraperSearchEngine(
            config=EngineConfig(enabled=True),
        await engine.search("test query")
        mock_instance.search.assert_called_once_with("test query", num_results=10)
    async def test_invalid_url_handling(
        assert len(results) == 1
    @patch("twat_search.web.api.search")
    async def test_bing_scraper_convenience_function(
            SearchResult(
                url=HttpUrl("https://example.com"),
        results = await bing_scraper(
        mock_search.assert_called_once()
    async def test_empty_query(
        with pytest.raises(EngineError) as excinfo:
            await engine.search("")
        assert "Search query cannot be empty" in str(excinfo.value)
    async def test_no_results(
        assert isinstance(results, list)
        assert len(results) == 0
    async def test_network_error(
        mock_instance.search.side_effect = ConnectionError("Network timeout")
        assert "Network error connecting to Bing" in str(excinfo.value)
    async def test_parsing_error(
        mock_instance.search.side_effect = RuntimeError("Failed to parse HTML")
        assert "Error parsing Bing search results" in str(excinfo.value)
    async def test_invalid_result_format(
        class InvalidResult:
            def __init__(self):
        mock_instance.search.return_value = [InvalidResult()]

================
File: tests/conftest.py
================
@pytest.fixture(autouse=True)
def isolate_env_vars(monkeypatch: MonkeyPatch) -> None:
    for env_var in list(os.environ.keys()):
        if any(
            env_var.endswith(suffix)
            monkeypatch.delenv(env_var, raising=False)
    monkeypatch.setenv("_TEST_ENGINE", "true")
def env_vars_for_brave(monkeypatch: MonkeyPatch) -> None:
        sys.path.insert(0, str(Path(__file__).parent.parent))
        class MockBraveEngine(SearchEngine):
        register_engine(MockBraveEngine)
    monkeypatch.setenv("BRAVE_API_KEY", "test_brave_key")
    monkeypatch.setenv("BRAVE_ENABLED", "true")
    monkeypatch.setenv("BRAVE_DEFAULT_PARAMS", '{"count": 10}')
    monkeypatch.delenv("_TEST_ENGINE", raising=False)

================
File: tests/test_twat_search.py
================
def test_version():

================
File: .gitignore
================
*_autogen/
.DS_Store
__version__.py
__pycache__/
_Chutzpah*
_deps
_NCrunch_*
_pkginfo.txt
_Pvt_Extensions
_ReSharper*/
_TeamCity*
_UpgradeReport_Files/
!?*.[Cc]ache/
!.axoCover/settings.json
!.vscode/extensions.json
!.vscode/launch.json
!.vscode/settings.json
!.vscode/tasks.json
!**/[Pp]ackages/build/
!Directory.Build.rsp
.*crunch*.local.xml
.axoCover/*
.builds
.cr/personal
.fake/
.history/
.ionide/
.localhistory/
.mfractor/
.ntvs_analysis.dat
.paket/paket.exe
.sass-cache/
.vs/
.vscode
.vscode/*
.vshistory/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
[Bb]in/
[Bb]uild[Ll]og.*
[Dd]ebug/
[Dd]ebugPS/
[Dd]ebugPublic/
[Ee]xpress/
[Ll]og/
[Ll]ogs/
[Oo]bj/
[Rr]elease/
[Rr]eleasePS/
[Rr]eleases/
[Tt]est[Rr]esult*/
[Ww][Ii][Nn]32/
*_h.h
*_i.c
*_p.c
*_wpftmp.csproj
*- [Bb]ackup ([0-9]).rdl
*- [Bb]ackup ([0-9][0-9]).rdl
*- [Bb]ackup.rdl
*.[Cc]ache
*.[Pp]ublish.xml
*.[Rr]e[Ss]harper
*.a
*.app
*.appx
*.appxbundle
*.appxupload
*.aps
*.azurePubxml
*.bim_*.settings
*.bim.layout
*.binlog
*.btm.cs
*.btp.cs
*.build.csdef
*.cab
*.cachefile
*.code-workspace
*.coverage
*.coveragexml
*.d
*.dbmdl
*.dbproj.schemaview
*.dll
*.dotCover
*.DotSettings.user
*.dsp
*.dsw
*.dylib
*.e2e
*.exe
*.gch
*.GhostDoc.xml
*.gpState
*.ilk
*.iobj
*.ipdb
*.jfm
*.jmconfig
*.la
*.lai
*.ldf
*.lib
*.lo
*.log
*.mdf
*.meta
*.mm.*
*.mod
*.msi
*.msix
*.msm
*.msp
*.ncb
*.ndf
*.nuget.props
*.nuget.targets
*.nupkg
*.nvuser
*.o
*.obj
*.odx.cs
*.opendb
*.opensdf
*.opt
*.out
*.pch
*.pdb
*.pfx
*.pgc
*.pgd
*.pidb
*.plg
*.psess
*.publishproj
*.publishsettings
*.pubxml
*.pyc
*.rdl.data
*.rptproj.bak
*.rptproj.rsuser
*.rsp
*.rsuser
*.sap
*.sbr
*.scc
*.sdf
*.sln.docstates
*.sln.iml
*.slo
*.smod
*.snupkg
*.so
*.suo
*.svclog
*.tlb
*.tlh
*.tli
*.tlog
*.tmp
*.tmp_proj
*.tss
*.user
*.userosscache
*.userprefs
*.vbp
*.vbw
*.VC.db
*.VC.VC.opendb
*.VisualState.xml
*.vsp
*.vspscc
*.vspx
*.vssscc
*.xsd.cs
**/[Pp]ackages/*
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.HTMLClient/GeneratedArtifacts
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
*~
~$*
$tf/
AppPackages/
artifacts/
ASALocalRun/
AutoTest.Net/
Backup*/
BenchmarkDotNet.Artifacts/
bld/
BundleArtifacts/
ClientBin/
cmake_install.cmake
CMakeCache.txt
CMakeFiles
CMakeLists.txt.user
CMakeScripts
CMakeUserPresets.json
compile_commands.json
coverage*.info
coverage*.json
coverage*.xml
csx/
CTestTestfile.cmake
dlldata.c
DocProject/buildhelp/
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/*.HxC
DocProject/Help/*.HxT
DocProject/Help/html
DocProject/Help/Html2
ecf/
FakesAssemblies/
FodyWeavers.xsd
Generated_Code/
Generated\ Files/
healthchecksdb
install_manifest.txt
ipch/
Makefile
MigrationBackup/
mono_crash.*
nCrunchTemp_*
node_modules/
nunit-*.xml
OpenCover/
orleans.codegen.cs
Package.StoreAssociation.xml
paket-files/
project.fragment.lock.json
project.lock.json
publish/
PublishScripts/
rcf/
ScaffoldingReadMe.txt
ServiceFabricBackup/
StyleCopReport.xml
Testing
TestResult.xml
UpgradeLog*.htm
UpgradeLog*.XML
x64/
x86/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Distribution / packaging
!dist/.gitkeep

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
.ruff_cache/

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Project specific
__version__.py
_private

================
File: .pre-commit-config.yaml
================
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
        args: [--respect-gitignore]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
      - id: debug-statements
      - id: check-case-conflict
      - id: mixed-line-ending
        args: [--fix=lf]

================
File: cleanup.py
================
LOG_FILE = Path("CLEANUP.txt")
os.chdir(Path(__file__).parent)
def new() -> None:
    if LOG_FILE.exists():
        LOG_FILE.unlink()
def prefix() -> None:
    readme = Path(".cursor/rules/0project.mdc")
    if readme.exists():
        log_message("\n=== PROJECT STATEMENT ===")
        content = readme.read_text()
        log_message(content)
def suffix() -> None:
    todo = Path("TODO.md")
    if todo.exists():
        log_message("\n=== TODO.md ===")
        content = todo.read_text()
def log_message(message: str) -> None:
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with LOG_FILE.open("a") as f:
        f.write(log_line)
def run_command(cmd: list[str], check: bool = True) -> subprocess.CompletedProcess:
        result = subprocess.run(cmd, check=check, capture_output=True, text=True)
            log_message(result.stdout)
        log_message(f"Command failed: {' '.join(cmd)}")
        log_message(f"Error: {e.stderr}")
        return subprocess.CompletedProcess(cmd, 1, "", str(e))
def check_command_exists(cmd: str) -> bool:
        subprocess.run(["which", cmd], check=True, capture_output=True)
class Cleanup:
    def __init__(self) -> None:
        self.workspace = Path.cwd()
    def _print_header(self, message: str) -> None:
        log_message(f"\n=== {message} ===")
    def _check_required_files(self) -> bool:
            if not (self.workspace / file).exists():
                log_message(f"Error: {file} is missing")
    def _generate_tree(self) -> None:
        if not check_command_exists("tree"):
            log_message("Warning: 'tree' command not found. Skipping tree generation.")
            rules_dir = Path(".cursor/rules")
            rules_dir.mkdir(parents=True, exist_ok=True)
            tree_result = run_command(
            with open(rules_dir / "filetree.mdc", "w") as f:
                f.write("---\ndescription: File tree of the project\nglobs: \n---\n")
                f.write(tree_text)
            log_message("\nProject structure:")
            log_message(tree_text)
            log_message(f"Failed to generate tree: {e}")
    def _git_status(self) -> bool:
        result = run_command(["git", "status", "--porcelain"], check=False)
        return bool(result.stdout.strip())
    def _venv(self) -> None:
        log_message("Setting up virtual environment")
            run_command(["uv", "venv"])
            if venv_path.exists():
                os.environ["VIRTUAL_ENV"] = str(self.workspace / ".venv")
                log_message("Virtual environment created and activated")
                log_message("Virtual environment created but activation failed")
            log_message(f"Failed to create virtual environment: {e}")
    def _install(self) -> None:
        log_message("Installing package with all extras")
            self._venv()
            run_command(["uv", "pip", "install", "-e", ".[test,dev]"])
            log_message("Package installed successfully")
            log_message(f"Failed to install package: {e}")
    def _run_checks(self) -> None:
        log_message("Running code quality checks")
            log_message(">>> Running code fixes...")
            run_command(
            log_message(">>> Running type checks...")
            run_command(["python", "-m", "mypy", "src", "tests"], check=False)
            log_message(">>> Running tests...")
            run_command(["python", "-m", "pytest", "tests"], check=False)
            log_message("All checks completed")
            log_message(f"Failed during checks: {e}")
    def status(self) -> None:
        prefix()  # Add project statement (.cursor/rules/0project.mdc) at start
        self._print_header("Current Status")
        self._check_required_files()
        self._generate_tree()
        result = run_command(["git", "status"], check=False)
        self._print_header("Environment Status")
        self._install()
        self._run_checks()
        suffix()  # Add TODO.md content at end
    def venv(self) -> None:
        self._print_header("Virtual Environment Setup")
    def install(self) -> None:
        self._print_header("Package Installation")
    def update(self) -> None:
        self.status()
        if self._git_status():
            log_message("Changes detected in repository")
                run_command(["git", "add", "."])
                run_command(["git", "commit", "-m", commit_msg])
                log_message("Changes committed successfully")
                log_message(f"Failed to commit changes: {e}")
            log_message("No changes to commit")
    def push(self) -> None:
        self._print_header("Pushing Changes")
            run_command(["git", "push"])
            log_message("Changes pushed successfully")
            log_message(f"Failed to push changes: {e}")
def repomix(
            cmd.append("--compress")
            cmd.append("--remove-empty-lines")
            cmd.append("-i")
            cmd.append(ignore_patterns)
        cmd.extend(["-o", output_file])
        run_command(cmd)
        log_message(f"Repository content mixed into {output_file}")
        log_message(f"Failed to mix repository: {e}")
def print_usage() -> None:
    log_message("Usage:")
    log_message("  cleanup.py status   # Show current status and run all checks")
    log_message("  cleanup.py venv     # Create virtual environment")
    log_message("  cleanup.py install  # Install package with all extras")
    log_message("  cleanup.py update   # Update and commit changes")
    log_message("  cleanup.py push     # Push changes to remote")
def main() -> NoReturn:
    new()  # Clear log file
    if len(sys.argv) < 2:
        print_usage()
        sys.exit(1)
    cleanup = Cleanup()
            cleanup.status()
            cleanup.venv()
            cleanup.install()
            cleanup.update()
            cleanup.push()
        log_message(f"Error: {e}")
    repomix()
    main()

================
File: LICENSE
================
MIT License

Copyright (c) 2025 Adam Twardoch

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================
File: PROGRESS.md
================
---
this_file: PROGRESS.md
---

## Completed
- [x] Defined common parameters in base `SearchEngine` class
- [x] Updated search engines to use unified parameters
- [x] Added support for multiple search engines (Brave, Tavily, Perplexity, You.com, SerpAPI, Critique, DuckDuckGo)
- [x] Implemented initial version of Bing Scraper engine
- [x] Created basic testing infrastructure

## In Progress
- [ ] Fixing type checking and linting issues identified in cleanup report
- [ ] Completing Bing Scraper implementation and tests
- [ ] Addressing failing test in config environment variable loading

## Upcoming
- [ ] Enhancing test framework with mocks and fixtures
- [ ] Standardizing error handling across all engines
- [ ] Improving documentation and adding comprehensive examples
- [ ] Implementing performance optimizations
- [ ] Adding advanced features (caching, rate limiting, result normalization)

## Known Issues
- Environment variable loading not working correctly in tests
- Missing type annotations in several modules
- Excessive parameter counts in engine initialization methods
- Skipped tests for asynchronous components

See [TODO.md](TODO.md) for the detailed task breakdown and implementation plans.

================
File: pyproject.toml
================
# this_file: twat_search/pyproject.toml

# Build System Configuration
# -------------------------
# Specifies the build system and its requirements for packaging the project
# - hatchling: Modern, extensible build backend for Python projects
# - hatch-vcs: Automatically determines package version from version control system
[build-system]
requires = [
    "hatchling>=1.27.0",     # Core build backend for Hatch, providing modern packaging capabilities
    "hatch-vcs>=0.4.0",      # Plugin to dynamically generate version from Git tags/commits
]
build-backend = "hatchling.build"  # Use Hatchling as the build backend for consistent and flexible builds

# Wheel Distribution Configuration
# --------------------------------
# Controls how the package is built and distributed as a wheel
# Ensures only specific packages are included in the distribution
[tool.hatch.build.targets.wheel]
packages = ["src/twat_search"]  # Only include the src/twat_search directory in the wheel

# Project Metadata Configuration
# ------------------------------
# Comprehensive project description, requirements, and compatibility information
[project]
name = "twat-search"  # Unique package name for PyPI and installation
dynamic = ["version"]  # Version is dynamically determined from version control system
description = "Advanced search utilities and tools for the twat ecosystem"  # Short, descriptive package summary
readme = "README.md"  # Path to the project's README file for package description
requires-python = ">=3.10"  # Minimum Python version required, leveraging modern Python features
license = "MIT"  # Open-source license type
keywords = ["twat", "search", "utilities", "text-search", "indexing"]  # Keywords for package discovery
classifiers = [  # Metadata for package indexes and compatibility
    "Development Status :: 4 - Beta",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: Implementation :: CPython",
    "Programming Language :: Python :: Implementation :: PyPy",
]

# Runtime Dependencies
# -------------------
# External packages required for the project to function
dependencies = [
    "twat>=1.8.1", # Core twat package, providing essential functionality
    "pydantic>=2.10.6", # Data validation and settings management
    "pydantic-settings>=2.8.0", # Settings management for Pydantic v2
    "httpx>=0.28.1", # HTTP client for API requests
    "python-dotenv>=1.0.1", # Environment variable management
    "fire>=0.5.0", # Command line interface generator
    "rich>=13.6.0", # Rich text and formatting for terminal output
]

# Project Authors
# ---------------
[[project.authors]]
name = "Adam Twardoch"  # Primary author's name
email = "adam+github@twardoch.com"  # Contact email for the author

# Project URLs
# ------------
# Links to project resources for documentation, issues, and source code
[project.urls]
Documentation = "https://github.com/twardoch/twat-search#readme"
Issues = "https://github.com/twardoch/twat-search/issues"
Source = "https://github.com/twardoch/twat-search"

# Twat Plugin Registration
# -----------------------
# Registers this package as a plugin for the twat ecosystem
[project.entry-points."twat.plugins"]
search = "twat_search"  # Plugin name and module for search utilities

# Version Management
# -----------------
# Configures automatic version generation from version control system
[tool.hatch.version]
source = "vcs"  # Use version control system (Git) to determine version

# Version Scheme
# --------------
# Defines how versions are generated and incremented
[tool.hatch.version.raw-options]
version_scheme = "post-release"  # Generates version numbers based on Git tags

# Version File Generation
# ----------------------
# Automatically creates a version file in the package
[tool.hatch.build.hooks.vcs]
version-file = "src/twat_search/__version__.py"

# Default development environment configuration
[tool.hatch.envs.default]
dependencies = [
    "pytest",                # Testing framework
    "pytest-cov",           # Coverage reporting
    "mypy>=1.15.0",         # Static type checker
    "ruff>=0.9.6",          # Fast Python linter
]

# Scripts available in the default environment
[tool.hatch.envs.default.scripts]
test = "pytest {args:tests}"
test-cov = "pytest --cov-report=term-missing --cov-config=pyproject.toml --cov=src/twat_search --cov=tests {args:tests}"
type-check = "mypy src/twat_search tests"
lint = ["ruff check src/twat_search tests", "ruff format src/twat_search tests"]

# Python version matrix for testing
[[tool.hatch.envs.all.matrix]]
python = ["3.10", "3.11", "3.12"]

# Linting environment configuration
[tool.hatch.envs.lint]
detached = true  # Run in isolated environment
dependencies = [
    "mypy>=1.15.0",         # Static type checker
    "ruff>=0.9.6",          # Fast Python linter
]

# Linting environment scripts
[tool.hatch.envs.lint.scripts]
typing = "mypy --install-types --non-interactive {args:src/twat_search tests}"
style = ["ruff check {args:.}", "ruff format {args:.}"]
fmt = ["ruff format {args:.}", "ruff check --fix {args:.}"]
all = ["style", "typing"]

# Ruff (linter) configuration
[tool.ruff]
target-version = "py310"
line-length = 88

# Ruff lint rules configuration
[tool.ruff.lint]
extend-select = [
    "A",     # flake8-builtins
    "ARG",   # flake8-unused-arguments
    "B",     # flake8-bugbear
    "C",     # flake8-comprehensions
    "DTZ",   # flake8-datetimez
    "E",     # pycodestyle errors
    "EM",    # flake8-errmsg
    "F",     # pyflakes
    "FBT",   # flake8-boolean-trap
    "I",     # isort
    "ICN",   # flake8-import-conventions
    "ISC",   # flake8-implicit-str-concat
    "N",     # pep8-naming
    "PLC",   # pylint convention
    "PLE",   # pylint error
    "PLR",   # pylint refactor
    "PLW",   # pylint warning
    "Q",     # flake8-quotes
    "RUF",   # Ruff-specific rules
    "S",     # flake8-bandit
    "T",     # flake8-debugger
    "TID",   # flake8-tidy-imports
    "UP",    # pyupgrade
    "W",     # pycodestyle warnings
    "YTT",   # flake8-2020
]
ignore = [
    "ARG001", # Unused function argument
    "E501",   # Line too long
    "I001",   # Import block formatting
]

# File-specific Ruff configurations
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101"]  # Allow assert in tests

# MyPy (type checker) configuration
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
warn_unreachable = true

# Coverage.py configuration for test coverage
[tool.coverage.run]
source_pkgs = ["twat_search", "tests"]
branch = true
parallel = true
omit = [
    "src/twat_search/__about__.py",
]

# Coverage path mappings
[tool.coverage.paths]
twat_search = ["src/twat_search", "*/twat-search/src/twat_search"]
tests = ["tests", "*/twat-search/tests"]

# Coverage report configuration
[tool.coverage.report]
exclude_lines = [
    "no cov",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]

# Optional dependencies
[project.optional-dependencies]
test = [
    "pytest>=8.3.4",
    "pytest-cov>=6.0.0",
    "pytest-xdist>=3.6.1",                # For parallel test execution
    "pytest-benchmark[histogram]>=5.1.0", # For benchmarks with histogram output
    "pytest-asyncio>=0.25.3", # For async test execution
]

dev = [
    "pre-commit>=4.1.0",     # Git pre-commit hooks
    "ruff>=0.9.6",           # Fast Python linter
    "mypy>=1.15.0",          # Static type checker
]

# Search engine dependencies
brave = []  # Brave Search uses only core dependencies

duckduckgo = [
    "duckduckgo-search>=7.3.0",  # DuckDuckGo search API
]

bing_scraper = [
    "scrape-bing>=0.1.2.1",  # Bing scraper
]

tavily = [
    "tavily-python>=0.5.0",  # Tavily search API
]

pplx = [
]

serpapi = [
    "serpapi>=0.1.5",  # SerpAPI search API
]

hasdata = []  # HasData API uses only core dependencies

# New search backends
google_scraper = [
    "googlesearch-python>=1.3.0",  # Google scraper
]

searchit = [
    "searchit",  # Multiple search engines with async scraping
]


# All search engines
all = ["twat",
    "duckduckgo-search>=7.3.0",
    "scrape-bing>=0.1.2.1",
    "tavily-python>=0.5.0",
    "serpapi>=0.1.5",
    "googlesearch-python>=1.3.0",
    "searchit",
    "requests>=2.31.0",
]

# Test environment configuration
[tool.hatch.envs.test]
dependencies = [".[test]"]

# Test environment scripts
[tool.hatch.envs.test.scripts]
test = "python -m pytest -n auto {args:tests}"
test-cov = "python -m pytest -n auto --cov-report=term-missing --cov-config=pyproject.toml --cov=src/twat_search --cov=tests {args:tests}"
bench = "python -m pytest -v -p no:briefcase tests/test_benchmark.py --benchmark-only"
bench-save = "python -m pytest -v -p no:briefcase tests/test_benchmark.py --benchmark-only --benchmark-json=benchmark/results.json"

# Pytest configuration
[tool.pytest.ini_options]
markers = ["benchmark: marks tests as benchmarks (select with '-m benchmark')"]
addopts = "-v -p no:briefcase"
testpaths = ["tests"]
python_files = ["test_*.py"]
filterwarnings = ["ignore::DeprecationWarning", "ignore::UserWarning"]
asyncio_mode = "auto"

# Pytest-benchmark configuration
[tool.pytest-benchmark]
min_rounds = 100
min_time = 0.1
histogram = true
storage = "file"
save-data = true
compare = [
    "min",    # Minimum time
    "max",    # Maximum time
    "mean",   # Mean time
    "stddev", # Standard deviation
    "median", # Median time
    "iqr",    # Inter-quartile range
    "ops",    # Operations per second
    "rounds", # Number of rounds
]

# Console Scripts
# --------------
# Command line interfaces exposed by this package
[project.scripts]
twat-search = "twat_search.__main__:main"
twat-search-web = "twat_search.web.cli:main"

================
File: README.md
================
# Twat Search: multi-engine web search aggregator

## Executive summary

Twat Search is an asynchronous Python package that provides a unified interface for querying multiple search engines simultaneously. It aggregates and normalizes results from various search providers through a consistent API. This document covers both CLI and Python usage of the package.

## Key features

- **Multi-Engine Search**: A single query can simultaneously search across multiple providers including Brave, Google (via SerpAPI/HasData), Tavily, Perplexity, You.com, Bing (via web scraping), and more
- **Asynchronous Operation**: Leverages `asyncio` for concurrent searches, maximizing speed and efficiency
- **Rate Limiting**: Built-in mechanisms to prevent exceeding API limits of individual search providers
- **Strong Typing**: Full type annotations and Pydantic validation for improved code reliability and maintainability
- **Robust Error Handling**: Custom exception classes for graceful error management
- **Flexible Configuration**: Configure search engines via environment variables, `.env` files, or directly in code
- **Extensible Architecture**: Designed for easy addition of new search engines
- **Command-Line Interface**: Rich, interactive CLI for searching and exploring engine configurations
- **JSON Output**: Supports JSON output for easy integration with other tools

## Installation options

### Full installation

```bash
uv pip install --system twat-search[all]
```

### Selective installation

Install only specific engine dependencies:

```bash
# Example: install only brave and duckduckgo dependencies
pip install "twat-search[brave,duckduckgo]"

# Example: install duckduckgo and bing scraper
pip install "twat-search[duckduckgo,bing_scraper]"
```

After installation, both the `twat-search` and `twat-search-web` commands should be available in your PATH. Alternatively, you can run:

```bash
python -m twat_search.__main__
python -m twat_search.web.cli
```

## Quick start guide

### Python API

```python
import asyncio
from twat_search.web import search

async def main():
    # Search across all configured engines
    results = await search("quantum computing applications")

    # Print results
    for result in results:
        print(f"[{result.source}] {result.title}")
        print(f"URL: {result.url}")
        print(f"Snippet: {result.snippet}\n")

# Run the async function
asyncio.run(main())
```

### Command line interface

```bash
# Search using all available engines
twat-search q "climate change solutions"

# Search with specific engines
twat-search q "machine learning frameworks" --engines brave,tavily

# Get JSON output
twat-search q "renewable energy" --json

# Use engine-specific command
twat-search brave "web development trends" --count 10
```

## Core architecture

### Module structure

```
twat_search/
└── web/
    ├── engines/            # Individual search engine implementations
    │   ├── __init__.py     # Engine registration and availability checks
    │   ├── base.py         # Base SearchEngine class definition
    │   ├── brave.py        # Brave search implementation
    │   ├── bing_scraper.py # Bing scraper implementation
    │   └── ...             # Other engine implementations
    ├── __init__.py         # Module exports
    ├── api.py              # Main search API
    ├── cli.py              # Command-line interface
    ├── config.py           # Configuration handling
    ├── exceptions.py       # Custom exceptions
    ├── models.py           # Data models
    └── utils.py            # Utility functions
```

## Supported search engines

Twat Search provides a consistent interface to the following search engines:

| Engine | Module | API Key Required | Description | Package Extra |
| --- | --- | --- | --- | --- |
| Brave | `brave` | Yes | Web search via Brave Search API | `brave` |
| Brave News | `brave_news` | Yes | News search via Brave API | `brave` |
| You.com | `you` | Yes | Web search via You.com API | - |
| You.com News | `you_news` | Yes | News search via You.com API | - |
| Tavily | `tavily` | Yes | Research-focused search API | `tavily` |
| Perplexity | `pplx` | Yes | AI-powered search with detailed answers | `pplx` |
| SerpAPI | `serpapi` | Yes | Google search results via SerpAPI | `serpapi` |
| HasData Google | `hasdata-google` | Yes | Google search results via HasData API | `hasdata` |
| HasData Google Light | `hasdata-google-light` | Yes | Light version of HasData API | `hasdata` |
| Critique | `critique` | Yes | Visual and textual search capabilities | - |
| DuckDuckGo | `duckduckgo` | No | Privacy-focused search results | `duckduckgo` |
| Bing Scraper | `bing_scraper` | No | Web scraping of Bing search results | `bing_scraper` |

## Detailed usage guide

### Python API

#### The `search()` function

The core function for performing searches is `twat_search.web.search()`:

```python
from twat_search.web import search, Config

# Basic usage
results = await search("python async programming")

# Advanced usage with specific engines and parameters
results = await search(
    query="python async programming",
    engines=["brave", "tavily", "bing_scraper"],
    num_results=5,
    language="en",
    country="US",
    safe_search=True
)
```

Parameters:

- **`query`**: The search query string (required)
- **`engines`**: A list of engine names to use (e.g., `["brave", "tavily"]`). If `None` or empty, all configured engines will be used
- **`config`**: A `Config` object. If `None`, configuration is loaded from environment variables
- **`**kwargs`**: Additional parameters passed to engines. These can be:
  - General parameters applied to all engines (e.g., `num_results=10`)
  - Engine-specific parameters with prefixes (e.g., `brave_count=20`, `tavily_search_depth="advanced"`)
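
As a rough illustration of the prefix convention described above, the sketch below separates general parameters from engine-prefixed ones. The helper `split_engine_params` is hypothetical and is not part of the package; the library's actual routing logic may differ.

```python
def split_engine_params(engines, **kwargs):
    """Split kwargs into shared params and per-engine params by prefix."""
    # Match longer engine names first so "brave_news_count" is not read as
    # the "brave" engine's "news_count" parameter.
    ordered = sorted(engines, key=len, reverse=True)
    common = {}
    specific = {name: {} for name in engines}
    for key, value in kwargs.items():
        for name in ordered:
            prefix = f"{name}_"
            if key.startswith(prefix):
                specific[name][key[len(prefix):]] = value
                break
        else:
            # No engine prefix matched: treat as a general parameter.
            common[key] = value
    return common, specific


common, specific = split_engine_params(
    ["brave", "tavily"],
    num_results=10,
    brave_count=20,
    tavily_search_depth="advanced",
)
print(common)    # {'num_results': 10}
print(specific)  # {'brave': {'count': 20}, 'tavily': {'search_depth': 'advanced'}}
```

Checking longer engine names first matters once one engine name is a prefix of another, as with `brave` and `brave_news`.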

#### Engine-specific functions

Each engine provides a direct function for individual access:

```python
from twat_search.web.engines.brave import brave
from twat_search.web.engines.bing_scraper import bing_scraper

# Using Brave search
brave_results = await brave(
    query="machine learning tutorials",
    count=10,
    country="US",
    safe_search=True
)

# Using Bing scraper (no API key required)
bing_results = await bing_scraper(
    query="data science projects",
    num_results=10,
    max_retries=3,
    delay_between_requests=1.0
)
```

#### Working with search results

The `SearchResult` model provides a consistent structure across all engines:

```python
from twat_search.web.models import SearchResult
from pydantic import HttpUrl

# Creating a search result
result = SearchResult(
    title="Example Search Result",
    url=HttpUrl("https://example.com"),
    snippet="This is an example search result snippet...",
    source="brave",
    raw={"original_data": "from_engine"}  # Optional raw data
)

# Accessing properties
print(result.title)    # "Example Search Result"
print(result.url)      # "https://example.com/"
print(result.source)   # "brave"
print(result.snippet)  # "This is an example search result snippet..."
```

### Command line interface

The CLI provides access to all search engines through the `twat-search` command.

#### General search command

```bash
twat-search q <query> [options]
```

Common options:

- `--engines <engine1,engine2,...>`: Specify engines to use
- `--num_results <n>`: Number of results to return
- `--country <country_code>`: Country to search in (e.g., "US", "GB")
- `--language <lang_code>`: Language to search in (e.g., "en", "es")
- `--safe_search <true|false>`: Enable or disable safe search
- `--json`: Output results in JSON format
- `--verbose`: Enable verbose logging

Engine-specific parameters can be passed with `--<engine>_<param> <value>`, for example:

```bash
twat-search q "machine learning" --brave_count 15 --tavily_search_depth advanced
```

#### Engine information command

```bash
twat-search info [engine_name] [--json]
```

- Shows information about available search engines
- If `engine_name` is provided, shows detailed information about that engine
- The `--json` flag outputs in JSON format

#### Engine-specific commands

Each engine has a dedicated command for direct access:

```bash
# Brave search
twat-search brave "web development trends" --count 10

# DuckDuckGo search
twat-search duckduckgo "privacy tools" --max_results 5

# Bing scraper
twat-search bing_scraper "python tutorials" --num_results 10

# Critique with image
twat-search critique --image-url "https://example.com/image.jpg" "Is this image real?"
```

## Configuration management

### Environment variables

Configure engines using environment variables:

```bash
# API keys
BRAVE_API_KEY=your_brave_api_key
TAVILY_API_KEY=your_tavily_api_key
PERPLEXITY_API_KEY=your_perplexity_api_key
YOU_API_KEY=your_you_api_key
SERPAPI_API_KEY=your_serpapi_api_key
CRITIQUE_API_KEY=your_critique_api_key
HASDATA_API_KEY=your_hasdata_api_key

# Engine enablement
BRAVE_ENABLED=true
TAVILY_ENABLED=true
PERPLEXITY_ENABLED=true
YOU_ENABLED=true
SERPAPI_ENABLED=true
CRITIQUE_ENABLED=true
DUCKDUCKGO_ENABLED=true
BING_SCRAPER_ENABLED=true
HASDATA_GOOGLE_ENABLED=true

# Default parameters (JSON format)
BRAVE_DEFAULT_PARAMS={"count": 10, "safesearch": "off"}
TAVILY_DEFAULT_PARAMS={"max_results": 5, "search_depth": "basic"}
PERPLEXITY_DEFAULT_PARAMS={"model": "pplx-7b-online"}
YOU_DEFAULT_PARAMS={"safe_search": true, "count": 8}
SERPAPI_DEFAULT_PARAMS={"num": 10, "gl": "us"}
HASDATA_GOOGLE_DEFAULT_PARAMS={"location": "Austin,Texas,United States", "device_type": "desktop"}
DUCKDUCKGO_DEFAULT_PARAMS={"max_results": 10, "safesearch": "moderate", "time": "d"}
BING_SCRAPER_DEFAULT_PARAMS={"max_retries": 3, "delay_between_requests": 1.0}

# Global default for all engines
NUM_RESULTS=5
```

You can store these in a `.env` file in your project directory, which will be automatically loaded by the library using `python-dotenv`.
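
The `*_DEFAULT_PARAMS` values are JSON objects stored as strings. The sketch below shows one way such a value could be decoded into a dictionary; `load_default_params` is a hypothetical helper written for illustration, and the package's own loader may handle errors differently.

```python
import json
import os


def load_default_params(engine, environ=None):
    """Read <ENGINE>_DEFAULT_PARAMS from the environment as a dict."""
    environ = os.environ if environ is None else environ
    raw = environ.get(f"{engine.upper()}_DEFAULT_PARAMS", "")
    if not raw:
        return {}
    try:
        params = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed JSON: fall back to no defaults rather than crash.
        return {}
    # Only a JSON object makes sense as a parameter mapping.
    return params if isinstance(params, dict) else {}


env = {"BRAVE_DEFAULT_PARAMS": '{"count": 10, "safesearch": "off"}'}
print(load_default_params("brave", env))   # {'count': 10, 'safesearch': 'off'}
print(load_default_params("tavily", env))  # {}
```

When exporting such a variable from a shell rather than a `.env` file, wrap the JSON in single quotes so the braces and double quotes reach the process unchanged.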

### Programmatic configuration

Configure engines programmatically when using the Python API:

```python
from twat_search.web import Config, EngineConfig, search

# Create custom configuration
config = Config(
    engines={
        "brave": EngineConfig(
            api_key="your_brave_api_key",
            enabled=True,
            default_params={"count": 10, "country": "US"}
        ),
        "bing_scraper": EngineConfig(
            enabled=True,
            default_params={"max_retries": 3, "delay_between_requests": 1.0}
        ),
        "tavily": EngineConfig(
            api_key="your_tavily_api_key",
            enabled=True,
            default_params={"search_depth": "advanced"}
        )
    }
)

# Use the configuration
results = await search("quantum computing", config=config)
```

## Engine-specific parameters

Each search engine accepts different parameters. Here's a reference for commonly used ones:

### Brave search

```python
await brave(
    query="search term",
    count=10,              # Number of results (default: 10)
    country="US",          # Country code (ISO 3166-1 alpha-2)
    search_lang="en",      # Search language
    ui_lang="en",          # UI language
    safe_search=True,      # Safe search (True/False)
    freshness="day"        # Time frame (day, week, month)
)
```

### Bing scraper

```python
await bing_scraper(
    query="search term",
    num_results=10,                # Number of results
    max_retries=3,                 # Maximum retry attempts
    delay_between_requests=1.0     # Delay between requests (seconds)
)
```

### Tavily

```python
await tavily(
    query="search term",
    max_results=5,               # Number of results (default: 5)
    search_depth="basic",        # Search depth (basic, advanced)
    include_domains=["example.com"],  # Domains to include
    exclude_domains=["spam.com"],     # Domains to exclude
    include_answer=True,         # Include AI-generated answer
    search_type="search"         # Search type (search, news, etc.)
)
```

### Perplexity (pplx)

```python
await pplx(
    query="search term",
    model="pplx-70b-online"      # Model to use for search
)
```

### You.com

```python
await you(
    query="search term",
    num_results=10,              # Number of results
    country_code="US",           # Country code
    safe_search=True             # Safe search (True/False)
)
```

### DuckDuckGo

```python
await duckduckgo(
    query="search term",
    max_results=10,              # Number of results
    region="us-en",              # Region code
    safesearch=True,             # Safe search (True/False)
    timelimit="m",               # Time limit (d=day, w=week, m=month)
    timeout=10                   # Request timeout (seconds)
)
```

### Critique (with image)

```python
await critique(
    query="Is this image real?",
    image_url="https://example.com/image.jpg",  # URL to image
    # Alternatively, pass image_base64="..." (Base64-encoded image data)
    # instead of image_url
    source_whitelist=["trusted-site.com"],      # Optional domain whitelist
    source_blacklist=["untrusted-site.com"],    # Optional domain blacklist
    output_format="text"                        # Output format
)
```

## Error handling framework

Twat Search provides custom exception classes for proper error handling:

```python
from twat_search.web import search
from twat_search.web.exceptions import SearchError, EngineError

try:
    results = await search("quantum computing")
except EngineError as e:
    print(f"Engine-specific error: {e}")
    # e.g., "Engine 'brave': API key is required"
except SearchError as e:
    print(f"General search error: {e}")
    # e.g., "No search engines configured"
```

The exception hierarchy:

- `SearchError`: Base class for all search-related errors
- `EngineError`: Subclass for engine-specific errors, includes the engine name in the message

Typical error scenarios:

- Missing API keys
- Network errors
- Rate limiting
- Invalid responses
- Configuration errors
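
To make the hierarchy concrete, here is a minimal stand-alone sketch of two exception classes shaped like the ones described above. It is an assumption about their structure, written only to show why `except EngineError` has to come before `except SearchError`; the real classes in `twat_search.web.exceptions` may take different constructor arguments.

```python
class SearchError(Exception):
    """Base class for search-related errors."""


class EngineError(SearchError):
    """Error raised by one engine; prefixes the message with its name."""

    def __init__(self, engine_name, message):
        self.engine_name = engine_name
        super().__init__(f"Engine '{engine_name}': {message}")


try:
    raise EngineError("brave", "API key is required")
except SearchError as exc:
    # The base-class handler also catches the subclass.
    print(exc)  # Engine 'brave': API key is required
```

Because the base-class handler matches subclasses too, listing `SearchError` first would swallow every `EngineError` before its own handler runs.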

## Advanced usage techniques

### Concurrent searches

Search across multiple engines concurrently:

```python
import asyncio
from twat_search.web.engines.brave import brave
from twat_search.web.engines.tavily import tavily

async def search_multiple(query):
    brave_task = brave(query)
    tavily_task = tavily(query)

    results = await asyncio.gather(brave_task, tavily_task, return_exceptions=True)

    brave_results, tavily_results = [], []
    if isinstance(results[0], list):
        brave_results = results[0]
    if isinstance(results[1], list):
        tavily_results = results[1]

    return brave_results + tavily_results

# Usage
results = await search_multiple("artificial intelligence")
```

### Custom engine parameters

Specify engine-specific parameters in the unified search function:

```python
from twat_search.web import search

results = await search(
    "machine learning",
    engines=["brave", "tavily", "bing_scraper"],
    # Common parameters
    num_results=10,
    country="US",

    # Engine-specific parameters
    brave_count=15,
    brave_freshness="week",
    tavily_search_depth="advanced",
    bing_scraper_max_retries=5
)
```

### Rate limiting

Use the built-in rate limiter to avoid hitting API limits:

```python
from twat_search.web import search
from twat_search.web.utils import RateLimiter

# Create a rate limiter with 5 calls per second
limiter = RateLimiter(calls_per_second=5)

# Use in an async context
async def rate_limited_search():
    for query in ["python", "javascript", "rust", "golang"]:
        limiter.wait_if_needed()  # Wait if necessary
        results = await search(query)
        # Process results...
```
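
For readers curious how a limiter with this interface might work, the following is a hypothetical sliding-window sketch. The class name `SimpleRateLimiter` and its injectable `clock` and `sleep` arguments are assumptions made for illustration, not the implementation in `twat_search.web.utils`.

```python
import time


class SimpleRateLimiter:
    """Hypothetical limiter: at most `calls_per_second` calls per 1 s window."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.calls_per_second = calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._timestamps = []

    def wait_if_needed(self):
        now = self._clock()
        # Keep only calls made within the last second.
        self._timestamps = [t for t in self._timestamps if now - t < 1.0]
        if len(self._timestamps) >= self.calls_per_second:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(1.0 - (now - self._timestamps[0]))
            now = self._clock()
            self._timestamps = [t for t in self._timestamps if now - t < 1.0]
        self._timestamps.append(now)


limiter = SimpleRateLimiter(calls_per_second=5)
for _ in range(5):
    limiter.wait_if_needed()  # the first five calls return without sleeping
```

Note that a blocking `time.sleep` stalls the event loop when called from a coroutine, so an async variant built on `asyncio.sleep` would be the more natural fit inside `async def` code.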

## Development guide

### Running tests

```bash
# Install test dependencies
pip install "twat-search[test]"

# Run tests
pytest

# Run with coverage
pytest --cov=src/twat_search

# Run tests in parallel
pytest -n auto
```

### Adding a new search engine

To add a new search engine:

1. Create a new file in `src/twat_search/web/engines/`
2. Implement a class that inherits from `SearchEngine`
3. Implement the required methods and register the engine

Example:

```python
from pydantic import HttpUrl
from twat_search.web.engines.base import SearchEngine, register_engine
from twat_search.web.models import SearchResult
from twat_search.web.config import EngineConfig

@register_engine
class MyNewSearchEngine(SearchEngine):
    name = "my_new_engine"
    env_api_key_names = ["MY_NEW_ENGINE_API_KEY"]

    def __init__(self, config: EngineConfig, **kwargs) -> None:
        super().__init__(config, **kwargs)
        # Initialize engine-specific parameters

    async def search(self, query: str) -> list[SearchResult]:
        # Implement search logic
        return [
            SearchResult(
                title="My Result",
                url=HttpUrl("https://example.com"),
                snippet="Result snippet",
                source=self.name
            )
        ]

# Convenience function
async def my_new_engine(query: str, **kwargs):
    # Implement convenience function
    ...
```

### Development setup

To contribute to `twat-search`, follow these steps:

1. Clone the repository:

```bash
   git clone https://github.com/twardoch/twat-search.git
   cd twat-search
```

2. Set up the virtual environment with `uv`:

```bash
   uv venv
   source .venv/bin/activate
```

3. Install development dependencies:

```bash
   uv pip install -e ".[test,dev]"
```

4. Run tests:

```bash
   uv run pytest
```

5. Run type checking:

```bash
   uv run mypy src tests
```

6. Run linting:

```bash
   uv run ruff check src tests
```

7. Use `cleanup.py` for project maintenance:

```bash
   python cleanup.py status
```

## Troubleshooting guide

### API key issues

If you're encountering API key errors:

1. Verify the API key is set correctly in environment variables
2. Check the API key format is valid for the specific provider
3. Ensure the API key has the necessary permissions
4. For engines that require API keys, verify the key is set via one of these methods:
   - Environment variable (e.g., `BRAVE_API_KEY`)
   - `.env` file
   - Programmatic configuration

### Rate limiting problems

If you're being rate limited by search providers:

1. Reduce the number of concurrent requests
2. Use the `RateLimiter` utility to space out requests
3. Consider upgrading your API plan with the provider
4. Add delay between requests for engines that support it (e.g., `delay_between_requests` for Bing Scraper)
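
One way to reduce the number of concurrent requests is to gate calls behind an `asyncio.Semaphore`. The sketch below is a generic pattern under stated assumptions: `run_limited` is a hypothetical helper and `fake_search` stands in for a real engine call, so neither name comes from the package.

```python
import asyncio


async def run_limited(coros, max_concurrent=2):
    """Await coroutines with at most `max_concurrent` running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(coro):
        async with semaphore:
            return await coro

    # return_exceptions=True returns failures as values instead of raising
    # the first one, so the successful results are still available.
    return await asyncio.gather(*(guarded(c) for c in coros), return_exceptions=True)


async def fake_search(query):
    # Stand-in for a real engine call such as `search(query)`.
    await asyncio.sleep(0.01)
    return [f"result for {query}"]


results = asyncio.run(run_limited([fake_search(q) for q in ["python", "rust", "go"]]))
print(results)  # [['result for python'], ['result for rust'], ['result for go']]
```

`asyncio.gather` returns results in the order the coroutines were passed, so each entry can be matched back to its query.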

### No results returned

If you're not getting results:

1. Check that the engine is enabled (`ENGINE_ENABLED=true`)
2. Verify your query is not empty or too restrictive
3. Try with safe search disabled to see if content filtering is the issue
4. Check for engine-specific errors in the logs (use `--verbose` flag with CLI)
5. Ensure you have the required dependencies installed for the engine

### Common error messages

- `"Engine 'X': API key is required"`: The engine requires an API key that hasn't been configured
- `"No search engines configured"`: No engines are enabled or available
- `"Unknown search engine: X"`: The specified engine name is invalid
- `"Engine 'X': is disabled"`: The engine is registered but disabled in configuration

## Development status

Version: 1.8.1

Twat Search is actively developed. See [PROGRESS.md](PROGRESS.md) for completed tasks and [TODO.md](TODO.md) for planned features and improvements.

## Contributing

Contributions are welcome! Please check [TODO.md](TODO.md) for areas that need work. Submit pull requests or open issues on GitHub. Key areas for contribution:

- Adding new search engines
- Improving test coverage
- Enhancing documentation
- Optimizing performance
- Implementing advanced features (e.g., caching, result normalization)

## License

Twat Search is released under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## Appendix: available engines and requirements

| Engine | Package Extra | API Key Required | Environment Variable | Notes |
| --- | --- | --- | --- | --- |
| Brave | `brave` | Yes | `BRAVE_API_KEY` | General web search engine |
| Brave News | `brave` | Yes | `BRAVE_API_KEY` | News-specific search |
| You.com | - | Yes | `YOU_API_KEY` | AI-powered web search |
| You.com News | - | Yes | `YOU_API_KEY` | News-specific search |
| Tavily | `tavily` | Yes | `TAVILY_API_KEY` | Research-focused search |
| Perplexity | `pplx` | Yes | `PPLX_API_KEY` | AI-powered search with detailed answers |
| SerpAPI | `serpapi` | Yes | `SERPAPI_API_KEY` | Google search results API |
| HasData Google | `hasdata` | Yes | `HASDATA_API_KEY` | Google search results API |
| HasData Google Light | `hasdata` | Yes | `HASDATA_API_KEY` | Lightweight Google search API |
| Critique | - | Yes | `CRITIQUE_API_KEY` | Supports image analysis |
| DuckDuckGo | `duckduckgo` | No | - | Privacy-focused search |
| Bing Scraper | `bing_scraper` | No | - | Uses web scraping techniques |

================
File: TODO.md
================
# twat-search Web Package - Future Tasks

The basic implementation of the `twat-search` web package is complete.

Tip: Periodically run `./cleanup.py status` to see results of lints and tests.

## 1. Phase 1

### 1.1. Complete Bing Scraper Implementation

- [ ] Fix implementation issues in bing_scraper.py
  - Add proper type annotations to all methods
  - Implement better error handling with appropriate context
- [ ] Complete test coverage for BingScraperSearchEngine
  - Fix skipped tests in test_bing_scraper.py
  - Add tests for error conditions and edge cases
- [ ] Document Bing Scraper functionality
  - Add comprehensive docstrings
  - Include usage examples in README


### 1.2. Documentation and Examples

- [ ] Add comprehensive docstrings to all classes and methods
  - Include parameter descriptions
  - Document exceptions that can be raised
  - Add usage examples
- [ ] Create detailed README examples
  - Basic usage examples for each engine
  - Advanced configuration examples
  - Error handling examples
- [ ] Document environment variable configuration
  - Create a comprehensive list of all supported environment variables
  - Add examples of .env file configuration


## 2. Phase 2

### 2.1. Type Checking Errors

- [ ] Fix missing type stubs for third-party modules
  - `duckduckgo_search` and `scrape_bing` are showing import-not-found errors
  - Create local stub files or install type stubs if available
- [ ] Add type annotations to functions missing them
  - Particularly in bing_scraper.py, need to add annotations to search methods
- [ ] Fix attribute errors in the `you.py` engine
  - "YouBaseEngine" has no attribute errors for num_results_param and base_url
- [ ] Resolve incompatible types in engine assignments in `__init__.py`
- [ ] Fix the test_config_with_env_vars failure (api_key not being set correctly)

### 2.2. Linting Issues

- [ ] Address boolean parameter warnings (FBT001, FBT002)
  - Consider using keyword-only arguments for boolean parameters
  - Or create specific enum types for boolean options
- [ ] Fix functions with too many parameters (PLR0913)
  - Refactor using parameter objects or configuration objects
  - Consider breaking down complex functions
- [ ] Resolve magic values in code (PLR2004)
  - Replace hardcoded numbers like 100, 5, 10 with named constants
- [ ] Clean up unused imports (F401)
  - Remove or properly use imported modules

### 2.3. Improve Test Framework

- [ ] Implement mock search engines for all providers
  - Create standardized mock responses
  - Enable offline testing without API keys
- [ ] Add integration tests
  - Test the entire search workflow
  - Test concurrent searches across multiple engines
- [ ] Create test fixtures for common configurations
  - Standard API response data
  - Common error cases
- [ ] Fix test_config_with_env_vars failure
  - Debug why environment variables aren't being properly loaded

## 3. Phase 3

### 3.1. Enhance Engine Implementations

- [ ] Standardize error handling across all engines
  - Use consistent error context and messages
  - Properly propagate exceptions with 'from exc'
- [ ] Optimize parameter handling in engines
  - Reduce code duplication in parameter mapping
  - Create utility functions for common parameter conversions
- [ ] Add timeouts to all HTTP requests
  - Ensure all engines have consistent timeout handling
  - Add configurable timeout parameters

## 4. Phase 4

### 4.1. Additional Features

- [ ] Add result caching mechanism
  - Implement optional caching of search results
  - Add configurable cache expiration
- [ ] Implement rate limiting for all engines
  - Ensure all engines respect API rate limits
  - Add configurable backoff strategies
- [ ] Add result normalization
  - Create a more consistent result format across engines
  - Implement result scoring and ranking

### 4.2. Performance Improvements

- [ ] Profile search performance across engines
  - Measure latency and throughput
  - Identify performance bottlenecks
- [ ] Implement connection pooling for HTTP clients
  - Reuse connections where possible
  - Configure appropriate connection limits
- [ ] Add parallelization options for multiple searches
  - Control concurrency limits
  - Implement proper resource cleanup

================
File: VERSION.txt
================
v1.8.1



================================================================
End of Codebase
================================================================
