Metadata-Version: 2.1
Name: unique_web_search
Version: 1.5.0
Summary: 
License: Proprietary
Author: Andreas Hauri
Author-email: andreas@unique.ch
Requires-Python: >=3.12,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: azure-ai-projects (>=1.0.0,<2.0.0)
Requires-Dist: azure-core (>=1.36.0,<2.0.0)
Requires-Dist: azure-identity (>=1.25.0,<2.0.0)
Requires-Dist: crawl4ai (>=0.6.3,<0.7.0)
Requires-Dist: fake-useragent (>=2.2.0,<3.0.0)
Requires-Dist: firecrawl (>=3.3.2,<4.0.0)
Requires-Dist: langchain-community (>=0.3.1,<0.4.0)
Requires-Dist: markdownify (>=0.14.1,<0.15.0)
Requires-Dist: pandas (==2.2.3)
Requires-Dist: pydantic (>=2.12.3,<3.0.0)
Requires-Dist: pydantic-settings (>=2.10.1,<3.0.0)
Requires-Dist: pytest (>=8.4.1,<9.0.0)
Requires-Dist: pytest-asyncio (>=1.2.0,<2.0.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: tavily-python (>=0.7.11,<0.8.0)
Requires-Dist: timeout-decorator (>=0.5.0,<0.6.0)
Requires-Dist: typing-extensions (>=4.9.0,<5.0.0)
Requires-Dist: unidecode (>=1.4.0,<2.0.0)
Requires-Dist: unique-sdk (>=0.10.33,<0.11.0)
Requires-Dist: unique-toolkit (>=1.19.2,<2.0.0)
Description-Content-Type: text/markdown

# Unique Web Search

A configurable web search tool for retrieving and processing current information from the internet. The package supports multiple search engines, web crawlers, and content processing strategies.

## Architecture

The following diagram illustrates the architecture and workflow of the `unique_web_search` package:

![Web Search Tool Architecture](doc/images/architecture-diagram.svg)

## Key Features

- **Dual Execution Modes**:
  - **V1 (Traditional)**: Query refinement with single or multiple search strategies
  - **V2 (Step-based Planning)**: Advanced research planning with parallel execution
  
- **Multiple Search Engines**:
  - Google Search
  - Bing Search
  - Brave Search
  - Jina Search
  - Tavily Search
  - Firecrawl Search

- **Multiple Web Crawlers**:
  - Basic HTTP Crawler
  - Crawl4AI
  - Jina Reader
  - Tavily Crawler
  - Firecrawl Crawler

- **Intelligent Content Processing**:
  - LLM-based summarization
  - Token-based truncation
  - Relevancy scoring and sorting
  - Content chunking and optimization

- **Query Refinement**:
  - **BASIC Mode**: Single optimized search query
  - **ADVANCED Mode**: Multiple targeted search queries for complex research

- **Performance Optimized**:
  - Parallel execution of search and crawl operations
  - Token limit management
  - Configurable timeouts and error handling
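
As an illustration of the parallel execution and timeout handling listed above, here is a minimal sketch using only `asyncio`. It is not the package's implementation: `fetch_page`, `crawl_all`, and `run_demo` are hypothetical names, and the crawler call is simulated with a sleep instead of real network I/O.

```python
import asyncio
from typing import Optional


async def fetch_page(url: str, delay: float) -> str:
    """Hypothetical stand-in for a crawler call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return f"content of {url}"


async def crawl_all(delays: dict, timeout: float) -> dict:
    """Crawl all URLs concurrently; a URL that times out maps to None."""

    async def guarded(url: str, delay: float) -> Optional[str]:
        try:
            # wait_for cancels the inner task once the timeout elapses.
            return await asyncio.wait_for(fetch_page(url, delay), timeout=timeout)
        except asyncio.TimeoutError:
            return None

    # gather preserves the order of its arguments, so zip lines up with the keys.
    results = await asyncio.gather(*(guarded(u, d) for u, d in delays.items()))
    return dict(zip(delays, results))


def run_demo() -> dict:
    # Simulated latencies in seconds (assumed values for the demo only).
    delays = {"https://example.com/fast": 0.01, "https://example.com/slow": 2.0}
    return asyncio.run(crawl_all(delays, timeout=0.2))
```

Because each call is wrapped individually, one slow or failing page does not block or discard the results of the others.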

## Configuration

The tool uses environment variables and configuration files to manage API keys and settings. Key configuration areas include:

- Search engine selection and API keys
- Crawler selection and configuration
- Content processing strategies (SUMMARIZE, TRUNCATE, NONE)
- Token limits and relevancy thresholds
- Proxy configuration
- Debug and monitoring options
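
To make the configuration areas above concrete, the following sketch reads a few settings from environment variables using only the standard library. The variable names (`EXAMPLE_SEARCH_ENGINE` and so on), the defaults, and the `WebSearchSettings` class are assumptions made for this example and are not the package's actual settings; only the three strategy names come from the list above.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class ProcessingStrategy(Enum):
    # Strategy names as listed in this README.
    SUMMARIZE = "SUMMARIZE"
    TRUNCATE = "TRUNCATE"
    NONE = "NONE"


@dataclass(frozen=True)
class WebSearchSettings:
    """Hypothetical settings container; field names are illustrative."""

    search_engine: str
    crawler: str
    strategy: ProcessingStrategy
    max_tokens: int
    proxy_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> WebSearchSettings:
    """Build settings from a mapping of environment variables.

    All variable names and defaults below are assumptions for this sketch.
    """
    return WebSearchSettings(
        search_engine=env.get("EXAMPLE_SEARCH_ENGINE", "google"),
        crawler=env.get("EXAMPLE_CRAWLER", "basic"),
        # Enum lookup by value raises ValueError on an unknown strategy.
        strategy=ProcessingStrategy(
            env.get("EXAMPLE_PROCESSING_STRATEGY", "TRUNCATE").upper()
        ),
        max_tokens=int(env.get("EXAMPLE_MAX_TOKENS", "4000")),
        proxy_url=env.get("EXAMPLE_PROXY_URL"),
    )
```

Passing the environment as a mapping keeps the loader easy to test, for example `load_settings({"EXAMPLE_MAX_TOKENS": "1000"})`.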

## Workflow

1. **Input**: User query or structured search plan
2. **Configuration**: Load settings and initialize services
3. **Execution**: 
   - V1: Query refinement → Search → Crawl → Process
   - V2: Execute planned steps in parallel → Process
4. **Content Processing**: Clean, summarize/truncate, and chunk content
5. **Optimization**: Reduce to token limits and sort by relevance
6. **Output**: Return structured content chunks optimized for LLM consumption
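
Steps 4 and 5 can be sketched roughly as follows. This is an illustrative approximation, not the package's code: the `Chunk` class, the whitespace-based token count, and the rule of skipping chunks that exceed the remaining budget are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical content chunk with a relevance score."""

    url: str
    text: str
    relevance: float


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a real tokenizer would give other numbers."""
    return len(text.split())


def truncate(text: str, max_tokens: int) -> str:
    """Keep only the first max_tokens whitespace-separated tokens."""
    return " ".join(text.split()[:max_tokens])


def optimize(chunks: List[Chunk], per_chunk_limit: int, total_limit: int) -> List[Chunk]:
    """Truncate each chunk, then keep the most relevant ones within the budget."""
    ranked = sorted(chunks, key=lambda c: c.relevance, reverse=True)
    selected: List[Chunk] = []
    used = 0
    for chunk in ranked:
        text = truncate(chunk.text, per_chunk_limit)
        cost = count_tokens(text)
        if used + cost > total_limit:
            continue  # assumed policy: skip chunks that do not fit
        selected.append(Chunk(chunk.url, text, chunk.relevance))
        used += cost
    return selected
```

The returned list is ordered by descending relevance, so the chunks that matter most come first.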

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), 
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.5.0] - 2025-11-10
- Add support for private endpoint transport (for Workload identity authentication)

## [1.4.0] - 2025-11-10
- Expose Search Mode Configuration

## [1.3.6] - 2025-10-29
- Fix minor notification display issue and remove unnecessary log

## [1.3.5] - 2025-10-29
- Upgrade `azure-ai-projects` to version 1.0.0 (relevant for Bing search)

## [1.3.4] - 2025-10-28
- Remove unused tool-specific `get_tool_call_result_for_loop_history` function

## [1.3.3] - 2025-10-14
- Fix bug in selecting the refine query mode

## [1.3.2] - 2025-10-10
- Add an option to switch the proxy auth protocol (http or https)

## [1.3.1] - 2025-10-09
- Update loading path of `DEFAULT_GPT_4o` from `unique_toolkit` 

## [1.3.0] - 2025-10-06
- **Proxy Authentication Support**: Route search engine and crawler requests through proxies with multiple authentication methods:
  - Username/Password authentication
  - Client Certificate authentication
- **Active Crawlers**: Dynamic crawler activation system allowing selective enablement of crawling services:
  - **In-house crawlers**: Control activation via environment variables for internal crawlers (Basic, Crawl4AI)
  - **External crawlers**: Auto-activate when API keys are configured (Firecrawl, Jina, Tavily)
- **Test Coverage**: Added comprehensive tests to ensure web search tool stability and reliability

## [1.2.0] - 2025-09-29
- Mark new crawlers as experimental

## [1.1.0] - 2025-09-24
- Set active search engine through `active_search_engines` env variable

## [1.0.3] - 2025-09-23
- Add field to track execution time of the executors

## [1.0.2] - 2025-09-23
- Parallelize step execution for V2 mode.

## [1.0.1] - 2025-09-23
- Add `octet-stream` to the blacklisted content types and allow the unwanted types to be changed via config

## [1.0.0] - 2025-09-18
- Bump toolkit version to allow for both patch and minor updates

## [0.2.0] - 2025-09-17
- Add support for Brave and Grounding by Bing through Azure

## [0.1.4] - 2025-09-17
- Updated to latest toolkit

## [0.1.3] - 2025-09-17
- Add UTF-8 cleanup logic when processing content

## [0.1.2] - 2025-09-15
- Fix minor bug in transforming `toolResponse` to `toolCallResult`

## [0.1.1] - 2025-09-15
### Added
- **WebSearchV2Executor**: New step-based execution model supporting both search and direct URL reading operations
- **BaseWebSearchExecutor**: Abstract base class providing common functionality between executor versions
- **Enhanced Schema**: New model `WebSearchPlan` for structured web search planning
- **Flexible Step Execution**: Support for mixed search and URL reading operations in a single plan

### Changed
- **Architecture Refactor**: Improved executor structure with better separation of concerns
- **Configuration Enhancement**: Added experimental features flag to switch between V1 and V2 modes
- **Progress Reporting**: Enhanced with step-specific notifications and better user feedback

### Maintained
- **Backward Compatibility**: Existing V1 executor functionality preserved
- **API Consistency**: No breaking changes to existing tool interfaces

## [0.1.0] - 2025-09-12
- Code simplification
- Enable new crawlers
- Default cleaning of search results
- Refactor of code structure and crawler location

## [0.0.6] - 2025-09-05
- Updated unique_web_search README.

## [0.0.5] - 2025-09-04
- Path change of loading local .env.

## [0.0.4] - 2025-09-01
- Reduce default crawler timeout to 10s.

## [0.0.3] - 2025-08-18
- Auto-register Tool in Factory.

## [0.0.2] - 2025-08-18
- Moved out of private repo to public repo.

## [0.0.1] - 2025-08-18
- Initial release of `web_search`.
