Metadata-Version: 2.4
Name: achem
Version: 1.1.0
Summary: Deep Web Research Tool with Structural Positional Search - AI-powered synthesis using Ordinal Distance algorithm
Author: Sarok
License-Expression: MIT
Project-URL: Homepage, https://github.com/sarok-exe/achem
Project-URL: Documentation, https://github.com/sarok-exe/achem#readme
Project-URL: Repository, https://github.com/sarok-exe/achem
Project-URL: Issues, https://github.com/sarok-exe/achem/issues
Keywords: research,deep-web,web-scraping,summarization,ai,cli,tool,ordinal-distance,positional-search,ollama
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: wikipedia-api>=0.5.4
Requires-Dist: rich>=13.0.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: prompt_toolkit>=3.0.0
Requires-Dist: pyfiglet>=0.8.0
Requires-Dist: openai>=1.0.0
Requires-Dist: groq>=1.0.0
Requires-Dist: google-genai>=1.0.0
Requires-Dist: ddgs>=3.0.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: requests>=2.31.0
Requires-Dist: trafilatura>=1.6.0
Requires-Dist: textual>=0.50.0
Requires-Dist: duckduckgo-search>=4.0.0
Provides-Extra: arabic
Requires-Dist: arabic-reshaper>=3.0.0; extra == "arabic"
Requires-Dist: python-bidi>=0.14.0; extra == "arabic"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: all
Requires-Dist: arabic-reshaper>=3.0.0; extra == "all"
Requires-Dist: python-bidi>=0.14.0; extra == "all"
Requires-Dist: ollama>=0.1.0; extra == "all"
Dynamic: license-file

# ACHEM - Deep Web Research Tool

![ACHEM Banner](https://img.shields.io/badge/ACHEM-v1.1.0-blue?style=for-the-badge)

> **ACHEM** (Arabic: آشم) is a powerful deep web research tool that searches 100+ sources, scrapes the full article text, keeps only the relevant content, and generates AI-powered conclusions.

## Features

- **100+ Sources**: Searches DuckDuckGo for up to 100 results
- **Full Content Extraction**: Scrapes full article text using Trafilatura
- **Smart Content Filtering**: Removes ads/boilerplate, keeps only relevant sentences
- **AI Conclusions**: Generates synthesized final verdicts with probability predictions
- **Multi-AI Providers**: OpenRouter (free), Groq, Gemini, Ollama
- **Markdown Export**: Saves complete reports with all sources to `~/Documents/ACHEM/`
- **Multi-language**: Supports English, French, and Arabic
- **Rate Limit Retry**: Automatic retry on 429 errors

## Installation

### Prerequisites

- Python 3.10 or higher
- uv package manager (recommended)

### Quick Install

```bash
git clone https://github.com/sarok-exe/achem.git
cd achem
uv venv .venv && source .venv/bin/activate
uv pip install -e .
```

### API Configuration

Create config at `~/.ACHEM/api.env` or `~/Documents/ACHEM/api.env`:

```bash
# OpenRouter (free, recommended)
OPENROUTER_API_KEY=your_openrouter_key_here
OPENROUTER_MODEL=google/gemma-4-31b-it:free

# Ollama (local AI)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
OLLAMA_PRIMARY=false
```

Get an OpenRouter API key: https://openrouter.ai/settings
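The tool reads these keys from the environment; a minimal loader for such a file might look like the sketch below (`load_api_env`, the parsing rules, and the search order are illustrative assumptions, not ACHEM's actual code):

```python
import os
from pathlib import Path


def load_api_env(paths=None):
    """Load simple KEY=VALUE pairs from the first api.env file found.

    Blank lines and '#' comments are skipped; existing environment
    variables are never overridden. Hypothetical helper, shown only
    to illustrate the file format above.
    """
    paths = paths or [
        Path.home() / ".ACHEM" / "api.env",
        Path.home() / "Documents" / "ACHEM" / "api.env",
    ]
    for path in paths:
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip comments, blanks, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
        return path  # report which file was used
    return None
```

Using `setdefault` means values already exported in your shell take precedence over the file, which is the usual convention for env loaders.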

## Usage

### Command Line

```bash
achem "your research query" --ddg-limit 100
```

### Options

```bash
--ddg-limit N        Number of DuckDuckGo results (default: 100)
--mode ai            Use AI for conclusions (default)
--mode local         Use local TF-IDF (no API needed)
--lang en/fr/ar      Response language (English, French, or Arabic)
--no-wikipedia       Skip Wikipedia sources
--no-cache           Skip the cache
```

## How It Works

```
┌─────────────────────────────────────────────┐
│ 1. SEARCH (100+ sources)                    │
│    • DuckDuckGo web search                  │
│    • Prioritizes relevant content           │
├─────────────────────────────────────────────┤
│ 2. SCRAPE (Full article text)               │
│    • Extracts full content from URLs        │
│    • Uses Trafilatura for clean text        │
│    • Scrapes up to 100 pages concurrently   │
├─────────────────────────────────────────────┤
│ 3. FILTER (Relevant content only)           │
│    • Removes boilerplate and ads            │
│    • Keeps sentences matching keywords      │
│    • Deduplicates similar content           │
├─────────────────────────────────────────────┤
│ 4. AI CONCLUSION                            │
│    • Analyzes all content                   │
│    • Generates final prediction             │
│    • Includes probability percentages       │
│    • Provides key reasons                   │
└─────────────────────────────────────────────┘
```
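The FILTER stage can be sketched as a keyword match plus word-set deduplication (an illustrative Python sketch, not ACHEM's actual filtering code; real boilerplate removal happens in the SCRAPE stage via Trafilatura):

```python
import re


def filter_sentences(text, keywords, seen=None):
    """Keep sentences that mention at least one keyword, dropping duplicates.

    A sentence's lowercase word set is used as a cheap fingerprint, so
    reworded copies of the same sentence are treated as duplicates too.
    Illustrative sketch of the FILTER step described above.
    """
    seen = set() if seen is None else seen
    keywords = {k.lower() for k in keywords}
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        if not words & keywords:
            continue  # no keyword overlap: not relevant to the query
        fingerprint = frozenset(words)
        if fingerprint in seen:
            continue  # same word set already kept: near-duplicate
        seen.add(fingerprint)
        kept.append(sentence.strip())
    return kept
```

Passing the same `seen` set across all 100 scraped pages would deduplicate sentences globally, not just within one article.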

## Output

Reports saved to `~/Documents/ACHEM/` include:
- **AI Conclusion**: Synthesized final prediction
- **All Articles**: Full extracted content from each source
- **Keywords**: Identified topics
- **Extracted Web Content**: Combined filtered content
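Assembling that layout can be sketched as follows (`build_report`, `report_path`, and the file-naming scheme are illustrative assumptions, not ACHEM's actual report writer):

```python
from datetime import date
from pathlib import Path


def build_report(query, conclusion, articles, keywords, web_content):
    """Assemble a markdown report with the sections listed above.

    `articles` is a list of (title, text) pairs. Illustrative only.
    """
    lines = [f"# ACHEM Report: {query}", ""]
    lines += ["## AI Conclusion", conclusion, ""]
    lines += ["## Keywords", ", ".join(keywords), ""]
    lines += ["## All Articles"]
    for title, text in articles:
        lines += [f"### {title}", text, ""]
    lines += ["## Extracted Web Content", web_content, ""]
    return "\n".join(lines)


def report_path(query, base=None):
    """Build a save path under ~/Documents/ACHEM/ (hypothetical naming scheme)."""
    base = base or Path.home() / "Documents" / "ACHEM"
    slug = "-".join(query.lower().split())[:60]  # filesystem-safe slug
    return base / f"{slug}-{date.today().isoformat()}.md"
```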

## License

MIT License - see [LICENSE](LICENSE) file

## Acknowledgments

- [OpenRouter](https://openrouter.ai/) - Free AI models
- [DuckDuckGo](https://duckduckgo.com/) - Privacy-focused search
- [Trafilatura](https://trafilatura.readthedocs.io/) - Web content extraction
- [Sumy](https://miso-belka.github.io/sumy/) - Text summarization
