Metadata-Version: 2.1
Name: softhauzpy
Version: 0.0.7
Author: Karen Urate
Author-email: karen.urate@softhauz.ca
Description-Content-Type: text/markdown

# SofthauzPy
**SofthauzPy** is a Python toolkit for developers building intelligent, data-driven web applications. It provides web scraping tools, crawling systems, content extraction pipelines, and search engine components for building fully customizable in-house website search.

Designed for scalability and flexibility, **SofthauzPy** lets teams collect, process, index, and search website content efficiently, all within a clean Python-first development ecosystem.

From lightweight crawlers to a fully customizable in-house search engine, **SofthauzPy** helps developers build smarter web applications without relying heavily on external search services.


## Key Features

**Web Scraping & Crawling**

-   High-performance web scraping utilities
-   HTML parsing and structured data extraction
-   Recursive website crawling
-   Sitemap discovery and URL indexing
-   Support for asynchronous scraping workflows
-   Rate limiting and request handling utilities
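
To make the crawling items above concrete, here is a minimal, standard-library-only sketch of one step a recursive crawler needs: collecting same-host links from a fetched page. It does not use SofthauzPy's API; `LinkCollector` and `extract_links` are hypothetical names chosen for illustration, and the HTML is an inline sample rather than a live request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute, same-host links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative URLs and drop the fragment so /about and
        # /about#team count as the same page.
        absolute = urljoin(self.base_url, href).split("#")[0]
        same_host = urlparse(absolute).netloc == self.host
        if same_host and absolute not in self.links:
            self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique same-host links found in `html`, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


page = (
    '<a href="/about">About</a> '
    '<a href="https://other.example/x">External</a> '
    '<a href="/about#team">Team</a>'
)
print(extract_links(page, "https://example.com/"))
```

A crawler would feed each returned URL back into a queue, tracking visited pages and pausing between requests to respect rate limits.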

**Search Engine Toolkit**

-   In-house website search engine creation
-   Full-text indexing and querying
-   Custom relevance ranking algorithms
-   Search filtering and query optimization
-   Incremental indexing support
-   Lightweight search infrastructure for internal platforms
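
As a rough illustration of full-text indexing, querying, and ranking, the sketch below builds a tiny in-memory inverted index and scores documents by summed term frequency. It is not SofthauzPy's implementation: `TinyIndex` and `tokenize` are hypothetical names, and the scoring rule is a deliberately simple assumption.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index with term-frequency ranking."""

    def __init__(self):
        # term -> Counter mapping doc_id to occurrences of that term
        self.postings = defaultdict(Counter)

    def add(self, doc_id, text):
        """Index one document; can be called at any time to add more."""
        for term in tokenize(text):
            self.postings[term][doc_id] += 1

    def search(self, query):
        """Return doc ids ordered by summed term frequency, best first."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, count in self.postings.get(term, {}).items():
                scores[doc_id] += count
        # Sort by score descending, then doc id for a stable order on ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [doc_id for doc_id, _ in ranked]


index = TinyIndex()
index.add("home", "Python search tools for search")
index.add("blog", "Python web scraping")
print(index.search("python search"))
```

Because `add` only appends to the postings, new pages can be indexed without rebuilding. Re-indexing a changed page would first require removing its old postings, which this sketch omits.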

**Content Processing**

-   Text normalization and cleaning
-   Metadata extraction
-   Duplicate content detection
-   Keyword extraction and tagging
-   Content chunking for AI and search applications
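
The sketch below shows one plausible way to approach three of the steps above using only the standard library: whitespace and case normalization, exact-duplicate detection via a hash of the normalized text, and word-based chunking with overlap. The function names and default sizes are assumptions for illustration, not SofthauzPy's API.

```python
import hashlib
import re


def normalize(text):
    """Collapse runs of whitespace, trim, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text):
    """Hash of the normalized text; equal fingerprints mean duplicate content."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=4, overlap=1))
print(fingerprint("Hello   World") == fingerprint(" hello world\n"))
```

Hash-based fingerprints only catch duplicates that are identical after normalization. Near-duplicate detection would need a different technique, such as shingling.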

**AI & Semantic Search Ready**

-   Embedding generation helpers
-   Vector database compatibility
-   Semantic similarity search utilities
-   Retrieval-Augmented Generation (RAG) support
-   AI-powered content indexing workflows
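
Semantic similarity search generally reduces to comparing embedding vectors. The following sketch ranks documents by cosine similarity in plain Python. It assumes the embeddings were already produced elsewhere; the vectors here are made-up two-dimensional values, and `cosine` and `top_k` are hypothetical helpers, not part of SofthauzPy.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings (assumed values for illustration only).
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))
```

A linear scan like this is adequate for small collections. Larger ones typically delegate the nearest-neighbor lookup to a vector database.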

**Developer Experience**

-   Modular and extensible architecture
-   Framework-friendly design for Flask, Django, and FastAPI
-   Easy API integration
-   Clean, Pythonic interfaces
-   Production-ready utilities for scalable deployments

> This program may incorporate artificial intelligence (AI) tools solely
> to support and enhance development efficiency, code quality, and
> overall performance. All software design, implementation, testing,
> validation, and quality assurance processes are conducted and reviewed
> by a qualified human software professional to ensure accuracy,
> reliability, security, and compliance with applicable standards.

Author:
**Urate, Karen**<br>
*Softhauz Software Architect*<br>
[softhauz.ca](https://softhauz.ca)
