Metadata-Version: 2.4
Name: evolvishub-outlook-ingestor
Version: 1.1.6
Summary: Production-ready, secure email ingestion system for Microsoft Outlook with advanced processing, monitoring, and database integration
Author-email: "Alban Maxhuni, PhD" <a.maxhuni@evolvis.ai>
Maintainer-email: Kevin Medina Gómez <k.medina@evolvis.ai>
License: Evolvis AI License
Project-URL: Homepage, https://github.com/evolvisai/metcal
Project-URL: Documentation, https://github.com/evolvisai/metcal/tree/main/shared/libs/evolvis-outlook-ingestor/docs
Project-URL: Repository, https://github.com/evolvisai/metcal.git
Project-URL: Issues, https://github.com/evolvisai/metcal/issues
Project-URL: Changelog, https://github.com/evolvisai/metcal/blob/main/shared/libs/evolvis-outlook-ingestor/CHANGELOG.md
Project-URL: Examples, https://github.com/evolvisai/metcal/tree/main/shared/libs/evolvis-outlook-ingestor/examples
Keywords: outlook,email,ingestion,exchange,graph-api,imap,pop3,database,async,batch-processing,security,monitoring,performance,postgresql,mongodb,enterprise
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Information Technology
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Communications :: Email
Classifier: Topic :: Communications :: Email :: Filters
Classifier: Topic :: Database
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Security
Classifier: Topic :: Security :: Cryptography
Classifier: Framework :: AsyncIO
Classifier: Framework :: Pydantic
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: pydantic-settings<3.0.0,>=2.0.0
Requires-Dist: typing-extensions>=4.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: aiofiles>=23.0.0
Requires-Dist: asyncio-throttle>=1.0.0
Requires-Dist: exchangelib>=5.0.0
Requires-Dist: msal>=1.20.0
Requires-Dist: requests>=2.28.0
Requires-Dist: aioimaplib>=1.0.0
Requires-Dist: sqlalchemy[asyncio]>=2.0.0
Requires-Dist: asyncpg>=0.28.0
Requires-Dist: aiomysql>=0.2.0
Requires-Dist: motor>=3.0.0
Requires-Dist: prometheus-client>=0.17.0
Requires-Dist: structlog>=23.0.0
Requires-Dist: tenacity>=8.0.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: python-dateutil>=2.8.0
Requires-Dist: email-validator>=2.0.0
Requires-Dist: chardet>=5.0.0
Requires-Dist: python-magic>=0.4.0
Requires-Dist: cryptography>=41.0.0
Requires-Dist: beautifulsoup4>=4.11.0
Requires-Dist: Pillow>=9.0.0
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: redis>=4.5.0
Requires-Dist: websockets>=11.0.0
Requires-Dist: fastapi>=0.100.0
Requires-Dist: uvicorn>=0.23.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: spacy>=3.6.0
Requires-Dist: textblob>=0.17.0
Requires-Dist: langdetect>=1.0.9
Requires-Dist: opentelemetry-api>=1.20.0
Requires-Dist: opentelemetry-sdk>=1.20.0
Requires-Dist: schedule>=1.2.0
Requires-Dist: cachetools>=5.3.0
Provides-Extra: protocols
Requires-Dist: msal>=1.20.0; extra == "protocols"
Requires-Dist: aiohttp>=3.8.0; extra == "protocols"
Requires-Dist: exchangelib>=5.0.0; extra == "protocols"
Requires-Dist: aioimaplib>=1.0.0; extra == "protocols"
Provides-Extra: database
Requires-Dist: asyncpg>=0.28.0; extra == "database"
Requires-Dist: motor>=3.1.0; extra == "database"
Requires-Dist: aiomysql>=0.1.0; extra == "database"
Provides-Extra: database-sqlite
Requires-Dist: aiosqlite>=0.19.0; extra == "database-sqlite"
Provides-Extra: database-mssql
Requires-Dist: aioodbc>=0.4.0; extra == "database-mssql"
Requires-Dist: pyodbc>=4.0.0; extra == "database-mssql"
Provides-Extra: database-mariadb
Requires-Dist: aiomysql>=0.2.0; extra == "database-mariadb"
Provides-Extra: database-oracle
Requires-Dist: cx_Oracle>=8.3.0; extra == "database-oracle"
Provides-Extra: database-cockroachdb
Requires-Dist: asyncpg>=0.28.0; extra == "database-cockroachdb"
Provides-Extra: database-all
Requires-Dist: asyncpg>=0.28.0; extra == "database-all"
Requires-Dist: motor>=3.1.0; extra == "database-all"
Requires-Dist: aiomysql>=0.2.0; extra == "database-all"
Requires-Dist: aiosqlite>=0.19.0; extra == "database-all"
Requires-Dist: aioodbc>=0.4.0; extra == "database-all"
Requires-Dist: pyodbc>=4.0.0; extra == "database-all"
Requires-Dist: cx_Oracle>=8.3.0; extra == "database-all"
Provides-Extra: datalake-delta
Requires-Dist: delta-spark>=2.4.0; extra == "datalake-delta"
Requires-Dist: pyspark>=3.4.0; extra == "datalake-delta"
Requires-Dist: pyarrow>=12.0.0; extra == "datalake-delta"
Provides-Extra: datalake-iceberg
Requires-Dist: pyiceberg>=0.5.0; extra == "datalake-iceberg"
Requires-Dist: pyarrow>=12.0.0; extra == "datalake-iceberg"
Provides-Extra: database-clickhouse
Requires-Dist: clickhouse-connect>=0.6.0; extra == "database-clickhouse"
Requires-Dist: aiohttp>=3.8.0; extra == "database-clickhouse"
Provides-Extra: datalake-all
Requires-Dist: delta-spark>=2.4.0; extra == "datalake-all"
Requires-Dist: pyspark>=3.4.0; extra == "datalake-all"
Requires-Dist: pyiceberg>=0.5.0; extra == "datalake-all"
Requires-Dist: pyarrow>=12.0.0; extra == "datalake-all"
Requires-Dist: clickhouse-connect>=0.6.0; extra == "datalake-all"
Requires-Dist: aiohttp>=3.8.0; extra == "datalake-all"
Provides-Extra: processing
Requires-Dist: beautifulsoup4>=4.11.0; extra == "processing"
Requires-Dist: Pillow>=9.0.0; extra == "processing"
Provides-Extra: storage
Requires-Dist: minio>=7.1.0; extra == "storage"
Provides-Extra: cloud-aws
Requires-Dist: boto3>=1.26.0; extra == "cloud-aws"
Requires-Dist: botocore>=1.29.0; extra == "cloud-aws"
Provides-Extra: cloud-azure
Requires-Dist: azure-storage-blob>=12.14.0; extra == "cloud-azure"
Requires-Dist: azure-identity>=1.12.0; extra == "cloud-azure"
Provides-Extra: cloud-gcp
Requires-Dist: google-cloud-storage>=2.7.0; extra == "cloud-gcp"
Requires-Dist: google-auth>=2.16.0; extra == "cloud-gcp"
Provides-Extra: cloud-all
Requires-Dist: minio>=7.1.0; extra == "cloud-all"
Requires-Dist: boto3>=1.26.0; extra == "cloud-all"
Requires-Dist: botocore>=1.29.0; extra == "cloud-all"
Requires-Dist: azure-storage-blob>=12.14.0; extra == "cloud-all"
Requires-Dist: azure-identity>=1.12.0; extra == "cloud-all"
Requires-Dist: google-cloud-storage>=2.7.0; extra == "cloud-all"
Requires-Dist: google-auth>=2.16.0; extra == "cloud-all"
Provides-Extra: streaming
Requires-Dist: redis>=4.5.0; extra == "streaming"
Requires-Dist: websockets>=11.0.0; extra == "streaming"
Requires-Dist: fastapi>=0.100.0; extra == "streaming"
Requires-Dist: uvicorn>=0.23.0; extra == "streaming"
Requires-Dist: aiokafka>=0.8.0; extra == "streaming"
Requires-Dist: kafka-python>=2.0.0; extra == "streaming"
Provides-Extra: analytics
Requires-Dist: pandas>=2.0.0; extra == "analytics"
Requires-Dist: numpy>=1.24.0; extra == "analytics"
Requires-Dist: scikit-learn>=1.3.0; extra == "analytics"
Requires-Dist: matplotlib>=3.7.0; extra == "analytics"
Requires-Dist: seaborn>=0.12.0; extra == "analytics"
Requires-Dist: networkx>=3.0; extra == "analytics"
Requires-Dist: scipy>=1.10.0; extra == "analytics"
Provides-Extra: ml
Requires-Dist: spacy>=3.6.0; extra == "ml"
Requires-Dist: textblob>=0.17.0; extra == "ml"
Requires-Dist: langdetect>=1.0.9; extra == "ml"
Requires-Dist: transformers>=4.30.0; extra == "ml"
Requires-Dist: torch>=2.0.0; extra == "ml"
Provides-Extra: observability
Requires-Dist: opentelemetry-api>=1.20.0; extra == "observability"
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == "observability"
Requires-Dist: opentelemetry-instrumentation>=0.41b0; extra == "observability"
Requires-Dist: jaeger-client>=4.8.0; extra == "observability"
Provides-Extra: caching
Requires-Dist: redis>=4.5.0; extra == "caching"
Requires-Dist: cachetools>=5.3.0; extra == "caching"
Requires-Dist: diskcache>=5.6.0; extra == "caching"
Provides-Extra: governance
Requires-Dist: apache-airflow>=2.7.0; extra == "governance"
Requires-Dist: great-expectations>=0.17.0; extra == "governance"
Requires-Dist: dbt-core>=1.6.0; extra == "governance"
Provides-Extra: all
Requires-Dist: msal>=1.20.0; extra == "all"
Requires-Dist: aiohttp>=3.8.0; extra == "all"
Requires-Dist: exchangelib>=5.0.0; extra == "all"
Requires-Dist: aioimaplib>=1.0.0; extra == "all"
Requires-Dist: asyncpg>=0.28.0; extra == "all"
Requires-Dist: motor>=3.1.0; extra == "all"
Requires-Dist: aiomysql>=0.2.0; extra == "all"
Requires-Dist: aiosqlite>=0.19.0; extra == "all"
Requires-Dist: aioodbc>=0.4.0; extra == "all"
Requires-Dist: pyodbc>=4.0.0; extra == "all"
Requires-Dist: cx_Oracle>=8.3.0; extra == "all"
Requires-Dist: clickhouse-connect>=0.6.0; extra == "all"
Requires-Dist: delta-spark>=2.4.0; extra == "all"
Requires-Dist: pyspark>=3.4.0; extra == "all"
Requires-Dist: pyiceberg>=0.5.0; extra == "all"
Requires-Dist: pyarrow>=12.0.0; extra == "all"
Requires-Dist: beautifulsoup4>=4.11.0; extra == "all"
Requires-Dist: Pillow>=9.0.0; extra == "all"
Requires-Dist: minio>=7.1.0; extra == "all"
Requires-Dist: boto3>=1.26.0; extra == "all"
Requires-Dist: botocore>=1.29.0; extra == "all"
Requires-Dist: azure-storage-blob>=12.14.0; extra == "all"
Requires-Dist: azure-identity>=1.12.0; extra == "all"
Requires-Dist: google-cloud-storage>=2.7.0; extra == "all"
Requires-Dist: google-auth>=2.16.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: bandit>=1.7.0; extra == "dev"
Requires-Dist: sphinx>=6.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "dev"
Requires-Dist: myst-parser>=1.0.0; extra == "dev"
Requires-Dist: msal>=1.20.0; extra == "dev"
Requires-Dist: aiohttp>=3.8.0; extra == "dev"
Requires-Dist: exchangelib>=5.0.0; extra == "dev"
Requires-Dist: aioimaplib>=1.0.0; extra == "dev"
Requires-Dist: asyncpg>=0.28.0; extra == "dev"
Requires-Dist: motor>=3.1.0; extra == "dev"
Requires-Dist: aiomysql>=0.1.0; extra == "dev"
Requires-Dist: beautifulsoup4>=4.11.0; extra == "dev"
Requires-Dist: Pillow>=9.0.0; extra == "dev"
Provides-Extra: performance
Requires-Dist: uvloop>=0.17.0; sys_platform != "win32" and extra == "performance"
Requires-Dist: orjson>=3.8.0; extra == "performance"
Requires-Dist: msgpack>=1.0.0; extra == "performance"
Provides-Extra: monitoring
Requires-Dist: grafana-client>=3.5.0; extra == "monitoring"
Requires-Dist: elasticsearch>=8.0.0; extra == "monitoring"
Requires-Dist: redis>=4.5.0; extra == "monitoring"
Dynamic: license-file

<div align="center">
  <img src="https://evolvis.ai/wp-content/uploads/2025/08/evie-solutions-03.png" alt="Evolvis AI - Evie Solutions Logo" width="400">
</div>

# Evolvishub Outlook Ingestor

**Production-ready, secure email ingestion for Microsoft Outlook and Exchange, with advanced processing, monitoring, and database integration.**

A Python library for ingesting, processing, and storing email data from Microsoft Outlook and Exchange systems. In addition to core ingestion, it offers optional modules for analytics, machine learning, data governance, monitoring, and real-time streaming.

## Download Statistics

[![Weekly Downloads](https://pepy.tech/badge/evolvishub-outlook-ingestor/week)](https://pepy.tech/project/evolvishub-outlook-ingestor)
[![Monthly Downloads](https://pepy.tech/badge/evolvishub-outlook-ingestor/month)](https://pepy.tech/project/evolvishub-outlook-ingestor)
[![Total Downloads](https://pepy.tech/badge/evolvishub-outlook-ingestor)](https://pepy.tech/project/evolvishub-outlook-ingestor)

[![PyPI Version](https://img.shields.io/pypi/v/evolvishub-outlook-ingestor)](https://pypi.org/project/evolvishub-outlook-ingestor/)
[![Python Versions](https://img.shields.io/pypi/pyversions/evolvishub-outlook-ingestor)](https://pypi.org/project/evolvishub-outlook-ingestor/)
[![License](https://img.shields.io/pypi/l/evolvishub-outlook-ingestor)](LICENSE)

## Quick Start

```python
import asyncio
from evolvishub_outlook_ingestor import OutlookIngestor, Settings

async def main():
    settings = Settings()
    settings.database.host = "localhost"
    settings.database.database = "outlook_emails"
    
    ingestor = OutlookIngestor(settings)
    await ingestor.process_emails()

asyncio.run(main())
```

## Installation

```bash
# Basic installation
pip install evolvishub-outlook-ingestor

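# With the advanced-feature extras (streaming, analytics, ML, governance, monitoring)
pip install 'evolvishub-outlook-ingestor[streaming,analytics,ml,governance,monitoring]'

# With every protocol, database, data-lake, processing, and cloud-storage backend
pip install 'evolvishub-outlook-ingestor[all]'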
```

## Core Features

### Email Ingestion & Processing
- Microsoft Graph API integration for Office 365/Exchange Online
- Exchange Web Services (EWS) support for on-premises Exchange
- IMAP/POP3 protocol support for legacy systems
- Email metadata extraction and processing
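The library's own parser is not shown in this README. As a hedged illustration of what metadata extraction involves, the sketch below uses only Python's standard `email` package to pull the message ID, sender, recipients, date, and body out of a raw RFC 5322 message. `extract_metadata` and the sample message are hypothetical.

```python
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, as it might arrive over IMAP
RAW = b"""From: Alice <alice@example.com>
To: Bob <bob@example.com>, carol@example.com
Subject: Quarterly report
Date: Tue, 01 Aug 2023 10:15:00 +0000
Message-ID: <abc123@example.com>
Content-Type: text/plain; charset="utf-8"

Please find the numbers below.
"""


def extract_metadata(raw: bytes) -> dict:
    """Parse a raw message and return a flat metadata dict (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain-text part; fall back to HTML if that is all there is
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "message_id": str(msg["Message-ID"] or ""),
        "subject": str(msg["Subject"] or ""),
        "sender": getaddresses([str(msg["From"])])[0][1],
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "sent_at": parsedate_to_datetime(str(msg["Date"])),
        "body_text": body.get_content() if body is not None else "",
        "has_attachments": any(True for _ in msg.iter_attachments()),
    }
```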

### Database Storage
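- PostgreSQL, MongoDB, and SQLite support, plus optional driver extras for MySQL/MariaDB, SQL Server, Oracle, CockroachDB, and ClickHouse (`database-*` extras)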
- Async database operations with connection pooling
- Configurable storage backends
- Email deduplication and conflict resolution
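One common deduplication strategy (not necessarily the one this library uses) keys each message on its `Message-ID` header, falls back to a content hash when the header is missing, and lets a database uniqueness constraint reject repeats. The sketch below shows that idea with the standard-library `sqlite3` module; the table layout and function names are hypothetical.

```python
import hashlib
import sqlite3


def dedup_key(message_id: str, sender: str, subject: str, body: str) -> str:
    """Prefer the Message-ID header; otherwise hash the visible content."""
    if message_id.strip():
        return message_id.strip()
    digest = hashlib.sha256("\x1f".join([sender, subject, body]).encode("utf-8"))
    return "sha256:" + digest.hexdigest()


def store_emails(conn: sqlite3.Connection, emails: list) -> int:
    """Insert emails, silently skipping duplicates. Returns the number of new rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        " dedup_key TEXT PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    before = conn.total_changes
    for e in emails:
        key = dedup_key(e.get("message_id", ""), e["sender"], e["subject"], e["body"])
        # The PRIMARY KEY constraint plus OR IGNORE drops rows whose key already exists
        conn.execute(
            "INSERT OR IGNORE INTO emails (dedup_key, sender, subject, body) VALUES (?, ?, ?, ?)",
            (key, e["sender"], e["subject"], e["body"]),
        )
    conn.commit()
    return conn.total_changes - before
```

Running the same batch twice inserts nothing the second time, which makes re-ingestion after a crash safe.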

## Advanced Features

### Real-time Streaming & Event Processing
- Event streaming over Redis pub/sub, with optional Kafka support (via the `streaming` extra)
- Backpressure handling so that slow consumers are not overwhelmed by fast producers
- Real-time email processing capabilities
- Distributed streaming support with horizontal scaling
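Backpressure means a fast producer is slowed to the pace of a slow consumer instead of buffering without limit. A minimal, library-independent sketch with a bounded `asyncio.Queue` follows; the queue size, the simulated delay, and all function names are assumptions for illustration.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int, depths: list) -> None:
    for i in range(n):
        # put() suspends while the queue is full: this is the backpressure
        await queue.put(i)
        depths.append(queue.qsize())
    await queue.put(None)  # sentinel: no more items


async def consumer(queue: asyncio.Queue, processed: list) -> None:
    while True:
        item = await queue.get()
        if item is None:
            queue.task_done()
            break
        await asyncio.sleep(0.001)  # simulate slow work, e.g. a database write
        processed.append(item)
        queue.task_done()


async def run_pipeline(n: int = 20, maxsize: int = 5):
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    depths, processed = [], []
    await asyncio.gather(producer(queue, n, depths), consumer(queue, processed))
    # The queue never holds more than `maxsize` items, however fast the producer is
    return processed, max(depths)
```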

### Change Data Capture (CDC)
- Incremental processing: only new or changed messages are handled on subsequent runs
- Change detection and synchronization
- Event-driven data capture with lineage tracking
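A simple way to detect changes between two ingestion runs is to store a fingerprint per message ID and diff the next snapshot against it. The sketch below is a hypothetical, snapshot-based illustration; Microsoft Graph also offers delta queries for mail folders, which avoid full snapshots, but this README does not say which approach the library uses.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def fingerprint(email: dict) -> str:
    """Hash the fields whose change should trigger re-processing (field choice is an assumption)."""
    parts = [email.get("subject", ""), email.get("body", ""),
             ",".join(sorted(email.get("categories", [])))]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def detect_changes(
    previous: Dict[str, str], current: Iterable[dict]
) -> Tuple[list, list, list, Dict[str, str]]:
    """Compare the current snapshot with stored fingerprints.

    Returns (created_ids, updated_ids, deleted_ids, new_state).
    """
    new_state = {e["id"]: fingerprint(e) for e in current}
    created = sorted(k for k in new_state if k not in previous)
    updated = sorted(k for k in new_state if k in previous and previous[k] != new_state[k])
    deleted = sorted(k for k in previous if k not in new_state)
    return created, updated, deleted, new_state
```

The returned `new_state` is persisted and passed back as `previous` on the next run.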

### Data Transformation
- Data transformation pipelines
- NLP processing with sentiment analysis and language detection
- PII detection and entity extraction
- Content enrichment and metadata augmentation
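The package depends on spaCy, which suggests model-based entity extraction. The sketch below swaps in a much cruder regular-expression approach, purely to show what PII detection and redaction produce; the patterns and the `redact_pii` name are assumptions and will both miss some formats and flag some false positives.

```python
import re

# Deliberately simple patterns, applied in order (emails first, then phone numbers)
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str):
    """Replace matches with [TYPE] placeholders and return (clean_text, findings)."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        findings.extend((label, m.group()) for m in pattern.finditer(text))
        text = pattern.sub(f"[{label}]", text)
    return text, findings
```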

### Analytics Engine
- Analytics framework with communication pattern analysis
- Trend detection and insights generation
- ML-powered business intelligence and reporting
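Communication pattern analysis usually starts from a directed sender-to-recipient graph. A minimal sketch using `collections.Counter` follows; the input field names (`sender`, `to`, `cc`) and the helper names are hypothetical. The resulting edge counts could also be loaded into `networkx` (part of the `analytics` extra) for centrality or community analysis.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def communication_graph(emails: Iterable[dict]) -> Counter:
    """Count directed (sender, recipient) edges over To and Cc."""
    edges: Counter = Counter()
    for e in emails:
        sender = e["sender"].lower()
        for rcpt in e.get("to", []) + e.get("cc", []):
            edges[(sender, rcpt.lower())] += 1
    return edges


def top_contacts(edges: Counter, person: str, n: int = 3) -> List[Tuple[str, int]]:
    """The people `person` writes to most often."""
    person = person.lower()
    outgoing = Counter({dst: c for (src, dst), c in edges.items() if src == person})
    return outgoing.most_common(n)
```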

### Data Quality Validation
- Data quality framework with validation rules, quality scoring, and anomaly detection
- Duplicate detection and completeness validation
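Completeness validation can be expressed as a score: the fraction of required fields that are present. The sketch below assumes a hypothetical set of required fields and a pass threshold of 0.8; neither is taken from the library's configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical list of fields every stored email should have
REQUIRED_FIELDS = ("message_id", "sender", "subject", "sent_at", "body")


def completeness(record: Dict[str, object],
                 required: Tuple[str, ...] = REQUIRED_FIELDS) -> Tuple[float, List[str]]:
    """Return (score in [0, 1], missing fields); None, "" and [] count as missing."""
    missing = [f for f in required if record.get(f) in (None, "", [])]
    return 1 - len(missing) / len(required), missing


def validate_batch(records, threshold: float = 0.8):
    """Split a batch into (passed, failed) lists of (message_id, score, missing)."""
    passed, failed = [], []
    for r in records:
        score, missing = completeness(r)
        (passed if score >= threshold else failed).append((r.get("message_id"), score, missing))
    return passed, failed
```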

### Intelligent Caching
- Multi-level caching with LRU, LFU, and TTL strategies
- Redis integration with intelligent cache warming
- Predictive caching and performance optimization
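To make the LRU and TTL strategies concrete, here is a minimal in-process cache that combines both: entries expire after `ttl` seconds, and when the cache is full the least-recently-used entry is evicted. This is a hypothetical sketch, not the library's `IntelligentCacheManager`; it omits LFU and Redis. In practice, `cachetools.TTLCache` (a dependency of this package) provides a ready-made equivalent.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUTTLCache:
    """Bounded cache: LRU eviction plus per-entry expiry after `ttl` seconds."""

    def __init__(self, maxsize: int = 128, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.maxsize, self.ttl, self._clock = maxsize, ttl, clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first
        self.hits = self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The injectable `clock` makes expiry testable without sleeping.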

### Multi-Tenant Support
- Per-tenant data isolation and resource management
- Enterprise-grade security boundaries and access control
- Scalable multi-tenant architecture

### Data Governance
- Governance framework with data lineage tracking
- Data retention policies and compliance monitoring
- GDPR/CCPA compliance validation and reporting
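A retention policy boils down to comparing each message's age against the period for its classification, with an exemption for items under legal hold. The sketch below uses hypothetical labels and periods; they are placeholders, not compliance recommendations, and not the library's policy format.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical retention periods per label; None means "keep indefinitely"
RETENTION = {
    "default": timedelta(days=365 * 2),
    "marketing": timedelta(days=90),
    "legal_hold": None,
}


def expired_messages(emails: List[Dict], now: datetime) -> List[str]:
    """Return the ids of messages older than the retention period for their label."""
    expired = []
    for e in emails:
        # Unknown labels fall back to the default period
        period = RETENTION.get(e.get("label", "default"), RETENTION["default"])
        if period is not None and now - e["received_at"] > period:
            expired.append(e["id"])
    return expired
```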

### Machine Learning Integration
- ML service for email classification and spam detection
- Priority prediction and sentiment analysis
- Model training and evaluation capabilities
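Spam detection is a classic use of multinomial naive Bayes. The dependency-free sketch below shows the technique with add-one (Laplace) smoothing on a toy training set; it is illustrative only and not the library's `MLService`. With scikit-learn installed, `sklearn.naive_bayes.MultinomialNB` is the production-grade equivalent.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log P(label) + sum of log P(token | label), smoothed so unseen words are not zero
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda label: self._log_score(tokens, label))
```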

### Monitoring & Observability
- Distributed tracing via OpenTelemetry
- Prometheus metrics integration and alerting
- Health checking and performance monitoring
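Health checking typically runs a set of probes (database, cache, Microsoft Graph) concurrently, with a timeout on each, and reports overall health only if all pass. The sketch below shows that pattern with plain `asyncio`; the check names and the report shape are hypothetical and not the library's health-check API.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def run_health_checks(
    checks: Dict[str, Callable[[], Awaitable[bool]]], timeout: float = 1.0
) -> Dict[str, object]:
    """Run all checks concurrently; a check that raises or times out counts as unhealthy."""

    async def guarded(check) -> bool:
        try:
            return bool(await asyncio.wait_for(check(), timeout))
        except Exception:  # includes asyncio.TimeoutError and connection errors
            return False

    names = list(checks)
    results = await asyncio.gather(*(guarded(checks[n]) for n in names))
    return {"healthy": all(results), "checks": dict(zip(names, results))}
```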

## Configuration

### Basic Configuration

```python
from evolvishub_outlook_ingestor import Settings

settings = Settings()

# Database configuration
settings.database.host = "localhost"
settings.database.port = 5432
settings.database.database = "outlook_emails"
settings.database.username = "user"
settings.database.password = "password"

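# Microsoft Graph API (credentials from an Azure AD / Entra ID app registration)
# In production, load secrets from environment variables or a secret store instead of hard-coding them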
settings.protocols.graph.client_id = "your-client-id"
settings.protocols.graph.client_secret = "your-client-secret"
settings.protocols.graph.tenant_id = "your-tenant-id"
```

### Advanced Configuration

```python
from evolvishub_outlook_ingestor import Settings

settings = Settings()

# Enable advanced features
settings.enable_analytics = True
settings.enable_ml = True
settings.enable_governance = True
settings.enable_monitoring = True

# Streaming configuration
settings.streaming.backend = "redis"
settings.streaming.redis_url = "redis://localhost:6379"

# ML configuration
settings.ml.enable_spam_detection = True
settings.ml.enable_classification = True
settings.ml.enable_priority_prediction = True

# Governance configuration
settings.governance.enable_compliance_monitoring = True
settings.governance.enable_retention_policies = True
settings.governance.enable_lineage_tracking = True
```

## Advanced Usage

### Complete Pipeline with All Features

```python
import asyncio
from evolvishub_outlook_ingestor import (
    OutlookIngestor,
    AdvancedMonitoringService,
    IntelligentCacheManager,
    MLService,
    DataQualityValidator,
    AnalyticsEngine,
    GovernanceService,
    Settings,
)

async def advanced_pipeline():
    settings = Settings()

    # Initialize core ingestor
    ingestor = OutlookIngestor(settings)

    # Create advanced services, each configured with a plain dict
    services = [
        AdvancedMonitoringService({'enable_tracing': True}),
        IntelligentCacheManager({'backend': 'memory'}),
        MLService({'enable_spam_detection': True}),
        DataQualityValidator({'enable_duplicate_detection': True}),
        AnalyticsEngine({'enable_communication_analysis': True}),
        GovernanceService({'enable_compliance_monitoring': True}),
    ]

    started = []
    try:
        # Initialize services in order, tracking which ones started
        for service in services:
            await service.initialize()
            started.append(service)

        print("All services initialized successfully!")
        print("Advanced email processing pipeline ready")

        # ... run ingestion here, e.g. await ingestor.process_emails()
    finally:
        # Shut down in reverse order, even if initialization or processing failed
        for service in reversed(started):
            await service.shutdown()

asyncio.run(advanced_pipeline())
```

## Performance

### Production Benchmarks

| Configuration | Emails/Minute | Memory Usage | Notes |
|---------------|---------------|--------------|-------|
| Basic Processing | 500-1000 | 128MB | Core ingestion with optimizations |
| With Database Storage | 800-1500 | 256MB | PostgreSQL/MongoDB with connection pooling |
| With Redis Caching | 1200-2000 | 384MB | Intelligent caching enabled |
| Full ML Pipeline | 600-1200 | 512MB | Complete ML classification and analysis |
| Enterprise Setup | 1500-3000 | 1GB | All features with monitoring and governance |
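To turn the emails/minute ranges above into a rough wall-clock estimate for a backlog, divide the backlog size by the rate. For example, a 100,000-message mailbox at the Basic Processing range (500-1000 emails/minute) takes roughly 100 to 200 minutes. A trivial helper (hypothetical name) for doing this arithmetic:

```python
def estimate_minutes(n_emails: int, low_rate: float, high_rate: float):
    """Return (best_case, worst_case) minutes to process a backlog at a rate range in emails/minute."""
    if low_rate <= 0 or high_rate < low_rate:
        raise ValueError("expected 0 < low_rate <= high_rate")
    # Faster rate gives the best case, slower rate the worst case
    return n_emails / high_rate, n_emails / low_rate
```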

### Feature Performance

| Feature | Status | Performance | Notes |
|---------|--------|-------------|-------|
| Real-time Streaming | Production Ready | 2000+ emails/min | Redis + Kafka support |
| ML Classification | Production Ready | 1000+ emails/min | Full sklearn/spacy pipeline |
| Analytics Engine | Production Ready | Real-time insights | Complete communication analysis |
| Intelligent Caching | Production Ready | 95%+ hit rate | Multi-level LRU/LFU/TTL strategies |
| Data Governance | Production Ready | Full compliance | GDPR/CCPA monitoring and reporting |

## Requirements

### System Requirements
- Python 3.9+
- 4GB+ RAM (8GB+ recommended for enterprise features)
- 10GB+ disk space for data storage

### Optional External Services
- Database: PostgreSQL 12+ or MongoDB 4.4+ (for data persistence)
- Message Queue: Redis 6.0+ (for streaming) or Kafka 2.8+ (requires the `streaming` extra, which installs aiokafka)
- Monitoring: Prometheus, Jaeger, InfluxDB (for observability)
- Cache: Redis 6.0+ (for distributed caching)

## Documentation

- [Configuration Reference](docs/CONFIGURATION_REFERENCE.md)
- [Deployment Guide](docs/DEPLOYMENT_GUIDE.md)
- [Advanced Features](docs/ADVANCED_FEATURES.md)
- [API Reference](docs/API_REFERENCE.md)

## License

This project is distributed under the Evolvis AI License, a proprietary license - see the [LICENSE](LICENSE) file for details.

## Support

For support, please contact [support@evolvis.ai](mailto:support@evolvis.ai) or visit our [documentation](https://docs.evolvis.ai).
