Metadata-Version: 2.4
Name: xelytics-core
Version: 0.2.0
Summary: Pure analytics engine for statistical analysis and insight generation
Author: Xelytics Team
License: MIT
Project-URL: Homepage, https://github.com/xelytics/xelytics-core
Project-URL: Documentation, https://xelytics-core.readthedocs.io
Project-URL: Repository, https://github.com/xelytics/xelytics-core.git
Project-URL: Issues, https://github.com/xelytics/xelytics-core/issues
Project-URL: Changelog, https://github.com/xelytics/xelytics-core/releases
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.1.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: statsmodels>=0.14.0
Requires-Dist: pingouin>=0.5.3
Requires-Dist: plotly>=5.17.0
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: redis>=5.0.0
Provides-Extra: llm
Requires-Dist: openai>=1.6.0; extra == "llm"
Requires-Dist: groq>=0.4.0; extra == "llm"
Requires-Dist: httpx>=0.25.0; extra == "llm"
Provides-Extra: advanced
Requires-Dist: ruptures>=1.1.8; extra == "advanced"
Requires-Dist: pmdarima>=2.0.4; extra == "advanced"
Provides-Extra: connectors
Requires-Dist: psycopg2-binary>=2.9.0; extra == "connectors"
Requires-Dist: pymysql>=1.1.0; extra == "connectors"
Requires-Dist: snowflake-connector-python>=3.0.0; extra == "connectors"
Requires-Dist: sqlalchemy>=2.0.0; extra == "connectors"
Requires-Dist: openpyxl>=3.1.0; extra == "connectors"
Requires-Dist: pyarrow>=14.0.0; extra == "connectors"
Requires-Dist: boto3>=1.34.0; extra == "connectors"
Requires-Dist: azure-storage-blob>=12.19.0; extra == "connectors"
Requires-Dist: google-cloud-storage>=2.10.0; extra == "connectors"
Requires-Dist: google-cloud-bigquery>=3.11.0; extra == "connectors"
Requires-Dist: pandas-gbq>=0.19.0; extra == "connectors"
Provides-Extra: export
Requires-Dist: jinja2>=3.1.0; extra == "export"
Requires-Dist: weasyprint>=60.0; extra == "export"
Requires-Dist: python-pptx>=0.6.23; extra == "export"
Requires-Dist: nbformat>=5.7.0; extra == "export"
Requires-Dist: kaleido>=0.2.1; extra == "export"
Provides-Extra: large-data
Requires-Dist: dask[dataframe]>=2024.1.0; extra == "large-data"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.11.0; extra == "dev"
Requires-Dist: mypy>=1.7.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# Xelytics-Core

**Pure analytics engine for statistical analysis and insight generation.**

**Current Version**: 0.2.0-alpha.1 (in development)  
**Status**: Alpha; Phases 1-3 complete

## What's New in v0.2.0

✅ **Phase 1 (Foundation)**: Complete
- Extended schemas for v0.2.0 features
- Backward compatibility guaranteed

✅ **Phase 2 (Time Series Analysis)**: Complete
- Time series detection and validation
- Trend and seasonality decomposition (STL, classical)
- ARIMA and Exponential Smoothing forecasting
- Anomaly detection (Z-score, IQR, MAD, Isolation Forest)
- Change point detection

✅ **Phase 3 (Clustering)**: Complete
- K-Means clustering with optimal K selection
- DBSCAN density-based clustering
- Hierarchical/Agglomerative clustering
- Cluster profiling and characterization

🚧 **Coming Soon** (Phases 4-7):
- Performance optimization (parallel processing, caching)
- Enhanced LLM providers (Anthropic, Azure, Gemini)
- Database connectors (PostgreSQL, MySQL, BigQuery)
- Interactive HTML report generation

## Installation

Install from a source checkout in editable mode:

```bash
pip install -e .
```
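The optional extras declared in the package metadata can be pulled in the same way (extra names taken from the metadata above):

```bash
# Optional extras, as declared in the package metadata
pip install -e ".[llm]"          # LLM-backed insight generation
pip install -e ".[connectors]"   # database and cloud-storage connectors
pip install -e ".[dev]"          # test and lint tooling
```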

## Quick Start

```python
from xelytics import analyze, AnalysisConfig
import pandas as pd

# Load your data
df = pd.read_csv("data.csv")

# Run automated analysis
result = analyze(df, mode="automated")

# Access results
print(f"Analyzed {result.metadata.row_count} rows")
print(f"Found {len(result.statistics)} statistical tests")
print(f"Generated {len(result.visualizations)} visualizations")
print(f"Produced {len(result.insights)} insights")

# Export to JSON
json_output = result.to_json()
```

## API Contract

```python
from xelytics import analyze, AnalysisConfig, AnalysisResult

result = analyze(
    data=df,
    mode="automated",  # or "semi-automated"
    config=AnalysisConfig(
        significance_level=0.05,
        enable_llm_insights=True,
        max_visualizations=10,
    )
)
```

## Output Schema

```python
AnalysisResult(
    summary=DatasetSummary(...),
    statistics=[StatisticalTestResult(...), ...],
    visualizations=[VisualizationSpec(...), ...],
    insights=[Insight(...), ...],
    metadata=RunMetadata(...),
)
```
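For illustration only, the shape of this result can be sketched with plain dataclasses. The classes below are reduced stand-ins for the real xelytics types, which carry many more fields:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Illustrative stand-ins for the schema above, not the real xelytics classes.
@dataclass
class Insight:
    title: str
    detail: str

@dataclass
class RunMetadata:
    row_count: int
    column_count: int

@dataclass
class AnalysisResult:
    insights: List[Insight] = field(default_factory=list)
    metadata: Optional[RunMetadata] = None

    def to_json(self) -> str:
        # dataclasses.asdict recurses into nested dataclasses
        return json.dumps(asdict(self))

result = AnalysisResult(
    insights=[Insight("High churn", "Churn is concentrated in segment B")],
    metadata=RunMetadata(row_count=1000, column_count=12),
)
payload = json.loads(result.to_json())
```

`dataclasses.asdict` recursing into nested dataclasses is one straightforward way a `to_json()` method like the one above can be built.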

## v0.2.0 Features

### Time Series Analysis

```python
from xelytics import analyze, AnalysisConfig

config = AnalysisConfig(
    enable_time_series=True,
    datetime_column='date',
    forecast_periods=30
)

result = analyze(df, config=config)

# Access time series results
for ts in result.time_series_analysis:
    print(f"Trend: {ts.has_trend}")
    print(f"Seasonality: {ts.has_seasonality}")
    print(f"Period: {ts.seasonal_period}")
```
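For intuition about the decomposition step, a classical additive decomposition can be approximated in a few lines of plain pandas: a centered moving average estimates the trend, per-position means of the detrended series estimate seasonality, and the remainder is the residual. This is a simplified sketch of the technique, not xelytics' implementation:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly seasonality + noise
rng = np.random.default_rng(0)
n, period = 120, 12
t = np.arange(n)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, n))

# Trend: centered moving average over one full seasonal period
trend = series.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position in the cycle
detrended = series - trend
seasonal = detrended.groupby(t % period).transform("mean")

# Residual: whatever trend and seasonality do not explain
residual = series - trend - seasonal
```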

### Clustering

```python
from xelytics import analyze, AnalysisConfig

config = AnalysisConfig(
    enable_clustering=True,
    clustering_algorithm='kmeans',  # or 'dbscan', 'hierarchical'
    max_clusters=5
)

result = analyze(df, config=config)

# Access cluster results
for cluster in result.clusters:
    print(f"Cluster {cluster.cluster_id}: {cluster.size} samples")
    print(f"Silhouette score: {cluster.silhouette_score}")
```
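The core of K-Means with optimal-K selection can be reproduced with scikit-learn, which is already a required dependency. This is a standalone sketch of the technique, not the engine's internal code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated synthetic blobs
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
])

# Pick K by maximizing the silhouette score, one common "optimal K" criterion
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```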

### Standalone Modules

```python
# Time Series
from xelytics.timeseries import decompose_time_series, forecast_time_series, detect_anomalies

decomposition = decompose_time_series(df, 'value', datetime_column='date', period=12)
forecast = forecast_time_series(df, 'value', periods=30, method='arima')
anomalies = detect_anomalies(df, 'value', method='zscore', threshold=3.0)

# Clustering
from xelytics.clustering import cluster_kmeans, cluster_dbscan, profile_clusters

kmeans_result, df_clustered = cluster_kmeans(df, n_clusters=5)
dbscan_result, df_clustered = cluster_dbscan(df, eps=0.5, min_samples=5)
profiles = profile_clusters(df, cluster_labels)
```
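As a point of reference, the `'zscore'` anomaly rule amounts to flagging points whose distance from the mean exceeds `threshold` standard deviations. A minimal pandas equivalent (not the library's code):

```python
import pandas as pd

def zscore_anomalies(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where |x - mean| / std exceeds the threshold."""
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold

values = pd.Series([10, 11, 9, 10, 12, 10, 11, 150])  # one obvious outlier
# On tiny samples the outlier itself inflates the std, so a lower
# threshold is used here than the 3.0 default.
mask = zscore_anomalies(values, threshold=2.0)
```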

## Design Principles

1. **Pure analytics engine** - No HTTP, no database, no auth
2. **Deterministic** - Same input = same output
3. **LLM is optional** - Rule-based insights work without LLM
4. **Type-safe** - All inputs/outputs are typed dataclasses
5. **Backward compatible** - v0.1.0 code works in v0.2.0

## License

MIT
