Metadata-Version: 2.4
Name: phani-data-recon
Version: 1.0.1
Summary: SAP <-> Salesforce Account Data Reconciliation Utility
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: duckdb>=0.10
Requires-Dist: pandas>=2.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: jinja2>=3.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: jsonschema>=4.0
Requires-Dist: rapidfuzz>=3.0
Requires-Dist: great-expectations>=0.18
Requires-Dist: rich>=13.0

# SAP ↔ Salesforce Account Data Reconciliation Utility

Reconcile SAP and Salesforce Account master data at bulk scale (300K–400K records).
Produces a 10-tab Excel workbook and an HTML dashboard with KPIs, field-level diffs,
fuzzy match candidates, and a prioritised action plan.

## Quick Start

```bash
# 1. Install package in editable mode (library + CLI)
pip install -e .

# 2. Place your CSV files in input/
#    input/sap_accounts.csv
#    input/sf_accounts.csv

# 3. Run using installed CLI command
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv

# 4. Or run using config input paths (config/rules.yaml -> input.sap/input.sf)
reconcile-accounts

# Output written to output/
```

Legacy script invocation still works:

```bash
python run_reconciliation.py
```

## Options

```
--sap         Path to SAP accounts CSV (optional if config input.sap is set)
--sf          Path to Salesforce accounts CSV (optional if config input.sf is set)
--config      Path to rules YAML (default: ./config/rules.yaml, then packaged default)
--output-dir  Output directory (default: from config)
--formats     excel html (default: both)
--dry-run     Validate config + headers only; no report written
--no-fuzzy    Skip fuzzy matching (faster for large files)
--verbose     Verbose logging
```

Path resolution precedence:
- If `--sap` / `--sf` are passed, CLI values are used.
- If not passed, values are resolved from `config/rules.yaml` under `input.sap` and `input.sf`.
- If neither CLI nor config provides paths, the run exits with an input path error.
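
The precedence above can be sketched in a few lines of Python. This is an illustration only, not the package's actual code: `resolve_input_path` is a hypothetical helper, and the assumption that `directory` and `file_name` are simply joined is inferred from the config reference.

```python
from pathlib import Path
from typing import Optional


def resolve_input_path(cli_value: Optional[str],
                       config_entry: Optional[dict],
                       label: str) -> Path:
    """Hypothetical helper: the CLI value wins, then config, else an error."""
    if cli_value:
        # 1. An explicit --sap / --sf argument takes precedence.
        return Path(cli_value)
    if config_entry and config_entry.get("file_name"):
        # 2. Fall back to input.<label>.directory + input.<label>.file_name.
        return Path(config_entry.get("directory", ".")) / config_entry["file_name"]
    # 3. Neither source supplied a path.
    raise ValueError(
        f"No input path for {label}: pass --{label} or set input.{label} in the config"
    )


# Example config with only the SAP input defined (assumed shape).
config = {"input": {"sap": {"directory": "input", "file_name": "sap_accounts.csv"}}}

sap_path = resolve_input_path(None, config["input"].get("sap"), "sap")
sf_path = resolve_input_path("data/sf.csv", config["input"].get("sf"), "sf")
```

Here `sap_path` comes from the config and `sf_path` from the CLI value; calling the helper with neither source raises the input path error.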

## Configuration

Edit `config/rules.yaml` to change:
- Default input files via `input.sap` and `input.sf` (`directory` + `file_name`)
- Join key columns (SAP ↔ SF linking fields)
- Fallback-key matching toggle via `join.fallback.enabled` (default: `false` = primary-key-only matching)
- Field comparison rules, severity levels, and normalize modes
- Deduplication strategy (`keep_first` / `keep_last` / `flag_all`)
- Fuzzy match threshold and fields
- Output formats and directory
- Output report location/name via `output.report.directory` + `output.report.file_name`
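
To make the three deduplication strategies concrete, here is a minimal pure-Python sketch. It is one plausible reading, not the utility's implementation: `dedupe` is a hypothetical function, and the assumption that `flag_all` reports every row of a duplicated key and keeps none of them is mine.

```python
from collections import defaultdict


def dedupe(rows, key, strategy="keep_first"):
    """Hypothetical sketch: return (kept_rows, duplicate_rows) for dict rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # group rows by their key value

    kept, duplicates = [], []
    for group in groups.values():
        if len(group) == 1:
            kept.extend(group)            # unique key: always kept
        elif strategy == "keep_first":
            kept.append(group[0])
            duplicates.extend(group[1:])
        elif strategy == "keep_last":
            kept.append(group[-1])
            duplicates.extend(group[:-1])
        elif strategy == "flag_all":
            duplicates.extend(group)      # assumption: no row survives
        else:
            raise ValueError(f"Unknown dedup strategy: {strategy}")
    return kept, duplicates


rows = [
    {"SAP_Unique_ID": "A1", "Name": "Acme"},
    {"SAP_Unique_ID": "A1", "Name": "Acme Corp"},
    {"SAP_Unique_ID": "B2", "Name": "Beta"},
]

kept_first, dups_first = dedupe(rows, "SAP_Unique_ID", "keep_first")
kept_last, dups_last = dedupe(rows, "SAP_Unique_ID", "keep_last")
kept_flag, dups_flag = dedupe(rows, "SAP_Unique_ID", "flag_all")
```

With `keep_first` the "Acme" row survives, with `keep_last` the "Acme Corp" row survives, and with `flag_all` only the unique "Beta" row is kept.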

### Config Reference (Input + Join)

```yaml
input:
  sap:
    directory: "input"
    file_name: "sap_accounts.csv"
  sf:
    directory: "input"
    file_name: "sf_accounts.csv"

join:
  primary:
    sap_col: "SAP_Unique_ID"
    sf_col:  "BP_PowerCerv_Account_Id__c"
  fallback:
    enabled: false
    sap_col: "SAP_Unique_ID"
    sf_col:  "WC_SAP_Identification__c"

output:
  formats: ["excel", "html"]
  report:
    directory: "output"
    file_name: "reconciliation_report"
```

Notes:
- Set `join.fallback.enabled: false` for strict primary-key-only matching (default).
- Set `join.fallback.enabled: true` only when you explicitly want fallback-key matching.
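
The effect of the `join.fallback.enabled` toggle can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: `match_accounts` is a hypothetical function, the real utility lists DuckDB and pandas among its dependencies rather than plain dictionaries, and the sketch does not guard against one Salesforce row being matched twice.

```python
def match_accounts(sap_rows, sf_rows, join):
    """Hypothetical sketch: match on the primary key, then optionally the fallback key."""
    primary = join["primary"]
    fallback = join.get("fallback", {})
    use_fallback = bool(fallback.get("enabled"))

    def index_by(col):
        # Map non-empty key values to their Salesforce row.
        return {row[col]: row for row in sf_rows if row.get(col)}

    sf_by_primary = index_by(primary["sf_col"])
    sf_by_fallback = index_by(fallback["sf_col"]) if use_fallback else {}

    matched, sap_only = [], []
    for sap in sap_rows:
        sf, how = sf_by_primary.get(sap.get(primary["sap_col"])), "primary"
        if sf is None and use_fallback:
            sf, how = sf_by_fallback.get(sap.get(fallback["sap_col"])), "fallback"
        if sf is None:
            sap_only.append(sap)
        else:
            matched.append((sap, sf, how))
    return matched, sap_only


join = {
    "primary": {"sap_col": "SAP_Unique_ID", "sf_col": "BP_PowerCerv_Account_Id__c"},
    "fallback": {"enabled": False, "sap_col": "SAP_Unique_ID",
                 "sf_col": "WC_SAP_Identification__c"},
}
sap_rows = [{"SAP_Unique_ID": "1001"}, {"SAP_Unique_ID": "1002"}]
sf_rows = [
    {"BP_PowerCerv_Account_Id__c": "1001", "WC_SAP_Identification__c": None},
    {"BP_PowerCerv_Account_Id__c": None, "WC_SAP_Identification__c": "1002"},
]

strict_matched, strict_sap_only = match_accounts(sap_rows, sf_rows, join)

join_with_fallback = {**join, "fallback": {**join["fallback"], "enabled": True}}
loose_matched, loose_sap_only = match_accounts(sap_rows, sf_rows, join_with_fallback)
```

In strict mode account `1002` ends up SAP-only; with the fallback enabled it is matched through `WC_SAP_Identification__c` instead.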

## Report Tabs

| Tab | Content |
|-----|---------|
| Summary | KPI counts, match rate, exception rate |
| Exact_Matches | Records found in both systems |
| Field_Mismatches | Field-level diffs (CRITICAL / HIGH / INFO) |
| SAP_Only | SAP records missing from Salesforce |
| SF_Only | Salesforce records missing from SAP |
| SAP_Duplicates | Duplicate SAP rows before dedup |
| SF_Duplicates | Duplicate SF rows before dedup |
| Fuzzy_Match_Candidates | Likely-same accounts not linked by ID |
| Data_Quality_Issues | Null IDs, bad formats, validation failures |
| Action_Plan | P1–P4 prioritised remediation table |

## Run Tests

```bash
pip install pytest
python -m pytest tests/ -v
```

## Distribution (Business Rollout)

```bash
# Install the build frontend (needed once), then build wheel + source distribution
pip install build
python -m build

# Install locally from wheel
pip install dist/phani_data_recon-1.0.1-py3-none-any.whl
```

If `reconcile-accounts` is not on PATH, run:

```bash
python -m phani_data_recon.cli --dry-run
```

## Project Structure

```
reconciliation_project/
├── input/           ← Place source CSVs here
├── config/          ← rules.yaml + schema
├── src/             ← All Python modules
├── templates/       ← Jinja2 HTML template
├── tests/           ← pytest test suite
├── output/          ← Reports generated here
└── run_reconciliation.py
```
