Metadata-Version: 2.4
Name: phani-data-recon
Version: 1.0.4
Summary: SAP <-> Salesforce Account Data Reconciliation Utility
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: duckdb>=0.10
Requires-Dist: pandas>=2.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: jinja2>=3.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: jsonschema>=4.0
Requires-Dist: rapidfuzz>=3.0
Requires-Dist: great-expectations>=0.18
Requires-Dist: rich>=13.0

# SAP ↔ Salesforce Account Data Reconciliation Utility

Reconcile SAP and Salesforce Account master data at bulk scale (300K–400K records).
Produces a 10-tab Excel workbook and an HTML dashboard with KPIs, field-level diffs,
fuzzy match candidates, and a prioritised action plan.

## Quick Start

### Installation

```bash
# Install from PyPI
py -m pip install phani-data-recon

# Upgrade to the latest version
py -m pip install --upgrade phani-data-recon

# Verify installed version
py -m pip show phani-data-recon
```

### Run the Application

**Using the CLI command:**

```bash
# Verify the CLI is available
reconcile-accounts --help

# Run with explicit SAP and Salesforce input files
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv
```

**Using the Python module (alternative if the CLI is not on PATH):**

```bash
py -m phani_data_recon.cli --sap input/sap_accounts.csv --sf input/sf_accounts.csv
```

### Configuration & Input/Output

**Configure input and output folders via config file:**

Edit `config/rules.yaml` to set default paths:

```yaml
input:
  sap:
    directory: "input"
    file_name: "sap_accounts.csv"
  sf:
    directory: "input"
    file_name: "sf_accounts.csv"

output:
  formats: ["excel", "html"]
  report:
    directory: "output"
    file_name: "reconciliation_report"
```

Then run with config:

```bash
reconcile-accounts --config config/rules.yaml
py -m phani_data_recon.cli --config config/rules.yaml
```

**Override output directory at runtime:**

```bash
reconcile-accounts --config config/rules.yaml --output-dir output/custom_run
py -m phani_data_recon.cli --config config/rules.yaml --output-dir output/custom_run
```

### Validation & Advanced Options

**Validate headers and config only (dry-run):**

```bash
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv --dry-run
py -m phani_data_recon.cli --sap input/sap_accounts.csv --sf input/sf_accounts.csv --dry-run
```
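A dry run checks that the input headers match what the config expects before any report is written. The sketch below shows one way such a header check could work using only the standard library. The helper name `missing_headers` is hypothetical and this is not the package's implementation; the example column name `SAP_Unique_ID` comes from the `join` block in the config reference.

```python
import csv
from pathlib import Path


def missing_headers(csv_path, required):
    """Return the required column names that are absent from the CSV header row.

    Hypothetical helper for illustration; not part of phani-data-recon.
    """
    # utf-8-sig strips a leading BOM, which Excel often writes into CSV exports
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    # Ignore stray spaces around header names such as "ID, Name"
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For example, `missing_headers("input/sap_accounts.csv", ["SAP_Unique_ID"])` returns an empty list when the column is present.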

**Generate only HTML output:**

```bash
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv --formats html
py -m phani_data_recon.cli --sap input/sap_accounts.csv --sf input/sf_accounts.csv --formats html
```

**Skip fuzzy matching (faster for large files):**

```bash
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv --no-fuzzy
py -m phani_data_recon.cli --sap input/sap_accounts.csv --sf input/sf_accounts.csv --no-fuzzy
```
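Fuzzy matching tends to dominate run time because, done naively, every unmatched SAP name is scored against every unmatched Salesforce name. The sketch below illustrates threshold-based candidate generation using the standard-library `difflib.SequenceMatcher` as a stand-in; the package itself depends on `rapidfuzz`, and its actual scorer, normalization, and threshold semantics may differ. The function name and the 0 to 100 threshold scale are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_candidates(sap_names, sf_names, threshold=90):
    """Yield (sap_name, sf_name, score) pairs whose similarity meets the threshold.

    Naive all-pairs comparison: len(sap_names) * len(sf_names) scores, which is
    why skipping this step saves the most time on large files.
    Hypothetical sketch; difflib stands in for rapidfuzz here.
    """
    for sap in sap_names:
        for sf in sf_names:
            # ratio() is in [0, 1]; scale to 0-100 to mirror a percentage threshold
            score = 100 * SequenceMatcher(None, sap.lower(), sf.lower()).ratio()
            if score >= threshold:
                yield sap, sf, round(score, 1)
```

Production tools usually cut the pair count with blocking (for example, only comparing names that share a country or postcode) before scoring.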

**Enable verbose logging:**

```bash
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv --verbose
py -m phani_data_recon.cli --sap input/sap_accounts.csv --sf input/sf_accounts.csv --verbose
```

Platform-specific path examples:

Windows:

```powershell
reconcile-accounts --sap .\input\sap_accounts.csv --sf .\input\sf_accounts.csv
py -m phani_data_recon.cli --config .\config\rules.yaml --dry-run
```

macOS:

```bash
reconcile-accounts --sap ./input/sap_accounts.csv --sf ./input/sf_accounts.csv
python3 -m phani_data_recon.cli --config ./config/rules.yaml --dry-run
```

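If Windows `cmd` does not recognize `reconcile-accounts`, add your Python Scripts directory to PATH and reopen `cmd`. Adjust `pythoncore-3.14-64` below to match your installed Python:

```cmd
setx PATH "%PATH%;%LOCALAPPDATA%\Python\pythoncore-3.14-64\Scripts"
```

`setx` truncates values longer than 1024 characters, so if your PATH is already long, add the directory through the Windows Environment Variables dialog instead.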

Then verify:

```cmd
where reconcile-accounts
reconcile-accounts --help
```

For local development in this repository, an editable install also works:

```bash
py -m pip install -e .
```

## Package Usage

```bash
# Explicit input files
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv

# Config-driven execution
reconcile-accounts --config config/rules.yaml

# Override only the output directory
reconcile-accounts --config config/rules.yaml --output-dir output/run_2026_05_11

# Generate only HTML output
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv --formats html
```

If the console script is not available on your PATH, use:

```bash
py -m phani_data_recon.cli --dry-run
```

Production verification on Windows `cmd`:

```cmd
py -m pip show phani-data-recon
where reconcile-accounts
py -m phani_data_recon.cli --sap input/sap_accounts.csv --sf input/sf_accounts.csv --dry-run
```

Expected state:
- Installed version should match the current release, `1.0.4`.
- If `where reconcile-accounts` is empty but module execution works, only PATH needs to be fixed.
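The same checks can be scripted from Python with the standard library, for example in a deployment smoke test. This is a minimal sketch: `importlib.metadata.version` reads the installed distribution metadata that `pip show` also reports, and `shutil.which` performs the same PATH lookup as `where`. The function name `install_status` is hypothetical.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version


def install_status(dist_name="phani-data-recon", script="reconcile-accounts"):
    """Report the installed distribution version and the resolved console-script path."""
    try:
        installed = version(dist_name)  # same value `pip show` prints as Version
    except PackageNotFoundError:
        installed = None  # not installed for this interpreter
    # which() returns None when the script is not on PATH (empty `where` output)
    return {"version": installed, "script": shutil.which(script)}


if __name__ == "__main__":
    status = install_status()
    print(f"version: {status['version'] or 'not installed'}")
    print(f"script:  {status['script'] or 'not on PATH'}")
```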

### Python API

Run reconciliation from another Python application:

```python
from phani_data_recon.api import run_reconciliation

exit_code = run_reconciliation(
    sap="input/sap_accounts.csv",
    sf="input/sf_accounts.csv",
    config="config/rules.yaml",
    output_dir="output/api_run",
    formats=["excel", "html"],
    dry_run=False,
    no_fuzzy=False,
    verbose=True,
)

print(exit_code)
```

The API mirrors the CLI behavior and returns a process-style exit code.
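Because the API returns an exit code rather than raising, callers that embed it in a larger job may want to fail fast on a non-zero result. A minimal sketch, assuming `0` means success as with a normal process exit code; `run_or_raise` is a hypothetical helper, and only the `run_reconciliation` keyword arguments come from the example above.

```python
def run_or_raise(runner=None, **kwargs):
    """Call the reconciliation API and raise if it returns a non-zero exit code.

    Assumes 0 means success. `runner` defaults to
    phani_data_recon.api.run_reconciliation and can be swapped out in tests.
    Hypothetical wrapper, not part of the package.
    """
    if runner is None:
        # Imported lazily so this helper can be defined without the package installed
        from phani_data_recon.api import run_reconciliation as runner
    exit_code = runner(**kwargs)
    if exit_code != 0:
        raise RuntimeError(f"reconciliation failed with exit code {exit_code}")
    return exit_code
```

Usage: `run_or_raise(sap="input/sap_accounts.csv", sf="input/sf_accounts.csv", config="config/rules.yaml", dry_run=True)`.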

## Options

```
--sap         Path to SAP accounts CSV (optional if config input.sap is set)
--sf          Path to Salesforce accounts CSV (optional if config input.sf is set)
--config      Path to rules YAML (default: ./config/rules.yaml, then packaged default)
--output-dir  Output directory (default: from config)
--formats     excel html (default: both)
--dry-run     Validate config + headers only; no report written
--no-fuzzy    Skip fuzzy matching (faster for large files)
--verbose     Verbose logging
```

Path resolution precedence:
- If `--sap` / `--sf` are passed, CLI values are used.
- If not passed, values are resolved from the active config file under `input.sap` and `input.sf` (`directory` + `file_name`).
- If `--config` is not passed, the CLI tries local `config/rules.yaml` first and then the packaged default config.
- If `--output-dir` is passed, it overrides `output.report.directory`.
- If neither CLI nor config provides paths, the run exits with an input path error.
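The precedence rules above can be read as the following sketch, assuming the config has already been parsed into a plain dictionary (for example with PyYAML). The function names, `InputPathError`, and the `packaged_default` argument are illustrative assumptions, not the package's internals.

```python
from pathlib import Path


class InputPathError(ValueError):
    """Raised when neither the CLI nor the config supplies an input path."""


def pick_config(cli_config, packaged_default, local=Path("config/rules.yaml")):
    """Choose the rules file: --config, then ./config/rules.yaml, then the packaged default."""
    if cli_config:
        return Path(cli_config)
    if local.is_file():
        return local
    return Path(packaged_default)


def resolve_input(system, cli_value, config):
    """Resolve the CSV path for system "sap" or "sf".

    An explicit --sap / --sf value wins; otherwise input.<system>.directory and
    input.<system>.file_name from the parsed config are joined; otherwise fail.
    """
    if cli_value:
        return Path(cli_value)
    block = (config.get("input") or {}).get(system) or {}
    if block.get("file_name"):
        # A missing directory is treated as the current directory (an assumption)
        return Path(block.get("directory") or ".") / block["file_name"]
    raise InputPathError(f"no input path for {system!r}: pass --{system} or set input.{system}")
```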

## Configuration

Edit `config/rules.yaml` to change:
- Default input files via `input.sap` and `input.sf` (`directory` + `file_name`)
- Join key columns (SAP ↔ SF linking fields)
- Fallback-key matching toggle via `join.fallback.enabled` (default: `false` = primary-key-only matching)
- Field comparison rules, severity levels, and normalize modes
- Deduplication strategy (`keep_first` / `keep_last` / `flag_all`)
- Fuzzy match threshold and fields
- Output formats and directory
- Output report location/name via `output.report.directory` + `output.report.file_name`

When using the package outside this repository, pass your own config file with `--config` if you do not want to rely on the packaged defaults.

### Output Report Configuration

Use the `output.report` block in `config/rules.yaml` to control where reports are written and what base filename is used.

```yaml
output:
  formats: ["excel", "html"]
  report:
    directory: "output/month_end"
    file_name: "customer_reconciliation"
```

This writes reports under `output/month_end/` using `customer_reconciliation` as the base name, for example:
- `output/month_end/customer_reconciliation_<run_id>.html`
- `output/month_end/customer_reconciliation_<run_id>.xlsx`

Rules:
- `--output-dir` overrides `output.report.directory`
- `output.report.file_name` sets the report filename prefix
- `output.formats` selects Excel, HTML, or both
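Taken together, these rules amount to the path-building logic sketched below. It is an illustration, not the package's code: the function name and the fallback defaults (copied from the config reference in this README) are assumptions, and `run_id` is supplied by the caller because its format is not documented here.

```python
from pathlib import Path

# File extension per output format, matching the example file names above
EXTENSIONS = {"excel": "xlsx", "html": "html"}


def report_paths(config, run_id, output_dir=None):
    """Return the report paths a run would write, following the output rules.

    output_dir plays the role of --output-dir and overrides output.report.directory;
    output.report.file_name is the prefix; output.formats selects the files.
    """
    output = config.get("output", {})
    report = output.get("report", {})
    directory = Path(output_dir or report.get("directory", "output"))
    prefix = report.get("file_name", "reconciliation_report")
    formats = output.get("formats", ["excel", "html"])
    return [directory / f"{prefix}_{run_id}.{EXTENSIONS[fmt]}" for fmt in formats]
```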

Example commands:

```bash
# Use output settings from config
reconcile-accounts --config config/rules.yaml

# Override only the output directory at runtime
reconcile-accounts --config config/rules.yaml --output-dir output/ad_hoc_run
```

### Config Reference (Input + Join)

```yaml
input:
  sap:
    directory: "input"
    file_name: "sap_accounts.csv"
  sf:
    directory: "input"
    file_name: "sf_accounts.csv"

join:
  primary:
    sap_col: "SAP_Unique_ID"
    sf_col:  "BP_PowerCerv_Account_Id__c"
  fallback:
    enabled: false
    sap_col: "SAP_Unique_ID"
    sf_col:  "WC_SAP_Identification__c"

output:
  formats: ["excel", "html"]
  report:
    directory: "output"
    file_name: "reconciliation_report"
```

Notes:
- Set `join.fallback.enabled: false` for strict primary-key-only matching (default).
- Set `join.fallback.enabled: true` only when you explicitly want fallback-key matching.
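The difference between the two settings can be sketched as follows: with fallback disabled, a record links only on the primary key pair; with it enabled, SAP records the primary key could not link get a second attempt against the fallback Salesforce column. This is an illustrative sketch over lists of dicts, not the package's implementation, and the retry-only-unmatched-rows behavior is an assumption.

```python
def match_accounts(sap_rows, sf_rows, primary, fallback=None):
    """Link SAP rows to Salesforce rows by key, optionally retrying on a fallback key.

    primary / fallback are (sap_col, sf_col) tuples mirroring join.primary and
    join.fallback; pass fallback=None for join.fallback.enabled: false.
    Returns (matches, sap_only, sf_only); matches holds (sap_row, sf_row, key_used).
    Hypothetical sketch: assumes keys are already deduplicated.
    """
    def index(rows, col):
        # Rows with an empty key cannot be linked on that column
        return {row[col]: row for row in rows if row.get(col)}

    matches, used_sf, unmatched = [], set(), []
    sf_by_primary = index(sf_rows, primary[1])
    for sap in sap_rows:
        sf = sf_by_primary.get(sap.get(primary[0]))
        if sf is not None:
            matches.append((sap, sf, "primary"))
            used_sf.add(id(sf))
        else:
            unmatched.append(sap)

    sap_only = unmatched
    if fallback is not None:
        # Retry only unlinked SAP rows, and only against still-unlinked SF rows
        candidates = index(sf_rows, fallback[1])
        sf_by_fallback = {k: r for k, r in candidates.items() if id(r) not in used_sf}
        sap_only = []
        for sap in unmatched:
            sf = sf_by_fallback.pop(sap.get(fallback[0]), None)  # pop: link each SF row once
            if sf is not None:
                matches.append((sap, sf, "fallback"))
                used_sf.add(id(sf))
            else:
                sap_only.append(sap)

    sf_only = [row for row in sf_rows if id(row) not in used_sf]
    return matches, sap_only, sf_only
```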

## Report Tabs

| Tab | Content |
|-----|---------|
| Summary | KPI counts, match rate, exception rate |
| Exact_Matches | Records found in both systems |
| Field_Mismatches | Field-level diffs (CRITICAL / HIGH / INFO) |
| SAP_Only | SAP records missing from Salesforce |
| SF_Only | Salesforce records missing from SAP |
| SAP_Duplicates | Duplicate SAP rows before dedup |
| SF_Duplicates | Duplicate SF rows before dedup |
| Fuzzy_Match_Candidates | Likely-same accounts not linked by ID |
| Data_Quality_Issues | Null IDs, bad formats, validation failures |
| Action_Plan | P1–P4 prioritised remediation table |
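The Field_Mismatches tab holds field-level differences for records linked in both systems, each tagged with a severity. A minimal sketch of that comparison, assuming a list of field rules with one severity each and a trim-and-ignore-case normalization; the rule format and the SAP column names `Name1` and `Country` are hypothetical, and the package's configured normalize modes may behave differently.

```python
def field_mismatches(matches, rules):
    """Compare mapped fields on linked record pairs and return Field_Mismatches-style rows.

    matches: iterable of (sap_row, sf_row) dict pairs linked by key.
    rules: list of dicts like {"sap": "Country", "sf": "BillingCountry", "severity": "CRITICAL"}.
    Hypothetical sketch; not the package's rule engine.
    """
    def normalize(value):
        # Trim whitespace and ignore case: one possible normalize mode
        return "" if value is None else str(value).strip().casefold()

    diffs = []
    for sap, sf in matches:
        for rule in rules:
            sap_val, sf_val = sap.get(rule["sap"]), sf.get(rule["sf"])
            if normalize(sap_val) != normalize(sf_val):
                diffs.append({
                    "field": rule["sap"],
                    "sap_value": sap_val,
                    "sf_value": sf_val,
                    "severity": rule["severity"],
                })
    # Most severe first (CRITICAL, HIGH, INFO); unknown severities sort last
    order = {"CRITICAL": 0, "HIGH": 1, "INFO": 2}
    return sorted(diffs, key=lambda d: order.get(d["severity"], len(order)))
```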

## Run Tests

```bash
py -m pip install pytest
py -m pytest tests/ -v
```

## Distribution (Business Rollout)

```bash
# Build wheel + source distribution
py -m pip install build
py -m build

# Install locally from wheel
py -m pip install dist/phani_data_recon-1.0.4-py3-none-any.whl
```

If `reconcile-accounts` is not on PATH, run:

```bash
py -m phani_data_recon.cli --dry-run
```

Published package:

```bash
py -m pip install --upgrade phani-data-recon
```

Legacy script usage inside this repository still works:

```bash
python run_reconciliation.py --dry-run
```

## CI Publishing (GitHub Actions)

This repository includes [`.github/workflows/publish-pypi.yml`](.github/workflows/publish-pypi.yml) to publish new releases to PyPI without storing a PyPI API token in GitHub.

One-time PyPI setup:
- In PyPI, open the `phani-data-recon` project settings.
- Add a Trusted Publisher for this GitHub repository.
- Set the workflow name to `publish-pypi.yml`.
- Set the environment name to `pypi`.

Release flow:
- Bump the version in `pyproject.toml`.
- Create a GitHub release or run the workflow manually from the Actions tab.
- The workflow builds `dist/` artifacts and publishes them with PyPI trusted publishing.

Notes:
- This workflow uses GitHub OIDC via `id-token: write`, so no `TWINE_PASSWORD` secret is required in GitHub.
- Keep local `twine` usage only for manual emergency releases.

## Project Structure

```
reconciliation_project/
├── input/           ← Place source CSVs here
├── config/          ← rules.yaml + schema
├── src/             ← All Python modules
├── templates/       ← Jinja2 HTML template
├── tests/           ← pytest test suite
├── output/          ← Reports generated here
└── run_reconciliation.py
```
