Metadata-Version: 2.4
Name: finforge
Version: 3.0.0
Summary: Synthetic financial transaction data generation with persona-driven behavior simulation.
Author: Shivangi Shukla
Maintainer: FinForge maintainers
License-Expression: MIT
Project-URL: Homepage, https://github.com/shivangis22/finforge
Project-URL: Repository, https://github.com/shivangis22/finforge
Project-URL: Issues, https://github.com/shivangis22/finforge/issues
Project-URL: Documentation, https://github.com/shivangis22/finforge#readme
Project-URL: Changelog, https://github.com/shivangis22/finforge/blob/main/CHANGELOG.md
Keywords: synthetic-data,finance,transactions,simulation,testing
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: faker>=24.0
Requires-Dist: pydantic>=2.6
Requires-Dist: python-dateutil>=2.8
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# FinForge v3.0.0

FinForge is a Python library for generating realistic synthetic financial transaction datasets with persistent personas, temporal balance consistency, business cashflow simulation, and reproducible fraud and anomaly scenarios.

## FinForge v3.0.0 — Fraud, Anomaly & Risk Simulation

FinForge v3 adds a post-generation risk layer on top of the normal v1/v2 behavioral engine:

- fraud injection engine
- anomaly simulation engine
- rule-based risk scoring
- fraud scenario IDs
- persona-aware fraud patterns
- fraud summary utilities
- lightweight fraud feature extraction for ML workflows

Fraud is injected after normal behavior generation, so suspicious activity appears as a deviation from a realistic baseline rather than replacing normal spending.
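
The idea of injecting fraud on top of a finished baseline can be sketched in plain Python. This is an illustrative toy, not FinForge's implementation: `make_baseline`, `inject_fraud`, the row fields, and the amount multipliers are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta


def make_baseline(rng, n=20):
    """Toy baseline: small purchases spaced one hour apart (assumed shape)."""
    start = datetime(2024, 1, 1, 9, 0)
    return [
        {
            "timestamp": start + timedelta(hours=i),
            "amount": round(rng.uniform(5, 60), 2),
            "is_fraud": False,
            "fraud_type": None,
        }
        for i in range(n)
    ]


def inject_fraud(rows, rng, rate=0.1, fraud_type="card_fraud"):
    """Add fraudulent rows on top of the baseline instead of replacing it."""
    n_fraud = max(1, round(len(rows) * rate))
    typical = sum(r["amount"] for r in rows) / len(rows)
    injected = []
    for _ in range(n_fraud):
        anchor = rng.choice(rows)  # place the fraud shortly after a normal row
        injected.append(
            {
                "timestamp": anchor["timestamp"] + timedelta(minutes=rng.randint(1, 30)),
                "amount": round(typical * rng.uniform(5, 10), 2),  # assumed multiplier
                "is_fraud": True,
                "fraud_type": fraud_type,
            }
        )
    # Re-sort so the combined dataset stays chronological.
    return sorted(rows + injected, key=lambda r: r["timestamp"])


rng = random.Random(42)
baseline = make_baseline(rng)
dataset = inject_fraud(baseline, rng, rate=0.1)
```

Every baseline row survives untouched, so the fraudulent rows read as deviations from the surrounding normal activity.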

## Core capabilities

- Student, salaried, freelancer, business owner, household, retired, and mixed persona simulation
- Persistent behavioral identity metadata
- Irregular income and business cashflow
- Business vs personal account flags
- Seasonal business income and quarterly tax payments
- Recurring bills and subscriptions
- Balance tracking and overdraft metadata
- Session-based spending and low-balance suppression
- Fraud, anomaly, and risk metadata
- Seed reproducibility
- CSV export and pandas-native workflows

## Installation

```bash
pip install finforge
```

For local development:

```bash
pip install -e ".[dev]"
```

## Quickstart

Baseline dataset:

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=101)
    .with_users(3)
    .with_persona("student")
    .for_months(2)
    .generate()
)
```

Mixed population:

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(50)
    .with_persona("mixed")
    .for_months(12)
    .generate()
)
```

Fraud dataset:

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_fraud(rate=0.03)
    .generate()
)
```

Fraud + anomaly + risk scoring:

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(rate=0.03)
    .with_anomalies(rate=0.05)
    .with_risk_scoring()
    .generate()
)
```

## Personas

Supported personas:

- `student`
- `salaried`
- `freelancer`
- `business_owner`
- `household`
- `retired`
- `mixed`

Mixed mode supports all v2 personas. When `user_count` is at least the number of supported personas, FinForge guarantees at least one user per persona. The remaining users are assigned using a deterministic weighted distribution, so the same seed and config produce the same persona mix.
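
One way to picture the mixed-mode allocation is the sketch below. It is not FinForge's code: `assign_personas` is a hypothetical helper and the weights are invented for illustration, since the actual distribution is not documented here.

```python
import random

PERSONAS = ["student", "salaried", "freelancer", "business_owner", "household", "retired"]
# Hypothetical weights, chosen only for this example.
WEIGHTS = [0.15, 0.35, 0.10, 0.10, 0.20, 0.10]


def assign_personas(user_count, seed):
    """Guarantee one user per persona when possible, then fill by seeded weighted draw."""
    rng = random.Random(seed)  # a local RNG keeps the result independent of global state
    assigned = []
    if user_count >= len(PERSONAS):
        assigned.extend(PERSONAS)  # one guaranteed user per persona
    remaining = user_count - len(assigned)
    assigned.extend(rng.choices(PERSONAS, weights=WEIGHTS, k=remaining))
    return assigned


mix = assign_personas(50, seed=42)
```

Because the draw uses a generator seeded from the config, calling `assign_personas(50, seed=42)` again returns the same list.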

## Fraud simulation

Supported fraud types:

- `card_fraud`
- `account_takeover`
- `mule_account`
- `refund_abuse`
- `business_invoice_fraud`

Example:

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(
        rate=0.03,
        types=[
            "card_fraud",
            "account_takeover",
            "mule_account",
            "refund_abuse",
            "business_invoice_fraud",
        ],
        severity="medium",
    )
    .generate()
)
```

Persona-aware behavior includes:

- Student: smaller late-night wallet drain, gaming, gift-card, and account-takeover patterns
- Salaried: salary-account drain, electronics fraud, and high-value transfer abuse
- Freelancer: suspicious payouts, platform-style anomalies, and fake vendor/service expenses
- Business owner: invoice abuse, fake supplier payments, round-number vendor anomalies
- Household: unusual shopping, insurance, or family-account payment anomalies
- Retired: phishing-style transfers and healthcare scam deviations

## Anomaly simulation

Anomalies are suspicious but not confirmed fraud.

Supported anomaly types:

- `unusual_amount`
- `unusual_time`
- `unusual_merchant`
- `unusual_category`
- `velocity_spike`
- `balance_drain`
- `income_spike`

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_anomalies(rate=0.05)
    .generate()
)
```

## Risk scoring

FinForge includes deterministic rule-based transaction risk scoring.

```python
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_fraud(rate=0.03)
    .with_risk_scoring()
    .generate()
)
```

Risk output includes:

- `risk_score` from `0.0` to `1.0`
- `risk_level`: one of `low`, `medium`, `high`, or `critical`
- `risk_reasons` such as:
  - `amount_spike`
  - `odd_hour`
  - `new_merchant`
  - `new_category`
  - `velocity_spike`
  - `balance_drain`
  - `rapid_in_out_transfer`
  - `refund_pattern`
  - `suspicious_vendor`
  - `business_invoice_anomaly`
  - `healthcare_scam_pattern`
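
A deterministic rule-based scorer of this kind might look like the sketch below. The rules, weights, thresholds, and input fields (`hour`, `new_merchant`, `balance_before`, `balance_after`) are assumptions for illustration and are not taken from FinForge; only the output keys and reason names come from the list above.

```python
# Hypothetical per-reason weights; FinForge's real values are not documented here.
REASON_WEIGHTS = {
    "amount_spike": 0.30,
    "odd_hour": 0.10,
    "new_merchant": 0.10,
    "balance_drain": 0.35,
}


def score_transaction(txn, typical_amount):
    """Apply fixed rules, sum the weights of the rules that fire, and clamp to [0, 1]."""
    reasons = []
    if txn["amount"] > 5 * typical_amount:
        reasons.append("amount_spike")
    if txn["hour"] < 5:
        reasons.append("odd_hour")
    if txn["new_merchant"]:
        reasons.append("new_merchant")
    if txn["balance_after"] < 0.1 * txn["balance_before"]:
        reasons.append("balance_drain")

    score = min(1.0, sum(REASON_WEIGHTS[r] for r in reasons))

    # Assumed thresholds for mapping a score to a level.
    if score >= 0.75:
        level = "critical"
    elif score >= 0.5:
        level = "high"
    elif score >= 0.25:
        level = "medium"
    else:
        level = "low"
    return {"risk_score": round(score, 2), "risk_level": level, "risk_reasons": reasons}


normal = score_transaction(
    {"amount": 35.0, "hour": 13, "new_merchant": False,
     "balance_before": 1000.0, "balance_after": 965.0},
    typical_amount=40.0,
)
suspicious = score_transaction(
    {"amount": 950.0, "hour": 3, "new_merchant": True,
     "balance_before": 1000.0, "balance_after": 50.0},
    typical_amount=40.0,
)
```

Since no randomness is involved, the same transaction always receives the same score, level, and reasons.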

## Fraud/anomaly metadata

v3 adds the following columns:

- `is_fraud`
- `fraud_type`
- `fraud_scenario_id`
- `fraud_stage`
- `fraud_severity`
- `fraud_pattern`
- `fraud_start_time`
- `risk_score`
- `risk_level`
- `risk_reasons`
- `is_anomaly`
- `anomaly_type`
- `anomaly_score`

These columns always exist, even when fraud and anomalies are disabled.
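
A stable schema means downstream code can filter on these columns without checking whether they are present. The plain-Python sketch below illustrates the idea with dictionaries; `with_risk_columns` is a hypothetical helper and the default values are assumed placeholders, since the values FinForge writes when the features are disabled are not stated here.

```python
# Assumed placeholder defaults, one entry per v3 column listed above.
RISK_COLUMN_DEFAULTS = {
    "is_fraud": False,
    "fraud_type": None,
    "fraud_scenario_id": None,
    "fraud_stage": None,
    "fraud_severity": None,
    "fraud_pattern": None,
    "fraud_start_time": None,
    "risk_score": None,
    "risk_level": None,
    "risk_reasons": None,
    "is_anomaly": False,
    "anomaly_type": None,
    "anomaly_score": None,
}


def with_risk_columns(row):
    """Return a copy of row with every v3 column present, keeping existing values."""
    return {**RISK_COLUMN_DEFAULTS, **row}


baseline_row = {"amount": -42.0, "category": "groceries"}
fraud_row = {
    "amount": -900.0,
    "category": "electronics",
    "is_fraud": True,
    "fraud_type": "card_fraud",
}
rows = [with_risk_columns(r) for r in (baseline_row, fraud_row)]

# The filter works on both rows because the columns are always there.
flagged = [r for r in rows if r["is_fraud"] or r["is_anomaly"]]
```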

## Summary utilities

```python
from finforge import DatasetGenerator
from finforge.analysis import fraud_summary

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(rate=0.03)
    .with_anomalies(rate=0.05)
    .with_risk_scoring()
    .generate()
)

print(fraud_summary(df))
```

The summary utility reports:

- total transactions
- fraud transactions and fraud rate
- fraud by type
- fraud by persona
- anomaly count and anomaly rate
- risk level distribution
- average risk score for fraud vs. non-fraud transactions
- top risk reasons
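
To make a few of these metrics concrete, here is a small plain-Python sketch that computes them over a list of row dictionaries. It is not the `fraud_summary` implementation: `summarize` and its output keys are hypothetical, and only the column names come from this README.

```python
from collections import Counter


def summarize(rows):
    """Compute a handful of summary metrics from transaction dictionaries."""
    total = len(rows)
    fraud = [r for r in rows if r["is_fraud"]]
    legit = [r for r in rows if not r["is_fraud"]]
    anomalies = [r for r in rows if r["is_anomaly"]]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "total_transactions": total,
        "fraud_transactions": len(fraud),
        "fraud_rate": len(fraud) / total if total else 0.0,
        "fraud_by_type": dict(Counter(r["fraud_type"] for r in fraud)),
        "anomaly_count": len(anomalies),
        "risk_level_distribution": dict(Counter(r["risk_level"] for r in rows)),
        "avg_risk_fraud": mean([r["risk_score"] for r in fraud]),
        "avg_risk_non_fraud": mean([r["risk_score"] for r in legit]),
    }


sample = [
    {"is_fraud": True, "fraud_type": "card_fraud", "is_anomaly": False,
     "risk_level": "high", "risk_score": 0.75},
    {"is_fraud": False, "fraud_type": None, "is_anomaly": True,
     "risk_level": "medium", "risk_score": 0.5},
    {"is_fraud": False, "fraud_type": None, "is_anomaly": False,
     "risk_level": "low", "risk_score": 0.0},
    {"is_fraud": True, "fraud_type": "mule_account", "is_anomaly": False,
     "risk_level": "critical", "risk_score": 1.0},
]
summary = summarize(sample)
```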

## ML-ready feature extraction

```python
from finforge.features import build_fraud_features

# df is a dataset produced by DatasetGenerator(...).generate(), as in the examples above
X, y = build_fraud_features(df)
```

The helper returns:

- `X`: pandas `DataFrame`
- `y`: pandas `Series`

Feature columns include amount, hour, balance, recurring/discretionary flags, business/tax flags, anomaly and risk scores, and encoded categorical fields such as persona, category, account type, and transaction type.
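
The "encoded categorical fields" step can be illustrated with a one-hot encoding sketch in plain Python. This is an assumption about the general technique, not a description of how `build_fraud_features` encodes its inputs; `build_features` and its column naming are hypothetical, and it returns lists rather than pandas objects.

```python
def build_features(rows, numeric=("amount", "hour"), categorical=("persona", "category")):
    """One-hot encode categorical fields and pair the features with an is_fraud label."""
    # A sorted vocabulary per field keeps the column order stable across runs.
    vocab = {field: sorted({r[field] for r in rows}) for field in categorical}
    columns = list(numeric) + [f"{field}_{value}" for field in categorical for value in vocab[field]]

    X = []
    for r in rows:
        features = [float(r[field]) for field in numeric]
        for field in categorical:
            features.extend(1.0 if r[field] == value else 0.0 for value in vocab[field])
        X.append(features)

    y = [int(r["is_fraud"]) for r in rows]
    return columns, X, y


rows = [
    {"amount": 12.5, "hour": 14, "persona": "student", "category": "food", "is_fraud": False},
    {"amount": 900.0, "hour": 3, "persona": "retired", "category": "transfer", "is_fraud": True},
]
columns, X, y = build_features(rows)
```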

## Architecture

Core simulation:

- `finforge.core`
- `finforge.personas`
- `finforge.generators`
- `finforge.merchants`
- `finforge.behavior`
- `finforge.dataset`

v3 extensions:

- `finforge.fraud`
- `finforge.anomaly`
- `finforge.risk`
- `finforge.analysis`
- `finforge.features`

Fraud and anomaly injection happen after the baseline transaction dataset is generated. Balances are recomputed after injection so chronological integrity is preserved.
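
Recomputing balances after injection amounts to re-sorting by time and rebuilding the running total. The sketch below shows that idea under stated assumptions: `recompute_balances` is a hypothetical helper, amounts are signed (negative for debits), and the field names are chosen for the example rather than taken from FinForge.

```python
from datetime import datetime


def recompute_balances(rows, opening_balance):
    """Sort rows chronologically and rebuild the running balance from scratch."""
    balance = opening_balance
    result = []
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        balance = round(balance + row["amount"], 2)
        result.append({**row, "balance": balance})
    return result


baseline = [
    {"timestamp": datetime(2024, 1, 1, 10, 0), "amount": -50.0, "is_fraud": False},
    {"timestamp": datetime(2024, 1, 3, 10, 0), "amount": -20.0, "is_fraud": False},
]
# An injected row that falls between the two baseline rows.
injected = [
    {"timestamp": datetime(2024, 1, 2, 23, 30), "amount": -300.0, "is_fraud": True},
]
ledger = recompute_balances(baseline + injected, opening_balance=500.0)
```

Rows that come after the injected transaction pick up the reduced balance, which keeps the ledger consistent in time order.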

## Examples

See the examples in [examples](https://github.com/shivangis22/finforge/tree/main/examples):

- [fraud_card_fraud.py](https://github.com/shivangis22/finforge/blob/main/examples/fraud_card_fraud.py)
- [fraud_account_takeover.py](https://github.com/shivangis22/finforge/blob/main/examples/fraud_account_takeover.py)
- [fraud_mule_account.py](https://github.com/shivangis22/finforge/blob/main/examples/fraud_mule_account.py)
- [fraud_business_invoice.py](https://github.com/shivangis22/finforge/blob/main/examples/fraud_business_invoice.py)
- [anomaly_generation.py](https://github.com/shivangis22/finforge/blob/main/examples/anomaly_generation.py)
- [fraud_dataset_for_ml.py](https://github.com/shivangis22/finforge/blob/main/examples/fraud_dataset_for_ml.py)
- [fraud_summary_demo.py](https://github.com/shivangis22/finforge/blob/main/examples/fraud_summary_demo.py)
- [persona_comparison_v2.py](https://github.com/shivangis22/finforge/blob/main/examples/persona_comparison_v2.py)

## Testing guarantees

The test suite covers:

- v1/v2 backward compatibility
- fraud injection and scenario grouping
- anomaly generation
- risk score bounds and relative ordering
- balance integrity after fraud injection
- chronological ordering after fraud injection
- simulation timestamp range safety
- mixed persona guarantees
- seed reproducibility
- feature helper outputs

Run tests with:

```bash
pytest
```

## Why FinForge is different

FinForge focuses on persistent financial behavior over time:

- behavioral continuity instead of isolated fake rows
- temporal balance realism
- persona-aware cashflow and business behavior
- configurable fraud deviations on top of realistic normal activity
- deterministic reproducibility for QA, analytics, and ML experimentation

## Changelog

See [CHANGELOG.md](https://github.com/shivangis22/finforge/blob/main/CHANGELOG.md).

## License

MIT
