Metadata-Version: 2.4
Name: hdb-valuation-engine
Version: 0.4.3
Summary: Singapore Public Housing (HDB) Valuation Engine using geospatial accessibility scoring.
License: MIT
License-File: LICENSE
Keywords: HDB,Singapore,real-estate,valuation,geospatial,MRT,KDTree,transport,LRT,data-analysis
Author: Mansib "Xenix" Miraj
Requires-Python: >=3.10
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: GIS
Provides-Extra: dev
Requires-Dist: bandit[toml] (>=1.7) ; extra == "dev"
Requires-Dist: black (>=24.0) ; extra == "dev"
Requires-Dist: isort (>=5.12) ; extra == "dev"
Requires-Dist: mypy (>=1.0) ; extra == "dev"
Requires-Dist: myst-parser (>=2.0) ; extra == "dev"
Requires-Dist: numpy (>=1.26,<2.0)
Requires-Dist: pandas (>=2.1)
Requires-Dist: pandas-stubs (>=2.0) ; extra == "dev"
Requires-Dist: pyarrow (>=16.0) ; extra == "dev"
Requires-Dist: pytest (>=7.0) ; extra == "dev"
Requires-Dist: pytest-cov (>=4.0) ; extra == "dev"
Requires-Dist: requests (>=2.31)
Requires-Dist: rich (>=13.7)
Requires-Dist: ruff (>=0.6.0) ; extra == "dev"
Requires-Dist: safety (>=3.0) ; extra == "dev"
Requires-Dist: scikit-learn (>=1.2)
Requires-Dist: sphinx (>=7.0) ; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints (>=2.0) ; extra == "dev"
Requires-Dist: sphinx-immaterial (>=0.11.0) ; extra == "dev"
Requires-Dist: types-requests (>=2.31) ; extra == "dev"
Project-URL: Homepage, https://github.com/theMansib/hdb-valuation-engine
Project-URL: Repository, https://github.com/theMansib/hdb-valuation-engine
Description-Content-Type: text/markdown

# HDB Valuation Engine 🇸🇬

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/theMansib/HDB-Valuation-Engine/blob/main/notebooks/01_quickstart_tutorial.ipynb)

[![CI](https://github.com/theMansib/HDB-Valuation-Engine/workflows/CI/badge.svg)](https://github.com/theMansib/HDB-Valuation-Engine/actions/workflows/ci.yml)
[![Documentation](https://github.com/theMansib/HDB-Valuation-Engine/workflows/Documentation/badge.svg)](https://github.com/theMansib/HDB-Valuation-Engine/actions/workflows/docs.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Coverage](https://img.shields.io/badge/coverage-66%25-yellow.svg)](https://github.com/theMansib/HDB-Valuation-Engine)

A quantitative tool for identifying undervalued Singapore public housing assets using spatial data analysis.

### **The Real-World Problems**
1. **The LRT Deception:** Commercial portals treat LRT (Light Rail) and MRT (Heavy Rail) as equal. This is false. LRT loops add significant commute latency. Buyers need a metric that rewards **True Connectivity**.
2. **The Lease Illusion:** Buyers often fixate on raw price, ignoring lease decay. A 'cheap' flat with 50 years remaining is often a worse asset than a pricier unit with 95 years.
3. **Data Overload:** With thousands of transactions, manual comparisons are impossible. Buyers need statistical anomaly detection, not just a search bar.

### **The Engineering Solution**
This engine ingests historical transaction data to calculate a **'True Value Score'** for every flat.
* **LRT-Exclusion Algorithm:** Uses Regex filtering and KDTree spatial indexing to calculate straight-line distance strictly to **MRT** nodes.
* **Depreciation Logic:** Normalizes price against remaining lease life to find the true cost of ownership.
* **Z-Score Ranking:** Identifies properties trading 2 standard deviations below their peer-group average.

---

## Key Features
- **Dual Interface:** Use as a CLI tool OR as a Python module in your own projects
- **🎨 Beautiful CLI (v0.4.1+):** Rich terminal output with colored tables, progress spinners, and visual feedback
- Strict OOP pipeline with type hints and logging
- Robust lease parsing and inference (handles text and infers from `lease_commence_date` + `month`)
- Lease-adjusted price efficiency metric and group-wise Z-Score valuation
- Extended filters (exact/partial) and numeric ranges
- TransportScorer with KDTree and strict LRT exclusion (regex `^(BP|S[WE]|P[WE])`)
- Export to CSV/JSON/Parquet; optional full export
- Configurable peer grouping via `--group-by`
- Caching for fast repeated transport queries; cache management subcommand

## Algorithm Overview

### Core Valuation Pipeline

1. **Lease parsing/inference**
   - Parse `remaining_lease` strings to float years (e.g., `61 years 04 months` → `61.33`).
   - If absent, infer years: `remaining_years = 99 - ((YYYY + (MM-1)/12) - lease_commence_year)`.

2. **Bala's Curve: Non-Linear Lease Depreciation** (v0.3.0+)
   - **Why it matters:** HDB leases don't lose value linearly. A flat with 80 years remaining holds almost full value, while one with 30 years faces steep depreciation. Traditional linear models miss this critical market behavior.
   - **Mathematical model:** `depreciation_factor = exp(-k × ((99 - remaining) / 99)^n)`
     - Default parameters: `k=3.0` (decay rate), `n=2.5` (curve steepness)
   - **Real-world behavior:**
     - 99 years → factor = 1.00 (no depreciation)
     - 80 years → factor = 0.95 (minimal depreciation, ~5% loss)
     - 60 years → factor = 0.75 (moderate depreciation, ~25% loss)
     - 40 years → factor = 0.44 (accelerating depreciation, ~56% loss)
     - 20 years → factor = 0.18 (severe depreciation, ~82% loss)
   - **Impact:** Properties with shorter leases get penalized more heavily in valuation, reflecting true market economics and helping buyers avoid "cheap but depreciating" traps.
   - **Academic foundation:** Based on Bala's studies on Singapore HDB lease decay and observed market behavior in resale transactions.

3. **Price efficiency (lease-adjusted)**
   - Base: `price_efficiency = resale_price / (floor_area_sqm × remaining_lease_years)`
   - Adjusted: `price_efficiency_adjusted = price_efficiency / depreciation_factor`
   - Lower values indicate better cost per effective area-year (better value)

4. **Group-wise Z-Score**
   - Group by `--group-by` (default: `town`, `flat_type`) and compute `z = (x - μ) / σ`.
   - Identifies statistical outliers within peer groups (e.g., cheap 4-ROOM flats in PUNGGOL compared to other 4-ROOM PUNGGOL flats)

5. **Valuation score**
   - `valuation_score = -z_price_efficiency` (higher → more undervalued relative to peers).
   - Combined with growth potential analysis to identify "deep value" opportunities

6. **Transport accessibility** (optional)
   - Compute nearest MRT exit distance (LRT excluded) and `Accessibility_Score = max(0, 10 - 2 × dist_km)`.
   - By default adjusts price_efficiency; use `--no-accessibility-adjust` for analysis-only.
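
Step 1 of the pipeline can be sketched in a few lines. This is an illustration under stated assumptions rather than the package's actual code: the function names are hypothetical, the regex assumes the `NN years MM months` wording shown above, and `month` is assumed to arrive as a `YYYY-MM` string.

```python
import re

LEASE_TERM_YEARS = 99  # lease length used in the inference formula above

# Assumed input shape: "61 years 04 months" or "95 years".
_LEASE_RE = re.compile(
    r"(?P<years>\d+)\s*years?(?:\s*(?P<months>\d+)\s*months?)?",
    re.IGNORECASE,
)


def parse_remaining_lease(text: str) -> float:
    """Convert a remaining-lease string to fractional years."""
    match = _LEASE_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised remaining_lease value: {text!r}")
    years = int(match.group("years"))
    months = int(match.group("months") or 0)
    return years + months / 12


def infer_remaining_lease(month: str, lease_commence_year: int) -> float:
    """Infer remaining lease from the transaction month and lease start year."""
    year_str, month_str = month.split("-")  # assumed "YYYY-MM" format
    txn_time = int(year_str) + (int(month_str) - 1) / 12
    return LEASE_TERM_YEARS - (txn_time - lease_commence_year)
```

For instance, `parse_remaining_lease("61 years 04 months")` rounds to `61.33`, and a July 2013 sale of a flat whose lease commenced in 1990 gives `infer_remaining_lease("2013-07", 1990) == 75.5`.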

## System Architecture & Data Flow

```mermaid
flowchart TB
    subgraph Input["📊 Data Sources"]
        A1[HDB Resale CSV<br/>~200K+ Records]
        A2[LTA MRT Station<br/>GeoJSON API]
    end

    subgraph Pipeline["🔄 Processing Pipeline"]
        B1[HDBLoader<br/>Schema Normalization]
        B2[FeatureEngineer<br/>Lease Parsing & Inference]
        B3[TransportScorer<br/>KDTree Spatial Indexing]
        B4[ValuationEngine<br/>Statistical Scoring]
        B5[ReportGenerator<br/>Filtering & Ranking]
    end

    subgraph Algorithms["🧮 Core Algorithms"]
        C1["Lease Depreciation Model<br/>remaining = 99 - (txn_year - commence_year)"]
        C2["Price Efficiency<br/>PE = price / (area × lease_years)"]
        C3["LRT Exclusion Filter<br/>Regex: ^(BP|S[WE]|P[WE])"]
        C4["KDTree Nearest Neighbor<br/>O(log n) Spatial Query"]
        C5["Haversine Distance<br/>Great-Circle Calculation"]
        C6["Group-wise Z-Score<br/>z = (x - μ) / σ<br/>within (town, flat_type) cohorts"]
        C7["Accessibility Score<br/>AS = max(0, 10 - 2×dist_km)"]
        C8["Valuation Score<br/>VS = -z_PE × (1 + AS/10)"]
    end

    subgraph Output["📈 Outputs"]
        D1[Ranked DataFrame<br/>Top-N Undervalued Units]
        D2[CLI Report<br/>Formatted Table]
        D3[Export Files<br/>CSV/JSON/Parquet]
        D4[Programmatic API<br/>Python Module Integration]
    end

    A1 --> B1
    A2 --> B3
    B1 --> B2
    B2 --> C1
    B2 --> C2
    C1 --> B4
    C2 --> B4
    B3 --> C3
    C3 --> C4
    C4 --> C5
    C5 --> C7
    B4 --> C6
    C6 --> C8
    C7 --> C8
    C8 --> B5
    B5 --> D1
    D1 --> D2
    D1 --> D3
    D1 --> D4

    style Input fill:#e1f5ff,stroke:#01579b,stroke-width:2px
    style Pipeline fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style Algorithms fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style Output fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px

    style C4 fill:#ffeb3b,stroke:#f57f17,stroke-width:3px
    style C6 fill:#ffeb3b,stroke:#f57f17,stroke-width:3px
    style C8 fill:#ffeb3b,stroke:#f57f17,stroke-width:3px
```

### Technical Highlights

**🎯 Statistical Rigor**
- Group-wise Z-score normalization ensures fair peer comparison across 26 towns and 7 flat types
- Robust handling of zero-variance groups and missing data
- Deterministic formulas make valuations reproducible and free of subjective adjustment
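
The group-wise Z-score and its zero-variance handling can be sketched with pandas. This is a minimal illustration, not the package's code: the helper name `groupwise_zscore` is hypothetical, the sample standard deviation (`ddof=1`, the pandas default) is an assumption, and mapping zero-variance or single-row groups to a Z-score of 0 is one reasonable convention rather than confirmed behaviour.

```python
import pandas as pd


def groupwise_zscore(df: pd.DataFrame, value_col: str, group_cols: list[str]) -> pd.Series:
    """Return z = (x - mean) / std computed within each peer group."""
    grouped = df.groupby(group_cols)[value_col]
    mu = grouped.transform("mean")
    sigma = grouped.transform("std")  # sample std (ddof=1); NaN for one-row groups
    z = (df[value_col] - mu) / sigma
    # Assumed convention: a group with no spread carries no signal, so z = 0.
    return z.where(sigma > 0, 0.0)


# Illustrative values only, not real transactions.
flats = pd.DataFrame(
    {
        "town": ["PUNGGOL", "PUNGGOL", "PUNGGOL", "BISHAN", "BISHAN", "YISHUN"],
        "flat_type": ["4 ROOM"] * 6,
        "price_efficiency": [50.0, 60.0, 70.0, 80.0, 80.0, 65.0],
    }
)
flats["z_price_efficiency"] = groupwise_zscore(
    flats, "price_efficiency", ["town", "flat_type"]
)
# Lower price efficiency is better, so the sign is flipped for the score.
flats["valuation_score"] = -flats["z_price_efficiency"]
```

In this toy frame the cheapest PUNGGOL row scores highest, while the BISHAN rows (identical values) and the lone YISHUN row fall back to 0 instead of producing `NaN`.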

**🗺️ Geospatial Innovation**
- KDTree spatial indexing enables O(log n) nearest-neighbor lookups (n = number of indexed MRT exits) for each of 200K+ properties
- Haversine distance calculation accounts for Earth's curvature (±0.5% accuracy)
- Regex-based LRT exclusion (BP/SW/SE/PW/PE lines) ensures only heavy rail stations are considered
- Caching mechanism reduces repeated spatial queries from minutes to milliseconds
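
The distance and scoring steps can be sketched without any third-party dependency. In this illustration a plain linear scan stands in for the KDTree lookup (same answer, O(n) per query instead of O(log n)); the function names and the coordinates are hypothetical, not real station data.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accessibility_score(dist_km: float) -> float:
    """Accessibility_Score = max(0, 10 - 2 * dist_km)."""
    return max(0.0, 10.0 - 2.0 * dist_km)


def nearest_exit(flat: tuple[float, float], exits: list[tuple[str, float, float]]) -> tuple[str, float]:
    """Linear-scan stand-in for the KDTree query; returns (exit_name, dist_km)."""
    return min(
        ((name, haversine_km(flat[0], flat[1], lat, lon)) for name, lat, lon in exits),
        key=lambda pair: pair[1],
    )


# Made-up coordinates for illustration only.
demo_exits = [("EXIT_A", 1.3500, 103.8500), ("EXIT_B", 1.4000, 103.9000)]
demo_flat = (1.3510, 103.8500)

exit_name, dist_km = nearest_exit(demo_flat, demo_exits)
score = accessibility_score(dist_km)
```

The score falls linearly from 10 at the station to 0 at 5 km and stays at 0 beyond that, so distance differences past 5 km no longer affect the ranking.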

**💰 Financial Modeling**
- Lease depreciation model adjusts for Singapore's 99-year leasehold system
- Time-value-of-money consideration through remaining lease normalization
- Price efficiency metric captures $/sqm/year for true cost-of-ownership analysis
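
The depreciation and price-efficiency formulas from the Algorithm Overview can be checked numerically. The sketch below is not the package's implementation; the function names and the clamp for leases above 99 years are assumptions made for illustration.

```python
import math


def depreciation_factor(
    remaining_years: float, k: float = 3.0, n: float = 2.5, lease_term: float = 99.0
) -> float:
    """depreciation_factor = exp(-k * ((lease_term - remaining) / lease_term) ** n)."""
    # Assumption: clamp so that remaining_years > lease_term cannot go negative.
    elapsed_fraction = max(0.0, lease_term - remaining_years) / lease_term
    return math.exp(-k * elapsed_fraction ** n)


def price_efficiency(resale_price: float, floor_area_sqm: float, remaining_lease_years: float) -> float:
    """Base metric: price per square metre per remaining lease year."""
    return resale_price / (floor_area_sqm * remaining_lease_years)


def adjusted_price_efficiency(
    resale_price: float, floor_area_sqm: float, remaining_lease_years: float, **curve: float
) -> float:
    """Base efficiency divided by the depreciation factor (lower is better)."""
    base = price_efficiency(resale_price, floor_area_sqm, remaining_lease_years)
    return base / depreciation_factor(remaining_lease_years, **curve)


# Illustrative flat: $480,000, 96 sqm, 80 years remaining.
base = price_efficiency(480_000, 96, 80)
adjusted = adjusted_price_efficiency(480_000, 96, 80)
```

With the default `k=3.0` and `n=2.5`, `depreciation_factor` rounds to 0.95, 0.75, 0.44, and 0.18 at 80, 60, 40, and 20 years remaining, matching the values listed under Bala's Curve. Because the factor is at most 1, the adjusted metric is never lower than the base metric, which is how shorter leases get penalized.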

**🏗️ Software Engineering**
- Object-oriented pipeline with strict type hints (PEP 484 compliant)
- Dual interface: CLI for analysts, Python API for integration
- 66% test coverage with 26/26 tests passing
- Comprehensive logging and error handling for production reliability

## Installation

From PyPI:
```
pip install hdb-valuation-engine
```

From source (recommended in a virtual environment):
```
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

## Quick Start

Platform-agnostic data fetching (no Make needed):
- Fetch all supported datasets (HDB resale CSV + MRT exits GeoJSON):
```
hdb-valuation-engine fetch
```
- Fetch entire HDB resale dataset (no row limit) plus MRT exits:
```
hdb-valuation-engine fetch --limit 0
```
- Only MRT exits to a custom path:
```
hdb-valuation-engine fetch --datasets mrt --mrt-out .data/LTAMRTStationExitGEOJSON.geojson
```
- Only HDB resale CSV with 10k rows to default location:
```
hdb-valuation-engine fetch --datasets resale --limit 10000
```

**Module usage (NEW - Clean Python API):**
```python
from hdb_valuation_engine import HDBValuationEngineApp

# Initialize the engine
app = HDBValuationEngineApp()

# Process data and get results
results = app.process(
    input_path="ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv",
    town="PUNGGOL",
    budget=600000,
    top_n=5
)

# Results is a pandas DataFrame
print(results)
print(f"\nFound {len(results)} undervalued properties")

# Access specific columns
for _, row in results.iterrows():
    print(f"{row['town']}, {row['flat_type']}: ${row['resale_price']:,.0f} (Score: {row['valuation_score']:.2f})")
```

**With MRT accessibility (default):**
```python
from hdb_valuation_engine import HDBValuationEngineApp

app = HDBValuationEngineApp()

# MRT scoring runs by default when the default MRT dataset is present
results = app.process(
    input_path="resale.csv",
    town="BISHAN",
    budget=800000,
    top_n=10
)

# Results include MRT distance and accessibility scores
print(results[["town", "resale_price", "Nearest_MRT", "Dist_m", "Accessibility_Score", "valuation_score"]])
```

To use a custom MRT catalog:
```python
results = app.process(
    input_path="resale.csv",
    mrt_catalog=".data/LTAMRTStationExitGEOJSON.geojson",
    town="BISHAN",
    budget=800000,
    top_n=10
)
```

**See `EXAMPLES.md` for 10+ comprehensive usage examples including:**
- Using pre-loaded DataFrames
- Custom grouping and filters
- Exporting results
- Using individual pipeline components
- Integration with Flask/web APIs

---

## 📚 Interactive Jupyter Tutorials

Learn the HDB Valuation Engine through hands-on interactive notebooks:

### [notebooks/01_quickstart_tutorial.ipynb](notebooks/01_quickstart_tutorial.ipynb) 🚀
**15-minute beginner tutorial**
- Loading and processing HDB data
- Running valuation analysis with filters
- Visualizing Bala's Curve depreciation
- Understanding valuation scores
- Exporting results

### [notebooks/02_advanced_analysis.ipynb](notebooks/02_advanced_analysis.ipynb) 📊
**30-minute intermediate deep dive**
- Transport accessibility and MRT proximity analysis
- Peer group strategies and z-score interpretation
- Statistical outlier detection
- Cross-town market comparisons
- Custom filtering workflows

### [notebooks/03_custom_workflows.ipynb](notebooks/03_custom_workflows.ipynb) 🛠️
**20-minute advanced customization**
- Building custom analysis pipelines
- Experimenting with depreciation models
- Creating rental yield estimators
- Multi-scenario batch processing
- Property comparison tools

**[See notebooks/README.md](notebooks/README.md) for full documentation, installation guide, and learning paths.**

---

## 🎨 Rich CLI Interface (v0.4.1+)

The HDB Valuation Engine features a beautiful, modern command-line interface powered by the Rich library:

### ✨ Visual Features

**Before (v0.4.0):**
```
Processing complete.
town        flat_type    resale_price    floor_area_sqm
PUNGGOL     2 ROOM       225000          50.0
BISHAN      4 ROOM       550000          90.0
```

**After (v0.4.1):**
```
┌───────────────────────────────────────────────────────────────────────────┐
│ HDB Valuation Engine v0.4.1                                               │
│ Identifying undervalued properties using Bala's Curve & transport scoring │
└───────────────────────────────────────────────────────────────────────────┘
⠙ Processing data...

Filters: Town: PUNGGOL | Budget: $600,000
Found 15 properties

                     🏠 Top 10 Undervalued HDB Properties
┌────────┬─────────┬───────────┬──────────────┬─────────────┬──────────┬────────────┬────────┐
│   Rank │ Town    │ Flat Type │ Address      │       Price │ Area (m²)│ Lease (yrs)│  Score │
├────────┼─────────┼───────────┼──────────────┼─────────────┼──────────┼────────────┼────────┤
│      1 │ PUNGGOL │ 4 ROOM    │ 310A Pungg…  │    $450,000 │     90.0 │       85.2 │   2.34 │
│      2 │ PUNGGOL │ 4 ROOM    │ 268C Pungg…  │    $475,000 │     92.0 │       89.5 │   1.87 │
│      3 │ PUNGGOL │ 3 ROOM    │ 110 Edgef…   │    $385,000 │     67.0 │       82.1 │   1.65 │
└────────┴─────────┴───────────┴──────────────┴─────────────┴──────────┴────────────┴────────┘

⠋ Exporting to CSV...
✓ Exported top 10 results to results.csv
```

### 🎯 Rich Features

- **Styled Tables:** Beautiful Unicode borders with color-coded scores
  - 🟢 **Bold Green:** Excellent value (score ≥ 2.0)
  - 🟢 **Green:** Good value (score ≥ 1.0)
  - 🟡 **Yellow:** Fair value (score ≥ 0)
  - ⚪ **Dim:** Below average (score < 0)

- **Progress Indicators:** Animated spinners for long operations
  - Data processing
  - Export operations
  - Cache building

- **Status Icons:** Clear visual feedback
  - ✓ Success messages
  - ⚠ Warnings
  - ✗ Errors

- **Smart Formatting:**
  - Prices: `$450,000` (thousands separators)
  - Areas: `90.0 m²` (decimal precision)
  - Scores: `2.34` (2 decimal places)

- **Filter Summaries:** See your active filters at a glance
- **Result Counts:** Know exactly how many properties match

### Example Usage

```bash
# Basic search with Rich output
hdb-valuation-engine --town PUNGGOL --budget 600000 --top 10

# With export and progress indicators
hdb-valuation-engine --town BISHAN --output results.csv --output-format csv

# Multiple filters with visual feedback
hdb-valuation-engine --town-like PUNGG --flat-type "4 ROOM" --budget 500000
```

CLI usage (after install):
```
hdb-valuation-engine --input "ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv" --top 5 -v
```

## Usage
```
hdb-valuation-engine --input <path/to/file.csv> [OPTIONS]
```

### Core options
- `--input` Path to HDB resale CSV data
- `--top` Number of results to display (default: 10)
- Logging: `-v` (INFO) or `-vv` (DEBUG)

### Filters
- Town: `--town PUNGGOL` (exact), `--town-like unggol` (partial)
- Flat Type: `--flat-type "5 ROOM"` (exact), `--flat-type-like room` (partial)
- Flat Model: `--flat-model "Improved"` (exact), `--flat-model-like improv` (partial)
- Storey: `--storey-min 7 --storey-max 12`
- Area (sqm): `--area-min 60 --area-max 120`
- Remaining Lease (years): `--lease-min 60 --lease-max 95`
- Budget (max `resale_price`): `--budget 600000`

### Grouping (peer comparison)
```
--group-by town flat_type [flat_model]
```

## Transport Accessibility (MRT via GeoJSON)
- Fast, cached KDTree for nearest MRT exit queries (10k+ rows). Cache saved under `.cache_transport/`.
- Provide LTA MRT Station Exit GeoJSON to enable accessibility scoring:
```
# You can fetch a current GeoJSON via the built-in fetcher
hdb-valuation-engine fetch --datasets mrt --mrt-out .data/LTAMRTStationExitGEOJSON.geojson

# Then reference it when running valuations
--mrt-catalog .data/LTAMRTStationExitGEOJSON.geojson
```
- Excludes LRT strictly via regex `^(BP|S[WE]|P[WE])` and filters names containing `LRT` as a fallback.
- Adds:
  - `Nearest_MRT`
  - `Dist_m`
  - `Accessibility_Score = max(0, 10 - 2 * dist_km)`
- Analysis-only mode (no adjustment):
```
--no-accessibility-adjust
```
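
The LRT exclusion rule described above (code-prefix regex first, station-name fallback second) can be sketched as follows. The helper name `is_lrt` and the idea that each record carries a station code and a station name are assumptions for illustration; the actual GeoJSON field names may differ.

```python
import re

# Regex quoted in this README: Bukit Panjang (BP), Sengkang (SW/SE), Punggol (PW/PE).
LRT_CODE_RE = re.compile(r"^(BP|S[WE]|P[WE])")


def is_lrt(station_code: str, station_name: str = "") -> bool:
    """Return True when a station should be excluded as LRT."""
    code = station_code.strip().upper()
    if LRT_CODE_RE.match(code):
        return True
    # Fallback: catch records whose name mentions LRT but whose code is missing or unusual.
    return "LRT" in station_name.upper()


# Hypothetical records: (station_code, station_name).
stations = [
    ("NS1", "JURONG EAST MRT STATION"),
    ("BP6", "BUKIT PANJANG LRT STATION"),
    ("", "SENGKANG LRT STATION"),
    ("NE17", "PUNGGOL MRT STATION"),
]
mrt_only = [record for record in stations if not is_lrt(*record)]
```

Because the pattern is anchored at the start of the code, heavy-rail prefixes such as `NS` or `NE` never match, and only records that pass both checks reach the KDTree.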

### Exporting
```
--output top10.csv --output-format csv            # CSV (default)
--output top10.json --output-format json          # JSON
--output top10.parquet --output-format parquet    # Parquet (falls back to CSV if engine missing)
--export-full                                     # Export all filtered rows instead of Top-N
```

## Quick Usage Examples

0) Cache management subcommand
```
# Show cache directory
hdb-valuation-engine cache -v

# Clear cache in default location
hdb-valuation-engine cache --clear -v

# Use a custom cache dir
hdb-valuation-engine cache --transport-cache-dir .transport_cache --clear -v
```

1) Build and cache KDTree from LTA GeoJSON; show Top-10 with adjustment:
```
hdb-valuation-engine \
  --input "ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv" \
  --mrt-catalog ".data/LTAMRTStationExitGEOJSON.geojson" \
  --top 10 -v
```

2) Use cached KDTree on subsequent runs (faster); analysis-only mode (no price adjustment):
```
hdb-valuation-engine \
  --input "ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv" \
  --mrt-catalog ".data/LTAMRTStationExitGEOJSON.geojson" \
  --no-accessibility-adjust --top 10 -v
```

3) Custom cache directory and force clear before building:
```
hdb-valuation-engine \
  --input "ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv" \
  --mrt-catalog ".data/LTAMRTStationExitGEOJSON.geojson" \
  --transport-cache-dir ".transport_cache" --clear-transport-cache --top 5 -v
```

4) CSV catalog path (still supported; auto-excludes LRT lines):
```
hdb-valuation-engine \
  --input "ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv" \
  --mrt-catalog "/path/to/mrt_catalog.csv" --top 5 -v
```

5) Combine with group-by and export options:
```
hdb-valuation-engine \
  --input "ResaleFlatPrices/Resale flat prices based on registration date from Jan-2017 onwards.csv" \
  --mrt-catalog ".data/LTAMRTStationExitGEOJSON.geojson" \
  --group-by town flat_type flat_model \
  --export-full --output top.json --output-format json --top 10 -v
```

## Smoke Test Summary
- 2017 onwards: Parsed `remaining_lease` strings successfully; produced Top-10 Punggol table under budget 600k. Export worked.
- 2012–2014: Inferred remaining lease from `lease_commence_date` and `month`; produced Top-10 Punggol table.
- 2000–Feb 2012: Inference path also worked; produced Top-5 for Ang Mo Kio under budget 200k.
- Extended filters and partial matching verified; `--output`, `--export-full`, and `--output-format` worked as expected.

## Design & Implementation Notes
- Columns normalized to lowercase with underscores
- Robust z-score handling for zero-variance groups
- Logging across load, feature engineering, scoring, filtering, and export
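
As an illustration of the column normalization noted above, a helper along these lines would map raw headers to lowercase, underscore-separated names. The function name and the exact rule (collapse every run of non-alphanumeric characters into one underscore) are assumptions, not the package's confirmed behaviour.

```python
import re


def normalize_column_name(name: str) -> str:
    """Lowercase a header and replace non-alphanumeric runs with underscores."""
    lowered = name.strip().lower()
    return re.sub(r"[^0-9a-z]+", "_", lowered).strip("_")


raw_headers = ["Flat Type", " Lease Commence Date ", "resale_price", "floor-area (sqm)"]
normalized = [normalize_column_name(header) for header in raw_headers]
```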

## Release and Tagging

Example: to cut a release and push its tag (shown for 0.1.0; substitute the version being released):
```
git add -A
git commit -m "chore(release): cut 0.1.0"

git tag -a v0.1.0 -m "Initial PyPI packaging for hdb-valuation-engine"

git push origin main
git push origin v0.1.0
```

## Running Tests

Note: You can fetch data without Make on any platform using the built-in fetch command:
```
# Fetch all datasets with defaults
hdb-valuation-engine fetch

# Fetch entire resale CSV and MRT exits
hdb-valuation-engine fetch --limit 0
```

- Recommended: use a virtual environment
```
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\Activate.ps1
pip install -r requirements.txt
pytest -q
```

### Optional dataset for an extra smoke test
One test is skipped by default unless a local dataset is available. To enable it:
- Create a folder named `ResaleFlatPrices` at the repository root (same level as `tests/` and `README.md`).
- Place one or more HDB resale CSV files inside that folder, for example:
  - `Resale flat prices based on registration date from Jan-2017 onwards.csv`

You can fetch a small sample automatically with:
```
make setup-venv          # one-time environment setup
make fetch-sample-data   # downloads a subset into ./ResaleFlatPrices/
```

When this folder exists and contains at least one `.csv` file, the optional smoke test in `tests/test_cli_export.py::TestOptionalRealDataset` will run. If the folder is missing or empty, the test is skipped with reason:

```
ResaleFlatPrices folder not present; skipping optional smoke test
```

---

## ⚠️ Current Limitations

While the HDB Valuation Engine provides sophisticated quantitative analysis, users should be aware of the following limitations in our approach and available data:

### 📊 Data & Methodology Constraints

**1. Historical Data Only**
- **Limitation:** Analysis is based on past transactions from data.gov.sg
- **Impact:** Cannot predict future market conditions, policy changes, or economic shifts
- **Mitigation:** Use as one input among many; combine with market research and professional advice

**2. Bala's Curve Parameterization**
- **Limitation:** Default parameters (k=3.0, n=2.5) are empirically derived but not officially validated
- **Impact:** Depreciation curve may not perfectly match individual property circumstances
- **Mitigation:** Parameters are configurable; users can adjust based on their research or domain expertise

**3. Incomplete Quality Metrics**
- **Missing factors we cannot quantify:**
  - Unit condition and renovation status
  - View quality (facing, unblocked)
  - Noise levels (traffic, construction)
  - Unit position within block (corner, middle)
  - Block facilities (lift landing, accessibility)
  - Estate maturity and community amenities
  - Upcoming infrastructure developments
- **Impact:** Two flats with identical specs may have vastly different actual value
- **Mitigation:** Use tool for initial screening; conduct physical inspections before decisions

**4. MRT Accessibility Simplification**
- **Limitation:** Distance-to-MRT is straight-line (great-circle), not walking distance
- **Missing factors:**
  - Bus connectivity and frequency
  - Actual walking paths and obstacles
  - MRT line quality differences (Express vs regular)
  - Future MRT line plans
- **Impact:** Score may over/undervalue properties based on real commute experience
- **Mitigation:** Visit properties and test actual commute times

**5. Static Peer Grouping**
- **Limitation:** Z-scores compare within (town, flat_type) only by default
- **Impact:** May miss value opportunities across towns or compare incomparable properties
- **Mitigation:** Use custom `--group-by` parameters; analyze multiple groupings

### 🛠️ Technical & Data Limitations

**6. Schema Assumptions**
- **Limitation:** Expects standardized column names from data.gov.sg format
- **Impact:** May fail or produce incorrect results with differently structured data
- **Mitigation:** Review `Schema` class in `loader.py`; customize if needed

**7. No Ground Truth Validation**
- **Limitation:** We cannot verify if "undervalued" properties actually become good investments
- **Impact:** High valuation score ≠ guaranteed good deal
- **Mitigation:** This is a screening tool, not investment advice

**8. Outlier Sensitivity**
- **Limitation:** Z-scores can be skewed by extreme outliers in small peer groups
- **Impact:** Unusual transactions can distort valuations for entire groups
- **Mitigation:** Review raw data; filter by sample size; use multiple grouping strategies
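
One way to apply the sample-size mitigation is to drop peer groups that are too small before scoring. A minimal pandas sketch, assuming a hypothetical helper name and an arbitrary threshold of 30 rows (neither comes from the package):

```python
import pandas as pd

MIN_GROUP_SIZE = 30  # arbitrary illustrative threshold; tune for your data


def drop_small_groups(
    df: pd.DataFrame, group_cols: list[str], min_size: int = MIN_GROUP_SIZE
) -> pd.DataFrame:
    """Keep only rows whose peer group has at least `min_size` members."""
    group_sizes = df.groupby(group_cols)[group_cols[0]].transform("size")
    return df[group_sizes >= min_size]


# Toy data: town "A" has three rows, town "B" has one.
demo = pd.DataFrame(
    {"town": ["A", "A", "A", "B"], "flat_type": ["4 ROOM"] * 4}
)
kept = drop_small_groups(demo, ["town", "flat_type"], min_size=2)
```

Filtering before computing Z-scores keeps a single unusual transaction in a tiny group from dominating that group's mean and standard deviation.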

**9. No Macroeconomic Context**
- **Missing factors:**
  - Interest rate environment
  - Government housing policies (grants, restrictions)
  - Economic cycles and unemployment
  - Population growth and immigration trends
- **Impact:** Tool cannot warn about systemic overvaluation or market timing
- **Mitigation:** Consult economic indicators and professional financial advisors

### 🏗️ Singapore-Specific Constraints

**10. HDB-Only Focus**
- **Limitation:** Does not cover private condos, landed property, or commercial real estate
- **Impact:** Cannot compare HDB vs private housing value propositions
- **Mitigation:** Use specialized tools for private property analysis

**11. Policy Change Risk**
- **Limitation:** Cannot predict changes to:
  - Lease Buyback Scheme eligibility
  - Voluntary Early Redevelopment Scheme (VERS)
  - CPF usage rules
  - Resale levy structures
- **Impact:** Tool may not capture full financial picture of HDB ownership
- **Mitigation:** Stay updated on HDB policies and consult with HDB directly

**12. No Rental Yield Analysis**
- **Limitation:** Does not estimate rental income potential or investment yields
- **Impact:** Cannot advise on buy-to-rent strategies
- **Mitigation:** Use separate rental market analysis tools

---

### 🎯 Recommended Usage

**This tool is best used as:**
- ✅ Initial screening to identify potentially undervalued properties
- ✅ Quantitative input to supplement qualitative research
- ✅ Learning tool to understand lease depreciation and market dynamics
- ✅ Comparative analysis within similar property cohorts

**This tool should NOT be used as:**
- ❌ Sole basis for property purchase decisions
- ❌ Investment advice or financial recommendations
- ❌ Replacement for professional valuation services
- ❌ Predictor of future appreciation or returns

**Always combine with:**
- Physical property inspections
- Professional property agents and valuers
- Financial advisors for affordability analysis
- Legal consultation for transaction structure
- Personal circumstances and long-term plans

---

