Metadata-Version: 2.4
Name: valuein-sdk
Version: 0.5.4
Summary: Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.
Project-URL: Homepage, https://valuein.biz
Project-URL: Documentation, https://valuein.biz/docs
Project-URL: Repository, https://github.com/valuein/quants
Project-URL: Bug Tracker, https://github.com/valuein/quants/issues
Project-URL: Changelog, https://github.com/valuein/quants/blob/main/CHANGELOG.md
Author-email: Valuein <support@valuein.biz>
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: EDGAR,SEC,backtesting,duckdb,finance,fundamentals,parquet,quant
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: duckdb>=1.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: research
Requires-Dist: ipykernel>=6.29.0; extra == 'research'
Requires-Dist: jupytext>=1.16.0; extra == 'research'
Requires-Dist: matplotlib>=3.8.0; extra == 'research'
Requires-Dist: numpy>=1.26.0; extra == 'research'
Provides-Extra: test
Requires-Dist: pytest-cov>=4.1.0; extra == 'test'
Requires-Dist: pytest>=9.0.0; extra == 'test'
Description-Content-Type: text/markdown

# Valuein Quants

[![PyPI version](https://img.shields.io/pypi/v/valuein-sdk)](https://pypi.org/project/valuein-sdk/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/valuein-sdk/)
[![License](https://img.shields.io/badge/license-Apache%202.0-green)](LICENSE)
[![Tests](https://github.com/valuein/quants/actions/workflows/ci.yml/badge.svg)](https://github.com/valuein/quants/actions)

**Point-in-Time accurate. Survivorship-bias free. 105M+ facts from 1990 to present.**

US fundamental data from SEC EDGAR — 10-K, 10-Q, 8-K, 20-F and amendments — covering 10,000+ active and delisted entities. Cleaned, standardized, and queryable via SQL with zero local downloads.

---

## What You Can Do With This Repository

| Use Case | Who | Where to Start |
|---|---|---|
| **Query financial data via Python** | Quants, data engineers | [Quickstart](#quickstart) |
| **Run 39 pre-built financial signals** | Analysts, quants | [SQL Templates](#sql-templates) |
| **Learn with interactive notebooks** | Students, new users | [Examples & Notebooks](#examples--notebooks) |
| **Pull data into Excel** | Financial analysts | [Excel Integration](#excel-integration) |
| **Prove data quality to stakeholders** | Institutional buyers, compliance | [Research & Quality Proofs](#research--quality-proofs) |
| **Read methodology and compliance docs** | Due diligence, enterprise | [Documentation](#documentation) |
| **Contribute templates, examples, research** | Open-source contributors | [Contributing](#contributing) |

---

## Quickstart

**1. Install**

```bash
pip install valuein-sdk
```

**2. Get a free API token** (S&P 500 coverage, no credit card required)

→ [Register at valuein.biz](https://buy.stripe.com/3cI28qgx81Og66weXgco004)

**3. Set your key and run**

```bash
export VALUEIN_API_KEY="your_token"
```

```python
from valuein_sdk import ValueinClient

client = ValueinClient()
print(client)
# ValueinClient(plan='sp500', status='active', snapshot='2026-03-14', tables=6)

me = client.me()
print(me["plan"], me["email"])
```

---

## Data Plans

| Plan | Entities | Coverage | Price |
|---|---|---|---|
| **S&P 500** | ~605 current + historical members, full history | All 6 tables, all columns | Free — [Register](https://buy.stripe.com/3cI28qgx81Og66weXgco004) |
| **US Core Fundamentals** | 10,000+ active and delisted | Full universe 1990–present, PIT, restatements | Subscription — [Buy now](https://buy.stripe.com/5kQ3cudkW0Kc3Yo02mco005) |

---

## Use Case 1 — Python SDK

Install `valuein-sdk`, authenticate once, run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process against Parquet files on Cloudflare R2.

### Ticker lookup

```python
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)
```

### Revenue trends

```python
client = ValueinClient(tables=["entity", "security", "filing", "fact"])

df = client.query("""
    SELECT fa.fiscal_year,
           round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact     fa
    JOIN   filing   f  ON fa.accession_id = f.accession_id
    JOIN   security s  ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'MSFT'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept = 'Revenues'
      AND  f.form_type         = '10-K'
    ORDER  BY fa.fiscal_year DESC
    LIMIT  10
""")
print(df)
```

### Point-in-Time backtest (the most important pattern)

Filter by `filing_date`, not `report_date`. Apple's fiscal 2023 ended Sep 30 but the 10-K wasn't filed until Nov 3 — using `report_date` leaks 34 days of future data into your backtest.

```python
client = ValueinClient(tables=["security", "filing", "fact"])

TRADE_DATE = "2024-01-15"

df = client.query(f"""
    SELECT fa.standard_concept, fa.fiscal_year,
           f.filing_date,
           round(fa.numeric_value / 1e9, 2) AS value_bn
    FROM   fact     fa
    JOIN   filing   f  ON fa.accession_id = f.accession_id
    JOIN   security s  ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'NVDA'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept IN ('Revenues', 'NetIncomeLoss')
      AND  f.form_type         = '10-K'
      AND  f.filing_date      <= '{TRADE_DATE}'   -- PIT gate
    ORDER  BY f.filing_date DESC
""")
print(df)
```

### Date columns reference

| Column | Table | Use for |
|---|---|---|
| `report_date` / `period_end` | `filing` / `fact` | Aligning to fiscal calendar |
| `filing_date` | `filing` | **PIT backtest filter** — when the SEC received the filing |
| `knowledge_at` | `fact` | Millisecond-precision PIT for intraday signal research |
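
To make the distinction concrete, here is a minimal, self-contained sketch (toy rows, plain Python, no SDK required) showing how a `report_date` filter admits a filing that was not yet public on the trade date, while a `filing_date` filter does not:

```python
from datetime import date

# Toy filing rows: the Sep 30 quarter was not filed until Nov 3.
filings = [
    {"form": "10-Q", "report_date": date(2023, 6, 30), "filing_date": date(2023, 8, 4)},
    {"form": "10-K", "report_date": date(2023, 9, 30), "filing_date": date(2023, 11, 3)},
]

trade_date = date(2023, 10, 15)

# Leaky: the Sep 30 numbers pass this filter even though the filing
# did not reach the SEC until Nov 3 -- look-ahead bias.
leaky = [f for f in filings if f["report_date"] <= trade_date]

# Correct PIT gate: only filings the market had actually seen.
correct = [f for f in filings if f["filing_date"] <= trade_date]

print(len(leaky), len(correct))  # 2 1
```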

### API reference

```python
client = ValueinClient(
    api_key="...",       # or read from VALUEIN_API_KEY env var
    gateway_url="...",   # override for local dev only
    tables=["entity"],   # load specific tables; omit to load all
)

client.me()                           # dict: plan, status, email, createdAt
client.manifest()                     # dict: snapshot, last_updated, tables
client.health()                       # dict: ok, worker, env (no auth required)
client.query(sql)                     # DuckDB SQL → pandas DataFrame
client.get(table)                     # Download full table → pandas DataFrame
client.run_template(name, **kwargs)   # Named SQL template → pandas DataFrame
client.tables()                       # List loaded table names
```

### Error handling

```python
from valuein_sdk import (
    ValueinAuthError,      # HTTP 401/403 — invalid or expired token
    ValueinPlanError,      # HTTP 403 — endpoint requires a higher plan
    ValueinRateLimitError, # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,       # HTTP 5xx — includes .status_code
)

try:
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinRateLimitError as e:
    print(f"Rate limited. Retry in {e.retry_after}s")
except ValueinPlanError:
    print("Upgrade to Full plan for this data")
```
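
A common pattern is to honor `.retry_after` with a small retry wrapper. The sketch below uses a local stand-in exception (mirroring the `.retry_after` attribute documented above) so it runs without the SDK; in real code you would catch `ValueinRateLimitError` instead:

```python
import time

class RateLimitError(Exception):
    """Local stand-in for ValueinRateLimitError: carries a retry_after hint in seconds."""
    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after

def query_with_retry(run_query, max_attempts: int = 3):
    """Call run_query(), sleeping for retry_after between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise
            time.sleep(e.retry_after)

# Simulate a query that is rate-limited once, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError(retry_after=0.01)
    return "dataframe"

print(query_with_retry(flaky_query))  # dataframe
```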

---

## Use Case 2 — SQL Templates

39 production-ready SQL templates bundled with the SDK. Run complex financial signals in one line — no SQL required.

```python
# DuPont decomposition for MSFT
df = client.run_template("16_dupont_decomposition", ticker="MSFT")

# Piotroski F-Score screen across S&P 500
df = client.run_template("17_piotroski_f_score", tickers="'AAPL', 'MSFT', 'NVDA'")

# Altman Z-Score (bankruptcy probability)
df = client.run_template("18_altman_z_score", ticker="TSLA")

# Trailing-twelve-months revenue
df = client.run_template("06_trailing_twelve_months_ttm", ticker="AMZN")
```

### Template categories

| Range | Category | Examples |
|---|---|---|
| 01–04 | Data Access | Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen |
| 05–09 | Income Statement | YoY revenue growth, TTM, margin analysis, FCF, R&D intensity |
| 10–15 | Balance Sheet | Liquidity, solvency, interest coverage, cash conversion, capex ratios |
| 16–20 | Investment Scores | DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly |
| 21–26 | Valuation & Screening | Sector aggregates, peer ranking, dilution, arbitrage signals |
| 27–33 | Short Signals | Late filers, restatements, 8-K events, ghost companies |
| 34–39 | Advanced Analytics | PIT backtest engine, Z-score outliers, seasonality, XBRL audit |

See [`valuein_sdk/queries/SQL_CHEATSHEET.md`](valuein_sdk/queries/SQL_CHEATSHEET.md) for the full template reference.

---

## Use Case 3 — Examples & Notebooks

Six standalone Python scripts and four Jupyter notebooks that take you from install to insight in under three minutes.

### Python scripts (`examples/python/`)

| Script | Level | What it demonstrates |
|---|---|---|
| [`getting_started.py`](examples/python/getting_started.py) | Beginner | Auth check, first query, entity counts by sector |
| [`usage.py`](examples/python/usage.py) | Reference | Every public SDK method demonstrated |
| [`entity_screening.py`](examples/python/entity_screening.py) | Beginner | Screen by sector, SIC code, active vs inactive status |
| [`financial_analysis.py`](examples/python/financial_analysis.py) | Intermediate | Revenue trends, margins, concept normalization, peer comparison |
| [`pit_backtest.py`](examples/python/pit_backtest.py) | Intermediate | Correct PIT discipline, restatement impact, `filing_date` vs `report_date` |
| [`survivorship_bias.py`](examples/python/survivorship_bias.py) | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification |

Run any script standalone:

```bash
VALUEIN_API_KEY=your_token python examples/python/getting_started.py
```

### Jupyter notebooks (`examples/notebooks/`)

| Notebook | Open in Colab |
|---|---|
| [Quickstart](examples/notebooks/quickstart.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/quickstart.ipynb) |
| [Fundamental Analysis](examples/notebooks/fundamental_analysis.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/fundamental_analysis.ipynb) |
| [PIT Backtest](examples/notebooks/pit_backtest.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/pit_backtest.ipynb) |
| [Survivorship Bias](examples/notebooks/survivorship_bias.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/survivorship_bias.ipynb) |

---

## Use Case 4 — Excel Integration

Pull live SEC fundamental data into Microsoft Excel via Power Query. No Python required.

**Requirements:** Microsoft 365 (build 16.0.17531 or later)

1. Download [`excel/valuein-fundamentals.xlsx`](excel/valuein-fundamentals.xlsx)
2. Open the workbook and enter your API token in the **Connectivity Guide** sheet
3. Click **Refresh All** — data streams directly from Parquet files on Cloudflare R2

The workbook includes 8 pre-configured sheets: Income Statement, Balance Sheet, Cash Flow, Entities, Securities, Filings, Index Membership, and a Data Dictionary.

For DIY Power Query connections, the M-language source files are in [`excel/power-query/`](excel/power-query/).

Full setup walkthrough: [`docs/excel-guide.md`](docs/excel-guide.md)

---

## Use Case 5 — Research & Quality Proofs

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.

```bash
# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py
```

### Research modules

**`research/fundamental/`** — Financial statement analysis workflows
- Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

**`research/quantitative/`** — Factor model and strategy research
- PIT correctness proof, survivorship bias quantification, restatement tracking as short signal, sector rotation

**`research/data_engineering/`** — XBRL normalization and pipeline analysis
- Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

**`research/quality_proof/`** — Automated data quality validation
- Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check
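
For a flavor of what the quality-proof modules verify, here is a minimal sketch of the balance-sheet identity test (the function name and the 1% relative tolerance are illustrative; see `research/quality_proof/` for the real implementation). The sample figures are Apple's FY2023 balance sheet in $M:

```python
def balance_sheet_ok(assets: float, liabilities: float, equity: float,
                     tol: float = 0.01) -> bool:
    """Check Assets = Liabilities + Equity within a relative tolerance."""
    if assets == 0:
        return liabilities + equity == 0
    return abs(assets - (liabilities + equity)) / abs(assets) <= tol

# Apple FY2023, $M: 290,437 + 62,146 = 352,583 exactly.
print(balance_sheet_ok(352_583, 290_437, 62_146))  # True
print(balance_sheet_ok(100_000, 90_000, 5_000))    # False (5% gap)
```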

See [`research/README.md`](research/README.md) for a full breakdown of what each module proves and the key metric it validates.

---

## Data Schema

Six tables, 90+ columns, fully documented in [`docs/DATA_CATALOG.xlsx`](docs/DATA_CATALOG.xlsx) and [`docs/schema.json`](docs/schema.json).

| Table | Primary Key | Description |
|---|---|---|
| `entity` | `cik` | Legal company — name, sector, SIC code, fiscal year end, `status` |
| `security` | `id` | Ticker symbols with SCD Type 2 date ranges (`valid_from`, `valid_to`, `is_active`) |
| `filing` | `accession_id` | SEC filing metadata — form type, `filing_date`, `report_date`, `accepted_at` |
| `fact` | `(entity_id, accession_id, concept, period_end, unit)` | Every financial fact from every filing, with `knowledge_at` PIT timestamp |
| `taxonomy_guide` | `standard_concept` | Definitions for 150 canonical concept names |
| `index_membership` | — | Historical index constituent records (S&P 500 entry and exit dates) |

### Key joins

```
security.entity_id        →  entity.cik
filing.entity_id          →  entity.cik
fact.entity_id            →  entity.cik
fact.accession_id         →  filing.accession_id
index_membership.security_id  →  security.id
```
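
The full join chain can be exercised end-to-end on toy rows. The sketch below uses the stdlib `sqlite3` module purely to illustrate the join shape (the real lake runs DuckDB over Parquet); the single Apple 10-K fact is sample data, not pulled from the service:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entity   (cik INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE security (id INTEGER PRIMARY KEY, entity_id INTEGER, symbol TEXT);
CREATE TABLE filing   (accession_id TEXT PRIMARY KEY, entity_id INTEGER,
                       form_type TEXT, filing_date TEXT);
CREATE TABLE fact     (entity_id INTEGER, accession_id TEXT,
                       standard_concept TEXT, numeric_value REAL);

INSERT INTO entity   VALUES (320193, 'Apple Inc.');
INSERT INTO security VALUES (1, 320193, 'AAPL');
INSERT INTO filing   VALUES ('0000320193-23-000106', 320193, '10-K', '2023-11-03');
INSERT INTO fact     VALUES (320193, '0000320193-23-000106', 'Revenues', 383285000000);
""")

# fact -> filing -> security -> entity, exactly as in the join map above.
rows = con.execute("""
    SELECT e.name, s.symbol, f.form_type, fa.numeric_value
    FROM   fact     fa
    JOIN   filing   f ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    JOIN   entity   e ON s.entity_id     = e.cik
    WHERE  fa.standard_concept = 'Revenues'
""").fetchall()
print(rows)
```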

### Standard concept names

Raw XBRL tags (15,000+) are normalized to canonical `standard_concept` values. Use these exact strings:

| Concept | `standard_concept` |
|---|---|
| Revenue / Sales | `'Revenues'` |
| Net Income | `'NetIncomeLoss'` |
| Total Assets | `'Assets'` |
| Gross Profit | `'GrossProfit'` |
| Operating Income | `'OperatingIncomeLoss'` |

Both the raw `concept` tag and the normalized `standard_concept` are on the `fact` table — no join to a separate mapping table needed.

---

## Why Valuein

| | |
|---|---|
| **Point-in-Time (PIT)** | Every fact carries `filing_date` (SEC receipt) and `knowledge_at` (millisecond precision). Filter `filing_date <= trade_date` to eliminate look-ahead bias. Most providers silently overwrite restated numbers — we append every revision. |
| **Survivorship-bias free** | 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record back to 1990. A backtest on survivors only is not a real backtest. |
| **Standardized concepts** | 15,000+ raw XBRL tags mapped to ~150 canonical `standard_concept` values. One concept name works across every filer, regardless of what tag they filed. |
| **DuckDB SQL** | In-process DuckDB with authenticated Parquet streaming. Queries run in milliseconds — no local downloads required. |
| **39 SQL templates** | Production-ready queries for Altman Z-Score, DuPont, Piotroski F-Score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more. |

---

## Documentation

| Document | Description |
|---|---|
| [`docs/METHODOLOGY.md`](docs/METHODOLOGY.md) | Data sourcing, PIT architecture, restatement handling, XBRL normalization logic |
| [`docs/COMPLIANCE_AND_DDQ.md`](docs/COMPLIANCE_AND_DDQ.md) | Data provenance, MNPI policy, PIT integrity, security, SLA summary |
| [`docs/SLA.md`](docs/SLA.md) | Uptime targets, data freshness SLAs, support response times, SLA credits |
| [`docs/excel-guide.md`](docs/excel-guide.md) | Full Excel / Power Query setup walkthrough |
| [`docs/DATA_CATALOG.xlsx`](docs/DATA_CATALOG.xlsx) | All columns, types, definitions, sample values |
| [`docs/schema.json`](docs/schema.json) | Machine-readable JSON schema |
| [`CHANGELOG.md`](CHANGELOG.md) | Full release history |

---

## Contributing

Contributions are welcome — new SQL templates, example scripts, research modules, and documentation improvements.

```bash
git clone https://github.com/valuein/quants.git
cd quants
uv sync --group dev
uv run pytest tests/ -k "not integration"   # all tests pass offline
```

See [`CONTRIBUTING.md`](CONTRIBUTING.md) for code standards, naming conventions, and the PR process.

---

> For research and educational purposes only. Not financial advice.

Apache-2.0 License — see [LICENSE](LICENSE).
