Metadata-Version: 2.4
Name: valuein-sdk
Version: 0.3.3
Summary: Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.
Project-URL: Homepage, https://valuein.biz
Project-URL: Documentation, https://valuein.biz/docs
Project-URL: Repository, https://github.com/valuein/quants
Project-URL: Bug Tracker, https://github.com/valuein/quants/issues
Project-URL: Changelog, https://github.com/valuein/quants/blob/main/CHANGELOG.md
Author-email: Valuein <support@valuein.biz>
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: EDGAR,SEC,backtesting,duckdb,finance,fundamentals,parquet,quant
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: duckdb>=1.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: research
Requires-Dist: ipykernel>=6.29.0; extra == 'research'
Requires-Dist: jupytext>=1.16.0; extra == 'research'
Requires-Dist: matplotlib>=3.8.0; extra == 'research'
Requires-Dist: numpy>=1.26.0; extra == 'research'
Provides-Extra: test
Requires-Dist: pytest-cov>=4.1.0; extra == 'test'
Requires-Dist: pytest>=9.0.0; extra == 'test'
Description-Content-Type: text/markdown

![Valuein](https://valuein.biz/valuein/linkedin-rounded.png)

[![PyPI version](https://img.shields.io/pypi/v/valuein-sdk)](https://pypi.org/project/valuein-sdk/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/valuein-sdk/)
[![License](https://img.shields.io/badge/license-Apache%202.0-green)](https://github.com/valuein/quants/blob/main/LICENSE)
[![CI](https://github.com/valuein/quants/actions/workflows/publish.yml/badge.svg)](https://github.com/valuein/quants/actions/workflows/publish.yml)

# Valuein US Core Fundamentals

**Institutional-grade SEC financial data. Point-in-Time accurate. Survivorship-bias free.**

Standardized fundamentals from SEC EDGAR (10-K, 10-Q, 8-K, 20-F). 10,000+ active and delisted entities. ~105M facts. Coverage from 1990 to present. Query with DuckDB SQL — no downloads, no local database.

---

## Install & first query

```bash
pip install valuein-sdk
export VALUEIN_API_KEY="your_token"   # get one at valuein.biz
```

```python
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["security", "filing", "fact"])

df = client.query("""
    SELECT fa.fiscal_year, round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact fa
    JOIN   filing   f ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    WHERE  s.symbol = 'NVDA' AND s.is_active = TRUE
      AND  f.form_type = '10-K' AND fa.standard_concept = 'TotalRevenue'
      AND  fa.fiscal_period = 'FY'
    QUALIFY row_number() OVER (PARTITION BY fa.fiscal_year ORDER BY fa.period_end DESC) = 1
    ORDER  BY fa.fiscal_year DESC LIMIT 5
""")
print(df)
#    fiscal_year  revenue_bn
# 0         2026      215.94
# 1         2025      130.50
# 2         2024       60.92
# 3         2023       26.97
# 4         2022       26.91
```

---

## Why Valuein

| | |
|---|---|
| **Point-in-Time (PIT)** | Every fact has `filing_date` (SEC receipt date) and `knowledge_at` (millisecond precision). Filter `filing_date <= trade_date` to eliminate look-ahead bias. Most providers silently overwrite restated numbers — we append every revision. |
| **Survivorship-bias free** | 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record. A strategy back-tested on survivors only is not a real back-test. |
| **Standardized concepts** | 15,000+ raw XBRL tags mapped to ~150 canonical `standard_concept` values via waterfall priority logic. One concept name works across every filer, regardless of what tag they chose. |
| **DuckDB SQL** | An in-process DuckDB connection with authenticated access. Your SQL queries run in milliseconds against Parquet files — no local downloads required. |
| **40+ SQL templates** | Production-ready queries for Altman Z-score, DuPont decomposition, Piotroski F-score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more. |

---

## Installation

```bash
# pip
pip install valuein-sdk

# uv (recommended)
uv add valuein-sdk

# with research extras (numpy, matplotlib, Jupyter)
pip install "valuein-sdk[research]"
```

**Authentication** — set your token in any of these ways:

```bash
# environment variable (recommended for scripts)
export VALUEIN_API_KEY="your_token"

# .env file in your project root (auto-loaded by the SDK)
echo 'VALUEIN_API_KEY=your_token' >> .env
```

```python
# or pass directly
client = ValueinClient(api_key="your_token")
```

Get a token at **[valuein.biz](https://valuein.biz)**.

---

## Core examples

### 1 — Load only what you need

```python
# Instant auth check — no tables downloaded
client = ValueinClient(tables=[])
print(client.me())      # {'plan': 'sp500', 'status': 'active', 'email': '...'}
print(client.manifest()) # {'last_updated': '...', 'tables': [...]}

# Load specific tables — much faster than loading all 7
client = ValueinClient(tables=["entity", "security"])
```

**Always pass `tables=`**. The `fact` table is large; load it only when you need financials.

---

### 2 — Ticker lookup and company metadata

```python
client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.industry, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
```

Use `is_active = TRUE` for current tickers. Historical and delisted companies have `is_active = FALSE`.

---

### 3 — Point-in-Time backtesting

The most important pattern. **Always filter by `filing_date`, not `report_date`.**

```python
client = ValueinClient(tables=["security", "filing", "fact"])

TRADE_DATE = "2024-01-15"

# CORRECT — only data the market had on the trade date
df = client.query(f"""
    SELECT fa.standard_concept, fa.fiscal_year,
           f.filing_date, round(fa.numeric_value / 1e9, 2) AS value_bn
    FROM   fact    fa
    JOIN   filing  f  ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'NVDA'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept IN ('TotalRevenue', 'NetIncome')
      AND  f.form_type         = '10-K'
      AND  f.filing_date      <= '{TRADE_DATE}'   -- ← PIT gate
    ORDER  BY f.filing_date DESC
""")
```

| Column | Use for |
|--------|---------|
| `period_end` / `report_date` | Aligning to fiscal calendar |
| `filing_date` | **PIT backtest filter** — when the SEC received the filing |
| `knowledge_at` | Millisecond-precision PIT for intraday signal research |

> Filtering by `report_date` instead of `filing_date` leaks future data into your backtest. Apple's Q3 2023 ended Sep 30 but was filed Nov 3 — using `report_date` gives you 34 days of look-ahead.

---

### 4 — Annual financial metrics (deduplication)

10-K filings include comparative prior-year figures. Use `QUALIFY` to keep only the current fiscal year row.

```python
client = ValueinClient(tables=["security", "filing", "fact"])

df = client.query("""
    SELECT
        s.symbol,
        fa.fiscal_year,
        round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact    fa
    JOIN   filing  f  ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    WHERE  s.symbol            IN ('NVDA', 'TSLA', 'AMZN')
      AND  s.is_active         = TRUE
      AND  f.form_type         = '10-K'
      AND  fa.standard_concept = 'TotalRevenue'
      AND  fa.fiscal_period    = 'FY'
    QUALIFY row_number() OVER (
        PARTITION BY s.symbol, fa.fiscal_year
        ORDER BY fa.period_end DESC          -- latest period = current year
    ) = 1
    ORDER BY fa.fiscal_year DESC, s.symbol
""")
```

---

### 5 — Cross-sectional screening

```python
client = ValueinClient(tables=["entity", "security", "filing", "fact"])

# All companies with revenue > $10B in their most recent 10-K
df = client.query("""
    WITH latest AS (
        SELECT entity_id, accession_id,
               row_number() OVER (PARTITION BY entity_id ORDER BY filing_date DESC) AS rn
        FROM filing WHERE form_type = '10-K'
    )
    SELECT e.name, e.sector::VARCHAR AS sector,
           round(fa.numeric_value / 1e9, 1) AS revenue_bn
    FROM   latest  l
    JOIN   entity  e  ON l.entity_id    = e.cik
    JOIN   fact    fa ON fa.accession_id = l.accession_id
    WHERE  l.rn = 1
      AND  fa.standard_concept = 'TotalRevenue'
      AND  fa.numeric_value    > 10e9
    QUALIFY row_number() OVER (PARTITION BY e.cik ORDER BY fa.period_end DESC) = 1
    ORDER  BY revenue_bn DESC
    LIMIT  20
""")
```

---

### 6 — Survivorship-bias free universe

Most providers delete companies that went bankrupt or were delisted. We keep every one.

```python
client = ValueinClient(tables=["entity"])

# Count active vs inactive — see the full picture
df = client.query("""
    SELECT status, count(*) AS companies,
           round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
    FROM entity GROUP BY status ORDER BY companies DESC
""")

# Financial data for inactive companies works exactly the same
# as for active ones — just remove the is_active filter
df_inactive = client.query("""
    SELECT e.name, e.status, fa.fiscal_year,
           round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact    fa
    JOIN   filing  f ON fa.accession_id = f.accession_id
    JOIN   entity  e ON f.entity_id     = e.cik
    WHERE  e.status IN ('INACTIVE', 'DELISTED')
      AND  fa.standard_concept = 'TotalRevenue'
      AND  fa.fiscal_period    = 'FY'
    QUALIFY row_number() OVER (PARTITION BY e.cik, fa.fiscal_year ORDER BY fa.period_end DESC) = 1
    ORDER  BY e.name, fa.fiscal_year DESC
    LIMIT  30
""")
```

---

### 7 — Pre-built SQL templates

40+ production-ready templates. Run by name, pass parameters.

```python
client = ValueinClient(tables=["entity", "security", "filing", "fact"])

df = client.run_template(
    "01_fundamentals_by_ticker",
    ticker="NVDA",
    form_types=["10-K"],
    metrics=["TotalRevenue", "NetIncome", "OperatingIncome"],
    start_date="2020-01-01",
    end_date="2026-01-01",
)

# List all available templates
print(client.query_cache.keys())
```

**Template catalogue:**

| Template | What it does |
|----------|-------------|
| `01_fundamentals_by_ticker` | Core financials for a single ticker |
| `02_peer_group_comparison` | Side-by-side multi-ticker comparison |
| `04_altman_z_score_inputs` | Inputs for Altman Z-score distress model |
| `05_piotroski_f_score_inputs` | Inputs for Piotroski F-score quality screen |
| `07_trailing_twelve_months_ttm` | TTM revenue/income from quarterly filings |
| `09_free_cash_flow` | FCF = operating cash flow − capex |
| `15_earnings_quality_accruals_anomaly` | Accruals-based earnings quality signal |
| `18_true_point_in_time_backtest_engine` | Full PIT engine keyed on `knowledge_at` |
| `19_cross_sector_dupont_roe_breakdown` | DuPont decomposition across sectors |
| `06_restatement_negative_revision_alpha` | Short signal from negative restatements |

See all 40+ templates in [`valuein_sdk/queries/`](valuein_sdk/queries/).

---

## Data schema

### Tables

| Table | Primary key | Description |
|-------|-------------|-------------|
| `entity` | `cik` | Legal company entity — name, sector, SIC code, status |
| `security` | `id` | Ticker symbols with SCD Type 2 date ranges (`valid_from`, `valid_to`, `is_active`) |
| `filing` | `accession_id` | SEC filing metadata — form type, `filing_date`, `report_date` |
| `fact` | `(entity_id, accession_id, concept, period_end, unit)` | Every financial fact from every filing |
| `taxonomy_guide` | `standard_concept` | Human-readable definitions for 150+ standard concepts |
| `concept_mapping` | `xbrl_tag` | Maps 15,000+ raw XBRL tags → `standard_concept` |
| `index_membership` | — | Historical S&P 500 and index constituent records |

### Key joins

```
security.entity_id  →  entity.cik
filing.entity_id    →  entity.cik
fact.entity_id      →  entity.cik
fact.accession_id   →  filing.accession_id
```

### The `fact` table — critical columns

| Column | Description |
|--------|-------------|
| `standard_concept` | Canonical metric name — use this, not `concept` |
| `numeric_value` | The number (USD, shares, ratio, etc.) |
| `unit` | `USD`, `shares`, `pure`, etc. |
| `fiscal_year` / `fiscal_period` | e.g. `2024` / `FY`, `Q1`, `Q2` |
| `period_start` / `period_end` | Reporting period boundaries |
| `filing_date` | When the SEC received the filing (**use for PIT**) |
| `knowledge_at` | Millisecond-precision public knowledge timestamp |
| `statement_type` | `income`, `balance`, `cashflow`, `notes` |

### Common `standard_concept` values

| Category | Concepts |
|----------|----------|
| Income | `TotalRevenue`, `GrossProfit`, `OperatingIncome`, `NetIncome`, `EPS_Diluted` |
| Balance sheet | `TotalAssets`, `TotalLiabilities`, `StockholdersEquity`, `CurrentAssets` |
| Cash flow | `OperatingCashFlow`, `InvestingCashFlow`, `FinancingCashFlow` |
| Efficiency | `ResearchAndDevelopment`, `CostOfRevenue` |

---

## API reference

```python
client = ValueinClient(
    api_key="...",       # defaults to VALUEIN_API_KEY env var
    gateway_url="...",   # override for local dev only
    tables=["entity"],   # load specific tables (omit to load all)
)

client.query(sql)                    # DuckDB SQL → pandas DataFrame
client.get(table)                    # Download full table → pandas DataFrame
client.run_template(name, **kwargs)  # Named SQL template → pandas DataFrame
client.tables()                      # List loaded table names
client.me()                          # Token: plan, status, email
client.manifest()                    # Snapshot: last_updated, tables
```

### Exception hierarchy

```python
from valuein_sdk import (
    ValueinError,          # base — catch-all
    ValueinAuthError,      # 401/403 — invalid, expired, or revoked token
    ValueinPlanError,      # 403 — resource requires Full Dataset plan
    ValueinNotFoundError,  # 404 — table or path not found
    ValueinRateLimitError, # 429 — includes .retry_after (seconds)
    ValueinAPIError,       # 5xx — gateway server error
)

try:
    df = client.get("fact")
except ValueinPlanError:
    print("Upgrade to Full Dataset at valuein.biz/pricing")
except ValueinAuthError:
    print("Check your API key at valuein.biz/portal")
```

The transport layer retries automatically (3× with exponential backoff) on 429, 502, 503, 504.

---

## Examples

### Notebooks [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/quickstart.ipynb)

| Notebook | Description | Colab |
|----------|-------------|-------|
| [`quickstart.ipynb`](examples/notebooks/quickstart.ipynb) | Auth, table loading, first financial query | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/quickstart.ipynb) |
| [`fundamental_analysis.ipynb`](examples/notebooks/fundamental_analysis.ipynb) | Mag 7 revenue, income statement, concept normalisation, $10B screen | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/fundamental_analysis.ipynb) |
| [`pit_backtest.ipynb`](examples/notebooks/pit_backtest.ipynb) | PIT methodology deep-dive, restatement history, correct vs incorrect patterns | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/pit_backtest.ipynb) |
| [`survivorship_bias.ipynb`](examples/notebooks/survivorship_bias.ipynb) | Bankrupt and delisted company data — scale, sector failure rates | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/quants/blob/main/examples/notebooks/survivorship_bias.ipynb) |

### Python scripts

| Script | Tables used |
|--------|-------------|
| [`getting_started.py`](examples/getting_started.py) | `entity`, `security` |
| [`usage.py`](examples/usage.py) | All — full SDK walkthrough |
| [`01_entity_screening.py`](examples/01_entity_screening.py) | `entity`, `security` |
| [`02_financial_analysis.py`](examples/02_financial_analysis.py) | `entity`, `security`, `filing`, `fact` |
| [`03_pit_backtest.py`](examples/03_pit_backtest.py) | `security`, `filing`, `fact` |
| [`04_survivorship_bias.py`](examples/04_survivorship_bias.py) | `entity`, `security`, `filing`, `fact` |

---

## Plans

| Plan | Coverage | Price |
|------|----------|-------|
| **S&P 500** | Full history for S&P 500 constituents, all tables | Free (register at valuein.biz) |
| **Full Dataset** | All 10,000+ entities from 1990, active + inactive | Subscription — [valuein.biz/pricing](https://valuein.biz/pricing) |

```python
# Check your current plan
print(client.me())
# {'plan': 'sp500', 'status': 'active', 'email': 'you@example.com'}
```

---

## Contributing / data issues

- **Data quality report:** [open a ticket](https://github.com/valuein/quants/issues/new)
- **SDK bug:** [open an issue](https://github.com/valuein/quants/issues)
- **Questions:** [support@valuein.biz](mailto:support@valuein.biz)

---

> For research and educational purposes only. Not financial advice.

Apache-2.0 License — see [LICENSE](LICENSE).
