Metadata-Version: 2.4
Name: valuein-sdk
Version: 3.5.1
Summary: Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.
Project-URL: Homepage, https://valuein.biz
Project-URL: Documentation, https://valuein.biz/docs
Project-URL: Repository, https://github.com/valuein/sdk
Project-URL: Bug Tracker, https://github.com/valuein/sdk/issues
Project-URL: Changelog, https://github.com/valuein/sdk/blob/main/CHANGELOG.md
Author-email: Valuein <support@valuein.biz>
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: EDGAR,SEC,backtesting,duckdb,finance,fundamentals,parquet,quant
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: duckdb>=1.1.0
Requires-Dist: httpx[http2]>=0.27.0
Requires-Dist: jinja2>=3.1.6
Requires-Dist: pandas>=2.0.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: polars
Requires-Dist: polars>=0.20.0; extra == 'polars'
Provides-Extra: research
Requires-Dist: ipykernel>=6.29.0; extra == 'research'
Requires-Dist: jupytext>=1.16.0; extra == 'research'
Requires-Dist: matplotlib>=3.8.0; extra == 'research'
Requires-Dist: numpy>=1.26.0; extra == 'research'
Provides-Extra: test
Requires-Dist: pytest-cov>=4.1.0; extra == 'test'
Requires-Dist: pytest>=9.0.0; extra == 'test'
Description-Content-Type: text/markdown

[![Valuein Logo](https://www.valuein.biz/valuein/twitter-rounded.png)](https://valuein.biz)

[![PyPI version](https://img.shields.io/pypi/v/valuein-sdk?cacheSeconds=300)](https://pypi.org/project/valuein-sdk/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/valuein-sdk/)
[![License](https://img.shields.io/badge/license-Apache%202.0-green)](LICENSE)
[![CI](https://github.com/valuein/sdk/actions/workflows/sdk-ci.yml/badge.svg)](https://github.com/valuein/sdk/actions/workflows/sdk-ci.yml)
[![PyPI downloads](https://img.shields.io/pypi/dm/valuein-sdk?color=blue)](https://pypi.org/project/valuein-sdk/)

# 💎 Valuein Python SDK: Frictionless Financial Data
A high-performance toolkit for querying point-in-time US fundamentals from SEC EDGAR, built for quants, analysts, and data engineers.

The Valuein SDK, is a complete infrastructure solution for consuming point-in-time accurate US Core financial fundamentals (facts) on your daily workflow. Whether you are building complex asynchronous Python pipelines or executing templated SQL, this library provides frictionless, zero-setup access to institutional-grade data.

## The Data Engine
Powered by survivorship-bias-free data containing 12M+ filings and 105M+ facts from 10-Ks, 10-Qs, 8-Ks, 20-Fs, 40-Fs, and amendments across 19,000+ active and delisted US companies since 1993 (the SEC EDGAR electronic-filing floor).

> **🔜 Coming soon — semantic search over filing narratives (ETA 2026-Q3).** Risk Factors, MD&A, Business, Legal Proceedings, and Controls & Procedures sections from every 10-K / 10-Q / 20-F are being extracted as iXBRL TextBlocks and indexed into Cloudflare Vectorize. The `search_filing_text` MCP tool and the `filing_text` SDK table activate automatically once the backfill lands. See [Semantic Search (preview)](#-semantic-search-preview--table-filing_text).

## Why use this toolkit?

⚡ Lightning-Fast Python SDK: Execute blazing-fast queries against remote Parquet files hosted on R2, powered entirely by DuckDB under the hood. No database setup, no massive local downloads.

🛠️ Plug-and-Play SQL Templates: Skip the boilerplate. Use our pre-built SQL templates to immediately extract insights, calculate intrinsic values, or model standardized financial statements.

📚 Comprehensive Context: Deep-dive documentation mapping out table schemas, primary keys, and field definitions to support your specific financial research use cases.

## 🚀 Why Valuein Data and SDK Library
> Easy of use and intelligence.

| Feature                               | Benefit                                                    |
|---------------------------------------|-----------------------------------------------------------|
| 🕒 **<span title="Provides historical snapshots for accurate backtesting">Point‑in‑Time Data</span>** | Eliminate look‑ahead bias in backtests |
| ⚖️ **<span title="Includes companies that went bankrupt, delisted, or were acquired">Survivorship‑Bias Free</span>** | Includes bankrupt, delisted, and acquired firms |
| 📊 **<span title="Maps ~11,966 raw XBRL tags to ~292 standard financial concepts">Standardized Concepts</span>** | ~11,966 raw XBRL tags mapped to ~292 canonical financial concepts |
| 🚀 **<span title="High-speed in-memory SQL engine using DuckDB">DuckDB SQL Engine</span>** | Millisecond analytics directly in Python |
| ☁️ **<span title="Stream Parquet data from cloud storage without downloading locally">Cloud Parquet Streaming</span>** | No local data downloads required |
| 🧩 **<span title="Ready-to-use financial templates for backtesting and signals">Financial Templates</span>** | Production‑ready investment signals |


## 🧠 What You Can Do With This Repository

| Use Case | Who | Where to Start |
|---|---|---|
| **Query financial data via Python** | Quants, data engineers | [Quickstart](#-quickstart) |
| **Call every SDK method** | AI agents, integrators | [API Reference](#-api-reference) |
| **Run 54 pre-built financial signals** | Analysts, quants | [SQL Templates](#-template-categories) |
| **Learn with interactive notebooks** | Students, new users | [Python Examples](#-python-examples) |
| **Prove data quality to stakeholders** | Institutional buyers, compliance | [Research & Quality Proofs](#-research--quality-proofs) |
| **Read methodology and compliance docs** | Due diligence, enterprise | [Documentation](#-documentation) |
| **Contribute templates, examples, research** | Open-source contributors | [Contributing](#-contributing) |

---

## ⚡ Quickstart

**1. Install Package**

Pick the workflow you already use — both work, no extra setup:

``` bash
# Option A — pip (universal, ships with Python)
python -m venv .venv && source .venv/bin/activate
pip install valuein-sdk
```

``` bash
# Option B — uv (10–100× faster; install from https://docs.astral.sh/uv/)
uv venv && source .venv/bin/activate
uv pip install valuein-sdk
```

### Run the script — no token required

Install and query real data immediately, no registration needed for the SAMPLE dataset (S&P500, last 5 years). The SDK and the edge gateway both fall back to sample mode automatically when `VALUEIN_API_KEY` is unset, so the same code runs against sample today and against your full plan tomorrow with zero changes:

```python
from valuein_sdk import ValueinClient

client = ValueinClient()                            # load all tables by default
print("Me: ", client.me())                          # dict: plan, status, email, createdAt
print("Manifest: ", client.manifest())              # dict: snapshot, last_updated, tables (cached 5 min)
print("Tables: ", client.tables())                  # list of loaded table names
print("Fact Schema: ", client.get_schema("fact"))  # dict: column → DuckDB type for any loaded table

sql_query = "SELECT COUNT(cik) FROM entity"
result_df = client.run_query(sql_query)         # SQL → pandas DataFrame
print(f"Query Results: {result_df}")
```

Add a token at any time to unlock the full dataset — no code changes needed.

## 🔑 2. Get Your API Token

| Data Plan                  | Coverage                                                                                                                                                          | Price             | Get Access                                       |
|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|--------------------------------------------------|
| **Sample**                 | S&P 500 universe, last 5 years<br>Active & delisted companies                                                                                                     | **Free**          | No registration                                  |
| **S&P 500 (Free)**         | Full S&P 500 universe, full history (1993 → present)<br>Active & delisted companies                                                                               | **Free**          | [Register](https://valuein.biz/signup/free)      |
| **Pro**                    | Full US universe (19,000+ entities, active + delisted)<br>15-year rolling point-in-time window (2011 → present)<br>10-K, 10-Q, 8-K, 20-F + amendments<br>Individual / single-seat license, no redistribution | **$49 / mo** · **$490 / yr** | [Subscribe](https://valuein.biz/checkout?tier=pro) |
| **Institutional**          | Everything in Pro **plus** the smart-money dataset (Forms 3/4/5/144 + 13F/13D/13G), foreign issuers, full history back to 1993, intraday `accepted_at`, filing-event webhooks, commercial redistribution license, business-hours SLA | **$499 / mo** · **$4,790 / yr** | [Subscribe](https://valuein.biz/checkout?tier=full) |
| **Enterprise** (custom)    | Dedicated infrastructure, zero-retention option, sub-minute filing push, multi-seat team access, white-label resale, bespoke SLA                                  | **Talk to sales** | [Contact](https://valuein.biz/enterprise)        |


## 🔐 3. Set Your API Token

```bash
# optional — sample tier works without a key
echo 'VALUEIN_API_KEY="your_token"' >> .env
```

## ▶️ 4. Production-ready code

The ValueinClient handles authentication, table discovery, and local caching in a high-performance DuckDB instance.

The Recommended Way For Production is the Context Manager block/pattern because it ensures that temporary files and database connections are closed automatically, even if your script crashes.

```python
from valuein_sdk import ValueinClient, ValueinError

# Two-level try/except is intentional:
#   outer = init errors (auth, manifest fetch, gateway 503 at __enter__)
#   inner = per-query errors (rate-limit, plan denial, bad SQL, validation)

try:
    with ValueinClient() as client:
        try:
            # 1) Build & run a raw SQL query → pandas DataFrame
            sql = "SELECT COUNT(cik) FROM entity"
            result_df = client.run_query(sql)
            print(result_df)

            # 2) Run a named SQL template with kwargs (the SDK quotes safely)
            df = client.run_template(
                "fundamentals_by_ticker",
                ticker="AAPL",
                start_date="2020-01-01",
                end_date="2024-01-01",
                form_types=["10-K", "10-Q"],
                metrics=["TotalRevenue", "NetIncome", "OperatingCashFlow"],
            )
            print(df)
        except ValueinError as ve:
            print(f"Query failed: {ve}")
except Exception as e:
    print(f"Initialization failed: {e}")
```

---

## 🧰 API Reference

Everything below is importable from the top-level package (`from valuein_sdk import ...`)
and fully type-hinted with `help()`-ready docstrings — so an LLM reading the
signatures can call it correctly on the first try.

### `ValueinClient(as_of=None, api_key=None, gateway_url=None, tables=None, config=None)`

| Method | Returns | What it does |
|---|---|---|
| **Querying** | | |
| `run_query(sql)` | `DataFrame` | Run read-only DuckDB SQL (SELECT / WITH / EXPLAIN only). PIT-filtered via the views. |
| `run_template(name, **params)` | `DataFrame` | Run a named SQL template from `queries/` with validated, injection-safe params. |
| `read_table(table, *, columns=None, limit=None)` | `DataFrame` | **Recommended** whole-table read — PIT-safe, loads on demand, optional column/row subset. |
| `get(table)` | `DataFrame` | Lower-level: download a table's raw Parquet (does **not** apply the `as_of` filter). |
| `to_arrow(sql)` | `pyarrow.Table` | Zero-copy Arrow output (Spark/Polars hand-off). |
| `to_polars(sql)` | `polars.DataFrame` | Zero-copy Polars output (needs `pip install valuein-sdk[polars]`). |
| `stream(sql, batch_size=10_000)` | `Iterator[DataFrame]` | Chunked iteration for results too large to materialize. |
| **Typed helpers** | | |
| `factor_scores(ticker=None, sector=None, min_composite_rank=None, limit=100)` | `DataFrame` | Cross-sectional factor scores + percentile ranks. |
| `earnings_signals(ticker=None, min_surprise_pct=None, direction=None, limit=100)` | `DataFrame` | EPS trend / surprise + YoY revenue change. |
| `pit_universe(as_of_date, index="SP500", as_of_basis="effective")` | `DataFrame` | Survivorship-free index members on a historical date. |
| **Introspection** | | |
| `me()` | `dict` | Plan, status, email for the current token. |
| `manifest()` / `refresh_manifest()` | `dict` | Snapshot metadata (5-min cache / force-refresh). |
| `health()` | `dict` | Gateway health. |
| `tables()` | `list[str]` | Loaded table names. |
| `list_templates()` | `list[str]` | Every runnable SQL template name (pass to `run_template`). |
| `get_schema(table)` | `dict[str, str]` | Column → DuckDB type for any known table. |
| **Document generation** *(Pro+ — proxies to the MCP server)* | | |
| `generate_dcf_xlsx(ticker, *, as_of_date=None, depth="standard")` | `dict` | DCF workbook → 15-min presigned download URL. |
| `generate_research_brief_docx(ticker, *, as_of_date=None, depth="full", peers=None)` | `dict` | Institutional research brief `.docx`. |
| `generate_comps_xlsx(ticker, *, peers=None, as_of_date=None)` | `dict` | Peer-comparables workbook. |
| **Lifecycle** | | |
| `close()` / `with ValueinClient() as c:` | — | Release DuckDB + HTTP resources (use the context manager). |

### Alpha framework

```python
from valuein_sdk import AlphaEngine, ROE, GROSS_MARGIN, CURRENT_RATIO

with ValueinClient() as client:
    result = (
        AlphaEngine(client)
        .add_factors(ROE, GROSS_MARGIN, CURRENT_RATIO)
        .compute(as_of="2024-01-01")
    )
    top = result.rank().combine()   # composite cross-sectional score
    print(top.head())
```

Built-ins: `ROE`, `GROSS_MARGIN`, `OPERATING_MARGIN`, `NET_PROFIT_MARGIN`,
`REVENUE_GROWTH_YOY`, `FCF_TO_ASSETS`, `DEBT_TO_EQUITY`, `ASSET_TURNOVER`,
`CURRENT_RATIO`, `PIOTROSKI_F_SCORE` (all in `BUILTIN_FACTORS`). Define your own
with `AlphaFactor`.

### One token, five channels

Your Stripe-issued Bearer token unlocks every Valuein surface at your tier — no
per-channel billing:

| Channel | How | Best for |
|---|---|---|
| **Python SDK** (this package) | `pip install valuein-sdk` | Heavy out-of-core DuckDB compute |
| **MCP server** | `mcp.valuein.biz/mcp` | AI agents (Claude, Cursor, Codex) — reasoning + 50+ tools |
| **Bulk Data API** | `GET data.valuein.biz/v1/{plan}/{table}` | Direct HTTP / partner integrations (raw Parquet streams) |
| **Web dashboard / Workspace** | [valuein.biz](https://valuein.biz) | Browser analysis, theses, watchlists, reports |

The SDK is the **Muscle** (compute); the MCP server is the **Nervous System**
(reasoning). The `generate_*` methods above are the bridge — they let SDK power
users reach the MCP's document-generation tools without writing a JSON-RPC client.

---

# 🗂️ Data Schema
> The authoritative schema is embedded in each tier's R2 `manifest.json` (written by `data-pipeline/run_exports.py`). The SDK fetches it at init and exposes the active version via `client.get_schema(table)` — there is no bundled schema file in this package. Use `print(client.get_schema("fact"))` to inspect column types live.

Current snapshot: **schema v2.10.0**, **18 tables**. The exact column list for any table is live in the manifest — run `client.get_schema("fact")`.

| Table              | Description                   | Records |
|-------------------|-------------------------------|---------|
| **references**     | **Start here.** Flat join of entity + security (one row per security) with `cik`, `symbol`, `name`, `sector`, `industry`, `sic_code`, `is_active`, FIGI. For index membership (current OR historical) JOIN `index_membership` on `cik = cik`. There is no `is_sp500` flag — it was dropped 2026-05-02 because it was snapshot-only and single-index. | 19K+ |
| **entity**         | Company metadata. `business_address` + `mailing_address`, `sic_description`, `state_of_incorporation_description`, `country_code`, `is_foreign`, `flags`, `has_insider_transactions`, `is_insider_owner`, `former_names` (JSON). | 19K+    |
| **security**       | Ticker history (SCD Type 2). `is_primary_ticker` for multi-share-class issuers (BRK-A/BRK-B, GOOG/GOOGL). | 19K+    |
| **filing**         | Filing metadata since 1993. `is_xbrl_numeric` flags filings whose XBRL exhibit carries numeric facts; `superseded_by` chains the amendment lineage; `core_type` strips `/A` for form-family filtering. | 12M+    |
| **fact**           | Standardized financial facts. Bloomberg Option-C restatement columns (`value_current`, `value_as_filed`, `first_filed_at`, `restated`), quality columns (`confidence_score` — filter `>= 0.95` for Bloomberg-grade; `reliability_code` 1–4), `accounting_standard` (US-GAAP/IFRS), `reporting_currency`, and PIT semantics via `accepted_at`. | 105M+   |
| **valuation**      | Pre-computed intrinsic values per (entity_id, valuation_date, model_type). `model_type` rows coexist: `dcf` / `dcf_fcf` / `ddm`. Recomputed each run — **not** PIT-filtered (use `created_at`). | 500K+   |
| **ratio**          | Pipeline-computed financial ratios per entity per fiscal period (`ratio_name`, `category`, `value`, `is_ttm`). Recomputed each run; freshness via `computed_at`. | 10M+    |
| **standard_concept** | Curated gold-standard concept catalog — the dictionary for `fact.standard_concept`. Carries `level`, `statement_type`, `definition`, `unit_default`, `bloomberg_equivalent`, `factset_equivalent`. The `cpa_verified_concepts` template surfaces CPA-reviewed concepts via `review_confidence` as that field rolls out (v3.4.0). | ~292 |
| **taxonomy_guide** | Raw SEC us-gaap tag reference for the `fact.concept` column (`human_name`, `definition`, `balance_type`, `level`). | ~11,966 |
| **index_membership** | Survivorship-free index constituents — **SP500, RUSSELL1000, RUSSELL2000, RUSSELL3000**. Keyed on **`cik`** (since migration 0015 — same column name as `references.cik`). `effective_date` / `removal_date` use `[)` interval semantics; carries `announcement_date`, `removal_reason`, `successor_cik`, `source`, `confidence`. | 4 indices |
| **factor_scores** | Cross-sectional factor scores + percentile ranks (10 factors + `composite_rank`) from recent annual filings. | 1M+    |
| **earnings_signals** | Trailing earnings-trend estimate + EPS surprise %, plus YoY revenue change. | 500K+   |
| **insider_party · insider_filing · insider_transaction · institutional_filing · institutional_holding · insider_ownership** | Smart-money dataset — Forms 3/4/5/144 + 13F-HR + SC 13D/13G. **Institutional / `full` plan only.** | FULL tier |
| **filing_text** 🔜 | 10-K / 10-Q / 20-F narrative chunks (Risk Factors, MD&A, …) from iXBRL TextBlocks — source of the `search_filing_text` Vectorize index. **Preview, ETA 2026-Q3** (not yet in the warehouse). See [Semantic Search (preview)](#-semantic-search-preview--table-filing_text). | preview |


### 🔗 Key Joins

```
references.cik                  →  entity.cik  (references is the fast entry point)
security.entity_id              →  entity.cik
filing.entity_id                →  entity.cik
fact.entity_id                  →  entity.cik
fact.accession_id               →  filing.accession_id
index_membership.cik            →  entity.cik   (same column name on both sides
                                                 since migration 0015 — JOIN as
                                                 references.cik = im.cik)
```

### 🎯 Survivorship-free PIT universe (the canonical backtest pattern)

```sql
-- All SP500 members on a historical as_of_date, resolved to active ticker.
-- index_membership keys on `cik` (since pipeline migration 0015) — same column
-- name as references.cik / entity.cik, so the join read straight across.
SELECT m.cik, s.id AS security_id, s.symbol AS ticker_at_date,
       e.name AS company_name, m.effective_date, m.removal_date, m.confidence
FROM index_membership m
JOIN entity e ON e.cik = m.cik
LEFT JOIN security s ON s.entity_id = m.cik
    AND s.is_primary_ticker = TRUE
    AND $as_of_date >= s.valid_from
    AND $as_of_date <  COALESCE(s.valid_to, '9999-12-31'::DATE)
WHERE m.index_name = 'SP500'
  AND $as_of_date >= m.effective_date
  AND $as_of_date <  COALESCE(m.removal_date, '9999-12-31'::DATE);
```

`LEFT JOIN security` so delisted companies still surface
(`ticker_at_date = NULL`).  `is_primary_ticker = TRUE` pins
multi-share-class issuers to one row per CIK — universe count stays at
~500 even when BRK-A and BRK-B both exist in `security`.  `[)`
semantics — a company removed on 2017-06-19 is NOT a member ON
2017-06-19.

### ⚡ DuckDB Query Patterns

Three patterns that eliminate redundant joins and scans on every cross-company query:

**1. `references` replaces the entity + security join; index_membership stays separate**
```sql
-- Filter current S&P 500 tech companies (membership lives in index_membership).
-- The previous references.is_sp500 flag was dropped 2026-05-02 — both
-- snapshot-only and single-index, two footguns avoided by JOIN-on-membership.
SELECT r.symbol, r.name, r.sector
FROM   "references" r
JOIN   index_membership im ON im.cik = r.cik
WHERE  im.index_name   = 'SP500'
  AND  im.removal_date IS NULL
  AND  r.sector ILIKE '%technology%'
  AND  r.is_active     = TRUE
```

**2. `LATERAL` for the latest filing per company**
```sql
JOIN LATERAL (
  SELECT accession_id, filing_date
  FROM   filing
  WHERE  entity_id = r.cik AND form_type = '10-K'
  ORDER  BY filing_date DESC
  LIMIT  1
) f ON true
```

**3. Pivot multiple concepts in one `fact` scan**
```sql
-- Debt + equity in one pass — no self-join
SELECT
  MAX(CASE WHEN standard_concept = 'LongTermDebt'       THEN numeric_value END) AS debt,
  MAX(CASE WHEN standard_concept = 'StockholdersEquity' THEN numeric_value END) AS equity
FROM fact WHERE standard_concept IN ('LongTermDebt', 'StockholdersEquity')
GROUP BY accession_id
```

> For quarterly cash flow metrics, use `COALESCE(derived_quarterly_value, numeric_value)` — Q2/Q3 10-Qs report YTD; this column isolates the single quarter.

See [`valuein_sdk/queries/SQL_CHEATSHEET.md`](valuein_sdk/queries/SQL_CHEATSHEET.md) for 8 complete patterns including FCF screens, PIT backtesting, and restatement auditing.

---

### 🏷️ Standard Concept Names

> [!Note]
> Each `fact` row carries **both** the raw XBRL tag (`concept`, ~11,966 distinct us-gaap tags — see `taxonomy_guide`) **and** the normalized `standard_concept` (~292 canonical concepts — see `standard_concept`). Tags that don't map to a canonical concept fall through to the `Other` bucket, so no fact is dropped.
>
> Because both columns live on `fact`, you never need to join a separate mapping table to filter by either the raw tag or the canonical concept.


### 📅 Date Columns Reference

| Column | Table | Use for |
|---|---|---|
| `report_date` / `period_end` | `filing` / `fact` | Aligning to fiscal calendar |
| `filing_date` | `filing` | **PIT backtest filter** — when the SEC received the filing |
| `accepted_at` | `fact` | Millisecond-precision PIT for intraday signal research |


### 🧩 Template Categories

**54 templates** ship in `valuein_sdk/queries/`, auto-discovered by filename. Run any of them with `client.run_template("<name>", ...)`:

| Category | Templates (examples) |
|---|---|
| Data Access | `fundamentals_by_ticker`, `figi_to_fundamentals_mapping`, `peer_group_comparison`, `survivorship_bias_free_screen` |
| Income Statement | `revenue_yoy_growth`, `trailing_twelve_months_ttm`, `margin_analysis`, `free_cash_flow`, `rnd_intensity` |
| Balance Sheet | `liquidity_ratios`, `solvency_debt_to_equity`, `interest_coverage`, `efficiency_cash_conversion`, `capex_to_revenue` |
| Investment Scores | `dupont_analysis_inputs`, `piotroski_f_score_inputs`, `altman_z_score_inputs`, `earnings_quality_accruals_anomaly` |
| Valuation & Screening | `sector_relative_valuation_outperformers`, `financial_ratios_screener`, `shareholder_dilution`, `factor_screen_top_quintile` |
| Smart-money & Ownership *(FULL tier)* | `insider_buys`, `blockholders`, `top_institutional_holders`, `manager_portfolio` |
| Event & Short Signals | `late_reporter_short_signal`, `restatement_history`, `8k_material_event_signal`, `ghost_company_screener` |
| Advanced Analytics | `true_point_in_time_backtest_engine`, `time_series_outlier_detection_zscore`, `seasonal_frame_based_extraction`, `cpa_verified_concepts` |

See [`valuein_sdk/queries/SQL_CHEATSHEET.md`](valuein_sdk/queries/SQL_CHEATSHEET.md) for the full reference with copy-paste DuckDB patterns.


## 🔎 Semantic Search (Preview) — table `filing_text`

> **🔜 Preview — ETA 2026-Q3.** The serving path is live but the vector index is
> still backfilling, so `search_filing_text` currently returns a structured
> `FEATURE_NOT_AVAILABLE` (501) and the `filing_text` table is not yet in the
> warehouse manifest. Both activate automatically once the backfill lands — no
> code change needed on your side. The SDK already ships the `filing_text_by_ticker`
> template and the `FilingTextRow` model so your integration is ready on day one.
> Until then, use `get_sec_filing_links` (MCP) to fetch a filing's EDGAR index and
> read the section directly, or `verify_fact_lineage` to trace a number to its source.

When it ships, `filing_text` will be a chunk-level table of narrative text pulled
from every 10-K / 10-Q / 20-F's iXBRL TextBlock concepts: Risk Factors, MD&A,
Business, Legal Proceedings, Controls & Procedures. Each chunk is ≤ 1,800
characters and reassembles into the full section via
`(accession_id, section, chunk_no)`.

**Two ways to use it (once live)**:

**1. Semantic search via MCP — no embedding work required.**
The Valuein MCP server at `mcp.valuein.biz` exposes a `search_filing_text`
tool that embeds your query into a Vectorize index and returns top-K passages
with ticker, accession, section, and a citation URL:

```text
# Natural language from any MCP client (Claude, Cursor, agents)
> Companies discussing AI capex pressure in their most recent 10-K
```

No SDK code needed — the agent calls the MCP tool directly. (Phrase the query
as a statement, not a question — it improves recall.)

**2. Raw chunks via SDK for your own retrieval pipeline.**

```python
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["references", "filing_text"])

# Pull every Risk Factors chunk for Apple since 2020
df = client.run_template(
    "filing_text_by_ticker",
    ticker="AAPL",
    section="risk_factors",
    start_date="2020-01-01",
)

# Reassemble the full section
full_text = " ".join(
    df[df["chunk_no"].astype(int).between(0, int(df["total_chunks"].iloc[0]) - 1)]
      .sort_values("chunk_no")["text"]
      .tolist()
)
print(full_text)
```

Typed wrappers are available via `FilingTextRow` in `valuein_sdk.models`.

**PIT contract**: `filing_text.accepted_at` will equal the filing's SEC
`acceptedDateTime`, same as `fact.accepted_at`, so the SDK's client-side
`as_of` filter applies to it automatically — no look-ahead bias.


## 📚 Documentation

| Document | Description | Where |
|----------|------------|--------|
| Methodology | Data sourcing, PIT architecture, restatement handling, XBRL normalization | [valuein.biz/docs](https://valuein.biz/docs) · [public hub](https://github.com/valuein/valuein/blob/main/docs/methodology.md) |
| Compliance & DDQ | Data provenance, MNPI policy, PIT integrity, security, SLA summary | [public hub](https://github.com/valuein/valuein) |
| SLA | Uptime targets, data freshness SLAs, support response times | [valuein.biz/sla](https://valuein.biz/sla) |
| Data Catalog | Every column, type, definition, sample value | [public hub `DATA_CATALOG.xlsx`](https://github.com/valuein/valuein/blob/main/docs/DATA_CATALOG.xlsx) |
| Live schema | Machine-readable schema for the active snapshot | `client.get_schema("<table>")` — read from the R2 manifest at runtime |

---

## 🐍 Python Examples
Standalone Python scripts and eleven Jupyter notebooks, designed to go from install to insight in under 3 minutes.

### Ticker lookup example
Run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process.

```python
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])
# This client only fetch these 2 tables, making it faster!

df = client.run_query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)
```

You are now querying **SEC financial statements directly from the
cloud**.


### Python scripts (`examples/python/`)

| Script | Level | What it demonstrates |
|---|---|---|
| [`getting_started.py`](examples/python/getting_started.py) | Beginner | Auth check, first query, entity counts by sector |
| [`usage.py`](examples/python/usage.py) | Reference | Every public SDK method demonstrated end to end |
| [`production-ready.py`](examples/python/production-ready.py) | Reference | Context-manager pattern, config, full error hierarchy |
| [`entity_screening.py`](examples/python/entity_screening.py) | Beginner | Screen by sector, SIC code, active vs inactive status |
| [`financial_analysis.py`](examples/python/financial_analysis.py) | Intermediate | Revenue trends, margins, concept normalization, peer comparison |
| [`pit_backtest.py`](examples/python/pit_backtest.py) | Intermediate | Correct PIT discipline, restatement impact, `filing_date` vs `report_date` |
| [`survivorship_bias.py`](examples/python/survivorship_bias.py) | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification |

### Jupyter notebooks (`notebooks/`)

| Notebook | Open in Colab |
|---|---|
| [`01_quickstart`](notebooks/01_quickstart.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/01_quickstart.ipynb) |
| [`02_dcf_valuation`](notebooks/02_dcf_valuation.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/02_dcf_valuation.ipynb) |
| [`03_piotroski_screen`](notebooks/03_piotroski_screen.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/03_piotroski_screen.ipynb) |
| [`04_earnings_quality`](notebooks/04_earnings_quality.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/04_earnings_quality.ipynb) |
| [`05_factor_backtest`](notebooks/05_factor_backtest.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/05_factor_backtest.ipynb) |
| [`06_sector_comparison`](notebooks/06_sector_comparison.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/06_sector_comparison.ipynb) |
| [`07_restatement_alpha`](notebooks/07_restatement_alpha.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/07_restatement_alpha.ipynb) |
| [`08_capital_allocation`](notebooks/08_capital_allocation.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/08_capital_allocation.ipynb) |
| [`09_filing_delay`](notebooks/09_filing_delay.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/09_filing_delay.ipynb) |
| [`10_survivorship_demo`](notebooks/10_survivorship_demo.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/10_survivorship_demo.ipynb) |
| [`11_risk_factors_semantic_demo`](notebooks/11_risk_factors_semantic_demo.ipynb) 🔜 | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valuein/sdk/blob/main/notebooks/11_risk_factors_semantic_demo.ipynb) |

---

### 🛡️ Error Handling

```python
from valuein_sdk import (
    ValueinAuthError,      # HTTP 401/403 — invalid or expired token
    ValueinPlanError,      # HTTP 403 — endpoint requires a higher plan
    ValueinNotFoundError,  # HTTP 404 — no table found 
    ValueinRateLimitError, # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,       # HTTP 5xx — includes .status_code
    ValueinClient
)

client = None
try:
    client = ValueinClient()
    df = client.run_query("SELECT * FROM fact LIMIT 1000000")
except ValueinAuthError:
    print("Check your VALUEIN_API_KEY. It might be expired or invalid.")
except ValueinPlanError:
    print("This requires a higher-tier plan. Upgrade at valuein.biz.")
except ValueinRateLimitError as e:
    print(f"Slow down! Retry allowed in {e.retry_after}s.")
except ValueinNotFoundError as e:
    print(f"That table or endpoint doesn't exist: {e}")
except ConnectionError as e:
    print(f"Physical network issue: {e}")
except ValueinAPIError as e:
    print(f"The Gateway is having a bad day (Status {e.status_code}).")
except Exception as e:
    print(f"Non-SDK error (Python/Logic): {e}")
finally:
    # Always close manually if not using a context manager and if a client was created.
    if client is not None:
        client.close()
```

---

## 🔬 Research & Quality Proofs

The eleven [notebooks](#jupyter-notebooks-notebooks) double as runnable due-diligence
proofs — PIT correctness, survivorship-bias quantification, restatement alpha, and
balance-sheet identity checks — all against live data. Install the research extras
to run them locally:

```bash
# pip
pip install "valuein-sdk[research]"
# uv (dev workflow)
uv sync --group research
```

For institutional due-diligence material (methodology, compliance/DDQ, SLA, and the
full reproducible research suite) see the open-source hub at
[github.com/valuein/valuein](https://github.com/valuein/valuein).

---

## 🤝 Contributing

We welcome contributions including SQL templates, notebooks, scripts, research modules, and documentation improvements.

See [CONTRIBUTING.md](CONTRIBUTING.md) for code standards, naming conventions, and the PR process.

## 📄 License

Apache-2.0 License — see [LICENSE](LICENSE).

**Disclosure:** This repository is for research and educational purposes
only and does not constitute financial advice.
