Metadata-Version: 2.4
Name: infon
Version: 0.1.1
Summary: Streaming knowledge-graph reasoner with a typed query layer (multilingual SPLADE + Dempster-Shafer MCTS + cassette persistence).
Author: Infon contributors
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/infon-ai/infon
Project-URL: Documentation, https://github.com/infon-ai/infon/tree/main/docs
Project-URL: Repository, https://github.com/infon-ai/infon
Project-URL: Issues, https://github.com/infon-ai/infon/issues
Keywords: knowledge-graph,information-extraction,splade,sparse-retrieval,dempster-shafer,modal-logic,fastapi,multilingual,reasoning,mcts
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Framework :: FastAPI
Classifier: Natural Language :: English
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch<3,>=2.0
Requires-Dist: transformers<5,>=4.40
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.11
Requires-Dist: requests>=2.31
Requires-Dist: pyarrow>=15
Requires-Dist: fsspec>=2024.1
Requires-Dist: fastcoref>=2.1
Requires-Dist: fastapi>=0.100
Requires-Dist: uvicorn>=0.20
Requires-Dist: httpx>=0.24
Provides-Extra: aws
Requires-Dist: boto3>=1.28; extra == "aws"
Provides-Extra: agent
Requires-Dist: strands-agents[bedrock]>=1.0; extra == "agent"
Provides-Extra: demo
Requires-Dist: strands-agents[bedrock]>=1.0; extra == "demo"
Requires-Dist: ddgs>=6.0; extra == "demo"
Provides-Extra: study
Requires-Dist: boto3>=1.28; extra == "study"
Requires-Dist: scikit-learn>=1.3; extra == "study"
Requires-Dist: matplotlib>=3.7; extra == "study"
Provides-Extra: all
Requires-Dist: boto3>=1.28; extra == "all"
Requires-Dist: scikit-learn>=1.3; extra == "all"
Requires-Dist: matplotlib>=3.7; extra == "all"
Requires-Dist: strands-agents[bedrock]>=1.0; extra == "all"
Requires-Dist: ddgs>=6.0; extra == "all"
Dynamic: license-file

# Infon

[![PyPI](https://img.shields.io/pypi/v/infon.svg)](https://pypi.org/project/infon/)
[![Python](https://img.shields.io/pypi/pyversions/infon.svg)](https://pypi.org/project/infon/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

**Streaming knowledge-graph reasoner with a typed query layer.** Stream
text in over HTTP, get typed triples back, query them with composable
operators (`covers`, `place_overlaps`, `compose`, `evaluate`, `plan`).
The default deployment is a Redis-style FastAPI service. Persist to a
content-addressed cassette when you outgrow RAM.

The point isn't *"ask a question, get an answer"*. The point is that
the things the question is *about* (a duration, an ongoing interval,
a country, a clause) are first-class records in the index, and the
operators that combine them are pure-Python predicates with no neural
model in the loop.

## 30-second hello world

```bash
pip install infon
infon schema init defence -o schema.json
infon serve --schema schema.json &
infon feed   doc_a "BAE Systems delivered six long-endurance UAVs to the RAAF in 2025."
infon flush  doc_a --doc-id press.bae.2025_03 --timestamp 2025-03-12
infon query  candidates --actor bae --top-k 5
```

Or in Python:

```python
from infon import AnchorSchema, Service, CandidateQuery

schema = AnchorSchema.from_file("schema.json")
svc = Service(schema=schema)

svc.feed("doc_a", "Toyota partnered with Panasonic on solid-state batteries.")
svc.feed("doc_a", "The Japanese automaker plans to launch them by 2027.")
rep = svc.flush("doc_a", doc_id="press.toyota.2025_09",
                timestamp="2025-09-01")
# rep.n_infons, rep.infon_ids

rows = svc.query(CandidateQuery(actor="toyota", min_confidence=0.6))
for c in rows:
    inf = svc.store.get(c.hit.loc.infon_id)
    print(c.score, inf.subject, inf.predicate, inf.object)
```

In-process queries return `Candidate(hit, score, provenance)`; hydrate
the Infon via `svc.store.get(...)`. Over HTTP, the wire format is
`CandidateRow(subject, predicate, object, score, ...)` — no hydration
step needed.

Or over the wire, against a live server:

```python
from infon import Client

with Client("http://localhost:8000") as c:
    c.feed("doc_a", "Toyota partnered with Panasonic on solid-state batteries.")
    rep = c.flush("doc_a")
    rows = c.candidates(actor="toyota", min_confidence=0.6, top_k=20)
    verdict = c.connect("toyota", "samsung", max_hops=3)
    counts = c.aggregate("count", group_by="actor")
```

## What's inside

```
              feed text                   query typed predicates
                  │                                │
                  ▼                                ▼
        ┌────────────────────────────────────────────────┐
        │   FastAPI service (Redis-style streaming)      │
        │   POST /feed   /flush   /query   /snapshot     │
        └────────────────────────────────────────────────┘
                              │
                              ▼
        ┌────────────────────────────────────────────────┐
        │   Service kernel (in-process, RAM-only)        │
        │   feed(stream_id, text)  → per-stream buffer   │
        │   flush(stream_id)       → fastcoref+extract   │
        │                              → MemoryStore     │
        └────────────────────────────────────────────────┘
                              │
                              ▼
        ┌────────────────────────────────────────────────┐
        │   Operators over typed records                 │
        │   covers / ongoing_at / bind_event  (temporal) │
        │   matches / filter_to_index_args    (threshold)│
        │   place_overlaps / within_radius    (spatial)  │
        │   convert / convert_currency        (quantity) │
        │   evaluate / restrict / compose     (modal)    │
        │   plan / route                      (planner)  │
        └────────────────────────────────────────────────┘
```

`feed` and `flush` are two calls because **coreference is a
document-level operation**. Resolving "it" or "the company" within a
single sentence is wrong by design: the antecedent is usually named in
an earlier sentence. Single-sentence coref would be a fallback, and we
don't ship fallbacks. So `feed` buffers, `flush` runs fastcoref over the
buffered window, and sentences become queryable only after a flush.
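To make the buffering contract concrete, here is a minimal sketch of feed/flush semantics. `StreamBuffer` and `toy_resolve` are hypothetical names, not part of infon; the toy resolver merely stands in for fastcoref and rewrites a mention only when its antecedent is present in the flushed window.

```python
from collections import defaultdict
from typing import Callable


class StreamBuffer:
    """Hypothetical sketch of feed/flush buffering; not infon's Service."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve                      # document-level coref step
        self._pending: dict[str, list[str]] = defaultdict(list)
        self.queryable: list[str] = []               # only flushed text lands here

    def feed(self, stream_id: str, text: str) -> None:
        # Buffer only: nothing is resolved or queryable yet.
        self._pending[stream_id].append(text)

    def flush(self, stream_id: str) -> str:
        # Resolve the whole buffered window at once, then publish it.
        window = " ".join(self._pending.pop(stream_id, []))
        resolved = self._resolve(window)
        self.queryable.append(resolved)
        return resolved


def toy_resolve(window: str) -> str:
    # Stand-in for fastcoref: rewrite the mention only if its antecedent is in the window.
    if "Toyota" in window:
        return window.replace("The Japanese automaker", "Toyota")
    return window


buf = StreamBuffer(toy_resolve)
buf.feed("doc_a", "Toyota partnered with Panasonic on solid-state batteries.")
buf.feed("doc_a", "The Japanese automaker plans to launch them by 2027.")
print(buf.flush("doc_a"))
```

Resolving the second sentence on its own would leave "The Japanese automaker" untouched, which is exactly the failure mode the two-call design avoids.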

## Documentation

| Guide | What it covers |
|---|---|
| [Schema authoring](docs/schema.md) | Anchor types, JSON shape, hierarchies, multilingual surfaces. |
| [Operators](docs/operators.md) | Full reference: numeric, temporal, spatial, modality, planner, MCTS, JSON DSL. |
| [Examples](docs/examples.md) | Tested transcripts for every operator family + Service end-to-end. |
| [Multilingual](docs/multilingual.md) | XLM-R SPLADE coverage, when to add explicit anchor tokens, performance. |
| [Deployment](docs/deployment.md) | `infon serve`, env vars, snapshot/reload, S3, production checklist. |
| [CLI](docs/cli.md) | All `infon` subcommands. |

## Try it on a vertical

Six end-to-end Jupyter notebooks, each running the full streaming
pipeline against a real corpus:

| Notebook | Domain |
|---|---|
| [`notebooks/01_supply_chain.ipynb`](notebooks/01_supply_chain.ipynb) | Multi-tier supplier risk (ERP/AIS/customs ingest). |
| [`notebooks/02_legal_contracts.ipynb`](notebooks/02_legal_contracts.ipynb) | Clause refs, defined terms, modality, conditional contexts. |
| [`notebooks/03_defence_industry.ipynb`](notebooks/03_defence_industry.ipynb) | Eight operators per sentence; persona-conditioned MCTS. |
| [`notebooks/04_compliance_regulatory.ipynb`](notebooks/04_compliance_regulatory.ipynb) | Threshold predicates, currency over time, jurisdictions. |
| [`notebooks/05_kano_conjoint.ipynb`](notebooks/05_kano_conjoint.ipynb) | Product voice → Kano + conjoint structural analysis. |
| [`notebooks/06_drug_discovery.ipynb`](notebooks/06_drug_discovery.ipynb) | Bio entities, mechanism-of-action, evidence tiers. |

Install the demo extras before running any of them: `pip install 'infon[demo]'`.

## Multilingual support

The default encoder is multilingual XLM-R SPLADE
([`opensearch-project/opensearch-neural-sparse-encoding-multilingual-v1`](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-multilingual-v1) — Apache-2.0,
~1.1 GB). Latin-script languages and most Cyrillic- and Greek-script
languages work out of the box. Japanese, Chinese, and Korean need
explicit `tokens` for each script in the schema:

```json
{
  "toyota": {
    "type": "actor",
    "tokens": ["toyota", "トヨタ", "丰田", "도요타"],
    "canonical_name": "Toyota Motor Corporation",
    "aliases": ["TMC", "トヨタ自動車", "丰田汽车"]
  }
}
```

See [`docs/multilingual.md`](docs/multilingual.md) for the full story.
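If you author schemas by hand, a small lint can catch anchors that are missing a CJK script. The sketch below is hypothetical (infon does not necessarily ship such a check): it buckets characters by their Unicode names using the standard `unicodedata` module, and it assumes the schema entry is loaded as a plain dict like the JSON above.

```python
import unicodedata

# Unicode character-name prefixes used to bucket characters by script.
# Han covers both Japanese kanji and Chinese hanzi, so it cannot tell them apart.
_SCRIPT_PREFIXES = {
    "latin": ("LATIN",),
    "han": ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"),
    "kana": ("HIRAGANA", "KATAKANA"),
    "hangul": ("HANGUL",),
}


def scripts_in(text: str) -> set[str]:
    """Return the scripts (from _SCRIPT_PREFIXES) that occur in `text`."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        for script, prefixes in _SCRIPT_PREFIXES.items():
            if name.startswith(prefixes):
                found.add(script)
    return found


def missing_cjk_scripts(anchor: dict, required=("han", "kana", "hangul")) -> list[str]:
    """Hypothetical lint: which required scripts have no entry in anchor['tokens']?"""
    covered = set()
    for token in anchor.get("tokens", []):
        covered |= scripts_in(token)
    return sorted(set(required) - covered)


toyota = {"type": "actor", "tokens": ["toyota", "トヨタ", "丰田", "도요타"]}
print(missing_cjk_scripts(toyota))                           # []
print(missing_cjk_scripts({"tokens": ["toyota", "トヨタ"]}))  # ['han', 'hangul']
```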

## Operators — what we compute

A taste; full reference in [`docs/operators.md`](docs/operators.md).

### Threshold — unit-aware numeric predicates

```python
from infon.threshold import Threshold, matches, filter_to_index_args

t = Threshold(kind="duration", op=">=", value=14, unit_surface="hours")
filter_to_index_args(t)        # {'kind': 'duration', 'value_min': 50400.0}
matches(t, value=50400.0, unit_surface="hours")    # True
matches(t, value=50400.0, unit_surface="km")       # False — wrong dimension
```

### compose / covers / bind_event — temporal inference

```python
from infon.temporal_inference import compose, covers, ongoing_at
# TemporalReference must also be imported; see docs/operators.md for its module.

refs = [
    TemporalReference(op="in",    start_iso="1982-01-01",
                                  end_iso="1982-12-31"),
    TemporalReference(op="since", start_iso="1982-01-01", end_iso=None),
]
compose(refs)                  # InferredInterval(..., ongoing=True)
covers(refs, "2010-06-15")     # True
ongoing_at(refs, "2026-05-26") # True
```
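One reading consistent with the outputs above is that `compose` takes the hull of the references: the earliest start, and an open end if any reference is open-ended. The sketch below implements that reading with hypothetical `Ref` and `Interval` types; infon's `compose` may weigh the `op` values differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for TemporalReference."""
    op: str
    start_iso: str
    end_iso: Optional[str]  # None means open-ended ("since ...")


@dataclass(frozen=True)
class Interval:
    start: date
    end: Optional[date]  # None means still ongoing

    @property
    def ongoing(self) -> bool:
        return self.end is None


def compose_sketch(refs: list[Ref]) -> Interval:
    # Hull reading: earliest start; open-ended if any reference is open-ended.
    start = min(date.fromisoformat(r.start_iso) for r in refs)
    if any(r.end_iso is None for r in refs):
        return Interval(start, None)
    return Interval(start, max(date.fromisoformat(r.end_iso) for r in refs))


def covers_sketch(refs: list[Ref], iso: str) -> bool:
    iv, d = compose_sketch(refs), date.fromisoformat(iso)
    return iv.start <= d and (iv.end is None or d <= iv.end)


def ongoing_at_sketch(refs: list[Ref], iso: str) -> bool:
    # Covered, and no known end date.
    return compose_sketch(refs).ongoing and covers_sketch(refs, iso)


refs = [Ref("in", "1982-01-01", "1982-12-31"), Ref("since", "1982-01-01", None)]
print(compose_sketch(refs).ongoing)           # True
print(covers_sketch(refs, "2010-06-15"))      # True
print(ongoing_at_sketch(refs, "2026-05-26"))  # True
```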

### Quantity — distribution-aware arithmetic

```python
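from infon.quantity import Quantity, FXTable, convert_currency
# Distribution, KILOGRAM and currency_unit are also needed below; see
# docs/operators.md for where they are imported from.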

# Uncertain quantity: "around 1000 kg" → normal(μ=1000, σ=50)
mass = Quantity(Distribution.normal(1000.0, 50.0), KILOGRAM)

# FX is dated. No silent fallback to "today's rate".
fx = FXTable()
fx.add("EUR", "USD", "2026-01-15", 1.10)
revenue_eur = Quantity.from_value(1_000_000.0, currency_unit("EUR"))
convert_currency(revenue_eur, "USD", on="2025-01-01", fx=fx)   # → None
```

### plan — multi-constraint search

```python
from infon import CandidateQuery, NumericPredicate, TemporalPredicate

q = CandidateQuery(
    predicate="produces",
    numeric=(NumericPredicate(kind="duration", value_min=50400.0),),  # ≥14h
    temporal=(TemporalPredicate(mode="overlap", t_start="2023-01-01"),),
    actor_in=("AU", "GB", "JP"),
    actor_not_in=("US",),
    evidentiality="primary",
    min_confidence=0.6,
)
cands = svc.query(q)
```
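Conceptually, every field on the query is one more constraint that a record must satisfy. A minimal sketch of that conjunction over a hypothetical flattened `Row` type (the field names, including `actor_country`, are assumptions; infon evaluates these constraints against its own index):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical flattened view of one extracted record."""
    predicate: str
    actor_country: str
    duration_s: Optional[float]
    t_end: Optional[str]        # ISO date; None means still ongoing
    evidentiality: str
    confidence: float


def build_filter(*, predicate: str, value_min: float, overlap_from: str,
                 actor_in: tuple[str, ...], actor_not_in: tuple[str, ...],
                 evidentiality: str, min_confidence: float) -> Callable[[Row], bool]:
    checks = [
        lambda r: r.predicate == predicate,
        lambda r: r.duration_s is not None and r.duration_s >= value_min,
        # Overlaps [overlap_from, open end); ISO dates compare correctly as strings.
        lambda r: r.t_end is None or r.t_end >= overlap_from,
        lambda r: r.actor_country in actor_in,
        lambda r: r.actor_country not in actor_not_in,
        lambda r: r.evidentiality == evidentiality,
        lambda r: r.confidence >= min_confidence,
    ]
    return lambda r: all(check(r) for check in checks)  # every constraint must hold


keep = build_filter(predicate="produces", value_min=50400.0, overlap_from="2023-01-01",
                    actor_in=("AU", "GB", "JP"), actor_not_in=("US",),
                    evidentiality="primary", min_confidence=0.6)
print(keep(Row("produces", "AU", 64800.0, None, "primary", 0.8)))  # True
print(keep(Row("produces", "US", 64800.0, None, "primary", 0.8)))  # False
```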

### connect / any_of — multi-hop reachability

```python
from infon import ConnectQuery, AnyOfQuery

v = svc.query(ConnectQuery(source="toyota", target="samsung", max_hops=3))
# Verdict(label='SUPPORTS',
#         mass=Mass(supports=0.403, refutes=0.000, theta=0.597),
#         sources=[Edge(toyota -supply→ catl),
#                  Edge(catl   -license→ samsung)],
#         n_candidates=2, n_hydrated=2)
```

Reachability queries run MCTS over the hypergraph and score paths with a
Dempster–Shafer chain mass. Scoring is refutation-aware: a refuting edge
cancels affirmation on the same triple.
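The refutation-aware behaviour follows from Dempster's rule of combination on the frame {supports, refutes}: mass on contradictory pairs is discarded and the rest is renormalised. The sketch below shows that rule plus one simple chain rule (multiply per-hop support, leave the remainder on θ). The chain rule and the `dempster`/`chain` helpers are assumptions for illustration, not necessarily how infon computes chain mass.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Basic mass assignment over the frame {supports, refutes}; theta is ignorance."""
    supports: float
    refutes: float
    theta: float


def dempster(a: Mass, b: Mass) -> Mass:
    # Mass on conflicting pairs (supports x refutes) is discarded, then renormalised.
    conflict = a.supports * b.refutes + a.refutes * b.supports
    if math.isclose(conflict, 1.0):
        raise ValueError("total conflict: Dempster's rule is undefined")
    s = a.supports * b.supports + a.supports * b.theta + a.theta * b.supports
    r = a.refutes * b.refutes + a.refutes * b.theta + a.theta * b.refutes
    t = a.theta * b.theta
    k = 1.0 - conflict
    return Mass(s / k, r / k, t / k)


def chain(edges: list[Mass]) -> Mass:
    # Assumed chain rule: a path is supported only if every hop is; the rest is ignorance.
    s = math.prod(e.supports for e in edges)
    return Mass(s, 0.0, 1.0 - s)


affirm = Mass(0.6, 0.0, 0.4)
refute = Mass(0.0, 0.6, 0.4)
print(dempster(affirm, refute))  # supports == refutes ≈ 0.375, theta ≈ 0.25
print(chain([Mass(0.8, 0.0, 0.2), Mass(0.5, 0.0, 0.5)]))  # supports ≈ 0.4, theta ≈ 0.6
```

An affirming and an equally strong refuting edge leave support and refutation level, so neither side wins; two agreeing edges instead reinforce each other.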

### evaluate — JSON DSL

```python
from infon.logical_tool import evaluate

evaluate(cog, {
    "and": [
        {"triple": {"s": "toyota", "p": "invest", "o": "solid_state"}},
        {"not":   {"triple": {"s": "toyota", "p": "invest", "o": "lithium_ion"}}},
    ]
})
# {'verdict': 'SUPPORTS', 'mass': {...}, 'trace': [...]}
```

Supported nodes: `triple`, `and`, `or`, `not`, `if`, `exists`, and
`forall`. Each sub-expression returns its own Dempster–Shafer mass plus
an audit trace.
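To show the recursive shape of such an evaluator, here is a hypothetical sketch covering only `triple`, `and`, `or` and `not`. To stay short it deliberately swaps Dempster–Shafer masses for Kleene's strong three-valued logic, and the `UNKNOWN` label is an assumption; like the real evaluator it records a per-node trace.

```python
from typing import Any, Optional

SUPPORTS, REFUTES, UNKNOWN = "SUPPORTS", "REFUTES", "UNKNOWN"

Triple = tuple[str, str, str]


def evaluate_sketch(facts: dict[Triple, bool], expr: dict[str, Any],
                    trace: Optional[list] = None) -> tuple[str, list]:
    """Hypothetical three-valued evaluator for triple/and/or/not (Kleene logic)."""
    trace = [] if trace is None else trace
    if len(expr) != 1:
        raise ValueError("each node must have exactly one operator key")
    (op, arg), = expr.items()
    if op == "triple":
        known = facts.get((arg["s"], arg["p"], arg["o"]))  # True, False, or None
        verdict = UNKNOWN if known is None else (SUPPORTS if known else REFUTES)
    elif op == "not":
        inner, _ = evaluate_sketch(facts, arg, trace)
        verdict = {SUPPORTS: REFUTES, REFUTES: SUPPORTS, UNKNOWN: UNKNOWN}[inner]
    elif op in ("and", "or"):
        verdicts = [evaluate_sketch(facts, sub, trace)[0] for sub in arg]
        # "and" is decided by any REFUTES, "or" by any SUPPORTS.
        dominant, neutral = (REFUTES, SUPPORTS) if op == "and" else (SUPPORTS, REFUTES)
        if dominant in verdicts:
            verdict = dominant
        elif all(v == neutral for v in verdicts):
            verdict = neutral
        else:
            verdict = UNKNOWN
    else:
        raise ValueError(f"operator {op!r} is not covered by this sketch")
    trace.append((op, verdict))  # audit trail, innermost nodes first
    return verdict, trace


facts = {("toyota", "invest", "solid_state"): True,
         ("toyota", "invest", "lithium_ion"): False}
verdict, trace = evaluate_sketch(facts, {"and": [
    {"triple": {"s": "toyota", "p": "invest", "o": "solid_state"}},
    {"not": {"triple": {"s": "toyota", "p": "invest", "o": "lithium_ion"}}},
]})
print(verdict)  # SUPPORTS
```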

## One sentence, eight operators

> *"Per §3.2(a), the Supplier shall, within 30 days of the Effective
>  Date (as defined in §1.4), deliver to the Buyer's facility within
>  50 km of Canberra a system meeting the endurance specification (≥14
>  hours), provided the system is not manufactured in the United
>  States."*

```
clause-ref §3.2(a)             →  IKLIst("clause:3.2(a)", obligation)
defined-term Effective Date    →  IKLThat("defined:effective_date")
modality "shall"               →  ModalClaim(operator='O', ...)
duration "30 days"             →  bind_event(within, effective_date)
spatial radius "within 50km"   →  PlaceReference(op='within', radius_km=50)
threshold "≥14 hours"          →  Threshold(kind='duration', op='>=', 14h)
conditional "provided"         →  ConditionalContext(antecedent=...)
spatial negation "not in US"   →  actor_not_in=("US",)
```

The planner takes the eight typed records and resolves them in one
call. None of the operators ever re-read the sentence string.

## Persistence — the snapshot upgrade hatch

The streaming service is RAM-only by contract — kill the process,
lose the data. Same shape as `redis-server` without RDB/AOF. To
persist, snapshot the live store:

```python
path = svc.snapshot_to_cassette("./data/run_2026_05_27")
```

Reload any time:

```python
from infon import Service, AnchorSchema
schema = AnchorSchema.from_file("schema.json")
svc = Service.from_cassette("./data/run_2026_05_27", schema=schema)
```

The cassette substrate gives you what RAM doesn't: time-travel
snapshots, S3-backed delta ingest, Kan-pushforward schema migration
(62× faster than re-extracting), and a manifest pruner that cuts shard
opens by 7–16× at scale. For cloud storage, swap the URI:
`InfonStore("s3://bucket/prefix")`.
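Manifest pruning of this kind usually works by keeping cheap per-shard summaries (min/max values, key sets) and skipping shards that cannot match. Below is a hedged sketch of min/max (zone-map) pruning; `ShardManifest` and `shards_to_open` are hypothetical, and the summary fields are assumptions about what a manifest might hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShardManifest:
    """Hypothetical per-shard summary, readable without opening the shard."""
    path: str
    t_min: str                 # earliest ISO date in the shard
    t_max: str                 # latest ISO date in the shard
    actors: frozenset[str]


def shards_to_open(manifests: list[ShardManifest], t_start: str, t_end: str,
                   actor: Optional[str] = None) -> list[str]:
    """Min/max (zone-map) pruning: skip shards that cannot contain a match."""
    keep = []
    for m in manifests:
        if m.t_max < t_start or m.t_min > t_end:
            continue  # shard's time range cannot overlap the query window
        if actor is not None and actor not in m.actors:
            continue  # actor never appears in this shard
        keep.append(m.path)
    return keep


manifests = [
    ShardManifest("shard-0", "2024-01-01", "2024-06-30", frozenset({"toyota"})),
    ShardManifest("shard-1", "2024-07-01", "2024-12-31", frozenset({"bae"})),
    ShardManifest("shard-2", "2025-01-01", "2025-06-30", frozenset({"toyota", "bae"})),
]
print(shards_to_open(manifests, "2024-10-01", "2025-03-31", actor="toyota"))  # ['shard-2']
```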

## Install

```bash
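pip install infon                  # core
pip install 'infon[agent]'         # + strands-agents (conversational layer)
pip install 'infon[demo]'          # + ddgs (web search) + strands-agents
pip install 'infon[aws]'           # + boto3 (AWS / S3 access)
pip install 'infon[study]'         # + boto3 + scikit-learn + matplotlib
pip install 'infon[all]'           # everything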
```

| Dependency | Purpose | Required? |
|---|---|---|
| `torch` ≥ 2.0 | GNN + SSL losses | yes |
| `transformers` ≥ 4.40 | SPLADE tokenizer/model | yes |
| `numpy` ≥ 1.24 | linear algebra | yes |
| `pyarrow` ≥ 15 | cassette indexes | yes |
| `fsspec` ≥ 2024.1 | local + S3 paths | yes |
| `fastcoref` ≥ 2.1 | coreference | yes |
| `fastapi` ≥ 0.100 | HTTP service | yes |
| `uvicorn` ≥ 0.20 | ASGI server | yes |
| `httpx` ≥ 0.24 | client transport | yes |
| `boto3` ≥ 1.28 | AWS access | optional (`[aws]`) |
| `s3fs` ≥ 2024.1 | S3 backend for `fsspec` | optional (install separately; not pulled in by `[aws]`) |
| `strands-agents[bedrock]` ≥ 1.0 | conversational layer | optional (`[agent]`) |
| `ddgs` ≥ 6.0 | live web search | optional (`[demo]`) |

The default anchor encoder downloads from Hugging Face on first use
and caches under `~/.cache/huggingface`. There is no bundled fallback:
a first run without network access fails loudly.

## Public API

The supported surface is everything in `infon.__all__`:

```python
from infon import (
    # Core
    AnchorSchema, ServiceConfig, InfonConfig,
    Infon, Edge, Constraint, Span, QueryResult,

    # Streaming kernel + transport
    Service, FlushReport, IngestReport, ResearchPlan,
    create_app,
    Client, AsyncClient,
    FlushReceipt, CandidateRow, Mass, InfonRow, Verdict,
    ResearchPlanResult,

    # Typed queries
    CandidateQuery,
    NumericPredicate, TemporalPredicate, PlacePredicate,
    SequencePredicate, FrequencyPredicate,
    ConnectQuery, AnyOfQuery, AggregateQuery,

    # Persistence
    InfonStore, MemoryStore, Query,

    # Encoder (lazy — torch loads on first attribute access)
    Encoder,

    # Errors
    InfonError, ConfigError, SchemaError, ExtractionError,
    RoutingError, SnapshotError, DimensionError, LogicalExprError,
)
```

Heavy deps (torch, transformers, fastcoref) load lazily — `import
infon` is cheap until you reach for an operator that needs the real
models.
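A common way to get this behaviour is a module-level `__getattr__` (PEP 562), which Python calls only when normal attribute lookup fails. In a real package that is simply `def __getattr__(name):` in `__init__.py`. This self-contained sketch builds the module at runtime and uses the stdlib `decimal` module as a stand-in for a heavy dependency; it does not claim to mirror infon's exact mechanism.

```python
import importlib
import types

# Hypothetical map: public name -> (module that provides it, attribute name).
# The stdlib `decimal` module stands in for a heavy dependency such as torch.
_LAZY = {"Decimal": ("decimal", "Decimal")}


def make_lazy_package(name: str) -> types.ModuleType:
    pkg = types.ModuleType(name)
    pkg._resolved = []  # records which names have been imported on demand

    def __getattr__(attr: str):
        # PEP 562: only called when normal module attribute lookup fails.
        try:
            module_name, obj_name = _LAZY[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_name), obj_name)
        setattr(pkg, attr, value)  # cache, so later lookups skip __getattr__
        pkg._resolved.append(attr)
        return value

    pkg.__getattr__ = __getattr__
    return pkg


lazy = make_lazy_package("lazy_demo")
print(lazy._resolved)        # []
print(lazy.Decimal("1.10"))  # 1.10
print(lazy._resolved)        # ['Decimal']
```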

## References

Bodnar et al. 2022 (Neural Sheaf Diffusion) · Schlichtkrull et al.
2018 (R-GCN) · Shafer 1976 (Dempster–Shafer) · Barwise & Perry 1983
(situation semantics) · Kan 1958 (adjoint functors, the basis of
Kan-extension schema migration) · Formal, Piwowarski & Clinchant 2021 (SPLADE).

### Encoder attribution

Infon's default encoder is the multilingual SPLADE checkpoint
published by the OpenSearch Project:

```bibtex
@misc{opensearch2024multilingualsparse,
  author = {OpenSearch Project},
  title  = {Neural Sparse Encoding Multilingual v1},
  year   = {2024},
  url    = {https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-multilingual-v1}
}
```

Released under Apache-2.0. Users embedding Infon in commercial
products should still verify the upstream model card for their
deployment context.

## License

Apache 2.0 — see [LICENSE](LICENSE).
