Metadata-Version: 2.4
Name: yoktez
Version: 0.1.2
Summary: Typed Python client for the National Thesis Center of Turkey.
Project-URL: homepage, https://github.com/ozefe/yoktez
Project-URL: source, https://github.com/ozefe/yoktez
Project-URL: issues, https://github.com/ozefe/yoktez/issues
Author-email: Efe Özyay <hi@efe.cv>
License-Expression: MIT
License-File: LICENSE
Keywords: tez,thesis,turkey,yoktez
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Natural Language :: Turkish
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows :: Windows 11
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.14
Requires-Dist: beautifulsoup4<5,>=4.12
Requires-Dist: httpx<1,>=0.28
Requires-Dist: lxml<7,>=5.3
Description-Content-Type: text/markdown

# yoktez

<img alt="yoktez mascot generated by Google's Nano Banana 2" align="right" src=".github/mascot.png" width="200" />

Typed Python client for the [National Thesis Center of Turkey](https://tez.yok.gov.tr/UlusalTezMerkezi/).

`yoktez` wraps the YOK NTC JSP/AJAX surface behind a single synchronous `Client` with frozen-dataclass return types, a deterministic exception hierarchy, and bilingual-aware fields. It is built for application and CLI developers who need a typed surface and a small install footprint, without writing bespoke scraping code for each project.

## Installation

```bash
pip install yoktez
```

Requires Python 3.14+.

## Quickstart

```python
"""End-to-end yoktez quickstart: search -> metadata -> assets.

Demonstrates the typical three-call flow without writing files to disk.

Run with: `python examples/quickstart.py`
"""

from yoktez import AssetStatus, Client

_QUERY = "yapay zeka"


with Client() as client:
    results = client.search.simple(_QUERY)
    print(f"{results.total} matches for {_QUERY!r}")

    thesis = results[0]
    print(f"  title:   {thesis.title}")
    print(f"  author:  {thesis.author}")
    print(f"  year:    {thesis.year}")
    print(f"  keys:    {thesis.registration_no} / {thesis.thesis_no}")

    metadata = client.metadata.get(thesis)
    print(f"  advisor: {metadata.supervisor}")
    if metadata.affiliation is not None:
        print(f"  uni:     {metadata.affiliation.university}")
    if metadata.keywords is not None:
        print(f"  tags:    {len(metadata.keywords)} keywords")

    assets = client.assets.get(thesis)
    print(f"  status:  {assets.status.name}")
    if assets.status is AssetStatus.AVAILABLE:
        print(f"  pdf_key: {assets.pdf_key}")
```

Sample output:

```text
6841 matches for 'yapay zeka'
  title:   Kimya eğitiminde yapay zekâ araştırmalarına ilişkin bir meta-sentez çalışması
  author:  MURAT EBUBEKİR YAYLA
  year:    2026
  keys:    nslbSyAODG1_FIruL8qUAA / THvIvDpZXvJIiHZpuqpKVw
  advisor: PROF. DR. MUSA ÜCE
  uni:     MARMARA ÜNİVERSİTESİ
  tags:    5 keywords
  status:  AVAILABLE
  pdf_key: 5T1_CZ5-UGb9QCmoURec4AbpuuyvqUeed_1PcCh_6DVZ4b1fbX7Gcu-DQFLIcE11
```

## Features

- **Four search modes:** `simple`, `advanced`, `detail`, and `recent` from a single `client.search` namespace, all returning a sliceable `SearchResults` carrying the database-wide match total alongside the result window.
- **Structured metadata:** `client.metadata.get(thesis)` returns a typed `ThesisMetadata` with bilingual keywords (`Bilingual(raw, tr, en)`), a tiered `Affiliation`, and pre-formatted citation strings (APA / IEEE / MLA / Chicago / Harvard).
- **Two-step asset download:** `client.assets.get(thesis)` resolves to one of `AVAILABLE` / `UNDER_EMBARGO` / `NO_PERMIT` / `PREPARING` before any bytes move; the available branch exposes a `pdf_key` (and optional `appendix_key`) to feed `download_pdf` / `download_appendix`.
- **Catalog lookups:** `client.lookups` covers universities (TR / INT), institutes, divisions, subjects, departments, sections, and keywords, with per-instance memoization and an explicit `refresh()`.
- **Typed value objects:** every returned record is a `@dataclass(frozen=True, slots=True)`; values are immutable, hashable where field types allow, and ship with `py.typed` for downstream type checkers.
- **Sync-only, thread-friendly:** no `async`/`await` surface; the recommended concurrency pattern is one `Client` per thread.
- **Small dependency surface:** `httpx`, `beautifulsoup4`, and `lxml`. No Rust core, no auth, no hidden state.

## Usage

All snippets assume `with Client() as client:` for deterministic cleanup of the underlying HTTP connection pool.

### Search

Simple search by free text, optionally narrowed to a single field:

```python
from yoktez import Client, SearchField

with Client() as client:
    results = client.search.simple("yapay zeka", field=SearchField.ABSTRACT)

    print(f"{results.total} matches")
    for thesis in results[:5]:
        print(thesis.year, thesis.title)
```

Advanced search joins up to three terms with boolean operators:

```python
from yoktez import AdvancedOperator, Client, MatchType

with Client() as client:
    results = client.search.advanced(
        "sosyal",
        term2="medya",
        op1=AdvancedOperator.AND,
        match=MatchType.INCLUDES,
    )
```

Detail search accepts the full filter surface; enum-shaped parameters also accept the member name as a string or the raw int code:

```python
from yoktez import Client, ThesisType

with Client() as client:
    unis = client.lookups.universities()
    results = client.search.detail(
        university=unis[0],
        year_min=2020,
        year_max=2025,
        degree_type=ThesisType.MASTER,  # also accepts "MASTER" or 1
    )
```

Recently added theses (server-fixed 15-day window):

```python
from yoktez import Client

with Client() as client:
    results = client.search.recent()
```

### Metadata

```python
from yoktez import Client

with Client() as client:
    thesis = client.search.simple("makine öğrenmesi")[0]
    metadata = client.metadata.get(thesis)

    if metadata.affiliation is not None:
        print(metadata.affiliation.university)
    if metadata.keywords:
        print(metadata.keywords[0].tr, "=", metadata.keywords[0].en)
    if metadata.references is not None:
        print(metadata.references.apa)
```

### Assets (two-step download)

```python
from yoktez import AssetStatus, Client

with Client() as client:
    thesis = client.search.simple("yapay zeka")[0]
    assets = client.assets.get(thesis)

    if assets.status is AssetStatus.AVAILABLE and assets.pdf_key is not None:
        client.assets.download_pdf(assets.pdf_key, "thesis.pdf")

        if assets.appendix_key is not None:
            client.assets.download_appendix(assets.appendix_key, "thesis-ek.rar")
```

`download_pdf` and `download_appendix` accept a filesystem path (`Path` or `str`, opened and closed for you) or a pre-opened binary file-like (written to but not closed — ownership stays with the caller).
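
The ownership rule can be illustrated with a standalone sketch. `write_payload` is a hypothetical helper, not part of `yoktez`; it only mirrors the contract described above, where a path destination is opened and closed by the callee and a file-like destination is left open for the caller.

```python
import io
import os
from typing import BinaryIO


def write_payload(data: bytes, dest: "str | os.PathLike[str] | BinaryIO") -> int:
    """Write data to a path or an open binary stream; return bytes written."""
    if isinstance(dest, (str, os.PathLike)):
        # Path destination: this function owns the handle and closes it.
        with open(dest, "wb") as handle:
            return handle.write(data)
    # File-like destination: write only; closing is the caller's job.
    return dest.write(data)


payload = b"%PDF-1.7 example bytes"
buffer = io.BytesIO()
written = write_payload(payload, buffer)

# The buffer is still open, so the caller can keep reading from it.
still_open = not buffer.closed
```

Under the same contract, passing an `io.BytesIO()` to `download_pdf` should keep the PDF in memory, for instance to hand it to a parser without touching disk.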

### Lookups

```python
from yoktez import Client, UniversitySource

with Client() as client:
    unis = client.lookups.universities(UniversitySource.TR)
    institutes = client.lookups.institutes(unis[0])
    divisions = client.lookups.divisions(unis[0], institutes[0])

    # Bulk catalogs; keywords() also accepts group / language / first_letter / search.
    keywords = client.lookups.all_keywords()
```

Every `client.lookups.*` call is memoized on the `Client` instance. Call `client.lookups.refresh()` to clear the cache if you suspect that YOKSIS IDs have rotated.
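
Per-instance memoization with an explicit reset can be pictured as below. `CatalogCache` and its `fetch` callable are hypothetical; the sketch shows the general pattern, not how `client.lookups` is actually implemented.

```python
from collections.abc import Callable, Hashable


class CatalogCache:
    """Hypothetical per-instance memo cache with an explicit refresh()."""

    def __init__(self, fetch: Callable[[Hashable], list[str]]) -> None:
        self._fetch = fetch
        self._cache: dict[Hashable, list[str]] = {}

    def get(self, key: Hashable) -> list[str]:
        # Hit the backend only on the first request for a given key.
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def refresh(self) -> None:
        # Drop every cached entry; the next get() fetches again.
        self._cache.clear()


calls: list[Hashable] = []


def fake_fetch(key: Hashable) -> list[str]:
    """Stand-in backend that records each request it receives."""
    calls.append(key)
    return [f"{key}-item"]


cache = CatalogCache(fake_fetch)
cache.get("universities")
cache.get("universities")  # served from the cache
cache.refresh()
cache.get("universities")  # fetched again after the reset
```

Because the cache lives on the instance, two separate clients would not share entries, and discarding a client discards its cache with it.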

### HTTP client configuration

`Client` accepts keyword-only overrides for the underlying `httpx.Client`:

```python
from yoktez import Client

with Client(timeout=60, retries=5, user_agent="my-app/1.0") as client:
    ...
```

For full control, inject a pre-built `httpx.Client` via `http_client=`. Ownership stays with the caller; `Client.close()` is a no-op for an injected client:

```python
import httpx
from yoktez import Client

http = httpx.Client(timeout=30.0, follow_redirects=True)
try:
    with Client(http_client=http) as client:
        ...
finally:
    http.close()
```

## Concurrency

`yoktez.Client` is single-threaded by design: use one instance per thread and never share an instance across threads. The library ships no concurrency primitives; threading strategy is the caller's choice.
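
One way to keep a separate instance per thread is a small `threading.local` wrapper. The sketch below is an assumption-laden illustration, not the repository's own example: `per_thread` is a hypothetical helper, and `FakeClient` stands in for `yoktez.Client` so the code runs offline.

```python
import threading
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")


def per_thread(factory: Callable[[], T]) -> Callable[[], T]:
    """Return a getter that lazily builds one instance per calling thread."""
    local = threading.local()

    def get() -> T:
        if not hasattr(local, "instance"):
            local.instance = factory()
        return local.instance

    return get


class FakeClient:
    """Stand-in for yoktez.Client; records the thread that created it."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()


get_client = per_thread(FakeClient)  # in real use: per_thread(Client)


def work(_: int) -> bool:
    # Each task only ever sees the instance created on its own thread.
    return get_client().owner == threading.get_ident()


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(16)))
```

Instances created this way are not closed automatically; with a real `Client`, the caller would still need to track them and call `close()` when the pool shuts down.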

## Design principles

- **Synchronous-only API:** Sync is sufficient for YOK NTC's IO patterns; an async surface would double the API and complicate testing for no proven benefit. Concurrency strategy belongs to the caller, and `examples/multithreaded_pool.py` demonstrates the one-`Client`-per-thread pattern.
- **Frozen-dataclass value objects:** Every returned record is `@dataclass(frozen=True, slots=True)`: stdlib-only, immutable, hashable where field types allow, and free of a per-instance `__dict__`.
- **Coerce-on-input enum handling:** Enum-shaped parameters accept the matching `Enum` member, its name (e.g., `"MASTER"`), or its raw int code; the raw-`int` passthrough tolerates new YOK NTC codes the library hasn't yet enumerated, so wire-side additions don't gate a release.
- **Two-step download flow:** `client.assets.get(...)` resolves status first; `download_pdf` and `download_appendix` run only on the available branch. This mirrors the underlying YOK NTC flow and lets callers inspect embargo dates and appendix availability before committing to a second request.
- **Hierarchical logger naming:** Every sub-package logs under `yoktez.<concern>` (`yoktez.http`, `yoktez.search`, `yoktez.lookups`, `yoktez.assets`). Operators can silence the high-volume HTTP DEBUG channel while preserving the rarer parser WARNING channels; a single `logging.getLogger("yoktez").setLevel(...)` still catches every child through parent propagation.
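
The logger hierarchy can be tuned with the standard library alone. The sketch below uses only the logger names listed above; the chosen levels are an example configuration, not a recommendation from the library.

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Parent logger: children inherit this level unless they set their own.
logging.getLogger("yoktez").setLevel(logging.DEBUG)

# Quiet only the high-volume HTTP channel; WARNING and above still pass.
logging.getLogger("yoktez.http").setLevel(logging.WARNING)

http_level = logging.getLogger("yoktez.http").getEffectiveLevel()
search_level = logging.getLogger("yoktez.search").getEffectiveLevel()
```

Here `yoktez.search` has no level of its own, so it falls back to the level set on the `yoktez` parent, while `yoktez.http` keeps its explicit override.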

## Limitations

`yoktez` is intentionally narrow. The following are out of scope and will not land in this package:

- **No async API:** Synchronous code throughout; no `async def`, no asyncio surface.
- **No multi-threaded helper functions:** Concurrency strategy is the caller's choice.
- **No authentication or login flows (e-Devlet):** Anonymous public-data access only; features requiring login (favorites, history) are excluded.
- **No bypassing access restrictions:** Embargoed and no-permit theses surface their state via `AssetStatus` and the matching exception types; the library does not attempt to circumvent these.
- **No data hosting or mirroring:** The library fetches on demand; no bundled snapshots of the YOK NTC database.
- **No CLI shipped from this package:** A separate package may add one later.

## License

MIT — see [`LICENSE`](LICENSE).
