Metadata-Version: 2.4
Name: ataturk-mcp
Version: 0.1.2
Summary: An MCP server that exposes the corpus of Atatürk's speeches, statements, telegrams and proclamations (1906-1938) for researchers worldwide. Created by bugraayan.com.
Project-URL: Homepage, https://bugraayan.com
Project-URL: Repository, https://github.com/bugraayan/ataturk-mcp
Project-URL: Issues, https://github.com/bugraayan/ataturk-mcp/issues
Project-URL: Author, https://bugraayan.com
Author-email: Buğra Ayan <bugraayan.com@gmail.com>
Maintainer-email: Buğra Ayan <bugraayan.com@gmail.com>
License: MIT
License-File: AUTHORS.md
License-File: LICENSE
Keywords: ataturk,bugraayan,history,mcp,nutuk,speeches,turkish-republic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Natural Language :: Turkish
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Education
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Requires-Dist: fastmcp>=2.0.0
Requires-Dist: pydantic>=2.5.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Provides-Extra: etl
Requires-Dist: beautifulsoup4>=4.12.0; extra == 'etl'
Requires-Dist: httpx>=0.27.0; extra == 'etl'
Requires-Dist: lxml>=5.0.0; extra == 'etl'
Requires-Dist: pdfplumber>=0.11.0; extra == 'etl'
Requires-Dist: rich>=13.0.0; extra == 'etl'
Description-Content-Type: text/markdown

# Atatürk MCP

> Created by **[Buğra Ayan — bugraayan.com](https://bugraayan.com)**
>
> *An open-source bridge for everyone around the world researching Atatürk.*

A [Model Context Protocol](https://modelcontextprotocol.io/) server that exposes the
complete corpus of **Mustafa Kemal Atatürk's speeches, statements, telegrams and
proclamations (1906-1938)** to any MCP-aware AI client (Claude Desktop, Cursor, Cline,
Windsurf, …).

Built for researchers, historians, journalists and students anywhere in the world who
want to ask LLMs questions like:

- *"What did Atatürk say about women's rights in 1923?"*
- *"Find quotes about education and modernisation."*
- *"Show me his opening address to the Grand National Assembly on 1 March 1922."*
- *"Compare the 1927 Nutuk's treatment of the War of Independence with his 1933 10th
  Year Speech."*

## Corpus

| Source | Coverage | Format | Speeches |
| --- | --- | --- | --- |
| [ATAM — Atatürk Araştırma Merkezi](https://atam.gov.tr/) "Söylev ve Demeçleri" Cilt I-III (2024 edition) | 1906-1938, **the definitive corpus** | PDF | 366 |
| [Vikikaynak](https://tr.wikisource.org/wiki/Kategori:Mustafa_Kemal_Atat%C3%BCrk) | Individual speeches, telegrams, all TBMM opening addresses (1920-1938) | HTML | 45 |
| [Internet Archive — Nutuk (English)](https://archive.org/details/361505859-a-speech-delivered-by-mustafa-kemal-ataturk-1927-aka-nutuk-in-english-aka-the-great-speech) | The 1927 *Nutuk* in English | PDF / DjVu | 20 chapters |
| **Total** | 1906-1938 | SQLite + FTS5 | **411 speeches + Nutuk** |

Atatürk passed away in 1938; his words are in the public domain. Editorial
annotations from the ATAM edition are *not* redistributed — only the speech bodies
themselves, with a `source_ref` pointing back to the corresponding ATAM volume and
page number for academic citation.

## Quick start

### 1. Install

From PyPI (recommended for end users):

```bash
pip install ataturk-mcp                # runtime only
pip install "ataturk-mcp[etl]"         # also includes the ETL scripts
```

From source (for hacking on the ETL):

```bash
git clone https://github.com/bugraayan/ataturk-mcp.git
cd ataturk-mcp
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[etl]"
```

### 2. Build the database (one-off, ~1 minute)

```bash
python scripts/fetch_atam.py            # ~10 MB of PDFs from atam.gov.tr
python scripts/fetch_wikisource.py      # MediaWiki API, ~45 pages
python scripts/fetch_nutuk_en.py        # ~2 MB DjVu text from Internet Archive
python scripts/build_db.py              # produces data/speeches.db (~11 MB)
```

Or, if you only want the core corpus:

```bash
python scripts/fetch_atam.py
python scripts/build_db.py --skip-wikisource --skip-nutuk
```

A pre-built `speeches.db` is also published on the GitHub Releases page so end
users do not need to run the ETL themselves.
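
Whether you build the database yourself or download the pre-built file, it can be worth confirming that it opens before pointing an MCP client at it. The sketch below uses only the standard library and assumes nothing about the schema: it opens the file read-only and lists whatever tables it finds. The helper names `open_readonly` and `list_tables` are illustrative and are not part of `ataturk-mcp`.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file without permitting writes."""
    # as_uri() yields a percent-encoded file:// URI; mode=ro makes SQLite
    # refuse writes and refuse to create the file if it is missing.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables (virtual ones included), sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Example usage, with the path produced by the build step above:
# conn = open_readonly("data/speeches.db")
# print(list_tables(conn))
```

Opening the file read-only means a quick inspection like this cannot modify the database by accident.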

### 3. Run the MCP server

```bash
ataturk-mcp                  # stdio transport (default; the clients listed below launch it this way)
# or
fastmcp run src/ataturk_mcp/server.py:mcp
# or
python -m ataturk_mcp.server
```

## Connecting to MCP clients

### Claude Desktop

Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or the
equivalent on your platform and add:

```json
{
  "mcpServers": {
    "ataturk": {
      "command": "/absolute/path/to/.venv/bin/ataturk-mcp"
    }
  }
}
```

If your database lives somewhere other than the repo, set the path explicitly:

```json
{
  "mcpServers": {
    "ataturk": {
      "command": "/absolute/path/to/.venv/bin/ataturk-mcp",
      "env": {
        "ATATURK_MCP_DB": "/path/to/speeches.db"
      }
    }
  }
}
```

Restart Claude Desktop; the hammer icon now exposes the Atatürk tools.
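
If you would rather script the configuration change than edit the file by hand, something like the following could work. It is a minimal sketch: `server_entry` and `merge_into_config` are hypothetical helpers, not something shipped with `ataturk-mcp`. They only reproduce the `mcpServers` layout and the `ATATURK_MCP_DB` variable shown above, leaving any other servers already in the file untouched.

```python
import json
from pathlib import Path
from typing import Optional


def server_entry(command: str, db_path: Optional[str] = None) -> dict:
    """Build the per-server block; add the env override only when requested."""
    entry = {"command": command}
    if db_path is not None:
        entry["env"] = {"ATATURK_MCP_DB": db_path}
    return entry


def merge_into_config(config_path: Path, entry: dict, name: str = "ataturk") -> dict:
    """Insert or replace one server in an MCP client config file."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Keep every existing server; only the entry called `name` is overwritten.
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example usage (paths are placeholders, as in the JSON above):
# merge_into_config(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     server_entry("/absolute/path/to/.venv/bin/ataturk-mcp", "/path/to/speeches.db"),
# )
```

It is sensible to back up the existing config file before running anything that rewrites it.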

### Cursor

Edit `~/.cursor/mcp.json` (or the project-level `.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "ataturk": {
      "command": "/absolute/path/to/.venv/bin/ataturk-mcp"
    }
  }
}
```

### Cline / Continue / any MCP host

Use the same `command` entry in whichever JSON configuration format the host expects.
The server speaks stdio by default and follows the MCP 2024-11-05 specification.
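
For hosts that need to be wired up by hand, it may help to see what travels over stdio. In the 2024-11-05 revision of MCP, a client opens with a JSON-RPC `initialize` request, follows the server's reply with a `notifications/initialized` notification, and can then call methods such as `tools/list`. Each message is sent as one line of JSON. The sketch below only builds and frames those messages; the `clientInfo` values are placeholders, and a real host would normally use an MCP client library rather than doing this itself.

```python
import json

PROTOCOL_VERSION = "2024-11-05"  # the spec revision named above


def initialize_request(request_id: int = 1) -> dict:
    """First message a client sends after launching the server process."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }


def initialized_notification() -> dict:
    """Sent once the server has answered `initialize`; notifications carry no id."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def list_tools_request(request_id: int = 2) -> dict:
    """Ask the server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def frame(message: dict) -> bytes:
    """Encode one message as a single newline-terminated line of UTF-8 JSON."""
    return (json.dumps(message, ensure_ascii=False) + "\n").encode("utf-8")
```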

## Tools exposed

| Tool | Purpose |
| --- | --- |
| `search_speeches(query, lang, year_from, year_to, kind, limit)` | Full-text BM25 search over the entire corpus, with snippets. Diacritic-insensitive; supports FTS5 operators (`AND`, `OR`, `NEAR`, `"phrases"`, `prefix*`). |
| `get_speech(speech_id, lang)` | Return one speech in full, in Turkish or English. |
| `list_speeches(year, year_from, year_to, kind, source, limit, offset)` | Browse the corpus chronologically. |
| `random_speech(lang, kind)` | Pick a random speech (useful for daily-quote agents). |
| `list_topics()` / `speeches_by_topic(topic_id, limit)` | Topical browsing (available when the ATAM Konular İndeksi, i.e. the subject index, is loaded). |
| `nutuk_search(query, lang, limit)` | Search within Nutuk (1927). |
| `nutuk_chapter(chapter, lang)` | Return one Nutuk chapter (1-20). |
| `cite(speech_id)` | Generate APA / MLA / Chicago citations. |
| `corpus_stats()` | Summary statistics about the corpus. |
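
To make the `search_speeches` query syntax concrete, here is a small self-contained sketch of the kind of FTS5 query such a tool could run. It is not the project's implementation: the table name `speeches_fts`, its two columns and the sample rows are placeholders invented for the demo, and it assumes the SQLite build behind your Python has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same tokenizer options the Architecture section mentions.
conn.execute(
    "CREATE VIRTUAL TABLE speeches_fts USING fts5("
    "title, body, tokenize = 'unicode61 remove_diacritics 2')"
)
# Placeholder rows, not real corpus text.
conn.executemany(
    "INSERT INTO speeches_fts (title, body) VALUES (?, ?)",
    [
        ("Sample A", "Atatürk spoke about education and the republic"),
        ("Sample B", "A telegram about the army and the republic"),
    ],
)


def search(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (title, snippet) pairs, best BM25 match first."""
    sql = (
        "SELECT title, snippet(speeches_fts, 1, '[', ']', '...', 8) "
        "FROM speeches_fts WHERE speeches_fts MATCH ? "
        # bm25() returns smaller values for better matches, so sort ascending.
        "ORDER BY bm25(speeches_fts) LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


# Queries you could try against the two sample rows:
# search("ataturk")                 diacritic-insensitive match on "Atatürk"
# search("army AND republic")       boolean operator
# search("educat*")                 prefix query
# search("NEAR(army republic, 5)")  proximity query
```

Binding the query as a parameter keeps user input out of the SQL text, although a malformed FTS5 expression will still raise `sqlite3.OperationalError` and should be caught by the caller.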

### Resources

| URI | Description |
| --- | --- |
| `ataturk://speech/{speech_id}` | Plain-text rendering of a single speech with header. |
| `ataturk://nutuk/{chapter}/{lang}` | One Nutuk chapter. |
| `ataturk://corpus/stats` | Statistics as JSON. |
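
A client that wants to construct or route these URIs itself could do so along the following lines. This is a sketch under assumptions: the helper names are invented, the language codes `"tr"` and `"en"` are a guess at what `{lang}` accepts, and the exact form of `speech_id` is not specified here, so it is passed through as-is. The 1-20 chapter range comes from the `nutuk_chapter` tool above.

```python
from urllib.parse import urlsplit


def speech_uri(speech_id) -> str:
    """Fill the ataturk://speech/{speech_id} template."""
    return f"ataturk://speech/{speech_id}"


def nutuk_uri(chapter: int, lang: str) -> str:
    """Fill the ataturk://nutuk/{chapter}/{lang} template."""
    if not 1 <= chapter <= 20:
        raise ValueError("Nutuk chapters run from 1 to 20")
    return f"ataturk://nutuk/{chapter}/{lang}"


def parse_uri(uri: str) -> tuple[str, list[str]]:
    """Split a resource URI into its kind and the remaining path segments."""
    parts = urlsplit(uri)
    if parts.scheme != "ataturk":
        raise ValueError(f"not an ataturk:// URI: {uri!r}")
    # The segment after "//" is parsed as the network location, so it holds
    # the resource kind ("speech", "nutuk" or "corpus").
    segments = [segment for segment in parts.path.split("/") if segment]
    return parts.netloc, segments
```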

### Prompts

| Prompt | Use |
| --- | --- |
| `analyze_speech(speech_id)` | Scholarly analysis template (context, rhetoric, themes, citation). |
| `find_quote(theme, n_quotes)` | Theme-based quote hunter across the corpus. |

## Development

```bash
pip install -e ".[etl,dev]"
pytest -q
ruff check .
```

The test suite uses an in-memory seeded SQLite DB and FastMCP's in-process client,
so it runs in under a second and does **not** require the production DB to be
built.

## Architecture

```
ATAM PDFs ─┐
Vikikaynak ─┼─► ETL scripts ─► SQLite (FTS5) ─► FastMCP stdio server ─► AI clients
Nutuk EN ──┘                  speeches.db
```

- ETL (`scripts/`) is fully decoupled from the server (`src/ataturk_mcp/`).
- The server opens the DB read-only and is safe to run in parallel from multiple
  clients.
- Turkish search quality: FTS5 with `unicode61 remove_diacritics 2` plus
  application-level I/İ normalisation in `db.normalise_query`.
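
The extra I/İ step exists because Turkish has four distinct letters (I, ı, İ, i) where ASCII-oriented case folding sees only two, so a query typed on a non-Turkish keyboard may not fold to the same token as the indexed text. The sketch below shows one possible way to fold them. It is a guess at the general idea, not the actual body of `db.normalise_query`: the name `fold_turkish_i` is hypothetical, and it assumes the indexed text is folded the same way when the database is built.

```python
# Map every Turkish i-variant onto ASCII "i" (hypothetical mapping).
_TURKISH_I = str.maketrans(
    {
        "\u0130": "i",  # İ  capital I with dot above
        "I": "i",       # I  plain capital I (capital of dotless ı in Turkish)
        "\u0131": "i",  # ı  small dotless i
    }
)


def fold_turkish_i(query: str) -> str:
    """Fold I, İ and ı to "i" and leave every other character alone."""
    # FTS5 keywords such as AND, OR, NOT and NEAR contain no "I", so they
    # pass through unchanged. Remaining case folding and diacritic removal
    # are left to the FTS5 tokenizer.
    return query.translate(_TURKISH_I)
```

Folding only these characters, rather than lower-casing the whole query, keeps upper-case FTS5 operators intact.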

## Publishing to PyPI

The project is wired to publish from GitHub Actions (see
[`.github/workflows/release.yml`](.github/workflows/release.yml)) when you push a
tag of the form `vX.Y.Z`. For manual publishing:

```bash
pip install build twine
python -m build                          # produces dist/*.whl and dist/*.tar.gz
twine check dist/*
twine upload dist/*                      # uploads to https://pypi.org/project/ataturk-mcp/
# or for a dry run on TestPyPI:
twine upload --repository testpypi dist/*
```

## Credits & author

- **Author:** [Buğra Ayan — bugraayan.com](https://bugraayan.com)
- **Email:** `hello@bugraayan.com`
- **Atatürk Araştırma Merkezi (ATAM)** — for digitising and editing the *Söylev
  ve Demeçleri* corpus, the gold-standard source for this project.
- **Vikikaynak / Türkçe Wikisource contributors** — for transcribing
  individual addresses and the TBMM opening speeches.
- **Internet Archive** — for hosting the public-domain English Nutuk.

If you build research or journalism on top of this server, please cite both
Atatürk's words (via the `cite` tool) **and** the original sources
(ATAM volume + page, or the Wikisource URL).

## License

MIT for the code (© Buğra Ayan / [bugraayan.com](https://bugraayan.com)). The
speech texts themselves are in the public domain (Atatürk died in 1938). ATAM,
TBMM and Wikisource are credited as the digital sources used to build the
corpus; please respect each source's terms when redistributing.
