Metadata-Version: 2.4
Name: pystata-x
Version: 0.1.2
Summary: Fast Stata-Python bridge — independent drop-in replacement for pystata.
Project-URL: Homepage, https://github.com/tmonk/pystata-x
Project-URL: Repository, https://github.com/tmonk/pystata-x
Project-URL: Issues, https://github.com/tmonk/pystata-x/issues
Author-email: Thomas Monk <t.d.monk@lse.ac.uk>
License-Expression: AGPL-3.0-only
License-File: LICENSE
Keywords: bridge,econometrics,pystata,python,stata,statistics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.11
Provides-Extra: dev
Requires-Dist: build>=1.3.0; extra == 'dev'
Requires-Dist: hatch>=1.16.2; extra == 'dev'
Requires-Dist: numpy; extra == 'dev'
Requires-Dist: pandas; extra == 'dev'
Requires-Dist: pytest-benchmark>=5.2.3; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.6.1; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Requires-Dist: python-semantic-release>=9.8.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Requires-Dist: twine>=6.2.0; extra == 'dev'
Provides-Extra: numpy
Requires-Dist: numpy; extra == 'numpy'
Provides-Extra: pandas
Requires-Dist: pandas; extra == 'pandas'
Description-Content-Type: text/markdown

# pystata-x

Independent drop-in replacement for StataCorp's **pystata**. Provides a
fast `stata_setup` initialiser and a command execution path that cuts
per-call overhead on short single commands by **~17,000–20,000×** (**~13×**
for a multi-line do-file) and makes cold Stata initialisation **~11×** faster.

## Quick Start

```python
import sys
# Only needed when running from a source checkout; skip if pystata-x is pip-installed.
sys.path.insert(0, "path/to/pystata-x/src")

from pystata_x.stata_setup import config
config("/Applications/StataMP", "mp", splash=False)

# Fast execution path: returns the captured output and the return code.
from pystata_x._core import execute
output, rc = execute("display 1+1")
print(output)  # 2
```

Or use `run()`, which mirrors the vendor `pystata.stata.run()` API:

```python
from pystata_x._core import run
run("sysuse auto, clear")  # prints output; raises SystemError if the command fails
```

## Why the polling thread is the bottleneck

The original `pystata.stata.run()` calls `RedirectOutput` from
`pystata.core.stout`, which creates a **`RepeatTimer` thread** that polls
Stata's output buffer every **15 ms**:

1. A background thread is created and started.
2. Every 15 ms it calls `StataSO_getOutput()` to fetch and display output.
3. When the command finishes, a `"#return;0"` sentinel appears in the
   output; the thread then exits and is joined.

This design exists to support **Jupyter notebook interactivity** — users see
output streaming in as commands execute, like a live terminal. The polling
sleep (15 ms) plus thread lifecycle overhead adds **~40 ms of Python
overhead** on every `run()` call:

```
pystata.stata.run()  →  ~40 ms total
   ├─ thread create   ~1 ms
   ├─ poll cycles     ~37 ms (2–3 sleeps of 15 ms)
   ├─ thread join     ~1 ms
   └─ work overhead   ~1 ms
```
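The toy simulation below reproduces the shape of this polling pattern so its cost is visible without Stata installed. `FakeOutputBuffer` and `run_with_polling` are hypothetical names, not pystata's code; only the 15 ms interval and the `#return;0` sentinel come from the description above.

```python
import threading
import time

SENTINEL = "#return;0"  # end-of-command marker, as described above
POLL_INTERVAL = 0.015   # 15 ms between polls


class FakeOutputBuffer:
    """Hypothetical stand-in for Stata's output buffer (not a real Stata API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: list[str] = []

    def write(self, text: str) -> None:
        with self._lock:
            self._pending.append(text)

    def get_output(self) -> str:
        # Return everything written since the previous call, then clear it.
        with self._lock:
            text, self._pending = "".join(self._pending), []
        return text


def run_with_polling(buffer: FakeOutputBuffer, command_output: str) -> str:
    """Collect output from a background thread that polls every 15 ms."""
    collected: list[str] = []
    finished = threading.Event()

    def poll() -> None:
        while not finished.is_set():
            time.sleep(POLL_INTERVAL)  # every cycle pays the full sleep
            chunk = buffer.get_output()
            if SENTINEL in chunk:
                collected.append(chunk.replace(SENTINEL, ""))
                finished.set()  # sentinel seen: let the thread exit
            elif chunk:
                collected.append(chunk)

    poller = threading.Thread(target=poll)  # create and start the thread
    poller.start()
    buffer.write(command_output)  # the simulated command finishes instantly
    buffer.write(SENTINEL)
    poller.join()  # join once the sentinel has been seen
    return "".join(collected)


if __name__ == "__main__":
    buf = FakeOutputBuffer()
    start = time.perf_counter()
    out = run_with_polling(buf, "2\n")
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Even an instant "command" costs at least one 15 ms sleep here.
    print(repr(out), f"{elapsed_ms:.1f} ms")
```

Because the thread sleeps before every read, even a command that completes immediately cannot return in less than one poll interval.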

For **headless / CLI / AI-agent** use cases (e.g., `stata-agent`), output is
captured programmatically after the command finishes — no streaming to a
terminal or notebook is needed. The polling thread is **pure overhead**.

`pystata-x` skips the thread entirely and calls `StataSO_Execute()` directly,
then drains the output buffer once after execution.
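For contrast, here is a minimal sketch of the drain-once approach using the same kind of hypothetical stand-in. `FakeStataLibrary` is not the real Stata shared-library interface, and this `execute` is not pystata-x's implementation; it only mirrors the `(output, rc)` return shape shown in the Quick Start. Return code 199 is used for the unknown command because that is Stata's code for an unrecognised command.

```python
class FakeStataLibrary:
    """Hypothetical stand-in for the Stata shared library (not its real API)."""

    # Canned results so the sketch runs without Stata installed.
    CANNED = {"display 1+1": "2\n"}

    def __init__(self) -> None:
        self._pending: list[str] = []

    def execute(self, command: str) -> int:
        # Runs synchronously: by the time this returns, all output is buffered.
        if command in self.CANNED:
            self._pending.append(self.CANNED[command])
            return 0
        self._pending.append(f"unrecognized command: {command}\n")
        return 199  # Stata's return code for an unrecognised command

    def get_output(self) -> str:
        # Return and clear everything buffered so far.
        text, self._pending = "".join(self._pending), []
        return text


def execute(lib: FakeStataLibrary, command: str) -> tuple[str, int]:
    """Run one command with no polling thread, then drain the buffer once."""
    rc = lib.execute(command)  # blocks until the command has finished
    output = lib.get_output()  # a single read collects all of the output
    return output, rc


if __name__ == "__main__":
    lib = FakeStataLibrary()
    print(execute(lib, "display 1+1"))  # ('2\n', 0)
```

Since execution is synchronous, there is nothing to wait for after it returns, so no thread, sleep or sentinel is needed.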

## Benchmark Results

Measured on macOS (StataSE, Apple Silicon M4) using
`benchmarks/run_benchmarks.py`.  Each test runs in a **fresh subprocess**
(Stata initialised once per test) with warm-up iterations before timing.
Times are the mean of multiple iterations measured via
`time.perf_counter()`.
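The timing method described above (untimed warm-up calls, then the mean of several `time.perf_counter()` measurements) can be sketched as follows. `mean_time_ms` and its default `warmup`/`iterations` values are assumptions for illustration; they are not taken from `run_benchmarks.py`.

```python
import statistics
import time
from collections.abc import Callable


def mean_time_ms(fn: Callable[[], object], warmup: int = 5, iterations: int = 50) -> float:
    """Mean wall-clock time of fn() in milliseconds, after untimed warm-up calls."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):
        fn()  # warm caches and lazy imports; these calls are not timed
    samples: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    print(f"{mean_time_ms(lambda: sum(range(10_000))):.4f} ms")
```

With Stata initialised, a call such as `mean_time_ms(lambda: execute("display 1+1"))` would time the fast path from the Quick Start.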

### Command execution

| Test | Original pystata | pystata-x | Speedup |
|------|-----------------|-----------|---------|
| **Single command** (`display 1+1`) | ~40.6 ms | **~0.002 ms** | **~19,000×** |
| **Single command + echo** | ~40.7 ms | **~0.002 ms** | **~17,000×** |
| **Single command (quietly)** | ~40.4 ms | **~0.002 ms** | **~20,000×** |
| **Multi-line** (4 commands, do-file) | ~41.9 ms | **~3.2 ms** | **~13×** |
| **Raw StataSO_Execute** (no wrapper) | ~0.002 ms | ~0.002 ms | 1× (baseline) |

### Cold initialisation

| Method | Time | Speedup |
|--------|------|---------|
| Original `stata_setup.config()` (→ pystata) | ~1.50 s | 1× |
| Optimised `pystata_x._config.init()` | **~0.13 s** | **~11×** |
| Optimised `pystata_x.stata_setup.config()` | **~0.13 s** | **~11×** |
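Cold initialisation only happens once per process, so it has to be measured in a fresh interpreter. A minimal sketch, assuming a hypothetical `cold_time_seconds` helper that runs a snippet in a new Python subprocess and reports how long that snippet took:

```python
import subprocess
import sys


def cold_time_seconds(setup_code: str) -> float:
    """Time setup_code once inside a brand-new Python interpreter.

    The child process times only setup_code with time.perf_counter(), so
    interpreter start-up itself is excluded from the result.
    """
    child_script = (
        "import time\n"
        "_t0 = time.perf_counter()\n"
        f"{setup_code}\n"
        "print(time.perf_counter() - _t0)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_script],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    # The elapsed time is the last line the child prints.
    return float(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    print(f"{cold_time_seconds('import json'):.4f} s")
```

To time Stata start-up itself, pass the two Quick Start lines (the `from pystata_x.stata_setup import config` import and the `config(...)` call) joined by a newline, substituting your own install path and edition.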

### Why cold init is faster

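The original `pystata.config.init()` does several expensive things that the
`pystata_x` initialiser skips. The itemised steps below sum to roughly 230 ms
of the ~1.37 s difference; the remainder is not broken down here: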

| Step | Original | pystata-x |
|------|----------|-----------|
| IPython/Jupyter probe | ~100 ms (imports `IPython`, checks for kernel) | **Skipped** |
| Preference-file I/O | ~50 ms (reads `profile.ini` from disk) | **Skipped** |
| Python 2 compat setup | ~30 ms (try/except on every `str()` conversion) | **Removed** |
| `stata_setup` wrapper overhead | ~50 ms (filesystem checks, extra imports) | **Inlined** |
| **Total** | **~1.50 s** | **~0.13 s** |

## Project Structure

```
src/pystata_x/
├── __init__.py              # Package entry point
├── _config.py               # Fast Stata initialisation (no IPython/py2 compat)
├── _core.py                 # Fast command execution (direct StataSO_Execute)
└── stata_setup.py           # Drop-in replacement for PyPI `stata-setup`
benchmarks/
├── run_benchmarks.py        # Comprehensive benchmark runner
└── history/                 # Benchmark result history
```

## Cross-platform

Shared-library discovery in `_config.py` supports macOS, Linux, and Windows:

| Platform | Library name | Search path |
|----------|-------------|-------------|
| macOS | `libstata-{be,se,mp}.dylib` | `{st_path}/Stata{BE,SE,MP}.app/Contents/MacOS/` |
| Linux | `libstata-{be,se,mp}.so` | `{st_path}/` |
| Windows | `libstata-{be,se,mp}.dll` | `{st_path}/` |

## Licence

- Our modules (`_config.py`, `_core.py`, `stata_setup.py`, `__init__.py`,
  and all files under `benchmarks/`) are original work, released under the
  **GNU Affero General Public License v3.0**.
- The PyPI `stata-setup` package (v0.1.3, StataCorp LLC) is Apache 2.0
  licensed; our `stata_setup.py` provides the same public API with a
  completely rewritten implementation under AGPL-3.0.
