Metadata-Version: 2.4
Name: psair
Version: 0.0.2a3
Summary: Backend scaffolding for documentation, data handling, and analysis-oriented research software.
Author: Nick McCloskey
License-Expression: MIT
Project-URL: Repository, https://github.com/nmccloskey/PSAIR
Project-URL: Issues, https://github.com/nmccloskey/PSAIR/issues
Keywords: research,analysis,pipeline,documentation,etl,eda,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: <3.13,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Provides-Extra: log
Provides-Extra: view
Requires-Dist: streamlit<2.0,>=1.44; extra == "view"
Requires-Dist: python-docx<2.0,>=1.1; extra == "view"
Requires-Dist: PyYAML<7.0,>=6.0; extra == "view"
Provides-Extra: docs
Requires-Dist: docx2txt<1.0,>=0.8; extra == "docs"
Requires-Dist: openpyxl<3.2,>=3.1; extra == "docs"
Requires-Dist: PyPDF2<3.1,>=3.0; extra == "docs"
Requires-Dist: pypandoc<2.0,>=1.17; extra == "docs"
Requires-Dist: svglib<2.0,>=1.5; extra == "docs"
Requires-Dist: reportlab<5.0,>=4.3; extra == "docs"
Requires-Dist: python-docx<2.0,>=1.1; extra == "docs"
Requires-Dist: PyYAML<7.0,>=6.0; extra == "docs"
Provides-Extra: eda
Requires-Dist: numpy<1.28,>=1.23; extra == "eda"
Requires-Dist: pandas<2.3,>=2.2; extra == "eda"
Requires-Dist: scikit-learn<1.7,>=1.6; extra == "eda"
Requires-Dist: scipy<1.16,>=1.10.1; extra == "eda"
Requires-Dist: matplotlib<3.11,>=3.10; extra == "eda"
Requires-Dist: seaborn<0.14,>=0.13; extra == "eda"
Requires-Dist: openpyxl<3.2,>=3.1; extra == "eda"
Provides-Extra: nlp
Requires-Dist: spacy<4.0,>=3.8; extra == "nlp"
Provides-Extra: nlp-phon
Requires-Dist: nltk<4.0,>=3.9; extra == "nlp-phon"
Provides-Extra: nlp-parse
Requires-Dist: benepar<0.3,>=0.2; extra == "nlp-parse"
Provides-Extra: nlp-process
Requires-Dist: numpy<1.28,>=1.23; extra == "nlp-process"
Requires-Dist: pandas<2.3,>=2.2; extra == "nlp-process"
Requires-Dist: docx2txt<1.0,>=0.8; extra == "nlp-process"
Requires-Dist: tqdm<5.0,>=4.67; extra == "nlp-process"
Provides-Extra: nlp-full
Requires-Dist: numpy<1.28,>=1.23; extra == "nlp-full"
Requires-Dist: pandas<2.3,>=2.2; extra == "nlp-full"
Requires-Dist: spacy<4.0,>=3.8; extra == "nlp-full"
Requires-Dist: benepar<0.3,>=0.2; extra == "nlp-full"
Requires-Dist: nltk<4.0,>=3.9; extra == "nlp-full"
Requires-Dist: docx2txt<1.0,>=0.8; extra == "nlp-full"
Requires-Dist: tqdm<5.0,>=4.67; extra == "nlp-full"
Provides-Extra: etl
Requires-Dist: pandas<2.3,>=2.2; extra == "etl"
Requires-Dist: PyYAML<7.0,>=6.0; extra == "etl"
Provides-Extra: web
Requires-Dist: streamlit<2.0,>=1.44; extra == "web"
Requires-Dist: markdown<4.0,>=3.7; extra == "web"
Requires-Dist: python-docx<2.0,>=1.1; extra == "web"
Requires-Dist: PyYAML<7.0,>=6.0; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest<9,>=8; extra == "dev"
Provides-Extra: full
Requires-Dist: numpy<1.28,>=1.23; extra == "full"
Requires-Dist: pandas<2.3,>=2.2; extra == "full"
Requires-Dist: PyYAML<7.0,>=6.0; extra == "full"
Requires-Dist: tqdm<5.0,>=4.67; extra == "full"
Requires-Dist: openpyxl<3.2,>=3.1; extra == "full"
Requires-Dist: PyPDF2<3.1,>=3.0; extra == "full"
Requires-Dist: pypandoc<2.0,>=1.17; extra == "full"
Requires-Dist: svglib<2.0,>=1.5; extra == "full"
Requires-Dist: reportlab<5.0,>=4.3; extra == "full"
Requires-Dist: python-docx<2.0,>=1.1; extra == "full"
Requires-Dist: scikit-learn<1.7,>=1.6; extra == "full"
Requires-Dist: scipy<1.16,>=1.10.1; extra == "full"
Requires-Dist: matplotlib<3.11,>=3.10; extra == "full"
Requires-Dist: seaborn<0.14,>=0.13; extra == "full"
Requires-Dist: spacy<4.0,>=3.8; extra == "full"
Requires-Dist: benepar<0.3,>=0.2; extra == "full"
Requires-Dist: nltk<4.0,>=3.9; extra == "full"
Requires-Dist: docx2txt<1.0,>=0.8; extra == "full"
Requires-Dist: streamlit<2.0,>=1.44; extra == "full"
Requires-Dist: markdown<4.0,>=3.7; extra == "full"
Dynamic: license-file

# PSAIR - Python Scaffolding for Analysis Itineraries in Research


![PyPI version](https://img.shields.io/pypi/v/psair)
![Python](https://img.shields.io/pypi/pyversions/psair)
![License](https://img.shields.io/github/license/nmccloskey/PSAIR)
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://psair-dev.streamlit.app/)

**Status:** Alpha, active development  
**Current supported components:** documentation/manual tooling, plus alpha metadata and NLP utilities  
**Broader scope:** experimental infrastructure for ETL, EDA, NLP, and pipeline development

PSAIR is a backend utility package for research software repositories. Its long-term goal is to provide reusable scaffolding for documentation workflows, data handling, exploratory analysis, and pipeline-oriented tooling across projects such as DIAAD, ALASTR, CLATR, and related systems.

At present, **the documentation toolchain is the most stable component and is ready for general use**. PSAIR also includes early, usable metadata and NLP utilities for filename metadata extraction, file discovery, text preprocessing, and shared spaCy model loading. Other package areas are included as part of the package's evolving architecture, but they should currently be treated as **experimental, incomplete, and subject to substantial change**.

## What is ready now

The currently supported portion of PSAIR focuses on repository documentation workflows, including tools for:

- modular manual preparation
- outline generation
- character and formatting checks
- PDF-oriented manual export
- manual viewing utilities for Streamlit-style apps

These tools are intended to support repositories that maintain structured Markdown manuals and want lightweight support for viewing and export.

Also available in alpha form:

- metadata utilities for relative-path metadata field extraction and matching related files
- NLP utilities for text preprocessing and shared spaCy model/resource loading

## What is not ready yet

The broader `psair` namespace also contains modules related to:

- ETL
- exploratory data analysis
- pipeline scaffolding

These components are being actively developed and reorganized. They are not yet stable enough to treat as public APIs.

## Installation

For the currently supported documentation tooling:

```bash
pip install psair[docs]
```

If you are developing against the full experimental package layout:

```bash
pip install psair[full]
```

After installation, the documentation CLI is available as:

```bash
psair --help
```

If a terminal cannot find `psair`, confirm that the intended environment is active:

```bash
conda activate psair
python -m pip install -e ".[docs]"
psair --help
```

You can also run the command through Conda without changing the current shell:

```bash
conda run -n psair psair --help
```

The CLI currently focuses on manual/documentation workflows:

```bash
psair tree docs/manual
psair index docs/manual --show-files
psair search "topic" docs/manual
psair outline docs/manual --title "Instruction Manual" --version "0.0.1"
psair chars docs/manual --check-trailing --check-line-endings
psair pdf docs/manual --non-interactive --force
```

PDF compilation uses Pandoc and a LaTeX PDF engine such as XeLaTeX when using
the CLI PDF builder. Those executables must be installed separately and
available on `PATH`.

## Testing

This project uses [pytest](https://docs.pytest.org/) for its testing suite.  
All tests are located under the `tests/` directory, organized by module/function.

### Running Tests
To run the full suite:

```bash
pytest
```
Run with verbose output:
```bash
pytest -v
```
Run a specific test file:
```bash
pytest tests/test_manual/test_pdf.py
```

## Stability note

PSAIR is currently in alpha. Module structure, APIs, and dependency groupings may change significantly across early releases. Until the package reaches a more stable milestone, the documentation tooling should be treated as the primary supported interface; metadata and NLP utilities are available for alpha adopters but may still change.

## Intended use

PSAIR is primarily intended for developers and research programmers who want reusable infrastructure for analysis-oriented repositories. It is not yet a polished end-user application.

## Related projects

PSAIR serves as shared backend scaffolding for downstream repositories. Project-specific applications and domain workflows should be handled in those downstream tools rather than in PSAIR itself.
