Metadata-Version: 2.4
Name: vatrix
Version: 0.2.1
Summary: NLP Processor & SBERT Training Tool
Author-email: Brian Bates <brian_bates@me.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Requires-Python: <3.10,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: appdirs==1.4.4
Requires-Dist: beautifulsoup4==4.13.3
Requires-Dist: certifi==2025.1.31
Requires-Dist: charset-normalizer==2.1.1
Requires-Dist: click==8.1.8
Requires-Dist: colorlog==6.9.0
Requires-Dist: fastapi==0.115.12
Requires-Dist: filelock==3.18.0
Requires-Dist: fsspec==2025.3.0
Requires-Dist: gdown==5.2.0
Requires-Dist: huggingface-hub==0.16.4
Requires-Dist: idna==3.10
Requires-Dist: Jinja2==3.1.6
Requires-Dist: joblib==1.4.2
Requires-Dist: MarkupSafe==3.0.2
Requires-Dist: mpmath==1.3.0
Requires-Dist: networkx==3.2.1
Requires-Dist: nlpaug==1.1.11
Requires-Dist: nltk==3.9.1
Requires-Dist: numpy==1.26.4
Requires-Dist: packaging==24.2
Requires-Dist: pandas==2.2.3
Requires-Dist: pillow==11.1.0
Requires-Dist: PySocks==1.7.1
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: pytz==2025.2
Requires-Dist: PyYAML==6.0.2
Requires-Dist: regex==2024.11.6
Requires-Dist: requests==2.32.3
Requires-Dist: safetensors==0.5.3
Requires-Dist: scikit-learn==1.6.1
Requires-Dist: scipy==1.13.1
Requires-Dist: sentence-transformers==2.2.2
Requires-Dist: six==1.17.0
Requires-Dist: soupsieve==2.6
Requires-Dist: sympy==1.13.3
Requires-Dist: threadpoolctl==3.6.0
Requires-Dist: tokenizers==0.13.3
Requires-Dist: torch==2.2.2
Requires-Dist: tqdm==4.67.1
Requires-Dist: transformers==4.30.2
Requires-Dist: typing_extensions==4.12.2
Requires-Dist: tzdata==2025.2
Requires-Dist: urllib3==2.3.0
Dynamic: license-file

![Python](https://img.shields.io/badge/python-3.9-blue)  ![License](https://img.shields.io/badge/license-MIT-green) [![Last Commit](https://img.shields.io/github/last-commit/brianbatesactual/vatrix)](https://github.com/brianbatesactual/vatrix) [![Stars](https://img.shields.io/github/stars/brianbatesactual/vatrix?style=social)](https://github.com/brianbatesactual/vatrix)


# 🧠 Vatrix

**Vatrix** is an NLP log processor that renders natural-language descriptions from machine data. It supports several use cases:
- streaming NLP & vector embedding
- batch NDJSON file processing 
- augmented data injection 
- generating training pairs for fine-tuning Sentence Transformers (SBERT)

---

## ✨ Features

- CLI-powered NDJSON log processing
- Modular template system powered by Jinja2
- SBERT data generation and similarity scoring
- Supports file mode, stream mode, and CLI flags
- Exports training pairs to CSV
- Exports highly similar sentence pairs for SBERT fine-tuning
- Flexible and colorful logging with log rotation
- Direct integration with Qdrant vector database (OSAI-Demo Stack)
- Unit & integration testing
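
To illustrate the similarity scoring and pair export listed above, here is a minimal sketch using only the Python standard library. It is not Vatrix's actual implementation: the function names, the CSV column names, and the 0.9 threshold are assumptions, and the toy 3-dimensional vectors stand in for real SBERT embeddings.

```python
import csv
import io
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(embedded, threshold=0.9):
    """Return (sentence1, sentence2, score) for pairs scoring at or above threshold."""
    pairs = []
    for (s1, v1), (s2, v2) in combinations(embedded, 2):
        score = cosine_similarity(v1, v2)
        if score >= threshold:
            pairs.append((s1, s2, round(score, 4)))
    return pairs


def pairs_to_csv(pairs):
    """Serialize pairs to CSV text with a header row (column names are assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sentence1", "sentence2", "score"])
    writer.writerows(pairs)
    return buffer.getvalue()


# Toy vectors standing in for real SBERT embeddings (assumption for the demo).
EMBEDDED = [
    ("User alice logged in.", [1.0, 0.0, 0.0]),
    ("User alice signed in.", [0.9, 0.1, 0.0]),
    ("Disk usage reached 91%.", [0.0, 0.0, 1.0]),
]

PAIRS = similar_pairs(EMBEDDED, threshold=0.9)
CSV_TEXT = pairs_to_csv(PAIRS)
print(CSV_TEXT)
```

In a real pipeline the vectors would come from a Sentence Transformers model, and the threshold would need tuning against your own data.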

---

## 📦 Installation

```bash
pip install vatrix
```
Or install the latest from source:

```bash
git clone https://github.com/brianbatesactual/vatrix.git
cd vatrix
make setup
```
---

## 🛠️ Usage
```bash
vatrix --mode file \
       --render-mode all \
       --input data/input_logs.json \
       --output data/processed_logs.csv \
       --unmatched data/unmatched_logs.json \
       --generate-sbert-data \
       --log-level DEBUG \
       --log-file logs/vatrix_debug.log
```
Makefile commands:
```bash
make setup         # Create venv and install dependencies
make run           # Run log processor on default file
make stream        # Start reading NDJSON from stdin
make retrain       # Export SBERT sentence pairs
make freeze        # Regenerate requirements.txt
make clean         # Clean environment and build artifacts
make nuke          # Full reset of the project environment
```
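
For readers unfamiliar with NDJSON, stream mode amounts to reading one JSON object per line. The sketch below shows one tolerant way to do that with the standard library. It is an illustration, not Vatrix's reader: `read_ndjson` is a hypothetical helper, and an `io.StringIO` stands in for `sys.stdin`.

```python
import io
import json


def read_ndjson(lines):
    """Yield (record, None) for valid JSON objects and (None, raw_line) otherwise."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            yield None, line
            continue
        if isinstance(record, dict):
            yield record, None
        else:
            # Valid JSON, but not an object (for example a bare number).
            yield None, line


# In stream mode the source would be sys.stdin; a StringIO stands in here.
stream = io.StringIO(
    '{"event": "login", "user": "alice"}\n'
    "\n"
    "not json\n"
    '{"event": "logout", "user": "bob"}\n'
)

parsed, rejected = [], []
for record, bad_line in read_ndjson(stream):
    if record is not None:
        parsed.append(record)
    else:
        rejected.append(bad_line)

print(len(parsed), "records parsed;", len(rejected), "lines rejected")
```

Yielding bad lines instead of raising keeps a long-running stream alive when a single malformed line arrives.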
---

## 🧠 Example
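
The sketch below illustrates the general idea of rendering log records into sentences from templates. It is a simplified stand-in, not Vatrix's code: it uses `string.Template` from the standard library in place of Jinja2, and the `event` field, the template texts, and the `render_logs` helper are all hypothetical. Records without a usable template are collected separately, in the spirit of the `--unmatched` option shown under Usage.

```python
import json
from string import Template

# Hypothetical templates keyed by an assumed "event" field.
TEMPLATES = {
    "login": Template("User $user logged in from $src_ip."),
    "logout": Template("User $user logged out."),
}


def render_logs(ndjson_text):
    """Render each NDJSON record to a sentence; collect records that cannot be rendered."""
    rendered, unmatched = [], []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        template = TEMPLATES.get(record.get("event"))
        if template is None:
            unmatched.append(record)
            continue
        try:
            rendered.append(template.substitute(record))
        except KeyError:
            # A field the template needs is missing from the record.
            unmatched.append(record)
    return rendered, unmatched


SAMPLE = "\n".join([
    '{"event": "login", "user": "alice", "src_ip": "10.0.0.5"}',
    '{"event": "logout", "user": "alice"}',
    '{"event": "reboot", "host": "web-01"}',
])

sentences, leftovers = render_logs(SAMPLE)
for sentence in sentences:
    print(sentence)
```

Here the first two records render to sentences, while the `reboot` record has no template and ends up in `leftovers`.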

---

## 🧪 Testing
```bash
make test
```
---

## 📁 Logs

All logs are saved to the `logs/` directory and rotated daily.
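
As a rough sketch of how daily rotation can be set up in Python, the example below uses the standard library's `TimedRotatingFileHandler`. Whether Vatrix uses this handler is an assumption, as are the `build_logger` helper, the `vatrix.log` file name, and the seven-file retention.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def build_logger(log_dir, name="vatrix_demo", backup_count=7):
    """Create a logger that writes to log_dir and rolls over at midnight."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = TimedRotatingFileHandler(
        str(log_dir / "vatrix.log"),  # file name is an assumption
        when="midnight",              # start a new file each day
        backupCount=backup_count,     # keep this many rotated files
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


# A temporary directory keeps the demo from touching a real logs/ folder.
demo_dir = Path(tempfile.mkdtemp()) / "logs"
logger = build_logger(demo_dir)
logger.debug("processed %d records", 3)
for handler in logger.handlers:
    handler.flush()
log_text = (demo_dir / "vatrix.log").read_text(encoding="utf-8")
```

With `when="midnight"`, the handler renames the current file with a date suffix at the first write after midnight and deletes the oldest files beyond `backupCount`.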

---

## 🧼 Cleanup
```bash
make clean    # Clean temp data
make nuke     # Wipe and rebuild virtualenv
```
---

## 📚 License

MIT © Brian Bates

Built with ❤️ for log intelligibility and NLP adventures.
