Metadata-Version: 2.4
Name: pyctakes
Version: 1.0.0
Summary: Python-native clinical NLP framework mirroring Apache cTAKES functionality
Author-email: Sonish Sivarajkumar <sonish.sivarajkumar@gmail.com>
Maintainer-email: Sonish Sivarajkumar <sonish.sivarajkumar@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/sonishsivarajkumar/PyCTAKES
Project-URL: Documentation, https://sonishsivarajkumar.github.io/PyCTAKES/
Project-URL: Repository, https://github.com/sonishsivarajkumar/PyCTAKES.git
Project-URL: Bug Tracker, https://github.com/sonishsivarajkumar/PyCTAKES/issues
Keywords: nlp,clinical,medical,ctakes,umls,healthcare
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: spacy>=3.4.0
Requires-Dist: stanza>=1.4.0
Requires-Dist: transformers>=4.20.0
Requires-Dist: torch>=1.12.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: pydantic>=1.9.0
Requires-Dist: click>=8.0.0
Requires-Dist: fastapi>=0.75.0
Requires-Dist: uvicorn>=0.17.0
Requires-Dist: langchain>=0.0.100
Requires-Dist: flashtext>=2.7
Requires-Dist: pyyaml>=6.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: requests>=2.28.0
Requires-Dist: aiofiles>=0.8.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.18.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: pre-commit>=2.17.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=4.5.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.17.0; extra == "docs"
Requires-Dist: myst-parser>=0.17.0; extra == "docs"
Provides-Extra: dask
Requires-Dist: dask[complete]>=2022.0.0; extra == "dask"
Provides-Extra: all
Requires-Dist: pyctakes[dask,dev,docs]; extra == "all"
Dynamic: license-file

# PyCTAKES 🏥
## Open Source Python-native Clinical NLP Framework

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![GitHub issues](https://img.shields.io/github/issues/sonishsivarajkumar/PyCTAKES)](https://github.com/sonishsivarajkumar/PyCTAKES/issues)
[![GitHub stars](https://img.shields.io/github/stars/sonishsivarajkumar/PyCTAKES)](https://github.com/sonishsivarajkumar/PyCTAKES/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/sonishsivarajkumar/PyCTAKES)](https://github.com/sonishsivarajkumar/PyCTAKES/network)
[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/sonishsivarajkumar/PyCTAKES/blob/main/CONTRIBUTING.md)

> **🚀 A modern, open source clinical NLP framework** that mirrors and extends Apache cTAKES functionality in pure Python. Drop-in replacement with superior usability, extensibility, and performance.

**PyCTAKES** transforms clinical text processing by providing a **100% open source**, Python-native alternative to Apache cTAKES. Built by the community, for the community - no vendor lock-in, no licensing fees, just powerful clinical NLP tools that anyone can use, modify, and contribute to.

---

## 🌟 Why Choose PyCTAKES?

<table>
<tr>
<td>

### 🔓 **Fully Open Source**
- **MIT License** - Free for commercial & research use
- **Transparent development** - All code, issues, and discussions public
- **Community-driven** - Shaped by real user needs
- **No vendor lock-in** - Own your clinical NLP pipeline

</td>
<td>

### ⚡ **Modern & Fast**
- **Pure Python** - No Java dependencies
- **pip installable** - Get started in seconds
- **Multiple backends** - spaCy, Stanza, rule-based
- **Production ready** - Optimized for real-world use

</td>
</tr>
<tr>
<td>

### 🏥 **Clinical-First Design**
- **Medical expertise built-in** - Clinical abbreviations, sections, terminology
- **cTAKES compatibility** - Drop-in replacement for existing workflows
- **Comprehensive NLP** - Tokenization → UMLS mapping
- **Assertion detection** - Negation, uncertainty, temporal context

</td>
<td>

### 🔧 **Developer Friendly**
- **Clean Python APIs** - Intuitive and well-documented
- **Modular architecture** - Use only what you need
- **Extensible framework** - Easy to add custom annotators
- **Rich ecosystem** - Integrates with pandas, spaCy, transformers

</td>
</tr>
</table>

---

## 🚀 Quick Start

### Installation
```bash
pip install pyctakes
```

### 30-Second Demo
```python
import pytakes

# Create pipeline
pipeline = pytakes.create_default_pipeline()

# Process clinical text
clinical_note = """
Patient is a 65-year-old male with diabetes and hypertension.
He denies chest pain but reports shortness of breath.
Current medications: metformin 500mg BID, lisinopril 10mg daily.
"""

result = pipeline.process_text(clinical_note)

# Explore results
print(f"Found {len(result.entities)} clinical entities:")
for entity in result.entities[:3]:
    assertion = entity.assertion
    print(f"  • {entity.text} ({entity.label})")
    print(f"    → {assertion.polarity}, {assertion.uncertainty}")
```

**Output:**
```
Found 8 clinical entities:
  • diabetes (CONDITION)
    → POSITIVE, CERTAIN
  • hypertension (CONDITION)  
    → POSITIVE, CERTAIN
  • chest pain (SYMPTOM)
    → NEGATIVE, CERTAIN
```
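
The demo reads `entity.text`, `entity.label`, and `entity.assertion` (with `.polarity` and `.uncertainty`). Those attributes are enough to flatten results into plain records for pandas or CSV export. The helper below is a hypothetical sketch, not part of the PyCTAKES API. It relies only on those attributes and falls back to `None` when an entity has no assertion attached. It uses stand-in objects so it runs without the package installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def entities_to_rows(entities: List[Any]) -> List[Dict[str, Any]]:
    """Flatten entity objects into dicts (hypothetical helper, not a PyCTAKES API).

    Assumes each entity has `.text` and `.label`, and optionally an
    `.assertion` with `.polarity` and `.uncertainty`, as in the demo above.
    """
    rows = []
    for entity in entities:
        assertion = getattr(entity, "assertion", None)
        rows.append({
            "text": entity.text,
            "label": entity.label,
            # getattr with a default also works when assertion is None
            "polarity": getattr(assertion, "polarity", None),
            "uncertainty": getattr(assertion, "uncertainty", None),
        })
    return rows


# Stand-in entities for illustration; with PyCTAKES you would pass result.entities
demo_entities = [
    SimpleNamespace(text="chest pain", label="SYMPTOM",
                    assertion=SimpleNamespace(polarity="NEGATIVE", uncertainty="CERTAIN")),
    SimpleNamespace(text="diabetes", label="CONDITION", assertion=None),
]
rows = entities_to_rows(demo_entities)
print(rows[0])
```

With pandas installed, `pandas.DataFrame(entities_to_rows(result.entities))` would give one row per entity.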

---

## 📊 Performance & Features

### ⚡ Blazing Fast Performance
- **Basic Pipeline**: 39 annotations in 0.010s
- **Fast Pipeline**: 36 annotations in 0.001s  
- **Full Clinical Note**: 81 annotations in 0.504s
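
Timings like these depend on hardware and note length. You can measure `pipeline.process_text` on your own notes with a small standard-library harness like the sketch below. `median_runtime` is a hypothetical helper, not a PyCTAKES function. It makes one untimed warm-up call so that one-off costs, such as lazy initialization or caching, do not skew the result. It then reports the median of several timed runs.

```python
import statistics
import time
from typing import Any, Callable


def median_runtime(process: Callable[[str], Any], text: str, repeats: int = 5) -> float:
    """Median wall-clock seconds for process(text) over `repeats` timed runs.

    One untimed warm-up call runs first so one-off costs (e.g. lazy
    initialization or caching) are not counted.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    process(text)  # warm-up, not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        process(text)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in processor so the sketch runs on its own; swap in pipeline.process_text
elapsed = median_runtime(str.split, "He denies chest pain but reports shortness of breath.", repeats=3)
print(f"median runtime: {elapsed:.6f}s")
```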

### 🎯 Comprehensive Clinical NLP

| Feature | Description | Status |
|---------|-------------|--------|
| **Sentence Segmentation** | Clinical-aware sentence boundary detection | ✅ |
| **Tokenization** | Advanced tokenization with POS tagging | ✅ |
| **Section Detection** | Chief Complaint, History, Medications, Assessment, etc. | ✅ |
| **Named Entity Recognition** | Medications, conditions, procedures, anatomy | ✅ |
| **Assertion Detection** | Negation, uncertainty, temporal, experiencer | ✅ |
| **UMLS Concept Mapping** | CUI normalization and semantic types | ✅ |
| **Relation Extraction** | Temporal and dosage relationships | 🔄 v1.1 |
| **REST API Service** | FastAPI deployment wrapper | 🔄 v1.1 |

### 🔧 Three Pipeline Types
```python
# Full-featured (highest accuracy)
pipeline = pytakes.create_default_pipeline()

# Speed-optimized (fastest processing)  
pipeline = pytakes.create_fast_pipeline()

# Minimal (basic entity extraction)
pipeline = pytakes.create_basic_pipeline()
```

---

## 💻 Command Line Interface

```bash
# Process single file
pytakes process note.txt --output results.json

# Batch processing
pytakes process notes/*.txt --output-dir results/

# Different pipelines and formats
pytakes process note.txt --pipeline fast --format xml
pytakes process note.txt --config custom_config.json
```
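
For batch jobs driven from Python rather than the shell, a loop over the input files has roughly the same effect as `pytakes process notes/*.txt --output-dir results/`. This is a sketch under several assumptions. `batch_process` is a hypothetical helper. `process` stands for any callable that returns JSON-serializable data, for example a wrapper that converts `pipeline.process_text(text)` into a dict. The one-JSON-file-per-note layout is not necessarily the format the CLI writes.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def batch_process(input_dir: str, output_dir: str,
                  process: Callable[[str], Any], pattern: str = "*.txt") -> List[Path]:
    """Run `process` on every matching note and write one JSON file per note."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for note_path in sorted(Path(input_dir).glob(pattern)):
        text = note_path.read_text(encoding="utf-8")
        result = process(text)  # must return JSON-serializable data
        out_path = out_dir / f"{note_path.stem}.json"
        out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
        written.append(out_path)
    return written


# Demo with throwaway files and a stand-in processor
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp, "notes")
    notes.mkdir()
    (notes / "note1.txt").write_text("He denies chest pain.", encoding="utf-8")
    (notes / "note2.txt").write_text("History of diabetes.", encoding="utf-8")
    paths = batch_process(str(notes), str(Path(tmp, "results")),
                          process=lambda text: {"characters": len(text)})
    written_names = [p.name for p in paths]
    first_payload = json.loads(paths[0].read_text(encoding="utf-8"))

print(written_names)
```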

---

## 🤝 Open Source Community

### 👥 **Lead Contributors**
- **[Sonish Sivarajkumar](https://github.com/sonishsivarajkumar)** - *Lead Maintainer & Creator*
  - Clinical NLP researcher and software engineer
  - Apache cTAKES community member
  - Python & healthcare technology enthusiast

### 🌍 **Join Our Community**
We're building the future of clinical NLP together, and there is a place for you whether you're a:

- **👩‍⚕️ Clinician** - Help us understand real-world clinical text challenges
- **👨‍💻 Developer** - Contribute code, fix bugs, or add new features  
- **🔬 Researcher** - Share use cases, benchmarks, and domain expertise
- **📚 Technical Writer** - Improve documentation and tutorials
- **🎨 Designer** - Enhance user experience and visualization

**Everyone is welcome!** Check out our [Contributing Guide](CONTRIBUTING.md) to get started.

### 📈 **Community Stats**
- **Contributors**: Growing community of clinical NLP enthusiasts
- **Issues**: Active issue tracking and feature requests
- **Discussions**: Technical discussions and use case sharing
- **Releases**: Regular updates with new features and improvements

### 🎯 **Ways to Contribute**

<table>
<tr>
<td>

**🐛 Report Issues**
- Bug reports
- Feature requests  
- Documentation issues
- Performance problems

</td>
<td>

**💡 Share Ideas**
- New annotators
- Pipeline improvements
- Integration suggestions
- Use case examples

</td>
<td>

**🔧 Code Contributions**
- Bug fixes
- New features
- Performance optimizations
- Test improvements

</td>
<td>

**📖 Documentation**
- API documentation
- Tutorials & guides
- Example notebooks
- Translation support

</td>
</tr>
</table>

---

## 📚 Documentation & Resources

- **📖 [Full Documentation](https://sonishsivarajkumar.github.io/PyCTAKES)** - Complete guides and API reference
- **🚀 [Quick Start Guide](https://sonishsivarajkumar.github.io/PyCTAKES/quickstart/)** - Get up and running in minutes
- **💡 [Examples](examples/)** - Real-world usage examples and configurations
- **🔧 [API Reference](https://sonishsivarajkumar.github.io/PyCTAKES/api/)** - Detailed API documentation
- **⚡ [Performance Guide](https://sonishsivarajkumar.github.io/PyCTAKES/advanced/performance/)** - Optimization tips and benchmarks
- **🤝 [Contributing](CONTRIBUTING.md)** - How to contribute to the project

---

## 🗺️ Roadmap

### 🎯 **v1.0** (Current) - Foundation
- ✅ Core pipeline architecture
- ✅ Clinical text processing (tokenization, NER, assertion)
- ✅ UMLS concept mapping framework
- ✅ CLI and Python APIs
- ✅ Comprehensive documentation

### ⚡ **v1.1** (Next) - Enhancement  
- 🔄 Enhanced UMLS integration (QuickUMLS)
- 🔄 Relation extraction (temporal, dosage)
- 🔄 REST API service wrapper
- 🔄 Docker containers & deployment guides
- 🔄 Performance optimizations

### 🚀 **v2.0** (Future) - Intelligence
- 🔮 LLM integration for disambiguation
- 🔮 Active learning capabilities  
- 🔮 Advanced relation extraction
- 🔮 Real-time processing pipelines
- 🔮 Federated learning support

---

## 🏆 Why Open Source Matters in Healthcare

Healthcare technology should be:

- **🔍 Transparent** - Auditable algorithms for patient safety
- **🤝 Collaborative** - Shared knowledge accelerates progress  
- **♿ Accessible** - No barriers to life-saving technology
- **🔧 Customizable** - Adaptable to diverse clinical environments
- **📈 Sustainable** - Community-driven long-term maintenance

PyCTAKES embodies these principles by providing enterprise-grade clinical NLP capabilities as a **truly open source project**. No hidden costs, no vendor dependencies, just powerful tools for advancing healthcare through technology.

---

## 📄 License & Citation

### 📜 **License**
PyCTAKES is released under the **MIT License** - see [LICENSE](LICENSE) for details.

```
Copyright (c) 2025 Sonish Sivarajkumar and Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[Full license text in LICENSE file]
```

### 📝 **Citation**
If you use PyCTAKES in your research, please cite:

```bibtex
@software{pyctakes2025,
  title={PyCTAKES: Open Source Python-native Clinical NLP Framework},
  author={Sivarajkumar, Sonish and Contributors},
  year={2025},
  url={https://github.com/sonishsivarajkumar/PyCTAKES},
  version={1.0.0}
}
```

---

## 🙏 Acknowledgments

PyCTAKES builds upon the excellent work of:

- **Apache cTAKES** - Pioneering clinical NLP framework
- **spaCy & Stanza** - Modern NLP processing libraries  
- **Clinical NLP Community** - Researchers and practitioners advancing the field
- **Open Source Contributors** - Everyone who helps make this project better

---

## 🚀 Get Started Today!

```bash
# Install PyCTAKES
pip install pyctakes

# Clone the repository
git clone https://github.com/sonishsivarajkumar/PyCTAKES.git
cd PyCTAKES

# Try the examples
python examples/comprehensive_demo.py
```

**Join us in revolutionizing clinical NLP!** 🎉

**[⭐ Star this repo](https://github.com/sonishsivarajkumar/PyCTAKES)** | **[📚 Read the docs](https://sonishsivarajkumar.github.io/PyCTAKES)** | **[🤝 Contribute](CONTRIBUTING.md)** | **[💬 Discuss](https://github.com/sonishsivarajkumar/PyCTAKES/discussions)**
