Metadata-Version: 2.3
Name: dakoda-core
Version: 0.1.0
Summary: A Python library for DAKODA corpus management and querying.
License: Apache-2.0
Keywords: dakoda,corpus,nlp,uima,linguistics
Author: Marius Hamacher
Author-email: marius.hamacher@outlook.de
Requires-Python: >=3.9,<3.14
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Linguistic
Requires-Dist: dkpro-cassis (>=0.10.1,<1.0.0)
Requires-Dist: polars (>=1.30.0,<2.0.0)
Requires-Dist: requests (>=2.32.0,<3.0.0)
Requires-Dist: xsdata (==22.12)
Project-URL: Bug Tracker, https://github.com/dakoda-project/dakoda-core/issues
Project-URL: Documentation, https://github.com/dakoda-project/dakoda-core#readme
Project-URL: Homepage, https://github.com/dakoda-project/dakoda-core
Project-URL: Repository, https://github.com/dakoda-project/dakoda-core
Description-Content-Type: text/markdown

# Dakoda Core

A Python library for [DAKODA](https://dakoda.org/) corpus management.

DAKODA is an interdisciplinary project with the overarching goal of advancing the data competencies of early-career researchers in German as a Foreign/Second Language (DaF/DaZ) in the field of learner corpus research. The project develops language technology resources for acquisition-related research questions and tests them based on a broad data foundation.

## Prerequisites

- **Python 3.9 to 3.13** (3.11+ recommended, for example via [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#a-getting-pyenv))
- **Poetry** for dependency management (see the [installation guide](https://python-poetry.org/docs/#installing-with-the-official-installer))
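
To check an interpreter against the supported range before installing, a small script along these lines may help. It is a minimal sketch that assumes the bounds declared in the package metadata (`>=3.9,<3.14`); the helper name `is_supported` is illustrative and not part of the library.

```python
import sys

# Bounds taken from the package's Requires-Python field (>=3.9,<3.14).
MIN_VERSION = (3, 9)
MAX_EXCLUSIVE = (3, 14)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor, ...) version lies in the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```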

## Setup

1. Clone the repository

    ```bash
    git clone git@github.com:dakoda-project/dakoda-core.git
    cd dakoda-core
    ```
    
2. Install dependencies and create the virtual environment

    ```bash
    poetry install
    ```
   
3. Verify installation
    
    ```bash
    poetry run python -c "import dakoda; print('Dakoda Core installed successfully!')"
    ```
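
The one-liner above only confirms that the `dakoda` package can be imported. A slightly more informative check, sketched here with the standard library's `importlib.metadata`, reports which version of the `dakoda-core` distribution is installed. The helper name `installed_version` is illustrative and not part of the library, and the sketch assumes that `poetry install` has installed the project itself into the environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="dakoda-core"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("dakoda-core is not installed in this environment")
    else:
        print(f"dakoda-core {found} is installed")
```

Saved under a name of your choice (for example `check_install.py`, a hypothetical file name), it can be run with `poetry run python check_install.py`.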

## Development

### Project Structure

```text
dakoda-core/
├── src/
│   └── dakoda/
│       ├── corpus.py
│       ├── countries.py
│       ├── languages.py
│       ├── dakoda_types.py
│       ├── dakoda_metadata_scheme.py
│       ├── util.py
│       └── res/
│           ├── dakoda-logo.png
│           └── dakoda_typesystem.xml
├── tests/
└── data/
```

### Running Tests

```bash
# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=dakoda
```



## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests: `poetry run pytest`
5. Submit a pull request

## Release (PyPI)

Short release checklist:

1. Update the version in `pyproject.toml`.
2. Run tests:

    ```bash
    poetry run pytest
    ```

3. Build distribution artifacts:

    ```bash
    poetry build
    ```

4. Validate package metadata and long description:

    ```bash
    poetry run python -m pip install twine
    poetry run python -m twine check dist/*
    ```

5. Upload to TestPyPI first:

    ```bash
    poetry run python -m twine upload --repository testpypi dist/*
    ```

6. Upload to PyPI:

    ```bash
    poetry run python -m twine upload dist/*
    ```

Notes:

- Authenticate with an API token: use `__token__` as the username and the token itself as the password.
- Create a fresh build (`poetry build`) for every release.

## License

This project is licensed under the Apache License 2.0. See [LICENSE](LICENSE).

