Metadata-Version: 2.2
Name: konfuzio_sdk
Version: 0.3.30.dev20250313220306
Summary: Konfuzio Software Development Kit
Home-page: https://dev.konfuzio.com/sdk/index.html
Author: Helm & Nagel GmbH
Author-email: info@helm-nagel.com
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: bentoml==1.2.20
Requires-Dist: fastapi<0.111.0
Requires-Dist: certifi==2023.7.22
Requires-Dist: cloudpickle==2.2.1
Requires-Dist: filetype==1.0.7
Requires-Dist: lz4>=4.3.2
Requires-Dist: matplotlib==3.7.1
Requires-Dist: nltk<3.8.2,>=3.6.3
Requires-Dist: numpy==1.23.5
Requires-Dist: pandas<2.0.0,>=1.3.5
Requires-Dist: Pillow>=8.4.0
Requires-Dist: pydantic<2.8,>2
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: python-decouple>=3.3
Requires-Dist: python-dotenv<1.1,>=1.0
Requires-Dist: requests
Requires-Dist: regex>=2020.6.8
Requires-Dist: scikit-learn==1.2.2
Requires-Dist: tabulate>=0.9.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: pympler>=1.0.1
Provides-Extra: dev
Requires-Dist: autodoc_pydantic==2.2.0; extra == "dev"
Requires-Dist: coverage==7.3.2; extra == "dev"
Requires-Dist: jupytext==1.16.4; extra == "dev"
Requires-Dist: pytest>=7.1.2; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: parameterized>=0.8.1; extra == "dev"
Requires-Dist: Sphinx==5.0.0; extra == "dev"
Requires-Dist: sphinx-toolbox==3.4.0; extra == "dev"
Requires-Dist: sphinx-reload==0.2.0; extra == "dev"
Requires-Dist: sphinx-notfound-page==0.8; extra == "dev"
Requires-Dist: m2r2==0.3.2; extra == "dev"
Requires-Dist: nbval==0.10.0; extra == "dev"
Requires-Dist: sphinx-sitemap==2.2.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme==1.0.0; extra == "dev"
Requires-Dist: sphinxcontrib-jquery; extra == "dev"
Requires-Dist: sphinxcontrib-mermaid==0.8.1; extra == "dev"
Requires-Dist: sphinx-copybutton==0.5.2; extra == "dev"
Requires-Dist: myst_nb==0.17.2; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pytest-rerunfailures; extra == "dev"
Provides-Extra: ai
Requires-Dist: torch<2.6.0,>=1.8.1; extra == "ai"
Requires-Dist: chardet==5.1.0; extra == "ai"
Requires-Dist: evaluate==0.4.3; extra == "ai"
Requires-Dist: tensorflow-cpu==2.12.0; extra == "ai"
Requires-Dist: datasets==2.14.6; extra == "ai"
Requires-Dist: timm==0.6.7; extra == "ai"
Requires-Dist: torch>=1.8.1; extra == "ai"
Requires-Dist: mlflow==2.15.0; extra == "ai"
Requires-Dist: accelerate==0.20.1; extra == "ai"
Requires-Dist: transformers==4.30.2; extra == "ai"
Requires-Dist: spacy<3.8.0,>=2.3.5; extra == "ai"
Requires-Dist: torchvision>=0.9.1; extra == "ai"
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: summary

# Konfuzio SDK

![Konfuzio Downloads](https://img.shields.io/pypi/dm/konfuzio_sdk)

The Konfuzio Software Development Kit (Konfuzio SDK) provides a
[Python API](https://dev.konfuzio.com/sdk/sourcecode.html) to interact with the
[Konfuzio Server](https://dev.konfuzio.com/index.html#konfuzio-server).

## Features

The SDK allows you to retrieve visual and text features to build your own document models. Konfuzio Server serves as a
UI to define the data structure, manage training/test data, and deploy your models as an API.

Function               | Public Host Free* | On-Site (Paid) |
:--------------------- |:------------------|:---------------|
OCR Text               | ✔️                | ✔️             |
OCR Handwriting        | ✔️                | ✔️             |
Text Annotation        | ✔️                | ✔️             |
PDF Annotation         | ✔️                | ✔️             |
Image Annotation       | ✔️                | ✔️             |
Table Annotation       | ✔️                | ✔️             |
Download Images        | ✔️                | ✔️             |
Download PDF with OCR  | ✔️                | ✔️             |
Deploy AI models       | ✖️                | ✔️             |

`*` Subject to a fair use policy: we will eventually throttle usage to 10 pages/hour.


|                                                                   |                                                             |
|-------------------------------------------------------------------|-------------------------------------------------------------|
| 📒 [Docs](https://dev.konfuzio.com/sdk/index.html)                | Read the docs                                               |
| 💾 [Installation](https://dev.konfuzio.com/sdk/get_started.html)  | How to install the Konfuzio SDK                             |
| 🎓 [Tutorials](https://dev.konfuzio.com/sdk/tutorials.html)       | See what the Konfuzio SDK can do with our tutorials         |
| 💡 [Explanations](https://dev.konfuzio.com/sdk/explanations.html) | Here are links to teaching material about the Konfuzio SDK. |
| ⚙️ [API Reference](https://dev.konfuzio.com/sdk/sourcecode.html)  | Python classes, methods, and functions                      |
| 🐛 [Issue Tracker](https://konfuzio.com/support)                  | Report Konfuzio SDK issues                                  |
| 🔭 [Changelog](https://dev.konfuzio.com/sdk/)                     | Review the release notes                                    |
| 📰 MIT License                                                    | Review the license in the section below                     |

## Installation

As a developer, register for free on our [public host: https://app.konfuzio.com](https://app.konfuzio.com/accounts/signup/).

Then use pip to install the Konfuzio SDK and run `init`:

    pip install konfuzio_sdk

    konfuzio_sdk init

The `init` command creates a token to connect to the Konfuzio Server and stores the variables `KONFUZIO_USER`,
`KONFUZIO_TOKEN` and `KONFUZIO_HOST` in an `.env` file in your working directory.
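
To confirm that `init` wrote all three variables, you can inspect the `.env` file yourself. The following is a minimal standard-library sketch, not part of the SDK: the helper names `parse_env_text` and `missing_keys` are hypothetical, and the parser assumes plain `KEY=VALUE` lines.

```python
from typing import Dict, List

# Variable names taken from the paragraph above.
REQUIRED_KEYS = ("KONFUZIO_USER", "KONFUZIO_TOKEN", "KONFUZIO_HOST")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text: str) -> List[str]:
    """Return the required variable names that are absent or empty."""
    values = parse_env_text(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(open(".env").read())` returns an empty list when all three variables are set.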

By default, the SDK is installed without AI-related dependencies such as `torch` or `transformers`, so only the
data-related SDK concepts are available, not the AI models. To install the SDK with the AI components,
run the following command:
  ```
  pip install konfuzio_sdk[ai]
  ```
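
If you are unsure whether the AI components are present in your environment, a quick check like the one below can help. It is only a sketch: `missing_modules` is a hypothetical helper, and the list of module names is an assumption based on a few packages from the `ai` extra.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Assumption: a few top-level modules that the `ai` extra installs.
AI_MODULES = ("torch", "transformers", "spacy")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found, without importing them."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(AI_MODULES)
    if missing:
        print("AI components not installed:", ", ".join(missing))
    else:
        print("AI components appear to be installed.")
```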

Find the full installation guide [here](https://dev.konfuzio.com/sdk/get_started.html#install-sdk).
To configure a PyCharm setup, follow the instructions [here](https://dev.konfuzio.com/sdk/quickstart_pycharm.html).

## CLI

We provide a basic command to create a new Project via the CLI:

`konfuzio_sdk create_project YOUR_PROJECT_NAME`

You will see "Project `{YOUR_PROJECT_NAME}` (ID `{YOUR_PROJECT_ID}`) was created successfully!" printed.

You can also export any Project by its ID:

`konfuzio_sdk export_project YOUR_PROJECT_ID`
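
The two commands can be chained from a script. The sketch below is an illustration under stated assumptions, not an SDK feature: it assumes the success message is written to standard output and that the ID appears as `(ID <digits>)`, with or without backticks. The helpers `parse_project_id` and `create_and_export` are hypothetical names.

```python
import re
import subprocess
from typing import Optional

# Assumption: the ID is printed as "(ID 123)" or "(ID `123`)".
_ID_PATTERN = re.compile(r"\(ID\s*`?(\d+)`?\)")


def parse_project_id(cli_output: str) -> Optional[int]:
    """Extract the Project ID from the success message, if one is present."""
    match = _ID_PATTERN.search(cli_output)
    return int(match.group(1)) if match else None


def create_and_export(project_name: str) -> Optional[int]:
    """Create a Project via the CLI, then export it if an ID could be parsed."""
    created = subprocess.run(
        ["konfuzio_sdk", "create_project", project_name],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    project_id = parse_project_id(created.stdout)
    if project_id is not None:
        subprocess.run(
            ["konfuzio_sdk", "export_project", str(project_id)], check=True
        )
    return project_id
```

If the message format differs in your SDK version, `parse_project_id` returns `None` and the export step is skipped.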

## Tutorials

You can find detailed examples of how to set up and run document AI pipelines in our
[Tutorials](https://dev.konfuzio.com/sdk/tutorials.html), including:
- [Split a file into separate Documents](https://dev.konfuzio.com/sdk/tutorials/context-aware-file-splitting-model/index.html)
- [Document Categorization](https://dev.konfuzio.com/sdk/tutorials/document_categorization/index.html)
- [Train a Konfuzio SDK model to get insights from annual reports](https://dev.konfuzio.com/sdk/tutorials/annual-reports/index.html)

### Basics

Here we show how to use the Konfuzio SDK to retrieve data hosted on a Konfuzio Server instance.

```python
from konfuzio_sdk.data import Project, Document

# Initialize the Project
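YOUR_PROJECT_ID: int = 0  # placeholder: replace with your Project ID
my_project = Project(id_=YOUR_PROJECT_ID)

# Get any online Document
DOCUMENT_ID_ONLINE: int = 0  # placeholder: replace with the ID of a Document in the Project
doc: Document = my_project.get_document_by_id(DOCUMENT_ID_ONLINE)

# Get the Annotations in a Document
doc.annotations()

# Filter Annotations by Label
MY_OWN_LABEL_NAME: str = "MY_OWN_LABEL_NAME"  # placeholder: replace with one of your Label names
label = my_project.get_label_by_name(MY_OWN_LABEL_NAME)
doc.annotations(label=label)

# Or get all Annotations that belong to one Category
YOUR_CATEGORY_ID: int = 0  # placeholder: replace with your Category ID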
category = my_project.get_category_by_id(YOUR_CATEGORY_ID)
label.annotations(categories=[category])

# Force a Project update. To save time Documents will only be updated if they have changed.
my_project.get(update=True)
```
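
Once you have a list of Annotations, a common next step is to group their text by Label. The following is a sketch under assumptions: it relies on each Annotation exposing `label.name` and an `offset_string` holding its text (a list of strings or a single string), and `texts_by_label` is a hypothetical helper. Stand-in objects replace real Annotations so that it runs without a server connection.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def texts_by_label(annotations: Iterable) -> Dict[str, List[str]]:
    """Group the text of Annotations by the name of their Label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for annotation in annotations:
        # Assumption: `offset_string` is a list of text pieces or a single string.
        text = annotation.offset_string
        pieces = [text] if isinstance(text, str) else list(text)
        grouped[annotation.label.name].extend(pieces)
    return dict(grouped)


# Stand-in objects for illustration; with the SDK you would pass doc.annotations().
demo_annotations = [
    SimpleNamespace(label=SimpleNamespace(name="Date"), offset_string=["2024-01-31"]),
    SimpleNamespace(label=SimpleNamespace(name="Total"), offset_string=["100", "EUR"]),
    SimpleNamespace(label=SimpleNamespace(name="Date"), offset_string="2024-02-01"),
]
print(texts_by_label(demo_annotations))
```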

## License

MIT License

Copyright (c) 2025 Helm & Nagel GmbH

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
