Metadata-Version: 2.1
Name: konfuzio_sdk
Version: 0.3.23.dev20241208182702
Summary: Konfuzio Software Development Kit
Home-page: https://github.com/konfuzio-ai/konfuzio-sdk/
Author: Helm & Nagel GmbH
Author-email: info@helm-nagel.com
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: bentoml==1.2.20
Requires-Dist: fastapi<0.111.0
Requires-Dist: certifi==2023.7.22
Requires-Dist: cloudpickle==2.2.1
Requires-Dist: filetype==1.0.7
Requires-Dist: lz4>=4.3.2
Requires-Dist: matplotlib==3.7.1
Requires-Dist: nltk<3.8.2,>=3.6.3
Requires-Dist: numpy==1.23.5
Requires-Dist: pandas<2.0.0,>=1.3.5
Requires-Dist: Pillow>=8.4.0
Requires-Dist: pydantic<2.8,>2
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: python-decouple>=3.3
Requires-Dist: python-dotenv<1.1,>=1.0
Requires-Dist: requests
Requires-Dist: regex>=2020.6.8
Requires-Dist: scikit-learn==1.2.2
Requires-Dist: tabulate>=0.9.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: pympler>=1.0.1
Provides-Extra: dev
Requires-Dist: autodoc_pydantic==2.2.0; extra == "dev"
Requires-Dist: coverage==7.3.2; extra == "dev"
Requires-Dist: jupytext==1.16.4; extra == "dev"
Requires-Dist: pytest>=7.1.2; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: parameterized>=0.8.1; extra == "dev"
Requires-Dist: Sphinx==5.0.0; extra == "dev"
Requires-Dist: sphinx-toolbox==3.4.0; extra == "dev"
Requires-Dist: sphinx-reload==0.2.0; extra == "dev"
Requires-Dist: sphinx-notfound-page==0.8; extra == "dev"
Requires-Dist: m2r2==0.3.2; extra == "dev"
Requires-Dist: nbval==0.10.0; extra == "dev"
Requires-Dist: sphinx-sitemap==2.2.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme==1.0.0; extra == "dev"
Requires-Dist: sphinxcontrib-mermaid==0.8.1; extra == "dev"
Requires-Dist: sphinx-copybutton==0.5.2; extra == "dev"
Requires-Dist: myst_nb==0.17.2; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pytest-rerunfailures; extra == "dev"
Provides-Extra: ai
Requires-Dist: chardet==5.1.0; extra == "ai"
Requires-Dist: spacy<3.8.0,>=2.3.5; extra == "ai"
Requires-Dist: torch>=1.8.1; extra == "ai"
Requires-Dist: transformers==4.30.2; extra == "ai"
Requires-Dist: timm==0.6.7; extra == "ai"
Requires-Dist: mlflow==2.15.0; extra == "ai"
Requires-Dist: evaluate==0.4.1; extra == "ai"
Requires-Dist: accelerate==0.20.1; extra == "ai"
Requires-Dist: torchvision>=0.9.1; extra == "ai"
Requires-Dist: tensorflow-cpu==2.12.0; extra == "ai"
Requires-Dist: datasets==2.14.6; extra == "ai"

# Konfuzio SDK

![Konfuzio Downloads](https://img.shields.io/pypi/dm/konfuzio_sdk)

The Konfuzio Software Development Kit (Konfuzio SDK) provides a
[Python API](https://dev.konfuzio.com/sdk/sourcecode.html) to interact with the
[Konfuzio Server](https://dev.konfuzio.com/index.html#konfuzio-server).

## Features

The SDK allows you to retrieve visual and text features to build your own document models. Konfuzio Server serves as a
UI to define the data structure, manage training/test data, and deploy your models as an API.

Function               | Public Host Free* | On-Site (Paid) |
:--------------------- |:------------------|:---------------|
OCR Text               | ✔️                | ✔️             |
OCR Handwriting        | ✔️                | ✔️             |
Text Annotation        | ✔️                | ✔️             |
PDF Annotation         | ✔️                | ✔️             |
Image Annotation       | ✔️                | ✔️             |
Table Annotation       | ✔️                | ✔️             |
Download Images        | ✔️                | ✔️             |
Download PDF with OCR  | ✔️                | ✔️             |
Deploy AI models       | ✖️                | ✔️             |

`*` Under our fair use policy, we will eventually impose a throttle of 10 pages/hour.
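If the fair-use limit of 10 pages/hour is applied as an even rate (an assumption; the policy does not say how the throttle will be enforced), planning a batch upload comes down to simple arithmetic: one page every 360 seconds. A minimal sketch with hypothetical helper names that are not part of the SDK:

```python
# Hypothetical planning helpers. They assume the fair-use throttle is an even
# rate of 10 pages per hour, which the policy itself does not specify.
PAGES_PER_HOUR = 10


def seconds_per_page(pages_per_hour: int = PAGES_PER_HOUR) -> float:
    """Minimum spacing between pages to stay at or below the rate."""
    return 3600 / pages_per_hour


def hours_to_process(pages: int, pages_per_hour: int = PAGES_PER_HOUR) -> float:
    """Wall-clock hours needed to process `pages` at the given rate."""
    return pages / pages_per_hour


print(seconds_per_page())    # 360.0
print(hours_to_process(25))  # 2.5
```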


|                                                                                      |                                                            |
|--------------------------------------------------------------------------------------|------------------------------------------------------------|
| 📒 [Docs](https://dev.konfuzio.com/sdk/index.html)                                   | Read the docs                                              |
| 💾 [Installation](https://github.com/konfuzio-ai/konfuzio-sdk#installation)          | How to install the Konfuzio SDK                            |
| 🎓 [Tutorials](https://dev.konfuzio.com/sdk/tutorials.html)                          | See what the Konfuzio SDK can do with our tutorials        |
| 💡 [Explanations](https://dev.konfuzio.com/sdk/explanations.html)                    | Teaching material about the Konfuzio SDK                   |
| ⚙️ [API Reference](https://dev.konfuzio.com/sdk/sourcecode.html)                     | Python classes, methods, and functions                     |
| ❤️ [Contributing](https://dev.konfuzio.com/sdk/contribution.html)                    | Learn how to contribute!                                   |
| 🐛 [Issue Tracker](https://github.com/konfuzio-ai/konfuzio-sdk/issues)               | Report and monitor Konfuzio SDK issues                     |
| 🔭 [Changelog](https://github.com/konfuzio-ai/konfuzio-sdk/releases)                 | Review the release notes                                   |
| 📰 [MIT License](https://github.com/konfuzio-ai/konfuzio-sdk/blob/master/LICENSE.md) | Review the license                                         |

## Installation

As a developer, register for free on our [public host: https://app.konfuzio.com](https://app.konfuzio.com/accounts/signup/).

Then use pip to install the Konfuzio SDK and run `init`:

    pip install konfuzio_sdk

    konfuzio_sdk init

The `init` command creates a token to connect to the Konfuzio Server and stores the variables `KONFUZIO_USER`,
`KONFUZIO_TOKEN`, and `KONFUZIO_HOST` in a `.env` file in your working directory.
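To confirm that `init` wrote all three variables, you can inspect the `.env` file. The sketch below uses only the standard library and assumes simple `KEY=VALUE` lines; the helper names are hypothetical and not part of the SDK. The SDK itself depends on `python-dotenv` and `python-decouple`, which handle the full `.env` syntax.

```python
from pathlib import Path
from typing import Dict, List

# Variable names written by `konfuzio_sdk init`, as listed above.
REQUIRED_KEYS = ("KONFUZIO_USER", "KONFUZIO_TOKEN", "KONFUZIO_HOST")


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional single or double quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_konfuzio_keys(path: Path = Path(".env")) -> List[str]:
    """Return the required keys that are absent or empty in the file."""
    values = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; run `konfuzio_sdk init` first.")
    else:
        print("Missing:", missing_konfuzio_keys(env_path) or "none")
```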

By default, the SDK is installed without AI-related dependencies such as `torch` or `transformers`; this lets you use
the data-related SDK concepts but not the AI models. To install the SDK with the AI components, run the following
command (the quotes keep shells such as zsh from interpreting the square brackets):
  ```
  pip install "konfuzio_sdk[ai]"
  ```
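To check whether the AI components are available before loading an AI model, you can look up a few of the top-level modules that the `ai` extra pulls in (`torch`, `transformers`, `torchvision`, `spacy`, per the package metadata). This is a minimal sketch with a hypothetical helper; it only tests that the modules can be found, not that the pinned versions are installed.

```python
import importlib.util
from typing import Iterable, List

# A subset of the top-level modules installed by `konfuzio_sdk[ai]`.
AI_EXTRA_MODULES = ("torch", "transformers", "torchvision", "spacy")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(AI_EXTRA_MODULES)
if missing:
    print("AI extra not fully installed; missing:", ", ".join(missing))
else:
    print("AI extra modules found.")
```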

See the [full installation guide](https://dev.konfuzio.com/sdk/get_started.html#install-sdk).
To set up PyCharm, follow [these instructions](https://dev.konfuzio.com/sdk/quickstart_pycharm.html).

## CLI

The CLI provides a basic command to create a new Project:

`konfuzio_sdk create_project YOUR_PROJECT_NAME`

You will see "Project `{YOUR_PROJECT_NAME}` (ID `{YOUR_PROJECT_ID}`) was created successfully!" printed.

You can also download any Project by its ID:

`konfuzio_sdk export_project YOUR_PROJECT_ID`
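To chain the two commands in a script, you could run the CLI with `subprocess` and read the Project ID out of the success message quoted above. This sketch assumes the message keeps the `(ID ...)` format shown there; it searches both stdout and stderr because the output stream is not documented here. The helper names are hypothetical, and the regular expression accepts the ID with or without backticks, since those may only be Markdown formatting in this README.

```python
import re
import subprocess
from typing import Optional

# Matches "(ID 123)" or "(ID `123`)" in the create_project success message.
PROJECT_ID_PATTERN = re.compile(r"\(ID\s*`?(\d+)`?\)")


def parse_project_id(output: str) -> Optional[int]:
    """Extract the new Project ID from the CLI output, or None if absent."""
    match = PROJECT_ID_PATTERN.search(output)
    return int(match.group(1)) if match else None


def run_cli(*args: str) -> str:
    """Run `konfuzio_sdk <args>` and return its combined output; raises on failure."""
    result = subprocess.run(
        ["konfuzio_sdk", *args], capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr


def create_and_export(project_name: str) -> int:
    """Create a Project, then export it using the ID parsed from the output."""
    project_id = parse_project_id(run_cli("create_project", project_name))
    if project_id is None:
        raise RuntimeError("Could not find the Project ID in the CLI output.")
    run_cli("export_project", str(project_id))
    return project_id
```

Call it as `create_and_export("YOUR_PROJECT_NAME")` once `konfuzio_sdk init` has been run.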

## Tutorials

You can find detailed examples of how to set up and run Document AI pipelines in our
[Tutorials](https://dev.konfuzio.com/sdk/tutorials.html), including:

- [Split a file into separate Documents](https://dev.konfuzio.com/sdk/tutorials/context-aware-file-splitting-model/index.html)
- [Document Categorization](https://dev.konfuzio.com/sdk/tutorials/document_categorization/index.html)
- [Train a Konfuzio SDK model to get insights from annual reports](https://dev.konfuzio.com/sdk/tutorials/annual-reports/index.html)

### Basics

Here we show how to use the Konfuzio SDK to retrieve data hosted on a Konfuzio Server instance.

```python
from konfuzio_sdk.data import Document, Project

# Initialize the Project (replace the placeholder with your Project ID)
YOUR_PROJECT_ID = 0
my_project = Project(id_=YOUR_PROJECT_ID)

# Get any online Document (replace the placeholder with a Document ID)
DOCUMENT_ID_ONLINE = 0
doc: Document = my_project.get_document_by_id(DOCUMENT_ID_ONLINE)

# Get all Annotations in the Document
annotations = doc.annotations()

# Filter the Document's Annotations by Label (replace with one of your Label names)
MY_OWN_LABEL_NAME = "YOUR_LABEL_NAME"
label = my_project.get_label_by_name(MY_OWN_LABEL_NAME)
label_annotations = doc.annotations(label=label)

# Or get all Annotations of this Label that belong to one Category (replace with a Category ID)
YOUR_CATEGORY_ID = 0
category = my_project.get_category_by_id(YOUR_CATEGORY_ID)
category_annotations = label.annotations(categories=[category])

# Force a Project update. To save time, Documents are only updated if they have changed.
my_project.get(update=True)
```
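Once you have a Document, a common next step is to see how its Annotations are distributed across Labels. The sketch below assumes each Annotation exposes its Label as `annotation.label` with a `.name` attribute, as in the SDK's data model; check the API Reference for your SDK version. The helper is hypothetical, and the stand-in objects and Label names ("Date", "Amount") are illustrative only, so the snippet runs without a server connection.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_annotations_by_label(annotations: Iterable) -> Counter:
    """Count Annotations per Label name (assumes `annotation.label.name`)."""
    return Counter(annotation.label.name for annotation in annotations)


# With the SDK: count_annotations_by_label(doc.annotations())
# Offline stand-ins with the same attribute shape, for illustration only:
fake_annotations = [
    SimpleNamespace(label=SimpleNamespace(name="Date")),
    SimpleNamespace(label=SimpleNamespace(name="Amount")),
    SimpleNamespace(label=SimpleNamespace(name="Date")),
]
print(count_annotations_by_label(fake_annotations))  # Counter({'Date': 2, 'Amount': 1})
```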
