Metadata-Version: 2.4
Name: pylexfluent
Version: 0.1.86
Summary: Lexia AI tools library by Lexfluent
Project-URL: Homepage, https://dev.azure.com/LexFluent2020/RevolutionAI
Project-URL: Issues, https://dev.azure.com/LexFluent2020/RevolutionAI/_queries/query/180a1ed2-3494-42cc-8d8a-2e60217c2171/
Author-email: Jacques MASSA <jacques.massa@lexfluent.com>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.12
Requires-Dist: docling[all]
Requires-Dist: easyocr
Requires-Dist: img2pdf==0.6.0
Requires-Dist: jax==0.4.38
Requires-Dist: jaxlib==0.4.38
Requires-Dist: langchain-community
Requires-Dist: langchain-ollama
Requires-Dist: langchain-openai
Requires-Dist: matplotlib
Requires-Dist: mediapipe
Requires-Dist: ocrmypdf
Requires-Dist: opencv-python
Requires-Dist: openpyxl
Requires-Dist: pandas
Requires-Dist: pdf2image
Requires-Dist: pdfplumber
Requires-Dist: pillow>=10.1.0
Requires-Dist: pymongo
Requires-Dist: pytesseract
Requires-Dist: scikit-learn
Requires-Dist: sentence-transformers
Requires-Dist: sentencepiece
Requires-Dist: setuptools
Requires-Dist: spacy[cuda12x]
Requires-Dist: tensorflow
Requires-Dist: tensorflow-hub
Requires-Dist: tensorrt
Requires-Dist: tensorrt-dispatch
Requires-Dist: tensorrt-lean
Requires-Dist: tf-keras
Requires-Dist: torch
Requires-Dist: torchaudio
Requires-Dist: torchvision
Requires-Dist: tqdm
Requires-Dist: transformers
Requires-Dist: wheel
Description-Content-Type: text/markdown

# Lexfluent RevolutionAI Python library

| Creation/Revision | Author | Date |
| -- | -- | --- |
| Creation | *Jacques MASSA* | 2 December 2024 |
| Modification | *Jacques MASSA* | 10 March 2025 |
| Modification | *Jacques MASSA* | 4 January 2026 |

---

## Overview

The pyLexfluent library provides AI features for the legal and document-processing domains:

 - Classification: training and inference
 - Data extraction: ODP, CNI, IBAN, legal documents, urban planning certificates (Certificat d'Urbanisme), birth certificate extracts, death certificate extracts, marriage certificate extracts
 - Data augmentation: finance

## Prerequisite installations

```
pip install setuptools
pip install wheel
pip install scikit-learn
pip install matplotlib
pip install tqdm
pip install pytesseract
pip install "pillow>=10.1.0"
pip install jax==0.4.38
pip install jaxlib==0.4.38
pip install mediapipe
pip install opencv-python
pip install pandas
pip install tensorrt
pip install tensorrt-lean
pip install tensorrt-dispatch
pip install tensorflow
pip install tf-keras
pip install tensorflow-hub
pip install torch
pip install torchvision
pip install torchaudio
pip install sentence-transformers
pip install "spacy[cuda12x]"
pip install ocrmypdf
pip install pdf2image
pip install pdfplumber
pip install langchain-community
pip install langchain-ollama
pip install langchain-openai
pip install pymongo
pip install openpyxl
pip install easyocr
pip install "docling[all]"
python -m spacy download fr_core_news_lg
```
A version conflict can occur between the cuDNN builds required by TensorFlow and Torch. In that case, uninstall the **nvidia-cudnn-cu12** package installed by pip:

```
pip uninstall nvidia-cudnn-cu12
```
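
Before uninstalling anything, it can help to see which cuDNN wheels pip actually installed. The following is a minimal sketch using only the standard library; the helper name `find_distributions` is hypothetical and not part of pyLexfluent.

```python
from importlib.metadata import distributions


def find_distributions(prefix: str) -> dict[str, str]:
    """Return {name: version} for installed distributions whose name starts with prefix.

    Hypothetical helper for illustration; names are compared case-insensitively
    with underscores treated as hyphens.
    """
    wanted = prefix.lower().replace("_", "-")
    found: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata.get("Name", "") or ""
        if name.lower().replace("_", "-").startswith(wanted):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Lists every installed distribution whose name starts with "nvidia-cudnn".
    for name, version in sorted(find_distributions("nvidia-cudnn").items()):
        print(f"{name}=={version}")
```

If the script prints nothing, pip has not installed a separate cuDNN wheel in the current environment.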
# System prerequisites
## Update and required installations
``` 
    apt-get update
    apt-get upgrade -y
    apt-get install software-properties-common -y
    add-apt-repository ppa:alex-p/tesseract-ocr5 -y
    apt-get install libc6 -y
    apt-get install poppler-utils -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6 -y
    apt-get install ocrmypdf -y
``` 
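
Once the system packages are installed, a quick check that the main executables are on the `PATH` can save debugging time later. This is a minimal sketch, not part of pyLexfluent: the helper `missing_tools` is hypothetical, and the list of executable names is an assumption about what each package provides (`pdftoppm` comes from poppler-utils).

```python
import shutil

# Assumed executable names for the packages installed above.
REQUIRED_TOOLS = ["tesseract", "pdftoppm", "ocrmypdf", "ffmpeg"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the executables from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables were found.")
```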
## JBIG2

### Installing the JBIG2 encoder

Most Linux distributions do not include a JBIG2 encoder since JBIG2 encoding was patented for a long time. All known JBIG2 US patents have expired as of 2017, but it is possible that unknown patents exist.

JBIG2 encoding is recommended for OCRmyPDF and is used to losslessly create smaller PDFs. If JBIG2 encoding is not available, lower quality CCITT encoding will be used for monochrome images.

JBIG2 decoding is not patented and is performed automatically by most PDF viewers. It is widely supported and has been part of the PDF specification since 2001.

JBIG2 encoding is automatically provided by these OCRmyPDF packages:

 - Docker image (both Ubuntu and Alpine)
 - Snap package
 - ArchLinux AUR package
 - Alpine Linux package
 - Homebrew on macOS

For all other platforms, you would need to build the JBIG2 encoder from source:
```
git clone https://github.com/agl/jbig2enc
cd jbig2enc
./autogen.sh
./configure && make
[sudo] make install
```
Dependencies include libtoolize and libleptonica, which on Ubuntu systems are packaged as libtool and libleptonica-dev. On Fedora (35) they are packaged as libtool and leptonica-devel. For this to work, please make sure to install autotools, automake, libtool, pkg-config and leptonica first if not already installed. Other dependencies might be required depending on your system.

```
[sudo] apt install autotools-dev automake libtool libleptonica-dev pkg-config
```
 
## Model downloads
### spaCy

```
python -m spacy download fr_core_news_lg
```
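
spaCy models downloaded this way are installed as regular Python packages, so their presence can be checked without loading them. The sketch below uses only the standard library; `model_installed` is a hypothetical helper name, not a pyLexfluent or spaCy function.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    model = "fr_core_news_lg"
    if model_installed(model):
        print(f"{model} is installed.")
    else:
        print(f"{model} is missing; run: python -m spacy download {model}")
```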



## GPU issue 
If the following message appears: `Successful NUMA node read from SysFS had negative value (-1)`, set the NUMA node of every PCI device to 0:

```
for a in /sys/bus/pci/devices/*; do echo 0 | tee -a "$a/numa_node"; done
```
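
To see which devices report a negative NUMA node, before or after running the loop, the same sysfs files can be read from Python. This is a minimal sketch under the assumption of a standard Linux sysfs layout; `read_numa_nodes` and `negative_nodes` are hypothetical helper names.

```python
from pathlib import Path


def read_numa_nodes(root: str = "/sys/bus/pci/devices") -> dict[str, int]:
    """Map each PCI device directory name to the integer stored in its numa_node file."""
    nodes: dict[str, int] = {}
    for node_file in sorted(Path(root).glob("*/numa_node")):
        try:
            nodes[node_file.parent.name] = int(node_file.read_text().strip())
        except (OSError, ValueError):
            # Skip files that cannot be read or do not contain an integer.
            continue
    return nodes


def negative_nodes(nodes: dict[str, int]) -> list[str]:
    """Return the devices whose NUMA node value is negative, sorted by name."""
    return sorted(device for device, value in nodes.items() if value < 0)


if __name__ == "__main__":
    for device in negative_nodes(read_numa_nodes()):
        print(f"{device}: negative numa_node")
```

On a machine without `/sys/bus/pci/devices`, the script simply prints nothing.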

# Usage examples

## Classification  

### Code 
```
import logging
import os
import sys

from lxf.ai.classification.classifier import get_classification
from lxf.domain.predictions import Predictions
from lxf.services.measure_time import measure_time_async
from lxf.services.try_safe import try_safe_execute_asyncio
from lxf.settings import set_logging_level, get_logging_level

set_logging_level(logging.DEBUG)

# Log to ./logs/test_classifier.log; FileHandler does not create the directory itself.
os.makedirs('./logs', exist_ok=True)
logger = logging.getLogger('test classifier')
fh = logging.FileHandler('./logs/test_classifier.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)


@measure_time_async
async def do_test(file_name) -> Predictions:
    """Classify the given file, reading at most 10 pages."""
    return await get_classification(file_name=file_name, max_pages=10)


if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True)
    pdf_path = "data/ODP.pdf"
    iban_pdf = "data/RIBB.pdf"
    result = try_safe_execute_asyncio(logger=logger, func=do_test, file_name=iban_pdf)  # asyncio.run(do_test(iban_pdf))
    print(result)
    result = try_safe_execute_asyncio(logger=logger, func=do_test, file_name=pdf_path)  # asyncio.run(do_test(pdf_path))
    print(result)
```

## Data extraction (finance)

### Code

```
import asyncio
import logging
import os
import sys

import lxf.settings as settings
from lxf.settings import set_logging_level, get_logging_level

# Configure the library before importing the extractors.
set_logging_level(logging.DEBUG)
settings.enable_tqdm = False  # turn off tqdm progress bars

from lxf.domain.loan import Pret
from lxf.extractors.finance import odp_extractor
from lxf.extractors.finance import iban_extractor
from lxf.services.try_safe import try_safe_execute_async

# Log to ./logs/test_finance.log; FileHandler does not create the directory itself.
os.makedirs('./logs', exist_ok=True)
logger = logging.getLogger('test_finance')
fh = logging.FileHandler('./logs/test_finance.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)


async def do_test_odp(file_path: str) -> Pret:
    """Extract the loan data (Pret) from an ODP document."""
    result = await try_safe_execute_async(logger, odp_extractor.extract_data, file_path=file_path)
    return result


async def do_test_iban(file_path: str) -> str:
    """Extract the IBAN data from a bank details (RIB) document."""
    result = await try_safe_execute_async(logger, iban_extractor.extract_data, file_path=file_path)
    return result


if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True)
    pdf_path = "data/ODP.pdf"
    # pret: Pret = asyncio.run(do_test_odp(file_path=pdf_path))
    # if pret is not None:
    #     print(pret.emprunteurs)
    iban_pdf = "data/rib pm.pdf"
    txt = asyncio.run(do_test_iban(file_path=iban_pdf))
    print(txt)
```