Metadata-Version: 2.1
Name: pasqui
Version: 0.1.0
Summary: A Python library that performs several functions needed to structure unstructured text.
Home-page: https://github.com/NcabreraM/pasqui
Author: Natalia Cabrera-Morales
Author-email: natalia.cabrera.m@mail.pucv.cl
License: MIT
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pdfplumber
Requires-Dist: python-docx
Requires-Dist: openai
Requires-Dist: tiktoken
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: langchain
Requires-Dist: kor
Requires-Dist: requests
Requires-Dist: markdownify
Requires-Dist: scikit-learn
Requires-Dist: langchain-community
Requires-Dist: langchain-openai


# Pasqui

Pasqui is a Python library, developed in Google Colab, that provides several functions for structuring unstructured text. It is based on my dissertation work at the University of Cambridge and was coded with the support of ChatGPT and Gemini.
Pasqui is designed to handle large numbers of long files and to deal gracefully with errors, avoiding repeated processing. It works with both PDFs and docs.
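
The skip-and-continue behaviour described above can be illustrated with a minimal sketch. This is not Pasqui's actual implementation: `process_folder`, `convert_one`, and the one-`.txt`-per-input layout are hypothetical names and assumptions made only for this example.

```python
from pathlib import Path


def process_folder(src, dst, convert_one):
    """Convert every file in src, skipping files that were already converted.

    Hypothetical helper for illustration; not part of Pasqui's API.
    convert_one is any callable that takes a Path and returns text.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    done, skipped, failed = [], [], []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        out = dst / (path.stem + ".txt")
        if out.exists():
            # Output from an earlier run exists: do not repeat the work.
            skipped.append(path.name)
            continue
        try:
            out.write_text(convert_one(path), encoding="utf-8")
            done.append(path.name)
        except Exception as exc:
            # Record the failure and carry on with the remaining files.
            failed.append((path.name, str(exc)))
    return done, skipped, failed
```

Running the function a second time on the same folders only retries the files that failed, because every successful conversion has already left its output file behind.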

### It has 4 functions
* pasqui_converting -> converts pdfs and docs into text files and moves them to a new folder.
* pasqui_embedding -> creates embeddings using cosine similarity.
* pasqui_summarising -> creates summaries based on customisable topics.
* pasqui_structuring -> creates structured data from unstructured text.
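
For readers unfamiliar with the cosine similarity mentioned in the list above, here is a small self-contained sketch of the measure. It does not show how Pasqui computes or stores embeddings; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the short vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec))
              for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for text embeddings (assumed values).
chunks = {"intro": [1.0, 0.0], "methods": [1.0, 1.0], "annex": [0.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.0], chunks)
```

A score near 1 means two vectors point in almost the same direction, and a score of 0 means they are orthogonal, so the ranking puts the most closely related chunk first.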

Using Pasqui requires familiarity with the kor package and access to Google Drive.

## Installation
```bash
pip install pasqui
