Metadata-Version: 2.1
Name: parsera
Version: 0.1.0
Summary: Lightweight library for scraping web-sites with LLMs
License: GPL-2.0-or-later
Author: Mikhail Zanka
Author-email: raznem@gmail.com
Requires-Python: >=3.10.dev0,<3.12.dev0
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: langchain (>=0.2.6,<0.3.0)
Requires-Dist: langchain-ollama (>=0.1.1,<0.2.0)
Requires-Dist: langchain-openai (>=0.1.8,<0.2.0)
Requires-Dist: markdownify (>=0.12.1,<0.13.0)
Requires-Dist: playwright (>=1.44.0,<2.0.0)
Requires-Dist: playwright-stealth (>=1.0.6,<2.0.0)
Description-Content-Type: text/markdown

# Parsera
Lightweight library for scraping websites with LLMs.
You can see how it works on the [Parsera website](https://parsera.org).

## Why Parsera?
Because it's simple and lightweight. It keeps token use to a minimum, which boosts speed and reduces expenses.

## Installation

```shell
pip install parsera
playwright install
```

## Basic usage

If you want to use OpenAI, remember to set the `OPENAI_API_KEY` environment variable.
You can do this from Python with:
```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
```
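
A key hardcoded in source is easy to leak, so you may prefer to export it in your shell and only verify it from Python. A minimal sketch of such a check follows; `require_env` is a hypothetical helper written for this example and is not part of Parsera:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a readable message instead of a later API error.
        raise RuntimeError(f"{name} is not set; export it before running the scraper")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before creating the scraper surfaces a missing key right away.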

Next, you can run a basic version that uses `gpt-4o-mini`:
```python
from parsera import Parsera

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera()
result = scraper.run(url=url, elements=elements)
```

The `result` variable will contain JSON with a list of records:
```json
[
   {
      "Title":"Hacking the largest airline and hotel rewards platform (2023)",
      "Points":"104",
      "Comments":"24"
   },
    ...
]
```
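
In the sample above, the numeric fields come back as strings. A minimal sketch of post-processing follows, assuming `result` is already a Python list of dicts shaped like that sample (if it is a JSON string, parse it with `json.loads` first). `to_int` and `normalize` are hypothetical helpers, not part of Parsera, and the second record is made up to show the fallback:

```python
def to_int(value, default=None):
    """Convert strings such as "104" to int; return `default` when that fails."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return default


def normalize(records):
    """Return copies of the records with "Points" and "Comments" as integers."""
    return [
        {
            **row,
            "Points": to_int(row.get("Points")),
            "Comments": to_int(row.get("Comments")),
        }
        for row in records
    ]


# Sample shaped like the output above; the second record is invented for illustration.
result = [
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24",
    },
    {"Title": "Hypothetical entry without counts", "Points": "", "Comments": None},
]

cleaned = normalize(result)
print(cleaned[0]["Points"] + cleaned[0]["Comments"])  # 128
```

Values that cannot be parsed become `None` rather than raising, which keeps one malformed record from stopping the whole run.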

## Run with local model
Make sure Ollama itself is installed and running, then install the `langchain-ollama` integration package:
```shell
pip install langchain-ollama
```

Then create a `ChatOllama` model and pass it to `Parsera`:

```python
from langchain_ollama import ChatOllama
from parsera import Parsera

llm = ChatOllama(
    model="llama3",
    temperature=0,
    # other params...
)

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}
scraper = Parsera(model=llm)
result = scraper.run(url=url, elements=elements)
```


