Metadata-Version: 2.4
Name: formalyzer
Version: 0.0.2
Summary: Analyze PDF and web forms and fill in the forms
Home-page: https://github.com/drscotthawley/formalyzer
Author: Scott H. Hawley
Author-email: scott.hawley@belmont.edu
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python pdf
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4
Requires-Dist: playwright
Requires-Dist: claudette
Requires-Dist: lisette
Requires-Dist: pypdf
Requires-Dist: fastcore
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# formalyzer


## Description

Formalyzer extracts the text from a PDF recommendation ("recc") letter
and then, for each URL in `url_list.txt`, it will:

- open a browser tab for that URL
- fill in the form using what the LLM has gleaned from the recc letter
- attach the PDF via the form’s upload/attachment button

…and do no more.

The user will need to review the page and press the Submit button
manually.
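The steps above can be sketched with Playwright's sync API. This is a hypothetical illustration, not formalyzer's actual code. It assumes the LLM step has already produced a mapping from CSS selectors to values, that the upload control is an `<input type="file">` reachable by one selector, and that Chrome is listening on the debugging port 9222 used in the Usage section. `fill_and_attach`, `fill_all_forms`, and `values_for` are made-up names.

```python
from pathlib import Path


def fill_and_attach(page, field_values, pdf_path, upload_selector):
    """Fill fields and attach the letter on one page; deliberately never submit.

    `page` is anything with Playwright's `fill` / `set_input_files` methods.
    `field_values` maps CSS selectors to strings (hypothetical LLM output).
    """
    for selector, value in field_values.items():
        page.fill(selector, value)
    # Attach the PDF through the form's <input type="file"> element.
    page.set_input_files(upload_selector, str(Path(pdf_path)))
    # No click on the Submit button: the user reviews the page and submits it.


def fill_all_forms(urls, values_for, pdf_path, upload_selector,
                   cdp_url="http://localhost:9222"):
    """Attach to the Chrome started with --remote-debugging-port=9222 and
    handle one new tab per URL. `values_for(url)` is a hypothetical callback
    returning the selector->value mapping for that form."""
    # Imported here so fill_and_attach can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        context = browser.contexts[0]  # the browser's default context
        for url in urls:
            page = context.new_page()
            page.goto(url)
            fill_and_attach(page, values_for(url), pdf_path, upload_selector)
```

Because `fill_and_attach` only needs an object with `fill` and `set_input_files`, it can be exercised with a stand-in page object in tests without launching a browser.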

### Requirements

- Either `ollama` installed locally or the `ANTHROPIC_API_KEY`
  environment variable set
- Python 3.9 or newer, plus `beautifulsoup4`, `playwright`, `claudette`,
  `lisette`, `pypdf`, and `fastcore` (installed automatically by `pip`)

## Usage

On macOS, start Chrome with remote debugging enabled on port 9222 by
running this command in the terminal:

``` bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
```
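To confirm Chrome is actually listening before running formalyzer, you can query the DevTools HTTP endpoint `/json/version`, which Chrome serves on the debugging port. This is a minimal standard-library sketch; `chrome_debugger_info` is a hypothetical helper, not part of formalyzer.

```python
import json
import urllib.error
import urllib.request


def chrome_debugger_info(port: int = 9222, timeout: float = 2.0):
    """Return Chrome's /json/version metadata, or None if nothing answers on the port."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return None


if __name__ == "__main__":
    info = chrome_debugger_info()
    if info is None:
        print("Chrome is not listening on port 9222; start it with the command above.")
    else:
        print("Connected to:", info.get("Browser"))
```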

Then you can run this command:

``` bash
formalyzer --debug <recc_info.txt> <recc_letter.pdf> <url_list.txt>
```

where `recc_info.txt` contains information about the recommender (name,
title, address, phone number, and email), `recc_letter.pdf` is the
recommendation letter, and `url_list.txt` is a file containing one URL
per line.
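A minimal reader for the URL list described above. It is a sketch that assumes, beyond the documented one-URL-per-line format, that blank lines and `#` comment lines should be skipped and that only `http`/`https` URLs are valid; `read_url_list` is a hypothetical name.

```python
from pathlib import Path
from urllib.parse import urlparse


def read_url_list(path):
    """Return the URLs in a url_list.txt-style file, one per line.

    Assumptions (not confirmed by formalyzer's docs): surrounding whitespace
    is stripped, blank lines and '#' comments are skipped, and anything that
    is not an http(s) URL is rejected.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if urlparse(line).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {line!r}")
        urls.append(line)
    return urls
```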

### Installation

Install the latest version from the GitHub
[repository](https://github.com/drscotthawley/formalyzer):

``` sh
$ pip install git+https://github.com/drscotthawley/formalyzer.git
```

or from [PyPI](https://pypi.org/project/formalyzer/):

``` sh
$ pip install formalyzer
```

After installing, run `playwright install chromium` to download the
browser binaries.

## Demo

On macOS, run these commands in Terminal:

1.  `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &`
2.  `cd example`
3.  `python -m http.server 8000 &`
4.  `export ANTHROPIC_API_KEY="__your_API_key_goes_here__"`
5.  `formalyzer --debug recc_info.txt sample_letter.pdf sample_urls.txt`

## Local LLM Execution

For [FERPA](https://studentprivacy.ed.gov/ferpa) compliance, running a
local model is preferable so that student data is not sent to a
third-party service. I recommend using [`ollama`](https://ollama.com)
and starting with a medium-small model such as `qwen2.5:14b` (9 GB).
Start the ollama server and download the model:

``` bash
ollama serve & 
ollama pull qwen2.5:14b 
```

Then pass the model to formalyzer with the `--model` CLI flag, e.g.:

``` bash
formalyzer --debug --model 'ollama/qwen2.5:14b' recc_info.txt sample_letter.pdf sample_urls.txt
```
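Before pointing `--model` at a local model, you can check that the ollama server is up and the model has been pulled by asking ollama's REST API (`GET /api/tags` on its default port 11434) for the list of local models. This is a standard-library sketch; `local_ollama_models` and `has_model` are hypothetical helper names, and treating a bare model name as its `:latest` tag mirrors ollama's CLI convention.

```python
import json
import urllib.request


def local_ollama_models(host="http://localhost:11434", timeout=2.0):
    """Return the names of locally pulled models, or None if the server isn't reachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # Server not running (URLError is an OSError) or a non-JSON reply.
        return None
    return [m["name"] for m in data.get("models", [])]


def has_model(names, wanted="qwen2.5:14b"):
    """True if `wanted` (optionally in --model form, 'ollama/...') is in `names`."""
    if names is None:
        return False
    wanted = wanted.removeprefix("ollama/")  # accept the --model style string
    if ":" not in wanted:
        wanted += ":latest"  # ollama's default tag
    return wanted in names


if __name__ == "__main__":
    names = local_ollama_models()
    if names is None:
        print("ollama is not running; start it with `ollama serve &`.")
    elif not has_model(names, "ollama/qwen2.5:14b"):
        print("Model missing; run `ollama pull qwen2.5:14b`.")
    else:
        print("qwen2.5:14b is ready.")
```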

The quality of the form-filling will vary with the quality and size of
the model you use. Smaller models like `mistral` (4 GB) may hallucinate
many of the form field IDs, resulting in a mostly blank form. For a much
larger (41 GB) model, try `ollama/qwen2:72b`.
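One way to make that failure mode visible is to check the model's proposed field IDs against the IDs actually present in the page's HTML before filling anything. The sketch below uses only the standard library's `html.parser`; `split_proposed_fields` is a hypothetical helper, and the `{field_id: value}` shape of the LLM output is an assumption, not formalyzer's documented format.

```python
from html.parser import HTMLParser


class _FieldIdCollector(HTMLParser):
    """Collect the id attributes of <input>, <select>, and <textarea> elements."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <input ... />.
        if tag in ("input", "select", "textarea"):
            attr = dict(attrs)
            if attr.get("id"):
                self.ids.add(attr["id"])


def split_proposed_fields(form_html, proposed):
    """Split an LLM's {field_id: value} proposal into (usable, hallucinated) dicts."""
    collector = _FieldIdCollector()
    collector.feed(form_html)
    collector.close()
    usable = {k: v for k, v in proposed.items() if k in collector.ids}
    hallucinated = {k: v for k, v in proposed.items() if k not in collector.ids}
    return usable, hallucinated
```

Reporting the hallucinated entries, rather than silently dropping them, shows whether a blank form came from a weak model or from fields the page never had.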

## Developer Guide

### Install formalyzer in Development mode

``` sh
# make sure formalyzer package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# export the notebooks to the formalyzer package (nbdev_prepare also runs the tests)
$ nbdev_prepare
```

### Documentation

Documentation is hosted on this GitHub
[repository](https://github.com/drscotthawley/formalyzer)’s
[pages](https://drscotthawley.github.io/formalyzer/). Additionally, you
can find package-manager-specific guidelines on
[conda](https://anaconda.org/drscotthawley/formalyzer) and
[PyPI](https://pypi.org/project/formalyzer/).

## TODO

- Test with a less-than-superlative recc letter, to make sure it’s not
  always just selecting the top rating(s).
