Metadata-Version: 2.4
Name: formalyzer
Version: 0.0.1
Summary: Analyze PDF and web forms and fill in the forms
Home-page: https://github.com/drscotthawley/formalyzer
Author: Scott H. Hawley
Author-email: scott.hawley@belmont.edu
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4
Requires-Dist: playwright
Requires-Dist: claudette
Requires-Dist: lisette
Requires-Dist: pypdf
Requires-Dist: fastcore
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# formalyzer


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Description:

Formalyzer extracts the text from a PDF recommendation letter and then,
for each URL in `url_list.txt`, it will:

- open a browser tab for that URL
- fill in the form using what the LLM has gleaned from the letter
- attach the PDF via the form’s upload/attachment button

…and do no more.

The user will need to review the page and press the Submit button
manually.
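
Before the LLM can fill in a form, it has to know which fields the form contains. formalyzer lists `beautifulsoup4` among its dependencies, presumably for this kind of HTML parsing. The sketch below uses the standard library's `html.parser` instead, so that it runs without extra packages. `FormFieldLister` and `list_form_fields` are hypothetical names, not part of formalyzer's API. They collect the `id`, `name`, and `type` of every `input`, `select`, and `textarea` element, along with the text of any `<label for=...>` that points at it.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Hypothetical helper: collect form controls and their <label for=...> text."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.fields = []         # one dict per form control, in document order
        self.labels = {}         # label `for` attribute -> label text
        self._label_for = None   # `for` target of the <label> currently open
        self._label_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.FIELD_TAGS:
            # An <input> without a type attribute is a text field in HTML.
            default_type = "text" if tag == "input" else tag
            self.fields.append({
                "tag": tag,
                "id": attrs.get("id"),
                "name": attrs.get("name"),
                "type": attrs.get("type") or default_type,
            })
        elif tag == "label":
            self._label_for = attrs.get("for")
            self._label_text = []

    def handle_data(self, data):
        if self._label_for is not None:
            self._label_text.append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._label_for is not None:
            # Collapse runs of whitespace in the label text.
            self.labels[self._label_for] = " ".join("".join(self._label_text).split())
            self._label_for = None


def list_form_fields(html):
    """Return a list of field dicts, each with its matching label text (or None)."""
    parser = FormFieldLister()
    parser.feed(html)
    parser.close()
    for field in parser.fields:
        field["label"] = parser.labels.get(field["id"])
    return parser.fields
```

For example, `list_form_fields('<label for="email">Email</label><input id="email" type="email">')` returns a single entry whose `id` is `"email"`, `type` is `"email"`, and `label` is `"Email"`. A list like this could then be included in the prompt so the model only proposes values for fields that exist.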

### Requirements:

- Either `ollama` installed locally or the `ANTHROPIC_API_KEY`
  environment variable set
- The Python packages `beautifulsoup4`, `playwright`, `claudette`,
  `lisette`, `pypdf`, and `fastcore`
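
A quick way to confirm these requirements before a run is sketched below. Both functions are hypothetical helpers, not part of formalyzer. Note that some import names differ from the package names (`beautifulsoup4` is imported as `bs4`). The backend check only looks for an `ollama` executable on `PATH` and a non-empty `ANTHROPIC_API_KEY`; it does not verify that the key is valid or that the ollama server is running.

```python
import importlib.util
import os
import shutil

# Import names for the packages listed above (beautifulsoup4 is imported as `bs4`).
REQUIRED_MODULES = ("bs4", "playwright", "claudette", "lisette", "pypdf", "fastcore")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules from `modules` that cannot be found in this environment."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def available_backends(env=None, which=shutil.which):
    """Return which LLM backends look usable: 'ollama', 'anthropic', both, or neither."""
    env = os.environ if env is None else env
    backends = []
    if which("ollama"):                 # an ollama binary is on PATH
        backends.append("ollama")
    if env.get("ANTHROPIC_API_KEY"):    # the key is set and non-empty
        backends.append("anthropic")
    return backends


if __name__ == "__main__":
    print("missing packages:", missing_modules() or "none")
    print("usable backends:", available_backends() or "none (install ollama or set ANTHROPIC_API_KEY)")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.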

## Usage

On macOS, start Chrome with remote debugging enabled on port 9222 by
running this command in the terminal:

``` bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
```
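With `--remote-debugging-port=9222`, Chrome serves a small HTTP endpoint, `/json/version`, that reports the browser version and its DevTools WebSocket URL. Playwright can attach to a browser started this way with `chromium.connect_over_cdp("http://localhost:9222")`; that formalyzer connects like this is an assumption based on the port. The hypothetical helper below checks that Chrome is actually listening before you run formalyzer.

```python
import json
import urllib.request


def cdp_browser_version(host="localhost", port=9222, timeout=2.0):
    """Return Chrome's /json/version info as a dict, or None if nothing answers."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (urllib's URLError is a
        # subclass of it); ValueError covers a response that is not valid JSON.
        return None


if __name__ == "__main__":
    info = cdp_browser_version()
    if info is None:
        print("Chrome is not listening on port 9222; start it with the command above.")
    else:
        print("Connected to", info.get("Browser"))
```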

Then you can run this command:

``` bash
formalyzer --debug <recc_info.txt> <recc_letter.pdf> <url_list.txt>
```

where `recc_info.txt` contains information about the recommender (name,
title, address, phone number, and email), `recc_letter.pdf` is the
recommendation letter, and `url_list.txt` is a file containing one URL
per line.
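
Reading the three inputs might look like the sketch below. The function names are hypothetical. `pypdf`'s `PdfReader` and `page.extract_text()` are real, but whether formalyzer extracts the letter text exactly this way is an assumption. Blank lines in the URL list are skipped.

```python
from pathlib import Path


def read_recc_info(path):
    """Return the recommender's details (name, title, address, ...) as plain text."""
    return Path(path).read_text(encoding="utf-8").strip()


def read_url_list(path):
    """Return the URLs in `path`, one per line, ignoring blank lines and stray spaces."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def extract_letter_text(pdf_path):
    """Concatenate the text of every page of the PDF letter."""
    from pypdf import PdfReader  # imported here so the helpers above work without pypdf

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty string for image-only (scanned) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

For the demo data, `read_url_list("example/sample_urls.txt")` would return the list of form URLs to visit.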

### Installation

Install the latest version from the GitHub
[repository](https://github.com/drscotthawley/formalyzer):

``` sh
$ pip install git+https://github.com/drscotthawley/formalyzer.git
```

or from [conda](https://anaconda.org/drscotthawley/formalyzer)

``` sh
$ conda install -c drscotthawley formalyzer
```

or from [PyPI](https://pypi.org/project/formalyzer/)

``` sh
$ pip install formalyzer
```

After installing, users need to run `playwright install chromium` to
download the browser binaries.

## Demo

This demo uses the data in `example/`. On macOS, run the following from
the top-level `formalyzer` directory:

1.  Start up Chrome:
    `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug`
2.  Launch a local web server:
    `python -m http.server 8000 --directory example/`
3.  Set your `ANTHROPIC_API_KEY` shell environment variable.
4.  Run the script:
    `formalyzer --debug example/recc_info.txt example/sample_letter.pdf example/sample_urls.txt`
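
Step 2 can also be done from Python, for instance inside a test script. The hypothetical `serve_directory` helper below is roughly equivalent to `python -m http.server 8000 --directory example/`. It differs in two ways: it binds only to `localhost` by default (the command-line server listens on all interfaces), and it runs in a background thread.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000, host="localhost"):
    """Serve `directory` over HTTP from a daemon thread and return the server.

    Call server.shutdown() and then server.server_close() when finished.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Passing `port=0` lets the operating system pick a free port, which is then available as `server.server_address[1]`.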

## Local LLM Execution

For FERPA compliance, running a local model is preferable so that
student data is not sent to a third-party service. I recommend using
`ollama` and starting with a medium-small model such as `qwen2.5:14b`
(9 GB). Start the ollama server and pull the model:

    ollama serve & 
    ollama pull qwen2.5:14b 
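
To confirm that the pull succeeded (the same information `ollama list` prints), you can query ollama's local REST API, which listens on port 11434 by default; `GET /api/tags` returns the locally available models. The helper names below are hypothetical, and treating a bare name such as `mistral` as `mistral:latest` mirrors ollama's default tag.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # ollama's default listen address


def pulled_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return the names of the models the local ollama server has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return [m["name"] for m in json.load(resp).get("models", [])]


def has_model(name, models):
    """True if `name` is among `models`; an untagged name means the ':latest' tag."""
    wanted = name if ":" in name else f"{name}:latest"
    return wanted in models
```

With `ollama serve` running, `has_model("qwen2.5:14b", pulled_models())` should be true after the pull above.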

Then select the model with the `--model` CLI flag, for example:

    formalyzer --debug --model 'ollama/qwen2.5:14b' example/recc_info.txt example/sample_letter.pdf example/sample_urls.txt
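
The `ollama/` prefix follows the `provider/model` naming convention of LiteLLM, which the `lisette` dependency builds on. How formalyzer routes the string internally is not described here, so treat the following as an illustration only. The hypothetical `split_model_spec` splits such a string, and `is_local_model` flags whether a spec routes to ollama, which is what matters for the FERPA concern above.

```python
def split_model_spec(spec):
    """Split 'provider/model' at the first slash; a spec without a slash has no provider."""
    provider, sep, model = spec.partition("/")
    return (provider, model) if sep else (None, spec)


def is_local_model(spec):
    """True if the spec routes to ollama, which normally runs on this machine."""
    provider, _ = split_model_spec(spec)
    return provider == "ollama"
```

A wrapper script could, for example, refuse to proceed with real student letters unless `is_local_model(spec)` is true.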

The quality of the form filling depends on the quality and size of the
model you choose. Smaller models such as `mistral` (4 GB) may
hallucinate many of the form field IDs, leaving the form mostly blank.
For a much larger (41 GB) model, try `ollama/qwen2:72b`.
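
One way to make this failure mode visible is to compare the field IDs the model returns with the IDs actually present on the page. The sketch below is a hypothetical check, not something formalyzer is documented to do, and it assumes the model's reply has already been parsed into a dict of field ID to value.

```python
def split_known_fields(proposed, page_field_ids):
    """Separate model-proposed values into usable ones and hallucinated field IDs.

    proposed:        dict of field ID -> value parsed from the model's reply
                     (a hypothetical format).
    page_field_ids:  the IDs actually present in the form.
    """
    known = set(page_field_ids)
    usable = {fid: value for fid, value in proposed.items() if fid in known}
    hallucinated = sorted(fid for fid in proposed if fid not in known)
    return usable, hallucinated


def fill_fraction(usable, page_field_ids):
    """Fraction of the page's fields that received a value."""
    known = set(page_field_ids)
    return len(set(usable) & known) / len(known) if known else 0.0
```

A low fill fraction, or a long list of hallucinated IDs, is a sign that a larger model is needed for that form.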

## Developer Guide

### Install formalyzer in Development mode

``` sh
# install the formalyzer package in development (editable) mode
$ pip install -e .

# make changes under the nbs/ directory
# ...

# export the notebooks (nbdev_prepare also runs the tests and cleans
# the notebooks) so the changes apply to the formalyzer package
$ nbdev_prepare
```

### Documentation

Documentation is hosted on this GitHub
[repository](https://github.com/drscotthawley/formalyzer)’s
[pages](https://drscotthawley.github.io/formalyzer/). Package-manager-specific
instructions are also available on
[conda](https://anaconda.org/drscotthawley/formalyzer) and
[PyPI](https://pypi.org/project/formalyzer/).

## TODO:

- Test with a less-than-superlative recommendation letter, to make sure
  it is not just always selecting the top rating(s).
- Enable switching from the Anthropic API to a local LLM and/or the
  Copilot API (if possible).
