Metadata-Version: 2.4
Name: data401-nlp
Version: 0.0.7
Summary: Interactive NLP course labs for Jupyter, Colab, and Deepnote
Home-page: https://github.com/su-dataAI/data401-nlp
Author: Lisa D Harper
Author-email: Lisa D Harper <lisa.harper@su.edu>
License: Apache-2.0
Project-URL: Homepage, https://github.com/su-dataAI/data401-nlp
Project-URL: Documentation, https://su-dataAI.github.io/data401-nlp
Project-URL: Repository, https://github.com/su-dataAI/data401-nlp
Project-URL: Bug Tracker, https://github.com/su-dataAI/data401-nlp/issues
Project-URL: Course Website, https://www.notion.so/Intro-to-Natural-Language-Processing-28b213a83886806982a5c03b425595c4
Keywords: nbdev,jupyter,nlp,education,spacy,transformers
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Education
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore>=1.8.16
Requires-Dist: nbformat>=5.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: httpx>=0.27.0
Requires-Dist: lisette>=0.0.15
Requires-Dist: dialoghelper>=0.1.6
Requires-Dist: spacy<3.9,>=3.7
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: nbdev>=2.3.0; extra == "dev"
Provides-Extra: nlp
Requires-Dist: spacy<3.9,>=3.7; extra == "nlp"
Requires-Dist: nltk<3.10,>=3.9; extra == "nlp"
Provides-Extra: transformers
Requires-Dist: transformers<4.50,>=4.45; extra == "transformers"
Requires-Dist: torch<2.5,>=2.1; extra == "transformers"
Provides-Extra: api
Requires-Dist: fastapi<0.128,>=0.100; extra == "api"
Requires-Dist: pydantic<3,>=2.0; extra == "api"
Provides-Extra: all
Requires-Dist: spacy<3.9,>=3.7; extra == "all"
Requires-Dist: nltk<3.10,>=3.9; extra == "all"
Requires-Dist: transformers<4.50,>=4.45; extra == "all"
Requires-Dist: torch<2.5,>=2.1; extra == "all"
Requires-Dist: fastapi<0.128,>=0.100; extra == "all"
Requires-Dist: pydantic<3,>=2.0; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python

# README


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

The [course website](https://www.notion.so/Intro-to-Natural-Language-Processing-28b213a83886806982a5c03b425595c4?source=copy_link)
hosts lecture materials, assignments, quizzes, and related resources.
You will need an API key to submit notebooks; it will be sent to you by
email.

This site contains Jupyter notebooks, data, and other code artifacts
associated with this course.

### Choosing a Notebook Environment

Most work will not require GPUs. You can most likely complete the labs
without one, unless you specifically want to use a GPU.

#### Google Colab - single notebook experience

If you:

- prefer working within a single notebook
- are already comfortable with Google Colab
- don’t mind reinstalling dependencies after a restart
- need access to GPUs

you may prefer Google Colab.

#### Deepnote

If you prefer:

- easy installation and more persistent dependencies
- a large number of system integrations
- DataFrame charts, interactive widgets, dashboards, app deployment
- real-time collaboration

you may prefer Deepnote.

You will need to create a free account and then request an education
plan. To use GPUs or higher-performance machines, you must add a payment
method, but you do not need to upgrade the plan.

All students will be given links to Deepnote for labs.

#### Local JupyterLab / Notebook

If you are already comfortable with Jupyter in your local environment and:

- want full control of your machine and environment
- want persistent dependencies
- don’t mind managing your environment yourself

you may prefer local Jupyter. The downside is that there is no GPU
access unless you know how to set up something like a [remote Modal
function that uses a GPU](https://modal.com/docs/guide/jupyter-notebooks).

## Installation

### For Students (Google Colab)

To use Colab and submit for credit:

- Download a notebook from GitHub
- Upload a local copy of the notebook to Colab
- Save a copy in Drive
- **Ensure the file name matches the variable NOTEBOOK_NAME in the
  section “Submit Notebook for Credit”**.

Saving to Drive and matching the filename are only required if you are
submitting for credit.

You will need to add `SUBMIT_API_KEY` to your environment variables.

### For Deepnote

Every week, a new link to a Deepnote project will be posted. At least
the first time you click the link, you will be asked to log in or sign
up to see the project. If you sign up, you’ll get a free 14-day trial of
the Team plan, and from there you can request the education plan.

- When the project opens, click Duplicate (top right).
- This creates your own private copy of the lab.
- You will need to add `SUBMIT_API_KEY` to your environment variables.
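
After adding the variable, a quick check from a notebook cell can confirm
that the kernel actually sees it. This is a minimal sketch using only the
standard library; `missing_env_vars` is a hypothetical name and is not
part of the course package.

```python
import os


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars(["SUBMIT_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("SUBMIT_API_KEY is set.")
```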

### For Local Development

If you want to run notebooks locally:

``` bash
# Clone the repository
git clone https://github.com/su-dataAI/data401-nlp.git
cd data401-nlp

# If you don't have uv, install it first:
# curl -LsSf https://astral.sh/uv/install.sh | sh   (macOS/Linux)
# or, as a fallback: pip install uv

# Create a virtual environment using uv (requires Python 3.11+)
# To use Python 3.13+, you will need to relax the torch pin to torch>=2.1,<2.6
uv venv --python 3.11

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows (run this instead of the line above):
# .venv\Scripts\activate

# Install with all dependencies
uv pip install -e ".[dev,all]"

# Download spaCy model
python -m spacy download en_core_web_sm

# Start Jupyter Lab
jupyter lab

# Add a .env file (in the repository root or the nbs folder)
```

You will need to `git pull` when each new lab is posted.

### Installation Options

The package supports flexible installation based on your needs:

``` bash
# Minimal installation (core utilities only)
pip install data401-nlp

# With NLP tools (spaCy, NLTK)
# (the quotes keep shells such as zsh from interpreting the brackets)
pip install "data401-nlp[nlp]"

# With transformers and PyTorch
pip install "data401-nlp[transformers]"

# With API support (FastAPI, Pydantic)
pip install "data401-nlp[api]"

# Everything (recommended for students)
pip install "data401-nlp[all]"
```
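
To see which optional extras ended up in your environment, a check along
these lines can help. It is a sketch under the assumption that each extra
maps to the top-level modules named in the comments above (spaCy and NLTK,
transformers and PyTorch, FastAPI and Pydantic). `EXTRA_MODULES` and
`available_extras` are hypothetical names, not part of the package.

```python
from importlib.util import find_spec

# Modules pulled in by each optional extra (assumed from the install options).
EXTRA_MODULES = {
    "nlp": ["spacy", "nltk"],
    "transformers": ["transformers", "torch"],
    "api": ["fastapi", "pydantic"],
}


def available_extras() -> dict[str, bool]:
    """Map each extra to True when all of its modules can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra}: {'installed' if ok else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays
fast even when PyTorch is installed.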

### Platform Support

✅ Google Colab  
✅ Deepnote  
✅ Jupyter Lab  
✅ Local Python 3.11+

### Helper Modules

The package includes several helper modules to make your NLP work
easier:

- `data401_nlp.helpers.env` - Environment detection and API key loading
- `data401_nlp.helpers.spacy` - Automatic spaCy model management
- `data401_nlp.helpers.submit` - Assignment submission utilities
- `data401_nlp.helpers.llm` - LLM integration helpers

The helper libraries may be updated as the course proceeds.
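
As an illustration of what spaCy model management can involve, the sketch
below checks whether `en_core_web_sm` is installed and, if not, builds the
same download command used in the local setup. It is not the code in
`data401_nlp.helpers.spacy`, whose API is not documented here.
`model_installed` and `download_command` are hypothetical names, and the
check relies on spaCy pipelines being installed as importable Python
packages.

```python
import sys
from importlib.util import find_spec


def model_installed(name: str = "en_core_web_sm") -> bool:
    """spaCy pipelines install as regular packages, so find_spec can see them."""
    return find_spec(name) is not None


def download_command(name: str = "en_core_web_sm") -> list[str]:
    """Build `python -m spacy download <name>` for the current interpreter."""
    return [sys.executable, "-m", "spacy", "download", name]


if not model_installed():
    # Pass this list to subprocess.run(..., check=True) to actually download.
    print(" ".join(download_command()))
```

Using `sys.executable` keeps the download tied to the interpreter the
notebook kernel is running, which matters when several environments exist.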

## Contents

| Lab | Deepnote | GitHub |
|----|----|----|
| Intro (Jan 15) | [![Open in Deepnote](https://img.shields.io/badge/Open%20in%20Deepnote-1f6feb?logo=deepnote&style=flat-square.png)](https://deepnote.com/workspace/su-dataAI-5a86547a-7f7f-4c70-b15f-c51d900fa78f/project/DATA401-lab1-d4fdadd0-4a8e-4aa2-9e06-4ecca36849d6/notebook/data401nlp-nbs-01intro-d59ef27a7c1b43dc8115dd1a68d3d643) | [![Open In GitHub](https://img.shields.io/badge/Open%20in%20GitHub-gray?logo=github&style=flat-square.png)](https://github.com/su-dataAI/data401-nlp/blob/main/nbs/01_intro.ipynb) |
| EDA (Jan 27) | [![Open in Deepnote](https://img.shields.io/badge/Open%20in%20Deepnote-1f6feb?logo=deepnote&style=flat-square.png)](https://deepnote.com/workspace/su-dataAI-5a86547a-7f7f-4c70-b15f-c51d900fa78f/project/EDA-Notebook-6dc8735f-323c-49e8-9378-7c682c293600?utm_content=6dc8735f-323c-49e8-9378-7c682c293600) | [![Open In GitHub](https://img.shields.io/badge/Open%20in%20GitHub-gray?logo=github&style=flat-square.png)](https://github.com/su-dataAI/data401-nlp/blob/main/nbs/02_eda_spacy.ipynb) |
| Regex (Feb 3) | [![Open in Deepnote](https://img.shields.io/badge/Open%20in%20Deepnote-1f6feb?logo=deepnote&style=flat-square.png)](https://deepnote.com/workspace/su-dataAI-5a86547a-7f7f-4c70-b15f-c51d900fa78f/project/Regex-Notebook-a3e0b565-a83c-44e2-bb31-88ada85f69db?utm_content=a3e0b565-a83c-44e2-bb31-88ada85f69db) | [![Open In GitHub](https://img.shields.io/badge/Open%20in%20GitHub-gray?logo=github&style=flat-square.png)](https://github.com/su-dataAI/data401-nlp/blob/main/nbs/03_regex.ipynb) |
| EDA part 2 (Feb 3; optional) | [![Open in Deepnote](https://img.shields.io/badge/Open%20in%20Deepnote-1f6feb?logo=deepnote&style=flat-square.png)](https://deepnote.com/workspace/su-dataAI-5a86547a-7f7f-4c70-b15f-c51d900fa78f/project/EDA-part-2-Notebook-fe4bf541-b906-4dc8-99d3-0a6df1989c1b?utm_content=fe4bf541-b906-4dc8-99d3-0a6df1989c1b) | [![Open In GitHub](https://img.shields.io/badge/Open%20in%20GitHub-gray?logo=github&style=flat-square.png)](https://github.com/su-dataAI/data401-nlp/blob/main/nbs/03_eda_spacy.ipynb) |
