Metadata-Version: 2.1
Name: seq_tool
Version: 0.0.4
Summary: A Python package implementing the Generalized Sequential Pattern (GSP) algorithm with concurrency support.
Author-email: Peyton Lyons <plyons14@fordham.edu>, Jonathan Mele <jmele3@fordham.edu>, Michael Sluck <msluck@fordham.edu>, Cody Chen <cchen187@fordham.edu>, Fiza Metla <fmetla@fordham.edu>, James Guest <jguest2@fordham.edu>, Mario Marku <mmarku@fordham.edu>
License: MIT License
        
        Copyright (c) 2024 Fordham EDM Lab
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://www.cis.fordham.edu/edmlab/software/course-sequence-analysis
Project-URL: Bug Tracker, https://github.com/Fordham-EDM-Lab/CSAT/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bottleneck==1.3.7
Requires-Dist: numexpr==2.8.7
Requires-Dist: numpy==1.25.2
Requires-Dist: pandas==2.2.1
Requires-Dist: python-dateutil==2.8.2
Requires-Dist: pytz==2024.1
Requires-Dist: setuptools==68.2.2
Requires-Dist: six==1.16.0
Requires-Dist: wheel==0.41.2
Requires-Dist: dateparser==1.2.0

# seq_tool

**seq_tool** is a Python package that implements the Generalized Sequential Pattern (GSP) algorithm. Originally developed as part of the Course Sequencing Analysis Tool (CSAT) to analyze and sequence student course data, it has since been extended to support more general use cases. It is designed for applications where analyzing sequential patterns is essential, such as course sequencing or other ordered event data.

The package supports grouping items based on a specified granularity using concurrency and provides both a command-line interface (CLI) and a graphical user interface (GUI).

## Features

- **GSP Algorithm**: Analyze sequential patterns using the Generalized Sequential Pattern (GSP) algorithm.
- **Granularity-Based Grouping**: Use concurrency to group items by a specified time granularity, such as semesters (quarters) or months.
- **Command-Line Interface**: Run the GSP algorithm from the terminal for efficient scripting and automation.
- **Graphical User Interface**: Easily configure and run the algorithm using an interactive graphical interface.
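
To illustrate what granularity-based grouping can look like, here is a minimal sketch using only the standard library. It assumes that "concurrency" means treating items that fall in the same time period as one concurrent itemset. The function names (`period_key`, `group_concurrent`) and the record layout are hypothetical and are not part of the seq_tool API.

```python
from collections import defaultdict
from datetime import date


def period_key(d: date, granularity: str) -> tuple:
    """Map a date to a sortable period label for the chosen granularity."""
    if granularity == "month":
        return (d.year, d.month)
    if granularity == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    raise ValueError(f"unsupported granularity: {granularity}")


def group_concurrent(records, granularity="month"):
    """Build one time-ordered list of itemsets per entity.

    `records` is an iterable of (entity, item, date) tuples (assumed layout).
    Items that share a period are placed in the same itemset.
    """
    buckets = defaultdict(lambda: defaultdict(set))
    for entity, item, d in records:
        buckets[entity][period_key(d, granularity)].add(item)
    return {
        entity: [frozenset(periods[key]) for key in sorted(periods)]
        for entity, periods in buckets.items()
    }


# Illustrative records only; not taken from the example datasets.
records = [
    ("s1", "BIO101", date(2023, 9, 5)),
    ("s1", "CHEM101", date(2023, 9, 20)),
    ("s1", "BIO201", date(2024, 1, 15)),
    ("s2", "CHEM101", date(2023, 10, 2)),
]
sequences = group_concurrent(records, "quarter")
```

With quarterly granularity, the two courses `s1` took in September 2023 end up in the same itemset, followed by a second itemset for January 2024.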

## Installation

Install from the command line via PyPI:
```bash
pip install seq-tool
```

## Usage

### Command-Line Interface

You can run the GSP algorithm using the CLI. Here’s an example:

```bash
seq-cli -i data.csv -s 50,100 -c BIO,CHEM --mode separate -o results --concurrency
```

For more detailed instructions and examples, please refer to the [CSAT Manual](https://docs.google.com/document/d/1yb6dg26jO_m0ir80vgfoN9ED0RF3bohMhJi0B3aig8w/edit?usp=sharing).

### Graphical User Interface

Launch the GUI for an easy-to-use interface:

```bash
seq-gui
```

The GUI allows you to:
- Load your data file.
- Set support thresholds and categories.
- Group items based on granularity (e.g., semester or month).
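
For readers unfamiliar with support thresholds, the sketch below shows one common way to count the support of a candidate pattern: the number of sequences that contain the pattern's itemsets in order. It is a simplified illustration that ignores the gap and window constraints GSP can also apply, and the helper names (`contains`, `support`) are hypothetical rather than seq_tool functions.

```python
def contains(sequence, pattern):
    """Return True if each itemset of `pattern` is a subset of some itemset
    in `sequence`, with the matches occurring in the same order."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(sequences, pattern):
    """Count how many sequences contain the pattern."""
    return sum(1 for seq in sequences if contains(seq, pattern))


# Illustrative data: each sequence is a time-ordered list of itemsets.
sequences = [
    [frozenset({"A", "B"}), frozenset({"C"})],
    [frozenset({"A"}), frozenset({"C"})],
    [frozenset({"C"}), frozenset({"A"})],
]
pattern = [frozenset({"A"}), frozenset({"C"})]

min_support = 2  # assumed threshold, expressed as a count of sequences
is_frequent = support(sequences, pattern) >= min_support
```

Here "A followed by C" is contained in the first two sequences but not the third, where the order is reversed, so it meets a minimum support of 2.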

## Requirements

- Python 3.10 or later
- Dependencies are automatically installed when you run `pip install seq-tool`.

## Data Requirements

To understand the required data format, refer to the [Data Dictionary](https://docs.google.com/spreadsheets/d/19fIA5eiZxCav0MiElDoTDvuyinyYroxuJF9LWmQxvNc/edit?usp=sharing).

### Example Datasets

Example datasets for testing and exploring the CSAT are available [here on Google Drive](https://drive.google.com/drive/folders/1hyjKf69IY1wbkWwSl0AzG-wJTITOXlIW?usp=sharing).

## Development Roadmap

- **Current**: Profiling runtime and exploring ways to improve the algorithm's performance on large datasets, such as parallel execution.
- **Future**: Determining how to include time information (possibly the time span) in the results so that the output is easier to interpret.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
