Metadata-Version: 2.4
Name: arman-bio-msa
Version: 0.1.0
Summary: Simple Multiple Sequence Alignment using Needleman-Wunsch Dynamic Programming
Author: Arman Shafiee
License-Expression: MIT
Project-URL: Homepage, https://github.com/armanshafiee/arman-bio-msa
Keywords: bioinformatics,sequence-alignment,needleman-wunsch,dynamic-programming,msa
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# arman-bio-msa

Simple Multiple Sequence Alignment using Needleman-Wunsch Dynamic Programming.

**Student Number:** 221201931  
**Algorithm Assignment:** `221201931 % 4 = 3` → Dynamic Programming

---

## Why Dynamic Programming?

Each student is assigned an algorithm based on `student_number % 4`.
Since `221201931 % 4 = 3`, the assigned algorithm is **Dynamic Programming**.

This package implements the **Needleman-Wunsch** algorithm, a classic dynamic
programming method for global sequence alignment in bioinformatics.

## What is Multiple Sequence Alignment?

Multiple Sequence Alignment (MSA) means lining up three or more biological
sequences (DNA, RNA, or protein) so that similar characters appear in the same
column. Gaps (`-`) are inserted where needed. This helps scientists find
evolutionary relationships and conserved regions across species.
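To make the "same column" idea concrete, here is a small sketch that reads an alignment column by column and reports the positions where every sequence agrees. The helper names `alignment_columns` and `conserved_positions` are hypothetical and are not part of this package; the input is the alignment shown under Example Output.

```python
def alignment_columns(aligned):
    """Return the columns of an alignment as tuples, one per position."""
    lengths = {len(row) for row in aligned}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return list(zip(*aligned))


def conserved_positions(aligned, gap="-"):
    """Indices of columns where every sequence has the same non-gap character."""
    return [
        i
        for i, col in enumerate(alignment_columns(aligned))
        if len(set(col)) == 1 and col[0] != gap
    ]


aligned = ["AGCTG", "ACGTG", "AG-TC"]
for i, col in enumerate(alignment_columns(aligned)):
    print(i, "".join(col))
print("conserved:", conserved_positions(aligned))  # conserved: [0, 3]
```

Only columns 0 (`AAA`) and 3 (`TTT`) are identical across all three sequences in this example.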

## What is Needleman-Wunsch?

Needleman-Wunsch is a dynamic programming algorithm that finds the **optimal
global alignment** between two sequences. It works in three steps:

1. **Initialize** a score matrix with gap penalties
2. **Fill** each cell with the best of three options: diagonal (match or mismatch), up, or left (a gap in one of the sequences)
3. **Traceback** from the bottom-right corner to build the aligned sequences
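The three steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual `needleman_wunsch` source: the name `nw_align` is hypothetical, the return shape only mirrors the `(matrix, aligned1, aligned2)` tuple shown under Usage, and the tie-breaking order in the traceback (diagonal first, then up, then left) is an assumption.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score_matrix, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1] on the diagonal.
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 1: initialize the first row and column with cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: fill each cell with the best of diagonal, up, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # diagonal: match or mismatch
                score[i - 1][j] + gap,            # up: gap in b
                score[i][j - 1] + gap,            # left: gap in a
            )

    # Step 3: trace back from the bottom-right corner to the top-left.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))


matrix, a1, a2 = nw_align("ATCG", "ACG")
print(a1)              # ATCG
print(a2)              # A-CG
print(matrix[-1][-1])  # 1
```

The bottom-right cell of the matrix holds the optimal alignment score. Both time and memory grow with `len(a) * len(b)`, which is fine for short teaching examples.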

For multiple sequences, this package uses **progressive alignment**: align the
first two sequences, then add each remaining sequence one by one.
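One possible way to "add" a sequence to an existing alignment is to treat each existing column as a single unit and score it against the new character by summing the pairwise scores. The sketch below takes that approach. It is an assumption about how progressive alignment could be done, not a description of this package's `multiple_alignment`; the names `pair_score`, `add_sequence`, and `progressive_align` are hypothetical, and the output may differ from what the package produces.

```python
GAP = "-"


def pair_score(x, y, match, mismatch, gap):
    """Score one pair of aligned symbols; two gaps facing each other cost nothing."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch


def add_sequence(rows, seq, match=1, mismatch=-1, gap=-2):
    """Align seq to an existing alignment (a list of equal-length strings)."""
    cols = list(zip(*rows))  # existing alignment, column by column
    n, m, k = len(cols), len(seq), len(rows)

    def col_vs(col, ch):
        # Sum of pairwise scores for placing symbol ch under an existing column.
        return sum(pair_score(x, ch, match, mismatch, gap) for x in col)

    # Same initialize/fill scheme as pairwise Needleman-Wunsch, on columns.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + col_vs(cols[i - 1], GAP)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + k * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + col_vs(cols[i - 1], seq[j - 1]),
                score[i - 1][j] + col_vs(cols[i - 1], GAP),  # gap in new sequence
                score[i][j - 1] + k * gap,                   # new all-gap column
            )

    # Traceback, building the widened columns from right to left.
    new_cols = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + col_vs(cols[i - 1], seq[j - 1])
        ):
            new_cols.append(cols[i - 1] + (seq[j - 1],))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + col_vs(cols[i - 1], GAP):
            new_cols.append(cols[i - 1] + (GAP,))
            i -= 1
        else:
            new_cols.append((GAP,) * k + (seq[j - 1],))
            j -= 1
    if not new_cols:  # every input was empty
        return [""] * (k + 1)
    new_cols.reverse()
    return ["".join(row) for row in zip(*new_cols)]


def progressive_align(sequences, **scoring):
    """Start from the first sequence and add the remaining ones one by one."""
    rows = [sequences[0]]
    for seq in sequences[1:]:
        rows = add_sequence(rows, seq, **scoring)
    return rows


for row in progressive_align(["AGCTG", "ACGTG", "AGTC"]):
    print(row)
```

With only two sequences this reduces to ordinary pairwise Needleman-Wunsch. Like any progressive scheme, the result depends on the order in which sequences are added, and gaps introduced early are never revisited.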

## Scoring System

| Event    | Score |
|----------|-------|
| Match    | +1    |
| Mismatch | -1    |
| Gap      | -2    |

Scores are customizable via function parameters (for example, the `match`, `mismatch`, and `gap` arguments of `needleman_wunsch`).
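As a worked example of the table, the sketch below adds up the score of an already aligned pair, column by column. The function name `alignment_score` is hypothetical and not part of this package; its defaults are the values from the table above.

```python
def alignment_score(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two aligned, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap        # one side is a gap
        elif x == y:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


# 3 matches and 2 mismatches: 3 * (+1) + 2 * (-1) = 1
print(alignment_score("AGCTG", "ACGTG"))  # 1
# 3 matches and 1 gap: 3 * (+1) + 1 * (-2) = 1
print(alignment_score("ATCG", "A-CG"))    # 1
# Same alignment with custom scores: 3 * (+2) + 1 * (-3) = 3
print(alignment_score("ATCG", "A-CG", match=2, mismatch=-1, gap=-3))  # 3
```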

---

## Installation

### Install from PyPI (after publishing):

```bash
pip install arman-bio-msa
```

### Install from TestPyPI:

```bash
pip install -i https://test.pypi.org/simple/ arman-bio-msa
```

### Install locally for development:

```bash
git clone https://github.com/armanshafiee/arman-bio-msa.git
cd arman-bio-msa
pip install -e .
```

---

## Usage

### Run the demo:

```bash
python main.py
```

### Use in your own code:

```python
from bio_msa import needleman_wunsch, multiple_alignment

# Pairwise alignment
matrix, aligned1, aligned2 = needleman_wunsch("AGCTG", "ACGTG")
print(aligned1)  # AGCTG
print(aligned2)  # ACGTG

# Multiple alignment
sequences = ["AGCTG", "ACGTG", "AGTC"]
result = multiple_alignment(sequences)
for seq in result:
    print(seq)
```

### Custom scoring:

```python
matrix, a1, a2 = needleman_wunsch("ATCG", "ACG", match=2, mismatch=-1, gap=-3)
```

---

## Example Output

```
Input Sequences:
  Seq1: AGCTG
  Seq2: ACGTG
  Seq3: AGTC

FINAL MULTIPLE ALIGNMENT
  Seq1: AGCTG
  Seq2: ACGTG
  Seq3: AG-TC
```

---

## Building and Publishing to PyPI

### Step 1: Install build tools

```bash
python -m pip install --upgrade build twine
```

### Step 2: Build the package

```bash
python -m build
```

This creates a source distribution (`.tar.gz`) and a wheel (`.whl`) in the `dist/` folder.

### Step 3: Upload to TestPyPI (for testing)

```bash
python -m twine upload --repository testpypi dist/*
```

### Step 4: Test installation from TestPyPI

```bash
pip install -i https://test.pypi.org/simple/ arman-bio-msa
```

### Step 5: Upload to real PyPI

```bash
python -m twine upload dist/*
```

### Step 6: Install from PyPI

```bash
pip install arman-bio-msa
```

---

## Project Structure

```
arman-bio-msa/
├── bio_msa/
│   ├── __init__.py        # Package exports
│   └── aligner.py         # Needleman-Wunsch algorithm
├── examples/
│   └── example_usage.py   # Usage examples
├── main.py                # Run the demo
├── pyproject.toml         # PyPI package config
├── README.md              # This file
├── report_draft.md        # Project report
├── LICENSE                # MIT License
└── .gitignore
```

## License

MIT License - see [LICENSE](LICENSE) for details.
