Metadata-Version: 2.4
Name: malwareDetector
Version: 0.2.0
Summary: Base framework for building malware detectors
Project-URL: Homepage, https://github.com/louiskyee/malwareDetector
Project-URL: Documentation, https://github.com/louiskyee/malwareDetector/wiki
Project-URL: Repository, https://github.com/louiskyee/malwareDetector.git
Author-email: PO-LIN LAI <bolin8017@gmail.com>
License: MIT
License-File: LICENCE.txt
Keywords: detector,framework,malware,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: pydantic-settings<3.0,>=2.0
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: structlog<25.0,>=24.0
Requires-Dist: typer<1.0,>=0.9
Provides-Extra: dev
Requires-Dist: mypy<2.0,>=1.8; extra == 'dev'
Requires-Dist: pytest-cov<5.0,>=4.0; extra == 'dev'
Requires-Dist: pytest<9.0,>=8.0; extra == 'dev'
Requires-Dist: ruff<1.0,>=0.1; extra == 'dev'
Description-Content-Type: text/markdown

# malware-detector

A base framework for building malware detectors with modern Python.

* Source code: https://github.com/louiskyee/malwareDetector.git
* Wiki: https://github.com/louiskyee/malwareDetector/wiki
* PyPI: https://pypi.org/project/malware-detector/

## Features

- **Pydantic v2 Configuration** - Type-safe config with env vars and file support
- **Typer CLI** - Extensible command-line interface via factory function
- **Structured Logging** - Console and JSON output formats with structlog
- **Customizable Pipeline** - Define your own stages or use defaults
- **Type Hints** - Full typing support with py.typed marker

## Requirements

| Tool | Version |
|------|---------|
| Python | >= 3.12 |

## Installation

```bash
pip install malware-detector
```

Or with [uv](https://github.com/astral-sh/uv):

```bash
uv add malware-detector
```

## Quick Start

### Basic Usage

```python
from malware_detector import BaseDetector

class MyDetector(BaseDetector):
    """My custom malware detector."""

    def stage_extract(self):
        self.log.info("extracting_features", input=str(self.config.path.input))
        # Extract features from dataset
        return self.config.folder.feature

    def stage_vectorize(self):
        # Convert features to vectors
        return self.config.folder.vectorize

    def stage_train(self):
        # Train the model
        return self.config.folder.model

    def stage_predict(self):
        # Run predictions
        return self.config.path.output


# Run the detector
detector = MyDetector()
detector.setup()  # Creates directories
results = detector.run()  # Runs all stages
```

### Run Specific Stages

```python
# Run only extract and vectorize
results = detector.run(stages=["extract", "vectorize"])
```

### Custom Pipeline

```python
class ClusteringDetector(BaseDetector):
    """Detector with custom pipeline stages."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self):
        ...

    def stage_embed(self):
        ...

    def stage_cluster(self):
        ...

    def stage_export(self):
        ...
```
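
This README does not describe how `run()` turns stage names into method calls. One plausible reading is that each name in `default_stages` maps to a `stage_<name>` method. The sketch below illustrates that idea with a stand-in class; `MiniDetector` and `MiniClustering` are hypothetical and are not the library's `BaseDetector`.

```python
from __future__ import annotations

from typing import Any


class MiniDetector:
    """Hypothetical stand-in for BaseDetector; shows name-based dispatch only."""

    default_stages: list[str] = ["extract", "vectorize", "train", "predict"]

    def run(self, stages: list[str] | None = None) -> dict[str, Any]:
        """Call stage_<name> for each requested stage, in order."""
        selected = self.default_stages if stages is None else stages
        results: dict[str, Any] = {}
        for name in selected:
            method = getattr(self, f"stage_{name}", None)
            if not callable(method):
                raise ValueError(f"no method stage_{name} for stage {name!r}")
            results[name] = method()
        return results


class MiniClustering(MiniDetector):
    """Overrides default_stages, mirroring the custom pipeline above."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self) -> str:
        return "preprocessed"

    def stage_embed(self) -> str:
        return "embedded"

    def stage_cluster(self) -> str:
        return "clustered"

    def stage_export(self) -> str:
        return "exported"


print(MiniClustering().run(stages=["embed", "cluster"]))
# {'embed': 'embedded', 'cluster': 'clustered'}
```

Under this assumption, a misspelled stage name fails fast with a `ValueError` instead of being skipped silently.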

## Configuration

### Custom Config

```python
from pydantic_settings import SettingsConfigDict
from malware_detector import BaseDetector, BaseDetectorConfig

class MyConfig(BaseDetectorConfig):
    """Custom configuration with additional fields."""

    model_config = SettingsConfigDict(
        env_prefix="MY_DETECTOR_",
    )

    batch_size: int = 32
    model_name: str = "default"
    use_gpu: bool = True


class MyDetector(BaseDetector):
    config_class = MyConfig

    def stage_train(self):
        self.log.info("training", batch_size=self.config.batch_size)
        ...
```

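### Environment Variables

Settings can also be supplied through environment variables. The examples below use the `MALWARE_DETECTOR_` prefix, and a double underscore (`__`) separates nesting levels, so `PATH__INPUT` sets `path.input`. A config subclass that sets its own `env_prefix`, such as `MY_DETECTOR_` above, reads variables with that prefix instead.
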
```bash
export MALWARE_DETECTOR_CLASSIFY=true
export MALWARE_DETECTOR_PATH__INPUT="./my_dataset"
```
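
The mapping from variable names to nested settings can be illustrated with a stdlib-only sketch. It assumes the `MALWARE_DETECTOR_` prefix and the `__` nesting delimiter seen in the example above; `env_to_nested` is a hypothetical helper, not part of this package, and it leaves values as strings, whereas the real config class also validates and converts types.

```python
from typing import Any


def env_to_nested(env: dict[str, str], prefix: str, delimiter: str = "__") -> dict[str, Any]:
    """Group prefixed variables into a nested dict, splitting keys on the delimiter."""
    nested: dict[str, Any] = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue  # unrelated variable
        parts = key[len(prefix):].lower().split(delimiter)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


example_env = {
    "MALWARE_DETECTOR_CLASSIFY": "true",
    "MALWARE_DETECTOR_PATH__INPUT": "./my_dataset",
    "HOME": "/home/user",  # ignored: no matching prefix
}
settings = env_to_nested(example_env, prefix="MALWARE_DETECTOR_")
print(settings)
# {'classify': 'true', 'path': {'input': './my_dataset'}}
```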

### Config File

Save as `config.toml`:

```toml
# Top-level keys must come before the first [table] header
classify = false

[path]
input = "./Dataset/program"
output = "./Predict/predict.json"

[folder]
dataset = "./Dataset/"
feature = "./Feature/"
```

## CLI Integration

### Create CLI for Your Detector

```python
from malware_detector import create_cli
from my_detector import MyDetector

app = create_cli(MyDetector)

# Add custom commands
@app.command()
def evaluate():
    """Evaluate the trained model."""
    ...

if __name__ == "__main__":
    app()
```

### CLI Usage

```bash
# Generate default config
python -m my_detector init --output config.toml

# Run full pipeline
python -m my_detector run --config config.toml

# Run specific stages
python -m my_detector run --stages extract,vectorize

# JSON logging for production
python -m my_detector run --log-format json --log-level DEBUG
```

## Logging

```python
from malware_detector import configure_logging, get_logger

# Configure at startup
configure_logging(level="INFO", format="console")

# Get a logger
log = get_logger(__name__)
log.info("event_name", key="value", count=42)
```

Output formats:

```text
# Console (development)
2024-01-19T10:30:00 [info] event_name    key=value count=42

# JSON (production)
{"event": "event_name", "key": "value", "count": 42, "timestamp": "..."}
```

## Migration from v0.1.x

| v0.1.x | v0.2.0 |
|--------|--------|
| `from malwareDetector.detector import detector` | `from malware_detector import BaseDetector` |
| `class MyDetector(detector)` | `class MyDetector(BaseDetector)` |
| `def extractFeature(self)` | `def stage_extract(self)` |
| `def vectorize(self)` | `def stage_vectorize(self)` |
| `def model(self, training)` | `def stage_train(self)` |
| `def predict(self)` | `def stage_predict(self)` |
| `config.json()` | `config.model_dump_json()` |
| `Config.parse_raw(data)` | `Config.model_validate_json(data)` |

## License

MIT
