Metadata-Version: 2.4
Name: fileslicer
Version: 0.1.0
Summary: Efficient file slicing and memory-mapped line iteration.
Project-URL: Documentation, https://github.com/FlavioAmurrioCS/fileslicer#readme
Project-URL: Issues, https://github.com/FlavioAmurrioCS/fileslicer/issues
Project-URL: Source, https://github.com/FlavioAmurrioCS/fileslicer
Author-email: Flavio Amurrio <25621374+FlavioAmurrioCS@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE.txt
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.9
Provides-Extra: tests
Requires-Dist: pytest; extra == 'tests'
Requires-Dist: tomli; (python_version < '3.11') and extra == 'tests'
Provides-Extra: types
Requires-Dist: mypy; extra == 'types'
Requires-Dist: pyrefly; extra == 'types'
Requires-Dist: pyright[nodejs]; extra == 'types'
Requires-Dist: pytest; extra == 'types'
Requires-Dist: tomli; (python_version < '3.11') and extra == 'types'
Requires-Dist: ty; extra == 'types'
Requires-Dist: typing-extensions; extra == 'types'
Description-Content-Type: text/markdown

# fileslicer

[![PyPI - Version](https://img.shields.io/pypi/v/fileslicer.svg)](https://pypi.org/project/fileslicer)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/fileslicer.svg)](https://pypi.org/project/fileslicer)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/FlavioAmurrioCS/fileslicer/main.svg)](https://results.pre-commit.ci/latest/github/FlavioAmurrioCS/fileslicer/main)

---

**fileslicer** is a lightweight Python library for reading and splitting large files with memory mapping. It lets you iterate over the lines of a file slice and split a file into chunks without loading the whole file into memory, which suits workloads built around very large files.

---

## Features

- Memory-efficient line iteration using `mmap`.
- Split large files into chunks while respecting newline boundaries.
- Simple and Pythonic API.
- Works with files of arbitrary size.
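
To make the first feature concrete, here is a minimal standard-library sketch of line iteration over a byte range with `mmap`. It is an illustration of the general technique, not fileslicer's actual implementation; the helper name `iter_lines_in_range` and its parameters are hypothetical.

```python
import mmap
from collections.abc import Iterator


def iter_lines_in_range(path: str, start: int, end: int) -> Iterator[bytes]:
    """Yield each line (newline included) that begins inside [start, end).

    Hypothetical helper for illustration only. Assumes a non-empty file,
    because mmap cannot map a zero-length file.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.seek(start)
        # A line that starts before `end` is yielded whole, even if it
        # runs past `end`; newline-aligned ranges avoid that overlap.
        while mm.tell() < end:
            line = mm.readline()
            if not line:  # reached end of file
                break
            yield line
```

Only the pages the loop actually touches are brought into memory by the operating system, so the whole file is never read up front.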

---

## Installation

Install via pip:

```bash
pip install fileslicer
```

---

## Usage

### Basic Example: Iterate over a file

```python
from fileslicer import FileSlice

# Create a FileSlice for an entire file
# (named file_slice to avoid shadowing the built-in `slice`)
file_slice = FileSlice.from_file("large_file.txt")

# Iterate over lines in the slice
for line in file_slice.iter_lines():
    print(line.decode().strip())
```

### Split a File into Chunks

```python
from fileslicer import FileSlice

# Split a file into 4 chunks
chunks = FileSlice.split_file("large_file.txt", splits=4)

for chunk in chunks:
    print(f"Processing bytes {chunk.start_offset}-{chunk.end_offset}")
    for line in chunk.iter_lines():
        print(line.decode().strip())
```

### Create a Custom File Slice

```python
from fileslicer import FileSlice

# Only read bytes 1000 to 5000
# (named file_slice to avoid shadowing the built-in `slice`)
file_slice = FileSlice("large_file.txt", 1000, 5000)

for line in file_slice.iter_lines():
    print(line.decode().strip())
```

---

## API

### `FileSlice`

* `FileSlice(file_path: str, start_offset: int, end_offset: int)`: Represents a slice of a file.

* `iter_lines() -> Generator[bytes]`: Iterate over lines in the file slice as bytes.

* `@staticmethod from_file(file_path: str) -> FileSlice`: Create a `FileSlice` covering the entire file.

* `@staticmethod split_file(file_path: str, splits: int) -> list[FileSlice]`: Split a file into multiple slices, aligned to newline boundaries.
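
As a rough illustration of what "aligned to newline boundaries" can mean, the sketch below places provisional cut points at equal byte intervals and then moves each one forward to just past the next newline. This is one plausible approach written with the standard library only, not necessarily how `split_file` works internally; `newline_aligned_offsets` is a hypothetical name.

```python
import mmap
import os


def newline_aligned_offsets(path: str, splits: int) -> list[tuple[int, int]]:
    """Return contiguous (start, end) byte ranges that each end after a newline.

    Hypothetical helper for illustration only. May return fewer than
    `splits` ranges when the file has too few lines.
    """
    size = os.path.getsize(path)
    if size == 0 or splits < 1:
        return []  # mmap cannot map an empty file
    ranges: list[tuple[int, int]] = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = 0
        for i in range(1, splits + 1):
            if start >= size:
                break
            if i == splits:
                end = size  # the last range always runs to end of file
            else:
                # Provisional cut at an equal share of the file, then
                # advance to the byte just after the next newline.
                target = max(start, size * i // splits)
                newline_at = mm.find(b"\n", target)
                end = size if newline_at == -1 else newline_at + 1
            ranges.append((start, end))
            start = end
    return ranges
```

Because every range begins where the previous one ended, no line is split across two ranges and none is read twice.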

---

## Why Use fileslicer?

Reading an extremely large file in one go can be slow and memory-intensive. **fileslicer** uses memory mapping to slice and iterate over file data without reading everything into memory. Inspired by the "1 Billion Row Challenge" in Python, it is a good fit for data processing pipelines, log analysis, and ETL tasks.

---

## License

`fileslicer` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.
