Metadata-Version: 2.4
Name: gwseq_io_pp
Version: 0.0.3
Summary: Process BBI (bigWig/bigBed) and HiC files
Author-email: Arthur Gouhier <ajgouhier@gmail.com>
License-Expression: MIT
Project-URL: Repository, https://github.com/ajgouhier/gwseq_io
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Dynamic: license-file

## Installation

```
pip install gwseq-io-pp
```

Requires numpy.

## Usage

### Open bigWig and bigBed files

```python
import gwseq_io
reader = gwseq_io.open(path, zoom_correction=1/3)
```

Parameters:
- `zoom_correction` Scaling factor for automatic zoom level selection based on bin size. Only for bigWig files. 1/3 by default.
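
The automatic selection that `zoom_correction` scales (described under the `zoom` parameter of `read_signal`) can be sketched as below. This is only an illustration, not the library's code: the `pick_zoom_level` helper is hypothetical, and it assumes that a zoom level's bin size is its reduction level and that the comparison is strict.

```python
def pick_zoom_level(reduction_levels, bin_size, zoom_correction=1/3):
    """Return the index of the coarsest zoom level whose reduction level is
    below bin_size * zoom_correction, or -1 to fall back to the full data."""
    threshold = bin_size * zoom_correction
    best_index, best_level = -1, 0
    for index, level in enumerate(reduction_levels):
        # Keep the largest level that is still strictly below the threshold.
        if best_level < level < threshold:
            best_index, best_level = index, level
    return best_index


levels = [10, 40, 160, 640]  # hypothetical reduction levels, in bp
print(pick_zoom_level(levels, bin_size=1))    # -1: no level is fine enough
print(pick_zoom_level(levels, bin_size=200))  # 1: level 40 is below 200/3
```

With a smaller `zoom_correction`, the threshold drops and finer zoom levels (or the full data) are preferred.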

Attributes for bigWig and bigBed files:
- `main_header` General file formatting info.
- `zoom_headers` Zoom levels info (reduction level and location).
- `auto_sql` BED entries declaration (only in bigBed).
- `total_summary` Statistical summary of entire file values (coverage, sums and extremes).
- `chr_sizes` Chromosome IDs and sizes.
- `type` Either "bigwig" or "bigbed".

### Read bigWig and bigBed signal

```python
values = reader.read_signal(chr_ids, starts, ends)
values = reader.read_signal(chr_ids, starts=starts, span=span)
values = reader.read_signal(chr_ids, ends=ends, span=span)
values = reader.read_signal(chr_ids, centers=centers, span=span)
```

Parameters:
- `chr_ids` `starts` `ends` `centers` Chromosome IDs, starts, ends and centers of locations. Specify either both `starts` and `ends`, or exactly one of `starts`, `ends` or `centers` together with `span`.
- `span` Reading window in bp relative to the location `starts`, `ends` or `centers`. When `span` is given, exactly one of these references may be specified. Unset by default.
- `bin_size` Reading bin size in bp. May vary in output if locations have variable spans or `bin_count` is specified. 1 by default.
- `bin_count` Output bin count. Inferred as max location span / bin size by default.
- `bin_mode` Method to aggregate bin values. Either "mean", "sum" or "count". "mean" by default.
- `full_bin` Extend location ends to cover overlapping bins if true. Disabled by default.
- `def_value` Default value to use when no data overlap a bin. 0 by default.
- `zoom` BigWig zoom level to use. Use full data if -1. If -2, auto-detect the best level by selecting the largest level whose bin size is lower than a third of `bin_size` (this may be the full data). Full data by default.

Returns a numpy float32 array of shape (locations, bin count).
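
To make the roles of `bin_size`, `bin_mode` and `def_value` concrete, here is a minimal NumPy sketch of binning one location. It is not the library's implementation: the `bin_signal` helper and its `covered` mask are hypothetical, and it assumes that "mean" averages only the bases that carry data and that "count" is the number of such bases.

```python
import numpy as np


def bin_signal(values, covered, bin_size, bin_mode="mean", def_value=0.0):
    """Aggregate per-base values into consecutive bins of bin_size bp.

    values  : per-base signal for one location
    covered : boolean mask, True where the file holds data for that base
    """
    bin_count = len(values) // bin_size
    used = bin_count * bin_size  # drop a trailing partial bin
    v = np.asarray(values, dtype=np.float32)[:used].reshape(bin_count, bin_size)
    m = np.asarray(covered, dtype=bool)[:used].reshape(bin_count, bin_size)

    counts = m.sum(axis=1)                # covered bases per bin
    sums = np.where(m, v, 0).sum(axis=1)  # sum over covered bases only
    if bin_mode == "sum":
        out = sums
    elif bin_mode == "count":
        out = counts
    elif bin_mode == "mean":
        out = sums / np.maximum(counts, 1)  # avoid dividing by zero
    else:
        raise ValueError(f"unknown bin_mode: {bin_mode!r}")
    # Bins without any data receive the default value.
    return np.where(counts > 0, out, def_value).astype(np.float32)


values = [1, 1, 3, 3, 0, 0, 5, 5]
covered = [True, True, True, True, False, False, True, False]
print(bin_signal(values, covered, bin_size=2))  # [1. 3. 0. 5.]
```

Stacking one such row per location gives an array of shape (locations, bin count).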

### Quantify bigWig and bigBed signal

```python
values = reader.quantify(chr_ids, starts, ends)
```

Parameters:
- `chr_ids` `starts` `ends` `centers` `span` `bin_size` `full_bin` `def_value` `zoom` Identical to `read_signal` method.
- `reduce` Method to aggregate values over span. Either "mean", "sd", "sem", "sum", "count", "min" or "max". "mean" by default.

Returns a numpy float32 array of shape (locations).
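
The `reduce` options can be pictured as collapsing the bin axis of a (locations, bins) array. The sketch below is an illustration under assumptions, not the library's code: `reduce_over_span` is a hypothetical helper, "sd" is taken to be the sample standard deviation (`ddof=1`), and "count" is taken to be the number of bins.

```python
import numpy as np


def reduce_over_span(binned, reduce="mean"):
    """Collapse a (locations, bins) array to one value per location."""
    binned = np.asarray(binned, dtype=np.float64)
    n = binned.shape[1]

    def sd(a):
        # Assumption: sample standard deviation (ddof=1).
        return a.std(axis=1, ddof=1)

    reducers = {
        "mean": lambda a: a.mean(axis=1),
        "sd": sd,
        "sem": lambda a: sd(a) / np.sqrt(n),  # standard error of the mean
        "sum": lambda a: a.sum(axis=1),
        "count": lambda a: np.full(a.shape[0], n),
        "min": lambda a: a.min(axis=1),
        "max": lambda a: a.max(axis=1),
    }
    return reducers[reduce](binned).astype(np.float32)


binned = [[1, 2, 3], [4, 4, 4]]
print(reduce_over_span(binned, "mean"))  # [2. 4.]
print(reduce_over_span(binned, "sd"))    # [1. 0.]
```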

### Profile bigWig and bigBed signal

```python
values = reader.profile(chr_ids, starts, ends)
```

Parameters:
- `chr_ids` `starts` `ends` `centers` `span` `bin_size` `bin_count` `bin_mode` `full_bin` `def_value` `zoom` Identical to `read_signal` method.
- `reduce` Method to aggregate values over locations. Either "mean", "sd", "sem", "sum", "count", "min" or "max". "mean" by default.

Returns a numpy float32 array of shape (bin count).
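
Conceptually, a profile aggregates across locations rather than across bins. The following NumPy sketch uses made-up values in the (locations, bin count) layout documented for `read_signal`; that `profile` is computed this way internally is an assumption.

```python
import numpy as np

# Hypothetical binned signal: 3 locations x 4 bins.
signal = np.array(
    [[0, 1, 2, 1],
     [0, 3, 6, 3],
     [0, 2, 4, 2]],
    dtype=np.float32,
)

# Aggregate over the location axis to get one value per bin.
profile_mean = signal.mean(axis=0)
profile_max = signal.max(axis=0)

print(profile_mean)  # [0. 2. 4. 2.]
print(profile_max)   # [0. 3. 6. 3.]
```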

### Read bigBed entries

```python
values = reader.read_entries(chr_ids, starts, ends)
```

Parameters:
- `chr_ids` `starts` `ends` `centers` `span` Identical to `read_signal` method.

Returns a list with one item per location, each item being a list of entries (dicts with at least "chr", "start" and "end" keys).
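
As an illustration of working with this nested structure, the sketch below uses hand-written data in place of a real `read_entries` result. The "name" key and all values are hypothetical; extra keys in a real file depend on its `auto_sql` declaration.

```python
# Stand-in for the result of read_entries on two locations.
entries_per_location = [
    [
        {"chr": "chr1", "start": 100, "end": 200, "name": "peakA"},
        {"chr": "chr1", "start": 150, "end": 400, "name": "peakB"},
    ],
    [],  # a location with no overlapping entry
]

# Number of entries found at each location.
entry_counts = [len(entries) for entries in entries_per_location]

# Summed entry length (bp) at each location.
entry_lengths = [
    sum(entry["end"] - entry["start"] for entry in entries)
    for entries in entries_per_location
]

print(entry_counts)   # [2, 0]
print(entry_lengths)  # [350, 0]
```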

### Convert bigWig to bedGraph or WIG

```python
reader.to_bedgraph(output_path)
reader.to_wig(output_path)
```

Parameters:
- `output_path` Path to output file.
- `chr_ids` Only extract data from these chromosomes. All by default.
- `zoom` Zoom level to use. Use full data if -1. Full data by default.
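
For reference, a bedGraph data line holds four tab-separated fields: chromosome, 0-based start, end (exclusive) and value. The helper below is a hypothetical sketch of that layout; how `to_bedgraph` actually formats values is not specified here.

```python
def format_bedgraph_line(chrom, start, end, value):
    """Format one interval as a tab-separated bedGraph data line."""
    # "g" formatting drops trailing zeros; the library may format differently.
    return f"{chrom}\t{start}\t{end}\t{value:g}"


print(format_bedgraph_line("chr1", 0, 100, 1.5))  # chr1	0	100	1.5
```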

### Convert bigBed to BED

```python
reader.to_bed(output_path)
```

Parameters:
- `output_path` `chr_ids` Identical to `to_bedgraph` and `to_wig` methods.
- `col_count` Only write this number of columns (e.g., 3 for chr, start and end). All by default.
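
The effect of `col_count` can be pictured as keeping the leading columns of each BED line. The helper below is a hypothetical sketch of that behaviour on a tab-separated line, not the library's code.

```python
def truncate_bed_line(line, col_count=None):
    """Keep only the first col_count tab-separated columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    if col_count is not None:
        fields = fields[:col_count]
    return "\t".join(fields)


full_line = "chr1\t100\t200\tpeakA\t960\t+"
print(truncate_bed_line(full_line, col_count=3))  # chr1	100	200
```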
