Metadata-Version: 2.1
Name: bibxml2
Version: 1.1.5
Summary: A simple converter of MARCXML/PICAXML to CSV/TSV/parquet
Home-page: https://github.com/hsci-r/bibxml2
License: MIT
Keywords: MARCXML,PICA XML,bibliographic data,data conversion
Author: Eetu Mäkelä
Author-email: eetu.makela@helsinki.fi
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: click (>=8.2.1)
Requires-Dist: fsspec (>=2025.5.1,<2026.0.0)
Requires-Dist: hsciutil (>=0.1.2)
Requires-Dist: lxml (>=5.4.0)
Requires-Dist: pyarrow (>=20.0.0,<21.0.0)
Requires-Dist: s3fs (>=2025.5.1,<2026.0.0)
Requires-Dist: tqdm (>=4.67.1)
Project-URL: Repository, https://github.com/hsci-r/bibxml2
Description-Content-Type: text/markdown

# bibxml2

A simple converter of (possibly compressed) MARCXML/PICAXML to (possibly compressed) CSV/TSV/parquet.

The resulting CSV/TSV/parquet is designed to be easy to use as a data table, while retaining all ordering information from the original for cases where it is needed. The format is as follows:
`record_number,field_number,subfield_number,field_code,subfield_code,value`

Here, `record_number` identifies the MARC/PICA+ record, while `field_number` and `subfield_number` can be used for more exact filtering / reconstructing the original field structure/order if needed.
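
As an illustration of how the numbering columns can be used, the following minimal sketch rebuilds the field structure from rows in this format using only the Python standard library. The sample rows are invented, and both the 1-based numbering and the absence of a header row are assumptions made for the example, not documented behaviour of the tool.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["record_number", "field_number", "subfield_number",
           "field_code", "subfield_code", "value"]

# Invented sample rows (assumption: no header row, 1-based numbering).
SAMPLE = """\
1,1,1,245,a,An example title
1,1,2,245,b,a subtitle
1,2,1,700,a,"Doe, Jane"
2,1,1,245,a,Another title
"""


def rebuild_fields(lines):
    """Return {record_number: {field_number: (field_code, [(subfield_code, value), ...])}}."""
    rows = [dict(zip(COLUMNS, row)) for row in csv.reader(lines)]
    # Sorting on the three numbering columns restores the original order
    # even if the table has been shuffled or filtered in the meantime.
    rows.sort(key=lambda r: (int(r["record_number"]),
                             int(r["field_number"]),
                             int(r["subfield_number"])))
    records = defaultdict(dict)
    for r in rows:
        fields = records[int(r["record_number"])]
        _code, subfields = fields.setdefault(int(r["field_number"]),
                                             (r["field_code"], []))
        subfields.append((r["subfield_code"], r["value"]))
    return dict(records)


records = rebuild_fields(io.StringIO(SAMPLE))
print(records[1][1])
```

If a real output file does start with a header row, skip it before passing the lines to `rebuild_fields`.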

For MARC data fields, `ind1` and `ind2` values are reported as separate rows with the `subfield_code` being `i_1` or `i_2`, but only when non-empty.
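
To make the indicator convention concrete, here is a small sketch that separates indicator rows from ordinary subfield rows for a single field. The sample rows are invented, and the `subfield_number` values given to the indicator rows are an assumption; `split_indicators` is a hypothetical helper, not part of bibxml2.

```python
import csv
import io

# Invented rows for one MARC data field (assumption: no header row).
SAMPLE = """\
1,5,1,245,i_1,1
1,5,2,245,i_2,0
1,5,3,245,a,An example title
"""


def split_indicators(rows):
    """Split the rows of one field into (indicators, subfields)."""
    # Empty indicators produce no row, so default both to the empty string.
    indicators = {"i_1": "", "i_2": ""}
    subfields = []
    for _record, _field, _subfield, _field_code, subfield_code, value in rows:
        if subfield_code in indicators:
            indicators[subfield_code] = value
        else:
            subfields.append((subfield_code, value))
    return indicators, subfields


indicators, subfields = split_indicators(csv.reader(io.StringIO(SAMPLE)))
print(indicators, subfields)
```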

## Installation

Install from PyPI, e.g. with `pipx install bibxml2`.

## Usage

```sh
Usage: marcxml2 [OPTIONS] [INPUT]...

  Convert from MARCXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.
```

```sh
Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.
```

If the output file extension is `.parquet`, the output is written in parquet format, compressed with `zstd`, with field types chosen to be maximally compatible with the common R and Python ecosystems. Otherwise, the output is CSV, or TSV if the output filename contains `.tsv`. Compressed files are read and written when the filename ends with a compression identifier recognised by fsspec.
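
The filename rules above can be summarised in a short sketch. This is only an illustration of the rules as described here, not the tool's actual implementation, and `infer_output_format` is a hypothetical name.

```python
def infer_output_format(filename: str) -> str:
    """Return 'parquet', 'tsv' or 'csv' following the rules described above."""
    if filename.endswith(".parquet"):
        return "parquet"
    # TSV is selected when '.tsv' appears anywhere in the name,
    # so compressed names such as 'out.tsv.gz' also count.
    if ".tsv" in filename:
        return "tsv"
    return "csv"


for name in ["out.parquet", "out.tsv.gz", "out.csv.gz", "out.txt"]:
    print(name, "->", infer_output_format(name))
```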

