Metadata-Version: 2.1
Name: bibxml2
Version: 1.1.0
Summary: A simple converter of MARCXML/PICAXML to CSV/TSV/parquet
Home-page: https://github.com/hsci-r/bibxml2
License: MIT
Keywords: MARCXML,PICA XML,bibliographic data,data conversion
Author: Eetu Mäkelä
Author-email: eetu.makela@helsinki.fi
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: click (>=8.2.1)
Requires-Dist: fsspec (>=2025.5.1,<2026.0.0)
Requires-Dist: hsciutil (>=0.1.2)
Requires-Dist: lxml (>=5.4.0)
Requires-Dist: pyarrow (>=20.0.0,<21.0.0)
Requires-Dist: s3fs (>=2025.5.1,<2026.0.0)
Requires-Dist: tqdm (>=4.67.1)
Project-URL: Repository, https://github.com/hsci-r/bibxml2
Description-Content-Type: text/markdown

# marcxml2csv

A simple converter of (possibly gzipped) MARCXML/PICAXML to (possibly gzipped) CSV/TSV.

The resulting CSV/TSV is designed to be easy to use as a data table, while also retaining all ordering information from the original for when it is needed. The format is as follows:
`record_number,field_number,subfield_number,field_code,subfield_code,value`

Here, `record_number` identifies the MARC/PICA+ record, while `field_number` and `subfield_number` can be used for more exact filtering / reconstructing the original field flow if needed.

For the MARC leader and control fields, `subfield_number` will be empty.

For MARC data fields, `ind1` and `ind2` values are reported as separate rows with the `subfield_code` being `ind1` or `ind2`, but only when non-empty. They also have an empty `subfield_number`.
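
As a rough illustration of how this layout can be consumed, the following Python sketch groups rows by `record_number` and pulls out the values of one subfield. The sample rows are made up for illustration, and the sketch assumes the file has no header row; if your output starts with a header line, skip it first. The helper names `group_by_record` and `subfield_values` are hypothetical and not part of this package.

```python
import csv
import io
from collections import defaultdict

# Column order as documented above.
COLUMNS = ["record_number", "field_number", "subfield_number",
           "field_code", "subfield_code", "value"]

# Hypothetical sample rows, invented for illustration (no header row assumed).
SAMPLE = """\
1,1,,001,,12345
1,2,,245,ind1,1
1,2,1,245,a,An example title
1,2,2,245,c,An Author
2,1,,001,,67890
"""


def group_by_record(lines):
    """Group rows into {record_number: [row, ...]}, preserving file order."""
    records = defaultdict(list)
    for row in csv.DictReader(lines, fieldnames=COLUMNS):
        records[row["record_number"]].append(row)
    return records


def subfield_values(rows, field_code, subfield_code):
    """Return the values of one subfield within a single record's rows."""
    return [r["value"] for r in rows
            if r["field_code"] == field_code
            and r["subfield_code"] == subfield_code]


records = group_by_record(io.StringIO(SAMPLE))
titles = subfield_values(records["1"], "245", "a")
print(titles)
```

Since every value is read as a string, `field_number` and `subfield_number` need to be converted to integers before sorting on them; indicator and control field rows have an empty `subfield_number`.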

## Installation

Install from PyPI, e.g. with `pipx install marcxml2csv`.

## Usage

```
Usage: marcxml2csv [OPTIONS] [INPUT]...

  Convert from MARCXML (gz) input files into (gzipped) CSV/TSV

Options:
  -o, --output TEXT  Output CSV/TSV (gz) file  [required]
  --help             Show this message and exit.
```

```
Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (gz) input files into (gzipped) CSV/TSV

Options:
  -o, --output TEXT  Output CSV/TSV (gz) file  [required]
  --help             Show this message and exit.
```

Files will be read/written using gzip if the filename ends with `.gz`. TSV format will be used if the output filename contains `.tsv`, otherwise CSV will be used.
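
The same naming rules can be applied when reading the output back in your own code. The sketch below is not the tool's actual implementation. It only mirrors the rules described above using the standard library, and the helper names `open_text` and `delimiter_for` are hypothetical. It also assumes UTF-8 text and checks for `.tsv` in the base filename only.

```python
import csv
import gzip
import os
import tempfile


def open_text(path, mode):
    """Open path as UTF-8 text, using gzip when the name ends with .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8", newline="")
    return open(path, mode, encoding="utf-8", newline="")


def delimiter_for(path):
    """Tab if the filename contains .tsv, otherwise comma."""
    return "\t" if ".tsv" in os.path.basename(path) else ","


# Round trip: write one made-up row to a gzipped TSV and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.tsv.gz")
    with open_text(path, "w") as f:
        writer = csv.writer(f, delimiter=delimiter_for(path))
        writer.writerow(["1", "2", "", "001", "", "12345"])
    with open_text(path, "r") as f:
        rows = list(csv.reader(f, delimiter=delimiter_for(path)))

print(rows)
```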

