Metadata-Version: 2.4
Name: sciencebeam-judge
Version: 0.0.24
Summary: ScienceBeam Judge
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: configparser>=5.2.0
Requires-Dist: editdistance>=0.6.0
Requires-Dist: future>=0.18.2
Requires-Dist: lxml>=4.9.0
Requires-Dist: natsort>=8.0.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: regex>=2022.0.0
Requires-Dist: sciencebeam-alignment>=0.0.8
Requires-Dist: sciencebeam-utils>=0.1.4
Requires-Dist: tqdm>=4.62.0
Requires-Dist: typing-extensions>=4.0.0
Dynamic: license-file

# ScienceBeam Judge

ScienceBeam Judge implements an [evaluation](https://github.com/elifesciences/sciencebeam-judge/blob/main/docs/evaluation.md) of JATS/TEI conversion results.
It can also be configured to handle other, similar document types.

## Installation

```bash
pip install sciencebeam-judge
```

## CLI

### Evaluation to CSV

```bash
python -m sciencebeam_judge.evaluation_pipeline \
  --target-file-list=<path to target file list> \
  [--target-file-column=<column name>] \
  --prediction-file-list=<path to prediction file list> \
  [--prediction-file-column=<column name>] \
  --output-path=<output directory> \
  [--limit=<max file pair count>] \
  [--cloud] \
  [--num_workers=<number of workers>]
```

The default configuration files ([xml-mapping.conf](https://github.com/elifesciences/sciencebeam-judge/blob/main/sciencebeam_judge/resources/xml-mapping.conf),
[evaluation.conf](https://github.com/elifesciences/sciencebeam-judge/blob/main/sciencebeam_judge/resources/evaluation.conf),
[evaluation.yml](https://github.com/elifesciences/sciencebeam-judge/blob/main/sciencebeam_judge/resources/evaluation.yml))
are bundled with the package and used automatically.
They can be overridden with `--xml-mapping`, `--evaluation-config`, or `--evaluation-yaml-config`.

The output path will contain the following files:

- `results-*.csv`: The detailed evaluation of every field
- `summary-*.csv`: The overall evaluation
- `grobid-formatted-summary-*.txt`: The summary formatted à la GROBID
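
The summary files can be inspected with a few lines of Python. The following is a minimal sketch, not part of the package: the helper name `load_summary_rows` is hypothetical, and it assumes the `summary-*.csv` files are comma-separated with a header row. It makes no assumption about the column names.

```python
import csv
import glob
import os


def load_summary_rows(output_path):
    """Read every summary-*.csv in output_path into a list of dicts.

    Hypothetical helper; assumes comma-separated files with a header row.
    """
    rows = []
    pattern = os.path.join(output_path, "summary-*.csv")
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            # Each row becomes a dict keyed by whatever header the file has.
            rows.extend(csv.DictReader(f))
    return rows
```

Calling `load_summary_rows("<output directory>")` returns one dictionary per CSV row, which can then be printed or filtered as needed.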

### Extract Fields

```bash
python -m sciencebeam_judge.extract_fields \
  --xml-file=<path to xml file> \
  --fields=<comma separated list of fields>
```

## Configuration

### XML Mapping

The [xml-mapping.conf](https://github.com/elifesciences/sciencebeam-judge/blob/main/sciencebeam_judge/resources/xml-mapping.conf) configures how fields
are extracted from the XML. The default configuration contains mappings for JATS and TEI.
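
To illustrate the general idea of mapping field names to XML paths, here is a rough sketch using only the standard library. It is not the package's implementation: the mapping content, the sample XML, and the function `extract_fields_sketch` are all hypothetical, and the bundled `xml-mapping.conf` may use a different layout and richer XPath expressions.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical mapping for illustration only (one section per document type).
EXAMPLE_MAPPING = """
[jats]
title = front/article-meta/title-group/article-title
abstract = front/article-meta/abstract
"""

# Hypothetical, heavily simplified JATS-like document.
EXAMPLE_XML = """
<article>
  <front><article-meta>
    <title-group><article-title>An Example Title</article-title></title-group>
    <abstract>A short abstract.</abstract>
  </article-meta></front>
</article>
"""


def extract_fields_sketch(xml_text, mapping_text, section):
    """Return {field name: list of text values} for one mapping section."""
    parser = configparser.ConfigParser()
    parser.read_string(mapping_text)
    root = ET.fromstring(xml_text.strip())
    return {
        # itertext() also collects text nested inside child elements.
        field: ["".join(el.itertext()).strip() for el in root.findall(path)]
        for field, path in parser[section].items()
    }


fields = extract_fields_sketch(EXAMPLE_XML, EXAMPLE_MAPPING, "jats")
```

Each field maps to a list because a path may match more than one element, for example several authors or references.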

### Evaluation Configuration

The [evaluation.conf](https://github.com/elifesciences/sciencebeam-judge/blob/main/sciencebeam_judge/resources/evaluation.conf) allows further evaluation
details to be configured. For example, the *scoring type* defines how a field should be evaluated
(e.g. `string` or `list`).

An additional [evaluation.yml](https://github.com/elifesciences/sciencebeam-judge/blob/main/sciencebeam_judge/resources/evaluation.yml) serves the same
purpose but allows for more structured configuration.
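
As a conceptual illustration of why a `string` field and a `list` field might be scored differently, consider the simplified sketch below. It is an assumption made for explanation, not the scoring logic used by ScienceBeam Judge: the functions `score_string` and `score_list` are hypothetical, use exact matching only, and ignore list order.

```python
def score_string(expected, actual):
    """Exact-match score for a single-valued field: 1.0 or 0.0."""
    return 1.0 if expected.strip() == actual.strip() else 0.0


def score_list(expected, actual):
    """Precision, recall and F1 for a multi-valued field.

    Items are compared as sets, so order and duplicates are ignored.
    """
    expected_set, actual_set = set(expected), set(actual)
    true_positives = len(expected_set & actual_set)
    precision = true_positives / len(actual_set) if actual_set else 0.0
    recall = true_positives / len(expected_set) if expected_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With this sketch, a single wrong author in a list of ten lowers the list score only slightly, whereas treating the whole list as one string would count it as a complete mismatch.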
