Metadata-Version: 2.4
Name: cci-tag-scanner
Version: 2.6.2
Summary: This package provides a command line tool moles_esgf_tag to generate dataset tags for both MOLES and ESGF.
License: BSD 3
License-File: LICENSE
Author: Daniel Westwood
Author-email: daniel.westwood@stfc.ac.uk
Requires-Python: >=3.9,<4
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: ceda-directory-tree (>=1.2.0)
Requires-Dist: elasticsearch (>=8,<9)
Requires-Dist: jinja2 (>=3,<4)
Requires-Dist: netcdf4 (>=1.7.2,<2.0.0)
Requires-Dist: numpy (>=1.25,<2.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: six (>=1.16.0,<2.0.0)
Requires-Dist: tqdm (>=4.45.0,<5)
Requires-Dist: verboselogs (>=1.7)
Description-Content-Type: text/markdown

# CCI Tagger

![Static Badge](https://img.shields.io/badge/cci%20tagger%20workflow-8AD6F6)
![Current Git Release](https://img.shields.io/github/v/release/cedadev/cci-tag-scanner)
[![PyPI version](https://badge.fury.io/py/cci-tag-scanner.svg)](https://pypi.python.org/pypi/cci-tag-scanner/)

## CEDA Dependencies

![Static Badge](https://img.shields.io/badge/elasticsearch%20v8-3BBEB1)

## Overview

This package provides a command line tool moles_esgf_tag to generate dataset
tags for both MOLES and ESGF.

## Installation

Create a Python virtual environment:
**Must be Python 3**

```bash
python -m venv venv
source venv/bin/activate
```

Install the latest version of the library

```bash
git clone https://github.com/cedadev/cci-tag-scanner
cd cci-tag-scanner
pip install -e .
```

NOTE: As of 22nd Jan 2025 the `cci-tag-scanner` repository has been upgraded for use with Poetry version 2. This requires the use of an additional `requirements_fix.txt` patch while a solution for poetry dependencies in github is worked on. The above installation MUST be supplemented with:

```bash
pip install -r requirements_fix.txt
```
This is a temporary fix and will be removed when poetry is patched.

## Command Line Script

This script is to be used to check what the tagger outputs when fed with the JSON files. This can be used to build the JSON files and
check they are producing what you expect. This script also produces a moles_tags files to attach to this dataset.

### Usage

```
moles_esgf_tag [-h] (-d DATASET | -f FILE | -j JSON_FILE) [--file_count FILE_COUNT] [-v]
```

You can tag an individual dataset, or tag all the datasets listed in a file. By default a check sum will be produces for each file.

Arguments:

    -h, --help            show help message and exit

    -d DATASET, --dataset DATASET
                          the full path to the dataset that is to be tagged. This option is used to tag a single
                          dataset.

    -f FILE, --file FILE  the name of the file containing a list of datasets to process. This option is used for
                          tagging one or more datasets.

    -j, --json_file       Use the JSON file to provide a list of datasets and also provide the mappings
                          which are used by the tagging code. Useful to test datsets and specific mapping files.

    --file_count FILE_COUNT
                          how many .nc files to look at per dataset

    -v, --verbose         increase output verbosity. Add more vs to increase verbosity.


### Output

A number of files are produced as output:
*  __esgf_drs.json__ contains a list of DRS and associated files. Will also list all files which could not generate a DRS
*  __moles_tags.csv__ contains a list of dataset paths and vocabulary URLs
*  __error.log__ contains a log of errors. This is appended to on each run so if you want a clean start, you will need to delete the file.

### Examples

```bash
moles_esgf_tag -d /neodc/esacci/cloud/data/L3C/avhrr_noaa-16 -v
moles_esgf_tag -f datapath --file_count 2 -v
```

## Check tags

This code generates a directory with HTML pages which can be used to interrogate the opensearch elasticsearch indices to check that
the tags which you are expecting are being found. It also highlights files without DRSs

### Usage

```
cci_check_tags [--conf CONF] [--output OUTPUT]
```

Arguments:

        --conf          Specify the configuration file. Defaults to use %(default)s' 
                        DEFAULT: cci_tagger/conf/tag_check.conf
                        
        --output        Directory to place the output files.
                        DEFAULT: html 

### Output

* index.html            The main page listing all ECVs in the index
* ecv/<ecv_name>.html   ECV specific page. Lists all MOLES datasets in the index and displays details about them.

## Breaking Changes

### V2.0.0
- Removed default terms file
- Removed DRS version based on date


