Command line interface

The alexandria3k command can be invoked from the shell as follows.

alexandria3k: Relational interface to publication metadata

usage: alexandria3k [-h] [-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
                    [-c COLUMNS [COLUMNS ...]] [-D DEBUG [DEBUG ...]]
                    [-d DATA_SOURCE [DATA_SOURCE ...]] [-E OUTPUT_ENCODING]
                    [-F FIELD_SEPARATOR] [-H] [-i [INDEX ...]]
                    [-L LIST_SCHEMA] [-l LINKED_RECORDS] [-o OUTPUT] [-P]
                    [-p POPULATE_DB_PATH] [-Q QUERY_FILE] [-q QUERY]
                    [-R ROW_SELECTION_FILE] [-r ROW_SELECTION] [-s SAMPLE]
                    [-x EXECUTE]

Named Arguments

-a, --attach-databases

Databases to attach for the row selection query

-c, --columns

Columns to populate using table.column or table.*

-D, --debug
Output debugging information as specified by the arguments.

files-read: Output counts of data files read; link: Record linking operations; log-sql: Output executed SQL statements; perf: Output performance timings; populated-counts: Dump counts of the populated database; populated-data: Dump the data of the populated database; populated-reports: Output query results from the populated database; progress: Report progress; stderr: Log to standard error; virtual-counts: Dump counts of the virtual database; virtual-data: Dump the data of the virtual database.

Default: []

-d, --data-source
Specify data set to be processed and its source.

The following data sets are supported: ASJC [<CSV-file> | <URL>] (defaults to internal table); Crossref <container-directory>; DOAJ [<CSV-file> | <URL>] (defaults to https://doaj.org/csv); funder-names [<CSV-file> | <URL>] (defaults to https://doi.crossref.org/funderNames?mode=list); journal-names [<CSV-file> | <URL>] (defaults to http://ftp.crossref.org/titlelist/titleFile.csv); ORCID <summaries.tar.gz-file>; ROR <zip-file>.

-E, --output-encoding

Query output character encoding (use utf-8-sig for Excel)

Default: “utf-8”
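The Excel hint comes down to the byte-order mark: the utf-8-sig codec writes the same bytes as utf-8, prefixed with a three-byte marker that Excel uses to recognize the file as UTF-8. The following minimal Python sketch is independent of alexandria3k and uses a made-up output line to show the difference.

```python
import codecs

# Hypothetical one-line query result, used only for illustration.
row = "doi,title\n"

plain = row.encode("utf-8")          # default encoding: no byte-order mark
with_bom = row.encode("utf-8-sig")   # same bytes, prefixed with a UTF-8 BOM

# The BOM is the only difference between the two encodings.
print(with_bom[:3] == codecs.BOM_UTF8)
print(with_bom[3:] == plain)
```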

-F, --field-separator

Character to use for separating query output fields

Default: “,”

-H, --header

Include a header in the query output

Default: False
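To illustrate what the field separator and header options control, here is a small sketch using Python's standard csv module. It is an assumption that the tool's output resembles this; the helper name format_rows and the sample row are hypothetical, and the tool's own quoting rules may differ.

```python
import csv
import io

def format_rows(rows, separator=",", header=None):
    """Render rows as delimited text, optionally preceded by a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical query result with a comma inside one field.
rows = [("10.1000/example", "A title, with a comma")]

# With the default separator, the field containing a comma gets quoted.
comma_output = format_rows(rows)

# With a tab separator and a header, no quoting is needed for this row.
tab_output = format_rows(rows, separator="\t", header=("doi", "title"))

print(comma_output, end="")
print(tab_output, end="")
```

Choosing a separator that does not occur in the data, such as a tab, keeps the output free of quoting and easier to process with line-oriented tools.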

-i, --index

SQL expressions that select the populated rows

-L, --list-schema
List the schema of the specified database. The following names are supported: Crossref, ORCID, ROR, other, all

-l, --linked-records

Only add ORCID records that link to existing <persons> or <works>

-o, --output

Output file for query results

-P, --partition

Run the query over partitioned data slices. (Warning: arguments are run per partition.)

Default: False

-p, --populate-db-path

Populate the SQLite database in the specified path

-Q, --query-file

File containing query to run on the virtual tables

-q, --query

Query to run on the virtual tables

-R, --row-selection-file

File containing SQL expression that selects the populated rows

-r, --row-selection

SQL expression that selects the populated rows
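Because a row selection is an SQL expression that usually contains spaces and quotes, it has to reach the command as a single shell argument. The sketch below assembles such a command line in Python and lets shlex handle the quoting. The directory crossref-data, the file selected.db, the column names, and the SQL condition are placeholders assumed for illustration; only the option names come from the usage summary above.

```python
import shlex

# Assumed inputs: the data directory, database file, columns, and SQL
# condition are placeholders, not values prescribed by the documentation.
argv = [
    "alexandria3k",
    "--data-source", "Crossref", "crossref-data",
    "--populate-db-path", "selected.db",
    "--columns", "works.doi", "works.title",
    "--row-selection", "works.title like '%COVID%'",
]

# shlex.join quotes each argument so that a POSIX shell passes it unchanged.
command_line = shlex.join(argv)
print(command_line)
```

Passing the list directly to subprocess.run would avoid shell quoting altogether; the joined string is shown only because it can be pasted into a terminal.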

-s, --sample

Python expression to sample the Crossref tables (e.g. random.random() < 0.0002)

Default: “True”
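A sampling expression of this kind can be thought of as a predicate that is evaluated once per sampling decision: the default True keeps everything, while random.random() < 0.0002 keeps about 0.02% of the units. The sketch below mimics that behaviour in plain Python. It is not the tool's implementation; make_sampler is a hypothetical helper, and the granularity at which the tool applies the expression is not stated in this section.

```python
import random

def make_sampler(expression):
    """Compile a sampling expression into a zero-argument predicate."""
    code = compile(expression, "<sample>", "eval")
    # Only the random module is exposed to the expression in this sketch.
    return lambda: bool(eval(code, {"random": random}))

random.seed(42)  # fixed seed so repeated runs draw the same sample

keep_all = make_sampler("True")                       # the default value
keep_few = make_sampler("random.random() < 0.0002")   # the example above

trials = 100_000
kept = sum(keep_few() for _ in range(trials))
print(f"kept {kept} of {trials} units")
```

Seeding the generator, as done here, is one way to make a sampled run repeatable.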

-x, --execute

Operation to execute on the data. This can be one of: link-aa-base-ror (link author affiliations to base-level research organizations); link-aa-top-ror (link author affiliations to top-level research organizations); link-works-asjcs (link works with Scopus All Science Journal Classification Codes — ASJCs).