Use overview

After downloading the Crossref data, you can use alexandria3k through its Python API or as a command-line tool.

You can use alexandria3k to do the following.

  • Directly run ad hoc SQL queries on the Crossref data

  • Populate SQLite databases with Crossref, ORCID, DOAJ, and other data

    • Select a horizontal subset of Crossref records

      • Through an SQL expression

      • By sampling a subset of the 26 thousand containers in the data set

    • Select a horizontal subset of ORCID records by only loading those associated with already populated Crossref records

    • Select a vertical subset of Crossref or ORCID columns

      • Using the Table.Column or Table.* notation
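The difference between a horizontal subset (fewer rows) and a vertical subset (fewer columns) can be illustrated with plain SQL. The sketch below uses only Python's standard sqlite3 module on a toy in-memory table; it does not call alexandria3k, and the table name, column names, and the populate_subset helper are hypothetical stand-ins rather than the tool's actual schema or API.

```python
import sqlite3


def build_toy_source():
    """Create an in-memory database with a small, made-up works table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE works (doi TEXT, title TEXT, published_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO works VALUES (?, ?, ?)",
        [
            ("10.1000/a", "First", 1999),
            ("10.1000/b", "Second", 2015),
            ("10.1000/c", "Third", 2021),
        ],
    )
    return conn


def populate_subset(conn, columns, row_selection):
    """Copy selected columns (vertical subset) of the rows matching an
    SQL expression (horizontal subset) into a new table.

    Hypothetical helper for illustration only; the arguments are
    interpolated into SQL, so they must come from a trusted source.
    """
    column_list = ", ".join(columns)
    conn.execute(
        f"CREATE TABLE works_subset AS "
        f"SELECT {column_list} FROM works WHERE {row_selection}"
    )
    return conn.execute("SELECT * FROM works_subset ORDER BY doi").fetchall()


source = build_toy_source()
# Keep two of the three columns, and only the rows from 2010 onwards.
rows = populate_subset(source, ["doi", "published_year"], "published_year >= 2010")
print(rows)
```

The resulting table holds only the requested columns for the matching rows, which is why a narrow selection produces a much smaller database than the full data set.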

Populating a database can take minutes (for a small, e.g. experimental, subset), a few hours (to traverse the whole Crossref data set and obtain a few thousand records), or a couple of days (to produce a large set, e.g. by selecting some columns).

After you populate an SQLite database and create suitable indexes, SQL queries often run in seconds.
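As a rough illustration of why indexes matter, the following sketch asks SQLite for its query plan before and after creating an index. It again relies only on the standard sqlite3 module and synthetic data; the works table, its columns, and the index name works_year_idx are assumptions made for this example, not part of the alexandria3k schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (doi TEXT, published_year INTEGER)")
# Synthetic rows with years cycling through 1950..2019.
conn.executemany(
    "INSERT INTO works VALUES (?, ?)",
    ((f"10.1000/{i}", 1950 + i % 70) for i in range(10_000)),
)

QUERY = "SELECT COUNT(*) FROM works WHERE published_year = 2015"


def plan(connection):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY)
    # The fourth column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in rows)


plan_before = plan(conn)  # without an index SQLite scans the whole table
conn.execute("CREATE INDEX works_year_idx ON works (published_year)")
plan_after = plan(conn)   # with the index SQLite can search it directly

count = conn.execute(QUERY).fetchone()[0]
print("Before:", plan_before)
print("After: ", plan_after)
print("Matching rows:", count)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a table scan and the second a search using the index. On a database with millions of rows, that difference is what turns a long-running query into one that completes in seconds.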

You can find many examples of studies conducted with command-line invocations in the examples directory. Consider using the hello world (work authors by decade) example as a starting point.