Alexandria3k documentation
The alexandria3k package supplies a library and a command-line tool that provide efficient relational query access to diverse open publication data sets. The most important of these is the complete Crossref data set (157 GB compressed, 1 TB uncompressed), which contains metadata for about 134 million publications from all major international publishers, with full citation data for 60 million of them. In addition, the Crossref data set can be linked with the ORCID summary data set (25 GB compressed, 435 GB uncompressed), which contains about 78 million author records, as well as with data sets of funder bodies, journal names, open access journals, and research organizations.
The alexandria3k package installation contains all elements required to run it. It does not require the installation, configuration, and maintenance of a third-party relational or graph database. It can therefore be used out of the box for reproducible publication research on the desktop.
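To give a feel for what "relational query access" means in practice, the following is a minimal sketch, assuming the records of interest have already been populated into an SQLite database. It uses only Python's standard `sqlite3` module. The `works` table, its columns, and the DOIs are hypothetical stand-ins for illustration, not the package's documented schema or real data. The query performs the kind of grouping described in the "Count Crossref publications by year and type" example.

```python
import sqlite3

# Hypothetical miniature "works" table standing in for a populated
# database; table name, column names, and DOIs are illustrative assumptions.
connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE works (
        doi TEXT PRIMARY KEY,
        title TEXT,
        published_year INTEGER,
        type TEXT
    );
    INSERT INTO works VALUES
        ('10.1000/a', 'First paper', 2021, 'journal-article'),
        ('10.1000/b', 'Second paper', 2021, 'journal-article'),
        ('10.1000/c', 'A book chapter', 2021, 'book-chapter'),
        ('10.1000/d', 'Older paper', 2020, 'journal-article');
""")

# Count publications by year and type using plain SQL.
counts = connection.execute("""
    SELECT published_year, type, COUNT(*)
    FROM works
    GROUP BY published_year, type
    ORDER BY published_year, type
""").fetchall()

for year, work_type, count in counts:
    print(year, work_type, count)
```

Because the data end up in an ordinary relational table, any SQL aggregation, join, or filter can be applied without running a separate database server.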
Preprint and citation
Details about the rationale, design, implementation, and use of this software can be found in the following paper.
Diomidis Spinellis. Open Reproducible Systematic Publication Research. arXiv:2301.13312, January 2023. doi: 10.48550/arXiv.2301.13312
Package name derivation
The alexandria3k package is named after the Library of Alexandria; the "3k" suffix alludes to how publication data can be processed in the third millennium AD.
Contents
- Installation
- Data downloading
- Use overview
- Command line execution examples
  - Obtain list of available commands
  - Show DOI and title of all publications
  - Save DOI and title of 2021 publications in a CSV file suitable for Excel
  - Count Crossref publications by year and type
  - Sampling
  - Database of COVID research
  - Publications graph
  - Record selection from external database
  - Populate the database with author records from ORCID
  - Populate the database with journal names
  - Populate the database with funder names
  - Work with Scopus All Science Journal Classification Codes (ASJC)
  - Populate the database with data regarding open access journals
  - Populate the database with the names of research organizations
  - Link author affiliations with research organization names
- Python API examples
  - Create a Crossref object
  - Iterate through the DOI and title of all publications
  - Create a dictionary of which 2021 publications were funded by each body
  - Database of COVID research
  - Reference graph
  - Record selection from external database
  - Populate the database from ORCID
  - Populate the database with journal names
  - Populate the database with funder names
  - Populate the database with data regarding open access journals
  - Work with Scopus All Science Journal Classification Codes (ASJC)
  - Populate the database with the names of research organizations
  - Link author affiliations with research organization names
- Relational schemas
- Command line interface
- Python user API
- Python plugin API
- Python utility API
- Development processes
- Plugin development
- FAQ: Frequently asked questions