A database for atoms

ASE has its own database that can be used for storing and retrieving atoms and associated data in a compact and convenient way.

Note

This is work in progress. Use at your own risk!

There are currently three back-ends:

JSON:
Simple human-readable text file with a .json extension.
SQLite3:
Self-contained, server-less, zero-configuration database. Lives in a file with a .db extension.
PostgreSQL:
Server-based database.

The JSON and SQLite3 back-ends work “out of the box”, whereas PostgreSQL requires a server.

Databases can be queried and manipulated with the ase-db command-line tool or through a Python interface.

What’s in the database?

Every row in the database contains:

  • all the information stored in the Atoms object (positions, atomic numbers, ...)
  • calculator name and parameters (if a calculator is present)
  • already calculated properties such as energy and forces (if a calculator is present)
  • key-value pairs (for finding the calculation again)
  • an integer ID (unique within each database), starting at 1 and increasing with every new row
  • a unique ID: a 128-bit random number that should be globally unique (at least in the lifetime of our universe)
  • constraints (if present)
  • user-name
  • creation and modification time
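
The unique IDs shown in the examples on this page are 32 hexadecimal digits long, which is exactly 128 bits. As a minimal sketch of how such an identifier could be produced (the function name make_unique_id is hypothetical, and ASE's own generator may work differently):

```python
import secrets


def make_unique_id() -> str:
    """Return a random 128-bit number as a 32-character hexadecimal string."""
    # Zero-pad so that the string always has 32 hex digits (4 bits each).
    return format(secrets.randbits(128), "032x")


uid = make_unique_id()
print(uid)
```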

ase-db

The ase-db command-line tool can be used to query databases and to manipulate key-value pairs. Try:

$ ase-db --help

Example: Show all rows of SQLite database abc.db:

$ ase-db abc.db
id|age|user |formula|calculator|energy| fmax|pbc|volume|charge| mass
 1| 0s|jensj|H2     |emt       | 1.419|9.803|FFF| 1.000| 0.000|2.016
 2| 0s|jensj|H2     |emt       | 1.071|0.000|FFF| 1.000| 0.000|2.016
 3| 0s|jensj|H      |emt       | 3.210|0.000|FFF| 1.000| 0.000|1.008
Rows: 3
Keys: relaxed

Show all details for a single row:

$ ase-db abc.db relaxed=1 -l
name      |unit  |value
id        |      |2
age       |      |0.188 seconds
formula   |      |H2
user      |      |jensj
calculator|      |emt
energy    |eV    |1.07054126233
fmax      |eV/Ang|9.2526347333e-05
charge    ||e|   |0.0
mass      |au    |2.01588
unique id |      |64dc328835526a1d579aed9378bb9490
volume    |Ang^3 |1.0

Unit cell in Ang:
axis|periodic|          x|          y|          z
   1|      no|      1.000|      0.000|      0.000
   2|      no|      0.000|      1.000|      0.000
   3|      no|      0.000|      0.000|      1.000

Key-value pairs:
relaxed|True

Forces in ev/Ang:
   0|H |     0.000|    -0.000|    -0.000
   1|H |    -0.000|     0.000|     0.000

Data: abc

Querying

Here are some example query strings:

Cu                 contains copper
H<3                less than 3 hydrogen atoms
Cu,H<3             contains copper and has less than 3 hydrogen atoms
v3                 has ‘v3’ key
abc=bla-bla        has key ‘abc’ with value ‘bla-bla’
v3,abc=bla-bla     both of the above
calculator=nwchem  calculations done with NWChem
2.2<bandgap<3.0    ‘bandgap’ key has value between 2.2 and 3.0
natoms>=10         10 or more atoms
formula=H2O        exactly two hydrogens and one oxygen
id=2345            specific id
age<1h             not older than 1 hour
age>1y             older than 1 year
pbc=TTT            periodic boundary conditions along all three axes
pbc=TTF            periodic boundary conditions along the first two axes (F=False, T=True)
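
To make the syntax concrete, here is a rough sketch of how a comma-separated query string could be broken into (key, operator, value) conditions. It is not ASE's parser: parse_query and the "present" marker are names invented for this illustration, values are left as strings, and any special treatment of formulas, ages or pbc is ignored.

```python
import re

# Two-character operators are listed first so that '<=' is not read as '<'.
_OPERATORS = re.compile(r"(<=|>=|!=|<|>|=)")

# Used to turn 'low<key' into 'key>low' for chained comparisons.
_FLIPPED = {"<": ">", "<=": ">=", ">": "<", ">=": "<="}


def parse_query(query):
    """Split 'key1<value1,key2=value2,key' into (key, op, value) tuples."""
    conditions = []
    for expression in query.split(","):
        parts = _OPERATORS.split(expression.strip())
        if len(parts) == 1:
            # Bare word such as 'Cu' or 'v3': only presence is required.
            conditions.append((parts[0], "present", None))
        elif len(parts) == 3:
            key, op, value = parts
            conditions.append((key, op, value))
        elif len(parts) == 5 and parts[1] in _FLIPPED:
            # Chained form such as '2.2<bandgap<3.0'.
            low, op1, key, op2, high = parts
            conditions.append((key, _FLIPPED[op1], low))
            conditions.append((key, op2, high))
        else:
            raise ValueError(f"cannot parse {expression!r}")
    return conditions


print(parse_query("Cu,H<3"))
print(parse_query("2.2<bandgap<3.0"))
```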

These names are special:

id          integer identifier
natoms      number of atoms
pbc         periodic boundary conditions
formula     formula
energy      potential energy
charge      total charge
magmom      total magnetic moment
calculator  name of calculator
user        who did it
age         age of calculation (use s, m, h, d, w, M and y for second, minute, hour, day, week, month and year respectively)
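
As an illustration of the age units, the following sketch converts a string such as 1h or 2w to seconds. The function name age_to_seconds is hypothetical, and the lengths used for a month and a year (a 365.25-day year and one twelfth of it) are assumptions of this sketch rather than something this page specifies.

```python
import re

# Assumed length of a year in seconds (365.25 days).
YEAR = 365.25 * 24 * 3600

# Seconds per unit; note that 'm' is minute and 'M' is month.
SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 604800,
    "M": YEAR / 12,
    "y": YEAR,
}


def age_to_seconds(age):
    """Convert an age such as '1h' or '2.5d' to a number of seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhdwMy])", age.strip())
    if match is None:
        raise ValueError(f"not a valid age: {age!r}")
    number, unit = match.groups()
    return float(number) * SECONDS[unit]


print(age_to_seconds("1h"))
print(age_to_seconds("2w"))
```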

Integration with other parts of ASE

ASE’s ase.io.read() function can also read directly from databases:

>>> from ase.io import read
>>> a = read('abc.db@42')
>>> a = read('abc.db@id=42')  # same thing
>>> b = read('abc.db@v3,abc=H')
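
In these strings, the part after the @ is a selection in the query syntax described above, and a bare integer means the same as id=<integer>. One way to picture the split, as a sketch only (split_db_reference is a hypothetical helper, not part of ase.io, and the real handling may differ):

```python
def split_db_reference(reference):
    """Split 'file@selection' into (filename, selection)."""
    filename, separator, selection = reference.rpartition("@")
    if not separator:
        # No '@' present: the whole string is the file name.
        return reference, None
    if selection.isdigit():
        # A bare integer is interpreted as a row id.
        return filename, int(selection)
    return filename, selection


print(split_db_reference("abc.db@42"))
print(split_db_reference("abc.db@v3,abc=H"))
```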

Also the ase-gui program can read from databases using the same syntax.

Browse database with your web-browser

You can use your web-browser to look at and query databases like this:

$ ase-db abc.db -w
$ firefox http://0.0.0.0:5000/

Click individual rows to see details. See the CMR web-page for an example of how this works.

Python Interface

First, we connect() to the database:

>>> from ase.db import connect
>>> con = connect('abc.db')

or

>>> import ase.db
>>> con = ase.db.connect('abc.db')

Let’s do a calculation for a hydrogen molecule and write some results to a database:

>>> from ase import Atoms
>>> from ase.calculators.emt import EMT
>>> h2 = Atoms('H2', [(0, 0, 0), (0, 0, 0.7)])
>>> h2.calc = EMT()
>>> h2.get_forces()
array([[ 0.        ,  0.        , -9.80290573],
       [ 0.        ,  0.        ,  9.80290573]])

Write a row to the database with a key-value pair ('relaxed', False):

>>> con.write(h2, relaxed=False)
1

The write() method returns an integer id.

Do one more calculation and write results:

>>> from ase.optimize import BFGS
>>> BFGS(h2).run(fmax=0.01)
BFGS:   0  12:49:25        1.419427       9.8029
BFGS:   1  12:49:25        1.070582       0.0853
BFGS:   2  12:49:25        1.070544       0.0236
BFGS:   3  12:49:25        1.070541       0.0001
>>> con.write(h2, relaxed=True)
2

Loop over selected rows using the select() method:

>>> for row in con.select('relaxed'):  # all rows that have a 'relaxed' key
...     print(row.forces[0, 2], row.relaxed)
...
-9.8029057329 False
-9.2526347333e-05 True

The select() method will generate Row objects that one can loop over.

Write the energy of an isolated hydrogen atom to the database:

>>> h = Atoms('H')
>>> h.calc = EMT()
>>> h.get_potential_energy()
3.21
>>> con.write(h)
3

Select a single row with the get() method:

>>> row = con.get(relaxed=1, calculator='emt')
>>> for key in row:
...    print('{0:22}: {1}'.format(key, row[key]))
...
pbc                   : [False False False]
relaxed               : True
calculator_parameters : {}
user                  : jensj
mtime                 : 15.3439399027
calculator            : emt
ctime                 : 15.3439399027
positions             : [[ ... ]]
id                    : 2
cell                  : [[ 1.  0.  0.] [ 0.  1.  0.] [ 0.  0.  1.]]
forces                : [[ ... ]]
energy                : 1.07054126233
unique_id             : bce90ff3ea7661690b54f9794c1d7ef6
numbers               : [1 1]

Calculate the atomization energy and update() a row in the database:

>>> e2 = row.energy
>>> e1 = con.get(H=1).energy
>>> ae = 2 * e1 - e2
>>> print(ae)
5.34945873767
>>> id = con.get(relaxed=1).id
>>> con.update(id, atomization_energy=ae)
1
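
Written out with the EMT energies stored earlier (3.21 eV for the isolated atom and 1.07054126233 eV for the relaxed molecule), the atomization energy is:

```latex
E_\mathrm{at} = 2\,E(\mathrm{H}) - E(\mathrm{H_2})
              = 2 \times 3.21 - 1.07054126233
              = 5.34945873767\ \mathrm{eV}
```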

Delete a single row:

>>> del con[con.get(relaxed=0).id]

or use the delete() method to delete several rows.

Dictionary representation of rows

The first 9 keys (from “id” to “positions”) are always present; the remaining keys are optional:

key                    description                        datatype      shape
id                     Local database id                  int
unique_id              Globally unique hexadecimal id     str
ctime                  Creation time                      float
mtime                  Modification time                  float
user                   User name                          str
numbers                Atomic numbers                     int           (N,)
pbc                    Periodic boundary condition flags  bool          (3,)
cell                   Unit cell                          float         (3, 3)
positions              Atomic positions                   float         (N, 3)
initial_magmoms        Initial atomic magnetic moments    float         (N,)
initial_charges        Initial atomic charges             float         (N,)
masses                 Atomic masses                      float         (N,)
tags                   Tags                               int           (N,)
momenta                Atomic momenta                     float         (N, 3)
constraints            Constraints                        list of dict
energy                 Total energy                       float
forces                 Atomic forces                      float         (N, 3)
stress                 Stress tensor                      float         (6,)
dipole                 Electrical dipole                  float         (3,)
charges                Atomic charges                     float         (N,)
magmom                 Magnetic moment                    float
magmoms                Atomic magnetic moments            float         (N,)
calculator             Calculator name                    str
calculator_parameters  Calculator parameters              dict

Extracting Atoms objects from the database

If you want an Atoms object instead of a dictionary, you should use the get_atoms() method:

>>> h2 = con.get_atoms(H=2)

or if you want the original EMT calculator attached:

>>> h2 = con.get_atoms(H=2, attach_calculator=True)

Add additional data

When you write a row to a database using the write() method, you can add key-value pairs where the values can be strings, floating point numbers, integers and booleans:

>>> con.write(atoms, functional='LDA', distance=7.2)

More complicated data can be written like this:

>>> con.write(atoms, ..., data={'parents': [7, 34, 14], 'stuff': ...})

and accessed like this:

>>> row = con.get(...)
>>> row.data.parents
[7, 34, 14]

Row objects

There are three ways to get at the columns of a row:

  1. as attributes (row.key)
  2. indexing (row['key'])
  3. the get() method (row.get('key'))

The first two will fail if there is no key column whereas the last will just return None in that case. Use row.get('key', ...) to use another default value.
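
The difference between the three access styles can be tried out without a database. The class below is a hypothetical stand-in, not ASE's AtomsRow; in particular, which exception the real class raises for a missing column is not stated here, so the AttributeError is a property of this sketch only.

```python
class StandInRow:
    """Hypothetical stand-in for a database row (not ASE's AtomsRow)."""

    def __init__(self, **columns):
        self.__dict__.update(columns)

    def __getitem__(self, key):
        # Indexing behaves like attribute access and fails for missing keys.
        return getattr(self, key)

    def get(self, key, default=None):
        # get() never fails; it falls back to the default instead.
        return getattr(self, key, default)


row = StandInRow(id=2, relaxed=True)

# All three styles agree when the column exists.
print(row.relaxed, row["relaxed"], row.get("relaxed"))

# Only get() tolerates a missing column.
print(row.get("bandgap"))
print(row.get("bandgap", 0.0))

for access in (lambda: row.bandgap, lambda: row["bandgap"]):
    try:
        access()
    except AttributeError as error:
        print("failed:", error)
```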

class ase.db.row.AtomsRow(dct)
get(key, default=None)

Return value of key if present or default if not.

key_value_pairs

Return dict of key-value pairs.

count_atoms()

Count atoms.

Return dict mapping chemical symbol strings to number of atoms.

constraints

List of constraints.

data

Data dict.

natoms

Number of atoms.

formula

Chemical formula string.

symbols

List of chemical symbols.

fmax

Maximum atomic force.

constrained_forces

Forces after applying constraints.

smax

Maximum stress tensor component.

mass

Total mass.

volume

Volume of unit cell.

charge

Total charge.

toatoms(attach_calculator=False, add_additional_information=False)

Create Atoms object.

More details

ase.db.core.connect(name, type='extract_from_name', create_indices=True, use_lock_file=True, append=True)

Create connection to database.

name: str
Filename or address of database.
type: str
One of ‘json’, ‘db’, ‘postgresql’, ‘mysql’ (JSON, SQLite, PostgreSQL, MySQL/MariaDB). Default is ‘extract_from_name’, which will ... guess the type from the name.
use_lock_file: bool
You can turn this off if you know what you are doing ...
append: bool
Use append=False to start a new database.
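
A sketch of what guessing the type from the name could look like. The helper guess_backend is hypothetical, and the idea that server back-ends are addressed with a postgresql:// or mysql:// prefix is an assumption of this sketch; the actual rules are those implemented in connect().

```python
import os


def guess_backend(name):
    """Guess the database type from a file name or address (illustrative only)."""
    # Assumption: server back-ends are given as URL-like addresses.
    for scheme in ("postgresql", "mysql"):
        if name.startswith(scheme + "://"):
            return scheme
    # File back-ends are recognised by their extension.
    extension = os.path.splitext(name)[1].lstrip(".")
    if extension in ("json", "db"):
        return extension
    raise ValueError(f"cannot guess database type from {name!r}")


print(guess_backend("abc.db"))
print(guess_backend("abc.json"))
```
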
class ase.db.core.Database(filename=None, create_indices=True, use_lock_file=False)

Base class for all databases.

write(*args, **kwargs)

Write atoms to database with key-value pairs.

atoms: Atoms object
Write atomic numbers, positions, unit cell and boundary conditions. If a calculator is attached, write also already calculated properties such as the energy and forces.
key_value_pairs: dict
Dictionary of key-value pairs. Values must be strings or numbers.
data: dict
Extra stuff (not for searching).

Key-value pairs can also be set using keyword arguments:

connection.write(atoms, name='ABC', frequency=42.0)

Returns integer id of the new row.

reserve(*args, **kwargs)

Write empty row if not already present.

Usage:

id = conn.reserve(key1=value1, key2=value2, ...)

Write an empty row with the given key-value pairs and return the integer id. If such a row already exists, don’t write anything and return None.
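
This contract is convenient when several workers share one list of tasks: whoever reserves a combination of key-value pairs first does the work, and everyone else skips it. The toy class below only mimics the described return values in memory; ToyStore is a hypothetical stand-in and does not use ASE or a real database.

```python
class ToyStore:
    """Hypothetical in-memory stand-in that mimics the reserve() contract."""

    def __init__(self):
        self.rows = {}  # integer id -> key-value pairs

    def reserve(self, **key_value_pairs):
        if key_value_pairs in self.rows.values():
            return None  # already reserved: write nothing
        new_id = len(self.rows) + 1
        self.rows[new_id] = key_value_pairs
        return new_id


store = ToyStore()
claimed = []
for worker in ("A", "B"):  # two workers walk the same task list
    for functional in ("LDA", "PBE"):
        row_id = store.reserve(functional=functional)
        if row_id is None:
            continue  # somebody else has taken this task
        claimed.append((worker, functional, row_id))

print(claimed)
```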

get_atoms(selection=None, attach_calculator=False, add_additional_information=False, **kwargs)

Get Atoms object.

selection: int, str or list
See the select() method.
attach_calculator: bool
Attach calculator object to Atoms object (default value is False).
add_additional_information: bool
Put key-value pairs and data into Atoms.info dictionary.

In addition, one can use keyword arguments to select specific key-value pairs.

get(selection=None, **kwargs)

Select a single row and return it as a dictionary.

selection: int, str or list
See the select() method.
fancy: bool
return fancy dictionary with keys as attributes (this is the default).
select(selection=None, filter=None, explain=False, verbosity=1, limit=None, offset=0, sort=None, **kwargs)

Select rows.

Return AtomsRow iterator with results. Selection is done using key-value pairs and the special keys:

formula, age, user, calculator, natoms, energy, magmom and/or charge.
selection: int, str or list

Can be:

  • an integer id
  • a string like ‘key=value’, where ‘=’ can also be one of ‘<=’, ‘<’, ‘>’, ‘>=’ or ‘!=’.
  • a string like ‘key’
  • comma separated strings like ‘key1<value1,key2=value2,key’
  • list of strings or tuples: [(‘charge’, ‘=’, 1)].
filter: function
A function that takes as input a row and returns True or False.
explain: bool
Explain query plan.
verbosity: int
Possible values: 0, 1 or 2.
limit: int or None
Limit selection.
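
The tuple form of a selection, such as [('charge', '=', 1)], can be read as a list of comparisons that a row must all satisfy. The sketch below evaluates such a list against plain dictionaries; the matches helper and the example rows are made up for illustration, and treating a missing key as a failed comparison is a choice of this sketch. A function of this shape, taking a row and returning True or False, is also what the filter argument expects.

```python
import operator

# Map the comparison strings used in selections to Python functions.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(row, conditions):
    """Return True if the row satisfies every (key, op, value) condition."""
    return all(
        key in row and OPERATORS[op](row[key], value)
        for key, op, value in conditions
    )


rows = [
    {"id": 1, "charge": 1, "natoms": 2},
    {"id": 2, "charge": 0, "natoms": 12},
]

selected = [row["id"] for row in rows if matches(row, [("charge", "=", 1)])]
print(selected)
```
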
update(*args, **kwargs)

Update row(s).

ids: int or list of int
IDs of rows to update.
delete_keys: list of str
Keys to remove.

Use keyword arguments to add new key-value pairs.

Returns number of key-value pairs added and removed.

delete(ids)

Delete rows.