biom.table.Table.to_hdf5¶
- Table.to_hdf5(h5grp, generated_by, compress=True)¶
Store CSC and CSR in place
The resulting structure of this group is below. A few basic definitions, N is the number of observations and M is the number of samples. Data are stored in both compressed sparse row [R10] (CSR, for observation oriented operations) and compressed sparse column [R11] (CSC, for sample oriented operations).
Parameters: h5grp : h5py.Group or h5py.File
The HDF5 entity in which to write the BIOM formatted data.
generated_by : str
A description of what generated the table
compress : bool, optional
Defaults to True means fields will be compressed with gzip, False means no compression
See also
Notes
The expected HDF5 group structure is below. An example of an HDF5 file in DDL can be found here [R12].
- ./id : str, an arbitrary ID
- ./type : str, the table type (e.g, OTU table)
- ./format-url : str, a URL that describes the format
- ./format-version : two element tuple of int32, major and minor
- ./generated-by : str, what generated this file
- ./creation-date : str, ISO format
- ./shape : two element tuple of int32, N by M
- ./nnz : int32 or int64, number of non zero elems
- ./observation : Group
- ./observation/ids : (N,) dataset of str or vlen str
- ./observation/matrix : Group
- ./observation/matrix/data : (nnz,) dataset of float64
- ./observation/matrix/indices : (nnz,) dataset of int32
- ./observation/matrix/indptr : (M+1,) dataset of int32
- ./observation/metadata : Group
- [./observation/metadata/foo] : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
- ./observation/group-metadata : Group
- [./observation/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
- [./observation/group-metadata/foo.attrs[‘data_type’]] : attribute of the foo dataset that describes contained type (e.g., newick)
- ./sample : Group
- ./sample/ids : (M,) dataset of str or vlen str
- ./sample/matrix : Group
- ./sample/matrix/data : (nnz,) dataset of float64
- ./sample/matrix/indices : (nnz,) dataset of int32
- ./sample/matrix/indptr : (N+1,) dataset of int32
- ./sample/metadata : Group
- [./sample/metadata/foo] : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
- ./sample/group-metadata : Group
- [./sample/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
- [./sample/group-metadata/foo.attrs[‘data_type’]] : attribute of the foo dataset that describes contained type (e.g., newick)
The ‘?’ character on the dataset size means that it can be of arbitrary length.
The expected structure for each of the metadata datasets is a list of atomic type objects (int, float, str, ...), where the index order of the list corresponds to the index order of the relevant axis IDs. Special metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:
- taxonomy: (N, ?) dataset of str or vlen str
- KEGG_Pathways: (N, ?) dataset of str or vlen str
- collapsed_ids: (N, ?) dataset of str or vlen str
References
[R10] (1, 2) http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csr_matrix.html [R11] (1, 2) http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csc_matrix.html [R12] (1, 2) http://biom-format.org/documentation/format_versions/biom-2.0.html Examples
>>> from biom.util import biom_open >>> from biom.table import Table >>> from numpy import array >>> t = Table(array([[1, 2], [3, 4]]), ['a', 'b'], ['x', 'y']) >>> with biom_open('foo.biom', 'w') as f: ... t.to_hdf5(f, "example")