API Reference
This is the list of classes and functions available in SciDB-py.
SciDB Array Class
- class scidbpy.SciDBArray(datashape, interface, name, persistent=False)¶
SciDBArray class
It is not recommended to instantiate this class directly; use a convenience routine from SciDBInterface.
Methods
alias([name]): Return an alias of the array, optionally with a new name
approxdc([index, scidb_syntax]): Return the number of distinct values of the array or along an axis.
att(a): Return the attribute name of the array.
attribute(a): Return the attribute name of the array.
avg([index, scidb_syntax]): Return the average of the array or the average along an axis.
compress(mask[, axis]): Extract a subset of entries along a given axis, where an input mask array is non-null
contains_nulls([attr]): Return True if the array contains null values.
contents(**kwargs): Return a string representation of the array contents
copy([new_name, persistent]): Make a copy of the array in the database
count([index, scidb_syntax]): Return the count of the array or the count along an axis.
cumprod([axis]): Return the cumulative product over the array.
cumsum([axis]): Return the cumulative sum over the array.
cumulate(expression[, dimension]): Compute running operations along data (e.g., cumulative sums)
dimension(d): Return the dimension name of the array
eval([out, store]): If the array is backed by an unevaluated query, evaluate the query and store the result in the database
from_query(interface, query): Build a lazily evaluated SciDB array from a query string
groupby(by): Build a groupby object from this array
head([n]): Extract and download the first few elements in the array
issparse(): Check whether array is sparse.
max([index, scidb_syntax]): Return the maximum of the array or the maximum along an axis.
mean([index, scidb_syntax]): Return the average of the array or the average along an axis.
min([index, scidb_syntax]): Return the minimum of the array or the minimum along an axis.
nonempty(): Return the number of nonempty elements in the array.
nonnull([attr]): Return the number of non-empty and non-null values.
reap([ignore]): Delete this object from the database if it isn’t persistent.
regrid(size[, aggregate]): Regrid the array using the specified aggregate
rename(new_name[, persistent]): Rename the array in the database, optionally making the new array persistent.
reshape(shape, **kwargs): Reshape data into a new array
std([index, scidb_syntax]): Return the standard deviation of the array or along an axis.
stdev([index, scidb_syntax]): Return the standard deviation of the array or along an axis.
substitute(value): Return a new array, substituting a default for any nulls.
sum([index, scidb_syntax]): Return the sum of the array or the sum along an axis.
tail([n])
toarray([transfer_bytes]): Transfer data from database and store in a numpy array.
todataframe([transfer_bytes]): Transfer array from database and store in a local Pandas dataframe
tosparse([sparse_fmt, transfer_bytes]): Transfer array from database and store in a local sparse array.
transpose(*axes): Permute the dimensions of an array.
var([index, scidb_syntax]): Return the variance of the array or the variance along an axis.
- T
Permute the dimensions of an array.
Parameters: axes : None, tuple of ints, or n ints
- None or no argument: reverses the order of the axes.
- tuple of ints: i in the j-th place in the tuple means a's i-th axis becomes a.transpose()'s j-th axis.
- n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
Returns: out : ndarray
Copy of a, with axes suitably permuted.
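The tuple form can be read as a rule on shapes: axis j of the result takes its length from axis axes[j] of the input. The sketch below illustrates that rule on a plain shape tuple. The helper name permuted_shape is hypothetical and is not part of SciDB-py.

```python
def permuted_shape(shape, axes=None):
    """Return the shape produced by permuting axes the way transpose does.

    Hypothetical helper for illustration; it only manipulates a shape tuple.
    """
    if axes is None:
        # No argument: reverse the order of the axes.
        axes = tuple(reversed(range(len(shape))))
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of 0..ndim-1")
    # The i in the j-th place means input axis i becomes output axis j.
    return tuple(shape[i] for i in axes)


print(permuted_shape((2, 3, 4)))             # (4, 3, 2)
print(permuted_shape((2, 3, 4), (1, 0, 2)))  # (3, 2, 4)
```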
- afl¶
An alias to the AFL namespace
- alias(name=None)¶
Return an alias of the array, optionally with a new name
- approxdc(index=None, scidb_syntax=False)¶
Return the number of distinct values of the array or along an axis.
The distinct count is an estimate only.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
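The two index conventions are easy to confuse, so here is a small pure-Python sketch of the difference, using a sum over a nested list in place of a database aggregate. The helper collapsed_axes is hypothetical and only restates the rule described for scidb_syntax. The same rule applies to the other aggregates that take index and scidb_syntax.

```python
def collapsed_axes(ndim, index, scidb_syntax=False):
    """Return the axes that an aggregate collapses under each convention."""
    if index is None:
        # Default: the whole (flattened) array is aggregated.
        return tuple(range(ndim))
    if scidb_syntax:
        # SciDB convention: collapse every axis except `index`.
        return tuple(ax for ax in range(ndim) if ax != index)
    # numpy convention: collapse only the `index`-th axis.
    return (index,)


data = [[1, 2, 3],
        [4, 5, 6]]

# index=0 under the numpy convention collapses axis 0 (sums down the columns).
numpy_style = [sum(col) for col in zip(*data)]
# index=0 under the SciDB convention keeps axis 0 (sums across each row).
scidb_style = [sum(row) for row in data]

print(collapsed_axes(2, 0), numpy_style)                     # (0,) [5, 7, 9]
print(collapsed_axes(2, 0, scidb_syntax=True), scidb_style)  # (1,) [6, 15]
```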
- att(a)¶
Return the attribute name of the array.
Parameters: a : int
Index of the attribute to lookup
- attribute(a)¶
Return the attribute name of the array.
Parameters: a : int
Index of the attribute to lookup
- avg(index=None, scidb_syntax=False)¶
Return the average of the array or the average along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
- compress(mask, axis=0)¶
Extract a subset of entries along a given axis, where an input mask array is non-null
Parameters: array : SciDBArray
The array to filter
mask : SciDBArray
A 1-dimensional SciDBArray, whose non-null values indicate the entries to retain
axis : int
The axis of array along which to apply the mask. The shape of array along this axis must be the length of mask
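As a rough local analogue of the masking rule, the sketch below filters the rows of a nested list along axis 0, with None standing in for a null mask entry. The function compress_rows is a hypothetical stand-in, not the SciDB-py implementation.

```python
def compress_rows(rows, mask):
    """Keep rows[i] wherever mask[i] is not None (None stands in for null)."""
    if len(rows) != len(mask):
        raise ValueError("mask length must match the array along the axis")
    return [row for row, flag in zip(rows, mask) if flag is not None]


rows = [[0, 1], [2, 3], [4, 5]]
mask = [1, None, 1]
print(compress_rows(rows, mask))  # [[0, 1], [4, 5]]
```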
- contains_nulls(attr=None)¶
Return True if the array contains null values.
Parameters: attr : None, int, or array_like
the attribute index/indices to check. If None, then check all.
Returns: contains_nulls : boolean
- contents(**kwargs)¶
Return a string representation of the array contents
- copy(new_name=None, persistent=False)¶
Make a copy of the array in the database
Parameters: new_name : string (optional)
if specified, must be a valid array name which does not already exist in the database.
persistent : boolean (optional)
specify whether the new array is persistent (default=False)
Returns: copy : SciDBArray
return a copy of the original array
- count(index=None, scidb_syntax=False)¶
Return the count of the array or the count along an axis.
The count is equal to the number of nonnull elements.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
- cumprod(axis=None)¶
Return the cumulative product over the array.
Parameters: axis : int, optional
The axis to multiply over. The default multiplies over the flattened array
Returns: prods : SciDBArray
A new array, with the same shape (but flattened if axis=None)
- cumsum(axis=None)¶
Return the cumulative sum over the array.
Parameters: axis : int, optional
The axis to sum over. The default sums over the flattened array
Returns: sums : SciDBArray
A new array, with the same shape (but flattened if axis=None)
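For the default axis=None case, cumprod and cumsum both operate on the flattened array. A pure-Python sketch of that behaviour follows. It assumes row-major flattening, which is an assumption made for illustration rather than something stated here.

```python
from itertools import accumulate, chain
from operator import mul

data = [[1, 2, 3],
        [4, 5, 6]]

# axis=None: flatten first (row-major order assumed), then accumulate.
flat = list(chain.from_iterable(data))
flat_sums = list(accumulate(flat))
flat_prods = list(accumulate(flat, mul))

print(flat_sums)   # [1, 3, 6, 10, 15, 21]
print(flat_prods)  # [1, 2, 6, 24, 120, 720]
```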
- cumulate(expression, dimension=0)¶
Compute running operations along data (e.g., cumulative sums)
Parameters: expression : str
A valid SciDB expression
dimension : int or str (optional, default=0)
Which dimension to accumulate over
Returns: arr : SciDBArray
A new array of the same shape.
Examples
>>> x = sdb.arange(12).reshape((3, 4))
>>> x.cumulate('sum(f0)').toarray()
array([[ 0,  1,  2,  3],
       [ 4,  6,  8, 10],
       [12, 15, 18, 21]])
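The result in the example is a running sum down dimension 0. The following sketch reproduces the same numbers locally with nested lists, without a database connection. It illustrates the arithmetic only, not how SciDB evaluates the expression.

```python
from itertools import accumulate

x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]

# Accumulate down each column (dimension 0), then turn columns back into rows.
columns = [list(accumulate(col)) for col in zip(*x)]
running = [list(row) for row in zip(*columns)]

print(running)  # [[0, 1, 2, 3], [4, 6, 8, 10], [12, 15, 18, 21]]
```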
- dimension(d)¶
Return the dimension name of the array
Parameters: d : int
The index of the dimension to lookup
- eval(out=None, store=True, **kwargs)¶
If the array is backed by an unevaluated query, evaluate the query and store the result in the database
This changes array.name from a query string to a stored array name. Calling eval() on an array that is already backed by a stored array does nothing.
Parameters: out : SciDBArray (optional)
An optional pre-existing array to store the evaluation into.
- classmethod from_query(interface, query)¶
Build a lazily evaluated SciDB array from a query string
Parameters: interface : SciDBInterface
The database connection to use
query : str
The query string to wrap
Returns: array : SciDBArray
- groupby(by)¶
Build a groupby object from this array
Parameters: by : string or list of strings
Names of attributes and dimensions to group by
Returns: groups : scidbpy.aggregation.GroupBy instance
An object that can be used, e.g., to perform aggregations over each group. See scidbpy.aggregation.GroupBy documentation for more information.
- head(n=5)¶
Extract and download the first few elements in the array
Parameters: n : int (optional, default=5)
The number of elements to retrieve
Returns: head : pandas DataFrame or numpy array
The first n elements in the array, downloaded as a Pandas dataframe (if pandas is installed) or a Numpy array
- issparse()¶
Check whether array is sparse.
- max(index=None, scidb_syntax=False)¶
Return the maximum of the array or the maximum along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
- mean(index=None, scidb_syntax=False)¶
Return the average of the array or the average along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
Notes
Identical to SciDBArray.avg()
- min(index=None, scidb_syntax=False)¶
Return the minimum of the array or the minimum along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
- nonempty()¶
Return the number of nonempty elements in the array.
Nonempty refers to the sparsity of an array, and thus includes in the count elements with values which are set to NULL.
See also: nonnull()
- nonnull(attr=0)¶
Return the number of non-empty and non-null values.
This query must be done for each attribute: the default is the first attribute.
Parameters: attr : None, int or array_like
the attribute or attributes to query. If None, then query all attributes.
Returns: nonnull : array_like
the nonnull count for each attribute. The returned value is the same shape as the input attr.
See also: nonempty()
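The difference between the nonempty and nonnull counts can be illustrated with a small sparse model: a dictionary holds only the nonempty cells, and None stands in for NULL. This is a sketch of the counting rule under those assumptions, not of the database queries.

```python
# Sparse model: only nonempty cells have an entry; None stands in for NULL.
cells = {
    (0, 0): 1.5,
    (0, 2): None,   # nonempty, but null
    (1, 1): 4.0,
}

# nonempty counts every stored cell, including those set to NULL.
nonempty = len(cells)
# nonnull counts only stored cells whose value is not NULL.
nonnull = sum(1 for value in cells.values() if value is not None)

print(nonempty)  # 3
print(nonnull)   # 2
```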
- persistent¶
Controls whether the array is deleted when the database is reaped
- reap(ignore=False)¶
Delete this object from the database if it isn’t persistent.
Parameters: ignore : bool (default False)
If False and the array is persistent, reap raises an error. If True and the array is persistent, reap does nothing.
Raises: SciDBForbidden if persistent=True and ignore=False
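The interaction between persistent and ignore amounts to a three-way decision. The sketch below models it with a hypothetical function reap_action and a stand-in exception class ForbiddenError. Neither is part of SciDB-py, and no database call is made.

```python
class ForbiddenError(Exception):
    """Stand-in for the SciDBForbidden exception named above."""


def reap_action(persistent, ignore=False):
    """Return what reap would do for the given flags, per the rules above."""
    if not persistent:
        # Non-persistent arrays are deleted from the database.
        return "delete"
    if ignore:
        # Persistent array with ignore=True: leave it alone.
        return "nothing"
    # Persistent array with ignore=False: refuse.
    raise ForbiddenError("array is persistent and ignore=False")


print(reap_action(persistent=False))               # delete
print(reap_action(persistent=True, ignore=True))   # nothing
```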
- regrid(size, aggregate=u'avg')¶
Regrid the array using the specified aggregate
Parameters: size : int or tuple of ints
Specify the size of the regridding along each dimension. If a single integer, then use the same regridding along each dimension.
aggregate : string
specify the aggregation function to use when creating the new grid. Default is ‘avg’. Possible values are: [‘avg’, ‘sum’, ‘min’, ‘max’, ‘count’, ‘stdev’, ‘var’, ‘approxdc’]
Returns: A : SciDBArray
The re-gridded version of the array. The size of dimension i is ceil(self.shape[i] / size[i])
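The output-size rule can be checked locally. The sketch below computes the regridded shape and, for a one-dimensional list, the 'avg' aggregate over consecutive blocks. Both helpers are hypothetical, and averaging a short trailing block over only the values it contains is an assumption.

```python
import math


def regrid_shape(shape, size):
    """Shape after regridding: ceil(shape[i] / size[i]) along each dimension."""
    if isinstance(size, int):
        # A single integer applies the same block size to every dimension.
        size = (size,) * len(shape)
    return tuple(math.ceil(n / s) for n, s in zip(shape, size))


def regrid_avg_1d(values, size):
    """Average consecutive blocks of `size` values; the last block may be short."""
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(block) / len(block) for block in blocks]


print(regrid_shape((10, 7), 3))           # (4, 3)
print(regrid_shape((10, 7), (5, 2)))      # (2, 4)
print(regrid_avg_1d([1, 2, 3, 4, 5], 2))  # [1.5, 3.5, 5.0]
```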
- rename(new_name, persistent=False)¶
Rename the array in the database, optionally making the new array persistent.
Parameters: new_name : string
must be a valid array name which does not already exist in the database.
persistent : boolean (optional)
specify whether the new array is persistent (default=False)
Returns: self : SciDBArray
return a pointer to self
- reshape(shape, **kwargs)¶
Reshape data into a new array
Parameters: shape : tuple or int
The shape of the new array. Must be compatible with the current shape
**kwargs :
additional keyword arguments will be passed to SciDBDataShape
Returns: arr : SciDBArray
new array of the specified shape
- schema¶
Return the array schema
- std(index=None, scidb_syntax=False)¶
Return the standard deviation of the array or along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
Notes
Identical to SciDBArray.stdev()
- stdev(index=None, scidb_syntax=False)¶
Return the standard deviation of the array or along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
- substitute(value)¶
Return a new array, substituting a default for any nulls.
Parameters: value : value to replace nulls (required)
Returns: arr : SciDBArray
new non-nullable array
Notes
This is currently limited to single-attribute arrays. Use the raw AFL substitute operator for multi-attribute arrays.
- sum(index=None, scidb_syntax=False)¶
Return the sum of the array or the sum along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
- toarray(transfer_bytes=False)¶
Transfer data from database and store in a numpy array.
Parameters: transfer_bytes : DEPRECATED
Unused
Returns: arr : np.ndarray
The dense array containing the data.
Notes
If the array is backed by a query, the query is evaluated and stored in the database
- todataframe(transfer_bytes=True)¶
Transfer array from database and store in a local Pandas dataframe
For multidimensional arrays, the dimension values are added as additional columns in the dataframe.
Parameters: transfer_bytes : boolean
if True (default), then transfer data as bytes rather than as ASCII.
Returns: arr : pd.DataFrame
The dataframe object containing the data in the array.
- tosparse(sparse_fmt=u'recarray', transfer_bytes=True)¶
Transfer array from database and store in a local sparse array.
Parameters: transfer_bytes : boolean
if True (default), then transfer data as bytes rather than as ASCII. This is more accurate, but requires two passes over the data (one for indices, one for values).
sparse_fmt : string or None
Specify the sparse format to use. Available formats are:
- 'recarray' : a record array containing the indices and values for each data point. This is valid for arrays of any dimension and with any number of attributes.
- ['coo'|'csc'|'csr'|'dok'|'lil'] : a scipy sparse matrix. These are valid only for 2-dimensional arrays with a single attribute.
Returns: arr : ndarray or sparse matrix
The sparse representation of the data
- transpose(*axes)¶
Permute the dimensions of an array.
Parameters: axes : None, tuple of ints, or n ints
- None or no argument: reverses the order of the axes.
- tuple of ints: i in the j-th place in the tuple means a's i-th axis becomes a.transpose()'s j-th axis.
- n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
Returns: out : ndarray
Copy of a, with axes suitably permuted.
- var(index=None, scidb_syntax=False)¶
Return the variance of the array or the variance along an axis.
Parameters: index : int, optional
Axis along which to operate. By default, flattened input is used.
scidb_syntax : bool, optional (default=False)
If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)
Returns: A SciDB array
SciDB Interface
- scidbpy.interface.connect(url=None)¶
Connect to a SciDB instance
Parameters: url : str (optional)
Connection URL. If not provided, will fall back to the SCIDB_URL environment variable (if present), or http://127.0.0.1:8080
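The fallback order for the connection URL can be written out as a short sketch. The function resolve_url is hypothetical and only mirrors the precedence described above (explicit argument, then the SCIDB_URL environment variable, then the local default). It does not open a connection.

```python
import os

DEFAULT_URL = "http://127.0.0.1:8080"


def resolve_url(url=None):
    """Pick the connection URL in the documented order of precedence."""
    if url is not None:
        return url
    # Fall back to the SCIDB_URL environment variable, then to the default.
    return os.environ.get("SCIDB_URL", DEFAULT_URL)


print(resolve_url("http://scidb.example.org:8080"))
```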
Base Class
- class scidbpy.interface.SciDBInterface¶
Methods
acos(A): Element-wise trigonometric inverse cosine
approxdc(A[, index, scidb_syntax]): Array or axis unique element estimate.
arange([start,] stop[, step,][, dtype]): Return evenly spaced values within a given interval.
asin(A): Element-wise trigonometric inverse sine
atan(A): Element-wise trigonometric inverse tangent
avg(A[, index, scidb_syntax]): Array or axis average.
ceil(A): Element-wise ceiling function
cos(A): Element-wise trigonometric cosine
count(A[, index, scidb_syntax]): Array or axis count.
cross_join(A, B, *dims): Perform a cross-join on arrays A and B.
dot(A, B): Compute the matrix product of A and B
exp(A): Element-wise natural exponent
floor(A): Element-wise floor function
from_array(A[, instance_id]): Initialize a scidb array from a numpy array
from_dataframe(A[, instance_id]): Initialize a scidb array from a pandas dataframe
from_sparse(A[, instance_id]): Initialize a scidb array from a sparse array
identity(n[, dtype, sparse]): Return a 2-dimensional square identity matrix of size n
isnan(A): Element-wise nan test function
join(*args): Perform a series of array joins on the arguments and return the result.
linspace(start, stop[, num, endpoint, retstep]): Return evenly spaced numbers over a specified interval.
list_arrays([parsed, n]): List the arrays currently in the database
log(A): Element-wise natural logarithm
log10(A): Element-wise base-10 logarithm
max(A[, index, scidb_syntax]): Array or axis maximum.
mean(A[, index, scidb_syntax]): Array or axis mean.
merge(A, B): Merge two arrays
min(A[, index, scidb_syntax]): Array or axis minimum.
new_array([shape, dtype, persistent]): Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
ones(shape[, dtype]): Return an array of ones
query(query, *args, **kwargs): Perform a query on the database.
randint(shape[, dtype, lower, upper, persistent]): Return an array of random integers between lower and upper
random(shape[, dtype, lower, upper, persistent]): Return an array of random floats between lower and upper
reap(): Reap all arrays created via new_array
sin(A): Element-wise trigonometric sine
sqrt(A): Element-wise square root
std(A[, index, scidb_syntax]): Array or axis standard deviation.
stdev(A[, index, scidb_syntax]): Array or axis standard deviation.
substitute(A, value): Replace null values in an array
sum(A[, index, scidb_syntax]): Array or axis sum.
svd(A[, return_U, return_S, return_VT]): Compute the Singular Value Decomposition of the array A
tan(A): Element-wise trigonometric tangent
toarray(A[, transfer_bytes]): Convert a SciDB array to a numpy array
todataframe(A[, transfer_bytes]): Convert a SciDB array to a pandas dataframe
tosparse(A[, sparse_fmt, transfer_bytes]): Convert a SciDB array to a sparse representation
var(A[, index, scidb_syntax]): Array or axis variance.
wrap_array(scidbname[, persistent]): Create a new SciDBArray object that references an existing SciDB array
zeros(shape[, dtype]): Return an array of zeros
- acos(A)
Element-wise trigonometric inverse cosine
- approxdc(A, index=None, scidb_syntax=False)¶
Array or axis unique element estimate.
see SciDBArray.approxdc()
- arange([start, ]stop, [step, ]dtype=None, **kwargs)¶
Return evenly spaced values within a given interval.
Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the behavior is equivalent to the Python range function, but returns a SciDBArray rather than a list.
When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.
Parameters: start : number, optional
Start of interval. The interval includes this value. The default start value is 0.
stop : number
End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.
step : number, optional
Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.
dtype : dtype
The type of the output array. If dtype is not given, it is inferred from the type of the input arguments.
**kwargs :
Additional arguments are passed to SciDBDataShape when creating the output array.
Returns: arange : SciDBArray
Array of evenly spaced values.
For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of out being greater than stop.
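The length rule can be tried out without a database. The helper arange_length below is hypothetical. It applies ceil((stop - start)/step) directly and clamps negative results to zero, the latter being an assumption about empty intervals.

```python
import math


def arange_length(start, stop, step=1):
    """Number of elements generated in the half-open interval [start, stop)."""
    return max(0, math.ceil((stop - start) / step))


print(arange_length(0, 10, 3))       # 4 -> 0, 3, 6, 9
print(arange_length(0.0, 1.0, 0.3))  # 4 -> 0.0, 0.3, 0.6, 0.9
print(arange_length(5, 5))           # 0
```

Because the count depends on a floating-point division, a non-integer step can yield one element more or fewer than expected, which is why linspace is the safer choice in those cases.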
- asin(A)¶
Element-wise trigonometric inverse sine
- atan(A)¶
Element-wise trigonometric inverse tangent
- avg(A, index=None, scidb_syntax=False)¶
Array or axis average.
see SciDBArray.avg()
- ceil(A)¶
Element-wise ceiling function
- cos(A)¶
Element-wise trigonometric cosine
- count(A, index=None, scidb_syntax=False)¶
Array or axis count.
see SciDBArray.count()
- cross_join(A, B, *dims)¶
Perform a cross-join on arrays A and B.
Parameters: A, B : SciDBArray
*dims : tuples
The remaining arguments are tuples of dimension indices which should be joined.
- dot(A, B)¶
Compute the matrix product of A and B
Parameters: A : SciDBArray
A must be a two-dimensional matrix of shape (n, p)
B : SciDBArray
B must be a two-dimensional matrix of shape (p, m)
Returns: C : SciDBArray
The wrapper of the SciDB Array, of shape (n, m), consisting of the matrix product of A and B
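The shape rule (n, p) times (p, m) giving (n, m) is illustrated below with nested lists. The function matmul is a plain-Python stand-in for the arithmetic that dot describes and does not involve SciDB.

```python
def matmul(a, b):
    """Multiply an (n, p) matrix by a (p, m) matrix given as nested lists."""
    n, p, m = len(a), len(b), len(b[0])
    if any(len(row) != p for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


a = [[1, 2, 3],
     [4, 5, 6]]   # shape (2, 3)
b = [[1, 0],
     [0, 1],
     [1, 1]]      # shape (3, 2)
c = matmul(a, b)  # shape (2, 2)
print(c)          # [[4, 5], [10, 11]]
```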
- exp(A)¶
Element-wise natural exponent
- floor(A)¶
Element-wise floor function
- from_array(A, instance_id=0, **kwargs)¶
Initialize a scidb array from a numpy array
Parameters: A : array_like (numpy array or sparse array)
input array from which the scidb array will be created
instance_id : integer
the instance ID used in loading (default=0; see SciDB documentation)
**kwargs :
Additional keyword arguments are passed to new_array()
Returns: arr : SciDBArray
SciDB Array object built from the input array
- from_dataframe(A, instance_id=0, **kwargs)¶
Initialize a scidb array from a pandas dataframe
Parameters: A : pandas dataframe
data from which the scidb array will be created.
instance_id : integer
the instance ID used in loading (default=0; see SciDB documentation)
**kwargs :
Additional keyword arguments are passed to new_array()
Returns: arr : SciDBArray
SciDB Array object built from the input array
- from_sparse(A, instance_id=0, **kwargs)¶
Initialize a scidb array from a sparse array
Parameters: A : sparse array
sparse input array from which the scidb array will be created. Note that this array will internally be converted to COO format.
instance_id : integer
the instance ID used in loading (default=0; see SciDB documentation)
**kwargs :
Additional keyword arguments are passed to new_array()
Returns: arr : SciDBArray
SciDB Array object built from the input array
- identity(n, dtype=u'double', sparse=False, **kwargs)¶
Return a 2-dimensional square identity matrix of size n
Parameters: n : integer
the number of rows and columns in the matrix
dtype : string or list
The data type of the array
sparse : boolean
specify whether to create a sparse array (default=False)
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray containing an [n x n] identity matrix
- isnan(A)¶
Element-wise nan test function
- join(*args)¶
Perform a series of array joins on the arguments and return the result.
- linspace(start, stop, num=50, endpoint=True, retstep=False, **kwargs)¶
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval [start, stop].
The endpoint of the interval can optionally be excluded.
Parameters: start : scalar
The starting value of the sequence.
stop : scalar
The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.
num : int, optional
Number of samples to generate. Default is 50.
endpoint : bool, optional
If True, stop is the last sample. Otherwise, it is not included. Default is True.
retstep : bool, optional
If True, return (samples, step), where step is the spacing between samples.
**kwargs :
additional keyword arguments are passed to SciDBDataShape
Returns: samples : SciDBArray
There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).
step : float (only if retstep is True)
Size of spacing between samples.
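The note that the step size changes when endpoint is False follows from the number of gaps: num - 1 with the endpoint, num without it. A pure-Python sketch, using the hypothetical helper linspace_samples, makes this concrete. It returns a local list rather than a SciDBArray.

```python
def linspace_samples(start, stop, num=50, endpoint=True):
    """Return (samples, step) for num evenly spaced samples."""
    # With the endpoint included there are num - 1 gaps; without it, num gaps.
    gaps = num - 1 if endpoint else num
    step = (stop - start) / gaps if gaps > 0 else 0.0
    return [start + i * step for i in range(num)], step


print(linspace_samples(0, 3, 4))                  # ([0.0, 1.0, 2.0, 3.0], 1.0)
print(linspace_samples(0, 3, 4, endpoint=False))  # ([0.0, 0.75, 1.5, 2.25], 0.75)
```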
- list_arrays(parsed=True, n=0)¶
List the arrays currently in the database
Parameters: parsed : boolean
If True (default), then parse the results into a dictionary of array names as keys, schema as values
n : integer
the maximum number of arrays to list. If n=0, then list all
Returns: array_list : string or dictionary
The list of arrays. If parsed=True, then the result is returned as a dictionary.
- log(A)¶
Element-wise natural logarithm
- log10(A)¶
Element-wise base-10 logarithm
- max(A, index=None, scidb_syntax=False)¶
Array or axis maximum.
see SciDBArray.max()
- mean(A, index=None, scidb_syntax=False)¶
Array or axis mean.
see SciDBArray.mean()
- merge(A, B)¶
Merge two arrays
- min(A, index=None, scidb_syntax=False)¶
Array or axis minimum.
see SciDBArray.min()
- new_array(shape=None, dtype=u'double', persistent=False, **kwargs)¶
Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
Parameters: shape : int or tuple (optional)
The shape of the array to create. If not specified, no array will be created and a name will simply be reserved for later use. WARNING: if shape=None and persistent=False, an error will result when the array goes out of scope, unless the name is used to create an array on the server.
dtype : string (optional)
the datatype of the array. This is only referenced if shape is specified. Default is ‘double’.
persistent : boolean (optional)
whether the created array should be persistent, i.e. survive in SciDB past when the object wrapper goes out of scope. Default is False.
**kwargs : (optional)
If shape is specified, additional keyword arguments are passed to SciDBDataShape. Otherwise, these will not be referenced.
Returns: arr : SciDBArray
wrapper of the new SciDB array instance.
- ones(shape, dtype=u'double', **kwargs)¶
Return an array of ones
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of all ones.
- query(query, *args, **kwargs)¶
Perform a query on the database.
This wraps a query constructor which allows the creation of sophisticated SciDB queries which act on arrays wrapped by SciDBArray objects. See Notes below for details.
Parameters: query : string
The query string, with curly-braces to indicate insertions
*args, **kwargs :
Values to be inserted (see below).
- randint(shape, dtype=u'uint32', lower=0, upper=2147483647, persistent=False, **kwargs)¶
Return an array of random integers between lower and upper
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
lower : float
The lower bound of the random sample (default=0)
upper : float
The upper bound of the random sample (default=2147483647)
persistent : bool
Whether the array is persistent (default=False)
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of random integers, uniformly distributed between lower and upper.
- random(shape, dtype=u'double', lower=0, upper=1, persistent=False, **kwargs)¶
Return an array of random floats between lower and upper
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
lower : float
The lower bound of the random sample (default=0)
upper : float
The upper bound of the random sample (default=1)
persistent : bool
Whether the new array is persistent (default=False)
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of random floating point numbers, uniformly distributed between lower and upper.
- reap()¶
Reap all arrays created via new_array
- sin(A)¶
Element-wise trigonometric sine
- sqrt(A)¶
Element-wise square root
- std(A, index=None, scidb_syntax=False)¶
Array or axis standard deviation.
see SciDBArray.std()
- stdev(A, index=None, scidb_syntax=False)¶
Array or axis standard deviation.
see SciDBArray.stdev()
- substitute(A, value)¶
Replace null values in an array
See SciDBArray.substitute()
- sum(A, index=None, scidb_syntax=False)¶
Array or axis sum.
see SciDBArray.sum()
- svd(A, return_U=True, return_S=True, return_VT=True)¶
Compute the Singular Value Decomposition of the array A:
A = U.S.V^T
Parameters: A : SciDBArray
The array for which the SVD will be computed. It should be a 2-dimensional array with a single value per cell. Currently, the svd routine requires non-overlapping chunks of size 32.
return_U, return_S, return_VT : boolean
if any is True, then return the associated array. All are True by default
Returns: [U], [S], [VT] : SciDBArrays
Arrays storing the singular values and vectors of A.
- tan(A)¶
Element-wise trigonometric tangent
- toarray(A, transfer_bytes=True)¶
Convert a SciDB array to a numpy array
- todataframe(A, transfer_bytes=True)¶
Convert a SciDB array to a pandas dataframe
- tosparse(A, sparse_fmt=u'recarray', transfer_bytes=True)¶
Convert a SciDB array to a sparse representation
- var(A, index=None, scidb_syntax=False)¶
Array or axis variance.
see SciDBArray.var()
- wrap_array(scidbname, persistent=True)¶
Create a new SciDBArray object that references an existing SciDB array
Parameters: scidbname : string
Wrap an existing scidb array referred to by scidbname. The SciDB array object persistent value will be set to True, and the object shape, datashape and data type values will be determined by the SciDB array.
persistent : boolean
If True (default) then array will not be deleted when this variable goes out of scope. Warning: if persistent is set to False, data could be lost!
- zeros(shape, dtype=u'double', **kwargs)¶
Return an array of zeros
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of all zeros.
Shim Interface
- class scidbpy.interface.SciDBShimInterface(hostname)¶
HTTP interface to SciDB via shim [1]
Parameters: hostname : string
A URL pointing to a running shim/SciDB session
[1] https://github.com/Paradigm4/shim
Methods
acos(A): Element-wise trigonometric inverse cosine
approxdc(A[, index, scidb_syntax]): Array or axis unique element estimate.
arange([start,] stop[, step,][, dtype]): Return evenly spaced values within a given interval.
asin(A): Element-wise trigonometric inverse sine
atan(A): Element-wise trigonometric inverse tangent
avg(A[, index, scidb_syntax]): Array or axis average.
ceil(A): Element-wise ceiling function
cos(A): Element-wise trigonometric cosine
count(A[, index, scidb_syntax]): Array or axis count.
cross_join(A, B, *dims): Perform a cross-join on arrays A and B.
dot(A, B): Compute the matrix product of A and B
exp(A): Element-wise natural exponent
floor(A): Element-wise floor function
from_array(A[, instance_id]): Initialize a scidb array from a numpy array
from_dataframe(A[, instance_id]): Initialize a scidb array from a pandas dataframe
from_sparse(A[, instance_id]): Initialize a scidb array from a sparse array
identity(n[, dtype, sparse]): Return a 2-dimensional square identity matrix of size n
isnan(A): Element-wise nan test function
join(*args): Perform a series of array joins on the arguments and return the result.
linspace(start, stop[, num, endpoint, retstep]): Return evenly spaced numbers over a specified interval.
list_arrays([parsed, n]): List the arrays currently in the database
log(A): Element-wise natural logarithm
log10(A): Element-wise base-10 logarithm
max(A[, index, scidb_syntax]): Array or axis maximum.
mean(A[, index, scidb_syntax]): Array or axis mean.
merge(A, B): Merge two arrays
min(A[, index, scidb_syntax]): Array or axis minimum.
new_array([shape, dtype, persistent]): Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
ones(shape[, dtype]): Return an array of ones
query(query, *args, **kwargs): Perform a query on the database.
randint(shape[, dtype, lower, upper, persistent]): Return an array of random integers between lower and upper
random(shape[, dtype, lower, upper, persistent]): Return an array of random floats between lower and upper
reap(): Reap all arrays created via new_array
sin(A): Element-wise trigonometric sine
sqrt(A): Element-wise square root
std(A[, index, scidb_syntax]): Array or axis standard deviation.
stdev(A[, index, scidb_syntax]): Array or axis standard deviation.
substitute(A, value): Replace null values in an array
sum(A[, index, scidb_syntax]): Array or axis sum.
svd(A[, return_U, return_S, return_VT]): Compute the Singular Value Decomposition of the array A
tan(A): Element-wise trigonometric tangent
toarray(A[, transfer_bytes]): Convert a SciDB array to a numpy array
todataframe(A[, transfer_bytes]): Convert a SciDB array to a pandas dataframe
tosparse(A[, sparse_fmt, transfer_bytes]): Convert a SciDB array to a sparse representation
var(A[, index, scidb_syntax]): Array or axis variance.
wrap_array(scidbname[, persistent]): Create a new SciDBArray object that references an existing SciDB array
zeros(shape[, dtype]): Return an array of zeros
- acos(A)
Element-wise trigonometric inverse cosine
- approxdc(A, index=None, scidb_syntax=False)¶
Array or axis unique element estimate.
see SciDBArray.approxdc()
- arange([start, ]stop, [step, ]dtype=None, **kwargs)¶
Return evenly spaced values within a given interval.
Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the behavior is equivalent to the Python range function, but returns a SciDBArray rather than a list.
When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.
Parameters: start : number, optional
Start of interval. The interval includes this value. The default start value is 0.
stop : number
End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.
step : number, optional
Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.
dtype : dtype
The type of the output array. If dtype is not given, it is inferred from the type of the input arguments.
**kwargs :
Additional arguments are passed to SciDBDataShape when creating the output array.
Returns: arange : SciDBArray
Array of evenly spaced values.
For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than or equal to stop.
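The length rule can be checked without a SciDB connection. The following is a minimal pure-Python sketch of the documented semantics, not SciDB-py code; `arange_values` is a hypothetical helper introduced only for illustration.

```python
import math


def arange_values(start, stop, step=1):
    """Hypothetical stand-in that mimics the documented arange semantics."""
    # Documented length rule: ceil((stop - start) / step)
    n = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(n)]


# Integer arguments: half-open interval, so stop (10) is excluded.
ints = arange_values(2, 10, 3)

# Non-integer step where the division happens to be exact.
floats = arange_values(0.0, 1.0, 0.1)

# Non-integer step where round-off pushes the quotient slightly above 3,
# so the length rule yields one more element than expected.
edge = arange_values(1.0, 1.3, 0.1)
```

Here `ints` holds 2, 5 and 8, while `edge` ends up with four elements whose last value is not below `stop`. This is the situation in which linspace is the safer choice.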
- asin(A)¶
Element-wise trigonometric inverse sine
- atan(A)¶
Element-wise trigonometric inverse tangent
- avg(A, index=None, scidb_syntax=False)¶
Array or axis average.
see SciDBArray.avg()
- ceil(A)¶
Element-wise ceiling function
- cos(A)¶
Element-wise trigonometric cosine
- count(A, index=None, scidb_syntax=False)¶
Array or axis count.
see SciDBArray.count()
- cross_join(A, B, *dims)¶
Perform a cross-join on arrays A and B.
Parameters: A, B : SciDBArray
*dims : tuples
The remaining arguments are tuples of dimension indices which should be joined.
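As a rough illustration of what joining on pairs of dimension indices means, here is a toy pure-Python sketch. It models arrays as dicts from coordinate tuples to values and assumes the result keeps all of A's dimensions followed by B's unjoined dimensions, as SciDB's cross_join operator does. The function below is a hypothetical stand-in and says nothing about how SciDB-py implements the call.

```python
def cross_join(A, B, *dims):
    """Toy cross-join on dicts that map coordinate tuples to values.

    Each entry of dims is a pair (dimension index in A, dimension index in B).
    """
    joined_b = {j for _, j in dims}
    out = {}
    for ca, va in A.items():
        for cb, vb in B.items():
            # Keep only cell pairs that agree on every joined dimension.
            if all(ca[i] == cb[j] for i, j in dims):
                # Result coordinates: all of A's, then B's unjoined ones.
                rest = tuple(c for k, c in enumerate(cb) if k not in joined_b)
                out[ca + rest] = (va, vb)
    return out


A = {(i, j): 10 * i + j for i in range(2) for j in range(3)}  # 2 x 3 array
B = {(j,): 100 + j for j in range(3)}                         # length-3 array

# Join A's dimension 1 with B's dimension 0: the result is still 2 x 3.
C = cross_join(A, B, (1, 0))

# With no dimension pairs, every cell of A is paired with every cell of B.
full = cross_join(A, B)
```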
- dot(A, B)¶
Compute the matrix product of A and B
Parameters: A : SciDBArray
A must be a two-dimensional matrix of shape (n, p)
B : SciDBArray
B must be a two-dimensional matrix of shape (p, m)
Returns: C : SciDBArray
The wrapper of the SciDB Array, of shape (n, m), consisting of the matrix product of A and B
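The shape rule (n, p) x (p, m) -> (n, m) can be illustrated with a small pure-Python sketch on nested lists. `matmul` is a hypothetical helper for illustration only; the real dot operates on SciDBArray objects inside the database.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists (illustration only)."""
    n, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("inner dimensions must match: (n, p) x (p, m)")
    m = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]      # shape (2, 3)
B = [[1, 0],
     [0, 1],
     [1, 1]]         # shape (3, 2)
C = matmul(A, B)     # shape (2, 2)
```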
- exp(A)¶
Element-wise natural exponent
- floor(A)¶
Element-wise floor function
- from_array(A, instance_id=0, **kwargs)¶
Initialize a scidb array from a numpy array
Parameters: A : array_like (numpy array or sparse array)
input array from which the scidb array will be created
instance_id : integer
the instance ID used in loading (default=0; see SciDB documentation)
**kwargs :
Additional keyword arguments are passed to new_array()
Returns: arr : SciDBArray
SciDB Array object built from the input array
- from_dataframe(A, instance_id=0, **kwargs)¶
Initialize a scidb array from a pandas dataframe
Parameters: A : pandas dataframe
data from which the scidb array will be created.
instance_id : integer
the instance ID used in loading (default=0; see SciDB documentation)
**kwargs :
Additional keyword arguments are passed to new_array()
Returns: arr : SciDBArray
SciDB Array object built from the input array
- from_sparse(A, instance_id=0, **kwargs)¶
Initialize a scidb array from a sparse array
Parameters: A : sparse array
sparse input array from which the scidb array will be created. Note that this array will internally be converted to COO format.
instance_id : integer
the instance ID used in loading (default=0; see SciDB documentation)
**kwargs :
Additional keyword arguments are passed to new_array()
Returns: arr : SciDBArray
SciDB Array object built from the input array
- identity(n, dtype=u'double', sparse=False, **kwargs)¶
Return a 2-dimensional square identity matrix of size n
Parameters: n : integer
the number of rows and columns in the matrix
dtype : string or list
The data type of the array
sparse : boolean
specify whether to create a sparse array (default=False)
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray containing an [n x n] identity matrix
- isnan(A)¶
Element-wise nan test function
- join(*args)¶
Perform a series of array joins on the arguments and return the result.
- linspace(start, stop, num=50, endpoint=True, retstep=False, **kwargs)¶
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval [start, stop].
The endpoint of the interval can optionally be excluded.
Parameters: start : scalar
The starting value of the sequence.
stop : scalar
The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.
num : int, optional
Number of samples to generate. Default is 50.
endpoint : bool, optional
If True, stop is the last sample. Otherwise, it is not included. Default is True.
retstep : bool, optional
If True, return (samples, step), where step is the spacing between samples.
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: samples : SciDBArray
There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).
step : float (only if retstep is True)
Size of spacing between samples.
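The effect of endpoint on the step size can be seen in a short pure-Python sketch. `linspace_values` is a hypothetical helper that mirrors the documented semantics for num >= 2. It always returns the step, whereas the real method does so only when retstep is True, and it returns a plain list instead of a SciDBArray.

```python
def linspace_values(start, stop, num=50, endpoint=True):
    """Hypothetical helper mirroring the documented linspace semantics."""
    if num < 2:
        raise ValueError("this sketch assumes num >= 2")
    # With endpoint=True there are num - 1 gaps between samples;
    # with endpoint=False the interval is split into num gaps.
    gaps = (num - 1) if endpoint else num
    step = (stop - start) / gaps
    return [start + i * step for i in range(num)], step


closed, step_closed = linspace_values(0.0, 1.0, num=5)
half_open, step_open = linspace_values(0.0, 1.0, num=5, endpoint=False)
```

With five samples on [0, 1] the step is 0.25 when stop is included and 0.2 when it is excluded.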
- list_arrays(parsed=True, n=0)¶
List the arrays currently in the database
Parameters: parsed : boolean
If True (default), then parse the results into a dictionary of array names as keys, schema as values
n : integer
the maximum number of arrays to list. If n=0, then list all
Returns: array_list : string or dictionary
The list of arrays. If parsed=True, then the result is returned as a dictionary.
- log(A)¶
Element-wise natural logarithm
- log10(A)¶
Element-wise base-10 logarithm
- max(A, index=None, scidb_syntax=False)¶
Array or axis maximum.
see SciDBArray.max()
- mean(A, index=None, scidb_syntax=False)¶
Array or axis mean.
see SciDBArray.mean()
- merge(A, B)¶
Merge two arrays
- min(A, index=None, scidb_syntax=False)¶
Array or axis minimum.
see SciDBArray.min()
- new_array(shape=None, dtype=u'double', persistent=False, **kwargs)¶
Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
Parameters: shape : int or tuple (optional)
The shape of the array to create. If not specified, no array will be created and a name will simply be reserved for later use. WARNING: if shape=None and persistent=False, an error will result when the array goes out of scope, unless the name is used to create an array on the server.
dtype : string (optional)
the datatype of the array. This is only referenced if shape is specified. Default is ‘double’.
persistent : boolean (optional)
whether the created array should be persistent, i.e. survive in SciDB past when the object wrapper goes out of scope. Default is False.
**kwargs : (optional)
If shape is specified, additional keyword arguments are passed to SciDBDataShape. Otherwise, these will not be referenced.
Returns: arr : SciDBArray
wrapper of the new SciDB array instance.
- ones(shape, dtype=u'double', **kwargs)¶
Return an array of ones
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of all ones.
- query(query, *args, **kwargs)¶
Perform a query on the database.
This wraps a query constructor which allows the creation of sophisticated SciDB queries which act on arrays wrapped by SciDBArray objects. See Notes below for details.
Parameters: query : string
The query string, with curly-braces to indicate insertions
*args, **kwargs :
Values to be inserted (see below).
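The curly-brace insertion can be pictured with Python's own str.format. The sketch below only shows the general idea and is not the library's implementation. `FakeArray`, `build_query` and the array and attribute names are hypothetical, and the real query() may support placeholder forms that this sketch does not.

```python
class FakeArray:
    """Hypothetical stand-in for a SciDBArray; it only carries a name."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # With an empty format spec, the default __format__ returns str(self),
        # so "{A}" in a template is replaced by the array name.
        return self.name


def build_query(template, *args, **kwargs):
    """Fill the curly-brace placeholders of a query template."""
    return template.format(*args, **kwargs)


A = FakeArray("py_array_1")
q = build_query("store(filter({A}, {attr} > {threshold}), {out})",
                A=A, attr="val", threshold=3, out="py_array_2")
```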
- randint(shape, dtype=u'uint32', lower=0, upper=2147483647, persistent=False, **kwargs)¶
Return an array of random integers between lower and upper
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
lower : float
The lower bound of the random sample (default=0)
upper : float
The upper bound of the random sample (default=2147483647)
persistent : bool
Whether the array is persistent (default=False)
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of random integers, uniformly distributed between lower and upper.
- random(shape, dtype=u'double', lower=0, upper=1, persistent=False, **kwargs)¶
Return an array of random floats between lower and upper
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
lower : float
The lower bound of the random sample (default=0)
upper : float
The upper bound of the random sample (default=1)
persistent : bool
Whether the new array is persistent (default=False)
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of random floating point numbers, uniformly distributed between lower and upper.
- reap()¶
Reap all arrays created via new_array
- sin(A)¶
Element-wise trigonometric sine
- sqrt(A)¶
Element-wise square root
- std(A, index=None, scidb_syntax=False)¶
Array or axis standard deviation.
see SciDBArray.std()
- stdev(A, index=None, scidb_syntax=False)¶
Array or axis standard deviation.
see SciDBArray.stdev()
- substitute(A, value)¶
Replace null values in an array
See SciDBArray.substitute()
- sum(A, index=None, scidb_syntax=False)¶
Array or axis sum.
see SciDBArray.sum()
- svd(A, return_U=True, return_S=True, return_VT=True)¶
Compute the Singular Value Decomposition of the array A:
A = U.S.V^T
Parameters: A : SciDBArray
The array for which the SVD will be computed. It should be a 2-dimensional array with a single value per cell. Currently, the svd routine requires non-overlapping chunks of size 32.
return_U, return_S, return_VT : boolean
if any is True, then return the associated array. All are True by default
Returns: [U], [S], [VT] : SciDBArrays
Arrays storing the singular values and vectors of A.
- tan(A)¶
Element-wise trigonometric tangent
- toarray(A, transfer_bytes=True)¶
Convert a SciDB array to a numpy array
- todataframe(A, transfer_bytes=True)¶
Convert a SciDB array to a pandas dataframe
- tosparse(A, sparse_fmt=u'recarray', transfer_bytes=True)¶
Convert a SciDB array to a sparse representation
- var(A, index=None, scidb_syntax=False)¶
Array or axis variance.
see SciDBArray.var()
- wrap_array(scidbname, persistent=True)¶
Create a new SciDBArray object that references an existing SciDB array
Parameters: scidbname : string
Wrap an existing scidb array referred to by scidbname. The SciDB array object persistent value will be set to True, and the object shape, datashape and data type values will be determined by the SciDB array.
persistent : boolean
If True (default) then array will not be deleted when this variable goes out of scope. Warning: if persistent is set to False, data could be lost!
- zeros(shape, dtype=u'double', **kwargs)¶
Return an array of zeros
Parameters: shape : tuple or int
The shape of the array
dtype : string or list
The data type of the array
**kwargs :
Additional keyword arguments are passed to SciDBDataShape.
Returns: arr : SciDBArray
A SciDBArray consisting of all zeros.
Visualization and Analysis¶
- class scidbpy.aggregation.GroupBy(array, by)¶
Perform a GroupBy operation on an array
The interface of this class mimics a subset of the functionality of Pandas’ groupby.
Notes
GroupBy operations are currently restricted in the following ways:
- GroupBy items must be names of attributes or dimensions
- Non-integer attributes cannot be used as a groupby item
- Dimensions cannot be used in aggregate calls
These limitations will be addressed in the 14.9 release of SciDB-Py
Examples
>>> x = sdb.afl.build('<a:int32>[i=0:100,1000,0]', 'iif(i > 50, 1, 0)')
>>> y = sdb.afl.build('<b:int32>[i=0:100,1000,0]', 'i % 30')
>>> z = sdb.join(x, y)
>>> grp = z.groupby('a')
>>> grp.aggregate('sum(b)').todataframe()
   a  b_sum
0  0    645
1  1    715
Multiple aggregation functions can be provided with a dict:
>>> grp.aggregate({'s':'sum(b)', 'm':'max(b)'}).todataframe()
   a    s   m
0  0  645  29
1  1  715  29
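The numbers in these examples can be reproduced without a database. The sketch below rebuilds the same data in plain Python (a = 1 where i > 50, b = i % 30, for i from 0 to 100 inclusive) and groups it by hand. It illustrates the semantics only and does not use SciDB-py.

```python
from collections import defaultdict

# Same cell values as the joined array z: one (a, b) pair per index i.
rows = [(1 if i > 50 else 0, i % 30) for i in range(101)]

# Collect the b values belonging to each distinct value of a.
groups = defaultdict(list)
for a, b in rows:
    groups[a].append(b)

# Apply sum and max to each group, as in the dict-based aggregate call.
result = {a: {"s": sum(bs), "m": max(bs)} for a, bs in sorted(groups.items())}
```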
Methods
aggregate(mappings) Perform an aggregation over each group
- aggregate(mappings)¶
Perform an aggregation over each group
Parameters: mappings : string or dictionary
If a string, a single SciDB expression to apply to each group. If a dict, a mapping from new attribute names to expression strings.
Returns: agg : SciDBArray
A new SciDBArray, obtained by applying the aggregations to the groups of the input array.
- scidbpy.aggregation.histogram(X, bins=10, att=None, range=None, plot=False, **kwargs)¶
Build a 1D histogram from a SciDBArray.
Parameters: X : SciDBArray
The array to compute a histogram for
att : str (optional)
The attribute of the array to consider. Defaults to the first attribute.
bins : int (optional)
The number of bins
range : [min, max] (optional)
The lower and upper limits of the histogram. Defaults to data limits.
plot : bool
If True, plot the results with matplotlib
histtype : ‘bar’ | ‘step’ (default=‘bar’)
If plotting, the kind of histogram to draw. See matplotlib.hist for more details.
kwargs : optional
Additional keywords passed to matplotlib
Returns: (counts, edges [, artists])
- edges is a NumPy array of edge locations (length=bins+1)
- counts is the number of data points between edges[i] and edges[i+1] (length=bins)
- artists is a list of the matplotlib artists created if plot=True
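The relationship between counts and edges can be sketched in plain Python. `histogram_counts` and its `value_range` parameter are hypothetical names used for illustration. The binning convention is an assumption borrowed from NumPy: each bin is half-open except the last, which also includes the upper limit. The sketch further assumes the upper limit is greater than the lower one.

```python
def histogram_counts(values, bins=10, value_range=None):
    """Toy 1D histogram returning (counts, edges) for a list of numbers."""
    if value_range is not None:
        lo, hi = value_range
    else:
        lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # bins + 1 edge locations, ending exactly at the upper limit.
    edges = [lo + k * width for k in range(bins)] + [hi]
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    return counts, edges


data = [0, 1, 1, 2, 3, 5, 8, 8, 9, 10]
counts, edges = histogram_counts(data, bins=5)
```

For this data the edges are 0, 2, 4, 6, 8 and 10, and the five counts add up to the number of data points.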