API Reference

This is the list of classes and functions available in SciDB-py.

SciDB Array Class

class scidbpy.SciDBArray(datashape, interface, name, persistent=False)

SciDBArray class

It is not recommended to instantiate this class directly; use a convenience routine from SciDBInterface.

Methods

alias([name]) Return an alias of the array, optionally with a new name
approxdc([index, scidb_syntax]) Return the number of distinct values of the array or along an axis.
att(a) Return the attribute name of the array.
attribute(a) Return the attribute name of the array.
avg([index, scidb_syntax]) Return the average of the array or the average along an axis.
compress(mask[, axis]) Extract a subset of entries along a given axis, where an input mask array is non-null
contains_nulls([attr]) Return True if the array contains null values.
contents(**kwargs) Return a string representation of the array contents
copy([new_name, persistent]) Make a copy of the array in the database
count([index, scidb_syntax]) Return the count of the array or the count along an axis.
cumprod([axis]) Return the cumulative product over the array.
cumsum([axis]) Return the cumulative sum over the array.
cumulate(expression[, dimension]) Compute running operations along data (e.g., cumulative sums)
dimension(d) Return the dimension name of the array
eval([out, store]) If the array is backed by an unevaluated query, evaluate the query and store the result in the database
from_query(interface, query) Build a lazily-evaluated SciDB array from a query string
groupby(by) Build a groupby object from this array
head([n]) Extract and download the first few elements in the array
issparse() Check whether array is sparse.
max([index, scidb_syntax]) Return the maximum of the array or the maximum along an axis.
mean([index, scidb_syntax]) Return the average of the array or the average along an axis.
min([index, scidb_syntax]) Return the minimum of the array or the minimum along an axis.
nonempty() Return the number of nonempty elements in the array.
nonnull([attr]) Return the number of non-empty and non-null values.
reap([ignore]) Delete this object from the database if it isn’t persistent.
regrid(size[, aggregate]) Regrid the array using the specified aggregate
rename(new_name[, persistent]) Rename the array in the database, optionally making the new array persistent.
reshape(shape, **kwargs) Reshape data into a new array
std([index, scidb_syntax]) Return the standard deviation of the array or along an axis.
stdev([index, scidb_syntax]) Return the standard deviation of the array or along an axis.
substitute(value) Reshape data into a new array, substituting a default for any nulls.
sum([index, scidb_syntax]) Return the sum of the array or the sum along an axis.
tail([n])
toarray([transfer_bytes]) Transfer data from database and store in a numpy array.
todataframe([transfer_bytes]) Transfer array from database and store in a local Pandas dataframe
tosparse([sparse_fmt, transfer_bytes]) Transfer array from database and store in a local sparse array.
transpose(*axes) Permute the dimensions of an array.
var([index, scidb_syntax]) Return the variance of the array or the variance along an axis.
T

Permute the dimensions of an array.

Parameters:

axes : None, tuple of ints, or n ints

  • None or no argument: reverses the order of the axes.
  • tuple of ints: i in the j-th place in the tuple means a's i-th axis becomes a.transpose()'s j-th axis.
  • n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
Returns:

out : ndarray

Copy of a, with axes suitably permuted.

afl

An alias to the AFL namespace

alias(name=None)

Return an alias of the array, optionally with a new name

approxdc(index=None, scidb_syntax=False)

Return the number of distinct values of the array or along an axis.

The distinct count is an estimate only.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array
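
The two index conventions described above recur across all the aggregate methods, so a small pure-Python sketch may help. It does not call SciDB-py; the helper names collapsed_axes and result_shape are hypothetical, and the resulting shapes are simply what the wording above implies, not something verified against a server.

```python
def collapsed_axes(ndim, index=None, scidb_syntax=False):
    """Return the axes an aggregate would collapse, per the description above."""
    all_axes = list(range(ndim))
    if index is None:
        # Default: the whole (flattened) array is aggregated.
        return all_axes
    if scidb_syntax:
        # SciDB convention: collapse every axis except `index`.
        return [ax for ax in all_axes if ax != index]
    # numpy convention: collapse only the index'th axis.
    return [index]


def result_shape(shape, index=None, scidb_syntax=False):
    """Shape left over once the collapsed axes are removed (an assumption)."""
    gone = set(collapsed_axes(len(shape), index, scidb_syntax))
    return tuple(n for ax, n in enumerate(shape) if ax not in gone)


print(result_shape((3, 4, 5), index=1))                     # (3, 5)
print(result_shape((3, 4, 5), index=1, scidb_syntax=True))  # (4,)
```

With the numpy convention, index names the axis that disappears; with the SciDB convention, index names the axis that survives.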

att(a)

Return the attribute name of the array.

Parameters:

a : int

Index of the attribute to lookup

attribute(a)

Return the attribute name of the array.

Parameters:

a : int

Index of the attribute to lookup

avg(index=None, scidb_syntax=False)

Return the average of the array or the average along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

compress(mask, axis=0)

Extract a subset of entries along a given axis, where an input mask array is non-null

Parameters:

array : SciDBArray

The array to filter

mask : SciDBArray

A 1-dimensional SciDBArray, whose non-null values indicate the entries to retain

axis : int

The axis of array along which to apply the mask. The shape of array along this axis must be the length of mask
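
As a rough local analogue of the masking rule, the sketch below filters a nested Python list along axis 0, keeping the rows whose mask entry is not None (standing in for non-null). The function compress_rows is hypothetical and only mirrors the description; it is not how SciDB-py evaluates the operation in the database.

```python
def compress_rows(rows, mask):
    """Keep rows[i] wherever mask[i] is not None (axis=0 of a 2-D list)."""
    if len(rows) != len(mask):
        # Mirrors the requirement that the axis length equals the mask length.
        raise ValueError("mask length must match the array length along the axis")
    return [row for row, m in zip(rows, mask) if m is not None]


data = [[0, 1], [2, 3], [4, 5]]
mask = [1, None, 1]          # the middle entry is null, so row 1 is dropped
kept = compress_rows(data, mask)
print(kept)  # [[0, 1], [4, 5]]
```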

contains_nulls(attr=None)

Return True if the array contains null values.

Parameters:

attr : None, int, or array_like

the attribute index/indices to check. If None, then check all.

Returns:

contains_nulls : boolean

contents(**kwargs)

Return a string representation of the array contents

copy(new_name=None, persistent=False)

Make a copy of the array in the database

Parameters:

new_name : string (optional)

if specified, must be a valid array name which does not already exist in the database.

persistent : boolean (optional)

specify whether the new array is persistent (default=False)

Returns:

copy : SciDBArray

return a copy of the original array

count(index=None, scidb_syntax=False)

Return the count of the array or the count along an axis.

The count is equal to the number of nonnull elements.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

cumprod(axis=None)

Return the cumulative product over the array.

Parameters:

axis : int, optional

The axis to multiply over. The default multiplies over the flattened array

Returns:

prods : SciDBArray

A new array, with the same shape (but flattened if axis=None)

See also

cumsum, cumulate

cumsum(axis=None)

Return the cumulative sum over the array.

Parameters:

axis : int, optional

The axis to sum over. The default sums over the flattened array

Returns:

sums : SciDBArray

A new array, with the same shape (but flattened if axis=None)

See also

cumprod, cumulate
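
The effect of the axis argument on the output shape can be illustrated locally with itertools.accumulate. This is only a sketch on nested lists, not a call into SciDB-py; the helper cumsum_2d is hypothetical, and row-major flattening for axis=None is an assumption.

```python
from itertools import accumulate


def cumsum_2d(rows, axis=None):
    """Cumulative sum of a 2-D nested list, flattened when axis is None."""
    if axis is None:
        # Assumed row-major flattening, then a single running sum.
        flat = [v for row in rows for v in row]
        return list(accumulate(flat))
    if axis == 1:
        return [list(accumulate(row)) for row in rows]
    if axis == 0:
        # Accumulate each column, then transpose back to rows.
        cols = [list(accumulate(col)) for col in zip(*rows)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("axis must be None, 0, or 1 for a 2-D input")


x = [[0, 1, 2], [3, 4, 5]]
print(cumsum_2d(x))          # [0, 1, 3, 6, 10, 15]
print(cumsum_2d(x, axis=0))  # [[0, 1, 2], [3, 5, 7]]
print(cumsum_2d(x, axis=1))  # [[0, 1, 3], [3, 7, 12]]
```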

cumulate(expression, dimension=0)

Compute running operations along data (e.g., cumulative sums)

Parameters:

expression : str

A valid SciDB expression

dimension : int or str (optional, default=0)

Which dimension to accumulate over

Returns:

arr : SciDBArray

A new array of the same shape.

See also

cumsum, cumprod

Examples

>>> x = sdb.arange(12).reshape((3, 4))
>>> x.cumulate('sum(f0)').toarray()
array([[ 0,  1,  2,  3],
       [ 4,  6,  8, 10],
       [12, 15, 18, 21]])

dimension(d)

Return the dimension name of the array

Parameters:

d : int

The index of the dimension to lookup

eval(out=None, store=True, **kwargs)

If the array is backed by an unevaluated query, evaluate the query and store the result in the database

This changes array.name from a query string to a stored array name. Calling eval() on an array that is already backed by a stored array does nothing.

Parameters:

out : SciDBArray (optional)

An optional pre-existing array to store the evaluation into.

classmethod from_query(interface, query)

Build a lazily-evaluated SciDB array from a query string

Parameters:

interface : SciDBInterface

The database connection to use

query : str

The query string to wrap

Returns:

array : SciDBArray

groupby(by)

Build a groupby object from this array

Parameters:

by : string or list of strings

Names of attributes and dimensions to group by

Returns:

groups : scidbpy.aggregation.GroupBy instance

An object that can be used, e.g., to perform aggregations over each group. See scidbpy.aggregation.GroupBy documentation for more information.

head(n=5)

Extract and download the first few elements in the array

Parameters:

n : int (optional, default=5)

The number of elements to retrieve

Returns:

head : Pandas dataframe or Numpy array

The first N elements in the array, downloaded as a Pandas dataframe (if pandas is installed) or a Numpy array

issparse()

Check whether array is sparse.

max(index=None, scidb_syntax=False)

Return the maximum of the array or the maximum along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

mean(index=None, scidb_syntax=False)

Return the average of the array or the average along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

Notes

Identical to SciDBArray.avg()

min(index=None, scidb_syntax=False)

Return the minimum of the array or the minimum along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

nonempty()

Return the number of nonempty elements in the array.

Nonempty refers to the sparsity of an array, and thus includes in the count elements with values which are set to NULL.

See also

nonnull

nonnull(attr=0)

Return the number of non-empty and non-null values.

This query must be done for each attribute: the default is the first attribute.

Parameters:

attr : None, int or array_like

the attribute or attributes to query. If None, then query all attributes.

Returns:

nonnull : array_like

the nonnull count for each attribute. The returned value is the same shape as the input attr.

See also

nonempty
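
The difference between the two counts can be sketched with a plain dictionary standing in for a sparse array: cells that are absent are empty, and cells that are present may still hold a null (None here). The data layout and function bodies are illustrative assumptions, not SciDB-py's implementation.

```python
# Hypothetical sparse array: only non-empty cells are stored, keyed by coordinates.
cells = {
    (0, 0): 1.5,
    (0, 2): None,   # present in the array, but its value is NULL
    (1, 1): 4.0,
}


def nonempty(cells):
    """Count every stored cell, including cells whose value is null."""
    return len(cells)


def nonnull(cells):
    """Count stored cells whose value is not null."""
    return sum(1 for v in cells.values() if v is not None)


print(nonempty(cells))  # 3
print(nonnull(cells))   # 2
```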

persistent

Controls whether the array is deleted when the database is reaped

reap(ignore=False)

Delete this object from the database if it isn’t persistent.

Parameters:

ignore : bool (default False)

If False and the array is persistent, then reap raises an error. If True and the array is persistent, reap does nothing.

Raises:

SciDBForbidden : if persistent=True and ignore=False
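
The interaction between persistent and ignore amounts to a three-way decision, sketched below. ForbiddenError is a hypothetical stand-in for the SciDBForbidden error named above, and reap_action only returns a label describing what would happen; nothing is deleted.

```python
class ForbiddenError(Exception):
    """Hypothetical stand-in for the SciDBForbidden error."""


def reap_action(persistent, ignore=False):
    """Describe what reap would do for the given flags, per the text above."""
    if not persistent:
        return "delete"   # non-persistent arrays are removed from the database
    if ignore:
        return "skip"     # persistent array, but the caller asked to ignore it
    raise ForbiddenError("array is persistent; pass ignore=True to skip it")


print(reap_action(persistent=False))              # delete
print(reap_action(persistent=True, ignore=True))  # skip
```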

regrid(size, aggregate=u'avg')

Regrid the array using the specified aggregate

Parameters:

size : int or tuple of ints

Specify the size of the regridding along each dimension. If a single integer, then use the same regridding along each dimension.

aggregate : string

specify the aggregation function to use when creating the new grid. Default is ‘avg’. Possible values are: [‘avg’, ‘sum’, ‘min’, ‘max’, ‘count’, ‘stdev’, ‘var’, ‘approxdc’]

Returns:

A : SciDBArray

The re-gridded version of the array. The size of dimension i is ceil(self.shape[i] / size[i])
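
The output-shape rule can be checked without a database. The helper regrid_shape below is hypothetical; it only applies the ceil(self.shape[i] / size[i]) formula stated above, broadcasting a single integer size to every dimension.

```python
import math


def regrid_shape(shape, size):
    """Shape of the regridded array: ceil(shape[i] / size[i]) per dimension."""
    if isinstance(size, int):
        # A single integer applies the same regridding along each dimension.
        size = (size,) * len(shape)
    if len(size) != len(shape):
        raise ValueError("size must have one entry per dimension")
    return tuple(math.ceil(n / s) for n, s in zip(shape, size))


print(regrid_shape((10, 7), 3))       # (4, 3)
print(regrid_shape((10, 7), (5, 2)))  # (2, 4)
```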

rename(new_name, persistent=False)

Rename the array in the database, optionally making the new array persistent.

Parameters:

new_name : string

must be a valid array name which does not already exist in the database.

persistent : boolean (optional)

specify whether the new array is persistent (default=False)

Returns:

self : SciDBArray

return a pointer to self

reshape(shape, **kwargs)

Reshape data into a new array

Parameters:

shape : tuple or int

The shape of the new array. Must be compatible with the current shape

**kwargs

additional keyword arguments will be passed to SciDBDataShape

Returns:

arr : SciDBArray

new array of the specified shape

schema

Return the array schema

std(index=None, scidb_syntax=False)

Return the standard deviation of the array or along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

Notes

Identical to SciDBArray.stdev()

stdev(index=None, scidb_syntax=False)

Return the standard deviation of the array or along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

substitute(value)

Reshape data into a new array, substituting a default for any nulls.

Parameters:

value : value to replace nulls (required)

Returns:

arr : SciDBArray

new non-nullable array

Notes

This is currently limited to single-attribute arrays. Use the raw AFL substitute operator for multi-attribute arrays.

sum(index=None, scidb_syntax=False)

Return the sum of the array or the sum along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

toarray(transfer_bytes=False)

Transfer data from database and store in a numpy array.

Parameters:

transfer_bytes : DEPRECATED

Unused

Returns:

arr : np.ndarray

The dense array containing the data.

Notes

If the array is backed by a query, the query is evaluated and stored in the database

todataframe(transfer_bytes=True)

Transfer array from database and store in a local Pandas dataframe

For multidimensional arrays, the dimension values are added as additional columns in the dataframe.

Parameters:

transfer_bytes : boolean

if True (default), then transfer data as bytes rather than as ASCII.

Returns:

arr : pd.DataFrame

The dataframe object containing the data in the array.

tosparse(sparse_fmt=u'recarray', transfer_bytes=True)

Transfer array from database and store in a local sparse array.

Parameters:

transfer_bytes : boolean

if True (default), then transfer data as bytes rather than as ASCII. This is more accurate, but requires two passes over the data (one for indices, one for values).

sparse_fmt : string or None

Specify the sparse format to use. Available formats are:

  • 'recarray' : a record array containing the indices and values for each data point. This is valid for arrays of any dimension and with any number of attributes.
  • ['coo'|'csc'|'csr'|'dok'|'lil'] : a scipy sparse matrix. These are valid only for 2-dimensional arrays with a single attribute.
Returns:

arr : ndarray or sparse matrix

The sparse representation of the data

transpose(*axes)

Permute the dimensions of an array.

Parameters:

axes : None, tuple of ints, or n ints

  • None or no argument: reverses the order of the axes.
  • tuple of ints: i in the j-th place in the tuple means a's i-th axis becomes a.transpose()'s j-th axis.
  • n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
Returns:

out : ndarray

Copy of a, with axes suitably permuted.
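
The three accepted forms of axes can be illustrated by tracking only the shape. The helper transposed_shape is hypothetical and works on shape tuples rather than arrays; it follows the rule that the j-th output axis is the input axis named in the j-th place.

```python
def transposed_shape(shape, *axes):
    """Shape after transposing; the j-th output axis is input axis axes[j]."""
    if len(axes) == 1 and isinstance(axes[0], (tuple, list)):
        axes = tuple(axes[0])          # tuple form: transpose((1, 0, 2))
    if not axes or axes == (None,):
        # None or no argument: reverse the order of the axes.
        axes = tuple(reversed(range(len(shape))))
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of the array's axes")
    return tuple(shape[i] for i in axes)


print(transposed_shape((2, 3, 4)))             # (4, 3, 2)
print(transposed_shape((2, 3, 4), (1, 0, 2)))  # (3, 2, 4)
print(transposed_shape((2, 3, 4), 2, 0, 1))    # (4, 2, 3)
```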

var(index=None, scidb_syntax=False)

Return the variance of the array or the variance along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

SciDB Interface

scidbpy.interface.connect(url=None)

Connect to a SciDB instance

Parameters:

url : str (optional)

Connection URL. If not provided, will fall back to the SCIDB_URL environment variable (if present), or http://127.0.0.1:8080
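
The fallback order for the connection URL can be sketched as follows. The function resolve_url is a hypothetical illustration of the lookup described above (explicit argument, then SCIDB_URL, then the default); it does not open a connection, and the example hostnames are made up.

```python
import os

DEFAULT_URL = "http://127.0.0.1:8080"


def resolve_url(url=None, environ=None):
    """Pick the connection URL: argument, then SCIDB_URL, then the default."""
    environ = os.environ if environ is None else environ
    if url is not None:
        return url
    return environ.get("SCIDB_URL", DEFAULT_URL)


# Explicit environments are passed in so the results do not depend on the shell.
print(resolve_url("http://db.example:8080", environ={}))        # http://db.example:8080
print(resolve_url(environ={"SCIDB_URL": "http://env.example"}))  # http://env.example
print(resolve_url(environ={}))                                  # http://127.0.0.1:8080
```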

Base Class

class scidbpy.interface.SciDBInterface

Methods

acos(A) Element-wise trigonometric inverse cosine
approxdc(A[, index, scidb_syntax]) Array or axis unique element estimate.
arange([start,] stop[, step,][, dtype]) Return evenly spaced values within a given interval.
asin(A) Element-wise trigonometric inverse sine
atan(A) Element-wise trigonometric inverse tangent
avg(A[, index, scidb_syntax]) Array or axis average.
ceil(A) Element-wise ceiling function
cos(A) Element-wise trigonometric cosine
count(A[, index, scidb_syntax]) Array or axis count.
cross_join(A, B, *dims) Perform a cross-join on arrays A and B.
dot(A, B) Compute the matrix product of A and B
exp(A) Element-wise natural exponent
floor(A) Element-wise floor function
from_array(A[, instance_id]) Initialize a scidb array from a numpy array
from_dataframe(A[, instance_id]) Initialize a scidb array from a pandas dataframe
from_sparse(A[, instance_id]) Initialize a scidb array from a sparse array
identity(n[, dtype, sparse]) Return a 2-dimensional square identity matrix of size n
isnan(A) Element-wise nan test function
join(*args) Perform a series of array joins on the arguments and return the result.
linspace(start, stop[, num, endpoint, retstep]) Return evenly spaced numbers over a specified interval.
list_arrays([parsed, n]) List the arrays currently in the database
log(A) Element-wise natural logarithm
log10(A) Element-wise base-10 logarithm
max(A[, index, scidb_syntax]) Array or axis maximum.
mean(A[, index, scidb_syntax]) Array or axis mean.
merge(A, B) Merge two arrays
min(A[, index, scidb_syntax]) Array or axis minimum.
new_array([shape, dtype, persistent]) Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
ones(shape[, dtype]) Return an array of ones
query(query, *args, **kwargs) Perform a query on the database.
randint(shape[, dtype, lower, upper, persistent]) Return an array of random integers between lower and upper
random(shape[, dtype, lower, upper, persistent]) Return an array of random floats between lower and upper
reap() Reap all arrays created via new_array
sin(A) Element-wise trigonometric sine
sqrt(A) Element-wise square root
std(A[, index, scidb_syntax]) Array or axis standard deviation.
stdev(A[, index, scidb_syntax]) Array or axis standard deviation.
substitute(A, value) Replace null values in an array
sum(A[, index, scidb_syntax]) Array or axis sum.
svd(A[, return_U, return_S, return_VT]) Compute the Singular Value Decomposition of the array A:
tan(A) Element-wise trigonometric tangent
toarray(A[, transfer_bytes]) Convert a SciDB array to a numpy array
todataframe(A[, transfer_bytes]) Convert a SciDB array to a pandas dataframe
tosparse(A[, sparse_fmt, transfer_bytes]) Convert a SciDB array to a sparse representation
var(A[, index, scidb_syntax]) Array or axis variance.
wrap_array(scidbname[, persistent]) Create a new SciDBArray object that references an existing SciDB array
zeros(shape[, dtype]) Return an array of zeros
acos(A)

Element-wise trigonometric inverse cosine

approxdc(A, index=None, scidb_syntax=False)

Array or axis unique element estimate.

see SciDBArray.approxdc()

arange([start, ]stop, [step, ]dtype=None, **kwargs)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the behavior is equivalent to the Python range function, but returns a SciDBArray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.

Parameters:

start : number, optional

Start of interval. The interval includes this value. The default start value is 0.

stop : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

step : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.

dtype : dtype

The type of the output array. If dtype is not given, it is inferred from the type of the input arguments.

**kwargs

Additional arguments are passed to SciDBDataShape when creating the output array.

Returns:

arange : SciDBArray

Array of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than stop.
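
The length rule can be tried out locally. The helpers arange_length and arange_values are hypothetical pure-Python sketches of the ceil((stop - start)/step) formula above; they build a list rather than a SciDBArray, and the inputs are chosen so that the floating point arithmetic is exact.

```python
import math


def arange_length(start, stop, step=1):
    """Number of values in the half-open interval [start, stop)."""
    return max(0, math.ceil((stop - start) / step))


def arange_values(start, stop, step=1):
    """Evenly spaced values start, start + step, ... below stop."""
    return [start + i * step for i in range(arange_length(start, stop, step))]


print(arange_values(0, 10, 3))    # [0, 3, 6, 9]
print(arange_values(2, 3, 0.25))  # [2.0, 2.25, 2.5, 2.75]
```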

asin(A)

Element-wise trigonometric inverse sine

atan(A)

Element-wise trigonometric inverse tangent

avg(A, index=None, scidb_syntax=False)

Array or axis average.

see SciDBArray.avg()

ceil(A)

Element-wise ceiling function

cos(A)

Element-wise trigonometric cosine

count(A, index=None, scidb_syntax=False)

Array or axis count.

see SciDBArray.count()

cross_join(A, B, *dims)

Perform a cross-join on arrays A and B.

Parameters:

A, B : SciDBArray

*dims : tuples

The remaining arguments are tuples of dimension indices which should be joined.

dot(A, B)

Compute the matrix product of A and B

Parameters:

A : SciDBArray

A must be a two-dimensional matrix of shape (n, p)

B : SciDBArray

B must be a two-dimensional matrix of shape (p, m)

Returns:

C : SciDBArray

The wrapper of the SciDB Array, of shape (n, m), consisting of the matrix product of A and B
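
The shape requirement, (n, p) times (p, m) giving (n, m), can be seen in a small pure-Python matrix product. This matmul function is a hypothetical local sketch on nested lists; the real dot runs inside the database.

```python
def matmul(A, B):
    """Matrix product of nested lists with shapes (n, p) and (p, m)."""
    n, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("inner dimensions must agree: (n, p) x (p, m)")
    m = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


A = [[1, 2, 3], [4, 5, 6]]    # shape (2, 3)
B = [[1, 0], [0, 1], [1, 1]]  # shape (3, 2)
C = matmul(A, B)              # shape (2, 2)
print(C)  # [[4, 5], [10, 11]]
```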

exp(A)

Element-wise natural exponent

floor(A)

Element-wise floor function

from_array(A, instance_id=0, **kwargs)

Initialize a scidb array from a numpy array

Parameters:

A : array_like (numpy array or sparse array)

input array from which the scidb array will be created

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_dataframe(A, instance_id=0, **kwargs)

Initialize a scidb array from a pandas dataframe

Parameters:

A : pandas dataframe

data from which the scidb array will be created.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_sparse(A, instance_id=0, **kwargs)

Initialize a scidb array from a sparse array

Parameters:

A : sparse array

sparse input array from which the scidb array will be created. Note that this array will internally be converted to COO format.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

identity(n, dtype=u'double', sparse=False, **kwargs)

Return a 2-dimensional square identity matrix of size n

Parameters:

n : integer

the number of rows and columns in the matrix

dtype : string or list

The data type of the array

sparse : boolean

specify whether to create a sparse array (default=False)

**kwargs

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray containing an [n x n] identity matrix

isnan(A)

Element-wise nan test function

join(*args)

Perform a series of array joins on the arguments and return the result.

linspace(start, stop, num=50, endpoint=True, retstep=False, **kwargs)

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop].

The endpoint of the interval can optionally be excluded.

Parameters:

start : scalar

The starting value of the sequence.

stop : scalar

The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.

num : int, optional

Number of samples to generate. Default is 50.

endpoint : bool, optional

If True, stop is the last sample. Otherwise, it is not included. Default is True.

retstep : bool, optional

If True, return (samples, step), where step is the spacing between samples.

**kwargs

additional keyword arguments are passed to SciDBDataShape

Returns:

samples : SciDBArray

There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).

step : float (only if retstep is True)

Size of spacing between samples.
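
How endpoint changes the step size can be shown with a short pure-Python sketch. The helper linspace_values is hypothetical: it returns a list instead of a SciDBArray and always returns the step alongside the samples, as if retstep were True.

```python
def linspace_values(start, stop, num=50, endpoint=True):
    """Return (samples, step) for num evenly spaced samples."""
    if num < 1:
        raise ValueError("num must be at least 1")
    # With the endpoint included there are num - 1 gaps; without it, num gaps.
    div = (num - 1) if endpoint else num
    step = (stop - start) / div if div > 0 else 0.0
    return [start + i * step for i in range(num)], step


samples, step = linspace_values(0, 2, 5)
print(samples, step)  # [0.0, 0.5, 1.0, 1.5, 2.0] 0.5

open_samples, open_step = linspace_values(0, 2, 4, endpoint=False)
print(open_samples, open_step)  # [0.0, 0.5, 1.0, 1.5] 0.5
```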

list_arrays(parsed=True, n=0)

List the arrays currently in the database

Parameters:

parsed : boolean

If True (default), then parse the results into a dictionary of array names as keys, schema as values

n : integer

the maximum number of arrays to list. If n=0, then list all

Returns:

array_list : string or dictionary

The list of arrays. If parsed=True, then the result is returned as a dictionary.

log(A)

Element-wise natural logarithm

log10(A)

Element-wise base-10 logarithm

max(A, index=None, scidb_syntax=False)

Array or axis maximum.

see SciDBArray.max()

mean(A, index=None, scidb_syntax=False)

Array or axis mean.

see SciDBArray.mean()

merge(A, B)

Merge two arrays

min(A, index=None, scidb_syntax=False)

Array or axis minimum.

see SciDBArray.min()

new_array(shape=None, dtype=u'double', persistent=False, **kwargs)

Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.

Parameters:

shape : int or tuple (optional)

The shape of the array to create. If not specified, no array will be created and a name will simply be reserved for later use. WARNING: if shape=None and persistent=False, an error will result when the array goes out of scope, unless the name is used to create an array on the server.

dtype : string (optional)

the datatype of the array. This is only referenced if shape is specified. Default is ‘double’.

persistent : boolean (optional)

whether the created array should be persistent, i.e. survive in SciDB past when the object wrapper goes out of scope. Default is False.

**kwargs : (optional)

If shape is specified, additional keyword arguments are passed to SciDBDataShape. Otherwise, these will not be referenced.

Returns:

arr : SciDBArray

wrapper of the new SciDB array instance.

ones(shape, dtype=u'double', **kwargs)

Return an array of ones

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of all ones.

query(query, *args, **kwargs)

Perform a query on the database.

This wraps a query constructor which allows the creation of sophisticated SciDB queries which act on arrays wrapped by SciDBArray objects. See Notes below for details.

Parameters:

query : string

The query string, with curly-braces to indicate insertions

*args, **kwargs

Values to be inserted (see below).
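
The curly-brace insertions resemble Python's own str.format placeholders. The sketch below is only an analogy under that assumption: it substitutes plain strings and numbers into a template, whereas query is documented as acting on arrays wrapped by SciDBArray objects. The array name my_array and attribute f0 are placeholders.

```python
# Template with named insertion points, in the style of str.format.
template = "filter({A}, {attr} > {threshold})"

# Plain values stand in for the wrapped arrays a real call would receive.
query_string = template.format(A="my_array", attr="f0", threshold=3)
print(query_string)  # filter(my_array, f0 > 3)
```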

randint(shape, dtype=u'uint32', lower=0, upper=2147483647, persistent=False, **kwargs)

Return an array of random integers between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=2147483647)

persistent : bool

Whether the array is persistent (default=False)

**kwargs

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of random integers, uniformly distributed between lower and upper.

random(shape, dtype=u'double', lower=0, upper=1, persistent=False, **kwargs)

Return an array of random floats between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=1)

persistent : bool

Whether the new array is persistent (default=False)

**kwargs

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of random floating point numbers, uniformly distributed between lower and upper.

reap()

Reap all arrays created via new_array

sin(A)

Element-wise trigonometric sine

sqrt(A)

Element-wise square root

std(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.std()

stdev(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.stdev()

substitute(A, value)

Replace null values in an array

See SciDBArray.substitute()

sum(A, index=None, scidb_syntax=False)

Array or axis sum.

see SciDBArray.sum()

svd(A, return_U=True, return_S=True, return_VT=True)

Compute the Singular Value Decomposition of the array A:

A = U.S.V^T

Parameters:

A : SciDBArray

The array for which the SVD will be computed. It should be a 2-dimensional array with a single value per cell. Currently, the svd routine requires non-overlapping chunks of size 32.

return_U, return_S, return_VT : boolean

if any is True, then return the associated array. All are True by default

Returns:

[U], [S], [VT] : SciDBArrays

Arrays storing the singular values and vectors of A.

tan(A)

Element-wise trigonometric tangent

toarray(A, transfer_bytes=True)

Convert a SciDB array to a numpy array

todataframe(A, transfer_bytes=True)

Convert a SciDB array to a pandas dataframe

tosparse(A, sparse_fmt=u'recarray', transfer_bytes=True)

Convert a SciDB array to a sparse representation

var(A, index=None, scidb_syntax=False)

Array or axis variance.

see SciDBArray.var()

wrap_array(scidbname, persistent=True)

Create a new SciDBArray object that references an existing SciDB array

Parameters:

scidbname : string

Wrap an existing scidb array referred to by scidbname. The SciDB array object persistent value will be set to True, and the object shape, datashape and data type values will be determined by the SciDB array.

persistent : boolean

If True (default) then array will not be deleted when this variable goes out of scope. Warning: if persistent is set to False, data could be lost!

zeros(shape, dtype=u'double', **kwargs)

Return an array of zeros

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs : :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of all zeros.

Shim Interface

class scidbpy.interface.SciDBShimInterface(hostname)

HTTP interface to SciDB via shim [1]

Parameters:

hostname : string

A URL pointing to a running shim/SciDB session

[1] https://github.com/Paradigm4/shim

Methods

acos(A) Element-wise trigonometric inverse cosine
approxdc(A[, index, scidb_syntax]) Array or axis unique element estimate.
arange([start,] stop[, step,][, dtype]) Return evenly spaced values within a given interval.
asin(A) Element-wise trigonometric inverse sine
atan(A) Element-wise trigonometric inverse tangent
avg(A[, index, scidb_syntax]) Array or axis average.
ceil(A) Element-wise ceiling function
cos(A) Element-wise trigonometric cosine
count(A[, index, scidb_syntax]) Array or axis count.
cross_join(A, B, *dims) Perform a cross-join on arrays A and B.
dot(A, B) Compute the matrix product of A and B
exp(A) Element-wise natural exponent
floor(A) Element-wise floor function
from_array(A[, instance_id]) Initialize a scidb array from a numpy array
from_dataframe(A[, instance_id]) Initialize a scidb array from a pandas dataframe
from_sparse(A[, instance_id]) Initialize a scidb array from a sparse array
identity(n[, dtype, sparse]) Return a 2-dimensional square identity matrix of size n
isnan(A) Element-wise nan test function
join(*args) Perform a series of array joins on the arguments and return the result.
linspace(start, stop[, num, endpoint, retstep]) Return evenly spaced numbers over a specified interval.
list_arrays([parsed, n]) List the arrays currently in the database
log(A) Element-wise natural logarithm
log10(A) Element-wise base-10 logarithm
max(A[, index, scidb_syntax]) Array or axis maximum.
mean(A[, index, scidb_syntax]) Array or axis mean.
merge(A, B) Merge two arrays
min(A[, index, scidb_syntax]) Array or axis minimum.
new_array([shape, dtype, persistent]) Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
ones(shape[, dtype]) Return an array of ones
query(query, *args, **kwargs) Perform a query on the database.
randint(shape[, dtype, lower, upper, persistent]) Return an array of random integers between lower and upper
random(shape[, dtype, lower, upper, persistent]) Return an array of random floats between lower and upper
reap() Reap all arrays created via new_array
sin(A) Element-wise trigonometric sine
sqrt(A) Element-wise square root
std(A[, index, scidb_syntax]) Array or axis standard deviation.
stdev(A[, index, scidb_syntax]) Array or axis standard deviation.
substitute(A, value) Replace null values in an array
sum(A[, index, scidb_syntax]) Array or axis sum.
svd(A[, return_U, return_S, return_VT]) Compute the Singular Value Decomposition of the array A:
tan(A) Element-wise trigonometric tangent
toarray(A[, transfer_bytes]) Convert a SciDB array to a numpy array
todataframe(A[, transfer_bytes]) Convert a SciDB array to a pandas dataframe
tosparse(A[, sparse_fmt, transfer_bytes]) Convert a SciDB array to a sparse representation
var(A[, index, scidb_syntax]) Array or axis variance.
wrap_array(scidbname[, persistent]) Create a new SciDBArray object that references an existing SciDB array
zeros(shape[, dtype]) Return an array of zeros
acos(A)

Element-wise trigonometric inverse cosine

approxdc(A, index=None, scidb_syntax=False)

Array or axis unique element estimate.

see SciDBArray.approxdc()

arange([start, ]stop, [step, ]dtype=None, **kwargs)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the behavior is equivalent to the Python range function, but returns a SciDBArray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.

Parameters:

start : number, optional

Start of interval. The interval includes this value. The default start value is 0.

stop : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

step : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.

dtype : dtype

The type of the output array. If dtype is not given, it is inferred from the type of the input arguments.

**kwargs :

Additional arguments are passed to SciDBDataShape when creating the output array.

Returns:

arange : SciDBArray

Array of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than or equal to stop.
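The length rule can be checked without a SciDB connection. The sketch below is plain Python rather than SciDB-Py code; `arange_length` is a hypothetical helper that only evaluates ceil((stop - start)/step), so it illustrates the arithmetic and not the library's implementation.

```python
import math


def arange_length(start, stop, step=1):
    """Number of elements implied by the ceil((stop - start)/step) rule."""
    return max(0, math.ceil((stop - start) / step))


# Integer arguments agree with Python's range.
print(arange_length(0, 10, 3), len(range(0, 10, 3)))

# With a fractional step, round-off in (stop - start)/step can push the
# quotient slightly above an integer, which adds one more element.
n = arange_length(1, 1.3, 0.1)
last = 1 + (n - 1) * 0.1
print(n, last, last >= 1.3)
```

When the element count matters, passing an explicit count to linspace avoids this ambiguity.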

asin(A)

Element-wise trigonometric inverse sine

atan(A)

Element-wise trigonometric inverse tangent

avg(A, index=None, scidb_syntax=False)

Array or axis average.

see SciDBArray.avg()

ceil(A)

Element-wise ceiling function

cos(A)

Element-wise trigonometric cosine

count(A, index=None, scidb_syntax=False)

Array or axis count.

see SciDBArray.count()

cross_join(A, B, *dims)

Perform a cross-join on arrays A and B.

Parameters:

A, B : SciDBArray

*dims : tuples

The remaining arguments are tuples of dimension indices which should be joined.

dot(A, B)

Compute the matrix product of A and B

Parameters:

A : SciDBArray

A must be a two-dimensional matrix of shape (n, p)

B : SciDBArray

B must be a two-dimensional matrix of shape (p, m)

Returns:

C : SciDBArray

The wrapper of the SciDB Array, of shape (n, m), consisting of the matrix product of A and B
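The shape rule, (n, p) times (p, m) giving (n, m), can be illustrated on nested lists. This is a plain-Python sketch, not SciDB-Py code; `matmul` is a hypothetical helper used only to show how the dimensions combine.

```python
def matmul(a, b):
    """Multiply an (n, p) matrix by a (p, m) matrix given as nested lists."""
    n, p, m = len(a), len(b), len(b[0])
    if any(len(row) != p for row in a):
        raise ValueError("inner dimensions must agree")
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]      # shape (2, 3)
B = [[1, 0],
     [0, 1],
     [1, 1]]         # shape (3, 2)
C = matmul(A, B)     # shape (2, 2)
print(len(C), len(C[0]), C)
```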

exp(A)

Element-wise natural exponent

floor(A)

Element-wise floor function

from_array(A, instance_id=0, **kwargs)

Initialize a scidb array from a numpy array

Parameters:

A : array_like (numpy array or sparse array)

input array from which the scidb array will be created

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_dataframe(A, instance_id=0, **kwargs)

Initialize a scidb array from a pandas dataframe

Parameters:

A : pandas dataframe

data from which the scidb array will be created.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input dataframe

from_sparse(A, instance_id=0, **kwargs)

Initialize a scidb array from a sparse array

Parameters:

A : sparse array

sparse input array from which the scidb array will be created. Note that this array will internally be converted to COO format.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array
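For readers unfamiliar with COO (coordinate) format, it stores one (row, column, value) triple per nonzero entry. The sketch below is plain Python and only illustrates the format; `to_coo` is a hypothetical helper, and how SciDB-Py performs the conversion or the upload is not shown here.

```python
def to_coo(dense):
    """Return (rows, cols, values) for the nonzero entries of a nested list."""
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                values.append(v)
    return rows, cols, values


M = [[0, 0, 3],
     [4, 0, 0],
     [0, 5, 0]]
rows, cols, values = to_coo(M)
print(rows, cols, values)
```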

identity(n, dtype=u'double', sparse=False, **kwargs)

Return a 2-dimensional square identity matrix of size n

Parameters:

n : integer

the number of rows and columns in the matrix

dtype : string or list

The data type of the array

sparse : boolean

specify whether to create a sparse array (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray containing an [n x n] identity matrix

isnan(A)

Element-wise nan test function

join(*args)

Perform a series of array joins on the arguments and return the result.

linspace(start, stop, num=50, endpoint=True, retstep=False, **kwargs)

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop].

The endpoint of the interval can optionally be excluded.

Parameters:

start : scalar

The starting value of the sequence.

stop : scalar

The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.

num : int, optional

Number of samples to generate. Default is 50.

endpoint : bool, optional

If True, stop is the last sample. Otherwise, it is not included. Default is True.

retstep : bool, optional

If True, return (samples, step), where step is the spacing between samples.

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

samples : SciDBArray

There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).

step : float (only if retstep is True)

Size of spacing between samples.
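The effect of endpoint on the step size can be seen in a small plain-Python sketch. `linspace_sketch` is a hypothetical helper that follows the rule described above; it returns a list rather than a SciDBArray and is not the library's implementation.

```python
def linspace_sketch(start, stop, num=50, endpoint=True):
    """Return (samples, step) following the endpoint rule described above."""
    # With endpoint=True the interval is split into num - 1 gaps;
    # with endpoint=False it is split into num gaps and stop is dropped.
    div = (num - 1) if endpoint else num
    step = (stop - start) / div if div > 0 else 0.0
    samples = [start + i * step for i in range(num)]
    return samples, step


closed, closed_step = linspace_sketch(0, 1, num=5)
half_open, half_open_step = linspace_sketch(0, 1, num=4, endpoint=False)
print(closed, closed_step)
print(half_open, half_open_step)
```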

list_arrays(parsed=True, n=0)

List the arrays currently in the database

Parameters:

parsed : boolean

If True (default), parse the results into a dictionary with array names as keys and schemas as values.

n : integer

the maximum number of arrays to list. If n=0, then list all

Returns:

array_list : string or dictionary

The list of arrays. If parsed=True, then the result is returned as a dictionary.

log(A)

Element-wise natural logarithm

log10(A)

Element-wise base-10 logarithm

max(A, index=None, scidb_syntax=False)

Array or axis maximum.

see SciDBArray.max()

mean(A, index=None, scidb_syntax=False)

Array or axis mean.

see SciDBArray.mean()

merge(A, B)

Merge two arrays

min(A, index=None, scidb_syntax=False)

Array or axis minimum.

see SciDBArray.min()

new_array(shape=None, dtype=u'double', persistent=False, **kwargs)

Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.

Parameters:

shape : int or tuple (optional)

The shape of the array to create. If not specified, no array will be created and a name will simply be reserved for later use. WARNING: if shape=None and persistent=False, an error will result when the array goes out of scope, unless the name is used to create an array on the server.

dtype : string (optional)

the datatype of the array. This is only referenced if shape is specified. Default is ‘double’.

persistent : boolean (optional)

whether the created array should be persistent, i.e. survive in SciDB past when the object wrapper goes out of scope. Default is False.

**kwargs : (optional)

If shape is specified, additional keyword arguments are passed to SciDBDataShape. Otherwise, these will not be referenced.

Returns:

arr : SciDBArray

wrapper of the new SciDB array instance.

ones(shape, dtype=u'double', **kwargs)

Return an array of ones

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of all ones.

query(query, *args, **kwargs)

Perform a query on the database.

This wraps a query constructor which allows the creation of sophisticated SciDB queries which act on arrays wrapped by SciDBArray objects. See Notes below for details.

Parameters:

query : string

The query string, with curly-braces to indicate insertions

*args, **kwargs :

Values to be inserted (see below).
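The text above only states that curly braces mark insertions. Assuming the substitution behaves like Python's str.format (an assumption, not confirmed here), the idea can be sketched without a database. `ArrayStandIn` and its `d0`/`d1` attributes are hypothetical stand-ins, not SciDB-Py classes or documented attribute names.

```python
class ArrayStandIn:
    """Hypothetical stand-in for an array wrapper: a name plus dimension names."""

    def __init__(self, name, d0, d1):
        self.name = name
        self.d0 = d0
        self.d1 = d1

    def __str__(self):
        # "{A}" in a template is replaced by the array's name.
        return self.name


A = ArrayStandIn("py_tmp_01", "i0", "i1")
template = "store(build({A}, iif({A.d0}={A.d1}, 1, 0)), {A})"
expanded = template.format(A=A)
print(expanded)
```

The point of the sketch is that the query string refers to a Python object by keyword, and the object supplies the server-side names at expansion time.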

randint(shape, dtype=u'uint32', lower=0, upper=2147483647, persistent=False, **kwargs)

Return an array of random integers between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=2147483647)

persistent : bool

Whether the array is persistent (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of random integers, uniformly distributed between lower and upper.

random(shape, dtype=u'double', lower=0, upper=1, persistent=False, **kwargs)

Return an array of random floats between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=1)

persistent : bool

Whether the new array is persistent (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of random floating point numbers, uniformly distributed between lower and upper.

reap()

Reap all arrays created via new_array

sin(A)

Element-wise trigonometric sine

sqrt(A)

Element-wise square root

std(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.std()

stdev(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.stdev()

substitute(A, value)

Replace null values in an array

See SciDBArray.substitute()

sum(A, index=None, scidb_syntax=False)

Array or axis sum.

see SciDBArray.sum()

svd(A, return_U=True, return_S=True, return_VT=True)

Compute the Singular Value Decomposition of the array A:

A = U.S.V^T

Parameters:

A : SciDBArray

The array for which the SVD will be computed. It should be a 2-dimensional array with a single value per cell. Currently, the svd routine requires non-overlapping chunks of size 32.

return_U, return_S, return_VT : boolean

If a flag is True, the associated array is returned. All are True by default.

Returns:

[U], [S], [VT] : SciDBArrays

Arrays storing the singular values and vectors of A.
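To show how the three pieces recombine into A, the sketch below uses a hand-picked 2 x 2 decomposition in plain Python. It assumes S is given as a vector of singular values and VT is already transposed; the helper names are hypothetical and nothing here calls SciDB.

```python
def diag(values):
    """Build a square diagonal matrix from a list of values."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


# Hand-picked factors for A = [[0, 2], [3, 0]].
U = [[0, 1],
     [1, 0]]
S = [3, 2]           # singular values, largest first
VT = [[1, 0],
      [0, 1]]

A = matmul(matmul(U, diag(S)), VT)
print(A)
```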

tan(A)

Element-wise trigonometric tangent

toarray(A, transfer_bytes=True)

Convert a SciDB array to a numpy array

todataframe(A, transfer_bytes=True)

Convert a SciDB array to a pandas dataframe

tosparse(A, sparse_fmt=u'recarray', transfer_bytes=True)

Convert a SciDB array to a sparse representation

var(A, index=None, scidb_syntax=False)

Array or axis variance.

see SciDBArray.var()

wrap_array(scidbname, persistent=True)

Create a new SciDBArray object that references an existing SciDB array

Parameters:

scidbname : string

Wrap an existing scidb array referred to by scidbname. The SciDB array object persistent value will be set to True, and the object shape, datashape and data type values will be determined by the SciDB array.

persistent : boolean

If True (default) then array will not be deleted when this variable goes out of scope. Warning: if persistent is set to False, data could be lost!

zeros(shape, dtype=u'double', **kwargs)

Return an array of zeros

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray consisting of all zeros.

Visualization and Analysis

class scidbpy.aggregation.GroupBy(array, by)

Perform a GroupBy operation on an array

The interface of this class mimics a subset of the functionality of Pandas’ groupby.

Notes

GroupBy operations are currently restricted in the following ways:

  • GroupBy items must be names of attributes or dimensions
  • Non-integer attributes cannot be used as a groupby item
  • Dimensions cannot be used in aggregate calls

These limitations will be addressed in the 14.9 release of SciDB-Py

Examples

>>> x = sdb.afl.build('<a:int32>[i=0:100,1000,0]', 'iif(i > 50, 1, 0)')
>>> y = sdb.afl.build('<b:int32>[i=0:100,1000,0]', 'i % 30')
>>> z = sdb.join(x, y)
>>> grp = z.groupby('a')
>>> grp.aggregate('sum(b)').todataframe()
   a  b_sum
0  0    645
1  1    715

Multiple aggregation functions can be provided with a dict:

>>> grp.aggregate({'s':'sum(b)', 'm':'max(b)'}).todataframe()
   a    s   m
0  0  645  29
1  1  715  29
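The numbers in the example can be reproduced locally. The following plain-Python sketch rebuilds the same data (i runs from 0 to 100 inclusive) and groups it by hand; it does not use SciDB-Py and is only meant to make the grouping semantics concrete.

```python
from collections import defaultdict

# Rebuild the example's data: a = iif(i > 50, 1, 0), b = i % 30.
cells = [(1 if i > 50 else 0, i % 30) for i in range(0, 101)]

# Collect the b values that fall into each group of a.
groups = defaultdict(list)
for a, b in cells:
    groups[a].append(b)

# Apply sum(b) and max(b) per group, as in the dict-based aggregate call.
result = {a: {"s": sum(bs), "m": max(bs)} for a, bs in sorted(groups.items())}
for a, agg in result.items():
    print(a, agg["s"], agg["m"])
```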

Methods

aggregate(mappings) Perform an aggregation over each group
aggregate(mappings)

Perform an aggregation over each group

Parameters:

mappings : string or dictionary

If a string, a single SciDB expression to apply to each group. If a dict, a mapping from attribute names to expression strings.

Returns:

agg : SciDBArray

A new SciDBArray, obtained by applying the aggregations to the groups of the input array.

scidbpy.aggregation.histogram(X, bins=10, att=None, range=None, plot=False, **kwargs)

Build a 1D histogram from a SciDBArray.

Parameters:

X : SciDBArray

The array to compute a histogram for

att : str (optional)

The attribute of the array to consider. Defaults to the first attribute.

bins : int (optional)

The number of bins

range : [min, max] (optional)

The lower and upper limits of the histogram. Defaults to data limits.

plot : bool

If True, plot the results with matplotlib

histtype : ‘bar’ | ‘step’ (default=’bar’)

If plotting, the kind of histogram to draw. See matplotlib.hist for more details.

kwargs : optional

Additional keywords passed to matplotlib

Returns:

(counts, edges [, artists]) :

  • edges is a NumPy array of edge locations (length=bins+1)
  • counts[i] is the number of data points between edges[i] and edges[i+1] (length=bins)
  • artists is a list of the matplotlib artists created if plot=True
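The relationship between counts and edges can be sketched in plain Python. `histogram_sketch` is a hypothetical helper that uses equal-width bins with the last bin closed on the right (NumPy's convention, assumed here); SciDB-Py's exact edge handling may differ, and the sketch assumes the range has nonzero width.

```python
def histogram_sketch(data, bins=10, value_range=None):
    """Return (counts, edges) for equal-width bins over value_range."""
    if value_range is not None:
        lo, hi = value_range
    else:
        lo, hi = min(data), max(data)
    width = (hi - lo) / bins  # assumes hi > lo
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        if lo <= x <= hi:
            # The last bin is closed on the right so that x == hi is counted.
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
    return counts, edges


counts, edges = histogram_sketch(list(range(11)), bins=5)
print(counts)
print(edges)
```

Note that edges always has one more entry than counts, matching the lengths stated above.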