Divisi architecture

(First written January 2008)

Introduction

Divisi ('SVD' backwards with 'i's) is built around a tensor-view stack. So for example, a sparse tensor stores raw numeric data. A layer above it presents a normalized view of that tensor. Another layer above that presents a labeled view, where numeric indices are mapped to informative labels.

Tensor

Overview

A Tensor is a dict-like mapping from tuples of integers to floats. Tensors implement the dict interface (including iterkeys and iteritems). They also implement a math interface similar to NumPy's matrix. Finally, they implement various methods to convert to other types of tensors.

Any Tensor should be safe to pickle.

Indexing

Indices are assumed to be contiguous positive integers starting at 0. i.e., a tensor with a single element (1000, 1000, 2) has dimensions 1001 by 1001 by 3.

The number of indices specified must always match the number of dimensions.

Some tensors (LabeledTensors) are instead indexed by other labels.

Retrieval

Retrieving items from Tensors uses dict syntax:

tensor[2, 4, 5]

Getting a single item (all indices specified) returns a float. Getting a slice (e.g., tensor[2, :, 5]) returns a tensor of the same kind, with fully specified dimensions collapsed. (For the example, the result would be a 1D tensor.)

You should not rely on whether or not a slice is a view on the original Tensor (like NumPy). (FIXME: this seems unreasonable.)

Attributes

ndim
number of dimensions
shape
dimensions. len(shape) = ndim

Representation

Multiple representations of the data in a Tensor are possible. Call a conversion method of one tensor to convert it to another. Examples to follow.

Math

The following math operations are implemented so far:

Not yet implemented, or partially implemented (though not difficult): * Addition and subtraction (return the denser of the two) * in-place addition forces type retention

Gotchas

No change notification is implemented. Copies or views on tensors may be shallower than you expect. Try to load all your data in first, then do the math on it.

DenseTensors are almost NumPy's ndarrays, but not quite. For example, operations between the two may not work. (Actually, most will, if the DenseTensor is on the left.) Get in the habit of wrapping ndarrays with DenseTensor(array).

Tensor storage classes

Views

Overview

A View is a different way to view a Tensor (or another View). Each View has a tensor attribute that holds the underlying tensor. Generally, a view should behave like just another tensor. If a method is not implemented by the view, the View base class delegates to the tensor.

NormalizedView

A NormalizedView presents a (read-only) normalized view of the underlying tensor. It maintains a norms attribute that stores the sum of squares along each specified dimension.

The normalization should be Euclidean norms along the specified dimensions.

The routines that convert to packed representations are special-cased to use the norms directly, for efficiency.

LabeledView

A LabeledView associates each numeric index of each dimension with a label, which can be of any type. Alternately, the labels for a dimension can be specified as None to indicate that no labeling is done on that dimension (i.e., numerical indices are still used).

All operations are passed down to the underlying tensor, then the result wrapped in a LabeledView. So all operations that would ordinarily return a tensor return a labeled view of the result.

Labels must match for all math operations.

The index, indices, label, and labels methods map between labels and indices.

FIXME: describe OrderedSet

UnfoldedView

FIXME

SVD math