yt Methods and Background

Warning

This is a subsection of the chapter in Matthew Turk’s thesis on the development and capabilities of yt. Some of the specifics may be out of date, but the mathematical and algorithmic descriptions are still valid.

Introduction

The construction and development of analysis tools acts as a rite of passage for computational astrophysics students. As scientists, the goal is always the same: understanding and examining the output of a simulation, and then processing it to produce some insight about natural phenomena. However, non-intuitive output formats and the relatively time-consuming process of constructing and testing modules to process these outputs delay the development of useful analysis methods and tools. Furthermore, this leads to non-standard analysis tools, which may conceal both bugs and creeping errors. Clearly, a flexible and freely-distributed means of analyzing and exploring data would serve to enhance the scientific process, easing the transition from generating data to understanding data.

Adaptive mesh refinement data, in particular, is usually stored in formats that are conceptually intuitive but not straightforward to work with. The Enzo code, described in this work, relies on regular, Cartesian grid patches consisting of computational elements, and has been used to study a wide variety of astrophysical problems. Orion, a competing code also built on adaptive mesh refinement, is likewise used to study astrophysical problems, albeit with different underlying solvers and physical models. Both start from the presupposition that at all locations in a given computational domain, all quantities governing physical processes are defined by a set of fields. These data fields act in conjunction to completely describe the state of the simulation.

Several sets of tools already exist to handle adaptive mesh refinement data in the astrophysical community alone. Jacques, written in IDL by Tom Abel (http://jacques.enzotools.org), acts as a visualization system for Enzo data. VisIt, developed at Lawrence Livermore National Lab, is a 3D visualization suite that can examine both Enzo and Orion data. Volume renderers, such as that presented in [vg06-kaehler], allow for interactive and immersive rendering of adaptive mesh refinement data. Additionally, numerous home-grown scripts and modules, developed in isolation, have been designed over the lifetimes of these codes to handle analysis and visualization. However, what was missing was a flexible, lightweight solution built exclusively on freely available open source components. This would allow not only for largely unfettered redistribution, but also a complete examination of every component of the analysis process, from data to plot.

I present here an analysis toolkit I have created called yt, which is built exclusively on free and open source components and is unencumbered by heavyweight libraries and licensing servers. I have designed it to be highly modular, with a data analysis module clearly distinct from the visualization and plotting modules, and a graphical user interface that builds on, rather than supplants, the underlying application programming interface.

yt is written primarily in Python, with some computationally expensive routines written in C for speed. Python is an open source, freely-available object-oriented language designed for rapid development and ease of use. Python use is increasingly widespread, both inside and outside the scientific domain, for purposes ranging from serving dynamic content online to symbolic math processing. Here we use it to provide transparently parallel analysis and visualization of adaptive mesh refinement simulations of astrophysical phenomena, a task to which it is ideally suited. Additionally, this allows users to create and use new analysis modules of their own, built on the foundations of the yt framework.

The definition of Free Software requires a number of freedoms: the freedom to use, the freedom to inspect, the freedom to give away, and the freedom to modify. These principles serve science well, improving both the repeatability and the non-locality of results. Not only are all of the components of yt Free Software, but the libraries it is built upon are Free Software. Enzo, as well, is Free Software and runs on exclusively Free Software operating systems. In this way, the development of yt helps to ensure that the entire pipeline of analysis – from the simulation to the paper – is open and available to all parties.

Analysis Requirements

Astrophysical systems are inherently multi-scale, and the formation of primordial stars is a good example. Beginning with cosmological-scale perturbations in the background density of the universe, one must follow the evolution of gas parcels down to the mass scale of the moon to have any hope of resolving the inner structure and thus constrain the mass scale of these stars. The Enzo code, used in this work to simulate the formation of the first stars in the universe, is also used for simulating large-scale galaxy clusters [2007ApJ-671-27H], and galaxy formation and evolution [2009ApJ-696-96W]. These diverse applications require flexible analysis methods that work for broad but shallow refinement regions – as in a turbulence simulation – as well as narrow and deep refinement, as in a primordial star formation simulation.

yt was originally created to examine slices and projected regions in extremely deep adaptive mesh refinement datasets, a problem that demands off-screen rendering and scriptable interfaces. To accommodate the relatively diverse computing environments on which Enzo is run, exclusively interactive visualization had to be replaced with a detached method better suited to remote operation, often through a job execution queue on a computing cluster. By detaching the user interface from the analysis backend, the architecture was restructured into a loosely federated system of components. Currently, yt is primarily a scripting interface for analysis and visualization, with limited data management capabilities. However, a full graphical user interface for interactive exploration, built on wxPython, remains a crucial part of the toolkit as a whole. These components all interact as modules, and thus can operate completely independently of each other.

Utilizing commodity Python-based packages, yt is a fully-featured, adaptable and versatile means of analyzing large-scale astrophysical data. It is based primarily on the library NumPy and it is mostly written in Python. It uses Matplotlib for visualization, and optionally PyTables and wxPython for various sub-tasks. Additionally, several core routines have been written in C for fast numerical computation, and a simple TVTK-based 3D visualization component has been implemented. A community of users and developers has grown around the project; it has been used in several published papers and is now distributed with the Enzo code itself.

The ultimate purpose of yt is to provide a high-level interface to data, which will have the side effect of enabling different entry points to yt itself. This interface includes the creation of publication-quality plots, as well as a concealment of difficult, multi-step operations. This allows the creation of multiple frontends, as well as recipe-based approaches to analysis script creation. To provide maximum flexibility, as well as a conceptual separation of the different components and tasks to which components can be directed, yt is packaged into several sub-packages, for data handling, data analysis, and plotting.

Community Engagement

From its beginning, yt has been exclusively free and open source software, and it will never require components that are not open source and freely available. This eliminates dependencies on licensing servers and ensures that any technology developed for yt is contributed back to the community. The development has been driven, and will continue to be driven, by the pragmatic analysis needs of working scientists.

Furthermore, no implemented features will be hidden from the community at large. This philosophy has already served the toolkit well; outside users have examined the code, and minor bugs have been found and corrected. While this provision does not extend to components provided by others, it has served well for the development team, as all new features and components are developed in the open with peer review.

The development of yt takes place in a publicly accessible subversion repository with a Trac frontend. Cross-referenced and indexed documentation is available, and automatically updated as changes are made. The source code is entirely commented and extensive programming interface documentation is automatically generated. In order to ease the process of installation, a script is included to install the entire set of dependencies along with the toolkit; furthermore, installations of the toolkit are maintained at several different supercomputing centers, and a binary version for Mac OS X is provided.

This high level of community involvement and, more importantly, outreach enables a broader set of diverse needs and desires to guide the long-term development. Enabling direct technology transfer between users, rather than requiring re-implementation, allows the community to disentangle the coding process from the scientific process; simultaneously, by making all code public, inspectable and freely available, it can be openly improved and verified. The availability and relatively approachable nature of yt, in addition to the inclusion of many simple analysis tasks, reduces the barrier to entry for young scientists; furthermore, by orienting the analysis framework development as a community project, the learning curve for transforming simulation data into publications is greatly reduced.

Data Analysis Layer

The analysis layer, lagos, provides several features beyond data access, including extensive analytical capabilities. At its simplest level, lagos is used to access the parameters and data in a given data snapshot output from an AMR simulation. Objects are described by physical shapes and orientations, rather than the data structures dictated by the code. This enables an intuitive and physically meaningful entry point to data analysis, rather than a pragmatic approach based on the underlying simulation code base.

Physical Objects and Data Selection

One of the difficulties in dealing with rectilinear adaptive mesh refinement data is the fundamental disconnect between the geometries of the grid structure and the objects described by the simulation. One does not expect galaxies to form and be shaped as rectangular prisms; as such, access to physically-meaningful structures must be provided. Therefore, yt provides

  • Spheres
  • Rectangular prisms
  • Cylinders (disks)
  • Arbitrary regions based on logical operations
  • Topologically-connected sets of cells
  • Axis-orthogonal and arbitrary-angle rays
  • Axis-orthogonal and arbitrary-angle slices
  • Arbitrary fixed-resolution grids
  • Projected planes

Each of these regional descriptors is presented to the user as a single object, and when accessed the data is returned at the finest resolution available; all overlapping coarse grid cells are removed transparently. This functionality was first implemented in order to analyze physical structures resembling spheres, followed by disk-like structures, each of which needed to be characterized and studied as a whole. By making available these intuitive and geometrically meaningful data selections, the underlying physical structures that they trace become more accessible to analysis and study.

By overloading the normal Python dictionary-like accessor methods, the objects mediate access to data fields defined at every cell. The simple command

>>> some_object["Density"]

initiates a procedure that first examines the current contents of the data store of some_object; the Density field is then read from disk for those grids from which the object's data is culled, the individual grid arrays are concatenated into a single array, and the result is returned to the user.
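This mechanism can be sketched as follows; the class below is a simplified, hypothetical rendering (the helper methods read_field and selection_mask are stand-ins, not lagos internals) of a dictionary-style accessor that caches fields after the first read:

import numpy as np

class DataObjectSketch:
    """Simplified sketch of lazy, dictionary-like field access."""
    def __init__(self, grids):
        self.grids = grids        # grid patches intersecting this object
        self.data = {}            # fields that have already been read

    def __getitem__(self, field):
        # Only touch the disk if the field is not already in the data store.
        if field not in self.data:
            pieces = [grid.read_field(field)[grid.selection_mask(self)]
                      for grid in self.grids]
            self.data[field] = np.concatenate(pieces)
        return self.data[field]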

The abstraction layer provides several means of interacting with these three-dimensional objects, each conceptually unified and respecting a common set of data protocols. Due to the flexibility of Python, as well as the versatility of NumPy, this functionality is exposed in the form of multiple returned arrays of data, which are fast and easily manipulated. Below, the angular momentum vector of a sphere is calculated and then used to construct a disk whose height is set relative to its radius.

sp = amr_hierarchy.sphere(center, radius)
print sp["Density"].min()
L_vec = sp.quantities["AngularMomentumVector"]()
my_disk = amr_hierarchy.disk(center, L_vec,
                             radius, radius/100.0)
print my_disk["Density"].min()

These objects handle cell-based data fields natively, but are also able to appropriately select and return particles contained within them. This has facilitated the inclusion of an off-the-shelf halo finder (discussed below) which allows users to quantify the clustering of particles within a region.

Object Storage

The construction of objects, as well as derived data fields, can often be a computationally expensive task; in particular, clumps found by the contouring algorithm (see Contour Finding) and the gravitational binding checks that are used to describe them require a relatively time-consuming set of steps. To save time and enable repeatable analysis, the storage of objects between sessions is essential. Python itself comes with an object serialization protocol called pickle that can handle most objects. However, by default the pickle protocol is greedy – it seeks to take all affiliated data. For a given yt object, this may include the entire hierarchy, the parameter file, all arrays associated with that object, and even user-space variables. Under the assumption that the data used to generate the fields within a given object will be available the next time the object is accessed, we can reduce the size and scope of the pickling process by designing a means of storing and retrieving these objects across sessions.

Implementing the __reduce__ method on an object allows the description of a pickling protocol. For all yt objects, this protocol has been specified as a description in physical space of the object itself; this usually constitutes replicating the arguments to the constructor – the radius and center of a sphere, for instance. For extracted objects based on indices of parent objects, the indices are stored as well. Once the protocol has been executed, binary data designed to reconstruct the object is stored – either in a single, standalone file or in the parameter file-affiliated data store, ending in the extension .yt.
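As a minimal sketch of this idea (names are illustrative, and the real protocol also records the parameter file affiliation described below), a sphere-like object might implement __reduce__ so that only its physical description survives pickling:

import pickle

class SphereSketch:
    """Sketch: serialize only the physical description, never the data arrays."""
    def __init__(self, pf_key, center, radius):
        self.pf_key = pf_key          # key identifying the parameter file
        self.center = center
        self.radius = radius
        self.data = {}                # field arrays; regenerated after unpickling

    def __reduce__(self):
        # Reconstruct from (parameter file key, center, radius) alone.
        return (SphereSketch, (self.pf_key, self.center, self.radius))

sp = SphereSketch("example-hash", (0.5, 0.5, 0.5), 0.05)
sp.data["Density"] = "a large array, deliberately not stored"
restored = pickle.loads(pickle.dumps(sp))
assert restored.data == {}            # only the physical description was kept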

The biggest obstacle to retrieving an object is the affiliation of an object with a given parameter file. At its most basic level, a parameter file can be described by a path. However, while this works for a single instantiation of a yt session, data will often be moved between sessions – between supercomputing centers, across mounted external hard drives, or even within a given computing center to a different storage system. A means of addressing, or at least uniquely identifying, parameter files is necessary to ensure uniform access across instances of an analysis session. An absolute path, while unique, is not necessarily invariant. To this end, the basename (the final element in the absolute path), the simulation time, and the creation time of the simulation output (CurrentTimeIdentifier in Enzo) are used to identify a given static output. An MD5 hash is generated from these three items, which is then used as a key for the parameter file. By this means, collisions between different parameter files (rather than copies of a single parameter file) are made extremely unlikely.
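The key construction can be sketched as below; the exact concatenation and formatting used internally are an assumption here, but the three identifying quantities are those listed above:

import hashlib
import os

def parameter_file_key(path, current_time, creation_time):
    """Build a key for a simulation output from quantities that survive the
    output being moved between filesystems."""
    basename = os.path.basename(path)
    token = "%s;%s;%s" % (basename, current_time, creation_time)
    return hashlib.md5(token.encode("utf-8")).hexdigest()

# Hypothetical values for a single Enzo static output
key = parameter_file_key("/scratch/run/DD0040/data0040", 0.8126, 1190929282)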

Upon retrieval of the object, the key is handed to a parameter file storage object. This object keeps track of all instantiated parameter files; whenever a new parameter file object is instantiated, its hash is generated and compared against the set of existing parameter files. If a match is found, the current path is compared to the path being used during instantiation, and the path in the data store is updated as necessary. If the parameter file is new to the system, it is inserted. By this means, the locations of all known parameter files are kept as up to date as possible; this is by no means a foolproof system, but it works in most cases.

Grid Patches

The Enzo and Orion codes are based around “patch-based” refinement. For every set of cells flagged to be refined, a minimally-enclosing box is selected for refinement. This grid patch is then used as a container and as a computational element, and cell data output to disk is grouped into the parent grid patches. In addition to field and particle data, each possesses a set of attributes that describe its position, its relationship to other grids, and its cell spacing:

  • Parent(s)
  • Unique identifier
  • Level number
  • Left edge
  • Right edge
  • Dimensions
  • Children

The cell spacing is easily computed as dx_{i,j,k} = (RE_{i,j,k}-LE_{i,j,k})/D_{i,j,k} where i,j,k are the axes, LE is the left edge, RE is the right edge and D is the number of cells along that axis. The regions covered by grid patches are not covered uniquely; higher-level child grids overlap with cells in their parent grid, and often that data needs to be removed to ensure that only the highest resolution data is used for analysis purposes. For this purpose, yt provides an affiliated child_mask for every grid; this is a boolean array with identical dimensionality, but wherever a child grid covers a cell, that cell’s index in the mask is set to zero. Everywhere the grid contains the finest data available, the mask cells are set to one. This caching of the locations of child cells enables rapid selection of cells where the data is already the most refined available.
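A sketch of this bookkeeping in NumPy terms is shown below; the attribute names mirror the list above and yt's grid objects, though the real objects carry considerably more state:

import numpy as np

class GridPatchSketch:
    """Sketch of a grid patch with edges, dimensions, and a child mask."""
    def __init__(self, left_edge, right_edge, dimensions):
        self.LeftEdge = np.array(left_edge, dtype="float64")
        self.RightEdge = np.array(right_edge, dtype="float64")
        self.ActiveDimensions = np.array(dimensions, dtype="int64")
        # Cell spacing along each axis: (RE - LE) / D
        self.dds = (self.RightEdge - self.LeftEdge) / self.ActiveDimensions
        # One where this grid holds the finest data, zero where a child covers it
        self.child_mask = np.ones(self.ActiveDimensions, dtype="int32")

    def mark_child(self, start_index, end_index):
        # Zero out the cells covered by a child grid (indices into this grid)
        si, ei = start_index, end_index
        self.child_mask[si[0]:ei[0], si[1]:ei[1], si[2]:ei[2]] = 0

grid = GridPatchSketch([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [16, 16, 16])
grid.mark_child((4, 4, 4), (8, 8, 8))
finest = grid.child_mask.sum()   # count of cells not covered by any child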

Data Fields

The model for handling data, and processing fundamental data fields into new fields describing derived quantities, is designed to be built on top of an object model. Presupposing the existence of an object sphere, we can access the field Density in a dictionary-like fashion. On top of this, we can build automatically recursive field generators that depend on other fields. All fields, including derived fields, are allowed to be defined either by a component of a data file or by a function that transforms one or more other fields, thus allowing multiple layers of definition to exist and allowing the user to extend the existing field set as needed.

By defining simple functions that automatically operate via array operations, generating derived fields is straightforward and fast. For instance, a field such as the magnitude of the velocity in a cell

V = \sqrt{v_x^2 + v_y^2 + v_z^2}

can be defined independently of the source of the data:

def VelocityMagnitude(field, data):
    return (data["x-velocity"]**2.0 +
            data["y-velocity"]**2.0 +
            data["z-velocity"]**2.0)**0.5

Each operation acts independently on each element of the source data fields; this preserves the abstraction of fields as undifferentiated sets of cells, when in fact those cells could be distributed spatially over the entire dataset, with varying cell widths and varying grid levels.

Once a function is defined, it is added to a global field container that contains not only the fields, but a set of metadata about each field – the unit specifier, the unit specifier for projected versions of that field, and any implicit or explicit requirements for that field. Field definitions can require that certain parameters be provided (such as a height vector, a center point, a bulk velocity and so on) or, most powerfully, that the data object has some given characteristic. This is typically applied to ensure that data is given in a spatial context; for finite difference solutions, such as calculating the gradient or divergence of a set of fields, yt allows the derived field to mandate that the input data be provided in a three-dimensional structure. Furthermore, when specifying that some data object be provided in three dimensions, a number of buffer cells can be specified as well; the returned data structure will then have those buffer cells taken from neighboring grids. This enables higher-order methods to be used in the generation of fields, for instance when a given finite difference stencil extends beyond the computational domain of a single grid patch.
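A self-contained sketch of such a registry is shown below, reusing the VelocityMagnitude function defined above; the function add_field_sketch and its keyword arguments are illustrative stand-ins for yt's actual registration interface:

# Illustrative global field container; names and signatures are simplified
# relative to yt's actual field registration machinery.
field_info = {}

def add_field_sketch(name, function, units="", projected_units="",
                     spatial=False, ghost_zones=0):
    """Register a derived field along with its metadata and requirements."""
    field_info[name] = dict(function=function, units=units,
                            projected_units=projected_units,
                            spatial=spatial, ghost_zones=ghost_zones)

# A purely cell-by-cell field needs no spatial context.
add_field_sketch("VelocityMagnitude", VelocityMagnitude,
                 units="cm/s", projected_units="cm**2/s")

# A finite-difference field declares that it must be handed three-dimensional
# data with one layer of buffer cells taken from neighboring grids.
add_field_sketch("DensityGradientMagnitude",
                 lambda field, data: None,   # placeholder body
                 units="g/cm**4", spatial=True, ghost_zones=1)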

Two-Dimensional Data Representations

In order to make images and plots, yt has several different classes of two-dimensional data representations, all of which can be turned into images. Each of these objects generates a list of variable-resolution points, which are then passed into a C-based pixelization routine that transforms them into a fixed-resolution buffer, defined by a width, a height, and physical boundaries of the source data.

Slices

The simplest means of examining data is through the usage of grid-axis aligned slices through the dataset. This has several benefits: it is easy to calculate which grids and which cells must be read off disk (and most data formats allow for easy striding of data off disk, which reduces this operation’s I/O overhead), and the process of stepping through a given dataset is relatively easy to automate.

To construct a set of data points representing a slice, all grids intersected by the slice are first examined, and then the index of the cell desired is generated

\mathrm{floor}\left(\frac{p - v_i}{dx}\right)

where p is the position of the slice, v_i is the coordinate of the left edge of the grid along the axis of the slice and dx is the cell spacing of the grid along the axis of the slice. By this process we construct a set of data points defined as (x_p, dx_p, y_p, dy_p, v) where p indicates that this is in the image plane rather than in the global coordinates of the simulation, and v is the value of the field selected; furthermore, every returned (x_p,dx_p,y_p,dy_p,v) point does not overlap with any points where dx < dx_p or dy < dy_p; thus each point is the finest resolution available.

To construct an image buffer, these cells are pixelized and placed into a fixed-resolution array, defined by (x_{p,\mathrm{min}}, x_{p,\mathrm{max}}, y_{p,\mathrm{min}}, y_{p,\mathrm{max}}). Every pixel in the image plane is iterated over, and any cells that overlap with it are deposited into every pixel I_{ij} as:

\alpha = A_c / A_p \\
\alpha v \rightarrow I_{ij}

where \alpha is an attempt to anti-alias the output image plane, to account for misalignment between the image and world coordinate systems, and A_c and A_p are the areas of the cell and pixel, respectively. Anti-aliasing can be disabled, as well.
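A simplified, NumPy-only stand-in for this C routine is sketched below; here dx and dy are taken to be cell half-widths, and each cell is deposited into the pixels it touches with a weight equal to the covered fraction of the pixel (the production routine differs in its details and is considerably faster):

import numpy as np

def pixelize_sketch(xs, ys, dxs, dys, vals, bounds, nx, ny):
    """Deposit variable-resolution cells into a fixed-resolution buffer."""
    xmin, xmax, ymin, ymax = bounds
    buff = np.zeros((ny, nx), dtype="float64")
    pdx = (xmax - xmin) / nx
    pdy = (ymax - ymin) / ny
    px_edges = xmin + pdx * np.arange(nx + 1)
    py_edges = ymin + pdy * np.arange(ny + 1)
    for x, y, dx, dy, v in zip(xs, ys, dxs, dys, vals):
        # Range of pixel indices this cell can possibly touch
        i0 = max(int((x - dx - xmin) / pdx), 0)
        i1 = min(int((x + dx - xmin) / pdx) + 1, nx)
        j0 = max(int((y - dy - ymin) / pdy), 0)
        j1 = min(int((y + dy - ymin) / pdy) + 1, ny)
        for i in range(i0, i1):
            for j in range(j0, j1):
                # Overlap of the cell with pixel (i, j)
                ox = min(x + dx, px_edges[i + 1]) - max(x - dx, px_edges[i])
                oy = min(y + dy, py_edges[j + 1]) - max(y - dy, py_edges[j])
                if ox > 0 and oy > 0:
                    buff[j, i] += v * (ox * oy) / (pdx * pdy)
    return buff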

Projections

The nature of adaptive mesh refinement is such that one often wishes to examine either the sum of values along a given sight-line or a weighted average along a given sight-line. yt provides an algorithm for generating line integrals in an adaptive fashion, such that every returned (x_p,dx_p,y_p,dy_p,v) point does not contain data from any points where dx < dx_p or dy < dy_p; the alternative would be a binned histogram, where fixed-width cells are defined perpendicular to the line of sight and data is filled into those cells. By providing this list of finest-resolution data points in a projected domain, images of any width can be constructed essentially instantaneously; however, the projection process itself takes longer, for reasons described below.

To obtain the finest points available, the grids are iterated over in order of the level of refinement – first the coarsest and then proceeding to the finest levels of refinement. The process of projecting a grid varies slightly, depending on the desired output from the projection. For weighted averages,

\begin{array}{lcl}
V_{ij} & = & \sum_n v_{ijn}w_{ijn}dl \\
W_{ij} & = & \sum_n w_{ijn}dl
\end{array}

where V_{ij} is the output value at every cell in the image plane, v_{ijn} is every cell in the grid’s data field, w_{ijn} is the weight field at every cell in the grid’s data field, and dl is the path length through a single cell. Note that because this process is conducted on a grid-by-grid basis, and dl does not change within a given grid, this term can be moved outside of the sum. In the limit of an unweighted integration, W_{ij} is set to 1.0, rather than to the evaluation of the sum. Furthermore, a mask of child cells is reduced with a logical “and” operation along the axis of projection; any cell where this mask is “False” has data of a higher refinement level available to it. This grid is then compared against all grids on the same level of refinement with which it overlaps; the flattened x and y position arrays are compared via integer indexing and any collisions are combined. This process is repeated with data from coarser grids that has been identified as having finer data available to it; each coarse cell is then added to the r^2 cells on the current level of processing, where r is the refinement factor. At this point, all cells in the array of data for the current level where the reduced child mask is “True” are removed from subsequent processing, as they are part of the final output of the projection. All cells where the child mask is “False” are retained to be processed on the next level. In this manner, we create a cascading refinement process, where only two levels of refinement have to be compared at a given time.

When the entire data hierarchy has been processed, the final flattened arrays of V and W are divided to construct the output data value

v(x,y) = V(x,y)/W(x,y)

which is kept as the weighted average value along the axis of projection. In the case of direct integration, note that W(x,y) is in fact unity, so this is a pass-through operation. Once this process is completed, the projection object respects the same data protocol as an ordinary slice, and can be plotted in the same way.
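Continuing the hypothetical amr_hierarchy object used in the earlier example, usage might look like the following; the exact call signature for projections has varied between versions of yt and is an assumption here:

proj = amr_hierarchy.proj(0, "Density")            # line integral along the x axis
weighted = amr_hierarchy.proj(0, "Temperature",
                              weight_field="Density")
print weighted["Temperature"].min()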

Cutting Planes

At some length scales in star formation problems, gas is likely to collapse into a disk, which is often not aligned with the axes of the simulation. By slicing along the axes, patterns such as spiral density waves could be missed, and ultimately go unexamined. In order to better visualize off-axis phenomena, yt is able to create images misaligned with the axes.

A cutting plane is an arbitrarily-aligned plane that transforms the intersected points into a new coordinate system such that they can be pixelized and made into a publication-quality plot. Identifying the data that is transformed into the image, at some arbitrary angle to the disk, is a two-step process.

A central point and a single normal vector are required; this normal vector is taken as normal to the desired image plane. This leaves a degree of freedom for rotation of the image plane about the normal vector and through the central point. A minimization procedure is conducted to determine the appropriate “North” vector in the image plane:

\begin{array}{lclcl}
\mathbf{p_x} & = &  \mathbf{a_0} & \mathbf{\times} & \mathbf{n} \\
\mathbf{p_y} & = &  \mathbf{n}   & \mathbf{\times} & \mathbf{p_x} \\
d   & = & -\mathbf{c}   & \mathbf{\cdot}  & \mathbf{n}
\end{array}

where \mathbf{a_0} is the coordinate axis with which the normal vector (\mathbf{n}) has the greatest cross product, \mathbf{c} is the vector to the center point of the plane, and d is the inclination parameter of the plane. From this we construct two matrices, the rotation matrix:

R = \left(\begin{array}{ccc}
p_{xi} & p_{xj} & p_{xk} \\
p_{yi} & p_{yj} & p_{yk} \\
n_{i} & n_{j} & n_{k}
\end{array}\right)

and its inverse, which are used to rotate coordinates into and out of the image plane, respectively. Grids are identified as being intersected by the cutting plane through fast array operations on their boundaries. We define a new array, D, where

D_{ij} = \mathbf{v}_{ji} \mathbf{\cdot} \mathbf{n} + d

where the index i is over each grid and the index j refers to which of the eight grid vertices (\mathbf{v}) of the grid is being examined. A grid is rejected if D_{j} has the same sign at all eight of its vertices, since in that case the grid lies entirely on one side of the plane:

\mathrm{all}( D_j < 0 ) \mathrm{or~all} (D_j > 0).

Upon identification of the grids that are intersected by the cutting plane, we choose data points by examining the distance of each cell center to the plane, selecting points where

|\mathbf{p} \mathbf{\cdot} \mathbf{n} + d | < \frac{\sqrt{dx^2+dy^2+dz^2}}{2}.

This generates a small number of false positives (from regarding a cell as a sphere rather than a rectangular prism), which are removed during the pixelization step when creating a plot. Each data point is then rotated into the image plane via the rotation matrix:

\begin{array}{lcl}
\mathbf{p} \mathbf{\cdot} \mathbf{p}_x & \rightarrow & x_p \\
\mathbf{p} \mathbf{\cdot} \mathbf{p}_y & \rightarrow & y_p.
\end{array}

This technique requires a new pixelization routine, in order to ensure that the correct cells are taken and placed on the plot, which requires an additional set of checks to determine whether a cell intersects the image plane. The process here is similar to the standard pixelization procedure, described above, with the addition of the rotation step. Defining \delta = \sqrt{dx^2+dy^2+dz^2}, every data point where (x_p \pm \delta, y_p \pm \delta) is within the bounds of the image is examined by the pixelization routine for overlap of the data point with a pixel in the output buffer. Every potentially intersecting pixel is then iterated over and the coordinates (x_i, y_i, 0) of the image buffer are rotated via the inverse rotation matrix back to the world coordinates (x', y', z'). These are then compared against the (x, y, z) of the original data point. If all three conditions

\begin{array}{lcl}
|x-x'| & < & dx \\
|y-y'| & < & dy \\
|z-z'| & < & dz
\end{array}

are satisfied, the data value from the cell is deposited in that image buffer pixel. An unfortunate side effect of the relatively complicated pixelization procedure, as well as the strict intersection-based inclusion, is that the process of anti-aliasing is non-trivial and computationally expensive. As such, these images often appear quite jagged at cell-pixel boundaries. Additionally, utilizing the same transformation and pixelization process, overlaying velocity vectors is trivially accomplished, and such a process is included in the toolkit.
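As a usage sketch, an off-axis slice through the center point, normal to the angular momentum vector computed in the earlier sphere example, might be constructed as below; the method name and signature are assumptions that have varied between versions of yt:

cp = amr_hierarchy.cutting(L_vec, center)
print cp["Density"].max()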

Contour Finding

Visual inspection of simulations provides a simple method of identifying distinct hydrodynamic regions; however, a quantitative approach must be taken to describe those regions. Specifically, distinct collapsing regions can be identified by locating topologically-connected sets of cells. The nature of adaptive mesh refinement, wherein a given set of cells may be connected across grid and refinement boundaries, requires traversing grid and resolution boundaries.

Unfortunately, while locating connected sets inside a single-resolution grid is a straightforward but non-trivial problem in recursive programming, extending this in an efficient way to hierarchical datasets can be problematic. To that end, the algorithm implemented in yt checks on a grid-by-grid basis, utilizing a buffer zone of cells at the grid boundary to communicate set identification. The algorithm for identifying these sets is a recursive and iterative process:

  1. Identify grids to be considered, such as those from an AMRSphereBase object
  2. Give unique identification numbers to all finest-level cells within the desired contour (v_{\mathrm{min}} \leq v \leq v_{\mathrm{max}})
  3. Construct expandable queue of grids to be examined
    1. Give unique identification numbers to all coarse cells in the considered grid within the desired contour (v_{\mathrm{min}} \leq v \leq v_{\mathrm{max}})
    2. Obtain buffer zone of one cell-width, including contour IDs
    3. Recursively examine all cells identified as contour members
      1. Update contour ID to be the maximum of 26 neighboring cells
      2. If current contour ID is greater than original contour ID, repeat until it is not
      3. Notify all neighboring cells with contour ID less than current contour ID to re-examine neighbors and update
    4. Flush contour IDs in buffer zone to originating grids
    5. If any buffer zone's contour IDs have changed during this process, re-order the queue such that the next grids to be examined are the originating grids of the changed contour IDs
  4. Reorder contour IDs such that the largest contours have the lowest numbers
  5. Return extracted contour objects

Any contour that crosses into the buffer zones mandates a reconsideration of all grids that intersect with the currently considered grid. This process is expensive, as it operates recursively, but ensures that all contours are automatically joined.

Once contours are identified, they are split into individual derived objects that are returned to the user. This presents an integrated interface for generating and analyzing topologically-connected sets of related cells. This method was used in [2009ApJ-691-441S] to study fragmentation of collapsing gas clouds, specifically to examine the gravitational boundedness of these clouds and the length and density scales at which fragmentation occurs.

To determine whether or not an object is bound, we evaluate the inequality

\sum_{i=1}^{N}\frac{m_iv_i^2}{2} < \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\frac{Gm_im_j}{r}

where N is the number of cells in the identified contour and r is the distance between cells i and j. The left hand side of this equation is the total kinetic energy in the object; if desired, the internal thermal energy (nkT/(\gamma-1)) can also be added to this term. This code has been written to run either in a hand-coded C module or on the graphics processor, using NVIDIA’s CUDA framework (http://www.nvidia.com/cuda/) via the PyCUDA (http://mathema.tician.de/software/pycuda) package. Moving the calculation onto the graphics card speeds the calculation up nearly ideally, by two orders of magnitude. This allows for binding checks on extremely large datasets in a manageable amount of time.
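The inequality can be evaluated directly; the following NumPy sketch illustrates the check itself (the production implementations are the C and CUDA versions mentioned above):

import numpy as np

G = 6.674e-8   # gravitational constant in cgs units

def is_bound_sketch(mass, speed, positions, thermal=None):
    """Direct O(N^2) evaluation of the boundedness inequality.
    mass: cell masses (g); speed: cell speeds (cm/s); positions: (N, 3) in cm."""
    kinetic = 0.5 * np.sum(mass * speed**2.0)
    if thermal is not None:
        kinetic += np.sum(thermal)        # optional internal thermal energy
    potential = 0.0
    for i in range(len(mass) - 1):
        r = np.sqrt(np.sum((positions[i+1:] - positions[i])**2.0, axis=1))
        potential += np.sum(G * mass[i] * mass[i+1:] / r)
    return kinetic < potential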

Fixed Resolution Grids

The particular structures of multi-resolution data can impede certain classes of algorithms. To address this, the creation of fixed-resolution (and three-dimensional) arrays of data must be easy and accessible. However, unless the entire region under consideration is contained within a single grid patch, it can be difficult to construct these arrays. The method included in yt for creating these “covering grids” is to select all grids within a given rectangular prism. These grids are then iterated over, starting on the coarsest level, and used to fill in each point in the new array. Only cells that intersect with the array are considered, and any grid cell that intersects with any cell within the covering grid is included, as long as the child mask for that cell indicates no finer data is available. By this method, the entire covering grid is filled in with the finest cells available to it. This can be utilized for generating ghost zones, as well as for constructing minimal covering grids out of many single-resolution grids that are disjoint in the domain.

However, because coarse cells are duplicated across all cells in the (possibly finer-resolution) covering grid with which they intersect, this can lead to unwanted resolution artifacts. To combat this, a “smoothed” covering grid object is also available. This object is filled in completely at all levels l < L, where L is the level at which the covering grid is being extracted. Once a given level has been filled in, the grid is trilinearly interpolated to the next level, and then all new data points from grids at that level replace existing data points. This method is suitable for generating smoothed multi-resolution grids and constructing vertex-centered data, as used in Section Immersive Visualization with VTK.
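A usage sketch follows; both covering grids are exposed through the hierarchy object, though the exact argument names and ordering have varied between versions of yt and are an assumption here:

cube = amr_hierarchy.covering_grid(level=4,
                                   left_edge=[0.4, 0.4, 0.4],
                                   dims=[128, 128, 128])
smoothed = amr_hierarchy.smoothed_covering_grid(level=4,
                                                left_edge=[0.4, 0.4, 0.4],
                                                dims=[128, 128, 128])
print cube["Density"].shape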

Multi-dimensional Profiles

Distributions of data within the space of other variables are often necessary when examining and analyzing data. For instance, in a collapsing gas cloud, examining the average temperature with increasing radius from a central location provides a convenient means of examining the process of collapse, as well as the effective equation of state. To conduct this sort of analysis, typically a multi-dimensional histogram is constructed, wherein the values in every bin are weighted averages of some additional quantity. In yt, the term “profile” is used to describe any weighted average or distribution of a variable with respect to a second, independent variable. Such uses include a histogram of temperature with respect to density, a radial profile of molecular hydrogen fraction, and a radius, temperature, and velocity phase diagram. With the usage of the open-source, 3D rendering engine S2PLOT (http://astronomy.swin.edu.au/s2plot/index.php?title=S2PLOT), these profiles can have up to three independent variables.

One can imagine profiles serving two different purposes: to show the average value of a variable at a fixed location in the phase space of a set of independent variables, or to show the distribution of a variable with respect to a set of independent variables. The first step is that of binning or histogramming. We define up to three axes of comparison, which will be designated x, y, and z, but which should not be confused with the spatial axes. These are discretized into x_0 ... x_n where n is the number of bins along the specified axis. Indices j for each value v_i among the set of points being profiled are then generated along each axis such that

x_{j} \leq v_i < x_{j+1}.

These indices are then used to calculate the weighted average in each bin:

V_j = \frac{\sum_{i=1}^{N} v_i w_i}{\sum_{i=1}^{N} w_i}

where V_j is now the average value in bin j in our weighted average, and the N points are selected such that their index along the considered axis is j. If we wish to examine multiple dimensions, we simply mandate that in all dimensions, the index of all the points used in the average is the index of the bin into which values are being placed. To conduct a non-averaged distribution, the weights are all set to 1.0 in the numerator, and the sum in the denominator is not calculated. This allows, for example, the examination of mass distribution in a plane defined by chemo-thermal quantities.
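A one-dimensional version of this binning can be sketched with NumPy alone; the function below is an illustration of the weighted average defined above, not the profile machinery in yt itself:

import numpy as np

def binned_weighted_average(x, v, w, bins):
    """Weighted average of v in bins of the independent variable x."""
    j = np.digitize(x, bins) - 1                  # bin index for each point
    nbins = len(bins) - 1
    numerator = np.zeros(nbins)
    denominator = np.zeros(nbins)
    for b in range(nbins):
        selection = (j == b)
        numerator[b] = np.sum(v[selection] * w[selection])
        denominator[b] = np.sum(w[selection])
    return numerator / np.where(denominator > 0, denominator, 1.0)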

Parallel Analysis

As the capabilities of supercomputers grow, the size of datasets grows as well. Most standalone analysis codes are not parallelized; the process is time-consuming, complicated, and error-prone. Therefore, the disconnect between simulation time and data analysis time has grown ever larger. In order to meet these changing needs, yt has been modified to run in parallel on multiple independent processing units on a single dataset. Specifically, utilizing the Message Passing Interface (MPI) via the MPI4Py (http://code.google.com/p/mpi4py/) module, a lightweight, NumPy-native wrapper that enables natural access to the C-based routines for interprocess communication, yt is able to subdivide datasets into multiple decomposed regions that can then be analyzed independently and joined to provide a final result. A primary goal of this process has been to preserve the API at all times, such that the user can submit an unchanged serial script to a batch processing queue, and the toolkit will recognize it is being run in parallel and distribute tasks appropriately.

The tasks in yt that require parallel analysis can be divided into two broad categories: those tasks that act on data in an unordered, uncorrelated fashion (such as weighted histograms, summations, and some bulk property calculation), and those tasks that act on a decomposed domain (such as halo finding and projection).

Unordered Analysis

To parallelize unordered analysis tasks, a set of convenience functions has been implemented utilizing an initialize/finalize formalism; this abstracts the entirety of the analysis task as a transaction. Signaling the beginning and end of the analysis transaction sets in motion several procedures, defined by the analysis task itself, that handle the initialization of data objects and variables and that combine information across processors. These are abstracted by the base class ParallelAnalysisInterface, which implements several different methods useful for parallel analysis. By this means, the intrusion of parallel methods and algorithms into previously serial tasks is kept to a minimum; invasive changes are typically not necessary.

This transaction follows several steps:

  1. Call get_grids to obtain a list of grids to process
  2. Iterator calls object._initialize_parallel
  3. Object processes each grid
  4. Iterator calls object._finalize_parallel and raises StopIteration.

Inside the routine get_grids the iterator decomposes the full collection of grids into chunks based on the organization of the datasets on disk. Implementation of the parallel analysis interface mandates that objects implement two gatekeeper functions, object._initialize_parallel and object._finalize_parallel. These two functions are allowed to broadcast and communicate with other processors. At the end of the finalization step, the object is expected to be identical on all processors. This enables scripts to be run identically in parallel and in serial. For unordered analysis, this process results in close-to-ideal scaling with the number of processors.
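Schematically, the transaction can be sketched as a generator; the underscore-prefixed method names are the object's own hooks, while the interleaved grid assignment here is a simplification of the file-aware assignment described below:

def parallel_grid_iterator(obj, all_grids, rank, size):
    """Sketch of the initialize/process/finalize transaction."""
    obj._initialize_parallel()
    # Simple interleaved assignment; the real get_grids groups grids by the
    # file in which they are stored in order to minimize I/O overhead.
    for grid in all_grids[rank::size]:
        yield grid
    # After this point the object must be identical on every processor.
    obj._finalize_parallel()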

Upon initialization, ParallelAnalysisInterface determines which sets of data will be processed by which processors. In order to decompose a task across processors, a means of assigning grids to processors is required. For spatially-oriented tasks (such as projections) this is simple and is accomplished through the decomposition of some spatial domain. For unordered analysis tasks, the clearest means of selecting grids is to minimize file input overhead. The process of reading a single set of grid data from disk can be outlined as:

  1. Open file
  2. Seek to grid data position
  3. Read data
  4. Close file

However, in the case of “packed” Enzo data, as well as all Orion data, multiple grids are written to a single file. If we know the order in which these grids are written, we can consolidate several data reads into a single operation:

  1. Open file
  2. For each grid
    1. Seek to grid position
    2. Read each field
  3. Close file

If we know the means by which the grids and fields are ordered on disk, we can simplify the seeking requirements and instead read in large sweeps across the disk. By further pre-allocating all necessary memory, this becomes a single operation that can be accomplished in one “sweep” across each file. By allocating as many grids as possible from a single “grid output” file to a single processor, this procedure can be used to minimize file overhead on each processor.
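The consolidation can be sketched as below, using h5py purely for illustration (the toolkit itself uses its own lightweight HDF5 wrappers, described later); the filename and group_name attributes are hypothetical stand-ins for the grid object's knowledge of where its data lives:

import h5py
from collections import defaultdict

def read_fields_by_file(grids, fields):
    """Open each "packed" file once and read all of its grids in sequence."""
    data = {}
    by_file = defaultdict(list)
    for grid in grids:
        by_file[grid.filename].append(grid)
    for filename, file_grids in by_file.items():
        with h5py.File(filename, "r") as handle:
            for grid in file_grids:
                for field in fields:
                    data[grid.id, field] = handle[grid.group_name][field][:]
    return data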

Spatial Decomposition

MPI provides a means of decomposing an arbitrary region across a given number of processors. Because of the inherently spatial nature of the adaptive projection algorithm implemented in yt, parallelization requires decomposition with respect to the image plane; however, future revisions of the algorithm may allow for unordered grid projection. To project in parallel, the computational domain is divided such that the image plane is distributed equally among the processors; each component of the image plane is then used to construct rectangular prisms along the entire line of sight. Each processor is thus allocated a rectangular prism of dimensions

(L_i, L_j, L_d)

where the axes have been rotated such that the line of sight of the projection is the third dimension, the product L_i L_j is constant across processors, and L_d is the entire computational domain along the axis of projection. Following the projection algorithm, each processor will then have a final image plane set of points, as per usual:

(x_p, dx_p, y_p, dy_p, v)

but subject to the constraints that all points are contained within the rectangular prism as prescribed by the image plane decomposition. At the end of the projection step all processors join their image arrays, which are guaranteed to contain only unique points.

Enzo and Orion utilize different file formats, but both are designed to output a single file per processor with all constituent grids computed on that processor localized to that file. Unfortunately, both codes conduct “load balancing” operations on the computational domain, so processors are not necessarily guaranteed to have spatially localized grids; this results in the output format not being spatially decomposed, but rather unordered. As a result, this method of projection does not scale as well as desired, because each processor is likely to have to read grid datasets from many files. Despite that, the communication overhead is essentially irrelevant, because the processors only need to communicate at the end of the projection process, to share their non-overlapping final result with all other processors in the computational group.

Halo Finding

In cosmological hydrodynamic simulations, dark matter particles and gas parcels are coupled through gravitational interaction. Furthermore, dark matter dominates gravitational interaction on all but the smallest scales. Dark matter particles act as a collisionless fluid, and are the first component of the simulation to collapse into identifiable structures; as such, they can be used effectively to identify regions of structure formation.

The HOP algorithm [eishut98] is an effective and tested means of identifying collapsed dark matter halos in a simulation, and has been a part of the Enzo code distribution for some time. Typically an Enzo simulation is allowed to execute to completion, an entire dataset is loaded into memory, and then the HOP algorithm processes the entire domain. This process is memory-intensive, and requires that the entire dataset be loaded into a single computer. It is not inherently parallel and thus does no domain decomposition. The output from this is a single list of halos and the associated densities, masses, particle identifiers, positions, and so on. The HOP algorithm works by assigning a density to every particle; each particle then “hops” to its most dense neighbor. Each set of particles sharing a most dense neighbor is then called a group, and any group with a density below the minimum density threshold (a free parameter) is removed from the final list of groups. These groups are then rejoined along boundaries.

Including this code inside yt, as a means of abstracting away compilation and data access, was trivial; however, to do so the input to HOP was generalized to be an arbitrary three-dimensional data source. As a result, the HOP algorithm can now be applied on subsets of the domain. By decomposing the domain into multiple tiles with a buffer region, the HOP algorithm can be run on multiple processors, with a final “join” operation performed to construct a full halo list. Any halo whose most dense point is located within the buffer zone is cut, as those halos should be found on neighboring tiles.

However, the free parameter in this calculation is the size of the buffer zone. A balance must be struck between identification of objects and memory requirements; given this means of identification, if a halo happens to reside within the buffer zone of a tile and is greater in spatial extent than the buffer zone, it will be truncated on both sides. This problem is mitigated by the particular set of problems where a parallel halo finder is needed. These problems, with more particles than can fit in the main memory of a standard HPC cluster node, are likely to be extremely large physical domain problems, with relatively small halos. In the circumstances where a large simulation has very large halos, greater than the size of the buffer zone, this method would be unsuitable, as it would split the identification of halos over the buffer zones. This situation could arise, for instance, in a relatively small physical domain simulation with extremely high resolution dark matter particles, where micro-halos could be missed by this technique.

The goal of having a parallel halo finder is to reduce the memory and processing time overhead for large simulations; by distributing the identification of dark matter halos across multiple, independent processors, we gain an increased efficiency, but we must construct creative means of communication. As such, the halo data container objects themselves have been transformed into “proxy” objects, transparently communicating requests for information.

Plotting and Visualization Layer

The plotting layer, yt.raven, can plot one-, two- and three-dimensional histograms of quantities, allowing for weighting and binning of those results. A set of pixelization routines has been written in C to provide a means of taking a set of variable-size pixels and constructing a uniform grid of values, suitable for fast plotting in Matplotlib. Applicable cases include planes that are not perpendicular to a coordinate axis, allowing oblique slices to be plotted and displayed with publication-quality rendering. Callbacks are available for overlaying analytic solutions, grid-patch boundaries, vectors, contours, and arbitrary annotation.

Constraints of Scale

In order to manage simulations consisting of hundreds of thousands of discrete grid patches – as well as their attendant grid cell values – bottlenecks have been located and eliminated using the cProfile module. Additionally, the practice of storing data about simulation outputs between instantiations of the Python objects has been extended; this speeds subsequent startups, and enables faster response times. Because very large hierarchies consume substantial time during the parsing and instantiation of attributes, a core set of data about the geometry and structure of the grid objects is stored in a fast array format, eliminating the need to repeatedly convert text values to internal floating point representation.

Enzo data is written in one of three ways, the most efficient being via the Hierarchical Data Format (HDF5) with a single file per processor that the simulation was run on. To limit the effect that disk access has on the process of loading data, hand-written wrappers to the HDF5 library have been inserted into the code. These wrappers are lightweight, and operate on a single file at a time, loading data in the order it has been written to the disk. The PyTables package was used for some time, but the instantiation of its object hierarchy was found to be too much overhead for the brief and well-directed access desired.

Frontends and Interfaces

yt was originally intended to be used from the command line, with images viewed either in a web browser or via an X11 connection forwarding the output of an image viewer. However, a happy side-effect of this architecture, as well as of the versatile Matplotlib “Canvas” interface, is that the yt API, designed to be a single interface to analysis tasks, is easily accessed and utilized by different interfaces. By ensuring that this API is stable and flexible, GUIs, web-interfaces, and command-line scripts can be constructed to perform common tasks.

Not all environments have access to the same level of interactivity. For large-scale datasets, being able to interact with the data through a scripting interface enables submission to a batch processing queue, which enables appropriate allocation of resources. For smaller datasets, the process of interactively exploring datasets via graphical user interfaces, exposing analytical techniques not available to an offline interface, is extremely worthwhile, as it can be highly immersive.

The canonical graphical user interface is written in wxPython, and presents to the user a hierarchical listing of data objects: static outputs from the simulation, as well as spatially-oriented objects derived from those outputs. The tabbed display pane shows visual representations of these objects in the form of embedded Matplotlib figures.

An interface to the interactive Matplotlib pylab interface, via IPython, has been prepared. This enables the user to generate plots that are thematically linked, and thus display a uniform spatial extent. Further enhancements to this IPython interface, via the profile system, have been targeted for the next release.

Embedding yt Inside Enzo

An outstanding problem in the analysis of large-scale data is that of the disk; while data can be written to disk, read back, and then analyzed in an arbitrary fashion, this process is not only slow but also requires substantial intermediate disk space for data that will ultimately undergo severely reductionist analysis. To address this problem, the typical solution is to insert analysis code – generation of derived quantities, images, and so forth – into the simulation code. However, the usual means of doing this is through either a substantial hand-written framework that attempts to account for every analysis task, or a limited framework that only handles very limited analysis tasks.

Furthermore, in-line analysis enables a substantially greater relative quantity of analysis output than disk-mediated analysis does. Removing the dumping of numerous large files to disk as a prerequisite for conducting analysis and generating visualization allows for a much more favorable ratio of data to analyzed data. For a typical Population III star formation simulation, the size of the data dumps can be as much as 10 gigabytes per timestep; however, the relative amount of information that can be gleaned from these outputs is significantly smaller. Using smaller data output mechanisms as well as more clever streaming methods can improve this ratio; however, by enabling in-line analysis, images of the evolution of a collapsing Population III halo can be output at every single update of the hydrodynamical time, allowing for true “movies” of star formation to be produced. By allowing for the creation and export of radial profiles and other analytical products, this technique opens up vast avenues for analysis while simulations are being conducted, rather than afterward.

The Python/C API allows for passage of data in-memory to an instance of the Python interpreter; by embedding a Python interpreter within each running Enzo MPI task, Enzo is able to pass existing data to a newly spawned yt analysis task, and thus disintermediate the disk completely. While this works for many relatively simple tasks, it is not yet able to decompose data spatially; because we are constrained by the parallel nature of the Enzo domain decomposition, we avoid passing data between MPI tasks. This means that if a grid is owned by MPI task 1, it will not be passed to MPI task 2 during the analysis stage.

Generalization to Other AMR Codes

As mentioned above, yt was designed to handle and analyze data output from the AMR code Enzo. The entire codebase has since been ported to work equally well with data from other AMR codes, beginning with the Orion code in use at the University of California, Berkeley. However, different codes make separate sets of assumptions about the data they output, and the handling of this data had to be generalized to be non-Enzo-specific. In this process, a balance had to be struck between generality of data reading and specification on the one hand, and simplicity and speed on the other. This led to a minimally invasive set of changes, which have been put into place.

The primary architectural change that had to be made was generalizing the means by which data fields were recognized and handled by yt. Orion, specifically, stores a different set of state vectors than Enzo; for instance, momentum replaces velocity. To accommodate this, while retaining identical sets of derived fields, a new hierarchy of derived field containers was created: the base set of fields that are “universal,” the Enzo-specific fields, and the Orion-specific fields. The code-specific field containers are responsible for accepting raw data output by the simulation and converting it into a format that the “universal” field set can understand. This includes unit conversion, transformation of state vectors, and handling different assumptions about cell-face versus cell-centered field information. Implementing these field containers following the “Borg” design pattern, wherein all instances of a class share a single state, enabled all derived fields, regardless of how they were generated, to be shared across all data output types and instances.
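The pattern itself is small enough to sketch in full; the class below is an illustration of the shared-state idea, not the actual container in yt:

class BorgFieldContainer:
    """Every instance shares one state, so a field registered through any
    instance is immediately visible to all of them."""
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state
        self.fields = getattr(self, "fields", {})

    def add_field(self, name, function):
        self.fields[name] = function

container_a = BorgFieldContainer()
container_b = BorgFieldContainer()
container_a.add_field("VelocityMagnitude", lambda field, data: None)
assert "VelocityMagnitude" in container_b.fields   # shared state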

In the future, yt will be expanded to handle and analyze data from other adaptive mesh refinement codes. Work has begun to port it to handle data output by the FLASH code; a major difficulty in doing so, however, is the handling of the FLASH data format. Unlike both the Enzo and Orion codes, FLASH uses an octree, cell-based refinement scheme. Two ways forward are obvious: either each refined cell is assigned its own grid patch, or a volume segmentation algorithm can be executed to place rectangular prisms in refined regions, thus identifying grid patches.
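
As an illustration of the first approach, the sketch below promotes each leaf cell to its own one-cell grid patch. The assumed inputs (arrays of cell centers, per-cell refinement levels, and the cell width on the coarsest level, with a uniform refinement factor) are for illustration only and do not reflect the FLASH file format.

    import numpy as np

    def cells_to_patches(centers, levels, dx_coarse, refinement=2):
        """Return (left_edges, right_edges): one single-cell patch per leaf cell."""
        centers = np.asarray(centers, dtype="float64")     # shape (n_cells, 3)
        levels = np.asarray(levels, dtype="float64")       # shape (n_cells,)
        # Cell width at each cell's level, assuming a uniform refinement factor.
        dx = dx_coarse / refinement ** levels
        half_width = 0.5 * dx[:, np.newaxis]
        return centers - half_width, centers + half_width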

By providing a unified interface to multiple, often competing, AMR codes, we will be able to utilize similar, if not identical, analysis scripts and algorithms, which will enable direct comparison of results between groups and across methods. Analyzing multiple datasets of identical phenomena at a single time with a single analysis framework is an important and powerful means of comparison across methods and scientific collaborations. Furthermore, utilizing identical means of data access allows for conversion of data between groups for subsequent analysis and re-simulation. Through this method, the results and methods of computation can be verified and compared.

Immersive Visualization with VTK

Visualizing multi-resolution three-dimensional datasets requires careful and detailed methods. While yt makes no claims to be a complete solution for such visualization, it provides hooks for exporting data as well as for utilizing external libraries for three-dimensional visualization.

A VTK-based frontend has been implemented, utilizing the Traits technology and the TVTK library from Enthought, Inc. (http://www.enthought.com/). Traits is a rapid application development toolkit that provides semi-static typing of variables: GUIs can be generated rapidly from class definitions, input is validated on assignment, and notifications are issued when the state of a variable changes.
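
A minimal sketch of the two Traits features relied upon here, validation and change notification, is given below. It assumes the present-day traits.api import path (the version in use at the time shipped as enthought.traits.api), and the class is a hypothetical stand-in rather than any part of the yt frontend.

    from traits.api import HasTraits, Range

    class ContourControl(HasTraits):
        # Assignments are validated; a value outside the range raises a TraitError.
        value = Range(low=1e-31, high=1e-24, value=1e-28)

        # Static notification: called automatically whenever `value` changes.
        def _value_changed(self, old, new):
            print("contour value changed from %g to %g" % (old, new))

    control = ContourControl()
    control.value = 1e-27    # triggers _value_changed
    # control.value = 1.0    # would raise a TraitError (outside the allowed range)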

The TVTK library provides for the construction of multi-resolution structured grid objects called vtkHierarchicalBoxDataSets, which are processed as a group rather than as discrete, unique elements. To enable this computation, I created a patch to expose the functionality of the vtkHierarchicalBoxDataSet to a scripting interface; this functionality was then made available to the user, and interactive widgets were provided for the manipulation and creation of contour sets (using the marching cubes algorithm) and of planes that cut the volume at arbitrary angles.

We are confined to a maximum of twelve levels due to the precision of the VTK positioning mechanism; attempting to position elements with finer-than-single-precision coordinates results in overlapping and indistinguishable elements. In order to expose the deepest hierarchies (those with many levels of refinement), a subsection must be excised and presented to the library. This consists of an extraction of all grids confined by a box, defined by (x_0, y_0, z_0) ... (x_1, y_1, z_1) and K_n ... K_{n+12}, where the coordinates define the left and right edges and K refers to the level of the base grid presented to VTK. We scale the left and right edges of the grids in this subregion according to

\begin{array}{lcl}
(L - L_{\mathrm{min}}) & \rightarrow & L_s \\
r^{l-l_0}(R - L_{\mathrm{min}}) & \rightarrow & R_s
\end{array}

where L and R are the sets of (x_0, y_0, z_0) and (x_1, y_1, z_1), r is the refinement factor, l is the level, l_0 is the first level of the extraction, and L_s and R_s are the final scaled values. The grids from the coarsest level of the extraction are replaced with a smoothed minimal covering box, which may incorporate data from lower levels. This provides a base “medium” into which the higher-resolution levels are placed, rather than multiple disjoint root-level grids. By providing this coordinate conversion in both directions, locations in the base data set can be referenced in a straightforward manner.
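
As a concrete sketch of this conversion, the fragment below rescales a grid’s edges into the extracted frame. It assumes, for illustration, that both edges of a level-l grid are translated by L_min and scaled by r^(l - l_0); this is one reading of the relation above (which writes the scale factor only against the right edge) and is not yt’s actual export code.

    import numpy as np

    def scale_edges(left_edge, right_edge, level, level_base, left_min, refinement=2):
        """Translate and rescale one grid's edges into the extracted subregion's frame."""
        # Assumption: the same factor r**(l - l_0) applies to both edges.
        factor = float(refinement) ** (level - level_base)
        left_scaled = factor * (np.asarray(left_edge) - np.asarray(left_min))
        right_scaled = factor * (np.asarray(right_edge) - np.asarray(left_min))
        return left_scaled, right_scaled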

VTK does not provide the same quality of visualization for AMR data that other solutions do; however, it provides a valuable and flexible means of exploring data, and one that is free and open source. As such, it is currently the preferred direction for future ventures into immersive visualization with yt. Furthermore, because it is a base library with a structured approach to visualizing data, it can be used as a basis for more complicated rendering schemes. Unfortunately, because those schemes likely require a more complicated data structure, the overlap may be minimal.

The VTK camera system is straightforward and easy to manipulate; the interface between yt and VTK has been equipped with a means of recording, manipulating, playing back, and saving camera paths based on a set of fixed keyframes. The user navigates from position to position by whatever means they desire, takes a “snapshot” of the current camera position and orientation, and then specifies how many interpolated points they desire between snapshots. By interpolating the rotation and translation between these fixed camera positions, a smooth path of arbitrary frame frequency can be generated and exported to other visualization systems. Currently only linear interpolation between points is supported; higher-order interpolation would produce smoother camera paths.
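
To make the interpolation step concrete, the sketch below linearly interpolates between successive keyframes. It treats each keyframe as a (position, focal point) pair of three-vectors, which is an illustrative stand-in for the recorded camera state rather than the actual yt-to-VTK interface; the orientation could equally be represented by a view-up vector or quaternion and interpolated in the same way.

    import numpy as np

    def interpolate_path(keyframes, n_between):
        """Insert n_between linearly interpolated frames between successive keyframes."""
        frames = []
        for (p0, f0), (p1, f1) in zip(keyframes[:-1], keyframes[1:]):
            for t in np.linspace(0.0, 1.0, n_between, endpoint=False):
                position = (1.0 - t) * np.asarray(p0) + t * np.asarray(p1)
                focus = (1.0 - t) * np.asarray(f0) + t * np.asarray(f1)
                frames.append((position, focus))
        frames.append(keyframes[-1])    # keep the final keyframe exactly
        return frames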

Community Involvement

I have conducted the vast majority of development on yt, accounting for 816 of the 968 version control “commits” to the 52,000 lines of code comprising yt (and several included but external packages) as of the end of February, 2009. However, in recent months, as distribution of the toolkit has widened and the user base has grown, a substantial uptick in user involvement and submitted development has occurred. In particular, several of the developments discussed here have been explored and implemented by users, including the streamlined halo analyzer, the light cone generation, the parallel halo finder, and the original implementation of the clump finding process, based on the contour finding primitives.

The public face of yt is its web page (http://yt.enzotools.org/), with an integrated source control system, ticket and bug tracker, wiki pages, mailing list, recipe book, and “pastebin” of code snippets. By specifying a command line option to any script utilizing the yt libraries, users can upload error messages and scripts to a central location, where they can be examined, commented on, improved, and discussed.

Currently yt is being developed at four different institutions across the United States, and it has users in at least ten different institutions worldwide. The first official release (yt-1.0) was bundled with Enzo 1.5, and the next release is being prepared by a six-person team of developers writing documentation, fixing bugs, adding features, and providing support for other users. The availability of Python, the simplified all-in-one installation script, and the growing user community are clearly factors in this growth in usage; hopefully, in the future, the project will become less centralized and more of a community effort.

Future Directions

As the capabilities of yt expand, the ability to extend it to perform new tasks expands as well. By publishing yt, and by generalizing it to work with multiple AMR codebases, I hope it will foster collaboration and community efforts toward understanding astrophysical problems and physical processes, while enabling reproducible research. The roadmap for yt has several key milestones, the first of which will be a substantially rewritten set of documentation and the announcement of the general usability of the parallel analysis tasks. Further tasks include better, higher-level interfaces; an expanded scripting interface to yt; and larger-scale “recipes” that would provide easier entry points to analysis and visualization.

One of the weakest aspects of yt is its handling of time-series analysis. Currently, individual parameter files must be examined and instantiated; this process has been eased by a variety of “recipes” for instantiating and analyzing a set of outputs, but it is still hobbled by an awkward interface and the tethering of data objects to individual parameter files. By disconnecting data objects from the hierarchy, time-series analysis would become much more tractable; this would enable the construction of time series objects composed of multiple static outputs or a single set of “streaming” outputs. These time series objects could be affiliated with data objects assigned a fixed set of parameters defining their selection region, but a varying time component.
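
To illustrate the kind of object this would enable, the self-contained sketch below applies a fixed spherical selection region to a sequence of static outputs. Each “output” here is a stand-in dictionary of NumPy arrays rather than an actual yt parameter file, and the class and method names are hypothetical.

    import numpy as np

    class TimeSeries(object):
        def __init__(self, outputs):
            # Each output is assumed to be a dict holding "time", "positions"
            # (an (n, 3) array of cell or particle positions), and one array
            # per field name.
            self.outputs = outputs

        def sphere_maximum(self, field, center, radius):
            """Return (time, maximum of `field` inside the sphere) for each output."""
            center = np.asarray(center)
            results = []
            for output in self.outputs:
                r = np.sqrt(((output["positions"] - center) ** 2).sum(axis=1))
                inside = r < radius
                results.append((output["time"], output[field][inside].max()))
            return results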

By extending the ability to generate synthetic observations, yt will become of greater use for the verification of astrophysical simulations. The ultimate product should be telescope-simulated images; ideally, these images could be subjected to the same scrutiny and analysis as those taken directly from telescopes. The prospects for utilizing the same framework for the generation of simulated images as well as for arbitrary analysis are exciting.

[vg06-kaehler]Kaehler, R., Wise, J., Abel, T., & Hege, H.-C. 2006, in Proceedings of the International Workshop on Volume Graphics 2006 (Boston: Eurographics / IEEE VGTC 2006), 103–110
[2007ApJ-671-27H]Hallman, E. J., O’Shea, B. W., Burns, J. O., Norman, M. L., Harkness, R., & Wagner, R. 2007, ApJ, 671, 27
[2009ApJ-696-96W]Wang, P. & Abel, T. 2009, ApJ, 696, 96
[2009ApJ-691-441S]Smith, B. D., Turk, M. J., Sigurdsson, S., O’Shea, B. W., & Norman, M. L. 2009, ApJ, 691, 441
[eishut98]Eisenstein, D. J. & Hut, P. 1998, ApJ, 498, 137