Analyzing in Parallel

Right now, analyzing in parallel is a tricky business. With the next major release of yt, we intend to include a significantly improved infrastructure for dispatching analysis via a queuing system.

Projecting in Parallel

Below is a script (parallel_projection_mpi4py.py) that shows a proof-of-concept parallel projection. It relies on MPI4Py and PyTables, and currently only works with a decomposition onto a square number of processors.
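Since the decomposition requires a square processor count, a guard along the following lines could be added near the top of the script. This is a minimal sketch (not part of the listing below) that simply aborts the run if the requirement is not met:

from mpi4py import MPI
import math

num_proc = MPI.COMM_WORLD.size
root = int(math.sqrt(num_proc))
if root * root != num_proc:
    # Abort cleanly rather than producing a ragged decomposition.
    if MPI.COMM_WORLD.rank == 0:
        print("This script requires a square number of processors, got %d" % num_proc)
    MPI.COMM_WORLD.Abort(1)

The script itself can then be launched through the MPI runtime, for instance with something like mpirun -np 4 python parallel_projection_mpi4py.py (the exact launcher depends on the MPI installation).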

#
# This is a quick-and-dirty method of testing the parallel projection sketch
# I've created.  (Matt)
#
fn = "my_gigantic_data.dir/my_gigantic_data"

from mpi4py import MPI

import math, time
from yt.config import ytcfg
num_proc = MPI.COMM_WORLD.size
my_id = MPI.COMM_WORLD.rank
field_name = "Density"
time.sleep(my_id) # Offset our IO slightly

# Now a bit of boilerplate to ensure we're not doing any IO
# unless we're the root processor.
ytcfg["yt","logfile"] = "False"
ytcfg["lagos","ReconstructHierarchy"] = "False"
if my_id == 0:
    ytcfg["lagos","serialize"] = "True"
    ytcfg["lagos","onlydeserialize"] = "False"
else:
    ytcfg["lagos","onlydeserialize"] = "True"

from yt.mods import *
pf = get_pf()

# Domain decomposition.
x_edge = na.mgrid[0:1:(na.sqrt(num_proc) + 1)*1j]
y_edge = na.mgrid[0:1:(na.sqrt(num_proc) + 1)*1j]

xe_i = int(math.floor(my_id/na.sqrt(num_proc)))
ye_i = int(my_id % na.sqrt(num_proc)) # cast to int so it can be used as an index

# Note that here we are setting it to be projected along axis zero
LE = [0.0, x_edge[xe_i], y_edge[ye_i]]
RE = [1.0, x_edge[xe_i+1], y_edge[ye_i+1]]

reg = pf.h.region([.5,.5,.5],LE,RE) # center at 0.5 but only project sub-regions
# Record the corners of our region
open("LE_RE_%02i.txt" % my_id,"w").write("%s, %s\n" % (LE,RE))
proj = pf.h.proj(0,field_name,source=reg) # Actually *do* the projection here

if my_id == 0:
    # Now we collect!
    d = [proj.data]
    for i in range(1,num_proc):
        # Blocking receive; the lowercase recv returns the pickled
        # projection dictionary sent by each worker.
        d.append(MPI.COMM_WORLD.recv(source=i, tag=0))
    new_proj = {}
    for key in proj.data.keys():
        new_proj[key] = na.concatenate([mm[key] for mm in d])
    proj_array = na.array([new_proj['px'],new_proj['py'],
                           new_proj['pdx'],new_proj['pdy'],
                           new_proj[field_name]])
    # We've now received all of our data and constructed an
    # array of the pixelization.  So, let's store it.
    import tables
    p = tables.openFile("result_mpi4py.h5","w")
    p.createArray("/","Test",proj_array)
    p.close()
else:
    # proj.data is where the dictionary of projection values is kept;
    # the lowercase send transmits it as a pickled Python object.
    MPI.COMM_WORLD.send(proj.data, dest=0, tag=0)
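
Once the root processor has written result_mpi4py.h5, the stored array can be read back in for plotting or further analysis. The following is a minimal sketch, assuming the same (older) PyTables openFile interface used above:

import tables

# Re-open the file written by the root processor and pull out the array.
p = tables.openFile("result_mpi4py.h5", "r")
proj_array = p.getNode("/", "Test").read()
p.close()

# The rows are px, py, pdx, pdy and the projected field, in that order.
px, py, pdx, pdy, values = proj_array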

Analyzing Objects in Parallel

Analyzing objects in parallel is significantly simpler when the objects do not require in-line data comparison (for example, creating phase diagrams of multiple galaxies in a large-scale cosmological simulation) and can be conducted in a very straightforward fashion.

The example below reads in a list of HOP centers, generated by yt and written out with the write_out() method, and chooses which ones to examine based on the processor number. Note that MPI4Py might be overkill here, but it suffices for our purposes.

from mpi4py import MPI
from yt.mods import *

num_proc = MPI.COMM_WORLD.size
my_id = MPI.COMM_WORLD.rank

pf = get_pf()

hop_centers = []
hop_radii = []
for line in open("HOP.txt"):
    if line[0] == "#": continue
    # Maximum density location
    hop_centers.append( [float(i) for i in line.split()[4:7]] )
    # Maximum radius
    hop_radii.append(float(line.split()[-1]))

results = open("results_%04i.txt" % my_id, "w")
# Now we want to start at my_id, jump by num_proc each step,
# and stop at len(hop_centers).
for hop_id in na.mgrid[my_id:len(hop_centers):num_proc]:
    # This is where our analysis goes.
    sp = pf.h.sphere(hop_centers[hop_id], hop_radii[hop_id])
    axv = sp.quantities["WeightedAverageQuantity"]("x-velocity","CellMassMsun")
    results.write("%04i\t%0.9e\n" % (hop_id, axv))
results.close()
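
After the run completes, each processor has left behind its own results_NNNN.txt file. These can be stitched into a single table with a short post-processing step; the following is a minimal sketch (the combined file name results_all.txt is an arbitrary choice):

import glob

# Gather every per-processor results file and sort by HOP id.
rows = []
for fn in glob.glob("results_????.txt"):
    for line in open(fn):
        hop_id, axv = line.split()
        rows.append((int(hop_id), float(axv)))
rows.sort()

combined = open("results_all.txt", "w")
for hop_id, axv in rows:
    combined.write("%04i\t%0.9e\n" % (hop_id, axv))
combined.close()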
