Halo Finding

Section author: Stephen Skory <sskory@physics.ucsd.edu>

There are two methods of finding particle haloes in yt. The recommended and default method is HOP, described in Eisenstein and Hut (1998). A basic friends-of-friends (FoF; e.g. Efstathiou et al. (1985)) halo finder is also implemented, but at this time it should be considered experimental.

HOP

The version of HOP used in yt is an upgraded version of the publicly available HOP code. Support for 64-bit floats and integers has been added, as well as parallel analysis through spatial decomposition. HOP builds groups in this fashion:

  1. Estimates the local density at each particle using a smoothing kernel.
  2. Builds chains of linked particles by ‘hopping’ from one particle to its densest neighbor. A particle which is its own densest neighbor is the end of the chain.
  3. All chains that share the same densest particle are grouped together.
  4. Groups are included, linked together, or discarded depending on the user-supplied overdensity threshold parameter. The default is 160.0.

Please see the HOP method paper for full details.
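The first three steps of the list above can be sketched in plain Python. This is a toy illustration, not the code yt runs: the function name hop_groups, the parameters n_dens and n_hop, and the crude density estimate (mass of the nearest neighbors divided by a power of the distance to the farthest of them, standing in for a real smoothing kernel) are all assumptions made for this example. Step 4, the threshold-based merging and pruning, is left out, and periodic boundaries are ignored.

```python
import math


def _dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hop_groups(positions, masses, n_dens=3, n_hop=3):
    """Toy HOP sketch (hypothetical helper, not part of yt).

    Returns one label per particle: the index of the particle at the
    end of its hop chain. Assumes distinct positions and n_dens >= 2.
    """
    n = len(positions)
    dim = len(positions[0])

    def nearest(i, k):
        # Brute-force neighbor search; particle i itself comes first
        # because its distance to itself is zero.
        order = sorted(range(n), key=lambda j: _dist(positions[i], positions[j]))
        return order[:k]

    # Step 1: crude local density from the n_dens nearest particles.
    density = []
    for i in range(n):
        near = nearest(i, n_dens)
        r = _dist(positions[i], positions[near[-1]])
        density.append(sum(masses[j] for j in near) / r ** dim)

    # Step 2: each particle hops to the densest of its n_hop nearest
    # neighbors (itself included). A particle that picks itself ends a chain.
    hop_to = [max(nearest(i, n_hop), key=lambda j: density[j]) for i in range(n)]

    # Step 3: follow every chain to its end; particles that share the
    # same end point belong to the same group.
    labels = []
    for i in range(n):
        j = i
        while hop_to[j] != j:
            j = hop_to[j]
        labels.append(j)
    return labels


# Two well-separated clumps, each a square of four particles plus a center.
clump_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
clump_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
labels = hop_groups(clump_a + clump_b, [1.0] * 10)
```

In this example every particle of a clump hops to the clump's central particle, which has the highest estimated density, so each clump ends up as one group.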

Friends-of-Friends

The version of FoF in yt is based on the publicly available FoF code from the University of Washington. Like HOP, FoF supports parallel analysis through spatial decomposition. FoF is much simpler than HOP:

  1. From the total number of particles, and the volume of the region, the average inter-particle spacing is calculated.
  2. Pairs of particles closer together than some fraction of the average inter-particle spacing (the default is 0.2) are linked together. Particles can be paired with more than one other particle.
  3. The final groups are formed from the networks of particles linked together by friends, hence the name.
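The three steps above can be sketched with a small union-find structure. This is only an illustration under stated assumptions, not the University of Washington code or yt's wrapper around it: the function name fof_groups and its arguments are hypothetical, the pair search is brute force, and periodic boundaries are ignored.

```python
def fof_groups(positions, box_volume, link_fraction=0.2):
    """Toy friends-of-friends sketch (hypothetical helper, not part of yt).

    Returns one label per particle; particles with equal labels are in
    the same group.
    """
    n = len(positions)
    dim = len(positions[0])

    # Step 1: mean inter-particle spacing from particle count and volume.
    spacing = (box_volume / n) ** (1.0 / dim)
    link_length = link_fraction * spacing

    # Union-find bookkeeping: parent[i] points toward the group's root.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 2: link every pair closer than the linking length.
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 < link_length ** 2:
                parent[find(i)] = find(j)

    # Step 3: a group is the set of particles that share a root.
    return [find(i) for i in range(n)]


# Eight particles in a unit box: spacing 0.5, linking length 0.1.
points = [
    (0.10, 0.5, 0.5), (0.18, 0.5, 0.5), (0.26, 0.5, 0.5),  # a chain of friends
    (0.80, 0.8, 0.8), (0.85, 0.8, 0.8),                    # a close pair
    (0.50, 0.1, 0.1), (0.10, 0.1, 0.9), (0.90, 0.2, 0.3),  # isolated particles
]
groups = fof_groups(points, box_volume=1.0)
```

The first and third particles are farther apart than the linking length, yet they end up in the same group because both are friends of the second particle.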

Warning

The FoF halo finder in yt is not thoroughly tested! It is probably fine to use, but you are strongly encouraged to check your results against the data for errors.

Running HaloFinder

Running HOP on a dataset is straightforward:

from yt.mods import *
pf = load("data0001")
halo_list = HaloFinder(pf)

Running FoF is similar:

from yt.mods import *
pf = load("data0001")
halo_list = FOFHaloFinder(pf)

Halo Data Access

halo_list is a list of Halo class objects ordered by decreasing halo size. A Halo object has convenient ways to access halo data. This loop prints the location of the center of mass for each halo found:

for halo in halo_list:
    print(halo.center_of_mass())

All the methods are:

  • .center_of_mass() - the center of mass for the halo.
  • .maximum_density() - the maximum density in “HOP” units.
  • .maximum_density_location() - the location of the maximum density particle in the HOP halo.
  • .total_mass() - the mass of the halo in Msol (not Msol/h).
  • .bulk_velocity() - the velocity of the center of mass of the halo in simulation units.
  • .maximum_radius() - the distance from the center of mass to the most distant particle in the halo in simulation units.
  • .get_size() - the number of particles in the halo.
  • .get_sphere() - returns an EnzoSphere object using the center of mass and maximum radius.

Note

For FoF the maximum density value is meaningless and is set to -1 by default. For FoF the maximum density location will be identical to the center of mass location.
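To make the definitions of these quantities concrete, here is a minimal sketch that computes a few of them from raw particle lists. It is not how the Halo class is implemented: the function name halo_properties and its arguments are hypothetical, results are in whatever units the inputs use, and no periodic wrapping is applied.

```python
import math


def halo_properties(positions, velocities, masses):
    """Mass-weighted summary of one group of particles (illustrative only)."""
    dim = len(positions[0])
    total_mass = sum(masses)

    # Center of mass: mass-weighted mean position.
    center = [
        sum(m * p[k] for m, p in zip(masses, positions)) / total_mass
        for k in range(dim)
    ]

    # Bulk velocity: mass-weighted mean velocity.
    bulk = [
        sum(m * v[k] for m, v in zip(masses, velocities)) / total_mass
        for k in range(dim)
    ]

    # Maximum radius: distance from the center of mass to the farthest particle.
    max_radius = max(
        math.sqrt(sum((p[k] - center[k]) ** 2 for k in range(dim)))
        for p in positions
    )

    return {
        "center_of_mass": center,
        "total_mass": total_mass,
        "bulk_velocity": bulk,
        "maximum_radius": max_radius,
        "size": len(positions),
    }


props = halo_properties(
    positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    velocities=[(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    masses=[1.0, 3.0],
)
```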

The command

halo_list.write_out("HaloAnalysis.out")

will output the results of HOP or FoF to a text file named HaloAnalysis.out. The file contains each of the data values listed above except for .get_sphere().

For each halo, the data for the particles in the halo can be accessed like this:

for halo in halo_list:
    print(halo["particle_index"])
    print(halo["particle_position_x"])  # in simulation units

Parallel Halo Analysis

Both the HOP and FoF halo finders can run in parallel using spatial decomposition. Before running them in parallel, it is helpful to understand how the decomposition works.

Below in the first plot (i) is a simplified depiction of three haloes labeled 1, 2 and 3:

../_images/ParallelHaloFinder.png

Halo 3 is twice reflected around the periodic boundary conditions.

In (ii), the volume has been subdivided into four equal subregions, A, B, C and D, shown with dotted lines. Notice that halo 2 is now in two different subregions, C and D, and that halo 3 is now in three, A, B and D. If the halo finder is run on these four separate subregions, halo 1 is identified as a single halo, but haloes 2 and 3 are split up into multiple haloes, which is incorrect. The solution is to give each subregion padding to oversample into neighboring regions.

In (iii), subregion C has oversampled into the other three regions, with the periodic boundary conditions taken into account, shown by dot-dashed lines. The other subregions oversample in a similar way.

The halo finder is then run on each padded subregion independently and simultaneously. By oversampling like this, haloes 2 and 3 will both be enclosed fully in at least one subregion and identified completely.

Haloes identified with centers of mass inside the padded part of a subregion are thrown out, eliminating the problem of halo duplication. The centers for the three haloes are shown with stars. Halo 1 will belong to subregion A, 2 to C and 3 to B.
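The padding and ownership rules can be sketched as two small tests. This is an illustration under stated assumptions rather than yt's implementation: the function names in_padded_region and owns_halo are hypothetical, the domain is taken to be a periodic box of unit length along each axis, and each subregion is described by its left and right corners.

```python
def in_padded_region(pos, left, right, padding, box=1.0):
    """True if pos lies in the subregion grown by padding on every side.

    Periodic wrapping is handled by measuring each coordinate from the
    padded lower edge, modulo the box length.
    """
    for x, lo, hi in zip(pos, left, right):
        offset = (x - (lo - padding)) % box
        if offset >= (hi - lo) + 2 * padding:
            return False
    return True


def owns_halo(center, left, right):
    """True if a halo's center of mass lies in the unpadded subregion.

    Haloes whose centers fall only in the padding are discarded by this
    subregion, so each halo is kept exactly once.
    """
    return all(lo <= c < hi for c, lo, hi in zip(center, left, right))


# A subregion covering the lower-left quarter of a 2D unit box.
left, right = (0.0, 0.0), (0.5, 0.5)

# A particle just across the periodic boundary is picked up by the padding.
wrapped = in_padded_region((0.95, 0.3), left, right, padding=0.1)

# A particle deep inside the neighboring subregion is not.
far = in_padded_region((0.7, 0.3), left, right, padding=0.1)

# Only a center of mass inside the unpadded subregion is kept.
kept = owns_halo((0.25, 0.25), left, right)
dropped = owns_halo((0.55, 0.25), left, right)
```

Each subregion would run the halo finder on all particles passing the first test, then keep only the haloes passing the second.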

Parallel HaloFinder padding

To run with parallel halo finding, there is a slight modification to the script:

from yt.mods import *
pf = load("data0001")
halo_list = HaloFinder(pf, padding=0.02)
# --or--
halo_list = FOFHaloFinder(pf, padding=0.02)

The padding parameter is in simulation units and defaults to 0.02. This parameter is how much padding is added to each of the six sides of a subregion. This value should be 2x-3x larger than the largest expected halo in the simulation. It is unlikely, of course, that the largest object in the simulation will be on a subregion boundary, but there is no way of knowing before the halo finder is run.

In general, a little bit of padding goes a long way, and too much just slows down the analysis without changing the answer. It may be worth your time to run the parallel halo finder at a few paddings to find the right amount, especially if you’re analyzing many similar datasets.
