Using a Creator to Calculate True Posteriors for a Galaxy Sample

This notebook demonstrates how to use a RAIL Creator to calculate true posteriors for galaxy samples drawn from the same Creator. Note that this notebook assumes you have already read through degradation-demo.ipynb.

Calculating posteriors is more complicated than drawing samples, because it requires more knowledge of the engine that underlies the Creator. In this example, we will use the same engine we used in Degradation demo: FlowEngine which wraps a normalizing flow from the pzflow package.

This notebook will cover three scenarios of increasing complexity:

  1. Calculating posteriors without errors
  2. Calculating posteriors while convolving errors
  3. Calculating posteriors with missing bands

1. Calculating posteriors without errors

For a basic first example, let's make a Creator with no degradation and draw a sample.

Now, let's calculate true posteriors for this sample. Note the important fact here: these are literally the true posteriors for the sample because pzflow gives us direct access to the probability distribution from which the sample was drawn!

When calculating posteriors, the Creator will always require data, which is a pandas DataFrame of the galaxies for which we are calculating posteriors. Because we are using a Creator built on FlowEngine, we also must provide grid, because FlowEngine calculates posteriors over a grid of redshift values.

Let's calculate posteriors for every galaxy in our sample:

Note that Creator returns the pdfs as a qp Ensemble:

Let's plot these pdfs:

The true posteriors are in blue, and the true redshifts are marked by the vertical black lines.

2. Calculating posteriors while convolving errors

Now, let's get a little more sophisticated.

Let's recreate the Creator we were using at the end of the Degradation demo.

I will make one change however: the LSST Error Model sometimes results in non-detections for faint galaxies. These non-detections are flagged with NaN. Calculating posteriors for galaxies with missing magnitudes is more complicated, so for now, I will add one additional QuantityCut to remove any galaxies with missing magnitudes. To see how to calculate posteriors for galaxies with missing magnitudes, see Section 3.

Now let's draw a degraded sample:

This sample has photometric errors that we would like to convolve in the redshift posteriors, so that the posteriors are fully consistent with the errors. We can perform this convolution by sampling from the error distributions, calculating posteriors, and averaging.

FlowEngine has this functionality already built in - we just have to provide err_samples to the get_posterior method.

Let's calculate posteriors with a variable number of error samples.

You can see the effect of convolving the errors. In particular, notice that without error convolution (1 sample), the redshift posterior is often totally inconsistent with the true redshift (marked by the vertical black line). As you convolve more samples, the posterior generally broadens and becomes consistent with the true redshift.

Also notice how the posterior continues to change as you convolve more and more samples. This suggests that you need to do a little testing to ensure that you have convolved enough samples.

Let's plot these same posteriors with even more samples to make sure they have converged:

WARNING: Running the next cell on your computer may exhaust your memory

Notice that two of these galaxies may take upwards of 10,000 samples to converge (convolving over 10,000 samples takes 0.5 seconds / galaxy on my computer)

3. Calculating posteriors with missing bands

Now let's finally tackle posterior calculation with missing bands.

First, lets make a sample that has missing bands. Let's use the same degrader as we used above, except without the final QuantityCut that removed non-detections:

You can see that galaxy 3 has a non-detection in the u band. FlowEngine can handle missing values by marginalizing over that value. By default, FlowEngine will marginalize over NaNs in the u band, using the grid u = np.linspace(25, 31, 10). This default grid should work in most cases, but you may want to change the flag for non-detections, use a different grid for the u band, or marginalize over non-detections in other bands. In order to do these things, you must supply FlowEngine with marginalization rules in the form of the marg_rules dictionary.

Let's imagine we want to use a different grid for u band marginalization. In order to determine what grid to use, we will create a histogram of non-detections in u band vs true u band magnitude (assuming year 10 LSST errors). This will tell me what are reasonable values of u to marginalize over.

Based on this histogram, I will marginalize over u band values from 27 to 31. Like how I tested different numbers of error samples above, here I will test different resolutions for the u band grid.

I will provide our new u band grid in the marg_rules dictionary, which will also include "flag" which tells FlowEngine what my flag for non-detections is. In this simple example, we are using a fixed grid for the u band, but notice that the u band rule takes the form of a function - this is because the grid over which to marginalize can be a function of any of the other variables in the row. If I wanted to marginalize over any other bands, I would need to include corresponding rules in marg_rules too.

For this example, I will only calculate pdfs for galaxy 3, which is the galaxy with a non-detection in the u band. Also, similarly to how I tested the error convolution with a variable number of samples, I will test the marginalization with varying resolutions for the marginalized grid.

Notice that the resolution with only 10 bins is sufficient for this marginalization.

In this example, only one of the bands featured a non-detection, but you can easily marginalize over more bands by including corresponding rules in the marg_rules dict. For example, let's artificially force a non-detection in the y band as well:

Note that marginalizing over multiple bands quickly gets expensive