Using Creators and Degraders to Generate Galaxy Samples with Errors and Biases

This notebook demonstrates how to use a RAIL Creator to create galaxy samples, and how to use Degraders to add various errors and biases to the sample.

Note that in the parlance of the Creation Module, "degradation" is any post-processing applied to the "true" sample generated by the Creator's Engine. This can include adding photometric errors, applying quality cuts, introducing systematic biases, etc.

In this notebook, we will first learn how to draw samples from a RAIL Creator object. Then we will demonstrate how to use the following RAIL Degraders:

  1. LSSTErrorModel, which adds photometric errors
  2. QuantityCut, which applies cuts to the specified columns of the sample
  3. InvRedshiftIncompleteness, which introduces sample incompleteness
  4. LineConfusion, which introduces spectroscopic errors

Throughout the notebook, we will show how you can chain these Degraders together to build a more complicated degrader. Hopefully, this will show you how to build your own.
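The idea of chaining can be sketched in a few lines of plain Python. This is only an illustration of the concept, assuming a degrader is a function that takes a sample and returns a degraded copy. The names `chain`, `add_offset`, and `cut_faint` are hypothetical and are not part of RAIL.

```python
def chain(*degraders):
    """Compose degraders left to right into a single degrader."""
    def combined(sample):
        for degrade in degraders:
            sample = degrade(sample)
        return sample
    return combined


# Two toy degraders (hypothetical, for illustration only).
def add_offset(sample):
    # Shift every i-band magnitude by a fixed 0.1 mag, leaving the input untouched.
    return [{**row, "i": row["i"] + 0.1} for row in sample]


def cut_faint(sample):
    # Keep only rows with i < 25.3.
    return [row for row in sample if row["i"] < 25.3]


truth = [{"redshift": 0.5, "i": 24.0}, {"redshift": 1.2, "i": 25.25}]

# The combined degrader first adds the offset, then applies the cut.
degraded = chain(add_offset, cut_faint)(truth)
```

Order matters here: the second galaxy passes the cut in the true sample but fails it once the offset has been applied, which is why the error step comes before the cut.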

Note on generating redshift posteriors: regardless of which Degraders you apply, when you use a Creator to estimate posteriors, the posteriors will always be calculated with respect to the "true" distribution. This is the whole point of the Creation Module: you can generate degraded samples for which you still have access to the true posteriors. For an example of how to calculate posteriors, see posterior-demo.ipynb.

"True" Creator

First, let's make a Creator that has no degradation. We can use it to generate a "true" sample, to which we can compare all the degraded samples below.

Note: When instantiating a Creator, you must supply an "engine". This can be any object with sample and get_posterior methods. In this example, we will use a normalizing flow from the pzflow package. However, everything in this notebook is agnostic to the underlying engine.

Degrader 1: LSSTErrorModel

Now, we will demonstrate the LSSTErrorModel, which adds photometric errors using a model similar to the one in Ivezic et al. 2019. Specifically, it uses the model from that paper without making the high-SNR assumption. To restore this assumption, and therefore use the exact model from the paper, set highSNR=True.
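For reference, the high-SNR form of the Ivezic et al. 2019 error model can be written out directly. The sketch below is a standalone illustration and not RAIL's implementation. The function name is hypothetical, the default `gamma` and `sigma_sys` are the values commonly quoted with the paper, and the limiting magnitude `m5 = 26.8` is only an illustrative choice.

```python
import math


def ivezic_high_snr_error(mag, m5, gamma=0.039, sigma_sys=0.005):
    """Magnitude error in the high-SNR approximation.

    sigma_rand^2 = (0.04 - gamma) * x + gamma * x^2, with x = 10^(0.4 * (mag - m5)),
    added in quadrature to a systematic floor sigma_sys.
    """
    x = 10 ** (0.4 * (mag - m5))
    sigma_rand_sq = (0.04 - gamma) * x + gamma * x**2
    return math.sqrt(sigma_sys**2 + sigma_rand_sq)


m5 = 26.8  # illustrative 5-sigma limiting magnitude (an assumption)

# At the limiting magnitude x = 1, so the random error is sqrt(0.04) = 0.2 mag.
err_at_limit = ivezic_high_snr_error(m5, m5)

# Brighter galaxies have smaller errors.
err_bright = ivezic_high_snr_error(24.0, m5)
```

At the 5-sigma limit the error comes out at about 0.2 mag, close to the 2.5 / ln(10) / 5 ≈ 0.22 mag you would expect for an SNR of 5.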

Let's create an error model with the default settings:

To see the details of the model, including the default settings we are using, you can just print the model:

Now let's add this error model as a degrader and draw some samples with photometric errors.

Notice that some of the magnitudes are NaN. These are non-detections: the observed magnitudes were fainter than the 30 mag limit that is the default in LSSTErrorModel. You can change this limit and the corresponding flag by setting magLim=... and ndFlag=... in the constructor for LSSTErrorModel.
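The flagging behaviour described above amounts to the following. This is a minimal sketch, not the LSSTErrorModel code. The function `flag_non_detections` and its arguments `mag_lim` and `nd_flag` are hypothetical stand-ins for the magLim and ndFlag settings.

```python
import math


def flag_non_detections(mags, mag_lim=30.0, nd_flag=math.nan):
    """Replace magnitudes fainter than mag_lim with the non-detection flag."""
    return [nd_flag if m > mag_lim else m for m in mags]


observed = [24.1, 30.5, 29.9]
flagged = flag_non_detections(observed)

# NaN never compares equal to anything, so count non-detections with math.isnan.
n_non_detections = sum(math.isnan(m) for m in flagged)
```

If you pick a numeric flag instead of NaN, make sure downstream code does not treat it as a real magnitude.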

Let's plot the error as a function of magnitude:

You can see that the photometric error increases as galaxies get dimmer, just as you would expect. Notice, however, that we have galaxies as dim as magnitude 30. This is because the Flow produces a sample much deeper than the LSST 5-sigma limiting magnitudes. There are no galaxies dimmer than magnitude 30 because LSSTErrorModel sets magnitudes > 30 to NaN (the default flag for non-detections).

Degrader 2: QuantityCut

Recall how the sample above has galaxies as dim as magnitude 30. This is well beyond the LSST 5-sigma limiting magnitudes, so it will be useful to apply cuts to the data to filter out these super-dim samples. We can apply these cuts using the QuantityCut degrader. This degrader will cut out any samples that do not pass all of the specified cuts.

Let's create a Degrader that first adds photometric errors, then cuts at i < 25.3, which defines the LSST gold sample.

Now we can stick this into a Creator and draw a new sample:

If you look at the i column, you will see there are no longer any samples with i > 25.3. You can also see that despite making the cut on the i band, there are still 100000 samples as requested. This is because after making the cut, the creator will draw more samples (and re-apply the cut) iteratively until you have as many samples as originally requested.
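The draw-and-cut loop described above can be sketched as follows. This is an illustration of the idea and not the Creator's actual code. The helper `draw_until_full`, the toy `draw` function, and the uniform magnitude distribution are all assumptions made for the example.

```python
import random


def draw_until_full(n_samples, draw, passes_cut):
    """Keep drawing and cutting until n_samples rows have survived the cut."""
    kept = []
    while len(kept) < n_samples:
        # Only draw as many rows as are still missing, so we never overshoot.
        batch = draw(n_samples - len(kept))
        kept.extend(row for row in batch if passes_cut(row))
    return kept


rng = random.Random(0)


def draw(n):
    # Toy "engine": i-band magnitudes drawn uniformly between 20 and 30.
    return [{"i": rng.uniform(20.0, 30.0)} for _ in range(n)]


sample = draw_until_full(1000, draw, lambda row: row["i"] < 25.3)
```

One consequence of this approach is that the returned sample follows the post-cut distribution, so its size tells you nothing about how many galaxies the cut rejected.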

One more note: it is easy to use the QuantityCut degrader to apply an SNR cut on the magnitudes. The magnitude equation is $m = -2.5 \log_{10}(f)$, up to a constant zero point. Differentiating and taking the absolute value, we have $$ dm = \frac{2.5}{\ln(10)} \frac{df}{f} = \frac{2.5}{\ln(10)} \frac{1}{\mathrm{SNR}}. $$ So if you want to keep only galaxies above a certain SNR, you can make the cut $$ dm < \frac{2.5}{\ln(10)} \frac{1}{\mathrm{SNR}}. $$ For example, an SNR cut on the i band would look like this: QuantityCut({"i_err": 2.5/np.log(10) * 1/SNR}).
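To get a feel for the numbers, the threshold in the formula above can be evaluated for a few SNR values. The helper name `mag_err_threshold` is hypothetical. The resulting dictionary has the same shape as the one passed to QuantityCut in the example above.

```python
import math


def mag_err_threshold(snr):
    """Largest magnitude error compatible with a given minimum SNR."""
    return 2.5 / math.log(10) / snr


# An SNR of 10 corresponds to a magnitude error of about 0.109 mag.
threshold_snr10 = mag_err_threshold(10)

# An SNR of 5 corresponds to about 0.217 mag.
threshold_snr5 = mag_err_threshold(5)

# Cut dictionary in the form used above: keep galaxies with i_err below the threshold.
cuts = {"i_err": threshold_snr10}
```

A higher SNR requirement gives a smaller allowed error, so the cut becomes stricter.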

Degrader 3: InvRedshiftIncompleteness

Next, we will demonstrate the InvRedshiftIncompleteness degrader. It applies a selection function, which keeps galaxies with probability $p_{\text{keep}}(z) = \min(1, \frac{z_p}{z})$, where $z_p$ is the "pivot" redshift. We'll use $z_p = 0.8$.
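The selection function above is simple enough to write out by hand. The sketch below is a standalone illustration and not the InvRedshiftIncompleteness code. The function names and the toy redshift list are assumptions.

```python
import random


def p_keep(z, z_pivot=0.8):
    """Probability of keeping a galaxy at redshift z: min(1, z_pivot / z)."""
    if z <= z_pivot:
        # Below the pivot every galaxy is kept (this also avoids dividing by z = 0).
        return 1.0
    return z_pivot / z


def apply_incompleteness(redshifts, z_pivot, rng):
    """Keep each galaxy independently with probability p_keep(z)."""
    return [z for z in redshifts if rng.random() < p_keep(z, z_pivot)]


rng = random.Random(0)

# Toy sample: 100 galaxies below the pivot and 1000 at twice the pivot redshift.
redshifts = [0.2] * 100 + [1.6] * 1000
kept = apply_incompleteness(redshifts, 0.8, rng)

n_low = sum(z == 0.2 for z in kept)   # all 100 survive
n_high = sum(z == 1.6 for z in kept)  # roughly half survive, since p_keep(1.6) = 0.5
```

Galaxies below the pivot are untouched, while the surviving fraction falls off as 1/z above it.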

Let's plot the redshift distributions of the samples we have generated so far:

You can see that the Gold sample has significantly fewer high-redshift galaxies than the truth. This is because many of the high-redshift galaxies have i > 25.3.

You can further see that the Incomplete Gold sample has even fewer high-redshift galaxies. This is exactly what we expected from this degrader.

Degrader 4: LineConfusion

LineConfusion is a degrader that simulates spectroscopic errors resulting from the confusion of different emission lines.

For this example, let's use the degrader to simulate a scenario in which 2% of [OII] lines are mistaken for [OIII] lines, and 1% of [OIII] lines are mistaken for [OII] lines. (Note: I do not know how realistic this scenario is!)

Let's plot the redshift distributions one more time:

You can see that the redshift distribution of this new sample is essentially identical to the Incomplete Gold sample, with small perturbations that result from the line confusion.

However, the real impact of this degrader isn't on the redshift distribution. Rather, it introduces erroneous spec-z's into the photo-z training sets! To see this effect, let's plot the true spec-z's as present in the Incomplete Gold sample against the spec-z's listed in the new sample with oxygen line confusion.

Now we can clearly see the spec-z errors! The galaxies above the line y=x are the [OII] -> [OIII] galaxies, while the ones below are the [OIII] -> [OII] galaxies.