This notebook demonstrates how to use a RAIL Creator to create galaxy samples, and how to use Degraders to add various errors and biases to the sample.
Note that in the parlance of the Creation Module, "degradation" is any post-processing that occurs to the "true" sample generated by the Creator's Engine. This can include adding photometric errors, applying quality cuts, introducing systematic biases, etc.
In this notebook, we will first learn how to draw samples from a RAIL Creator object. Then we will demonstrate how to use the following RAIL Degraders: LSSTErrorModel, QuantityCut, InvRedshiftIncompleteness, and LineConfusion.
Throughout the notebook, we will show how you can chain all these Degraders together to build a more complicated degrader. Hopefully, this will allow you to see how you can build your own degrader.
Note on generating redshift posteriors: regardless of what Degraders you apply, when you use a Creator to estimate posteriors, the posteriors will always be calculated with respect to the "true" distribution. This is the whole point of the Creation Module -- you can generate degraded samples for which we still have access to the true posteriors. For an example of how to calculate posteriors, see posterior-demo.ipynb.
import matplotlib.pyplot as plt
from pzflow.examples import get_example_flow
from rail.creation import Creator, engines
from rail.creation.degradation import (
    InvRedshiftIncompleteness,
    LineConfusion,
    LSSTErrorModel,
    QuantityCut,
)
Found classifier FZBoost
Found classifier randomPZ
Found classifier simpleNN
Found classifier trainZ
Found classifier BPZ_lite
First, let's make a Creator that has no degradation. We can use it to generate a "true" sample, to which we can compare all the degraded samples below.
Note: when instantiating a Creator, you must supply an "engine". This can be any object with `sample` and `get_posterior` methods. In this example, we will use a normalizing flow from the pzflow package. However, everything in this notebook is totally agnostic to what the underlying engine is.
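To make the duck-typing concrete, here is a toy engine sketch. The Gaussian model, the column names, and the exact method signatures below are illustrative assumptions of this sketch, not RAIL's API contract:

```python
import numpy as np
import pandas as pd

class ToyEngine:
    """A hypothetical engine: any object exposing `sample` and
    `get_posterior` methods can play the engine role. The toy model
    here is purely for illustration."""

    def sample(self, n_samples, seed=None):
        # draw toy redshifts and a single toy magnitude column
        rng = np.random.default_rng(seed)
        z = rng.uniform(0, 2.5, size=n_samples)
        i = rng.normal(25, 1, size=n_samples)
        return pd.DataFrame({"redshift": z, "i": i})

    def get_posterior(self, data, grid):
        # toy posterior: a Gaussian in redshift centered on each galaxy's z
        z = data["redshift"].to_numpy()[:, None]
        pdf = np.exp(-0.5 * ((grid[None, :] - z) / 0.1) ** 2)
        dz = grid[1] - grid[0]
        return pdf / (pdf.sum(axis=1, keepdims=True) * dz)
```

A Creator built on an object like this would behave analogously to the FlowEngine used below.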
flowEngine = engines.FlowEngine(get_example_flow())
creator_truth = Creator(flowEngine)
nsamples = int(1e5)
samples_truth = creator_truth.sample(nsamples, seed=0)
samples_truth
redshift | u | g | r | i | z | y | |
---|---|---|---|---|---|---|---|
0 | 0.451103 | 24.605827 | 23.834845 | 22.912357 | 22.591476 | 22.393414 | 22.230196 |
1 | 2.033514 | 27.444181 | 26.927683 | 26.448366 | 26.190765 | 25.758520 | 25.448624 |
2 | 0.885046 | 27.303091 | 26.685753 | 25.883558 | 25.018911 | 24.628029 | 24.507807 |
3 | 1.223162 | 29.681143 | 28.815626 | 27.767424 | 26.827097 | 26.036310 | 25.390919 |
4 | 1.137052 | 25.943348 | 25.584700 | 25.058485 | 24.588055 | 23.897148 | 23.626883 |
... | ... | ... | ... | ... | ... | ... | ... |
99995 | 1.532579 | 26.071077 | 25.934502 | 25.870176 | 25.689196 | 25.437038 | 25.028156 |
99996 | 1.402254 | 25.270113 | 25.027506 | 24.835527 | 24.525967 | 24.327824 | 23.861839 |
99997 | 1.443935 | 25.197727 | 24.845901 | 24.503637 | 24.097057 | 23.905466 | 23.463631 |
99998 | 0.455160 | 27.134581 | 26.261440 | 25.277927 | 25.044682 | 24.943140 | 24.702433 |
99999 | 1.904842 | 25.892120 | 25.575823 | 25.231709 | 24.959017 | 24.594250 | 24.459044 |
100000 rows × 7 columns
Now, we will demonstrate the `LSSTErrorModel`, which adds photometric errors using a model similar to the model from Ivezic et al. 2019. (Specifically, it uses the model from that paper without making the high-SNR assumption; to restore this assumption and therefore use the exact model from the paper, set `highSNR=True`.)
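For reference, the Ivezic et al. 2019 model gives the random photometric error as $\sigma_{\rm rand}^2 = (0.04 - \gamma)x + \gamma x^2$ with $x = 10^{0.4(m - m_5)}$. A minimal numpy sketch of this high-SNR form (the default `gamma` and `sigma_sys` values mirror the parameters listed when the model is printed; treat this as an illustration, not RAIL's implementation):

```python
import numpy as np

def lsst_mag_err(m, m5, gamma=0.039, sigma_sys=0.005):
    """Sketch of the Ivezic et al. 2019 photometric error model
    (the high-SNR form of the formula).
    m: observed magnitude; m5: 5-sigma limiting magnitude;
    gamma: band-dependent coefficient; sigma_sys: irreducible system error."""
    x = 10 ** (0.4 * (m - m5))
    sigma_rand_sq = (0.04 - gamma) * x + gamma * x**2
    return np.sqrt(sigma_sys**2 + sigma_rand_sq)

# at the 5-sigma limit (m == m5), x = 1 and the random error is
# sqrt(0.04) = 0.2 mag, i.e. SNR ~ 5, as expected
err_at_limit = lsst_mag_err(24.03, 24.03)
```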
Let's create an error model with the default settings:
errorModel = LSSTErrorModel()
To see the details of the model, including the default settings we are using, you can just print the model:
errorModel
LSSTErrorModel parameters:

Model for bands: u, g, r, i, z, y
Exposure time = 30.0 s
Number of years of observations = 10.0
Mean visits per year per band:
  u: 5.6, g: 8.0, r: 18.4, i: 18.4, z: 16.0, y: 16.0
Airmass = 1.2
Irreducible system error = 0.005
Extended source model: add 0.0 mag to 5-sigma depth for point sources
Magnitudes dimmer than 30.0 are set to nan
gamma for each band:
  u: 0.038, g: 0.039, r: 0.039, i: 0.039, z: 0.039, y: 0.039
The following 5-sigma limiting mags are calculated using the parameters that follow them:
  u: 23.83, g: 24.90, r: 24.47, i: 24.03, z: 23.46, y: 22.53
Cm for each band:
  u: 23.09, g: 24.42, r: 24.44, i: 24.32, z: 24.16, y: 23.73
Median zenith sky brightness in each band:
  u: 22.99, g: 22.26, r: 21.2, i: 20.48, z: 19.6, y: 18.61
Median zenith seeing FWHM (in arcseconds) for each band:
  u: 0.81, g: 0.77, r: 0.73, i: 0.71, z: 0.69, y: 0.68
Extinction coefficient for each band:
  u: 0.491, g: 0.213, r: 0.126, i: 0.096, z: 0.069, y: 0.17
Now let's add this error model as a degrader and draw some samples with photometric errors.
creator_w_errs = Creator(flowEngine, errorModel)
samples_w_errs = creator_w_errs.sample(nsamples, seed=0)
samples_w_errs
redshift | u | u_err | g | g_err | r | r_err | i | i_err | z | z_err | y | y_err | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.451103 | 24.539248 | 0.054035 | 23.826449 | 0.010416 | 22.910150 | 0.006367 | 22.598023 | 0.006677 | 22.397399 | 0.008235 | 22.217051 | 0.013745 |
1 | 2.033514 | 27.403727 | 0.577453 | 26.862999 | 0.137031 | 26.499604 | 0.098301 | 26.336443 | 0.125568 | 25.900845 | 0.149732 | 25.832399 | 0.306688 |
2 | 0.885046 | 26.830118 | 0.376312 | 26.601625 | 0.109230 | 25.916894 | 0.058758 | 25.010664 | 0.038982 | 24.551018 | 0.045734 | 24.511641 | 0.100463 |
3 | 1.223162 | 28.052901 | 0.892946 | 29.889666 | 1.255874 | 28.660651 | 0.570704 | 26.867039 | 0.197650 | 26.455512 | 0.239014 | 25.572268 | 0.248270 |
4 | 1.137052 | 25.806430 | 0.162855 | 25.498221 | 0.041253 | 25.031609 | 0.026866 | 24.573688 | 0.026538 | 23.881309 | 0.025358 | 23.669743 | 0.047728 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
99995 | 1.532579 | 25.983929 | 0.189257 | 25.990208 | 0.063784 | 25.892612 | 0.057506 | 25.658539 | 0.069260 | 25.411353 | 0.097887 | 24.816786 | 0.131042 |
99996 | 1.402254 | 25.394546 | 0.114290 | 25.038815 | 0.027551 | 24.877175 | 0.023496 | 24.524525 | 0.025427 | 24.309297 | 0.036917 | 23.858735 | 0.056446 |
99997 | 1.443935 | 25.221049 | 0.098280 | 24.879893 | 0.024009 | 24.526246 | 0.017449 | 24.087664 | 0.017525 | 23.916065 | 0.026135 | 23.430038 | 0.038589 |
99998 | 0.455160 | 27.848529 | 0.783198 | 26.243931 | 0.079812 | 25.279064 | 0.033382 | 25.055208 | 0.040551 | 24.895795 | 0.062105 | 24.840928 | 0.133806 |
99999 | 1.904842 | 25.953900 | 0.184528 | 25.582208 | 0.044436 | 25.222360 | 0.031755 | 24.963906 | 0.037401 | 24.550420 | 0.045710 | 24.541656 | 0.103138 |
100000 rows × 13 columns
Notice that some of the magnitudes are NaNs. These are non-detections: those observed magnitudes were beyond the 30 mag limit that is the default in `LSSTErrorModel`. You can change this limit and the corresponding flag by setting `magLim=...` and `ndFlag=...` in the constructor for `LSSTErrorModel`.
Let's plot the error as a function of magnitude:
fig, ax = plt.subplots(figsize=(5, 4), dpi=100)
for band in "ugrizy":
    # pull out the magnitudes and errors
    mags = samples_w_errs[band].to_numpy()
    errs = samples_w_errs[band + "_err"].to_numpy()
    # sort them by magnitude
    idx = mags.argsort()
    mags, errs = mags[idx], errs[idx]
    # plot errs vs mags
    ax.plot(mags, errs, label=band)
ax.legend()
ax.set(xlabel="Magnitude (AB)", ylabel="Error (mags)")
plt.show()
You can see that the photometric error increases as magnitude gets dimmer, just like you would expect. Notice, however, that we have galaxies as dim as magnitude 30. This is because the Flow produces a sample much deeper than the LSST 5-sigma limiting magnitudes. There are no galaxies dimmer than magnitude 30 because LSSTErrorModel sets magnitudes > 30 equal to NaN (the default flag for non-detections).
Recall that the sample above has galaxies as dim as magnitude 30. This is well beyond the LSST 5-sigma limiting magnitudes, so it will be useful to apply cuts to the data to filter out these super-dim samples. We can apply these cuts using the `QuantityCut` degrader, which will cut out any samples that do not pass all of the specified cuts.
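Conceptually, all it does is keep the rows that satisfy every specified cut. A minimal pandas sketch of that behavior (my own re-implementation for illustration; `apply_cuts` is a hypothetical helper, not RAIL's code):

```python
import pandas as pd

def apply_cuts(data: pd.DataFrame, cuts: dict) -> pd.DataFrame:
    """Keep only the rows where every listed column is below its cut value."""
    mask = pd.Series(True, index=data.index)
    for col, max_val in cuts.items():
        mask &= data[col] < max_val
    return data[mask]

df = pd.DataFrame({"i": [24.0, 25.5, 25.1], "redshift": [0.5, 1.2, 0.9]})
cut_df = apply_cuts(df, {"i": 25.3})  # keeps the two rows with i < 25.3
```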
Let's create a Degrader that first adds photometric errors, then cuts at i<25.3, which is the LSST gold sample.
def goldCutWithErrs(data, seed=None):
    # apply the error model from before
    data = errorModel(data, seed)
    # now make a cut on observed i band
    data = QuantityCut({"i": 25.3})(data, seed)
    return data
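This chaining pattern generalizes: any callable with the `(data, seed)` signature can act as a degrader, so arbitrary chains can be built with a small helper (a sketch of the pattern; `chain_degraders` is not a RAIL utility):

```python
def chain_degraders(*degraders):
    """Compose callables with signature (data, seed) into one degrader."""
    def chained(data, seed=None):
        for degrader in degraders:
            data = degrader(data, seed)
        return data
    return chained

# e.g. goldCutWithErrs could equivalently be written as
# goldCutWithErrs = chain_degraders(errorModel, QuantityCut({"i": 25.3}))
```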
Now we can stick this into a Creator and draw a new sample
creator_gold_w_errs = Creator(flowEngine, degrader=goldCutWithErrs)
samples_gold_w_errs = creator_gold_w_errs.sample(nsamples, seed=0)
samples_gold_w_errs
redshift | u | u_err | g | g_err | r | r_err | i | i_err | z | z_err | y | y_err | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.451103 | 24.539248 | 0.054035 | 23.826449 | 0.010416 | 22.910150 | 0.006367 | 22.598023 | 0.006677 | 22.397399 | 0.008235 | 22.217051 | 0.013745 |
1 | 0.885046 | 26.830118 | 0.376312 | 26.601625 | 0.109230 | 25.916894 | 0.058758 | 25.010664 | 0.038982 | 24.551018 | 0.045734 | 24.511641 | 0.100463 |
2 | 1.137052 | 25.806430 | 0.162855 | 25.498221 | 0.041253 | 25.031609 | 0.026866 | 24.573688 | 0.026538 | 23.881309 | 0.025358 | 23.669743 | 0.047728 |
3 | 0.660576 | 24.143084 | 0.038177 | 23.429509 | 0.008149 | 22.390284 | 0.005597 | 21.556377 | 0.005318 | 21.260679 | 0.005560 | 21.059267 | 0.006766 |
4 | 0.822728 | 26.503874 | 0.290627 | 26.244331 | 0.079840 | 25.232896 | 0.032051 | 24.199898 | 0.019251 | 23.788815 | 0.023407 | 23.799845 | 0.053571 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
99995 | 0.801033 | 26.667531 | 0.331239 | 25.480348 | 0.040607 | 24.006963 | 0.011597 | 22.823948 | 0.007361 | 22.176000 | 0.007342 | 21.899974 | 0.010847 |
99996 | 0.585408 | 22.388322 | 0.009584 | 22.102159 | 0.005420 | 21.467800 | 0.005137 | 21.024005 | 0.005137 | 20.867761 | 0.005298 | 20.753331 | 0.006097 |
99997 | 0.318370 | 24.685661 | 0.061461 | 24.043790 | 0.012164 | 23.439348 | 0.008037 | 23.263246 | 0.009428 | 23.159505 | 0.013887 | 23.083905 | 0.028448 |
99998 | 0.275493 | 22.900322 | 0.013618 | 21.669356 | 0.005217 | 20.837771 | 0.005053 | 20.612420 | 0.005073 | 20.335039 | 0.005129 | 20.348336 | 0.005575 |
99999 | 1.016788 | 26.154334 | 0.218258 | 25.798425 | 0.053819 | 25.409622 | 0.037463 | 24.971336 | 0.037648 | 24.307060 | 0.036844 | 24.065535 | 0.067805 |
100000 rows × 13 columns
If you look at the i column, you will see there are no longer any samples with i > 25.3. You can also see that despite making the cut on the i band, there are still 100000 samples as requested. This is because after making the cut, the creator will draw more samples (and re-apply the cut) iteratively until you have as many samples as originally requested.
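That resampling loop can be sketched as follows (a simplified stand-in for what the Creator does internally; the `draw`/`degrade` signatures are assumptions of this sketch):

```python
import numpy as np
import pandas as pd

def sample_with_cut(draw, degrade, n_samples, seed=None):
    """Repeatedly draw and degrade until n_samples survive the cuts."""
    rng = np.random.default_rng(seed)
    chunks = []
    n_kept = 0
    while n_kept < n_samples:
        # draw another batch, degrade it, and keep the survivors
        data = degrade(draw(n_samples, rng))
        chunks.append(data)
        n_kept += len(data)
    # trim to exactly the number of samples requested
    return pd.concat(chunks, ignore_index=True).iloc[:n_samples]
```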
One more note: it is easy to use the QuantityCut degrader as an SNR cut on the magnitudes. The magnitude equation is $m = -2.5 \log_{10}(f)$. Taking the derivative, we have
$$
dm = \frac{2.5}{\ln(10)} \frac{df}{f} = \frac{2.5}{\ln(10)} \frac{1}{\mathrm{SNR}}.
$$
So if you want to make a cut on galaxies above a certain SNR, you can make a cut
$$
dm < \frac{2.5}{\ln(10)} \frac{1}{\mathrm{SNR}}.
$$
For example, an SNR cut on the i band would look like this: `QuantityCut({"i_err": 2.5/np.log(10) * 1/SNR})`.
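For instance, a 10-sigma cut on the i band corresponds to an error cut of roughly 0.109 mag:

```python
import numpy as np

snr = 10
max_err = 2.5 / np.log(10) * 1 / snr  # ~0.1086 mag
# QuantityCut({"i_err": max_err}) would then keep only galaxies
# with i-band SNR above ~10
```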
Next, we will demonstrate the `InvRedshiftIncompleteness` degrader. It applies a selection function that keeps galaxies with probability $p_{\text{keep}}(z) = \min(1, \frac{z_p}{z})$, where $z_p$ is the "pivot" redshift. We'll use $z_p = 0.8$.
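A numpy sketch of this selection function (my own re-implementation for illustration, not RAIL's code):

```python
import numpy as np

def keep_mask(z, z_pivot, rng):
    """Keep each galaxy with probability p = min(1, z_pivot / z)."""
    p_keep = np.minimum(1.0, z_pivot / z)
    return rng.random(len(z)) < p_keep

rng = np.random.default_rng(0)
z = np.linspace(0.1, 2.5, 100_000)
mask = keep_mask(z, z_pivot=0.8, rng=rng)
# galaxies with z <= 0.8 are always kept; higher-z galaxies are
# kept with probability 0.8 / z, thinning the high-z tail
```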
def incompleteGoldWithErrs(data, seed=None):
    # apply error model and make cut on observed i band
    data = goldCutWithErrs(data, seed)
    # introduce redshift incompleteness
    data = InvRedshiftIncompleteness(0.8)(data, seed)
    return data
creator_incomplete_gold_w_errs = Creator(flowEngine, degrader=incompleteGoldWithErrs)
samples_incomplete_gold_w_errs = creator_incomplete_gold_w_errs.sample(nsamples, seed=0)
Let's plot the redshift distributions of the samples we have generated so far:
fig, ax = plt.subplots(figsize=(5, 4), dpi=100)
zmin = 0
zmax = 2.5
hist_settings = {
    "bins": 50,
    "range": (zmin, zmax),
    "density": True,
    "histtype": "step",
}
ax.hist(samples_truth["redshift"], label="Truth", **hist_settings)
ax.hist(samples_gold_w_errs["redshift"], label="Gold", **hist_settings)
ax.hist(samples_incomplete_gold_w_errs["redshift"], label="Incomplete Gold", **hist_settings)
ax.legend(title="Sample")
ax.set(xlim=(zmin, zmax), xlabel="Redshift", ylabel="Galaxy density")
plt.show()
You can see that the Gold sample has significantly fewer high-redshift galaxies than the truth. This is because many of the high-redshift galaxies have i > 25.3.
You can further see that the Incomplete Gold sample has even fewer high-redshift galaxies. This is exactly what we expected from this degrader.
`LineConfusion` is a degrader that simulates spectroscopic errors resulting from the confusion of different emission lines.
For this example, let's use the degrader to simulate a scenario in which 2% of [OII] lines are mistaken for [OIII] lines, and 1% of [OIII] lines are mistaken for [OII] lines. (Note: I do not know how realistic this scenario is!)
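The size of the induced redshift error follows from the observed wavelength of the misidentified line. A sketch of the mapping (the direction used here, scaling $1+z$ by the ratio of rest-frame wavelengths, is an assumption of this sketch, chosen to match the scatter direction described at the end of this notebook):

```python
# Rest-frame wavelengths of the two oxygen lines, in angstroms
OII, OIII = 3727, 5007

def confused_redshift(z_true, true_wavelen, wrong_wavelen):
    """Hypothetical helper: redshift listed when a line with rest-frame
    wavelength true_wavelen is mistaken for one at wrong_wavelen."""
    return (1 + z_true) * wrong_wavelen / true_wavelen - 1

z_oii_as_oiii = confused_redshift(1.0, OII, OIII)   # pushed to higher z
z_oiii_as_oii = confused_redshift(1.0, OIII, OII)   # pushed to lower z
```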
def confusedIncompleteGoldWithErrs(data, seed=None):
    # apply error model, make cut on i band, introduce incompleteness
    data = incompleteGoldWithErrs(data, seed)
    # Oxygen lines (in angstroms)
    OII = 3727
    OIII = 5007
    # 2% OII -> OIII confusion
    data = LineConfusion(true_wavelen=OII, wrong_wavelen=OIII, frac_wrong=0.02)(data, seed)
    # 1% OIII -> OII confusion
    data = LineConfusion(true_wavelen=OIII, wrong_wavelen=OII, frac_wrong=0.01)(data, seed)
    return data
creator_conf_inc_gold_w_errs = Creator(flowEngine, degrader=confusedIncompleteGoldWithErrs)
samples_conf_inc_gold_w_errs = creator_conf_inc_gold_w_errs.sample(nsamples, seed=0)
Let's plot the redshift distributions one more time
fig, ax = plt.subplots(figsize=(5, 4), dpi=100)
zmin = 0
zmax = 2.5
hist_settings = {
    "bins": 50,
    "range": (zmin, zmax),
    "density": True,
    "histtype": "step",
}
ax.hist(samples_truth["redshift"], label="Truth", **hist_settings)
ax.hist(samples_gold_w_errs["redshift"], label="Gold", **hist_settings)
ax.hist(samples_incomplete_gold_w_errs["redshift"], label="Incomplete Gold", **hist_settings)
ax.hist(samples_conf_inc_gold_w_errs["redshift"], label="Confused Incomplete Gold", **hist_settings)
ax.legend(title="Sample")
ax.set(xlim=(zmin, zmax), xlabel="Redshift", ylabel="Galaxy density")
plt.show()
You can see that the redshift distribution of this new sample is essentially identical to the Incomplete Gold sample, with small perturbations that result from the line confusion.
However, the real impact of this degrader isn't on the redshift distribution, but rather that it introduces erroneous spec-z's into the photo-z training sets! To see the impact of this effect, let's plot the true spec-z's as present in the Incomplete Gold sample versus the spec-z's listed in the new sample with oxygen line confusion.
fig, ax = plt.subplots(figsize=(6, 6), dpi=85)
ax.scatter(
    samples_incomplete_gold_w_errs["redshift"],
    samples_conf_inc_gold_w_errs["redshift"],
    marker=".", s=1,
)
ax.set(
    xlim=(0, 2.5), ylim=(0, 2.5),
    xlabel="True spec-z (in Incomplete Gold sample)",
    ylabel="Spec-z listed in the Confused sample",
)
plt.show()
Now we can clearly see the spec-z errors! The galaxies above the line y=x are the [OII] -> [OIII] galaxies, while the ones below are the [OIII] -> [OII] galaxies.