Goldenspike, an example of an end-to-end analysis using RAIL¶

author: Sam Schmidt, Eric Charles, Alex Malz, John Franklin Crenshaw, others...
last run successfully: April 15, 2022

This notebook demonstrates how to use the various RAIL modules to draw synthetic samples of fluxes by color, apply physical effects to them, train photo-z estimators on the samples, test and validate the performance of those estimators, and use the RAIL summarization modules to obtain n(z) estimates based on the p(z) estimates.

Creation¶

Note that in the parlance of the Creation Module, "degradation" is any post-processing that occurs to the "true" sample generated by the create Engine. This can include adding photometric errors, applying quality cuts, introducing systematic biases, etc.
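To make the idea of "degradation" concrete, here is a minimal sketch that uses only numpy and is independent of RAIL. The "true" sample, the constant noise level `sigma`, and the cut value are all assumptions chosen for illustration; this is not how the RAIL degraders are implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" sample: i-band magnitudes drawn uniformly (illustration only).
true_mag_i = rng.uniform(20.0, 27.0, size=1000)

# Degradation 1: add Gaussian photometric noise with an assumed constant scatter.
sigma = 0.1
obs_mag_i = true_mag_i + rng.normal(0.0, sigma, size=true_mag_i.size)

# Degradation 2: apply a quality cut, keeping only objects brighter than i = 25.
keep = obs_mag_i < 25.0
degraded = obs_mag_i[keep]

print(f"{degraded.size} of {true_mag_i.size} objects survive the cut")
```

Each step takes a sample and returns a modified sample, which is why degraders can be chained one after another.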

In this notebook, we will draw both test and training samples from a RAIL Engine object. Then we will demonstrate how to use RAIL degraders to apply effects to those samples.

Training and Estimation¶

The RAIL Informer modules "train" or "inform" models used to estimate p(z) given band fluxes (and potentially other information).

The RAIL Estimation modules then use those same models to actually apply the model and extract the p(z) estimates.

p(z) Validation¶

The RAIL Validator module applies various metrics to evaluate the quality of the estimated p(z).
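As one example of such a metric, the probability integral transform (PIT) is the CDF of a galaxy's p(z) evaluated at its true redshift. The sketch below computes it with numpy for a single toy Gaussian p(z); the helper `pit_value` and the toy PDF are assumptions for illustration and are not taken from the RAIL evaluation code.

```python
import numpy as np

def pit_value(z_grid, pdf, z_true):
    """Return the CDF of a gridded p(z), evaluated at the true redshift."""
    # Cumulative trapezoid integral of the PDF along the grid.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]  # enforce a total probability of one
    return float(np.interp(z_true, z_grid, cdf))

# Toy p(z): a Gaussian centred on z = 1 (hypothetical values).
z_grid = np.linspace(0.0, 3.0, 301)
pdf = np.exp(-0.5 * ((z_grid - 1.0) / 0.1) ** 2)

# The truth sits at the centre of a symmetric PDF, so the PIT is close to 0.5.
pit = pit_value(z_grid, pdf, z_true=1.0)
print(round(pit, 3))
```

For well-calibrated p(z) estimates, the PIT values of an ensemble of galaxies should be approximately uniform between 0 and 1.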

p(z) to n(z) Summarization¶

The RAIL Summarization modules convert per-galaxy p(z) posteriors to ensemble n(z) estimates.
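The simplest conceivable summarizer just averages the per-galaxy p(z) on a common redshift grid. The numpy sketch below illustrates that idea with three toy Gaussian posteriors; the means and widths are hypothetical, and this is an illustration of stacking in general rather than the code inside any RAIL summarizer.

```python
import numpy as np

z_grid = np.linspace(0.0, 3.0, 301)
dz = z_grid[1] - z_grid[0]

# Hypothetical per-galaxy posteriors: Gaussians with assumed means and widths.
means = np.array([0.3, 0.8, 1.2])
widths = np.array([0.05, 0.10, 0.20])
pz = np.exp(-0.5 * ((z_grid[None, :] - means[:, None]) / widths[:, None]) ** 2)

# Normalize each p(z) so that it integrates to one on the grid.
pz /= pz.sum(axis=1, keepdims=True) * dz

# "Stack": the ensemble n(z) estimate is the mean of the individual p(z).
nz = pz.mean(axis=0)

print(nz.shape, round(float(nz.sum() * dz), 6))
```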

Imports¶

In [1]:
# Prerequisites: os, numpy, pathlib, pzflow, tables_io
import os
import numpy as np
from pathlib import Path
from pzflow.examples import get_galaxy_data
import tables_io
In [2]:
# Various rail modules
import rail
from rail.creation.degradation import LSSTErrorModel, InvRedshiftIncompleteness, LineConfusion, QuantityCut
from rail.creation.engines.flowEngine import FlowModeler, FlowCreator, FlowPosterior
from rail.core.data import TableHandle
from rail.core.stage import RailStage
from rail.core.utilStages import ColumnMapper, TableConverter

from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite
from rail.estimation.algos.knnpz import Inform_KNearNeighPDF, KNearNeighPDF
from rail.estimation.algos.flexzboost import Inform_FZBoost, FZBoost

from rail.estimation.algos.naiveStack import NaiveStack
from rail.estimation.algos.pointEstimateHist import PointEstimateHist

from rail.evaluation.evaluator import Evaluator

RAIL now uses ceci as a back-end, which takes care of a lot of file I/O decisions to be consistent with other choices in DESC.

The cell below overrides a ceci default that prevents overwriting previous results. That default is generally a good safeguard, but it is not necessary for this demo.

The DataStore uses DataHandle objects to keep track of the connections between the various stages. When one stage returns a DataHandle and you then pass that DataHandle to another stage, the underlying code can establish the connections needed to build a reproducible pipeline.

In [3]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True

Here we need a few configuration parameters to deal with differences in data schema between existing PZ codes.

In [4]:
from rail.core.utils import RAILDIR
flow_file = os.path.join(RAILDIR, 'examples/goldenspike/data/pretrained_flow.pkl')
bands = ['u','g','r','i','z','y']
band_dict = {band:f'mag_{band}_lsst' for band in bands}
rename_dict = {f'mag_{band}_lsst_err':f'mag_err_{band}_lsst' for band in bands}
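To see what these two mappings contain, the snippet below rebuilds them in isolation and prints one entry from each. It uses only the definitions from the cell above and needs no RAIL imports.

```python
bands = ['u', 'g', 'r', 'i', 'z', 'y']

# Maps a short band name to the magnitude column name.
band_dict = {band: f'mag_{band}_lsst' for band in bands}

# Maps an error column name with the "_err" suffix to the "mag_err_" prefixed form.
rename_dict = {f'mag_{band}_lsst_err': f'mag_err_{band}_lsst' for band in bands}

print(band_dict['u'])                 # mag_u_lsst
print(rename_dict['mag_u_lsst_err'])  # mag_err_u_lsst
```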

Train the Flow Engine¶

First we need to train the normalizing flow that will serve as the engine for the notebook.

In the cell below, we load the example galaxy catalog from PZFlow and save it so that it can be used to train the flow. We also set the path where we will save the flow.

In [5]:
DATA_DIR = Path().resolve() / "data"
DATA_DIR.mkdir(exist_ok=True)

catalog_file = DATA_DIR / "base_catalog.pq"
catalog = get_galaxy_data().rename(band_dict, axis=1)
tables_io.write(catalog, str(catalog_file.with_suffix("")), catalog_file.suffix[1:])

catalog_file = str(catalog_file)
flow_file = str(DATA_DIR / "trained_flow.pkl")

Now we set the parameters for the FlowModeler, i.e. the pipeline stage that trains the flow:

In [6]:
flow_modeler_params = {
    "name": "flow_modeler",
    "input": catalog_file,
    "model": flow_file,
    "seed": 0,
    "phys_cols": {"redshift": [0, 3]},
    "phot_cols": {
        "mag_u_lsst": [17, 35],
        "mag_g_lsst": [16, 32],
        "mag_r_lsst": [15, 30],
        "mag_i_lsst": [15, 30],
        "mag_z_lsst": [14, 29],
        "mag_y_lsst": [14, 28],
    },
    "calc_colors": {"ref_column_name": "mag_i_lsst"},
}
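The "calc_colors" entry names a reference magnitude column. A plausible reading, stated here as an assumption rather than a description of the actual pzflow transform, is that the flow models one reference magnitude plus the colors between adjacent bands instead of six raw magnitudes. The sketch below shows that such a change of variables loses no information; the magnitudes and the helper `mags_from_colors` are hypothetical.

```python
import numpy as np

bands = ["u", "g", "r", "i", "z", "y"]
ref_idx = bands.index("i")

# Hypothetical magnitudes for one galaxy, in ugrizy order.
mags = np.array([24.46, 24.19, 23.48, 22.68, 21.97, 21.70])

# Reference magnitude plus adjacent-band colors (u-g, g-r, r-i, i-z, z-y).
ref_mag = mags[ref_idx]
colors = mags[:-1] - mags[1:]
color_names = [f"{a}-{b}" for a, b in zip(bands[:-1], bands[1:])]

def mags_from_colors(ref_mag, colors, ref_idx):
    """Invert the transform: rebuild all magnitudes from the reference and colors."""
    out = np.empty(colors.size + 1)
    out[ref_idx] = ref_mag
    for j in range(ref_idx - 1, -1, -1):     # walk towards the bluer bands
        out[j] = out[j + 1] + colors[j]
    for j in range(ref_idx + 1, out.size):   # walk towards the redder bands
        out[j] = out[j - 1] - colors[j - 1]
    return out

print(dict(zip(color_names, np.round(colors, 2))))
print(np.allclose(mags_from_colors(ref_mag, colors, ref_idx), mags))
```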

Now we will create the flow and train it.

In [7]:
flow_modeler = FlowModeler.make_stage(**flow_modeler_params)
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
In [8]:
flow_modeler.fit_model()
Inserting handle into data store.  input: /home/runner/work/RAIL/RAIL/examples/goldenspike/data/base_catalog.pq, flow_modeler
Training 30 epochs 
Loss:
(0) 21.3266
(1) 4.1805
(2) 2.9966
(3) 2.0658
(4) 0.6698
(5) 0.7940
(6) 0.6846
(7) 0.3540
(8) -1.1601
(9) 3402823273761818485311871060541440.0000
(10) -1.4746
(11) 3402823273761818485311871060541440.0000
(12) -1.0880
(13) -2.4083
(14) 3402823273761818485311871060541440.0000
(15) -1.9668
(16) -2.6006
(17) -2.3457
(18) 3402823273761818485311871060541440.0000
(19) -1.9753
(20) -2.3087
(21) -1.1899
(22) -1.0707
(23) 3402823273761818485311871060541440.0000
(24) -3.3070
(25) -3.5133
(26) -3.3352
(27) 1.0717
(28) -3.4730
(29) -3.7733
(30) -2.1753
Inserting handle into data store.  model_flow_modeler: /home/runner/work/RAIL/RAIL/examples/goldenspike/data/inprogress_trained_flow.pkl, flow_modeler
Out[8]:
<rail.core.data.FlowHandle at 0x7f9fdc1c3280>

Make mock data¶

Now we will use the trained flow to create training and test data for the photo-z estimators.

For both the training and test data we will:

  1. Use the Flow to produce some synthetic data
  2. Use the LSSTErrorModel to add photometric errors
  3. Use the FlowPosterior to estimate the redshift posteriors for the degraded sample (in the cells below, this step is only run for the test sample)
  4. Use the ColumnMapper to rename the error columns so that they match the names in DC2.
  5. Use the TableConverter to convert the data to a numpy dictionary, which will be stored in a hdf5 file with the same schema as the DC2 data

Training sample¶

For the training data we are going to apply a couple of extra degradation effects to the data beyond what we do to create test data, as the training data will have some spectroscopic incompleteness. This will allow us to see how the trained models perform with imperfect training data.

More details about the degraders are available in the rail/examples/creation/degradation_demo.ipynb notebook.
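As a rough illustration of what two of these selection effects do, the numpy sketch below applies a redshift-dependent incompleteness and a magnitude cut to a toy catalog. It assumes a survival probability of min(1, z_pivot / z) and assumes that a single cut value acts as an upper limit; both are assumptions about the degraders' behaviour, and the toy catalog is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_gal = 10_000

# Hypothetical catalog: uniform redshifts and i-band magnitudes.
redshift = rng.uniform(0.0, 3.0, size=n_gal)
mag_i = rng.uniform(20.0, 27.0, size=n_gal)

# Assumed selection function: every galaxy below the pivot survives,
# and the survival probability falls as pivot / z above it.
pivot_redshift = 1.0
survival_prob = pivot_redshift / np.maximum(redshift, pivot_redshift)
keep_z = rng.random(n_gal) < survival_prob

# Assumed cut: keep galaxies with mag_i below the threshold.
keep_mag = mag_i < 25.0

selected = keep_z & keep_mag
print(f"kept {selected.sum()} of {n_gal} galaxies")
```

Because high-redshift and faint galaxies are preferentially removed, a training set built this way is no longer representative of the test set.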

In [9]:
flow_creator_train = FlowCreator.make_stage(
    name='flow_creator_train', 
    model=flow_modeler.get_handle("model"), 
    n_samples=50,
    seed=1235,
)

lsst_error_model_train = LSSTErrorModel.make_stage(
    name='lsst_error_model_train',
    bandNames=band_dict, 
    seed=29,
)

inv_redshift = InvRedshiftIncompleteness.make_stage(
    name='inv_redshift',
    pivot_redshift=1.0,
)

line_confusion = LineConfusion.make_stage(
    name='line_confusion', 
    true_wavelen=5007., 
    wrong_wavelen=3727.,
    frac_wrong=0.05,
)

quantity_cut = QuantityCut.make_stage(
    name='quantity_cut',    
    cuts={'mag_i_lsst': 25.0},
)

col_remapper_train = ColumnMapper.make_stage(
    name='col_remapper_train', 
    columns=rename_dict,
)
   
table_conv_train = TableConverter.make_stage(
    name='table_conv_train', 
    output_format='numpyDict',
)
In [10]:
train_data_orig = flow_creator_train.sample(150, 1235)
train_data_errs = lsst_error_model_train(train_data_orig,seed=66)
train_data_inc = inv_redshift(train_data_errs)
train_data_conf = line_confusion(train_data_inc)
train_data_cut = quantity_cut(train_data_conf)
train_data_pq = col_remapper_train(train_data_cut)
train_data = table_conv_train(train_data_pq)
Inserting handle into data store.  output_flow_creator_train: inprogress_output_flow_creator_train.pq, flow_creator_train
Inserting handle into data store.  output_lsst_error_model_train: inprogress_output_lsst_error_model_train.pq, lsst_error_model_train
Inserting handle into data store.  output_inv_redshift: inprogress_output_inv_redshift.pq, inv_redshift
Inserting handle into data store.  output_line_confusion: inprogress_output_line_confusion.pq, line_confusion
Inserting handle into data store.  output_quantity_cut: inprogress_output_quantity_cut.pq, quantity_cut
Inserting handle into data store.  output_col_remapper_train: inprogress_output_col_remapper_train.pq, col_remapper_train
Inserting handle into data store.  output_table_conv_train: inprogress_output_table_conv_train.hdf5, table_conv_train
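The line_confusion stage above is configured with true_wavelen=5007 and wrong_wavelen=3727. If an emission line emitted at 5007 Å is mistaken for one emitted at 3727 Å, the observed wavelength is unchanged, so the inferred redshift satisfies 1 + z_wrong = (1 + z_true) * 5007 / 3727. The sketch below works through that arithmetic with numpy. The helper names are hypothetical, and RAIL's own degrader may differ in details such as how it handles unphysical redshifts.

```python
import numpy as np

def confuse_redshift(z, true_wavelen, wrong_wavelen):
    """Redshift inferred when the true line is mistaken for the wrong one."""
    return (1.0 + z) * true_wavelen / wrong_wavelen - 1.0

def apply_line_confusion(z, true_wavelen, wrong_wavelen, frac_wrong, rng):
    """Misassign a random fraction of redshifts; returns new redshifts and a mask."""
    z = np.array(z, dtype=float)  # work on a copy
    wrong = rng.random(z.size) < frac_wrong
    z[wrong] = confuse_redshift(z[wrong], true_wavelen, wrong_wavelen)
    return z, wrong

# Single worked example: a galaxy at z = 0.5.
z_wrong = confuse_redshift(0.5, 5007.0, 3727.0)
print(round(z_wrong, 3))  # 1.015

# Apply to a hypothetical sample with 5% misidentified lines.
rng = np.random.default_rng(0)
z_true = rng.uniform(0.0, 2.0, size=2000)
z_spec, wrong = apply_line_confusion(z_true, 5007.0, 3727.0, 0.05, rng)
print(f"{wrong.sum()} of {z_true.size} redshifts were reassigned")
```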

Let's examine the quantities that we've generated. We'll use the handy tables_io package to temporarily convert the data to a pandas DataFrame for a quick look at the columns:

In [11]:
train_table = tables_io.convertObj(train_data.data, tables_io.types.PD_DATAFRAME)
train_table.head()
Out[11]:
   redshift  mag_u_lsst  mag_err_u_lsst  mag_g_lsst  mag_err_g_lsst  mag_r_lsst  mag_err_r_lsst  mag_i_lsst  mag_err_i_lsst  mag_z_lsst  mag_err_z_lsst  mag_y_lsst  mag_err_y_lsst
0  0.958749   24.459368        0.050371   24.193349        0.013625   23.481190        0.008226   22.683827        0.006912   21.967017        0.006708   21.703655        0.009496
1  0.205615   24.440223        0.049531   23.779293        0.010089   23.448761        0.008079   23.186146        0.008980   23.176253        0.014071   23.161475        0.030450
2  1.205826   27.332480        0.548658   25.883870        0.058052   25.001308        0.026166   24.335299        0.021597   23.434110        0.017335   22.902281        0.024289
3  0.632042   24.316868        0.044448   23.993473        0.011722   23.636696        0.009018   23.110995        0.008582   23.079912        0.013055   23.052467        0.027677
4  0.582394   24.185892        0.039631   23.499354        0.008475   22.761638        0.006082   22.194906        0.005890   22.105536        0.007108   21.984195        0.011524

You see that we've generated redshifts, ugrizy magnitudes, and magnitude errors with names that match those in the cosmoDC2_v1.1.4_image data.

Testing sample¶

For the test sample we will:

  1. Use the Flow to produce some synthetic data
  2. Use the LSSTErrorModel to smear the data
  3. Use the FlowPosterior to estimate the redshift posteriors for the degraded sample
  4. Use ColumnMapper to rename some of the columns to match DC2
  5. Use the TableConverter to convert the data to a numpy dictionary, which will be stored in a hdf5 file with the same schema as the DC2 data
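Step 3 evaluates each posterior on a fixed redshift grid; the cell below uses np.linspace(0., 5., 21). As a minimal sketch of what a gridded posterior looks like, the code below evaluates a toy unnormalized density on that grid and normalizes it with the trapezoid rule. The toy density and the helper names are assumptions, and the real posterior comes from the trained flow.

```python
import numpy as np

def trapezoid_area(grid, values):
    """Trapezoid-rule integral of values sampled on grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def normalize_on_grid(grid, unnormalized):
    """Scale a gridded density so that it integrates to one."""
    area = trapezoid_area(grid, unnormalized)
    if not np.isfinite(area) or area <= 0.0:
        raise ValueError("density has no finite, positive mass on the grid")
    return unnormalized / area

grid = np.linspace(0.0, 5.0, 21)

# Toy unnormalized posterior for one galaxy (hypothetical shape).
unnormalized = np.exp(-0.5 * ((grid - 1.2) / 0.3) ** 2)
posterior = normalize_on_grid(grid, unnormalized)

print(grid[np.argmax(posterior)])  # 1.25, the grid point closest to the peak
```

With only 21 points the grid spacing is 0.25, so narrow posteriors are resolved coarsely; a finer grid trades run time for resolution.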
In [12]:
flow_creator_test = FlowCreator.make_stage(
    name='flow_creator_test',
    model=flow_modeler.get_handle("model"),
    n_samples=50,
)
      
lsst_error_model_test = LSSTErrorModel.make_stage(
    name='lsst_error_model_test',
    bandNames=band_dict,
)

flow_post_test = FlowPosterior.make_stage(
    name='flow_post_test',
    model=flow_modeler.get_handle("model"),
    column='redshift',
    grid=np.linspace(0., 5., 21),
)
                
col_remapper_test = ColumnMapper.make_stage(
    name='col_remapper_test',
    columns=rename_dict,
    hdf5_groupname='',
)

table_conv_test = TableConverter.make_stage(
    name='table_conv_test', 
    output_format='numpyDict',
)
In [13]:
test_data_orig = flow_creator_test.sample(150, 1234)
test_data_errs = lsst_error_model_test(test_data_orig,seed=58)
test_data_post = flow_post_test.get_posterior(test_data_errs, err_samples=None)
test_data_pq = col_remapper_test(test_data_errs)
test_data = table_conv_test(test_data_pq)
Inserting handle into data store.  output_flow_creator_test: inprogress_output_flow_creator_test.pq, flow_creator_test
Inserting handle into data store.  output_lsst_error_model_test: inprogress_output_lsst_error_model_test.pq, lsst_error_model_test
Inserting handle into data store.  output_flow_post_test: inprogress_output_flow_post_test.hdf5, flow_post_test
Warning.  Failed to convert column 'ArrayImpl' object has no attribute 'data'
Inserting handle into data store.  output_col_remapper_test: inprogress_output_col_remapper_test.pq, col_remapper_test
Inserting handle into data store.  output_table_conv_test: inprogress_output_table_conv_test.hdf5, table_conv_test
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/qp/interp_pdf.py:83: RuntimeWarning: invalid value encountered in divide
  self._ycumul = (self._ycumul.T / self._ycumul[:,-1]).T
In [14]:
test_table = tables_io.convertObj(test_data.data, tables_io.types.PD_DATAFRAME)
test_table.head()
Out[14]:
   redshift  mag_u_lsst  mag_err_u_lsst  mag_g_lsst  mag_err_g_lsst  mag_r_lsst  mag_err_r_lsst  mag_i_lsst  mag_err_i_lsst  mag_z_lsst  mag_err_z_lsst  mag_y_lsst  mag_err_y_lsst
0  0.122277   23.948877        0.032242   22.780389        0.006203   22.011501        0.005324   21.772573        0.005451   21.607630        0.005975   21.510299        0.008435
1  0.545639   25.078498        0.086770   24.385587        0.015866   23.425509        0.007977   23.064986        0.008356   22.901749        0.011425   22.731198        0.020970
2  1.229709   28.453834        1.135473   26.786951        0.128317   25.306018        0.034185   24.470777        0.024269   23.912741        0.026060   23.298156        0.034343
3  0.472265   27.899986        0.809929   27.429590        0.221586   26.128743        0.070895   25.799778        0.078473   25.432443        0.099714   25.784967        0.295216
4  1.526582   26.788719        0.364370   26.363759        0.088690   26.098064        0.068996   25.348765        0.052620   24.820060        0.058070   24.232487        0.078591

"Inform" some estimators¶

More details about the process of "informing" or "training" the models used by the estimators are available in the rail/examples/estimation/RAIL_estimation_demo.ipynb notebook.

We use "inform" rather than "train" to generically refer to the preprocessing of any prior information. For a machine learning estimator, that prior information is a training set, but it can also be an SED template library for a template-fitting or hybrid estimator.
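The split between an "inform" step and an "estimate" step can be illustrated with a toy nearest-neighbour estimator written in plain numpy. Everything here is a sketch under stated assumptions: the functions `inform` and `estimate`, the toy color-redshift relation, and the kernel width are hypothetical and are not the RAIL KNearNeighPDF implementation.

```python
import numpy as np

def inform(train_colors, train_z):
    """'Inform' step: here the model is simply the stored training set."""
    return {"colors": np.asarray(train_colors, dtype=float),
            "z": np.asarray(train_z, dtype=float)}

def estimate(model, colors, z_grid, k=3, sigma=0.05):
    """'Estimate' step: Gaussian-kernel p(z) built from the k nearest neighbours."""
    colors = np.atleast_2d(colors)
    # Distances between every test galaxy and every training galaxy.
    dist = np.linalg.norm(model["colors"][None, :, :] - colors[:, None, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    neighbour_z = model["z"][nearest]                      # shape (n_test, k)
    kernels = np.exp(-0.5 * ((z_grid[None, None, :] - neighbour_z[:, :, None]) / sigma) ** 2)
    pdf = kernels.sum(axis=1)                              # shape (n_test, n_grid)
    dz = z_grid[1] - z_grid[0]
    return pdf / (pdf.sum(axis=1, keepdims=True) * dz)

# Hypothetical data: two "colors" that depend linearly on redshift, plus noise.
rng = np.random.default_rng(3)
train_z = rng.uniform(0.0, 2.0, size=200)
train_colors = train_z[:, None] * np.array([1.0, 0.5]) + rng.normal(0.0, 0.02, (200, 2))
test_colors = np.array([[0.5, 0.25], [1.0, 0.5], [1.5, 0.75], [0.2, 0.1], [1.8, 0.9]])

z_grid = np.linspace(0.0, 2.0, 201)
model = inform(train_colors, train_z)        # done once, can be saved and reused
pdfs = estimate(model, test_colors, z_grid)  # applied to any number of test sets
print(pdfs.shape)  # (5, 201)
```

Keeping the two steps separate means the informed model can be stored (as the .pkl files in the cell below are) and reused without repeating the training.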

In [15]:
inform_bpz = Inform_BPZ_lite.make_stage(
    name="inform_bpz",
    model="bpz.pkl",
    hdf5_groupname="",
)

inform_knn = Inform_KNearNeighPDF.make_stage(
    name='inform_knn', 
    nondetect_val=np.nan,
    model='knnpz.pkl', 
    hdf5_groupname='',
)

inform_fzboost = Inform_FZBoost.make_stage(
    name='inform_FZBoost', 
    model='fzboost.pkl', 
    hdf5_groupname='',
)
In [16]:
inform_bpz.inform(train_data)
inform_knn.inform(train_data)
inform_fzboost.inform(train_data)
using 47 galaxies in calculation
best values for fo and kt:
[1.]
[0.3]
minimizing for type 0
[0.4 1.8 0.1] 27.962917140749738
[0.42 1.8  0.1 ] 29.735186874337632
[0.4  1.89 0.1 ] 27.66418010144885
[0.4   1.8   0.105] 29.438726300396013
[0.38       1.86       0.10333333] 26.949928116646976
[0.36  1.89  0.105] 25.59915990966773
[0.37333333 1.92       0.09833333] 24.82079798829084
[0.36  1.98  0.095] 22.80167160765164
[0.34666667 2.04       0.1       ] 23.03565859604935
[0.31111111 2.05       0.1       ] 20.39266192377808
[0.26666667 2.13       0.1       ] 18.105583176275662
[0.28888889 2.21       0.09166667] 18.806226534162178
[0.2637037  2.17333333 0.09111111] 17.788023694783774
[0.22222222 2.24       0.08666667] 20.216745229898986
[0.18617284 2.36222222 0.09351852] 24.02322779450482
[0.31654321 2.07555556 0.09462963] 19.756816562219868
[0.22962963 2.26666667 0.09388889] 18.933670151587137
[0.25135802 2.21888889 0.09407407] 18.13054028052276
[0.23226337 2.13814815 0.09845679] 16.95888927013622
[0.20395062 2.10222222 0.10185185] 16.22491420112253
[0.2381893  2.05148148 0.10123457] 16.57363555206576
[0.20389575 2.08802469 0.09613169] 16.372732845879383
[0.16698674 1.98781893 0.10836763] 14.723882262272527
[0.11862826 1.89506173 0.11699588] 13.75962286674244
[0.11279378 2.00539095 0.10875171] 16.41367456226704
[0.14414266 2.01691358 0.10687243] 15.374396491499809
[0.10725194 1.92144033 0.12101509] 14.058649380346445
[0.04273129 1.78672154 0.12807042] 19.41594485913312
[0.16364579 2.02334705 0.10840649] 15.053870516672038
[0.11554133 1.87631916 0.12407255] 13.826792749702795
[0.06396857 1.77186709 0.13298252] 14.825559506598939
[0.08888787 1.83473708 0.12683852] 13.789794586376305
[0.1081197  1.81597165 0.12425621] 13.511918190098992
[0.10855357 1.76323731 0.12587677] 13.559231634654438
[0.09488256 1.82086115 0.12132119] 13.463984880110049
[0.08455317 1.79313214 0.11994551] 13.575712688293617
[0.12553247 1.8531926  0.11487701] 13.440841295639304
[0.14385476 1.86242036 0.10889626] 13.562214487035227
[0.10039489 1.76495521 0.12330706] 13.22316931669934
[0.0912782  1.69990195 0.12646264] 13.318422654334837
[0.10575358 1.81003432 0.11541396] 13.23985132271272
[0.12623807 1.79792694 0.11441083] 13.129349120724516
[0.14191582 1.78645983 0.11095564] 13.176604902377145
[0.09605855 1.72875171 0.12054422] 12.946307879204994
[0.0813216  1.66653126 0.12337782] 13.053239305468937
[0.1093741  1.71772158 0.12342744] 13.312910222287588
[0.10665871 1.78695614 0.11741733] 13.0726974096925
[0.11890866 1.77746798 0.11160786] 12.940681656657112
[0.12816555 1.78372437 0.10575826] 13.067822013310264
[0.08817922 1.73085695 0.11863545] 13.002663697701793
[0.09543892 1.70442829 0.11644102] 12.71506789117684
[0.08982902 1.66316436 0.11595286] 12.598413308922298
[0.11501828 1.71539909 0.11343452] 12.668640560509758
[0.11977875 1.70860258 0.10678594] 12.561220285089561
[0.13163885 1.69852802 0.0999068 ] 12.625538179173715
[0.0975087  1.61397604 0.11250769] 12.307779170531449
[0.08680872 1.53223007 0.1129576 ] 12.295689674122329
[0.08259272 1.55393226 0.11036309] 12.25612148466408
[0.06637994 1.47319884 0.10882738] 12.518357297266357
[0.10295777 1.53334558 0.10411822] 11.95360483821603
[0.10952215 1.46843618 0.0982009 ] 11.831517564850799
[0.06617031 1.32779642 0.10756179] 12.767736940526488
[0.10637664 1.61340104 0.1069799 ] 12.1678362057218
[0.11218562 1.55828291 0.097405  ] 12.043640761688923
[0.13613022 1.53948117 0.09136078] 12.01373468768545
[0.13218202 1.43073247 0.08433122] 11.890282716064931
[0.13970398 1.40081697 0.08519027] 11.942509985203483
[0.11814188 1.32717591 0.08712081] 11.98200128203119
[0.12263896 1.38025222 0.08818081] 11.857928946918
[0.10319145 1.45213028 0.09528501] 11.803967963079653
[0.08493518 1.47778694 0.10033239] 12.036707185785264
[0.09138635 1.43647999 0.1034466 ] 11.965560161626875
[0.1219831  1.43216935 0.08911006] 11.8076161401908
[0.10049217 1.52157165 0.10021651] 11.93230483291457
[0.11710227 1.41558208 0.09118973] 11.79693093284275
[0.11866239 1.39815163 0.0855223 ] 11.840295915707031
[0.11180721 1.45086504 0.09503125] 11.794341753700824
[0.09941751 1.44688226 0.0985606 ] 11.816720111698436
[0.11634171 1.43584758 0.0914727 ] 11.788591631166923
[0.12697601 1.41606618 0.08984411] 11.860777188522055
[0.10913759 1.44311426 0.09392479] 11.783218425886332
[0.10775541 1.47096917 0.09576276] 11.809667678242583
[0.11476555 1.42942885 0.09233299] 11.786379910523047
[0.11502268 1.42139541 0.0901224 ] 11.787251141156178
[0.10960884 1.42677811 0.09278075] 11.777124717792523
[0.10624241 1.42224337 0.09343478] 11.776956058809436
[0.10507435 1.44179558 0.09633931] 11.79257619311134
[0.1125356  1.42649545 0.09167662] 11.780167386389191
[0.10384485 1.43180654 0.09369114] 11.78934763543391
[0.11203538 1.43002327 0.09267253] 11.779954294738376
[0.11140467 1.40939381 0.0912645 ] 11.778898922113896
[0.1072527  1.41461152 0.09323791] 11.778032082622126
[0.10456448 1.40080919 0.09261893] 11.78005962317059
[0.11016765 1.42271975 0.09265913] 11.777181309033518
[0.10437051 1.43032262 0.09495671] 11.782205208962143
[0.10964613 1.41462601 0.09218755] 11.776684444271192
[0.11011809 1.42511458 0.09228306] 11.777529537026572
[0.10940174 1.42248881 0.09252177] 11.776423165083857
[0.10669254 1.41685238 0.09277027] 11.776285986857143
[0.10495498 1.41391869 0.09282585] 11.778182789451373
[0.11091786 1.41373476 0.09155162] 11.77748257425347
[0.10741127 1.42011622 0.09296399] 11.776081256074189
[0.10602424 1.42501226 0.09331647] 11.778442543139892
[0.10874066 1.41722258 0.09246978] 11.77595539429968
[0.1058279  1.41363864 0.09294759] 11.77668192085746
[0.10850828 1.42027627 0.09262823] 11.776003716027692
[0.1097476  1.42155766 0.09260439] 11.776641107484016
[0.1074563 1.4180287 0.0927288] 11.775956594759945
[0.10905889 1.41690214 0.09225389] 11.77601431523945
[0.10864699 1.41770566 0.09243141] 11.77590284204797
[0.10805435 1.41502836 0.09245844] 11.775863366693672
[0.10782738 1.4124044  0.09237354] 11.776038683394427
[0.10950502 1.41527569 0.09217762] 11.776370689263267
[0.10796848 1.41734045 0.09259101] 11.775841724078134
[0.10770589 1.4161604  0.09251746] 11.77587750667452
[0.10717216 1.41464714 0.09261319] 11.775968599664704
[0.10827828 1.41694103 0.09247686] 11.77584346414267
[0.10849485 1.41671282 0.09250008] 11.77590587578194
[0.10790313 1.41629851 0.09251311] 11.775839556846737
[0.10804558 1.41869164 0.09259555] 11.775924078225158
[0.10805216 1.41594418 0.09249272] 11.775834306428917
[0.1076709  1.41611439 0.0925877 ] 11.775852576856424
[0.10812643 1.41673437 0.09250457] 11.775833766764668
[0.108086   1.41531092 0.09241592] 11.775846589731215
[0.10799786 1.41683307 0.09254724] 11.775833474454988
[0.10821451 1.41670924 0.09251657] 11.77583986209378
[0.10798097 1.41640119 0.09251398] 11.775833036468354
[0.10801802 1.41736824 0.09255114] 11.775849666587334
[0.10804362 1.41630019 0.09250732] 11.77583092043052
[0.10788854 1.41628859 0.09254112] 11.775834291488064
[0.10806696 1.41662293 0.09251371] 11.7758319631176
[0.10806318 1.41604981 0.0924761 ] 11.775833255158123
[0.10804685 1.41624562 0.09249388] 11.77583176086206
[0.10812398 1.41637797 0.09249596] 11.77583252627719
[0.10808823 1.41638378 0.09250047] 11.775831432582581
[0.10805217 1.4159968  0.09248741] 11.775833080812022
[0.10806326 1.4164664  0.09250713] 11.775831253815285
[0.10808323 1.41652129 0.09251606] 11.77583115336379
[0.10803851 1.41647481 0.09251988] 11.775830668540735
[0.10801366 1.41652032 0.09252958] 11.775830746620594
[0.10804698 1.4163978  0.09252171] 11.77583086813096
[0.10800285 1.41626058 0.09251654] 11.775830937719418
[0.10802294 1.41632575 0.09251642] 11.775830705540603
[0.10802867 1.41649871 0.09253135] 11.775830629422508
[0.1080212  1.41659797 0.09254337] 11.775830979024404
[0.10801311 1.41646839 0.09252339] 11.775830837339932
[0.10802157 1.41645074 0.09252297] 11.775830626827842
[0.10803623 1.41662375 0.09253304] 11.775830898902775
[0.10802627 1.41640025 0.09252058] 11.775830582973235
Inserting handle into data store.  model_inform_bpz: inprogress_bpz.pkl, inform_bpz
split into 35 training and 12 validation samples
finding best fit sigma and NNeigh...



best fit values are sigma=0.03166666666666667 and numneigh=3



Inserting handle into data store.  model_inform_knn: inprogress_knnpz.pkl, inform_knn
stacking some data...
read in training data
fit the model...
finding best bump thresh...
finding best sharpen parameter...
Inserting handle into data store.  model_inform_FZBoost: inprogress_fzboost.pkl, inform_FZBoost
Out[16]:
<rail.core.data.ModelHandle at 0x7f9fb9c3fac0>

If you run into issues here:

If you've installed rail and bpz to different directories (most commonly, you've installed rail from source and bpz from PyPI), you may run into an issue where rail cannot locate a file installed by bpz.

To fix this, find your test_bpz.columns file in your bpz directory (or grab a new one here on GitHub) and copy it into your rail directory to /RAIL/src/rail/examples/estimation/configs/test_bpz.columns.

Alternatively, if you don't want to move files, you should be able to rewrite the code to replace the configured paths with your actual test_bpz.columns path (inform stage: bpz_lite.py L89 and estimation: bpz_lite.py L259).
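If you prefer to script the copy described above, a small helper along the following lines may do the job. The function name `install_columns_file` is hypothetical, and the demonstration runs entirely inside a temporary directory with a placeholder file so that nothing on your system is touched; substitute your real bpz and RAIL locations when using it.

```python
import shutil
import tempfile
from pathlib import Path

def install_columns_file(source_file, rail_root):
    """Copy test_bpz.columns into src/rail/examples/estimation/configs under rail_root."""
    target = (Path(rail_root) / "src" / "rail" / "examples"
              / "estimation" / "configs" / "test_bpz.columns")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source_file, target)
    return target

# Dry run in a throwaway directory (placeholder paths, placeholder file contents).
with tempfile.TemporaryDirectory() as tmp:
    fake_bpz_file = Path(tmp) / "bpz" / "test_bpz.columns"
    fake_bpz_file.parent.mkdir()
    fake_bpz_file.write_text("# placeholder contents\n")

    installed = install_columns_file(fake_bpz_file, Path(tmp) / "RAIL")
    copied_ok = installed.is_file()

print(copied_ok)  # True
```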

Estimate photo-z posteriors¶

More details about the estimators are available in the rail/examples/estimation/RAIL_estimation_demo.ipynb notebook.

randomPZ is a very simple class that does not actually predict a meaningful photo-z; instead, it produces a randomly drawn Gaussian for each galaxy.
trainZ is our "pathological" estimator: it makes a PDF from a histogram of the training data and assigns that PDF to every galaxy.
BPZ_lite is a template-based code that outputs the posterior estimated given a specific template set and Bayesian prior. See Benitez (2000) for more details.
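The two simplest estimators described above are easy to mimic in numpy. The sketch below builds a trainZ-like estimate (one histogram of training redshifts, copied to every galaxy) and a randomPZ-like estimate (a Gaussian with a random centre per galaxy). The training redshifts, bin edges, and Gaussian width are assumptions for illustration and do not reproduce the RAIL classes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_test = 4

# Hypothetical training redshifts and histogram bins.
train_z = rng.gamma(shape=2.0, scale=0.4, size=5000)
z_edges = np.linspace(0.0, 3.0, 31)
z_mid = 0.5 * (z_edges[1:] + z_edges[:-1])

# trainZ-like: a single normalized histogram, assigned to every test galaxy.
hist, _ = np.histogram(train_z, bins=z_edges, density=True)
trainz_pdfs = np.tile(hist, (n_test, 1))

# randomPZ-like: a Gaussian of assumed width around a random centre per galaxy.
centres = rng.uniform(0.0, 3.0, size=n_test)
width = 0.1
random_pdfs = np.exp(-0.5 * ((z_mid[None, :] - centres[:, None]) / width) ** 2)
random_pdfs /= random_pdfs.sum(axis=1, keepdims=True) * np.diff(z_edges)[None, :]

print(trainz_pdfs.shape, random_pdfs.shape)  # (4, 30) (4, 30)
```

Neither estimate uses the photometry of the galaxy it is assigned to, which is what makes these useful as baselines.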

In [17]:
estimate_bpz = BPZ_lite.make_stage(
    name='estimate_bpz', 
    hdf5_groupname='', 
    model=inform_bpz.get_handle('model'),
)

estimate_knn = KNearNeighPDF.make_stage(
    name='estimate_knn', 
    hdf5_groupname='', 
    nondetect_val=np.nan, 
    model=inform_knn.get_handle('model'),
)

estimate_fzboost = FZBoost.make_stage(
    name='test_FZBoost', 
    nondetect_val=np.nan,
    model=inform_fzboost.get_handle('model'), 
    hdf5_groupname='',
    aliases=dict(input='test_data', output='fzboost_estim'),
)
In [18]:
knn_estimated = estimate_knn.estimate(test_data)
fzboost_estimated = estimate_fzboost.estimate(test_data)
bpz_estimated = estimate_bpz.estimate(test_data)
Process 0 running estimator on chunk 0 - 150
Process 0 estimating PZ PDF for rows 0 - 150
Inserting handle into data store.  output_estimate_knn: inprogress_output_estimate_knn.hdf5, estimate_knn
Process 0 running estimator on chunk 0 - 150
Process 0 estimating PZ PDF for rows 0 - 150
Inserting handle into data store.  output_test_FZBoost: inprogress_output_test_FZBoost.hdf5, test_FZBoost
  Generating new AB file El_B2004a.DC2LSST_u.AB....
El_B2004a DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/El_B2004a.DC2LSST_u.AB
  Generating new AB file El_B2004a.DC2LSST_g.AB....
El_B2004a DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/El_B2004a.DC2LSST_g.AB
  Generating new AB file El_B2004a.DC2LSST_r.AB....
El_B2004a DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/El_B2004a.DC2LSST_r.AB
  Generating new AB file El_B2004a.DC2LSST_i.AB....
El_B2004a DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/El_B2004a.DC2LSST_i.AB
  Generating new AB file El_B2004a.DC2LSST_z.AB....
El_B2004a DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/El_B2004a.DC2LSST_z.AB
  Generating new AB file El_B2004a.DC2LSST_y.AB....
El_B2004a DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/El_B2004a.DC2LSST_y.AB
  Generating new AB file Sbc_B2004a.DC2LSST_u.AB....
Sbc_B2004a DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Sbc_B2004a.DC2LSST_u.AB
  Generating new AB file Sbc_B2004a.DC2LSST_g.AB....
Sbc_B2004a DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Sbc_B2004a.DC2LSST_g.AB
  Generating new AB file Sbc_B2004a.DC2LSST_r.AB....
Sbc_B2004a DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Sbc_B2004a.DC2LSST_r.AB
  Generating new AB file Sbc_B2004a.DC2LSST_i.AB....
Sbc_B2004a DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Sbc_B2004a.DC2LSST_i.AB
  Generating new AB file Sbc_B2004a.DC2LSST_z.AB....
Sbc_B2004a DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Sbc_B2004a.DC2LSST_z.AB
  Generating new AB file Sbc_B2004a.DC2LSST_y.AB....
Sbc_B2004a DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Sbc_B2004a.DC2LSST_y.AB
  Generating new AB file Scd_B2004a.DC2LSST_u.AB....
Scd_B2004a DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Scd_B2004a.DC2LSST_u.AB
  Generating new AB file Scd_B2004a.DC2LSST_g.AB....
Scd_B2004a DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Scd_B2004a.DC2LSST_g.AB
  Generating new AB file Scd_B2004a.DC2LSST_r.AB....
Scd_B2004a DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Scd_B2004a.DC2LSST_r.AB
  Generating new AB file Scd_B2004a.DC2LSST_i.AB....
Scd_B2004a DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Scd_B2004a.DC2LSST_i.AB
  Generating new AB file Scd_B2004a.DC2LSST_z.AB....
Scd_B2004a DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Scd_B2004a.DC2LSST_z.AB
  Generating new AB file Scd_B2004a.DC2LSST_y.AB....
Scd_B2004a DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Scd_B2004a.DC2LSST_y.AB
  Generating new AB file Im_B2004a.DC2LSST_u.AB....
Im_B2004a DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Im_B2004a.DC2LSST_u.AB
  Generating new AB file Im_B2004a.DC2LSST_g.AB....
Im_B2004a DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Im_B2004a.DC2LSST_g.AB
  Generating new AB file Im_B2004a.DC2LSST_r.AB....
Im_B2004a DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Im_B2004a.DC2LSST_r.AB
  Generating new AB file Im_B2004a.DC2LSST_i.AB....
Im_B2004a DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Im_B2004a.DC2LSST_i.AB
  Generating new AB file Im_B2004a.DC2LSST_z.AB....
Im_B2004a DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Im_B2004a.DC2LSST_z.AB
  Generating new AB file Im_B2004a.DC2LSST_y.AB....
Im_B2004a DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/Im_B2004a.DC2LSST_y.AB
  Generating new AB file SB3_B2004a.DC2LSST_u.AB....
SB3_B2004a DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB3_B2004a.DC2LSST_u.AB
  Generating new AB file SB3_B2004a.DC2LSST_g.AB....
SB3_B2004a DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB3_B2004a.DC2LSST_g.AB
  Generating new AB file SB3_B2004a.DC2LSST_r.AB....
SB3_B2004a DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB3_B2004a.DC2LSST_r.AB
  Generating new AB file SB3_B2004a.DC2LSST_i.AB....
SB3_B2004a DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB3_B2004a.DC2LSST_i.AB
  Generating new AB file SB3_B2004a.DC2LSST_z.AB....
SB3_B2004a DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB3_B2004a.DC2LSST_z.AB
  Generating new AB file SB3_B2004a.DC2LSST_y.AB....
SB3_B2004a DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB3_B2004a.DC2LSST_y.AB
  Generating new AB file SB2_B2004a.DC2LSST_u.AB....
SB2_B2004a DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB2_B2004a.DC2LSST_u.AB
  Generating new AB file SB2_B2004a.DC2LSST_g.AB....
SB2_B2004a DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB2_B2004a.DC2LSST_g.AB
  Generating new AB file SB2_B2004a.DC2LSST_r.AB....
SB2_B2004a DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB2_B2004a.DC2LSST_r.AB
  Generating new AB file SB2_B2004a.DC2LSST_i.AB....
SB2_B2004a DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB2_B2004a.DC2LSST_i.AB
  Generating new AB file SB2_B2004a.DC2LSST_z.AB....
SB2_B2004a DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB2_B2004a.DC2LSST_z.AB
  Generating new AB file SB2_B2004a.DC2LSST_y.AB....
SB2_B2004a DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/SB2_B2004a.DC2LSST_y.AB
  Generating new AB file ssp_25Myr_z008.DC2LSST_u.AB....
ssp_25Myr_z008 DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_25Myr_z008.DC2LSST_u.AB
  Generating new AB file ssp_25Myr_z008.DC2LSST_g.AB....
ssp_25Myr_z008 DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_25Myr_z008.DC2LSST_g.AB
  Generating new AB file ssp_25Myr_z008.DC2LSST_r.AB....
ssp_25Myr_z008 DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_25Myr_z008.DC2LSST_r.AB
  Generating new AB file ssp_25Myr_z008.DC2LSST_i.AB....
ssp_25Myr_z008 DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_25Myr_z008.DC2LSST_i.AB
  Generating new AB file ssp_25Myr_z008.DC2LSST_z.AB....
ssp_25Myr_z008 DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_25Myr_z008.DC2LSST_z.AB
  Generating new AB file ssp_25Myr_z008.DC2LSST_y.AB....
ssp_25Myr_z008 DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_25Myr_z008.DC2LSST_y.AB
  Generating new AB file ssp_5Myr_z008.DC2LSST_u.AB....
ssp_5Myr_z008 DC2LSST_u
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_5Myr_z008.DC2LSST_u.AB
  Generating new AB file ssp_5Myr_z008.DC2LSST_g.AB....
ssp_5Myr_z008 DC2LSST_g
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_5Myr_z008.DC2LSST_g.AB
  Generating new AB file ssp_5Myr_z008.DC2LSST_r.AB....
ssp_5Myr_z008 DC2LSST_r
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_5Myr_z008.DC2LSST_r.AB
  Generating new AB file ssp_5Myr_z008.DC2LSST_i.AB....
ssp_5Myr_z008 DC2LSST_i
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_5Myr_z008.DC2LSST_i.AB
  Generating new AB file ssp_5Myr_z008.DC2LSST_z.AB....
ssp_5Myr_z008 DC2LSST_z
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_5Myr_z008.DC2LSST_z.AB
  Generating new AB file ssp_5Myr_z008.DC2LSST_y.AB....
ssp_5Myr_z008 DC2LSST_y
x_res[0] 3000.0
x_res[-1] 11500.0
Writing AB file  /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/estimation/data/AB/ssp_5Myr_z008.DC2LSST_y.AB
Process 0 running estimator on chunk 0 - 150
Inserting handle into data store.  output_estimate_bpz: inprogress_output_estimate_bpz.hdf5, estimate_bpz

Evaluate the estimates¶

Now we evaluate metrics on the estimates, separately for each estimator.

Each call to Evaluator.evaluate creates a table of performance metrics. We store these tables in a dictionary keyed by the name of the estimator.

In [19]:
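# Map each estimator name to its p(z) estimates
eval_dict = dict(bpz=bpz_estimated, fzboost=fzboost_estimated, knn=knn_estimated)
# Truth catalog holding the true redshifts for the test sample
truth = test_data_orig

# Evaluate each estimator against the truth and collect the metric tables by name
result_dict = {}
for key, val in eval_dict.items():
    the_eval = Evaluator.make_stage(name=f'{key}_eval', truth=truth)
    result_dict[key] = the_eval.evaluate(val, truth)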
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/evaluation/metrics/pit.py:176: UserWarning: p-value floored: true value smaller than 0.001
  ad_results = stats.anderson_ksamp([pits_clean, uniform_yvals])
Inserting handle into data store.  output_bpz_eval: inprogress_output_bpz_eval.hdf5, bpz_eval
Warning.  Failed to convert column 'list' object has no attribute 'dtype'
Inserting handle into data store.  output_fzboost_eval: inprogress_output_fzboost_eval.hdf5, fzboost_eval
Warning.  Failed to convert column 'list' object has no attribute 'dtype'
5 PITs removed from the sample.
Inserting handle into data store.  output_knn_eval: inprogress_output_knn_eval.hdf5, knn_eval
Warning.  Failed to convert column 'list' object has no attribute 'dtype'
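
The PIT_* metrics computed by the evaluator are based on the probability integral transform (PIT): for each galaxy, the CDF of its estimated p(z) evaluated at the true redshift. If the p(z) are well calibrated, the PIT values are uniformly distributed on [0, 1], and statistics such as KS measure the departure from uniformity. The following is a minimal, standard-library sketch of that idea, assuming Gaussian p(z) for simplicity. The function names and toy data are illustrative only and are not part of RAIL, whose implementation may differ in detail.

```python
import math
import random


def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pit_values(z_true, means, sigmas):
    """PIT for each galaxy: the estimated CDF evaluated at the true redshift."""
    return [gaussian_cdf(z, m, s) for z, m, s in zip(z_true, means, sigmas)]


def ks_statistic_uniform(samples):
    """Kolmogorov-Smirnov distance between the samples and Uniform(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# Toy data (hypothetical): true redshifts scattered around each p(z) mean.
rng = random.Random(42)
n_gal = 2000
true_sigma = 0.05
means = [rng.uniform(0.2, 2.0) for _ in range(n_gal)]
z_true = [rng.gauss(m, true_sigma) for m in means]

# Well-calibrated p(z): the quoted width matches the true scatter.
pit_good = pit_values(z_true, means, [true_sigma] * n_gal)
# Overconfident p(z): the quoted width is too narrow, so PITs pile up near 0 and 1.
pit_narrow = pit_values(z_true, means, [true_sigma / 3] * n_gal)

ks_good = ks_statistic_uniform(pit_good)
ks_narrow = ks_statistic_uniform(pit_narrow)
print(f"KS statistic, calibrated:    {ks_good:.3f}")
print(f"KS statistic, overconfident: {ks_narrow:.3f}")
```

A small KS statistic indicates PIT values close to uniform; the overconfident case yields a much larger value.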

The pandas DataFrame format gives convenient, human-readable printouts of the metrics.
The next cell converts every results table to a pandas DataFrame.

In [20]:
results_tables = {
    key: tables_io.convertObj(val.data, tables_io.types.PD_DATAFRAME)
    for key, val in result_dict.items()
}
In [21]:
results_tables['knn']
Out[21]:
PIT_KS_stat PIT_KS_pval PIT_CvM_stat PIT_CvM_pval PIT_OutRate POINT_SimgaIQR POINT_Bias POINT_OutlierRate POINT_SigmaMAD CDE_stat CDE_pval
0 0.343531 2.604196e-16 5.234572 1.161805e-10 0.308671 0.197134 -0.077817 0.06 0.197723 1.258707 NaN
In [22]:
results_tables['fzboost']
Out[22]:
PIT_KS_stat PIT_KS_pval PIT_CvM_stat PIT_CvM_pval PIT_OutRate POINT_SimgaIQR POINT_Bias POINT_OutlierRate POINT_SigmaMAD CDE_stat CDE_pval
0 0.340544 4.985259e-16 5.437786 2.456322e-10 0.398692 0.281459 -0.132998 0.0 0.264428 1.807554 NaN
In [23]:
results_tables['bpz']
Out[23]:
PIT_KS_stat PIT_KS_pval PIT_CvM_stat PIT_CvM_pval PIT_OutRate POINT_SimgaIQR POINT_Bias POINT_OutlierRate POINT_SigmaMAD CDE_stat CDE_pval
0 0.274119 1.881857e-10 5.818235 3.712586e-12 0.21149 0.139398 -0.069721 0.12 0.13254 4.825021 NaN
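
The POINT_* columns summarize how far the point estimates fall from the true redshifts. The sketch below assumes conventions that are common in the photo-z literature: a scaled residual e_z = (z_phot - z_true) / (1 + z_true), bias as the median residual, scatter from the scaled MAD and scaled interquartile range, and outliers defined as |e_z| > max(3 sigma_IQR, 0.06). These definitions are an assumption here; RAIL's evaluator may differ in detail, and `point_metrics` is an illustrative helper, not a RAIL function.

```python
import statistics


def point_metrics(z_phot, z_true):
    """Bias, scatter, and outlier rate of point estimates (assumed conventions)."""
    # Residuals scaled by (1 + z_true)
    ez = [(zp - zt) / (1.0 + zt) for zp, zt in zip(z_phot, z_true)]

    bias = statistics.median(ez)

    # 1.4826 converts a median absolute deviation to a Gaussian-equivalent sigma
    mad = statistics.median(abs(e - bias) for e in ez)
    sigma_mad = 1.4826 * mad

    # 1.349 converts an interquartile range to a Gaussian-equivalent sigma
    q1, _, q3 = statistics.quantiles(ez, n=4, method="inclusive")
    sigma_iqr = (q3 - q1) / 1.349

    # Outliers: residuals beyond the larger of 3 sigma_IQR and 0.06
    cut = max(3.0 * sigma_iqr, 0.06)
    outlier_rate = sum(abs(e) > cut for e in ez) / len(ez)

    return {
        "bias": bias,
        "sigma_mad": sigma_mad,
        "sigma_iqr": sigma_iqr,
        "outlier_rate": outlier_rate,
    }


# Toy sample (hypothetical): eight good estimates and one catastrophic outlier.
z_true = [1.0] * 9
z_phot = [0.96, 0.98, 0.99, 1.0, 1.0, 1.0, 1.01, 1.02, 3.0]

metrics = point_metrics(z_phot, z_true)
for name, value in metrics.items():
    print(f"{name:>12s}: {value:.4f}")
```

Because the median, MAD, and IQR are robust statistics, the single catastrophic estimate barely affects the bias and scatter and shows up only in the outlier rate.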

Summarize the per-galaxy redshift constraints to make population-level distributions¶

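We use two summarizers here, both applied to the BPZ estimates: PointEstimateHist, which builds n(z) as a histogram of the per-galaxy point estimates, and NaiveStack, which builds n(z) by summing the per-galaxy p(z) posteriors.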

First we make the stages, then execute them, then plot the output.

In [24]:
point_estimate_test = PointEstimateHist.make_stage(name='point_estimate_test')
naive_stack_test = NaiveStack.make_stage(name='naive_stack_test')
In [25]:
point_estimate_ens = point_estimate_test.summarize(eval_dict['bpz'])
naive_stack_ens = naive_stack_test.summarize(eval_dict['bpz'])
Inserting handle into data store.  output_point_estimate_test: inprogress_output_point_estimate_test.hdf5, point_estimate_test
Inserting handle into data store.  single_NZ_point_estimate_test: inprogress_single_NZ_point_estimate_test.hdf5, point_estimate_test
Inserting handle into data store.  output_naive_stack_test: inprogress_output_naive_stack_test.hdf5, naive_stack_test
Inserting handle into data store.  single_NZ_naive_stack_test: inprogress_single_NZ_naive_stack_test.hdf5, naive_stack_test
In [26]:
_ = naive_stack_ens.data.plot_native(xlim=(0,3))
In [27]:
_ = point_estimate_ens.data.plot_native(xlim=(0,3))
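
To make the difference between the two summaries concrete, here is a minimal sketch under the assumption that a naive stack averages the per-galaxy p(z) curves on a common grid, while a point-estimate histogram bins one number per galaxy (here, the mode of its p(z)). It illustrates the idea only: it does not use RAIL or qp, and the grid and p(z) values are invented.

```python
# Common redshift grid (hypothetical, coarse for readability).
z_grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# Toy per-galaxy p(z), each evaluated on z_grid and normalized to sum to 1.
pdfs = [
    [0.0, 0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.6, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.0, 0.6, 0.0, 0.0],  # bimodal galaxy
]


def naive_stack(pdfs):
    """Average the p(z) curves grid point by grid point."""
    n_gal = len(pdfs)
    return [sum(column) / n_gal for column in zip(*pdfs)]


def point_estimate_hist(pdfs):
    """Histogram of the per-galaxy modes, normalized to sum to 1."""
    counts = [0] * len(pdfs[0])
    for pdf in pdfs:
        mode_index = max(range(len(pdf)), key=pdf.__getitem__)
        counts[mode_index] += 1
    n_gal = len(pdfs)
    return [c / n_gal for c in counts]


stacked = naive_stack(pdfs)
hist = point_estimate_hist(pdfs)
for z, s, h in zip(z_grid, stacked, hist):
    print(f"z={z:.1f}  stack={s:.3f}  point-hist={h:.3f}")
```

The bimodal third galaxy contributes to both z = 0.5 and z = 2.0 in the stack, but only to z = 2.0 in the histogram, which is the kind of information a point estimate discards.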

Convert this to a ceci Pipeline¶

Now that all of these stages are defined and configured, and the connections between them are established by the DataHandle objects passed from one stage to the next, we can build a ceci Pipeline.

In [28]:
import ceci
pipe = ceci.Pipeline.interactive()
stages = [
    # train the flow
    flow_modeler,
    # create the training catalog
    flow_creator_train, lsst_error_model_train, inv_redshift,
    line_confusion, quantity_cut, col_remapper_train, table_conv_train,
    # create the test catalog
    flow_creator_test, lsst_error_model_test, col_remapper_test, table_conv_test,
    # inform the estimators
    inform_bpz, inform_knn, inform_fzboost,
    # estimate posteriors
    estimate_bpz, estimate_knn, estimate_fzboost,
    # estimate n(z), aka "summarize"
    point_estimate_test, naive_stack_test,
]
for stage in stages:
    pipe.add_stage(stage)
In [29]:
pipe.initialize(dict(input=catalog_file), dict(output_dir='.', log_dir='.', resume=False), None)
Out[29]:
(({'flow_modeler': <Job flow_modeler>,
   'flow_creator_test': <Job flow_creator_test>,
   'lsst_error_model_test': <Job lsst_error_model_test>,
   'col_remapper_test': <Job col_remapper_test>,
   'table_conv_test': <Job table_conv_test>,
   'flow_creator_train': <Job flow_creator_train>,
   'lsst_error_model_train': <Job lsst_error_model_train>,
   'inv_redshift': <Job inv_redshift>,
   'line_confusion': <Job line_confusion>,
   'quantity_cut': <Job quantity_cut>,
   'col_remapper_train': <Job col_remapper_train>,
   'table_conv_train': <Job table_conv_train>,
   'inform_FZBoost': <Job inform_FZBoost>,
   'test_FZBoost': <Job test_FZBoost>,
   'inform_knn': <Job inform_knn>,
   'estimate_knn': <Job estimate_knn>,
   'inform_bpz': <Job inform_bpz>,
   'estimate_bpz': <Job estimate_bpz>,
   'naive_stack_test': <Job naive_stack_test>,
   'point_estimate_test': <Job point_estimate_test>},
  [<rail.creation.engines.flowEngine.FlowModeler at 0x7fa00979ef40>,
   <rail.creation.engines.flowEngine.FlowCreator at 0x7f9fdc7e2af0>,
   LSSTErrorModel parameters:
   
   Model for bands: mag_u_lsst, mag_g_lsst, mag_r_lsst, mag_i_lsst, mag_z_lsst, mag_y_lsst
   
   Using error type point
   Exposure time = 30.0 s
   Number of years of observations = 10.0
   Mean visits per year per band:
      mag_u_lsst: 5.6, mag_g_lsst: 8.0, mag_r_lsst: 18.4, mag_i_lsst: 18.4, mag_z_lsst: 16.0, mag_y_lsst: 16.0
   Airmass = 1.2
   Irreducible system error = 0.005
   Magnitudes dimmer than 30.0 are set to nan
   gamma for each band:
      mag_u_lsst: 0.038, mag_g_lsst: 0.039, mag_r_lsst: 0.039, mag_i_lsst: 0.039, mag_z_lsst: 0.039, mag_y_lsst: 0.039
   
   The coadded 5-sigma limiting magnitudes are:
   mag_u_lsst: 26.04, mag_g_lsst: 27.29, mag_r_lsst: 27.31, mag_i_lsst: 26.87, mag_z_lsst: 26.23, mag_y_lsst: 25.30
   
   The following single-visit 5-sigma limiting magnitudes are
   calculated using the parameters that follow them:
      mag_u_lsst: 23.83, mag_g_lsst: 24.90, mag_r_lsst: 24.47, mag_i_lsst: 24.03, mag_z_lsst: 23.46, mag_y_lsst: 22.53
   Cm for each band:
      mag_u_lsst: 23.09, mag_g_lsst: 24.42, mag_r_lsst: 24.44, mag_i_lsst: 24.32, mag_z_lsst: 24.16, mag_y_lsst: 23.73
   Median zenith sky brightness in each band:
      mag_u_lsst: 22.99, mag_g_lsst: 22.26, mag_r_lsst: 21.2, mag_i_lsst: 20.48, mag_z_lsst: 19.6, mag_y_lsst: 18.61
   Median zenith seeing FWHM (in arcseconds) for each band:
      mag_u_lsst: 0.81, mag_g_lsst: 0.77, mag_r_lsst: 0.73, mag_i_lsst: 0.71, mag_z_lsst: 0.69, mag_y_lsst: 0.68
   Extinction coefficient for each band:
      mag_u_lsst: 0.491, mag_g_lsst: 0.213, mag_r_lsst: 0.126, mag_i_lsst: 0.096, mag_z_lsst: 0.069, mag_y_lsst: 0.17,
   Stage that applies remaps the following column names in a pandas DataFrame:
   f{str(self.config.columns)},
   <rail.core.utilStages.TableConverter at 0x7f9fdcaffe50>,
   <rail.creation.engines.flowEngine.FlowCreator at 0x7f9fdc26e280>,
   LSSTErrorModel parameters:
   
   Model for bands: mag_u_lsst, mag_g_lsst, mag_r_lsst, mag_i_lsst, mag_z_lsst, mag_y_lsst
   
   Using error type point
   Exposure time = 30.0 s
   Number of years of observations = 10.0
   Mean visits per year per band:
      mag_u_lsst: 5.6, mag_g_lsst: 8.0, mag_r_lsst: 18.4, mag_i_lsst: 18.4, mag_z_lsst: 16.0, mag_y_lsst: 16.0
   Airmass = 1.2
   Irreducible system error = 0.005
   Magnitudes dimmer than 30.0 are set to nan
   gamma for each band:
      mag_u_lsst: 0.038, mag_g_lsst: 0.039, mag_r_lsst: 0.039, mag_i_lsst: 0.039, mag_z_lsst: 0.039, mag_y_lsst: 0.039
   
   The coadded 5-sigma limiting magnitudes are:
   mag_u_lsst: 26.04, mag_g_lsst: 27.29, mag_r_lsst: 27.31, mag_i_lsst: 26.87, mag_z_lsst: 26.23, mag_y_lsst: 25.30
   
   The following single-visit 5-sigma limiting magnitudes are
   calculated using the parameters that follow them:
      mag_u_lsst: 23.83, mag_g_lsst: 24.90, mag_r_lsst: 24.47, mag_i_lsst: 24.03, mag_z_lsst: 23.46, mag_y_lsst: 22.53
   Cm for each band:
      mag_u_lsst: 23.09, mag_g_lsst: 24.42, mag_r_lsst: 24.44, mag_i_lsst: 24.32, mag_z_lsst: 24.16, mag_y_lsst: 23.73
   Median zenith sky brightness in each band:
      mag_u_lsst: 22.99, mag_g_lsst: 22.26, mag_r_lsst: 21.2, mag_i_lsst: 20.48, mag_z_lsst: 19.6, mag_y_lsst: 18.61
   Median zenith seeing FWHM (in arcseconds) for each band:
      mag_u_lsst: 0.81, mag_g_lsst: 0.77, mag_r_lsst: 0.73, mag_i_lsst: 0.71, mag_z_lsst: 0.69, mag_y_lsst: 0.68
   Extinction coefficient for each band:
      mag_u_lsst: 0.491, mag_g_lsst: 0.213, mag_r_lsst: 0.126, mag_i_lsst: 0.096, mag_z_lsst: 0.069, mag_y_lsst: 0.17,
   <rail.creation.degradation.spectroscopic_degraders.InvRedshiftIncompleteness at 0x7f9fdc1f34c0>,
   <rail.creation.degradation.spectroscopic_degraders.LineConfusion at 0x7f9fdc220a90>,
   Degrader that applies the following cuts to a pandas DataFrame:
   {column: (min, max), ...}
   {'mag_i_lsst': (-inf, 25.0)},
   Stage that applies remaps the following column names in a pandas DataFrame:
   f{str(self.config.columns)},
   <rail.core.utilStages.TableConverter at 0x7f9fdc220460>,
   <rail.estimation.algos.flexzboost.Inform_FZBoost at 0x7f9fecf61040>,
   <rail.estimation.algos.flexzboost.FZBoost at 0x7f9fdc8f8fa0>,
   <rail.estimation.algos.knnpz.Inform_KNearNeighPDF at 0x7f9fecf3c2b0>,
   <rail.estimation.algos.knnpz.KNearNeighPDF at 0x7f9fdc21bc70>,
   <rail.estimation.algos.bpz_lite.Inform_BPZ_lite at 0x7f9fecf3ce80>,
   <rail.estimation.algos.bpz_lite.BPZ_lite at 0x7f9fb9c49d60>,
   <rail.estimation.algos.naiveStack.NaiveStack at 0x7f9fdc722430>,
   <rail.estimation.algos.pointEstimateHist.PointEstimateHist at 0x7f9fdc722670>]),
 {'output_dir': '.', 'log_dir': '.', 'resume': False})
In [30]:
pipe.save('tmp_goldenspike.yml')

Read back the pipeline and run it¶

In [31]:
pr = ceci.Pipeline.read('tmp_goldenspike.yml')
In [32]:
pr.run()
Executing flow_modeler
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.engines.flowEngine.FlowModeler   --input=/home/runner/work/RAIL/RAIL/examples/goldenspike/data/base_catalog.pq   --name=flow_modeler   --config=tmp_goldenspike_config.yml   --model=.//home/runner/work/RAIL/RAIL/examples/goldenspike/data/trained_flow.pkl 
Output writing to ./flow_modeler.out

Job flow_modeler has completed successfully!

Executing flow_creator_test
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.engines.flowEngine.FlowCreator   --model=.//home/runner/work/RAIL/RAIL/examples/goldenspike/data/trained_flow.pkl   --name=flow_creator_test   --config=tmp_goldenspike_config.yml   --output=./output_flow_creator_test.pq 
Output writing to ./flow_creator_test.out

Job flow_creator_test has completed successfully!

Executing lsst_error_model_test
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.degradation.lsst_error_model.LSSTErrorModel   --input=./output_flow_creator_test.pq   --name=lsst_error_model_test   --config=tmp_goldenspike_config.yml   --output=./output_lsst_error_model_test.pq 
Output writing to ./lsst_error_model_test.out

Job lsst_error_model_test has completed successfully!

Executing col_remapper_test
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.core.utilStages.ColumnMapper   --input=./output_lsst_error_model_test.pq   --name=col_remapper_test   --config=tmp_goldenspike_config.yml   --output=./output_col_remapper_test.pq 
Output writing to ./col_remapper_test.out

Job col_remapper_test has completed successfully!

Executing table_conv_test
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.core.utilStages.TableConverter   --input=./output_col_remapper_test.pq   --name=table_conv_test   --config=tmp_goldenspike_config.yml   --output=./output_table_conv_test.hdf5 
Output writing to ./table_conv_test.out

Job table_conv_test has completed successfully!

Executing flow_creator_train
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.engines.flowEngine.FlowCreator   --model=.//home/runner/work/RAIL/RAIL/examples/goldenspike/data/trained_flow.pkl   --name=flow_creator_train   --config=tmp_goldenspike_config.yml   --output=./output_flow_creator_train.pq 
Output writing to ./flow_creator_train.out

Job flow_creator_train has completed successfully!

Executing lsst_error_model_train
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.degradation.lsst_error_model.LSSTErrorModel   --input=./output_flow_creator_train.pq   --name=lsst_error_model_train   --config=tmp_goldenspike_config.yml   --output=./output_lsst_error_model_train.pq 
Output writing to ./lsst_error_model_train.out

Job lsst_error_model_train has completed successfully!

Executing inv_redshift
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.degradation.spectroscopic_degraders.InvRedshiftIncompleteness   --input=./output_lsst_error_model_train.pq   --name=inv_redshift   --config=tmp_goldenspike_config.yml   --output=./output_inv_redshift.pq 
Output writing to ./inv_redshift.out

Job inv_redshift has completed successfully!

Executing line_confusion
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.degradation.spectroscopic_degraders.LineConfusion   --input=./output_inv_redshift.pq   --name=line_confusion   --config=tmp_goldenspike_config.yml   --output=./output_line_confusion.pq 
Output writing to ./line_confusion.out

[17:08:13] WARNING: ../src/learner.cc:767: 
Parameters: { "silent" } are not used.

Job line_confusion has completed successfully!

Executing quantity_cut
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.creation.degradation.quantityCut.QuantityCut   --input=./output_line_confusion.pq   --name=quantity_cut   --config=tmp_goldenspike_config.yml   --output=./output_quantity_cut.pq 
Output writing to ./quantity_cut.out

Job quantity_cut has completed successfully!

Executing col_remapper_train
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.core.utilStages.ColumnMapper   --input=./output_quantity_cut.pq   --name=col_remapper_train   --config=tmp_goldenspike_config.yml   --output=./output_col_remapper_train.pq 
Output writing to ./col_remapper_train.out

Job col_remapper_train has completed successfully!

Executing table_conv_train
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.core.utilStages.TableConverter   --input=./output_col_remapper_train.pq   --name=table_conv_train   --config=tmp_goldenspike_config.yml   --output=./output_table_conv_train.hdf5 
Output writing to ./table_conv_train.out

Job table_conv_train has completed successfully!

Executing inform_FZBoost
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.flexzboost.Inform_FZBoost   --input=./output_table_conv_train.hdf5   --name=inform_FZBoost   --config=tmp_goldenspike_config.yml   --model=./fzboost.pkl 
Output writing to ./inform_FZBoost.out

Job inform_FZBoost has completed successfully!

Executing test_FZBoost
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.flexzboost.FZBoost   --model=./fzboost.pkl   --input=./output_table_conv_test.hdf5   --name=test_FZBoost   --config=tmp_goldenspike_config.yml   --output=./output_test_FZBoost.hdf5 
Output writing to ./test_FZBoost.out

Job test_FZBoost has completed successfully!

Executing inform_knn
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.knnpz.Inform_KNearNeighPDF   --input=./output_table_conv_train.hdf5   --name=inform_knn   --config=tmp_goldenspike_config.yml   --model=./knnpz.pkl 
Output writing to ./inform_knn.out

Job inform_knn has completed successfully!

Executing estimate_knn
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.knnpz.KNearNeighPDF   --model=./knnpz.pkl   --input=./output_table_conv_test.hdf5   --name=estimate_knn   --config=tmp_goldenspike_config.yml   --output=./output_estimate_knn.hdf5 
Output writing to ./estimate_knn.out

Job estimate_knn has completed successfully!

Executing inform_bpz
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.bpz_lite.Inform_BPZ_lite   --input=./output_table_conv_train.hdf5   --name=inform_bpz   --config=tmp_goldenspike_config.yml   --model=./bpz.pkl 
Output writing to ./inform_bpz.out

Job inform_bpz has completed successfully!

Executing estimate_bpz
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.bpz_lite.BPZ_lite   --model=./bpz.pkl   --input=./output_table_conv_test.hdf5   --name=estimate_bpz   --config=tmp_goldenspike_config.yml   --output=./output_estimate_bpz.hdf5 
Output writing to ./estimate_bpz.out

Job estimate_bpz has completed successfully!

Executing naive_stack_test
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.naiveStack.NaiveStack   --input=./output_estimate_bpz.hdf5   --name=naive_stack_test   --config=tmp_goldenspike_config.yml   --output=./output_naive_stack_test.hdf5   --single_NZ=./single_NZ_naive_stack_test.hdf5 
Output writing to ./naive_stack_test.out

Job naive_stack_test has completed successfully!

Executing point_estimate_test
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.pointEstimateHist.PointEstimateHist   --input=./output_estimate_bpz.hdf5   --name=point_estimate_test   --config=tmp_goldenspike_config.yml   --output=./output_point_estimate_test.hdf5   --single_NZ=./single_NZ_point_estimate_test.hdf5 
Output writing to ./point_estimate_test.out

Job point_estimate_test has completed successfully!
Out[32]:
0
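
The order in which the jobs were executed above is consistent with a topological sort of the stage dependency graph: a stage runs only after every stage whose output it consumes. The sketch below shows the idea with the standard-library graphlib module (Python 3.9+). The dependency table is a hand-written subset read off the --input and --model arguments in the log, restricted to the BPZ branch; it is an illustration, not a description of how ceci is implemented.

```python
from graphlib import TopologicalSorter

# stage name -> set of stages whose outputs it reads (hand-written, BPZ branch only)
dependencies = {
    "flow_modeler": set(),
    # test catalog
    "flow_creator_test": {"flow_modeler"},
    "lsst_error_model_test": {"flow_creator_test"},
    "col_remapper_test": {"lsst_error_model_test"},
    "table_conv_test": {"col_remapper_test"},
    # training catalog
    "flow_creator_train": {"flow_modeler"},
    "lsst_error_model_train": {"flow_creator_train"},
    "inv_redshift": {"lsst_error_model_train"},
    "line_confusion": {"inv_redshift"},
    "quantity_cut": {"line_confusion"},
    "col_remapper_train": {"quantity_cut"},
    "table_conv_train": {"col_remapper_train"},
    # inform, estimate, summarize
    "inform_bpz": {"table_conv_train"},
    "estimate_bpz": {"inform_bpz", "table_conv_test"},
    "naive_stack_test": {"estimate_bpz"},
    "point_estimate_test": {"estimate_bpz"},
}

# static_order() yields each stage only after all of its dependencies
order = list(TopologicalSorter(dependencies).static_order())
position = {name: i for i, name in enumerate(order)}

for i, name in enumerate(order, start=1):
    print(f"{i:2d}. {name}")
```

Several valid orders exist (the test and training branches are independent of each other), so the printed order need not match the log line for line; only the dependency constraints are guaranteed.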

Clean up:¶

Finally, you'll notice that we've written a large number of temporary files in the course of running this demo. To delete them and clean up the directory, just run the cleanup.sh script in this directory.
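
The contents of cleanup.sh are not shown in this notebook. As a rough Python alternative, the sketch below lists (and optionally deletes) files whose names match patterns seen in the logs above, such as output_*.pq, output_*.hdf5, single_NZ_*.hdf5, and *.out. The pattern list is an assumption read off those logs, not a copy of the script, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed patterns, based only on file names that appear in the logs above.
TEMP_PATTERNS = [
    "inprogress_*.hdf5",
    "output_*.pq",
    "output_*.hdf5",
    "single_NZ_*.hdf5",
    "*.out",
    "tmp_goldenspike*.yml",
]


def find_temp_files(directory="."):
    """Return the sorted list of files in `directory` matching TEMP_PATTERNS."""
    root = Path(directory)
    found = set()
    for pattern in TEMP_PATTERNS:
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


def clean_up(directory=".", dry_run=True):
    """List matching files; delete them only when dry_run is False."""
    files = find_temp_files(directory)
    for path in files:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return files
```

Calling `clean_up(".")` only prints what would be removed; pass `dry_run=False` after checking the list. The `*.out` pattern is broad, so review the dry-run output before deleting anything.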
