author: Eric Charles
last run successfully: Dec 12, 2022
This notebook shows how to build, save, load, and run a RAIL pipeline interactively.
import os
import numpy as np
import ceci
import rail
from rail.core.stage import RailStage
from rail.creation.degradation import LSSTErrorModel, InvRedshiftIncompleteness, LineConfusion, QuantityCut
from rail.creation.engines.flowEngine import FlowCreator, FlowPosterior
from rail.core.data import TableHandle
from rail.core.utilStages import ColumnMapper, TableConverter
We'll start by setting up the RAIL data store. RAIL uses ceci, which is designed for pipelines rather than interactive notebooks; the data store works around that and lets us use data interactively.
When working interactively, we want to allow overwriting data in the RAIL data store to avoid errors if we re-run cells.
See the rail/examples/goldenspike/goldenspike.ipynb example notebook for more details on the data store.
DS = RailStage.data_store
DS.__class__.allow_overwrite = True
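To see why the overwrite flag matters when cells are re-run, here is a minimal sketch using a toy data store. ToyDataStore and its put method are hypothetical stand-ins written for this illustration; they are not the RAIL data store implementation, and the real class may behave differently in detail.

```python
class ToyDataStore(dict):
    """Hypothetical stand-in for a data store; not the RAIL implementation."""

    # Class-level flag: changing it on the class affects every instance.
    allow_overwrite = False

    def put(self, key, value):
        """Store value under key, refusing to replace an existing entry
        unless overwriting has been enabled."""
        if key in self and not self.allow_overwrite:
            raise KeyError(f"'{key}' is already in the data store")
        self[key] = value
        return value


store = ToyDataStore()
store.put("model", "first version")

# Re-running a cell would try to insert the same key a second time.
try:
    store.put("model", "second version")
    overwrite_blocked = False
except KeyError:
    overwrite_blocked = True

# Same pattern as DS.__class__.allow_overwrite = True in the notebook:
# the flag is set on the class, not on the instance.
store.__class__.allow_overwrite = True
store.put("model", "second version")
```

With the flag left at its default, the second put raises; once the flag is enabled on the class, the same call replaces the stored value.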
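The example pipeline builds some of the RAIL creation functionality into a pipeline.
Here we define the location of the pretrained flow file, the bands we will generate data for, the column names for the magnitudes and their errors, and the redshift grid used for posterior estimation.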
from rail.core.utils import RAILDIR
flow_file = os.path.join(RAILDIR, 'rail/examples/goldenspike/data/pretrained_flow.pkl')
bands = ['u','g','r','i','z','y']
band_dict = {band:f'mag_{band}_lsst' for band in bands}
rename_dict = {f'mag_{band}_lsst_err':f'mag_err_{band}_lsst' for band in bands}
post_grid = [float(x) for x in np.linspace(0., 5, 21)]
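For reference, the comprehensions above can be checked in isolation. This sketch repeats them without RAIL and rebuilds the grid in plain Python, on the assumption that np.linspace(0., 5, 21) includes both endpoints (its default), which gives a step of 0.25.

```python
bands = ['u', 'g', 'r', 'i', 'z', 'y']

# Maps each band to the magnitude column name used by the error model.
band_dict = {band: f'mag_{band}_lsst' for band in bands}

# Maps the error column names produced upstream to the names wanted downstream.
rename_dict = {f'mag_{band}_lsst_err': f'mag_err_{band}_lsst' for band in bands}

# Plain-Python equivalent of the 21-point grid from 0 to 5 (step 5 / 20 = 0.25).
post_grid = [5.0 * i / 20 for i in range(21)]

print(band_dict['u'])
print(rename_dict['mag_u_lsst_err'])
print(post_grid[:3], post_grid[-1])
```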
The RailStage base class defines the make_stage "classmethod" function, which allows us to make a stage of that particular type in a general way.
Note that we are passing in the configuration parameters to each pipeline stage as keyword arguments.
The names of the parameters will depend on the stage type.
A couple of things are important:
In ceci, stage names default to the name of the class (e.g., FlowCreator or LSSTErrorModel); this would be problematic if you wanted two stages of the same type in a given pipeline, so be sure to assign each stage its own name.
flow_engine_test = FlowCreator.make_stage(name='flow_engine_test',
                                          model=flow_file, n_samples=50)
lsst_error_model_test = LSSTErrorModel.make_stage(name='lsst_error_model_test',
                                                  bandNames=band_dict)
col_remapper_test = ColumnMapper.make_stage(name='col_remapper_test', hdf5_groupname='',
                                            columns=rename_dict)
flow_post_test = FlowPosterior.make_stage(name='flow_post_test',
                                          column='redshift', flow=flow_file,
                                          grid=post_grid)
table_conv_test = TableConverter.make_stage(name='table_conv_test', output_format='numpyDict',
                                            seed=12345)
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Inserting handle into data store. model: /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/goldenspike/data/pretrained_flow.pkl, flow_engine_test
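The naming issue can be illustrated with a toy model of stages that default their name to the class name. ToyStage, ToyMapper, and register are hypothetical names made up for this sketch; they only mimic the make_stage pattern and say nothing about how ceci or RAIL handle duplicates internally.

```python
class ToyStage:
    """Hypothetical stage: the name defaults to the class name."""

    def __init__(self, name=None, **config):
        self.name = name if name is not None else type(self).__name__
        self.config = config

    @classmethod
    def make_stage(cls, **kwargs):
        # Configuration parameters are passed as keyword arguments.
        return cls(**kwargs)


class ToyMapper(ToyStage):
    """Hypothetical concrete stage type."""


def register(stages):
    """Collect stages in a dict keyed by name, rejecting duplicate names."""
    registry = {}
    for stage in stages:
        if stage.name in registry:
            raise ValueError(f"duplicate stage name: {stage.name}")
        registry[stage.name] = stage
    return registry


# Two stages of the same type without explicit names both become 'ToyMapper'.
unnamed = [ToyMapper.make_stage(columns={'a': 'b'}),
           ToyMapper.make_stage(columns={'c': 'd'})]
try:
    register(unnamed)
    collided = False
except ValueError:
    collided = True

# Giving each stage its own name keeps them distinguishable.
named = [ToyMapper.make_stage(name='mapper_1', columns={'a': 'b'}),
         ToyMapper.make_stage(name='mapper_2', columns={'c': 'd'})]
registry = register(named)
```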
Here we make an empty interactive pipeline (interactive in the sense that it will be run locally, rather than using the batch submission mechanisms built into ceci), and add the stages to that pipeline.
pipe = ceci.Pipeline.interactive()
stages = [flow_engine_test, lsst_error_model_test, col_remapper_test, table_conv_test]
for stage in stages:
    pipe.add_stage(stage)
The pipeline provides some functions that you can use to figure out what it is doing.
# Get the names of the stages
pipe.stage_names
['flow_engine_test', 'lsst_error_model_test', 'col_remapper_test', 'table_conv_test']
# Get the configuration of a particular stage
pipe.flow_engine_test.config
StageConfig{output_mode:default,n_samples:50,seed:12345,name:flow_engine_test,model:/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/goldenspike/data/pretrained_flow.pkl,config:None,aliases:{'output': 'output_flow_engine_test'},}
# Get the list of outputs 'tags'
# These are how the stage thinks of the outputs, as a list of names associated with DataHandle types.
pipe.flow_engine_test.outputs
[('output', rail.core.data.PqHandle)]
# Get the list of outputs 'aliased tags'
# These are how the pipeline thinks of the outputs, as a unique key that points to a particular file
pipe.flow_engine_test._outputs
{'output_flow_engine_test': 'output_flow_engine_test.pq'}
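The relationship between tags, aliased tags, and files can be sketched with two plain dictionaries, using the values printed above. The resolve_output helper is a hypothetical function written for this illustration, not part of ceci or RAIL.

```python
# Stage-level view: the tag 'output' is aliased to a pipeline-wide unique key
# (taken from the 'aliases' entry of the stage configuration shown above).
aliases = {'output': 'output_flow_engine_test'}

# Pipeline-level view: each aliased tag points to a particular file.
aliased_outputs = {'output_flow_engine_test': 'output_flow_engine_test.pq'}


def resolve_output(tag):
    """Hypothetical helper: follow tag -> aliased tag -> file name."""
    return aliased_outputs[aliases[tag]]


print(resolve_output('output'))
```

Because every stage calls its output simply output, the alias is what keeps two stages from pointing at the same file.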
We can use the RailStage.connect_input function to connect one stage to another.
By default, this will connect the output data product called output of one stage to the input of the stage on which connect_input is called.
lsst_error_model_test.connect_input(flow_engine_test)
col_remapper_test.connect_input(lsst_error_model_test)
#flow_post_test.connect_input(col_remapper_test, inputTag='input')
table_conv_test.connect_input(col_remapper_test)
Inserting handle into data store. output_flow_engine_test: inprogress_output_flow_engine_test.pq, flow_engine_test
Inserting handle into data store. output_lsst_error_model_test: inprogress_output_lsst_error_model_test.pq, lsst_error_model_test
Inserting handle into data store. output_col_remapper_test: inprogress_output_col_remapper_test.pq, col_remapper_test
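The connections above form a simple chain of dependencies. As a rough sketch of what that implies for execution order, the snippet below records each connection as a (downstream, upstream) pair and sorts the stages with graphlib from the standard library (Python 3.9 or later). This only illustrates the dependency structure; it is not how ceci schedules stages internally.

```python
from graphlib import TopologicalSorter

# Each pair mirrors one connect_input call: (stage receiving input, stage providing it).
connections = [
    ('lsst_error_model_test', 'flow_engine_test'),
    ('col_remapper_test', 'lsst_error_model_test'),
    ('table_conv_test', 'col_remapper_test'),
]

# TopologicalSorter expects a mapping from each node to its predecessors.
graph = {}
for downstream, upstream in connections:
    graph.setdefault(downstream, set()).add(upstream)

# A valid run order places every stage after the stage it depends on.
run_order = list(TopologicalSorter(graph).static_order())
print(run_order)
```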
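Initializing the pipeline will do a few things: attach the global pipeline inputs that were not set by the connections above (here, the pretrained flow file passed as model), set the output and log directories, and set whether the pipeline should resume a previous run (here resume=False).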
pipe.initialize(dict(model=flow_file), dict(output_dir='.', log_dir='.', resume=False), None)
(({'flow_engine_test': <Job flow_engine_test>, 'lsst_error_model_test': <Job lsst_error_model_test>, 'col_remapper_test': <Job col_remapper_test>, 'table_conv_test': <Job table_conv_test>},
[<rail.creation.engines.flowEngine.FlowCreator at 0x7f9d68a80a00>,
LSSTErrorModel parameters:
Model for bands: mag_u_lsst, mag_g_lsst, mag_r_lsst, mag_i_lsst, mag_z_lsst, mag_y_lsst
Using error type point
Exposure time = 30.0 s
Number of years of observations = 10.0
Mean visits per year per band: mag_u_lsst: 5.6, mag_g_lsst: 8.0, mag_r_lsst: 18.4, mag_i_lsst: 18.4, mag_z_lsst: 16.0, mag_y_lsst: 16.0
Airmass = 1.2
Irreducible system error = 0.005
Magnitudes dimmer than 30.0 are set to nan
gamma for each band: mag_u_lsst: 0.038, mag_g_lsst: 0.039, mag_r_lsst: 0.039, mag_i_lsst: 0.039, mag_z_lsst: 0.039, mag_y_lsst: 0.039
The coadded 5-sigma limiting magnitudes are: mag_u_lsst: 26.04, mag_g_lsst: 27.29, mag_r_lsst: 27.31, mag_i_lsst: 26.87, mag_z_lsst: 26.23, mag_y_lsst: 25.30
The following single-visit 5-sigma limiting magnitudes are calculated using the parameters that follow them: mag_u_lsst: 23.83, mag_g_lsst: 24.90, mag_r_lsst: 24.47, mag_i_lsst: 24.03, mag_z_lsst: 23.46, mag_y_lsst: 22.53
Cm for each band: mag_u_lsst: 23.09, mag_g_lsst: 24.42, mag_r_lsst: 24.44, mag_i_lsst: 24.32, mag_z_lsst: 24.16, mag_y_lsst: 23.73
Median zenith sky brightness in each band: mag_u_lsst: 22.99, mag_g_lsst: 22.26, mag_r_lsst: 21.2, mag_i_lsst: 20.48, mag_z_lsst: 19.6, mag_y_lsst: 18.61
Median zenith seeing FWHM (in arcseconds) for each band: mag_u_lsst: 0.81, mag_g_lsst: 0.77, mag_r_lsst: 0.73, mag_i_lsst: 0.71, mag_z_lsst: 0.69, mag_y_lsst: 0.68
Extinction coefficient for each band: mag_u_lsst: 0.491, mag_g_lsst: 0.213, mag_r_lsst: 0.126, mag_i_lsst: 0.096, mag_z_lsst: 0.069, mag_y_lsst: 0.17,
Stage that applies remaps the following column names in a pandas DataFrame: f{str(self.config.columns)},
<rail.core.utilStages.TableConverter at 0x7f9d649f40d0>]),
{'output_dir': '.', 'log_dir': '.', 'resume': False})
This will actually write two files, because this is what ceci wants: pipe_saved.yml and pipe_saved_config.yml.
pipe.save('pipe_saved.yml')
pr = ceci.Pipeline.read('pipe_saved.yml')
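The second file can be identified from the commands logged when the pipeline runs, which pass --config=pipe_saved_config.yml. The helper below reproduces that naming pattern; config_path_for is a hypothetical function written for this sketch, and the assumption that the configuration file is always named by inserting _config before the extension is inferred from this one example rather than taken from the ceci documentation.

```python
from pathlib import Path


def config_path_for(pipeline_path):
    """Hypothetical helper: 'pipe_saved.yml' -> 'pipe_saved_config.yml'.

    Assumes the stage configuration file sits next to the pipeline file and
    has '_config' inserted before the extension.
    """
    path = Path(pipeline_path)
    return path.with_name(f"{path.stem}_config{path.suffix}")


print(config_path_for('pipe_saved.yml'))
```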
This will actually launch Unix processes to run each stage of the pipeline individually; you can see the commands that are being executed in each case.
pr.run()
Executing flow_engine_test
Command is: OMP_NUM_THREADS=1 python3 -m ceci rail.creation.engines.flowEngine.FlowCreator --model=/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/rail/examples/goldenspike/data/pretrained_flow.pkl --name=flow_engine_test --config=pipe_saved_config.yml --output=./output_flow_engine_test.pq
Output writing to ./flow_engine_test.out
Job flow_engine_test has completed successfully!
Executing lsst_error_model_test
Command is: OMP_NUM_THREADS=1 python3 -m ceci rail.creation.degradation.lsst_error_model.LSSTErrorModel --input=./output_flow_engine_test.pq --name=lsst_error_model_test --config=pipe_saved_config.yml --output=./output_lsst_error_model_test.pq
Output writing to ./lsst_error_model_test.out
Job lsst_error_model_test has completed successfully!
Executing col_remapper_test
Command is: OMP_NUM_THREADS=1 python3 -m ceci rail.core.utilStages.ColumnMapper --input=./output_lsst_error_model_test.pq --name=col_remapper_test --config=pipe_saved_config.yml --output=./output_col_remapper_test.pq
Output writing to ./col_remapper_test.out
Job col_remapper_test has completed successfully!
Executing table_conv_test
Command is: OMP_NUM_THREADS=1 python3 -m ceci rail.core.utilStages.TableConverter --input=./output_col_remapper_test.pq --name=table_conv_test --config=pipe_saved_config.yml --output=./output_table_conv_test.hdf5
Output writing to ./table_conv_test.out
Job table_conv_test has completed successfully!
0
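Each logged command has the shape OMP_NUM_THREADS=1 python3 -m ..., that is, a separate process started with an environment variable set. A minimal sketch of that pattern with the standard library is shown below; the child command here is a harmless one-liner chosen for illustration, not a ceci invocation, and this is not necessarily how ceci launches its jobs internally.

```python
import os
import subprocess
import sys

# Copy the current environment and set the variable seen in the logged commands.
env = dict(os.environ, OMP_NUM_THREADS="1")

# Stand-in child process: it just reports the variable it was started with.
command = [sys.executable, "-c",
           "import os; print(os.environ['OMP_NUM_THREADS'])"]

result = subprocess.run(command, env=env, capture_output=True, text=True, check=False)

# A return code of 0 is the usual Unix convention for success.
status = result.returncode
seen_by_child = result.stdout.strip()
print(status, seen_by_child)
```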