Getting Started¶
Setting up directories¶
To run an MLy-Pipeline search we need a directory structure that will be used by the features of the pipeline. To make the creation of that directory structure easy, there is a command that automates it for us. Assuming that there is a local MLy-Pipeline installation (located at ./mlyPipeline) we can create the directory structure by simply typing:
python ./mlyPipeline/initialisation.py -p ./mysearch
Where instead of mysearch you can use a more appropriate name that fits your needs. The command above will create the directory ./mysearch along with the subdirectories below (these are their default names):
triggerplot_directory
trigger_directory
output_directory
masterDirectory
bufferDirectory
falseAlarmRates
efficiencies
You can change the names of these directories by adding their new names as extra parameters in the command. For example, if we want to have a custom name for masterDirectory we can do:
python ./mlyPipeline/initialisation.py -p ./mysearch --masterDirectory customName
This applies to all the other directories.
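For example, a hypothetical invocation that renames two directories at once could look like this (the flag names are assumed to mirror the default directory names listed above, following the same pattern as --masterDirectory):
python ./mlyPipeline/initialisation.py -p ./mysearch --masterDirectory customName --bufferDirectory customBuffer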
In addition, the command creates a script called runall.sh that runs the search, and a config file in .json format containing the parameters used by the search. It is up to the user to edit those parameters to conduct the search the way they want.
Running the search under different circumstances¶
If you just want to run the search as it would be run in low latency (in real time), the default parameters are already set. If instead you want to run the search on MDC data, which means you need a fixed background, there is a light version of the same command. When we run on MDC we do not need some of the directories, and some configuration parameters need to change to disable specific functionalities. The command below will create a version of the search that performs neither the continuous FAR estimation nor the efficiency tests. It also points the search to the MDC frames and channels.
python ./mlyPipeline/initialisation.py -p ./mysearch --mode MDC
Note
You can use the same “mode” to initialise a search directory for a mock data challenge that has different frames and channels. The only thing you need to do is go to the config file (see below) and edit the frames and channels parameters.
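For example, a minimal sketch of doing that edit programmatically in Python (the key names prefixes and channels follow the Configuration File section below; the paths are placeholders for your own mock data challenge):

import json

# Load the search configuration created by initialisation.py.
with open("./mysearch/config.json") as f:
    config = json.load(f)

# Placeholder frame prefixes; replace them with those of your mock data challenge.
config["prefixes"] = {
    "0": "/path/to/H-H1_MyMDC_llhoft-",
    "1": "/path/to/L-L1_MyMDC_llhoft-",
    "2": "/path/to/V-V1_MyMDC_llhoft-",
}
config["channels"] = "MDC"  # "MDC" for MDC data, "NOISE" for O3-replay

with open("./mysearch/config.json", "w") as f:
    json.dump(config, f, indent=4)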
The runall.sh script¶
The runall.sh is the only thing that you need to run from inside the ./mysearch (or however you named it) directory for the search to commence. Processing a second of data during a search, doing inference, and issuing a possible alert takes more than a second. For that reason we split our search into many scripts to avoid queuing delays. Hence, there is only one parameter that can be adjusted inside the runall script, and that is the STEP parameter. This parameter breaks the search into STEP non-overlapping scripts. The search as it is takes around 2 seconds, and at most (rarely) 3, to process a second of data. To be safe we set STEP=4 in case there are any delays. We suggest that you keep that parameter fixed unless there is strong evidence that the latency of individual scripts is bigger than STEP.
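To illustrate the idea with a toy sketch (this is not the actual runall.sh logic, just the scheduling principle): with STEP=4 the incoming GPS seconds are distributed round-robin over four identical, non-overlapping search scripts, so each script only has to finish one second of processing every four seconds.

STEP = 4  # the STEP parameter from runall.sh

# Toy round-robin assignment: each of the STEP scripts handles every STEP-th second,
# so a per-second processing time of up to ~STEP seconds causes no backlog.
for gps_second in range(1264316418, 1264316426):
    script_index = gps_second % STEP
    print(f"GPS second {gps_second} -> search script {script_index}")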
Inside the runall.sh script you will see three main scripts being called: search.py, manager.py and continuesFAR.py.
search.py runs the online search, saves the outputs and issues events when the output is bigger than the threshold that defines detections. The event creation is a parallel process that sends an alert to GraceDB and creates an event directory with the name of the GraceDB id. Inside this directory it also creates plots of the data fed to the model and the skymap.

manager.py runs every 5 minutes. It organises all outputs into pandas data frames (saved in pickle format) and at fixed intervals it runs efficiency tests. It also creates plots and clears files that are no longer needed.

continuesFAR.py is called with two different parameters and does two things in parallel.

continuesFAR.py --mode generation takes the data of the last hour saved in the masterDirectory and generates condor jobs. Each job creates a specific amount of timeshifted versions of these data and saves them in a temporary file in the scratch directory (falseAlarmRates/temp), ready to be used for background testing.

continuesFAR.py --mode inference does inference on the generated data using the available GPUs or the GPUs specified in the selectedGPUs parameter. This script will load any time-lag data available and return a pandas data frame with the results. The assembly of those files is done by the manager script, as sketched below.
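As a rough sketch of the assembly step performed by the manager (the file layout and names here are hypothetical; the actual logic lives in manager.py):

import glob
import pandas as pd

# Hypothetical layout: the search scripts write small pickled DataFrames of outputs.
output_files = sorted(glob.glob("./mysearch/output_directory/*.pkl"))

if output_files:
    # Concatenate them into one frame, as the manager does at its 5-minute cadence.
    results = pd.concat([pd.read_pickle(f) for f in output_files], ignore_index=True)
    print(results.shape)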
Configuration File¶
All the above functions get their parameters from the config.json
file. Below we are going to give descriptions about each config parameter. By changing the
config you change the way the search will run, so make sure that you check that
config is the way you want it after you create the search directory.
File Names and Paths¶
The following are just the names of the directories created by initialisation.py. If the default names were used, this will look like:
output_directory:”output_directory”
trigger_directory:”trigger_directory”
triggerplot_directory:”triggerplot_directory”
masterDirectory:”masterDirectory”
bufferDirectory:”bufferDirectory”
falseAlarmRates:”falseAlarmRates”
efficiencies:”efficiencies”
After this we have the path to your local mlyPipeline installation. This is identified automatically when you run initialisation.py, and it will look like the line below. Do not edit it unless you moved your mlyPipeline directory.
mlyPipelineSource:”/home/<albert.einstein>/extraPath/mlyPipeline”
User and accounting group for condor jobs.
user_name: This is automatically filled from the environment
accounting_group: “allsky.mlyonline”
This is the path of the search directory; in our case it will look like:
path: ”./mysearch”
Generator Function Parameters¶
The following parameters are passed to the generator function that processes the data before inference. The values assigned are the default values.
fs:1024 Sample frequency (Hz)
duration:1 Duration of the processing window (s)
detectors:”HLV” Detectors used for the search
The prefixes dictionary holds the path prefixes of the directories where the O3-replay and MDC data are. If the source of the data you use is different, you need to edit this parameter after creating the search directory (a sketch of how these prefixes are used follows at the end of this section).
prefixes:
0:”/dev/shm/kafka/H1_O3ReplayMDC/H-H1_O3ReplayMDC_llhoft-“
1:”/dev/shm/kafka/L1_O3ReplayMDC/L-L1_O3ReplayMDC_llhoft-“
2:”/dev/shm/kafka/V1_O3ReplayMDC/V-V1_O3ReplayMDC_llhoft-“
Indicates which data to use: “MDC” for MDC or “NOISE” for O3-replay.
channels:”NOISE”
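These prefixes are presumably completed with a GPS start time and a duration to form full frame-file names; the sketch below shows that composition, under the assumption that the files follow the usual <prefix><gps>-<duration>.gwf convention:

# Prefixes as in the config above; the suffix format is an assumption.
prefixes = {
    0: "/dev/shm/kafka/H1_O3ReplayMDC/H-H1_O3ReplayMDC_llhoft-",
    1: "/dev/shm/kafka/L1_O3ReplayMDC/L-L1_O3ReplayMDC_llhoft-",
    2: "/dev/shm/kafka/V1_O3ReplayMDC/V-V1_O3ReplayMDC_llhoft-",
}

gps = 1264316418  # example GPS start time
duration = 1      # the duration parameter above

frame_files = {i: f"{p}{gps}-{duration}.gwf" for i, p in prefixes.items()}
print(frame_files[0])  # .../H-H1_O3ReplayMDC_llhoft-1264316418-1.gwf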
Requesting Data Parameters¶
parallel_scripts: 4 This is the STEP parameter inside the runall.sh script (see above).
wait:0.5 Time to wait before requesting a segment of data again
timeout:20 How many times to try requesting a data segment before going to the next.
required_buffer:16 How many seconds of data to request.
start_lag:92 How many seconds before the current gps time to start the search from. We expect that, given the reset time below, this will be reset in the first attempt.
gps_reset_time:32 The time difference in seconds beyond which we reset the requested gps time to the current one. This is for cases where the latency momentarily runs behind.
farfile: “/home/vasileios.skliris/mly-hermes/outputs/FARfile” The path to an initial FAR directory. When the search starts there is no background estimation yet. It will take some time to be produced, and until then we use another background. The initial FAR estimation is used until one year of background has been estimated. Then the manager will overwrite this path with the path of the search:
mysearch/falseAlarmRates/FARfile
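Putting these parameters together, the request loop behaves roughly like the sketch below (fetch_segment and current_gps are hypothetical stand-ins; only the retry and reset logic reflects the parameters above):

import time

wait = 0.5            # seconds between retries
timeout = 20          # retry attempts per segment
start_lag = 92        # how far behind real time we start
gps_reset_time = 32   # fall-behind threshold in seconds

def current_gps():
    return 1264316510  # hypothetical stand-in for a GPS clock

def fetch_segment(gps):
    return None  # hypothetical data request; pretend the data are not there yet

gps = current_gps() - start_lag

for _ in range(timeout):
    data = fetch_segment(gps)
    if data is not None:
        break
    time.sleep(wait)
else:
    gps += 1  # give up on this segment and move to the next one

if current_gps() - gps > gps_reset_time:
    gps = current_gps()  # we fell too far behind; jump back to the current time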
Models¶
model1_path:”/home/mly/models/model1_32V_No5.h5” Coincidence model (model 1).
model2_path:”/home/mly/models/model2_32V_No6.h5” Coherency model (model 2).
td_pe_model_path:”/home/mly/models/td_model” Time domain parameter estimation model.
fd_pe_model_path:”/home/mly/models/fd_model” Frequency domain parameter estimation model.
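Since the model files are in .h5 format, they can presumably be loaded as standard Keras models (an assumption; the pipeline may wrap the loading differently):

from tensorflow import keras

# Paths taken from the config above; assumed to be standard Keras HDF5 models.
model1 = keras.models.load_model("/home/mly/models/model1_32V_No5.h5")
model2 = keras.models.load_model("/home/mly/models/model2_32V_No6.h5")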
Skymap¶
skymap:true Option for generation of skymaps with each issued event.
nside:64 Parameter related to the resolution of the skymap.
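In HEALPix terms, nside=64 corresponds to 12 × 64² = 49152 pixels, i.e. a pixel size of roughly one degree. Assuming healpy is the skymap backend, this can be checked as follows:

import healpy as hp

nside = 64
print(hp.nside2npix(nside))                # 49152 pixels over the whole sky
print(hp.nside2resol(nside, arcmin=True))  # ~55 arcmin per pixel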
Efficiency Calculation Parameters¶
eff_config A dictionary of parameters that are related to the efficiency tests.
injectionDirectoryPath:”/home/mly/injections/” The path where all injection type directories are.
injectionsWithHRSS: [“SGE70Q3”, “SGE153Q8d9”, “SGL153Q8d9”, “WNB250”] The list of the injection directories that use HRSS.
injectionsWithSNR: [“cbc_20_20”, “wnb_03_train_pod”, “cusp_00”] The list of the injection directories that use SNR.
injectionHRSS:”1e-22,1e-21,1e-22” Intervals for tests that use HRSS (first, last, step).
injectionSNR:”0,50,5” Intervals for tests that use SNR (first, last, step).
testSize:”100” Number of tests on each value of HRSS or SNR respectively.
howOften: 360 After how many successful inferences to run an efficiency test.
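The interval strings are presumably expanded into a list of test values; a minimal sketch of that expansion (whether the last value is included is an assumption):

import numpy as np

# Parse the injectionSNR string "first,last,step" from the config above.
first, last, step = (float(x) for x in "0,50,5".split(","))
snr_values = np.arange(first, last + step, step)  # assuming the endpoint is included
print(snr_values)  # [ 0.  5. 10. 15. 20. 25. 30. 35. 40. 45. 50.]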
Continuous FAR Estimation Parameters¶
far_config A dictionary of parameters that are related to the continuous FAR tests.
batch_size: 1024 Batch size of inference. Used by hermes client inference.
threshold: 2.3148e-05 Default for once per 2 days (Hz). Used to define what is an event and what not.
restriction: 0.0001 The minimum score of an inference to keep it in the history.
max_lag: 3600 The maximum time distance allowed between two lagged segments.
lags: 1024 The number of timeshifts applied on the zero-lag data to produce background tests.
batches: The number of condor jobs into which the generation of background tests is split. This can be adjusted if they do not finish within the hour.
visible_gpu_devices: “local” GPU devices to use. Local will make all the local GPUs visible.
selectedGPUs: [0] An index list to choose which GPUs are to be used. Default is to use the first visible.
parallelGenerations: 3 How many dags (each corresponding to an hour of data) are allowed to run at the same time. This is actually a condor job number restriction. With the default values, it restricts the jobs to dags + jobs < parallelGenerations*batches.
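As a back-of-the-envelope check of how quickly background accumulates (assuming each of the lags timeshifts contributes one hour of effective background per hour of zero-lag data):

lags = 1024  # timeshifts per hour of data, as configured above

background_hours_per_hour = lags  # effective background per real-time hour
hours_needed = 365 * 24 / background_hours_per_hour
print(f"~{hours_needed:.1f} hours of zero-lag data for one year of background")  # ~8.6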
Misc¶
maxDataFrameSize:360 The number of outputs grouped together in one data frame by the manager.
trigger_destination:null Which domain of GraceDB to send the event to (test, dev, playground). When left empty it does not send an event; instead it creates the follow-up and saves it in a file with a made-up ID.