
Run MLZ

Input file

This page briefly explains how to run MLZ. The main code is included as an executable and can be called directly or from its directory; its content can be viewed here

The Input file template is largely self-explanatory and is worth reading before running the code. It can be used as a template for other input files, and the parameters can be checked in advance by setting CheckOnly to yes.

Note that the names of the variables are case-insensitive, but all of them must be present.
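The rule above can be sketched as a small pre-check. This is only an illustration and not part of MLZ: it assumes the input file uses "Name : value" lines with "#" comments, and the helper names read_input_keys and missing_keys are hypothetical.

```python
def read_input_keys(text):
    """Return {lowercased name: value} from 'Name : value' lines (assumed format)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        params[name.strip().lower()] = value.strip()
    return params


def missing_keys(params, required):
    """List the required names (compared case-insensitively) absent from params."""
    return [name for name in required if name.lower() not in params]


# Hypothetical fragment, not the full template
example = """
PREDICTIONMODE : TPZ
checkonly      : yes
"""
params = read_input_keys(example)
print(missing_keys(params, ["PredictionMode", "CheckOnly", "Att"]))
```

Lowercasing the names on both sides makes the comparison case-insensitive, so only variables that are truly absent are reported.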

Prepare data

Both the training file and the test file must contain the attributes (magnitudes, colors, etc.) and, ideally, their errors. If errors are not present, a very small value is assumed for them. For now, ASCII files and numpy files (.npy) are valid. Spectroscopic redshifts must be included in the training file; if they are also present in the test file they can be used to test the performance of MLZ, although this is not required.
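As a minimal sketch of the two accepted formats, the snippet below writes the same toy table as an ASCII file and as a .npy file using numpy. The column choice (two magnitudes, their errors, a spectroscopic redshift), the file names, and the absence of a header line are assumptions made for illustration; check the Input file template for the layout MLZ actually expects.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical columns: two magnitudes, their errors, and a spectroscopic redshift
u = rng.uniform(17.0, 22.0, n)
g = rng.uniform(17.0, 22.0, n)
eu = np.full(n, 0.05)
eg = np.full(n, 0.05)
zs = rng.uniform(0.0, 0.5, n)
table = np.column_stack([u, g, eu, eg, zs])

outdir = tempfile.mkdtemp()
ascii_path = os.path.join(outdir, "toy.train")  # hypothetical file name
npy_path = os.path.join(outdir, "toy.npy")      # hypothetical file name

np.savetxt(ascii_path, table, fmt="%.5f")  # plain-text (ASCII) version
np.save(npy_path, table)                   # numpy binary version

# Both files load back as an (n, 5) array
print(np.loadtxt(ascii_path).shape, np.load(npy_path).shape)
```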

Add the path of each of these files, relative to the working directory, to the input file and define an output folder for the results.

Three very important variables in the input file specify the columns and the attributes to use, given as comma-separated lists. Make sure to indicate the spectroscopic redshift by its name in the KeyAtt variable. Also, always name the error columns by adding the letter e in front of the name of the attribute (see the Input file template for an example).

In the Att variable, indicate the attributes to use to compute the photo-z. You can add or remove attributes, but make sure they are present in the column names. The order of the attributes is not important, but the order of the column names is.
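The naming rules above can be checked before a run with a few lines of Python. This is a sketch, not MLZ code: check_columns is a hypothetical helper, the attribute names u, g, r and zs are invented for the example, and a missing error column is reported only as a hint since errors are optional.

```python
def check_columns(columns, att, keyatt):
    """Return a list of problems found in the comma-separated name lists."""
    cols = [c.strip() for c in columns.split(",")]
    atts = [a.strip() for a in att.split(",")]
    problems = []
    if keyatt.strip() not in cols:
        problems.append("key attribute '%s' is not a column" % keyatt.strip())
    for a in atts:
        if a not in cols:
            problems.append("attribute '%s' is not a column" % a)
        # Error columns are expected to be named 'e' + attribute name
        if "e" + a not in cols:
            problems.append("no error column 'e%s' for attribute '%s'" % (a, a))
    return problems


# Hypothetical column list: three magnitudes, two error columns, a redshift
problems = check_columns("u,g,r,eu,eg,zs", "u,g,r", "zs")
print(problems)
```

Here the only issue reported is the missing error column for r.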

Some hidden parameters

To keep the Input file template from getting too busy, some parameters that are not frequently used are hidden in the utils/utils_mlz.py file, where they can be modified manually. The most important ones are:

  • oobfraction: fraction of data used for cross-validation, default is 1/3
  • dotrain: whether the training of the trees/maps is carried out, default is yes; set it to no to reuse the same trees or maps on a separate data set, which can save time for large training data
  • dotest: whether the test phase is carried out, default is yes; set it to no if only training is desired
  • multiplefiles: write one PDF file per core for large runs to avoid a communication bottleneck, default is no
  • writepdf: whether the PDF is written, default is yes; set it to no if the PDF is not needed

Run the code

See Running a test for an example of using the code on the SDSS data included with the distribution.

To run the code with mpi4py, type the following from the main folder:

$ mpirun -n <cores> ./runMLZ <input file>

where <cores> is the number of processors to use and <input file> is the name of the input file. If not using mpi4py, type:

$ ./runMLZ <input file>

Or, if the distribution was built or installed using pip, just type:

$ runMLZ <input file>

This creates two folders in the output directory: one named trees (or maps), where several files for the trees or maps are stored for further analysis, and one named results, where the main results and the parameters used are stored. The .mlz file contains 7 columns (zspec, zmode, zmean, zconf_mode, zconf_mean, error_mode, error_mean) that summarize the results when the full PDFs are not needed. The PDFs for all the galaxies are also stored in the same folder.
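For a quick look at the summary file, the first three columns can be read with numpy. The sketch below assumes the .mlz file is whitespace-delimited text with "#" comment lines, which is not stated here, and it uses a small in-memory toy file with invented values instead of real output. The bias and scatter computed are common illustrative statistics, not necessarily the ones MLZ reports.

```python
import io

import numpy as np


def summarize(mlz_file):
    """Return (bias, scatter) of zmode against zspec from a 7-column file."""
    data = np.loadtxt(mlz_file, ndmin=2)  # one row per galaxy
    zspec, zmode = data[:, 0], data[:, 1]
    dz = zmode - zspec
    bias = dz.mean()
    scatter = (dz / (1.0 + zspec)).std()
    return bias, scatter


# Toy stand-in for a .mlz file: the values are invented for illustration
toy = io.StringIO(
    "# seven columns, in the order listed above\n"
    "0.10 0.12 0.11 0.9 0.9 0.02 0.02\n"
    "0.20 0.19 0.21 0.8 0.8 0.03 0.03\n"
    "0.30 0.33 0.31 0.7 0.7 0.04 0.04\n"
)
bias, scatter = summarize(toy)
print(bias, scatter)
```

To use it on real output, pass the path of the .mlz file in the results folder in place of the toy object.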

Machine learning approach

MLZ can be used through TPZ or SOMz; which one is used is set in the Input file template with the PredictionMode variable. Whether the problem is a classification or a regression is set with the PredictionClass variable. Some variables are common to both approaches and others are used exclusively by one of them. For classification, the labels must be integers, and the variables MinZ and MaxZ can be used to enclose the range of values. OOB and cross-validation data are computed when the variable OobError is set to yes, and a ranking of variable importance can be computed if the variable VarImportance is set to yes.

Preview of results

Some routines are provided to preview the results. See Running a test and Plotting for more information and some examples of figures that can be created.
