This short notebook compares PIT (probability integral transform) values computed with the current code against those computed previously for the DC1 paper.
Three sets of PIT values are compared: those computed with qp, those computed with scipy.integrate.quad, and the reference values from the DC1 paper.
A 1% subset of the data is used for illustration. Computing the PITs with scipy.integrate.quad would take many hours for the ~399k galaxies in the full DC1 sample.
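As a rough illustration of what is being compared, the sketch below computes the PIT of a single gridded PDF in two ways: by numerically integrating an interpolated PDF from the start of the grid up to the true redshift with scipy.integrate.quad, and by interpolating a cumulative trapezoid sum on the grid. The toy Gaussian PDF and the helper names `pit_quad` and `pit_grid` are assumptions made for this example; they are not the RAIL `Sample` or qp implementations.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical toy PDF: a Gaussian in redshift, tabulated on a 200-point grid.
z_grid = np.linspace(0.0, 2.0, 200)
pdf = np.exp(-0.5 * ((z_grid - 0.8) / 0.1) ** 2)
# Normalize with the trapezoid rule so the gridded PDF integrates to 1.
pdf /= np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid))


def pit_quad(z_grid, pdf, z_true):
    """PIT via adaptive quadrature of the linearly interpolated PDF."""
    def interpolated_pdf(z):
        return np.interp(z, z_grid, pdf, left=0.0, right=0.0)
    return quad(interpolated_pdf, z_grid[0], z_true, limit=200)[0]


def pit_grid(z_grid, pdf, z_true):
    """PIT via a cumulative trapezoid sum on the grid, interpolated at z_true."""
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    return float(np.interp(z_true, z_grid, cdf))


z_true = 0.8  # hypothetical true redshift, placed at the PDF peak
pit_a = pit_quad(z_grid, pdf, z_true)
pit_b = pit_grid(z_grid, pdf, z_true)
print(pit_a, pit_b)
```

Both estimates should land close to 0.5 here because the true redshift sits at the center of a symmetric PDF. The quadrature route calls the integrand many times per galaxy, which is one plausible reason for the runtime gap between the two approaches.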
from sample import Sample
import matplotlib.pyplot as plt
import numpy as np
import os
%matplotlib inline
%reload_ext autoreload
%autoreload 2
my_path = "/Users/julia/TESTDC1FLEXZ"
pdfs_file = os.path.join(my_path, "1pct_Mar5Flexzgold_pz.out")
ztrue_file = os.path.join(my_path, "1pct_Mar5Flexzgold_idszmag.out")
oldpitfile = os.path.join(my_path,"1pct_TESTPITVALS.out") # TESTPITVALS.out masked by object ids in 1pct files
%%time
sample_qp_pit = Sample(pdfs_file, ztrue_file, code="FlexZBoost", name="DC1 paper data", qp_pit=True)
print(sample_qp_pit)
----------------------
Sample: DC1 paper data
Algorithm: FlexZBoost
----------------------
3993 PDFs with 200 probabilities each
qp representation: interp
z grid: 200 z values from 0.016282 to 1.99986 inclusive
CPU times: user 599 ms, sys: 58.6 ms, total: 658 ms
Wall time: 811 ms
%%time
pit_qp = sample_qp_pit.pit
CPU times: user 9.2 s, sys: 153 ms, total: 9.36 s
Wall time: 10.6 s
%%time
sample_scipy_pit = Sample(pdfs_file, ztrue_file, code="FlexZBoost", name="DC1 paper data", qp_pit=False)
print(sample_scipy_pit)
----------------------
Sample: DC1 paper data
Algorithm: FlexZBoost
----------------------
3993 PDFs with 200 probabilities each
qp representation: interp
z grid: 200 z values from 0.016282 to 1.99986 inclusive
CPU times: user 546 ms, sys: 42 ms, total: 588 ms
Wall time: 637 ms
%%time
pit_scipy = sample_scipy_pit.pit
/Users/julia/github/RAIL/rail/evaluation/sample.py:116: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze
  the integrand in order to determine the difficulties. If the position of a
  local difficulty can be determined (singularity, discontinuity) one will
  probably gain from splitting up the interval and calling the integrator
  on the subranges. Perhaps a special-purpose integrator should be used.
  self._pit[i] = quad(tmpfunc, 0, self._ztrue[i])[0]
/Users/julia/github/RAIL/rail/evaluation/sample.py:116: IntegrationWarning: The occurrence of roundoff error is detected, which prevents
  the requested tolerance from being achieved. The error may be
  underestimated.
  self._pit[i] = quad(tmpfunc, 0, self._ztrue[i])[0]
CPU times: user 2min 35s, sys: 2.12 s, total: 2min 38s
Wall time: 2min 56s
# Reference PITs from the DC1 paper: skip the first row and read the second column
pit_dc1 = np.loadtxt(oldpitfile, skiprows=1, usecols=1)
plt.figure(figsize=[7,5])
plt.subplot(211)
plt.plot(pit_dc1, pit_qp - pit_dc1, 'k,')
plt.plot([0,1], [0,0], 'r--', lw=3)
plt.xlabel("PIT DC1")
plt.ylabel("PIT$_{qp}$ - PIT DC1")
plt.xlim(0,1)
plt.ylim(-0.01, 0.01)
plt.subplot(212)
plt.plot(pit_dc1, pit_scipy - pit_dc1, 'k,')
plt.plot([0,1], [0,0], 'r--', lw=3)
plt.xlabel("PIT DC1")
plt.ylabel("PIT$_{scipy}$ - PIT DC1")
plt.xlim(0,1)
plt.ylim(-0.01, 0.01)
plt.tight_layout()
Conclusion: using qp introduces small discrepancies in the PIT values relative to the DC1 reference results, whereas the differences are negligible with the scipy integration method. However, scipy is far slower than qp (about 3 minutes versus about 11 seconds of wall time for the 3993 galaxies in this subset), which can make the computation infeasible for larger datasets.
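The residuals plotted above could also be summarized numerically. The following is a minimal sketch, assuming two equal-length arrays of PIT values; the helper `summarize_residuals` and the three-element demo arrays are hypothetical and are not results from this notebook.

```python
import numpy as np


def summarize_residuals(pit_new, pit_ref):
    """Return the maximum absolute and RMS difference between two PIT arrays."""
    diff = np.asarray(pit_new, dtype=float) - np.asarray(pit_ref, dtype=float)
    return {
        "max_abs": float(np.max(np.abs(diff))),
        "rms": float(np.sqrt(np.mean(diff ** 2))),
    }


# Made-up demo values, for illustration only (not notebook output).
pit_ref_demo = np.array([0.10, 0.50, 0.90])
pit_new_demo = np.array([0.11, 0.50, 0.88])
summary = summarize_residuals(pit_new_demo, pit_ref_demo)
print(summary)
```

In the notebook, the same helper could be applied as `summarize_residuals(pit_qp, pit_dc1)` and `summarize_residuals(pit_scipy, pit_dc1)` to put numbers on the two panels.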