Workflow Guide¶
The repository is organized as a staged pipeline. Each stage writes reusable products into data/ and diagnostic plots into images/, so later stages can be rerun without always restarting from the ROOT inputs.
Detector And Reconstruction Stages¶
src/physics/calibration/: correction, calibration, discrimination, reconstruction, smearing.src/physics/detector/preselection/: production-level summaries, efficiencies, and clustering studies.src/physics/detector/pds/: optical-flash studies, adjacent-flash diagnostics, and flash-matching efficiency.src/physics/detector/tpc/: adjacent-cluster studies and electron/energy-resolution scans.src/physics/detector/vertex/: vertex smearing, fiducial scans, and reconstruction performance.
The usual top-level entry point for the full detector chain is:
python3 src/pipelines/run_all.py --config hd_1x2x6_centralAPA --name marley
If calibration products already exist, start from the downstream detector-analysis chain with:
python3 src/pipelines/run_analysis.py --config hd_1x2x6_centralAPA --name marley
Truth And Weighting Stages¶
The src/physics/truth/ scripts prepare the signal and background ingredients used by the later solar analyses, including oscillation-grid processing, nadir weighting, and background surface or external PDFs. Entry point: src/pipelines/run_truth.py.
Solar Analysis Stages¶
The current analysis layer lives in src/physics/ (per-domain subdirectories) and is orchestrated by src/pipelines/run_sensitivity.py.
Important scripts include:
src/physics/signal/01_fiducialize.py: builds fiducial scan products.src/physics/signal/02_best_fiducial.py: selects optimized fiducials and writes best-fiducial summaries.src/physics/signal/fiducialization_plot.py: renders best/no-fiducial significance plots from fiducial scan products.src/physics/sensitivity/05_best_sigmas.py: records the best significance curves for downstream plots.src/pipelines/run_sensitivity.py: orchestrates the full DayNight, HEP, and Sensitivity workflow.src/physics/daynight/01_daynight.py,exposure_plot.py,significance_plot.py: Day-Night spectrum, exposure, and significance products.src/physics/hep/01_hep.py,exposure_plot.py,significance_plot.py,significance_comparison.py: HEP spectrum, exposure, and significance products.src/physics/sensitivity/01_background_template.py,02_signal_template.py,06_significance.py,contour_plot.py: signal/background templates, oscillation fits, and contour plots.
HEP Profile-Likelihood Updates¶
Full mathematical derivations of all significance methods, the profile-likelihood formulation, adaptive rebinning, Barlow-Beeston masking, PL smoothing pipeline, and spike detection are in docs/hep_likelihood_derivation.tex.
Three improvements to the HEP profile-likelihood significance computation:
Monotonicity enforcement and continuous smoothing (src/physics/hep/01_hep.py): the post-scan running maximum (np.maximum.accumulate) has been replaced by a two-step pipeline. First, scipy.ndimage.gaussian_filter1d convolves the raw PL significance array with a Gaussian kernel (σ = 6 exposure-grid index units by default, tunable via _PL_SMOOTH_SIGMA). This is the standard HEP technique for smoothing discrete numerical curves, equivalent to ROOT’s TH1::Smooth, and removes solver oscillations at low signal-to-background ratios. Second, sklearn.isotonic.IsotonicRegression (PAVA) enforces strict monotonicity by finding the non-decreasing sequence with minimum L2 distance from the smoothed values. At low signal-to-background ratios this pipeline converts a stepped curve with few unique significance values to a fully continuous one.
±1σ expected discovery bands (src/physics/hep/01_hep.py): the PL error band uses signal normalization variation — signal events are scaled by (1 ± σ_s) where σ_s = signal_uncertainty (reconstruction efficiency systematic, passed via --signal_uncertainty, typically 0.1). The background is never shifted, so the β̂_null nuisance parameter is unaffected and both bands collapse symmetrically when signal is negligible. Bands converge to the nominal line when signal is negligible (±σ_s × 0 = 0 change) and remain narrow and symmetric otherwise. The PL computation uses the original fine binning throughout — no adaptive rebin — since PL is optimal at the finest available resolution and the likelihood ratio naturally suppresses bins with negligible signal.
Barlow-Beeston MC mask and smoothing clip (src/physics/hep/01_hep.py): two numerical robustness measures protect the PL LLR from divergence. First, a static per-bin mask (pl_bin_mask) zeros signal and background in bins whose raw background MC count is below min_mc_per_bin (configurable). This is the Barlow-Beeston lite approach used by ROOT HistFactory: bins with no MC support produce unreliable LLR terms that grow super-linearly with exposure, so they are excluded rather than floored. Second, histogram rates after Gaussian smoothing are clipped to ≥ 0 before the PL computation. Gaussian smoothing at distribution edges can produce small negative rate values; after the min_expected floor these appear as effectively signal-only bins whose LLR grows as signal × log(signal/floor) — diverging at large exposure. Clipping removes this artifact without affecting the unsmoothed path.
Spike-robust best-cut selection (src/physics/sensitivity/05_best_sigmas.py): PL curves that pass spike detection are excluded from the max(significance) cut selection. A curve is flagged if any consecutive step in RawPreIsotonicProfileLikelihood or PreIsotonicProfileLikelihood exceeds --max_pl_jump σ (default 1.0). Spiked cuts are saved separately to {config}_{name}_highest_spiked_HEP.pkl for diagnostic review. If all cuts for a given (config, name, energy) are spiked, the filter is bypassed with a warning to preserve output. --max_pl_jump 0 disables filtering entirely (backward-compatible).
Fastest-sigma cut selection (src/physics/sensitivity/05_best_sigmas.py): when the significance reference is ProfileLikelihood, the fastest_sigma2 and fastest_sigma3 cut selection now uses PLSigma2 / PLSigma3 (the exposure at which the PL significance crosses 2σ / 3σ) instead of the Asimov-based Sigma2 / Sigma3. Backward-compatible fallback to Asimov columns applies for output files that pre-date the PL crossing columns.
Orchestrator Flags¶
src/pipelines/run_sensitivity.py exposes boolean flags to skip stages without re-running the full pipeline:
Flag |
Default |
Effect when disabled |
|---|---|---|
|
True |
Skip all computation; run only plot-producing macros |
|
True |
Skip |
|
True |
Skip |
|
True |
Skip |
Flag precedence: --no-computation → --no-significance → --no-fiducialization → --no-rebin. Presentation generation always runs regardless of computation flags. sensitivity/05_best_sigmas.py runs when --computation is enabled, independently of --significance.
Component Policy In Analysis Orchestration¶
The high-level orchestrator src/pipelines/run_sensitivity.py applies a component-selection policy before launching per-sample analysis jobs.
The policy is configured in analysis/config.json (or analysis/backgrounds.json) under BACKGROUND_SAMPLES:
ANALYSES: per-analysis background component lists (DAYNIGHT,HEP,SENSITIVITY).ESSENTIAL: map of components that must be present (true) vs optional (false).
Runtime behavior:
Non-essential components not listed in the selected analysis component list are not processed.
Essential components that are missing on disk produce warnings.
Optional components that are missing are skipped with warnings.
This avoids failures when optional backgrounds (for example radiological) are unavailable for a given detector configuration while still protecting required components.