The upper-right bins (predicted probability 0.7 to 0.9) sit visibly below the diagonal, meaning tornadoes occurred less often than forecast in those bins. Every one of those "misses" was a verified severe storm (hail, downburst, damaging wind) that did not produce a tornado; in verification terms they are false alarms, not missed events. That gap is the radar-only ceiling on tornado-vs-severe discrimination; the next reduction in it has to come from new sensing (GLM lightning jump, ProbSevere v3, dual-pol), not threshold tuning.
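As a minimal sketch of how a reliability diagram's bins can be computed, the function below groups forecasts into equal-width probability bins and compares the mean forecast with the observed tornado frequency in each. The function name `reliability_table`, the ten-bin layout, and the toy inputs are illustrative assumptions, not Squall's actual verification code.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Compare mean forecast probability with observed frequency per bin.

    probs    : forecast probabilities in [0, 1]
    outcomes : 1 if the event (here, a tornado) verified, else 0
    n_bins   : number of equal-width bins (assumed; not from the source)
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")

    sums = [0.0] * n_bins   # sum of forecast probabilities per bin
    hits = [0] * n_bins     # number of verified events per bin
    counts = [0] * n_bins   # number of forecasts per bin

    for p, y in zip(probs, outcomes):
        # p == 1.0 would index one past the end, so clamp to the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        hits[b] += y
        counts[b] += 1

    rows = []
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # skip empty bins rather than report 0/0
        rows.append({
            "bin": (b / n_bins, (b + 1) / n_bins),
            "n": counts[b],
            "mean_forecast": sums[b] / counts[b],
            "observed_freq": hits[b] / counts[b],
        })
    return rows


# Toy data only: four high-probability forecasts, two of which verified.
rows = reliability_table([0.75, 0.75, 0.75, 0.75, 0.15, 0.15],
                         [1, 1, 0, 0, 0, 0])
for row in rows:
    print(row)
```

A bin sits below the diagonal when `observed_freq` is smaller than `mean_forecast`, which is the over-forecasting pattern described above.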
| EF | n | POD | FAR | Mean lead (min) | Brier |
|---|---|---|---|---|---|
Verified retrospectively on the 2015-2024 Atlantic and East Pacific HURDAT2 best-track corpus. Model: logistic regression on trajectory features only (current intensity, prior 12 h and 24 h intensity change, latitude, motion, calendar position). Trained on 1990-2014 (857 storms); held out on 2015-2024 (436 storms, 12,830 observation points, 653 confirmed rapid-intensification (RI) events).
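To make the trajectory-only inputs concrete, here is a hedged sketch of how the intensity-based features and an RI label could be derived from one storm's best-track wind series. It assumes evenly spaced 6-hourly records and the commonly used RI definition of a 30 kt or greater increase in 24 h; the source does not state the threshold Squall uses, and `trajectory_features`, `RI_THRESHOLD_KT`, and the sample winds are hypothetical. Latitude, motion, and calendar position are omitted for brevity.

```python
# Assumption: RI = max wind increase of at least 30 kt over 24 h.
RI_THRESHOLD_KT = 30


def trajectory_features(vmax_kt, step_h=6):
    """Build prior-change features and a 24 h RI label for one storm.

    vmax_kt : maximum sustained winds (kt), assumed evenly spaced
    step_h  : hours between records (6 h assumed for this sketch)
    """
    k12 = 12 // step_h  # records spanning 12 h
    k24 = 24 // step_h  # records spanning 24 h
    rows = []
    # Each point needs 24 h of history behind it and 24 h of future ahead.
    for i in range(k24, len(vmax_kt) - k24):
        future_change = vmax_kt[i + k24] - vmax_kt[i]
        rows.append({
            "vmax": vmax_kt[i],                    # current intensity
            "dv12": vmax_kt[i] - vmax_kt[i - k12],  # prior 12 h change
            "dv24": vmax_kt[i] - vmax_kt[i - k24],  # prior 24 h change
            "ri": int(future_change >= RI_THRESHOLD_KT),
        })
    return rows


# Made-up intensity series, for illustration only.
sample = [30, 35, 40, 45, 50, 60, 75, 90, 100, 100, 95]
for row in trajectory_features(sample):
    print(row)
```

Rows built this way from 1990-2014 storms would form the training set and rows from 2015-2024 storms the held-out set, so no storm contributes to both.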
Reference: NHC SHIPS-RII operational baseline, DeMaria et al. 2021 NHC RI verification report (AUC ~0.78-0.83, POD ~0.45, FAR ~0.70). Squall's gain over the operational baseline is driven by perfect-history retrospective evaluation, in which the model sees final best-track values rather than the estimates available in real time; real-time performance against ATCF advisories is the next verification milestone. Inner-core lightning (GLM), SST, and shear features are not yet in the fit; we expect those to materially improve real-time POD.
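For readers comparing against the POD and FAR figures quoted above, the sketch below shows one conventional way to compute them, plus the Brier score, from probabilistic forecasts. FAR is taken here as the false alarm ratio, false alarms / (hits + false alarms); the 0.5 decision threshold, the function name `verify`, and the toy inputs are assumptions for illustration, not values from the Squall evaluation.

```python
def verify(probs, outcomes, threshold=0.5):
    """Return POD, FAR (false alarm ratio), and Brier score.

    probs     : forecast probabilities in [0, 1]
    outcomes  : 1 if the event verified, else 0
    threshold : probability at or above which a forecast counts as "yes"
                (0.5 is an assumed value for this sketch)
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")

    hits = misses = false_alarms = 0
    for p, y in zip(probs, outcomes):
        warned = p >= threshold
        if warned and y:
            hits += 1
        elif warned:
            false_alarms += 1
        elif y:
            misses += 1

    nan = float("nan")
    # POD: fraction of observed events that were forecast.
    pod = hits / (hits + misses) if (hits + misses) else nan
    # FAR: fraction of "yes" forecasts that did not verify.
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else nan
    # Brier score: mean squared error of the probabilities themselves.
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
    return {"pod": pod, "far": far, "brier": brier}


# Toy data only.
print(verify([0.9, 0.8, 0.6, 0.2, 0.1], [1, 0, 0, 1, 0]))
```

POD and FAR depend on the chosen threshold while the Brier score does not, which is why threshold tuning can trade one for the other without improving the underlying discrimination.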