Quality Control Procedures
- The automated QC procedures were identical to those applied to GHCN-Daily (Durre et al. 2010)
- The procedure included two stages
- Stage 1 does temporal checks
- multi-day accumulations
- duplicate data within timeseries
- frequent occurrence of values
- world record exceedances
- outlier checks
- temporal consistency checks
- Stage 2 does spatial checks
- checks whether values are consistent with neighbours
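To make the two stages concrete, here is a minimal Python sketch of three of the checks listed above: a record-exceedance check and a repeated-value check (Stage 1), and a neighbour-consistency check (Stage 2). It is an illustration only, not the Durre et al. (2010) implementation. The function names, the thresholds (`RECORD_MAX_MM`, `STREAK_LEN`, `NEIGHBOUR_TOL_MM`) and the sample data are all assumptions made for this example, and the series is assumed to be daily precipitation in mm with no missing values.

```python
import numpy as np

# Hypothetical limits for illustration only; these are NOT the thresholds
# used by Durre et al. (2010).
RECORD_MAX_MM = 2000.0    # assumed upper bound on a daily precipitation total
STREAK_LEN = 5            # assumed run length of identical non-zero values
NEIGHBOUR_TOL_MM = 50.0   # assumed allowed gap to the closest neighbour value


def flag_record_exceedance(values, upper=RECORD_MAX_MM):
    """Stage 1: flag values that are negative or above the assumed record."""
    values = np.asarray(values, dtype=float)
    return (values < 0.0) | (values > upper)


def flag_streaks(values, min_len=STREAK_LEN):
    """Stage 1: flag runs of at least min_len identical non-zero values."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=bool)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if values[start] > 0 and i - start >= min_len:
                flags[start:i] = True
            start = i
    return flags


def flag_spatial(target, neighbours, tol=NEIGHBOUR_TOL_MM):
    """Stage 2: flag days on which the target station differs from every
    neighbour by more than tol. target has shape (T,), neighbours (N, T)."""
    target = np.asarray(target, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    min_gap = np.min(np.abs(neighbours - target), axis=0)
    return min_gap > tol


# Made-up daily series for one station and two neighbouring stations.
obs = np.array([0.0, 3.2, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 2500.0, 1.0])
nbrs = np.array([
    [0.0, 2.8, 10.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.5],
    [0.0, 4.1, 15.0, 0.5, 0.0, 0.0, 0.2, 0.0, 6.0, 1.5],
])

stage1 = flag_record_exceedance(obs) | flag_streaks(obs)
stage2 = flag_spatial(obs, nbrs)
print("Stage 1 flags:", np.flatnonzero(stage1))
print("Stage 2 flags:", np.flatnonzero(stage2))
```

In this toy series the five-day run of 12.5 mm and the 2500 mm value are caught by Stage 1, while only the 2500 mm value is far enough from both neighbours to be caught by Stage 2.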
Durre, I., Menne, M. J., Gleason, B. E., Houston, T. G., and Vose, R. S.: Comprehensive automated quality assurance of daily surface observations, J. Appl. Meteorol. Clim., 49, 1615–1633, 2010.
(My ideal) Future of climate datasets
- All “observational” datasets are estimates from a statistical model and carry both aleatoric and epistemic uncertainty
- If we stop thinking of observations as immutable facts and instead think of them as data-generating models, then we can ask more meaningful questions
- E.g. for validation studies, instead of a grid-cell-by-grid-cell comparison, we can calculate the conditional probability of the model output given the observations
- To do this we need observations to be inherently probabilistic (i.e. to provide the entire distribution), e.g. Risser et al. 2019
- Artificial-intelligence-assisted inference can alleviate the computational bottlenecks that traditionally made inference algorithms impractical in climate sciences, e.g. Zammit-Mangion et al. 2021 and Lenzi et al. 2023
- As the examples demonstrate, even a dataset of extremes is possible with this approach
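As a rough illustration of the validation idea above, the following Python sketch scores a climate-model field by its conditional (posterior predictive) log density under a probabilistic observational product. It is a sketch under stated assumptions, not the method of any of the cited papers. It assumes the observational product supplies posterior draws of GEV parameters (`mu`, `sigma`, `xi`) for each grid cell, and that grid cells are conditionally independent given those parameters. The function names and all numbers are hypothetical. The GEV density represents the aleatoric part, and averaging over the parameter draws integrates out the epistemic part.

```python
import numpy as np


def gev_logpdf(y, mu, sigma, xi):
    """Log density of the GEV distribution; -inf outside its support."""
    y, mu, sigma, xi = np.broadcast_arrays(
        *(np.asarray(a, dtype=float) for a in (y, mu, sigma, xi))
    )
    z = (y - mu) / sigma
    gumbel = np.isclose(xi, 0.0)              # xi -> 0 is the Gumbel limit
    xi_safe = np.where(gumbel, 1.0, xi)       # avoid division by zero
    t = 1.0 + xi_safe * z
    inside = t > 0.0                          # support condition of the GEV
    t_safe = np.where(inside, t, 1.0)
    general = (-np.log(sigma)
               - (1.0 + 1.0 / xi_safe) * np.log(t_safe)
               - t_safe ** (-1.0 / xi_safe))
    general = np.where(inside, general, -np.inf)
    gumbel_case = -np.log(sigma) - z - np.exp(-z)
    return np.where(gumbel, gumbel_case, general)


def log_predictive(y_model, mu, sigma, xi):
    """Approximate log p(y_model | observations) by averaging over draws.

    mu, sigma, xi have shape (S, C): S posterior draws for C grid cells.
    y_model has shape (C,). Cells are assumed conditionally independent.
    """
    per_draw = gev_logpdf(y_model, mu, sigma, xi).sum(axis=1)  # shape (S,)
    peak = per_draw.max()
    if not np.isfinite(peak):
        return -np.inf
    # Log of the mean of exp(per_draw), computed stably.
    return peak + np.log(np.mean(np.exp(per_draw - peak)))


# Made-up posterior draws standing in for a probabilistic observational product.
rng = np.random.default_rng(0)
S, C = 200, 4
mu = 40.0 + rng.normal(0.0, 2.0, size=(S, C))
sigma = 10.0 + rng.uniform(-1.0, 1.0, size=(S, C))
xi = 0.1 + rng.normal(0.0, 0.02, size=(S, C))

# Two hypothetical model outputs (e.g. annual maximum daily precipitation, mm).
plausible = np.array([45.0, 50.0, 38.0, 60.0])
implausible = np.array([300.0, 280.0, 310.0, 290.0])

print("plausible:  ", log_predictive(plausible, mu, sigma, xi))
print("implausible:", log_predictive(implausible, mu, sigma, xi))
```

A higher value means the model output is more probable under the observational distribution, which gives a single score that accounts for observational uncertainty instead of a set of per-cell differences.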
Risser, M. D., Paciorek, C. J., Wehner, M. F., O’Brien, T. A., & Collins, W. D. (2019). A probabilistic gridded product for daily precipitation extremes over the United States. Climate Dynamics, 53(5), 2517–2538.
Zammit-Mangion, A., Ng, T. L. J., Vu, Q., & Filippone, M. (2021). Deep Compositional Spatial Models. Journal of the American Statistical Association, 0(0), 1–47.
Lenzi, A., Bessac, J., Rudi, J., & Stein, M. L. (2023). Neural networks for parameter estimation in intractable models. Computational Statistics & Data Analysis, 185, 107762.