Steefan Contractor

Contractor, S., Donat, M. G., Alexander, L. V., Ziese, M., Meyer-Christoffer, A., Schneider, U., Rustemeier, E., Becker, A., Durre, I., and Vose, R. S.: Rainfall Estimates on a Gridded Network (REGEN) – a global land-based gridded dataset of daily precipitation from 1950 to 2016, Hydrol. Earth Syst. Sci., 24, 919–943, 2020

- Daily estimates over 1950 - 2016
- Gridded 1 degree latitude x 1 degree longitude resolution
- Global land coverage

- Purpose built for climate studies with a long temporal record and consistent global spatial analysis
- Based on a large in situ archive from combining GPCC with GHCN-Daily among others
- Includes various statistical model error estimates
- Also includes guidance for users less aware of issues with in situ based precipitation observations

- Total stations: 135,178
- Around 50K stations for each day
- Min stations per day: 35,460
- Max stations per day: 56,190

- GPCC stations
- GHCN-Daily stations
- Collected during GEWEX workshops

- Lat + Lon match and
- World Met. Org. (WMO) ID match or missing

Or

- Coordinates within 1º of each other and
- WMO ID match or missing and
- A correlation of at least 0.99 between the two time series, computed over at least 365 days of data, of which at least 10 days have >1 mm of precipitation
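The two merging rules above can be sketched as a single predicate. This is only an illustration, not the code used to build the dataset: the function name, the argument layout, the use of `None` for a missing WMO ID, and the reading of "within 1º" as a separate latitude and longitude difference are all assumptions, as is counting the wet days over days where both series exceed 1 mm.

```python
import numpy as np

def is_same_station(lat_a, lon_a, wmo_a, series_a,
                    lat_b, lon_b, wmo_b, series_b,
                    max_dist_deg=1.0, min_corr=0.99,
                    min_overlap_days=365, min_wet_days=10, wet_mm=1.0):
    """Return True if two station records look like the same station.

    Hypothetical helper: series_a and series_b are daily precipitation
    arrays on a common calendar, with NaN marking missing days.
    """
    # WMO IDs must match, or at least one of them must be missing (None).
    if not (wmo_a is None or wmo_b is None or wmo_a == wmo_b):
        return False

    # Rule 1: identical coordinates.
    if lat_a == lat_b and lon_a == lon_b:
        return True

    # Rule 2: nearby coordinates (longitude wrap-around is ignored here) ...
    if abs(lat_a - lat_b) > max_dist_deg or abs(lon_a - lon_b) > max_dist_deg:
        return False

    # ... and nearly identical time series over a long enough overlap.
    a = np.asarray(series_a, dtype=float)
    b = np.asarray(series_b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < min_overlap_days:
        return False
    wet_days = np.sum(both & (a > wet_mm) & (b > wet_mm))
    if wet_days < min_wet_days:
        return False
    corr = np.corrcoef(a[both], b[both])[0, 1]
    return bool(corr >= min_corr)
```

Requiring a minimum number of wet days guards against two mostly dry records correlating highly by chance.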

- The automated QC procedures were identical to those applied to GHCN-Daily (Durre et al. 2010)
- The procedure included two stages
- Stage 1 does temporal checks
- multi-day accumulations
- duplicate data within timeseries
- frequent occurrence of values
- world record exceedances
- outlier checks
- temporal consistency checks

- Stage 2 does spatial checks
- checks whether values are consistent with neighbours

Durre, I., Menne, M. J., Gleason, B. E., Houston, T. G., and Vose, R. S.: Comprehensive automated quality assurance of daily surface observations, J. Appl. Meteorol. Clim., 49, 1615–1633, 2010

- Ordinary Block Kriging
- Best Linear Unbiased Estimator (BLUE)
- Linear because the estimate is a weighted average of surrounding stations

$$\mathbf{Z}^*(s_0) = \sum_{i=1}^{N} λ_i\mathbf{Z}(s_i)$$

- Best because we use the spatial structure (covariance) to determine the value of the weights
- Unbiased because the weights are constrained to add up to 1 and so the result cannot be biased towards any one station

$$\sum_{i=1}^{N} λ_i = 1$$

- Ordinary Kriging assumes second order stationarity (mean and variance constant across domain)

$$\mathbf{Z}(s) = μ + ε(s)$$

- Block implies that the algorithm produces gridded area-average estimates as opposed to point estimates
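The properties listed above (linear, best, unbiased, block) can be illustrated with a minimal NumPy sketch of ordinary block kriging. It is not the implementation behind the dataset: the exponential covariance model, its `sill` and `corr_len` parameters, the station coordinates and values, and the 4 x 4 discretisation of the grid cell are all assumptions chosen for illustration.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_len=2.0):
    """Assumed exponential covariance model as a function of distance h."""
    return sill * np.exp(-h / corr_len)

def ordinary_kriging_weights(station_xy, block_xy, cov=exp_cov):
    """Solve the ordinary kriging system for one block (grid cell).

    station_xy: (N, 2) station coordinates.
    block_xy:   (M, 2) points discretising the block; a single point
                gives ordinary point kriging.
    Returns the N weights and the Lagrange multiplier.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    block_xy = np.atleast_2d(np.asarray(block_xy, dtype=float))
    n = len(station_xy)

    # "Best": the weights come from the spatial covariance structure.
    d_ss = np.linalg.norm(station_xy[:, None, :] - station_xy[None, :, :], axis=-1)
    d_sb = np.linalg.norm(station_xy[:, None, :] - block_xy[None, :, :], axis=-1)
    # "Block": station-to-block covariance is averaged over the block points.
    c0 = cov(d_sb).mean(axis=1)

    # "Unbiased": the extra row and column enforce sum(weights) == 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d_ss)
    A[n, n] = 0.0
    rhs = np.append(c0, 1.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n]

# Hypothetical stations (degrees) and daily precipitation values (mm).
stations = np.array([[0.2, 0.3], [0.9, 0.8], [1.5, 0.1], [-0.4, 0.6]])
values = np.array([0.0, 4.2, 1.0, 12.5])

# Discretise the 1 x 1 degree cell [0, 1] x [0, 1] with a 4 x 4 set of points.
gx, gy = np.meshgrid(np.linspace(0.125, 0.875, 4), np.linspace(0.125, 0.875, 4))
block_pts = np.column_stack([gx.ravel(), gy.ravel()])

weights, lagrange = ordinary_kriging_weights(stations, block_pts)
# "Linear": the estimate is a weighted average of the station values.
estimate = weights @ values
```

Because the weights sum to 1, a spatially constant field is reproduced exactly, whatever the station layout.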

- The All Stations version of the dataset interpolates all underlying stations
- The Long Term version interpolates only stations with at least 40 complete years of data
- A year is complete if all 12 months had at least 70% non-missing days
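The completeness rule can be sketched with pandas as follows. The function names are hypothetical, days absent from the record are treated as missing, and the 40 complete years are assumed to be a total count rather than a consecutive run, since the slide does not say which.

```python
import numpy as np
import pandas as pd

def complete_years(daily, min_frac=0.70):
    """Flag each calendar year of a daily series as complete or not.

    daily: pd.Series of daily precipitation with a DatetimeIndex,
           NaN marking missing values (hypothetical input layout).
    Returns a boolean Series indexed by year.
    """
    # Reindex to whole calendar years so absent days count as missing.
    first, last = daily.index.min().year, daily.index.max().year
    full = pd.date_range(f"{first}-01-01", f"{last}-12-31", freq="D")
    daily = daily.reindex(full)

    # Fraction of non-missing days in every (year, month).
    frac = daily.notna().groupby([full.year, full.month]).mean()

    # A year is complete only if all 12 months reach the threshold.
    return (frac >= min_frac).groupby(level=0).all()

def is_long_term_station(daily, min_years=40):
    """True if the station has at least `min_years` complete years."""
    return bool(complete_years(daily).sum() >= min_years)
```

For example, a year with only 13 valid days in February fails the test even if every other month is fully observed.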

- The uncertainty information includes the Kriging Error (KE), a weighted average of the modeled variance between the interpolation location and the stations, which depends solely on the spatial distribution of stations and the grid size, and
- The Yamamoto coefficient of variation (CV) (Yamamoto 2000), an average of the error between the estimate and the station values, weighted by the Kriging weights
- The number of stations used for each grid estimate is also included
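The difference between the two measures can be made concrete with a short sketch. The kriging variance below uses only covariances and weights, so it never sees the observed values, whereas Yamamoto's interpolation variance uses the spread of the station values around the estimate. Expressing the CV as the square root of the interpolation variance divided by the estimate is an assumption here, as are the function names and the requirement of non-negative weights.

```python
import numpy as np

def kriging_variance(weights, c0, c_block, lagrange):
    """Ordinary kriging variance for one target.

    Assumes the system was written as C w + m 1 = c0 with sum(w) = 1, where
    c0 holds the station-to-target covariances, c_block is the covariance of
    the target with itself, and lagrange is the multiplier m.
    """
    return c_block - float(np.dot(weights, c0)) - lagrange

def yamamoto_interpolation_variance(weights, values):
    """Weighted squared deviation of the station values from the estimate.

    Assumes non-negative weights that sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    values = np.asarray(values, dtype=float)
    estimate = float(np.dot(weights, values))
    return float(np.dot(weights, (values - estimate) ** 2))

def yamamoto_cv(weights, values):
    """Assumed CV: sqrt(interpolation variance) / estimate (NaN if estimate is 0)."""
    estimate = float(np.dot(weights, values))
    if estimate == 0.0:
        return float("nan")
    return float(np.sqrt(yamamoto_interpolation_variance(weights, values)) / estimate)

# Hypothetical weights and station values (mm) for one grid cell.
w = np.array([0.5, 0.3, 0.2])
z = np.array([2.0, 4.0, 10.0])
interp_var = yamamoto_interpolation_variance(w, z)
cv = yamamoto_cv(w, z)
```

With identical station values the interpolation variance is zero, while the kriging variance is unchanged because it depends only on geometry.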

Yamamoto, J. K.: An Alternative Measure of the Reliability of Ordinary Kriging Estimates, Math. Geol., 32, 489–509, 2000

A grid cell was left unmasked if:

- It contained at least 1 station on at least 60% of days in every decade, and
- both its temporally averaged (over 1950 - 2016) KE and CV were under the 95th percentile of the spatial distribution of the temporally averaged KE and CV respectively
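The masking rule can be sketched with NumPy as below. The array layout `(time, lat, lon)`, the function name, the use of calendar decades, and taking the percentile over all non-NaN cells are assumptions made for illustration.

```python
import numpy as np

def quality_mask(station_count, ke, cv, years, min_frac=0.60, pct=95):
    """Return a boolean (lat, lon) array, True where a cell is left unmasked.

    station_count, ke, cv: arrays of shape (time, lat, lon).
    years: array of shape (time,) giving the calendar year of each day.
    """
    years = np.asarray(years)
    decades = (years // 10) * 10  # assumed: calendar decades (1950s, 1960s, ...)
    has_station = np.asarray(station_count) >= 1

    # Condition 1: enough days with at least one station in every decade.
    coverage_ok = np.ones(has_station.shape[1:], dtype=bool)
    for d in np.unique(decades):
        frac = has_station[decades == d].mean(axis=0)
        coverage_ok &= frac >= min_frac

    # Condition 2: temporally averaged KE and CV below the spatial percentile.
    ke_mean = np.nanmean(ke, axis=0)
    cv_mean = np.nanmean(cv, axis=0)
    ke_ok = ke_mean < np.nanpercentile(ke_mean, pct)
    cv_ok = cv_mean < np.nanpercentile(cv_mean, pct)

    return coverage_ok & ke_ok & cv_ok
```

A cell must pass both conditions, so a well-observed cell can still be masked if its average KE or CV falls in the top 5% of the spatial distribution.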

Mean difference

SD of difference

Temporal correlation

Trends in annual precipitation (1950 - 2016) (mm/yr)

Contractor, S., Donat, M. G., & Alexander, L. V. (2021). Changes in Observed Daily Precipitation over Global Land Areas since 1950. Journal of Climate, 34(1), 3–19.

Wet-day frequency changes between 1950-1983 and 1984-2016 (%)

Mean precipitation intensity changes between 1950-1983 and 1984-2016 (%)
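For a single grid cell, the two change metrics above can be sketched as follows. The 1 mm wet-day threshold and the definition of intensity as mean precipitation on wet days are assumptions; the slides do not state them.

```python
import numpy as np

def wet_day_stats(precip, wet_mm=1.0):
    """Wet-day frequency and mean wet-day intensity of a daily series.

    precip: 1-D array of daily precipitation (mm); NaN days are ignored.
    wet_mm: assumed wet-day threshold.
    """
    precip = np.asarray(precip, dtype=float)
    precip = precip[np.isfinite(precip)]
    wet = precip >= wet_mm
    freq = wet.mean() if precip.size else np.nan
    intensity = precip[wet].mean() if wet.any() else np.nan
    return float(freq), float(intensity)

def percent_change(early, late):
    """Relative change from the early to the late period, in percent."""
    return 100.0 * (late - early) / early

# Hypothetical daily values for the two periods (1950-1983 and 1984-2016).
early_period = np.array([0.0, 0.0, 2.0, 4.0])
late_period = np.array([0.0, 3.0, 3.0, 6.0])

f0, i0 = wet_day_stats(early_period)
f1, i1 = wet_day_stats(late_period)
freq_change = percent_change(f0, f1)       # change in wet-day frequency (%)
intensity_change = percent_change(i0, i1)  # change in mean intensity (%)
```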

- Spatially, changes in precipitation seem complex, even stochastic at first
- But a clear signal of positive precipitation changes in the high quantiles, consistent with thermodynamic expectations, is apparent
- This signal disappears again for the most extreme precipitation

Relative difference in area showing positive changes vs area showing negative changes

- Mean changes in frequency and intensity are aligned in only around one third of the grid cells
- Extreme changes in frequency and intensity are aligned in almost 80% of areas globally

- All “observational” datasets are estimates from a statistical model and carry both aleatoric and epistemic uncertainties
- If we stop thinking of observations as immutable facts and instead think of them as data-generating models, then we can ask more meaningful questions
- E.g. for validation studies, instead of doing a grid cell by grid cell comparison we can calculate the conditional probability of the model output given the observations
- To do this we need observations to be inherently probabilistic (the entire distribution), e.g. Risser et al. 2019
- Artificial intelligence assisted inference can alleviate computational bottlenecks that traditionally made inference algorithms impractical in climate sciences, e.g. Zammit-Mangion et al. 2021 and Lenzi et al. 2023
- As the examples demonstrate, even a dataset of extremes is possible with this approach

Risser, M. D., Paciorek, C. J., Wehner, M. F., O’Brien, T. A., & Collins, W. D. (2019). A probabilistic gridded product for daily precipitation extremes over the United States. Climate Dynamics, 53(5), 2517–2538.

Zammit-Mangion, A., Ng, T. L. J., Vu, Q., & Filippone, M. (2021). Deep Compositional Spatial Models. Journal of the American Statistical Association, 0(0), 1–47.

Lenzi, A., Bessac, J., Rudi, J., & Stein, M. L. (2023). Neural networks for parameter estimation in intractable models. Computational Statistics & Data Analysis, 185, 107762.