Explanation of Model Performance Scores
The maps indicating measures of model performance are based on seasonal (3-month averaged) anomalies. Anomalies are departures from average seasonal conditions. Model anomalies are computed with respect to the model climatology, and observed anomalies with respect to the observed climatology. The model simulations of the climate have been forced at the lower boundary by globally observed sea surface temperatures (SSTs). Maps are currently available for anomaly correlations and ROC (Relative Operating Characteristics) scores.
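The anomaly calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all numerical values are hypothetical, and the climatology is taken simply as the average over the years supplied, which may differ from the base period actually used for the maps.

```python
from statistics import mean

def seasonal_mean(monthly_values):
    """Average exactly three monthly values into one seasonal value."""
    if len(monthly_values) != 3:
        raise ValueError("a season is taken here to be three months")
    return mean(monthly_values)

def anomalies(seasonal_values):
    """Departures of each year's seasonal value from the multi-year average.

    Assumption: the climatology is the mean of the years supplied.
    """
    climatology = mean(seasonal_values)
    return [value - climatology for value in seasonal_values]

# Hypothetical monthly values for the same season in three different years.
observed_months = [[2.0, 3.0, 4.0], [1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
model_months = [[5.0, 6.0, 7.0], [4.5, 5.5, 6.5], [5.5, 6.5, 7.5]]

observed_seasonal = [seasonal_mean(m) for m in observed_months]
model_seasonal = [seasonal_mean(m) for m in model_months]

# Each series is referred to its own climatology, so a constant model
# bias (here the model is warmer than observed) drops out of the anomalies.
observed_anomalies = anomalies(observed_seasonal)
model_anomalies = anomalies(model_seasonal)

print(observed_anomalies)
print(model_anomalies)
```

In this made-up example the model's seasonal values are about twice the observed ones, yet both anomaly series are centred on zero, which is why model and observed anomalies can be compared directly.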
A perfect correlation between observed and simulated variability is 1.0. A perfect correlation is effectively impossible to obtain, however, since nature has a component of weather noise that cannot be separated from the boundary (SST)-forced climate signal suggested by the ensemble-averaged model results. For regions where changes in SST do influence changes in the climate, and where the model simulates the physics of the climate variability well, a high (e.g., statistically significant) correlation generally exists. High positive correlations indicate that the model-simulated anomalies mimic the observed anomalies, with the right sign (positive or negative departures from average) and with the correct relative amplitudes. The magnitude of the simulated variability need not be equal to the observed variability. Anomaly correlations do not always give a true sense of a model's potential; for example, simulated anomalies that, like the observed anomalies, are near average but have the wrong sign degrade the correlation. Also, a model may do well under certain circumstances, such as during wet years, but not in others, and this information is lost in a measure such as the anomaly correlation.
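As an illustration of these properties, the following minimal sketch computes a correlation between two anomaly series. It assumes the anomaly correlation is the ordinary (Pearson) correlation over time at a single location; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def anomaly_correlation(simulated, observed):
    """Pearson correlation between simulated and observed anomaly series."""
    if len(simulated) != len(observed) or len(simulated) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    # Anomalies should already average to zero; centring again is harmless.
    s_mean, o_mean = mean(simulated), mean(observed)
    s = [x - s_mean for x in simulated]
    o = [y - o_mean for y in observed]
    denominator = sqrt(sum(x * x for x in s) * sum(y * y for y in o))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum(x * y for x, y in zip(s, o)) / denominator

# Hypothetical observed anomalies for five seasons.
observed = [0.0, -1.0, 1.0, 2.0, -2.0]

# Simulated anomalies with the right sign but only half the amplitude.
half_amplitude = [0.5 * value for value in observed]

# Simulated anomalies with the right amplitude but the wrong sign.
wrong_sign = [-value for value in observed]

r_half = anomaly_correlation(half_amplitude, observed)
r_wrong = anomaly_correlation(wrong_sign, observed)
print(r_half, r_wrong)
```

The half-amplitude series still correlates perfectly with the observations, since only relative amplitudes matter, whereas the wrong-sign series gives a perfect negative correlation.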
ROC scores compare the hit rate (the proportion of events for which a warning was provided) with the false-alarm rate (the proportion of non-events for which a warning was provided). For a system that has no skill, the warnings and events are by definition independent occurrences, and so the probability that a warning was provided is not contingent upon an event occurring or not occurring. In other words, the probability that a warning was provided is unrelated to the outcome. Therefore, when there is no skill the hit and false-alarm rates are both equal to the prior probability of a warning being provided (Murphy and Winkler, 1987). This equality occurs when warnings are issued randomly, and also when warnings are issued perpetually or never issued at all. When the forecast system has some skill, the hit rate exceeds the false-alarm rate; negative skill is indicated when the false-alarm rate exceeds the hit rate.
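The no-skill cases can be checked with a small sketch. It assumes the usual definitions, namely that the hit rate is the fraction of events that were warned and the false-alarm rate is the fraction of non-events that were warned; the function name and the warning and event sequences are hypothetical.

```python
def hit_and_false_alarm_rates(warnings, events):
    """Return (hit rate, false-alarm rate) for paired yes/no sequences."""
    pairs = list(zip(warnings, events))
    hits = sum(1 for w, e in pairs if w and e)
    misses = sum(1 for w, e in pairs if not w and e)
    false_alarms = sum(1 for w, e in pairs if w and not e)
    correct_rejections = sum(1 for w, e in pairs if not w and not e)
    if hits + misses == 0 or false_alarms + correct_rejections == 0:
        raise ValueError("need at least one event and one non-event")
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate

# Hypothetical record: two events and four non-events.
events = [True, True, False, False, False, False]

# Three no-skill strategies: always warn, never warn, and warnings that
# are unrelated to the outcome (half of the events and half of the
# non-events are warned, so the prior probability of a warning is 0.5).
perpetual = [True] * len(events)
never = [False] * len(events)
unrelated = [True, False, True, True, False, False]

rates_perpetual = hit_and_false_alarm_rates(perpetual, events)
rates_never = hit_and_false_alarm_rates(never, events)
rates_unrelated = hit_and_false_alarm_rates(unrelated, events)
print(rates_perpetual, rates_never, rates_unrelated)
```

In each case the hit rate equals the false-alarm rate, and both equal the fraction of occasions on which a warning was issued (1, 0, and 0.5 respectively).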
For probabilistic forecasts, a warning can be issued when the forecast probability for a pre-defined event exceeds some threshold (Mason, 1979). Different warning thresholds can be used for the pre-defined event, and a set of hit and false-alarm rates can then be determined. This set of hit rates is plotted against the corresponding false-alarm rates to generate the ROC curve. While there are a number of indices for summarizing the performance (Mason, 1982), the area under the curve is the most commonly used (and simplest to calculate), and has become known as the ROC score (Mason and Graham, 1999).
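The construction of the ROC curve and its area can be sketched as follows. This is a minimal illustration, not the procedure used to produce the maps: the function names, forecast probabilities, outcomes, and thresholds are hypothetical, the curve is assumed to be closed with the points (0, 0) and (1, 1), and the area is approximated with the trapezoidal rule.

```python
def roc_points(probabilities, events, thresholds):
    """One (false-alarm rate, hit rate) point per warning threshold."""
    n_events = sum(1 for e in events if e)
    n_non_events = len(events) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")
    points = []
    for threshold in thresholds:
        # A warning is issued when the forecast probability exceeds the threshold.
        warnings = [p > threshold for p in probabilities]
        hits = sum(1 for w, e in zip(warnings, events) if w and e)
        false_alarms = sum(1 for w, e in zip(warnings, events) if w and not e)
        points.append((false_alarms / n_non_events, hits / n_events))
    return points

def roc_area(points):
    """Trapezoidal area under the curve, closed at (0, 0) and (1, 1)."""
    curve = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# Hypothetical forecast probabilities for a pre-defined event, with outcomes.
probabilities = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
events = [True, True, False, True, False, True, False, False]
thresholds = [0.25, 0.5, 0.75]

points = roc_points(probabilities, events, thresholds)
score = roc_area(points)

# A forecast that never varies carries no information about the outcome.
constant = [0.5] * len(events)
no_skill_score = roc_area(roc_points(constant, events, [0.25, 0.75]))
print(points, score, no_skill_score)
```

For these made-up forecasts the area is 0.875, while the constant forecast yields only the two end points of the diagonal and an area of 0.5.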
In general, for skilful forecast systems, the ROC curve bends toward the top left, where hit rates are larger than false-alarm rates, and the total area under the curve is then greater than 0.5. Where the curve lies close to the diagonal, the forecast system does not provide any useful information, and the area beneath the curve is approximately 0.5. If the curve lies below the diagonal, so that the area is less than 0.5, negative skill is indicated.
References:
Doswell, C. A., R. Davies-Jones, and D. L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Weather and Forecasting, 5, 576-585.
Harvey, L. O., K. R. Hammond, C. M. Lusk, and E. F. Mross, 1992: The application of signal detection theory to weather forecasting behavior. Monthly Weather Review, 120, 863-883.
Mason, I., 1979: On reducing probability forecasts to yes/no forecasts. Monthly Weather Review, 107, 207-211.
Mason, I., 1982: A model for assessment of weather forecasts. Australian Meteorological Magazine, 30, 291-303.
Mason, S. J. and N. E. Graham, 1999: Conditional probabilities, relative operating characteristics and relative operating levels. Weather and Forecasting, in press.
Murphy, A. H. and R. L. Winkler, 1987: A general framework for forecast verification. Monthly Weather Review, 115, 1330-1338.
Olsen, R. H., 1965: On the use of Bayes theorem in estimating false alarm rates. Monthly Weather Review, 93, 557-558.
Swets, J. A., 1973: The relative operating characteristic in psychology. Science, 182, 990-1000.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.