Machine Learning Safety with Applications to the Climate Sciences


Rapid progress in machine learning (ML) has engendered numerous applications across the sciences. The deployment of modern ML systems has increased our ability to validate and automate the scientific process, broadening the space for discovery. However, these advances come with their own challenges and risks. For example, the optimization schemes used during training are necessarily opaque, leading one to question whether the algorithm has learned features of nature or artifacts of the data. A recent trend emerging from these issues is the desire to make ML systems more explainable and interpretable. In this talk, we develop robust and transparent unsupervised machine learning methods for clustering the Earth’s climate system. Leveraging the discrete wavelet transform, we analyze the effect that the sampling resolution of the data has on clustering. This allows us to produce an ensemble of clusterings across many spatiotemporal resolutions, as opposed to a single clustering. Using information theory, we identify a small subcollection of this ensemble that spans the majority of the variance observed. This subcollection of key clusterings is then combined to produce a single fuzzy clustering, along with a confidence metric that assesses the uncertainty of the clustering at different points in space.
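
As a rough illustration of the multi-resolution idea in the abstract, the following minimal sketch builds one clustering per wavelet level and then measures how consistently pairs of points are grouped together. It is not the method presented in the talk: the Haar wavelet, the plain k-means step, the co-association matrix standing in for the fuzzy clustering, and the confidence score are all assumptions chosen for brevity, every function name is hypothetical, and the information-theoretic selection of key clusterings is omitted.

```python
import numpy as np


def haar_approx(x, level):
    """Haar approximation coefficients along the last axis after `level` steps."""
    out = np.asarray(x, dtype=float)
    for _ in range(level):
        if out.shape[-1] % 2:  # drop a trailing sample so pairs line up
            out = out[..., :-1]
        out = (out[..., 0::2] + out[..., 1::2]) / np.sqrt(2.0)
    return out


def kmeans_labels(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center in place
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def clustering_ensemble(series, levels, k):
    """One hard clustering of the points per wavelet level (temporal resolution)."""
    return np.array([kmeans_labels(haar_approx(series, lv), k) for lv in levels])


def co_association(labelings):
    """Fraction of ensemble members that place points i and j in the same cluster."""
    L = np.asarray(labelings)
    return (L[:, :, None] == L[:, None, :]).mean(axis=0)


def confidence(coassoc):
    """Per-point score in [0, 1]; 1 means every pairing is unanimous across the ensemble."""
    return np.abs(2.0 * coassoc - 1.0).mean(axis=1)
```

Here `series` is assumed to be an array of shape (points, time steps), one time series per spatial location. A call such as `co_association(clustering_ensemble(series, levels=[0, 1, 2, 3], k=2))` returns a symmetric matrix whose entries near 0 or 1 mark pairs that are grouped consistently across resolutions, while intermediate values flag locations where the clustering depends on the resolution.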

Center for Nonlinear Studies Colloquium
Los Alamos, NM