GCM-Filters: A Python Package for Spatial Filtering of Gridded Data from Ocean and Climate Models


Nora Loose1, Ryan Abernathey2, Ian Grooms1, Julius Busecke2, Arthur Guillaumin3, Elizabeth Yankovsky3, Gustavo Marques4, Jacob Steinberg5, Andrew Slavin Ross3, Hemant Khatri6, Scott Bachman4, Laure Zanna3, and Paige Martin2,7
1Department of Applied Mathematics, University of Colorado Boulder, Boulder, CO, USA
2Lamont-Doherty Earth Observatory, Columbia University, New York, NY, USA
3Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
4Climate and Global Dynamics Division, National Center for Atmospheric Research, Boulder, CO, USA
5Woods Hole Oceanographic Institution, Woods Hole, MA, USA
6Earth, Ocean and Ecological Sciences, University of Liverpool, UK
7Australian National University, Canberra, Australia

Abstract


Applying a spatial filter to gridded data from models or observations is a common analysis method in the Earth Sciences, for example to study oceanic and atmospheric motions at different spatial scales or to develop subgrid-scale parameterizations for ocean models. The spatial filters included in existing Python packages, such as SciPy's multidimensional Gaussian filter, are typically specialized for image processing. These tools are of limited use for data from General Circulation Models (GCMs), however, because they assume a regular, rectangular Cartesian grid and employ simple boundary conditions. GCM data, in contrast, lives on irregular curvilinear grids, and continental boundaries in GCMs need more careful treatment than image processing tools currently support. 

Here we introduce a new Python package, GCM-Filters, which is specifically designed to filter GCM data. The GCM-Filters algorithm applies a discrete Laplacian to smooth a field through an iterative process that resembles diffusion. The discrete Laplacian takes into account the varying grid-cell geometry of complex GCM grids and uses a no-flux boundary condition, mimicking how diffusion is internally implemented in GCMs and ensuring conservation of the integral. Conservation of the integral is a desirable filter property for many physical quantities, for example energy or ocean salinity. The user can run GCM-Filters on either CPUs or GPUs, with NumPy or CuPy input data. Moreover, GCM-Filters leverages Dask and Xarray to support filtering of larger-than-memory datasets and to provide computational flexibility.

In this talk, we will present examples of how to use GCM-Filters in a range of oceanographic applications. Ongoing development of the GCM-Filters package is hosted on GitHub at https://github.com/ocean-eddy-cpt/gcm-filters, with community contributions welcome.


12th Symposium on Advances in Modeling and Analysis Using Python, AMS Meeting 2022
Session: Python Tools in the Atmospheric and Oceanic Sciences. Part I
Date: January 25, 2022

Spatial filters in the geosciences: What for?


Useful for:
  • Studying oceanic and atmospheric motions at different spatial scales
  • Identifying the small-scale part of the data (the residual), which is often unresolved in models and needs to be parameterized

Why not use existing tools in the python stack?


In [4]:
from scipy.ndimage import uniform_filter, gaussian_filter
blurred_cat = uniform_filter(octocat)  # octocat: a 2-D image array loaded earlier in the notebook

Existing Python tools are specialized for image processing and assume Cartesian grids

General circulation model (GCM) output comes on complex grids


  • Grid cell geometry is spatially varying
  • Continental boundaries
  • Complex grid topologies require non-trivial exchanges at grid "edges" (e.g., periodic, tripolar)

Continental boundaries need proper handling


Conventional filter propagates fill value from land into ocean interior
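The contamination can be demonstrated with a toy example. The sketch below is illustrative only: `mean3` is a hypothetical naive periodic 3x3 box mean, not a SciPy or GCM-Filters function. The mask-aware variant normalizes by the local wet fraction, which captures the spirit (though not the implementation) of how GCM-Filters' masked Laplacians keep land values out of the ocean interior.

```python
import numpy as np

# Toy field: left half is "land" marked with a fill value, right half is ocean (value 1.0)
field = np.ones((8, 8))
field[:, :4] = -9999.0

def mean3(f):
    """Naive periodic 3x3 box mean (hypothetical stand-in for a conventional filter)."""
    acc = np.zeros_like(f)
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            acc += np.roll(np.roll(f, i, axis=0), j, axis=1)
    return acc / 9.0

# Conventional filter: fill value leaks from land into ocean cells near the coast
naive = mean3(field)

# Mask-aware filter: zero out land, then renormalize by the filtered wet mask
wet = (field != -9999.0).astype(float)
wet_sum = mean3(wet)
masked = np.divide(mean3(np.where(wet > 0, field, 0.0)), wet_sum,
                   out=np.full_like(field, np.nan), where=wet_sum > 0)
# naive is heavily contaminated near the coast; masked leaves the uniform ocean untouched
```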

Complex grid topology requires non-trivial exchanges


Conventional filter ignores exchanges across tripole fold

Introducing GCM-Filters


Algorithm: \begin{align*} \bar{d}_0 & = d \\ \bar{d}_k & = \bar{d}_{k-1} + \frac{1}{s_k} \Delta \bar{d}_{k-1} \end{align*}
  • based on a discrete Laplacian $\Delta$, with stage coefficients $s_k$ chosen so that the iteration approximates the target filter shape and scale (Grooms et al., 2021)
  • mimics how diffusion is internally implemented in GCMs
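The recurrence above can be sketched in plain NumPy on a regular Cartesian grid. This is a minimal illustration, not the package's implementation: the 5-point Laplacian, the edge padding (which imitates a no-flux boundary), and the constant stage coefficients are simplified stand-ins, whereas GCM-Filters derives the $s_k$ from the requested filter shape and scale.

```python
import numpy as np

def laplacian(field):
    """5-point discrete Laplacian with no-flux boundaries (via edge replication)."""
    padded = np.pad(field, 1, mode="edge")  # replicated edges => zero normal gradient
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
            - 4.0 * field)

def iterative_filter(d, s):
    """Apply d_k = d_{k-1} + (1/s_k) * Lap(d_{k-1}) for each stage coefficient s_k."""
    d_bar = d.copy()
    for s_k in s:
        d_bar = d_bar + laplacian(d_bar) / s_k
    return d_bar

rng = np.random.default_rng(0)
noisy = rng.standard_normal((64, 64))
smoothed = iterative_filter(noisy, s=[4.0, 4.0, 4.0])  # hypothetical coefficients
```

Because the Laplacian's fluxes cancel in the interior and vanish at the no-flux boundary, each iteration conserves the domain integral exactly, which is the conservation property highlighted in the abstract.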

Additional goals of GCM-Filters:
  • Support filtering of larger-than-memory datasets via parallelized & delayed execution
  • Enable filter analysis on both CPUs and GPUs

GCM-Filters in action


In [6]:
import xarray as xr
import gcm_filters

Available grid types

In [9]:
list(gcm_filters.GridType)
Out[9]:
[<GridType.REGULAR: 1>,
 <GridType.REGULAR_AREA_WEIGHTED: 2>,
 <GridType.REGULAR_WITH_LAND: 3>,
 <GridType.REGULAR_WITH_LAND_AREA_WEIGHTED: 4>,
 <GridType.IRREGULAR_WITH_LAND: 5>,
 <GridType.MOM5U: 6>,
 <GridType.MOM5T: 7>,
 <GridType.TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED: 8>,
 <GridType.TRIPOLAR_POP_WITH_LAND: 9>,
 <GridType.VECTOR_C_GRID: 10>]

Required grid variables

In [10]:
gcm_filters.required_grid_vars(gcm_filters.GridType.TRIPOLAR_POP_WITH_LAND)
Out[10]:
['wet_mask', 'dxe', 'dye', 'dxn', 'dyn', 'tarea']

Creating the filter object


In [16]:
specs = {
    'filter_scale': 100000,
    'filter_shape': gcm_filters.FilterShape.GAUSSIAN,
}
In [17]:
filter = gcm_filters.Filter(
    **specs,
    grid_type=gcm_filters.GridType.TRIPOLAR_POP_WITH_LAND,
    grid_vars={
        'wet_mask': wet_mask, 
        'dxe': dxe, 'dye': dye, 'dxn': dxn, 'dyn': dyn, 
        'tarea': area
    },
    dx_min=dxe.where(wet_mask).min().values,
)
filter
Out[17]:
Filter(filter_scale=100000, dx_min=array(2245.78304344), filter_shape=<FilterShape.GAUSSIAN: 1>, transition_width=3.141592653589793, ndim=2, n_steps=49, grid_type=<GridType.TRIPOLAR_POP_WITH_LAND: 9>)
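The `dx_min` argument above is the smallest grid spacing over wet cells, which the filter needs to keep its iterative steps stable. A minimal NumPy sketch with synthetic grid metrics (the metric values here are invented for illustration) mirrors the `dxe.where(wet_mask).min()` computation from the xarray call:

```python
import numpy as np

# Synthetic grid metrics in meters: spacing varies with latitude, as on curvilinear grids
ny, nx = 6, 8
dxe = np.linspace(2000.0, 5000.0, ny)[:, None] * np.ones((ny, nx))
wet_mask = np.ones((ny, nx))
wet_mask[0, :] = 0.0  # one row of land

# Smallest grid spacing over wet cells only, analogous to
# dxe.where(wet_mask).min().values in the xarray version
dx_min = dxe[wet_mask.astype(bool)].min()
```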

Filtering on CPU


In [18]:
KE.data  # 1 snapshot of kinetic energy 
Out[18]:
        Array                 Chunk
Bytes   4.29 GB               69.12 MB
Shape   (1, 62, 2400, 3600)   (1, 1, 2400, 3600)
Count   750 Tasks             62 Chunks
Type    float64               numpy.ndarray
In [19]:
KE_filtered = filter.apply(KE, dims=['nlat', 'nlon'])  # apply filter lazily: computation is deferred
KE_filtered.data
Out[19]:
        Array                 Chunk
Bytes   4.29 GB               69.12 MB
Shape   (1, 62, 2400, 3600)   (1, 1, 2400, 3600)
Count   951 Tasks             62 Chunks
Type    float64               numpy.ndarray

Filtering on GPU


In [20]:
import cupy as cp
In [29]:
KE_gpu.data
Out[29]:
        Array                 Chunk
Bytes   4.29 GB               69.12 MB
Shape   (1, 62, 2400, 3600)   (1, 1, 2400, 3600)
Count   812 Tasks             62 Chunks
Type    float64               cupy.ndarray
In [30]:
KE_gpu_filtered = filter_gpu.apply(KE_gpu, dims=['nlat', 'nlon'])  # apply filter lazily
KE_gpu_filtered.data
Out[30]:
        Array                 Chunk
Bytes   4.29 GB               69.12 MB
Shape   (1, 62, 2400, 3600)   (1, 1, 2400, 3600)
Count   1025 Tasks            62 Chunks
Type    float64               cupy.ndarray

Triggering filter computation for surface level


In [31]:
%time computed_cpu = KE_filtered.isel(z_t=0).compute()  # trigger filter computation on CPU
CPU times: user 17.4 s, sys: 11.1 s, total: 28.6 s
Wall time: 29.3 s
In [32]:
%time computed_gpu = KE_gpu_filtered.isel(z_t=0).compute()  # trigger filter computation on GPU
CPU times: user 1.81 s, sys: 887 ms, total: 2.7 s
Wall time: 2.98 s

Computing eddy kinetic energy


Define mean kinetic energy (MKE) and eddy kinetic energy (EKE): \begin{align*} \text{MKE} = \frac{1}{2} ( \bar{u}^2 + \bar{v}^2), \qquad\qquad \text{EKE} = \underbrace{\frac{1}{2} \overline{ (u^2 + v^2)}}_\text{filtered KE} - \underbrace{\frac{1}{2} ( \bar{u}^2 + \bar{v}^2)}_\text{MKE}. \end{align*}

In [38]:
u_filtered = filter.apply(u, dims=['nlat', 'nlon'])
v_filtered = filter.apply(v, dims=['nlat', 'nlon'])
In [39]:
MKE = 0.5 * (u_filtered**2 + v_filtered**2)
EKE = KE_filtered - MKE
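The decomposition can be verified with any smoothing operator. The sketch below uses synthetic velocities and a hypothetical periodic box filter (`box_filter`) as a stand-in for `filter.apply`; it is not part of GCM-Filters. Because a smoothing filter removes variance, the domain-mean EKE computed this way comes out positive.

```python
import numpy as np

def box_filter(f, n=5):
    """Periodic n x n moving average, a hypothetical stand-in for the GCM-Filters operator."""
    acc = np.zeros_like(f)
    for i in range(-(n // 2), n // 2 + 1):
        for j in range(-(n // 2), n // 2 + 1):
            acc += np.roll(np.roll(f, i, axis=0), j, axis=1)
    return acc / n**2

rng = np.random.default_rng(42)
u = rng.standard_normal((64, 64))
v = rng.standard_normal((64, 64))

KE_filtered = box_filter(0.5 * (u**2 + v**2))      # filtered kinetic energy
MKE = 0.5 * (box_filter(u)**2 + box_filter(v)**2)  # mean kinetic energy
EKE = KE_filtered - MKE                            # eddy kinetic energy
```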

Summary: GCM-Filters


  • Specifically designed for gridded data produced by GCMs of ocean and climate
  • Geometry of complex GCM grids is respected by GCM-Filters Laplacians
  • Parallel, out-of-core filter analysis on both CPUs and GPUs

Get Involved!

  • Try it out:
    conda install -c conda-forge gcm_filters
    
  • Documentation: gcm-filters.readthedocs.io/
  • File issues and make pull requests on GitHub: https://github.com/ocean-eddy-cpt/gcm-filters
