NaN Analysis Module
The NaN analysis module provides utilities for measuring the sparsity of data in NetCDF files by counting NaN (Not a Number) values. It was added to make data sparsity in global datasets easier to quantify, especially for land-only data on global grids.
Analysis Functions
analyze_netcdf(file_path, verbose=True, return_stats=False)
The main function for analyzing NetCDF files. It counts valid data points vs. NaN values and calculates sparsity statistics.
This function processes a NetCDF file and computes:
Total number of data points
Number of valid (non-NaN) data points
Number of NaN values
Percentage of valid data and NaN values
Variable-specific statistics
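The arithmetic behind these statistics can be sketched without reading a file. The following is a minimal illustration, not the module's actual implementation: the helper names count_nans and summarize, the dictionary keys, and the use of plain Python lists in place of arrays read from a NetCDF file are all assumptions made for this example.

```python
import math


def summarize(total, valid):
    """Build one statistics record from a total count and a valid count."""
    nan = total - valid

    def pct(n):
        # Guard against empty variables to avoid dividing by zero.
        return 100.0 * n / total if total else 0.0

    return {
        "total": total,
        "valid": valid,
        "nan": nan,
        "valid_pct": pct(valid),
        "nan_pct": pct(nan),
    }


def count_nans(variables):
    """Count valid vs. NaN values per variable and overall.

    `variables` maps a variable name to a flat list of floats; this is a
    hypothetical stand-in for the arrays stored in a NetCDF file.
    """
    per_variable = {}
    total = 0
    valid = 0
    for name, values in variables.items():
        n_total = len(values)
        n_valid = sum(1 for v in values if not math.isnan(v))
        per_variable[name] = summarize(n_total, n_valid)
        total += n_total
        valid += n_valid

    stats = summarize(total, valid)
    stats["variables"] = per_variable  # variable-specific statistics
    return stats


nan = float("nan")
stats = count_nans({
    "temperature": [1.0, nan, nan, nan],
    "pressure": [nan, 2.0, 3.0, 4.0],
})
print(stats["total"], stats["valid"], stats["nan"])
print(stats["variables"]["temperature"]["nan_pct"])
```

In practice the values would come from a NetCDF reader and the counting would normally be vectorized, but the totals and percentages are computed the same way.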
Reporting Functions
print_short_summary(stats)
Provides a concise summary of NaN statistics, including:
Percentage of valid data points
Percentage of NaN values
Warning indicators for very sparse datasets (>95% NaN)
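A summary of this kind might be assembled as shown below. This is a sketch under stated assumptions: the function name format_short_summary, the stats keys valid_pct and nan_pct, and the wording of the warning are hypothetical and may differ from what print_short_summary actually expects and prints. Only the 95% threshold comes from the description above.

```python
# Threshold above which a dataset is flagged as very sparse (from the text).
SPARSE_THRESHOLD_PCT = 95.0


def format_short_summary(stats):
    """Return a short, human-readable summary of NaN statistics.

    `stats` is assumed to be a dict with 'valid_pct' and 'nan_pct' keys;
    the real structure used by the module may differ.
    """
    lines = [
        f"Valid data: {stats['valid_pct']:.2f}%",
        f"NaN values: {stats['nan_pct']:.2f}%",
    ]
    # Add a warning only when the NaN share exceeds the threshold.
    if stats["nan_pct"] > SPARSE_THRESHOLD_PCT:
        lines.append(
            f"WARNING: very sparse dataset (>{SPARSE_THRESHOLD_PCT:.0f}% NaN)"
        )
    return "\n".join(lines)


summary = format_short_summary({"valid_pct": 0.18, "nan_pct": 99.82})
print(summary)
```

Returning the text rather than printing it directly keeps the formatting easy to test; a printing wrapper can simply call print on the result.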
This analysis is particularly valuable for land-only data on global grids, since such datasets typically contain many NaN values over ocean grid cells. In one analyzed dataset, 99.82% of the values were NaN, which is expected for land-only data on a global grid.