Performance Optimization
This page provides guidance on optimizing lpjg2nc2’s performance for different environments and dataset sizes.
Parallelization Parameters
lpjg2nc2 uses a two-level parallelization strategy:
Outer Parallelization (
-jor--jobs) Controls how many output patterns are processed in parallel. For example, if you have multiple output types (vegc.out, soilc.out, etc.), this parameter determines how many are processed concurrently.Inner Parallelization (
--inner-jobs) Controls the number of parallel jobs for processing variables within each output pattern. This affects how many variables are processed simultaneously within a single file type.
Memory Usage Considerations
The memory usage of lpjg2nc2 is primarily affected by:
Number of parallel jobs: Higher values for
-jand--inner-jobswill increase memory usage.Chunk size: The
--chunk-sizeparameter controls how many data points are processed at once.
Memory usage can be approximated as:
Recommended Settings
Based on extensive testing, here are recommended settings for different environments:
Environment |
–jobs (-j) |
–inner-jobs |
–chunk-size |
|---|---|---|---|
Desktop (16GB RAM) |
4 |
8 |
25000 |
Workstation (32GB RAM) |
8 |
16 |
50000 |
HPC Node (128GB+ RAM) |
8 |
64 |
75000 |
Performance Benchmarks
The following benchmarks show performance improvements from the vectorized implementation compared to the original point-by-point implementation:
Dataset Size |
Original (vars/s) |
Vectorized (v/s) |
Speedup |
|---|---|---|---|
Small (~1GB) |
~1.30 |
~8.5 |
6.5× |
Medium (~10GB) |
~1.25 |
~25.5 |
20.4× |
Large (~100GB) |
~0.90 |
~18.2 |
20.2× |
Vectorization improvements are particularly significant for larger datasets. The optimizations include:
Replacing point-by-point operations with NumPy’s vectorized operations
Optimizing time index creation with more efficient data structures
Using efficient hashing for coordinate matching
Chunking large arrays to balance memory usage and performance
Profiling the Conversion Process
When running with the verbose flag (-v), lpjg2nc2 provides timing information for different stages of the conversion process:
Processing variables: 100%|█████████████████████████| 8/8 [00:46<00:00, 5.87s/it]
Processed all 8 variables, stored in dataset: ['co2', 'par', 'LW', 'precip', 'airT', 'soilT', 'surfW', 'deepW']
Creating xarray dataset with 1D coordinates...
Creating xarray dataset with complete grid expansion...
Expanding from 24521 processed points to 88838 total grid points...
Building optimized grid mapping with efficient hashing...
Created mapping in 0.75 seconds
Found 24501 matching points between processed and full grid
Expanding variables to full grid...
Preserved all variables in dataset: ['co2', 'par', 'LW', 'precip', 'airT', 'soilT', 'surfW', 'deepW']
Coordinate processing completed in 50.05 seconds
Grid expansion took 1.19 seconds
Final grid has 88838 points with 24501 data points
Expansion speed: 74525 points/sec
⏱️ Total processing time: 89.19 seconds (1.49 minutes)
This information can help identify bottlenecks and optimize settings for your specific dataset.