| Title: | Tidy Empirical Orthogonal Functions and Spatial Downscaling |
|---|---|
| Description: | An R package for conducting empirical orthogonal function (EOF) analysis in the tidyverse framework. Functions to isolate modes of variability from spatiotemporal data, run various diagnostics, and use these patterns for spatial downscaling via canonical correlation analysis. |
| Authors: | Nicolas Gauthier [aut, cre] (ORCID: <https://orcid.org/0000-0002-2225-5827>) |
| Maintainer: | Nicolas Gauthier <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-28 06:58:22 UTC |
| Source: | https://github.com/nick-gauthier/tidyeof |
Extract subset of PCs from a patterns object
## S3 method for class 'patterns'
x[i, ...]
x |
A patterns object |
i |
Index or names of PCs to keep |
... |
Additional arguments (unused) |
A patterns object with only selected PCs
Calculates area weights for spatial data using st_area(). Works uniformly for raster grids, irregular geometries, and different coordinate systems.
area_weights(dat)
dat |
A stars object with spatial dimensions |
Numeric weights (sqrt of normalized areas)
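Square-root weights are used because PCA works on squared deviations: multiplying each cell by the square root of its area makes its variance contribution proportional to the area itself. The following is a minimal base R sketch of that idea, not the package's implementation. The helper name `sqrt_area_weights` is hypothetical, and normalizing by the total area is an assumption (the package may normalize differently).

```r
# Hypothetical helper (not part of tidyeof): square-root area weights,
# normalized here so that the squared weights sum to 1.
sqrt_area_weights <- function(areas) {
  stopifnot(is.numeric(areas), all(areas > 0))
  sqrt(areas / sum(areas))
}

# On a regular lat-lon grid, cell area is roughly proportional to cos(latitude).
lat <- seq(-60, 60, by = 30)
areas <- cos(lat * pi / 180)

w <- sqrt_area_weights(areas)
print(round(w, 3))
```

Cells near the equator receive the largest weight, so high-latitude cells on a regular grid do not dominate the leading modes.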
Performs joint PCA on multiple datasets that share a spatial domain. Each dataset is anomalized with its own climatology, then concatenated along time for joint PCA. The resulting shared spatial patterns get source-specific amplitudes and climatologies, enabling CCA coupling and cross-source prediction.
common_patterns(datasets, k = 4, scale = TRUE, rotate = FALSE, monthly = FALSE, weight = TRUE, irlba_threshold = 5e+05)
datasets |
Named list of stars objects sharing the same spatial grid. Names become the source identifiers used for extraction. |
k |
Number of EOF modes to retain |
scale |
Logical, whether to scale anomalies by standard deviation (default TRUE) |
rotate |
Logical, whether to apply varimax rotation (default FALSE) |
monthly |
Logical, whether to use monthly climatology (default FALSE) |
weight |
Logical, whether to apply area weighting (default TRUE) |
irlba_threshold |
Minimum data elements to trigger IRLBA (default 500000) |
A 'common_patterns' S3 object. Source-specific patterns are extracted with '$' or '[[' using source names (e.g., 'cpat$era'). Each extracted element is a standard 'patterns' object with shared EOFs but source-specific climatology and amplitudes.
## Not run:
cpat <- common_patterns(
  list(era = era_coarse, phyda = phyda_coarse),
  k = 11, scale = TRUE
)

# Extract source-specific patterns
cpat$era    # patterns object with ERA climatology + amplitudes
cpat$phyda  # patterns object with PHYDA climatology + amplitudes

# Use with existing couple/predict workflow
fine_pat <- patterns(era_fine, k = 13)
coupled <- couple(cpat$era, fine_pat, k = 9)
predict(coupled, era_new)

# Cross-source prediction
predict(coupled, phyda_new, predictor_patterns = cpat$phyda)
## End(Not run)
This function couples predictor and response patterns using Canonical Correlation Analysis (CCA) as the primary method. CCA finds linear combinations of predictor and response patterns that maximize correlation between them.
couple(predictor_patterns, response_patterns, k = NULL, method = "cca", center = FALSE, validate = TRUE)
predictor_patterns |
A patterns object containing predictor patterns (e.g., from patterns()) |
response_patterns |
A patterns object containing response patterns (e.g., from patterns()) |
k |
Number of CCA modes to retain. If NULL, uses min(ncol(predictor), ncol(response)) |
method |
Coupling method. Currently only "cca" is supported |
center |
Logical, whether to center the data before CCA (default: FALSE) |
validate |
Logical, whether to validate input patterns compatibility |
A coupled_patterns object containing:
cca |
The CCA results from cancor() |
predictor_patterns |
The original predictor patterns |
response_patterns |
The original response patterns |
k |
Number of CCA modes retained |
method |
Coupling method used |
## Not run:
# Get patterns from your data
pred_patterns <- patterns(predictor_data, k = 5)
resp_patterns <- patterns(response_data, k = 5)

# Couple the patterns
coupled <- couple(pred_patterns, resp_patterns, k = 3)

# Make predictions
predictions <- predict(coupled, new_predictor_data)
## End(Not run)
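To make the coupling step concrete, here is a minimal base R sketch that runs `stats::cancor()` on two sets of PC amplitudes. It only illustrates the general technique. The synthetic matrices `X` and `Y` and all variable names are assumptions, and the sketch skips the area weighting, validation, and time alignment that `couple()` may perform.

```r
set.seed(1)
n <- 60
shared <- rnorm(n)  # common signal driving both fields

# Synthetic predictor (8 cells) and response (12 cells) fields, time in rows
X <- outer(shared, rnorm(8))  + matrix(rnorm(n * 8,  sd = 0.5), n, 8)
Y <- outer(shared, rnorm(12)) + matrix(rnorm(n * 12, sd = 0.5), n, 12)

# PC amplitudes (scores) for the three leading modes of each field
pc_x <- prcomp(X, center = TRUE)$x[, 1:3]
pc_y <- prcomp(Y, center = TRUE)$x[, 1:3]

# CCA on the amplitudes; scores are already centered, so no re-centering
cca <- cancor(pc_x, pc_y, xcenter = FALSE, ycenter = FALSE)

# Keep the first k canonical modes and form the canonical variates
k <- 2
cv_x <- pc_x %*% cca$xcoef[, 1:k]
cv_y <- pc_y %*% cca$ycoef[, 1:k]

print(round(cca$cor, 3))
```

The canonical correlations in `cca$cor` are returned in decreasing order, so truncating to the first `k` modes keeps the most strongly coupled pairs.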
Tests whether the k-th eigenvalue is significantly different from noise using a modified Rule N approach based on Tracy-Widom distribution.
eigen_test(lambdas, k, M, n, p = 0.05)
lambdas |
Vector of eigenvalues from PCA |
k |
Index of eigenvalue to test |
M |
Number of spatial points (grid cells) |
n |
Number of time steps |
p |
Significance level (default 0.05) |
Based on the gamma approximation to the Tracy-Widom distribution described in Cheng & Wallace (1993) and implemented following Overland & Preisendorfer (1982). Constants (shape = 46.4, scale factor = 0.186, location = 9.85) derive from fitting the gamma CDF to the Tracy-Widom Type 1 distribution.
Logical, TRUE if eigenvalue is significant at level p
Calculate anomalies from a climatological mean
get_anomalies(dat, clim = NULL, scale = FALSE, monthly = FALSE)
dat |
A stars object with dimensions (x, y, time) |
clim |
Optional climatology from get_climatology(). If NULL, computed internally |
scale |
Logical. If TRUE, divide by standard deviation |
monthly |
Logical. If TRUE, compute monthly anomalies |
A stars object with anomalies
Extract the canonical correlations and related statistics
get_canonical_correlations(object, k = NULL)
object |
A coupled_patterns object |
k |
Number of modes to return (default: all available) |
Data frame with canonical correlation statistics
Computes the spatial patterns corresponding to each canonical mode by taking linear combinations of the original EOFs weighted by CCA coefficients. These are the spatial patterns that, when projected onto the data, yield the canonical variates.
get_canonical_patterns(object, type = c("predictor", "response"), k = NULL)
object |
A coupled_patterns object |
type |
Either "predictor" or "response" |
k |
Number of canonical modes to extract (default: all available) |
A stars object with canonical spatial patterns (dimension "CV" instead of "PC")
## Not run:
coupled <- couple(pred_patterns, resp_patterns, k = 3)

# Get canonical patterns for response side
resp_canonical <- get_canonical_patterns(coupled, type = "response")
plot(resp_canonical)

# Compare to original EOFs
plot(coupled$response_patterns$eofs)
## End(Not run)
Extract canonical variables from either predictor or response patterns
get_canonical_variables(object, data, type = c("predictor", "response"), k = NULL)
object |
A coupled_patterns object |
data |
Original data (patterns object or amplitudes tibble) |
type |
Either "predictor" or "response" |
k |
Number of canonical modes to extract |
Tibble with canonical variables
Computes climatological statistics (mean and standard deviation) for a spatial field, either annually or monthly. Preserves spatial dimensions and units from the input data.
get_climatology(dat, monthly = FALSE)
dat |
A stars object containing a spatial field with dimensions (x, y, time) |
monthly |
Logical. If TRUE, computes monthly climatology. If FALSE (default), computes statistics over the entire period. |
A list with two stars objects:
mean |
Climatological mean with original spatial dimensions and units |
sd |
Climatological standard deviation with same structure |
# Create sample data
library(stars)
times <- seq(as.Date("2000-01-01"), as.Date("2002-12-31"), by = "month")
x <- seq(0, 1, length.out = 10)
y <- seq(0, 1, length.out = 10)
dat <- stars::st_as_stars(array(rnorm(10*10*36), c(10, 10, 36))) %>%
  st_set_dimensions(1, values = x, name = "x") %>%
  st_set_dimensions(2, values = y, name = "y") %>%
  st_set_dimensions(3, values = times, name = "time")

# Calculate annual climatology
clim <- get_climatology(dat)

# Calculate monthly climatology
monthly_clim <- get_climatology(dat, monthly = TRUE)
Computes pixel-wise correlations between a spatiotemporal field and each PC amplitude time series. Checks for overlapping time steps between the raster field and the PC amplitudes.
get_correlation(dat, patterns, amplitudes = NULL)
dat |
A stars object with a time dimension |
patterns |
A patterns object from patterns() |
amplitudes |
Optional amplitudes tibble (defaults to patterns$amplitudes) |
A stars object with correlation values for each PC
Computes pixel-wise correlations with FDR correction and returns significance contour lines as sf polygons.
get_fdr(dat, patterns, fdr = 0.1, amplitudes = NULL)
dat |
A stars object with a time dimension |
patterns |
A patterns object from patterns() |
fdr |
False discovery rate threshold (default 0.1) |
amplitudes |
Optional amplitudes tibble (defaults to patterns$amplitudes) |
An sf object with significance contour polygons for each PC
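The pixel-wise test with FDR control can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it assumes the Benjamini-Hochberg procedure (`p.adjust(method = "BH")`), which the documentation does not confirm, works on a plain time-by-pixel matrix rather than a stars object, and omits the conversion to contour polygons.

```r
set.seed(42)
n_time <- 100
n_pix  <- 20

amp   <- rnorm(n_time)                              # one PC amplitude series
field <- matrix(rnorm(n_time * n_pix), n_time, n_pix)
field[, 1:3] <- field[, 1:3] * 0.1 + amp            # first three pixels track the amplitude

# Pixel-wise correlation tests against the amplitude series
p_raw <- apply(field, 2, function(v) cor.test(v, amp)$p.value)

# Assumed correction: Benjamini-Hochberg false discovery rate
p_adj <- p.adjust(p_raw, method = "BH")

fdr <- 0.1
significant <- p_adj <= fdr
print(which(significant))
```

Thresholding the adjusted p-values at `fdr` bounds the expected share of false positives among the flagged pixels, which suits maps where thousands of cells are tested at once.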
This function performs Empirical Orthogonal Function (EOF) analysis on spatial-temporal data. For large datasets, it automatically uses IRLBA (Implicitly Restarted Lanczos Bidiagonalization Algorithm) for efficient computation when the irlba package is available.
patterns(dat, k = 4, scale = FALSE, rotate = FALSE, monthly = FALSE, weight = TRUE, irlba_threshold = 5e+05)
dat |
A 'stars' object containing spatial and temporal dimensions |
k |
The number of PC/EOF modes to retain |
scale |
Logical, whether to scale before PCA |
rotate |
Logical, whether to apply Varimax rotation |
monthly |
Logical, whether to use monthly climatology |
weight |
Logical, whether to apply area weighting |
irlba_threshold |
Minimum number of data elements to trigger IRLBA usage (default: 500000, i.e. 5e+05). Set to Inf to always use base prcomp(). |
A 'patterns' object containing EOFs, amplitudes, and metadata
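The core of an EOF analysis can be sketched with base `prcomp()` on a weighted anomaly matrix. This is a simplified illustration, not the internals of `patterns()`: the data are synthetic, the weights stand in for the output of `area_weights()`, and rotation, monthly climatologies, and the IRLBA branch are left out.

```r
set.seed(7)
n_time  <- 48
n_space <- 30

X <- matrix(rnorm(n_time * n_space), n_time, n_space)  # time in rows, cells in columns
w <- sqrt(runif(n_space, 0.5, 1))                      # stand-in for sqrt-area weights

anom   <- sweep(X, 2, colMeans(X))   # remove the time mean of each cell
anom_w <- sweep(anom, 2, w, `*`)     # apply the spatial weights

k   <- 4
pca <- prcomp(anom_w, center = FALSE, rank. = k)

eofs <- pca$rotation                          # n_space x k spatial patterns
amps <- pca$x                                 # n_time x k amplitude time series
var_explained <- pca$sdev^2 / sum(pca$sdev^2) # fraction of variance per mode

print(round(var_explained[1:k], 3))
```

The columns of `eofs` are orthonormal, which is what later allows new data to be projected onto them with a single matrix product.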
Shows shared EOFs on top and overlaid amplitude time series (colored by source) on the bottom.
## S3 method for class 'common_patterns'
plot(x, scale = c("standardized", "variance", "raw"), scale_y = c("fixed", "free"), overlay = NULL, overlay_color = "grey30", overlay_fill = NA, ...)
x |
A common_patterns object |
scale |
Amplitude scaling: "standardized" (default), "variance", or "raw" |
scale_y |
Y-axis scaling: "fixed" (default) or "free" |
overlay |
Optional sf object to overlay on EOF maps |
overlay_color |
Color for overlay geometry (default "grey30") |
overlay_fill |
Fill for overlay geometry (default NA) |
... |
Additional arguments (currently unused) |
A patchwork object (EOFs + amplitudes)
Provides visualization helpers for 'coupled_patterns' objects. The default 'type = "combined"' mirrors the patterns plotting workflow by displaying the predictor and response spatial patterns alongside their canonical variate time series. Additional types include canonical correlation bars, standalone canonical variate panels, canonical spatial patterns, or direct access to the underlying predictor / response pattern plots.
## S3 method for class 'coupled_patterns'
plot(x, type = c("combined", "correlations", "canonical", "canonical_patterns", "predictor", "response"), side = c("predictor", "response", "both"), data = NULL, k = NULL, scaled = FALSE, ...)
x |
A 'coupled_patterns' object. |
type |
Plot type: one of '"combined"', '"correlations"', '"canonical"', '"canonical_patterns"', '"predictor"', or '"response"'. |
side |
When 'type = "canonical"' or 'type = "canonical_patterns"', choose from the predictor side, response side, or both (values: '"predictor"', '"response"', '"both"'). Ignored for other plot types. |
data |
Optional amplitudes or patterns for the canonical variate plots. For 'side = "both"', a list with elements 'predictor' and 'response' may be supplied. Defaults to the training patterns stored in 'x' when omitted. |
k |
Number of canonical modes to display (defaults to all available). |
scaled |
Logical, passed to the underlying pattern plots when relevant (defaults to 'FALSE'). |
... |
Additional arguments forwarded to 'plot.patterns()' for 'type = "predictor"' or 'type = "response"' calls. |
A ggplot object (or a patchwork object when 'type = "combined"' or 'type = "canonical_patterns"' with 'side = "both"').
Plot method for patterns objects
## S3 method for class 'patterns'
plot(x, type = "combined", scaled = FALSE, rawdata = NULL, scale = c("standardized", "variance", "raw"), scale_y = c("fixed", "free"), events = NULL, overlay = NULL, overlay_color = "grey30", overlay_fill = NA, ...)
x |
A patterns object |
type |
Type of plot: "combined" (default), "eofs", or "amplitudes" |
scaled |
For EOFs: show correlations (TRUE) or raw loadings (FALSE) |
rawdata |
Optional raw data for correlation calculation when scaled = TRUE |
scale |
For amplitudes: scaling method ("standardized", "variance", "raw") |
scale_y |
For amplitudes: y-axis scaling ("fixed" or "free") |
events |
For amplitudes: optional dates to mark with vertical lines |
overlay |
Optional sf object to overlay on EOF maps (e.g., coastlines, boundaries) |
overlay_color |
Color for overlay geometry (default "grey30") |
overlay_fill |
Fill for overlay geometry (default NA for no fill) |
... |
Additional arguments (currently unused) |
A ggplot2 object or patchwork object for combined plots
This function makes predictions using a coupled_patterns object created by couple(). It applies the learned CCA relationship to new predictor data to predict response patterns.
## S3 method for class 'coupled_patterns'
predict(object, newdata, k = NULL, reconstruct = TRUE, predictor_patterns = NULL, ...)
object |
A coupled_patterns object from couple() |
newdata |
New predictor data (stars object) for making predictions |
k |
Number of CCA modes to use for prediction. If NULL, uses all available modes |
reconstruct |
Logical, whether to reconstruct the full spatial field (default: TRUE) |
predictor_patterns |
Optional patterns object to use instead of the one stored in the coupled object. Useful for cross-source prediction with common EOFs: the override patterns share the same EOF space but carry a different climatology. |
... |
Additional arguments (currently unused) |
If reconstruct=TRUE, returns a stars object with reconstructed spatial fields. If reconstruct=FALSE, returns a tibble with predicted amplitudes.
## Not run:
# Create coupled patterns
coupled <- couple(pred_patterns, resp_patterns, k = 3)

# Make predictions on new data
predictions <- predict(coupled, new_predictor_data)

# Just get predicted amplitudes without spatial reconstruction
amplitudes <- predict(coupled, new_predictor_data, reconstruct = FALSE)

# Cross-source prediction with common EOFs
cpat <- common_patterns(list(era = era, phyda = phyda), k = 5)
coupled <- couple(cpat$era, fine_patterns, k = 3)
predict(coupled, phyda_new, predictor_patterns = cpat$phyda)
## End(Not run)
Computes EOF patterns at maximum k once per fold, enabling cheap truncation during grid search. This is the expensive step - call it once, then use 'tune_cca()' for fast hyperparameter exploration.
prep_cv_folds(predictor, response, kfolds = 5, max_k_pred = 10, max_k_resp = 10, scale = FALSE, scale_pred = NULL, scale_resp = NULL, rotate = FALSE, monthly = FALSE, weight = TRUE, common_with = NULL)
predictor |
A stars object with predictor data |
response |
A stars object with response data |
kfolds |
Number of cross-validation folds (default 5) |
max_k_pred |
Maximum number of predictor EOFs to compute (default 10) |
max_k_resp |
Maximum number of response EOFs to compute (default 10) |
scale |
Logical, whether to scale data before EOF extraction (default FALSE). Sets the default for both predictor and response; use 'scale_pred' and/or 'scale_resp' to override individually. |
scale_pred |
Logical, whether to scale predictor data before EOF extraction. Overrides 'scale' for the predictor when not NULL (default NULL). |
scale_resp |
Logical, whether to scale response data before EOF extraction. Overrides 'scale' for the response when not NULL (default NULL). |
rotate |
Logical, whether to apply varimax rotation (default FALSE) |
monthly |
Logical, whether to compute monthly climatology (default FALSE) |
weight |
Logical, whether to apply area weighting (default TRUE) |
common_with |
Optional named list of additional stars objects to include in common EOF computation via [common_patterns()]. When provided, predictor patterns are computed jointly with these datasets. The primary predictor is included under the name '.primary'. Test times are excluded from all datasets to prevent leakage. |
A cv_folds S3 object containing:
folds |
List of fold data, each containing train patterns and test data |
max_k_pred |
Maximum predictor k used |
max_k_resp |
Maximum response k used |
kfolds |
Number of folds |
common_times |
Vector of overlapping time steps |
pattern_opts |
List of pattern extraction options |
common_with_sources |
Names of common EOF sources (if used) |
## Not run:
# Standard folds
cv <- prep_cv_folds(coarse_data, fine_data, kfolds = 5, max_k_pred = 10, max_k_resp = 10)

# With common EOFs from additional sources
cv <- prep_cv_folds(coarse_data, fine_data, common_with = list(phyda = phyda_coarse), kfolds = 5, max_k_pred = 10, max_k_resp = 10)

# Then tune over k_pred and k_resp (unchanged)
results <- tune_cca(cv, k_pred = 1:10, k_resp = 1:10)
## End(Not run)
Divides time steps into k contiguous folds for temporal cross-validation. Folds are kept contiguous to preserve temporal structure.
prep_folds(times, kfolds = 5)
times |
Vector of time values to split |
kfolds |
Number of folds (default 5) |
List of k vectors, each containing the time values for that fold
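A contiguous split of this kind can be sketched in a few lines of base R. The helper `contiguous_folds()` is hypothetical and only illustrates the idea. How `prep_folds()` distributes leftover time steps when the length is not divisible by `kfolds` is not documented here, so the rule below is an assumption.

```r
# Hypothetical helper (not the package's prep_folds()): assign each time step
# to one of kfolds consecutive blocks of near-equal size.
contiguous_folds <- function(times, kfolds = 5) {
  stopifnot(kfolds >= 1, length(times) >= kfolds)
  fold_id <- ceiling(seq_along(times) * kfolds / length(times))
  unname(split(times, fold_id))
}

times <- seq(as.Date("2000-01-01"), by = "month", length.out = 36)
folds <- contiguous_folds(times, kfolds = 5)

print(lengths(folds))
print(range(folds[[1]]))
```

Keeping each fold as one unbroken block means autocorrelated neighbours mostly stay on the same side of the train/test boundary, which random shuffling would not guarantee.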
See also: prep_cv_folds() for preparing complete CV folds with pre-computed patterns
tune_cca() for hyperparameter grid search
Print method for coupled_patterns
## S3 method for class 'coupled_patterns'
print(x, ...)
x |
A coupled_patterns object |
... |
Additional arguments (ignored) |
Print method for cv_folds objects
## S3 method for class 'cv_folds'
print(x, ...)
x |
A cv_folds object |
... |
Additional arguments (ignored) |
Print method for patterns objects
## S3 method for class 'patterns'
print(x, ...)
x |
A patterns object |
... |
Additional arguments passed to print |
This function projects new spatial-temporal data onto existing EOF patterns, returning the corresponding principal component time series. This is a core function used in pattern-based downscaling and reconstruction.
project_patterns(patterns, newdata)
patterns |
A patterns object containing EOFs, climatology, and other metadata |
newdata |
A stars object with new spatial-temporal data to project |
A tibble with time column and PC amplitude columns
## Not run:
# Get patterns from training data
pat <- patterns(training_data, k = 5)

# Project new data onto these patterns
new_amplitudes <- project_patterns(pat, new_data)
## End(Not run)
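In matrix terms, projection amounts to removing the training climatology from the new data and multiplying by the EOF matrix. The base R sketch below shows this on plain matrices. The function `project()` and the synthetic data are assumptions for illustration, and the sketch leaves out the area weighting and optional scaling that the package applies.

```r
set.seed(3)
train <- matrix(rnorm(40 * 12), 40, 12)  # 40 training time steps, 12 cells
new   <- matrix(rnorm(10 * 12), 10, 12)  # 10 new time steps, same cells

clim <- colMeans(train)                  # training climatology
pca  <- prcomp(train, center = TRUE, rank. = 3)

# Hypothetical helper: anomalies relative to the training climatology,
# projected onto the (orthonormal) EOFs
project <- function(dat, eofs, clim) sweep(dat, 2, clim) %*% eofs

new_amps   <- project(new,   pca$rotation, clim)
train_amps <- project(train, pca$rotation, clim)

print(dim(new_amps))
```

Projecting the training data reproduces the original PC scores, which is a quick sanity check that the climatology and the EOFs are being applied consistently.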
Convenience function to project multiple datasets onto the same patterns
project_patterns_multiple(patterns, data_list, names = NULL)
patterns |
A patterns object |
data_list |
List of stars objects to project |
names |
Optional names for the datasets |
List of projected amplitude tibbles
Converts PC amplitudes back into a full spatial-temporal field by multiplying by the EOF patterns and adding back the climatology.
reconstruct(target_patterns, amplitudes = NULL)
target_patterns |
A patterns object containing EOFs and climatology |
amplitudes |
Amplitudes to use for reconstruction. Can be: - NULL (default): uses original amplitudes from target_patterns - tibble: with time column and PC columns - stars object: will be projected onto patterns first |
For bounded variables like precipitation, the reconstructed field may contain small negative values due to EOF truncation. To clamp these, use 'mutate(result, across(everything(), ~pmax(.x, 0 * .x)))' (the '0 * .x' trick preserves units).
A stars object with reconstructed spatial-temporal data
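The reconstruction step, and the reason truncation can produce small negative values, can be sketched on plain matrices. This is an illustration under simplifying assumptions (no area weights, no scaling, no units), not the package's `reconstruct()`. The helper `reconstruct_k()` is hypothetical.

```r
set.seed(11)
# Skewed, non-negative "precipitation-like" data: 50 time steps x 15 cells
precip <- matrix(rgamma(50 * 15, shape = 0.5), 50, 15)

clim <- colMeans(precip)
pca  <- prcomp(precip, center = TRUE)

# Hypothetical helper: amplitudes times transposed EOFs, plus the climatology
reconstruct_k <- function(k) {
  field <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  sweep(field, 2, clim, `+`)
}

rec_full  <- reconstruct_k(15)  # all modes: recovers the input
rec_trunc <- reconstruct_k(3)   # truncated: smooth approximation

# Truncation is not constrained to stay non-negative, so clamp at zero
rec_clamped <- pmax(rec_trunc, 0)

print(sum(rec_trunc < 0))
```

With all modes retained the input is recovered to machine precision. With fewer modes the low-rank approximation may dip below zero, which the clamp removes.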
Reverses the operation of get_anomalies(), adding the climatological mean (and optionally multiplying by standard deviation) back to anomaly fields.
restore_climatology(anomalies, clim, scale = FALSE, monthly = FALSE)
anomalies |
A stars object containing anomalies (from get_anomalies()) |
clim |
A climatology list with mean and sd elements (from get_climatology()) |
scale |
Logical. If TRUE, multiply by standard deviation before adding mean (use when anomalies were standardized) |
monthly |
Logical. If TRUE, restore using monthly climatology |
A stars object with the original field restored
## Not run:
clim <- get_climatology(dat)
anom <- get_anomalies(dat, clim)
restored <- restore_climatology(anom, clim)
## End(Not run)
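The round trip between anomalies and the original field can be checked on a plain matrix. The sketch below mimics the `monthly = TRUE, scale = TRUE` case in base R. It is an illustration only: the variable names and synthetic data are assumptions, and the package operates on stars objects rather than matrices.

```r
set.seed(5)
months <- rep(1:12, times = 3)                          # three years of monthly data
dat <- matrix(rnorm(36 * 4, mean = 10, sd = 2), 36, 4)  # 36 time steps x 4 cells

# Monthly climatology: one row per calendar month, one column per cell
clim_mean <- apply(dat, 2, function(v) tapply(v, months, mean))
clim_sd   <- apply(dat, 2, function(v) tapply(v, months, sd))

# Standardized monthly anomalies
anom <- (dat - clim_mean[months, ]) / clim_sd[months, ]

# Inverse operation: multiply by the sd first, then add the mean back
restored <- anom * clim_sd[months, ] + clim_mean[months, ]

print(max(abs(restored - dat)))
```

The order matters on the way back: the standard deviation is applied before the mean is added, mirroring the order in which they were removed.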
Scree plot for EOF patterns
## S3 method for class 'patterns'
screeplot(x, k = NULL, kmax = 10, rule_n = FALSE, ...)
x |
A patterns object from patterns() |
k |
Optional number of components to highlight with vertical line |
kmax |
Maximum number of components to show (default 10) |
rule_n |
Logical, whether to show the modified Rule N significance cutoff as a dashed blue line (default FALSE) |
... |
Additional arguments (currently unused) |
A ggplot object
## Not run:
pat <- patterns(data, k = 5)
screeplot(pat)
screeplot(pat, k = 3, kmax = 8)
screeplot(pat, rule_n = TRUE)  # show significance cutoff
## End(Not run)
Aggregates results across folds and identifies best hyperparameters.
summarize_cv(cv_results, metric = "rmse", minimize = TRUE)
cv_results |
A tibble from 'tune_cca()' |
metric |
Which metric to optimize (default "rmse") |
minimize |
Logical, whether to minimize (TRUE for RMSE) or maximize (FALSE for correlations). Default TRUE. |
A tibble with mean and sd of each metric per parameter combination, sorted by the target metric. Best parameters are attached as attribute "best_params".
## Not run:
results <- tune_cca(cv, k_pred = 1:5, k_resp = 1:5)
summary <- summarize_cv(results, metric = "rmse", minimize = TRUE)

# Get best parameters (k_pred, k_resp, k_cca)
attr(summary, "best_params")
## End(Not run)
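The aggregation itself is a group-wise mean and standard deviation followed by a sort. A minimal base R sketch on a hand-made results table is shown below. The table values are invented for illustration, and the column names `rmse.mean` and `rmse.sd` come from the base R idiom used here, not from the package's output.

```r
# Hand-made per-fold results for two parameter combinations (illustrative values)
cv_results <- data.frame(
  k_pred = rep(c(2, 4), each = 3),
  k_resp = rep(c(2, 3), each = 3),
  fold   = rep(1:3, times = 2),
  rmse   = c(1.0, 1.2, 1.1, 0.8, 0.9, 1.0)
)

# Mean and sd of the metric per parameter combination
agg <- aggregate(rmse ~ k_pred + k_resp, data = cv_results,
                 FUN = function(v) c(mean = mean(v), sd = sd(v)))
agg <- do.call(data.frame, agg)  # flatten the matrix column to rmse.mean / rmse.sd

# minimize = TRUE: smallest mean RMSE first
agg  <- agg[order(agg$rmse.mean), ]
best <- agg[1, c("k_pred", "k_resp")]

print(agg)
```

For a metric that should be maximized, such as a correlation, the sort would use `order(..., decreasing = TRUE)` instead.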
Aggregates results across folds and identifies optimal k.
summarize_eof_cv(cv_results, metric = "rmse", minimize = TRUE)
cv_results |
A tibble from 'tune_eof()' |
metric |
Which metric to optimize (default "rmse") |
minimize |
Logical, whether to minimize (TRUE for RMSE) or maximize (FALSE for correlations). Default TRUE. |
A tibble with mean and sd of each metric per k value, sorted by the target metric. Best k is attached as attribute "best_k".
Summary method for coupled_patterns
## S3 method for class 'coupled_patterns'
summary(object, ...)
object |
A coupled_patterns object |
... |
Additional arguments (ignored) |
Performs grid search over k_pred, k_resp, and optionally k_cca using precomputed patterns from 'prep_cv_folds()'. Pattern truncation is cheap, so this runs quickly even with large grids.
tune_cca(cv_folds, k_pred = 1:10, k_resp = 1:10, k_cca = NULL, metrics = c("rmse", "cor_spatial", "cor_temporal"), parallel = FALSE)
cv_folds |
A cv_folds object from 'prep_cv_folds()' |
k_pred |
Vector of predictor EOF counts to try (default 1:10) |
k_resp |
Vector of response EOF counts to try (default 1:10) |
k_cca |
Vector of CCA mode counts to try, or NULL (default) to use 'min(k_pred, k_resp)' for each combination. Using fewer CCA modes than the maximum can act as regularization. |
metrics |
Character vector of metrics to compute. Options: "rmse", "cor_spatial", "cor_temporal" (default: all three) |
parallel |
Logical, whether to use furrr for parallel execution (default FALSE) |
A tibble with columns: k_pred, k_resp, k_cca, fold, and one column per metric requested.
## Not run:
cv <- prep_cv_folds(coarse_data, fine_data, kfolds = 5, max_k_pred = 10, max_k_resp = 10)

# Grid search over k_pred and k_resp (k_cca = min automatically)
results <- tune_cca(cv, k_pred = 1:10, k_resp = 1:10)

# Also tune k_cca for regularization
results <- tune_cca(cv, k_pred = 1:10, k_resp = 1:10, k_cca = 1:5)

# Summarize and find best params
summary <- summarize_cv(results, metric = "rmse")
## End(Not run)
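The shape of the search grid can be sketched with `expand.grid()`. The first grid follows the documented default, `k_cca = min(k_pred, k_resp)`. The second shows one plausible way to handle an explicit `k_cca` vector: dropping combinations where `k_cca` exceeds `min(k_pred, k_resp)` is an assumption, and `tune_cca()` may treat such combinations differently.

```r
# Default: one CCA truncation per (k_pred, k_resp) pair
grid <- expand.grid(k_pred = 1:10, k_resp = 1:10)
grid$k_cca <- pmin(grid$k_pred, grid$k_resp)

# Explicit k_cca values: full cross product, then keep only feasible rows
# (assumed rule: k_cca cannot exceed the smaller of k_pred and k_resp)
full <- expand.grid(k_pred = 1:10, k_resp = 1:10, k_cca = 1:5)
full <- full[full$k_cca <= pmin(full$k_pred, full$k_resp), ]

print(nrow(grid))
print(nrow(full))
```

Each row is then evaluated once per fold, so the total number of CCA fits is the number of grid rows times `kfolds`.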
Evaluates reconstruction skill for different numbers of EOFs using k-fold cross-validation. For each fold, EOFs are fit on training data, test data is projected onto those EOFs, and the reconstruction is compared to the original test data.
tune_eof(data, k = 1:10, kfolds = 5, max_k = max(k), metrics = c("rmse", "cor_spatial", "cor_temporal"), scale = FALSE, monthly = FALSE, weight = TRUE)
data |
A stars object with spatial-temporal data |
k |
Vector of EOF counts to evaluate (default 1:10) |
kfolds |
Number of cross-validation folds (default 5) |
max_k |
Maximum EOFs to compute per fold (default max(k)) |
metrics |
Character vector of metrics to compute. Options: "rmse", "cor_spatial", "cor_temporal" (default: all three) |
scale |
Logical, whether to scale data before EOF extraction (default FALSE) |
monthly |
Logical, whether to compute monthly climatology (default FALSE) |
weight |
Logical, whether to apply area weighting (default TRUE) |
A tibble with columns: k, fold, and one column per metric.
## Not run:
# Find optimal k for precipitation field
results <- tune_eof(precip_data, k = 1:15, kfolds = 5)
summary <- summarize_eof_cv(results, metric = "rmse")

# Plot reconstruction skill vs k
library(dplyr)    # for %>%, group_by(), summarize()
library(ggplot2)
results %>%
  group_by(k) %>%
  summarize(rmse = mean(rmse)) %>%
  ggplot(aes(k, rmse)) +
  geom_line() +
  geom_point()
## End(Not run)
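The cross-validation loop described above (fit EOFs on training folds, project the held-out fold, compare the reconstruction) can be sketched in base R. This is a simplified illustration on a synthetic matrix, not the package's `tune_eof()`: only RMSE is computed, the RMSE is taken on anomalies relative to the training climatology, and weighting and scaling are omitted.

```r
set.seed(9)
n_time  <- 60
n_space <- 10

# Synthetic field: one smooth mode plus noise
signal <- outer(sin(seq_len(n_time) / 5), rnorm(n_space))
dat <- signal + matrix(rnorm(n_time * n_space, sd = 0.3), n_time, n_space)

kfolds  <- 5
ks      <- 1:6
fold_id <- ceiling(seq_len(n_time) * kfolds / n_time)  # contiguous folds
rmse <- matrix(NA_real_, kfolds, length(ks), dimnames = list(NULL, paste0("k", ks)))

for (f in seq_len(kfolds)) {
  train <- dat[fold_id != f, , drop = FALSE]
  test  <- dat[fold_id == f, , drop = FALSE]

  # Fit once at the largest k, then truncate for each candidate k
  pca <- prcomp(train, center = TRUE, rank. = max(ks))
  test_anom <- sweep(test, 2, colMeans(train))

  for (j in seq_along(ks)) {
    E <- pca$rotation[, seq_len(ks[j]), drop = FALSE]
    recon <- test_anom %*% E %*% t(E)   # project onto EOFs, then reconstruct
    rmse[f, j] <- sqrt(mean((test_anom - recon)^2))
  }
}

mean_rmse <- colMeans(rmse)
print(round(mean_rmse, 3))
```

Because the EOFs are nested and orthonormal, this pure reconstruction error cannot increase with `k`, so in practice the useful signal is where the curve flattens rather than where it reaches its minimum.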