Init VariableImportance class #129
…t VariableImportance or NULL
Pull request overview
This PR appears to be a small maintenance/release update across the supervised-training codepaths and documentation, plus a package version bump.
Changes:
- Remove redundant `check_is_S7(hyperparameters, <Algo>Hyperparameters)` checks from several `train_` methods.
- Minor documentation/comment tweaks (tests note, LightGBM predict docs, `draw_varimp` param docs) and removal of unused `get_ranger_config()`.
- Bump package `Version` and `Date` in `DESCRIPTION`.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/testthat/test_Supervised.R | Adds context about small datasets and expected GLM warnings in tests. |
| R/train_TabNet.R | Removes redundant S7 hyperparameters check and some commented code. |
| R/train_SVM.R | Clarifies preprocessing section comments. |
| R/train_Ranger.R | Removes redundant S7 hyperparameters check. |
| R/train_LightRuleFit.R | Removes redundant S7 hyperparameters check. |
| R/train_LightRF.R | Removes redundant S7 hyperparameters check. |
| R/train_LightGBM.R | Removes redundant S7 hyperparameters check; expands predict docstring re preprocessing. |
| R/train_LightCART.R | Removes redundant S7 hyperparameters check. |
| R/train_Isotonic.R | Removes redundant S7 hyperparameters check. |
| R/train_GLMNET.R | Removes redundant S7 hyperparameters check. |
| R/train_GAM.R | Removes redundant S7 hyperparameters check. |
| R/draw_varimp.R | Documents that x can be coercible to numeric. |
| R/02_Hyperparameters.R | Removes get_ranger_config() helper. |
| man/draw_varimp.Rd | Keeps Rd docs in sync with draw_varimp() param doc. |
| DESCRIPTION | Version/date bump. |

     Version: 1.0.1
     Title: Machine Learning and Visualization
    -Date: 2026-03-14
    +Date: 2026-04-03

PR title suggests initializing a VariableImportance class, but the changes here are primarily cleanup (removing redundant check_is_S7() calls), small documentation/comment tweaks, and a version/date bump. Please either update the PR title/description to match the actual changes or include the missing VariableImportance implementation changes in this PR.
Code Review
This pull request updates the rtemis package to version 1.0.1, primarily focusing on code cleanup and documentation refinements. Key changes include the removal of redundant check_is_S7 calls across various training methods, the deletion of the unused get_ranger_config function, and updates to the documentation for draw_varimp and LightGBM prediction. Additionally, commented-out code was removed from the TabNet training script, and a note regarding test datasets was added to the test suite. Feedback highlights a discrepancy between the PR title and the actual code changes, and suggests improving consistency in documentation and section headers across similar methods.

     #' which can be used to visualize variable importance, model coefficients, etc.
     #'
    -#' @param x Numeric vector: Input.
    +#' @param x Numeric vector (or coercible to numeric): Input.


    #' @param newdata tabular data: Data to predict on. Will have been preprocessed by
    #' `predict.Supervised` before calling this method if algorithm-specific preprocessing was performed during training.

This documentation update regarding newdata preprocessing is helpful, but it has only been added to the predict_super method for lgb.Booster. For consistency and to avoid confusion, similar documentation should be added to other predict_super methods in the package (e.g., for TabNet, GAM, etc.) that follow the same preprocessing logic.
/gemini review
Code Review
This pull request standardizes variable importance handling by introducing a new VariableImportance S7 class and updating the Supervised and SupervisedRes classes accordingly. Various algorithm-specific varimp_super methods and plotting functions were updated to support this new structure, and redundant S7 type checks were removed from several training functions. Review feedback highlights potential runtime errors when subsetting dgCMatrix objects in GLMNET and LightRuleFit, a risk of indexing an empty list in SupervisedRes plots, and the need for dynamic column naming in GAM and GLM importance data to reflect the specific measure used.

    }

    # Exclude intercept
    coefs <- coefs[-1]

`coef(model)` for a `cv.glmnet` object returns a `dgCMatrix`. Subsetting it with `[-1]` to remove the intercept converts it to a numeric vector but loses the names (variable names). This will cause `names(coefs)` to be `NULL` in the subsequent `data.table` call, leading to an error. Using `[, 1]` first ensures a named vector is returned.
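
A small sketch of the subsetting behaviour described in this comment, using only the Matrix package. The matrix below is a hypothetical stand-in for the output of `coef()` on a `cv.glmnet` fit; the variable names and values are invented for illustration and are not taken from the PR.

```r
library(Matrix)

# Hypothetical stand-in for coef(<cv.glmnet fit>): a one-column sparse matrix
# whose row names are the intercept followed by the variable names.
coefs <- Matrix(
  c(0.5, 1.2, 0, -0.7),
  ncol = 1,
  sparse = TRUE,
  dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s1")
)

# Vector-style subsetting returns a plain numeric vector without names.
unnamed <- coefs[-1]
names(unnamed)

# Extracting the column first keeps the row names; then drop the intercept.
named <- coefs[, 1][-1]
names(named)
```

Comparing `names(unnamed)` with `names(named)` shows why the order of the two subsetting steps matters for the `variable` column.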

    coefs <- coefs[, 1][-1]

     #' @noRd
     method(varimp_super, LightRuleFit) <- function(model) {
    -  coef(model@model_glmnet@model)
    +  .coef <- coef(model@model_glmnet@model)

`model@model_glmnet@model` is typically a `cv.glmnet` object, so `coef()` returns a `dgCMatrix`. Calling `names()` on a matrix returns `NULL`, which will cause the `variable` column in the `data.table` to be empty or cause a length mismatch error. You should extract the first column to get a named vector and remove the intercept.

    .coef <- coef(model@model_glmnet@model)[, 1][-1]

    # ylab
    if (is.null(ylab)) {
      measure_name <- if (is.null(measure)) {
        names(x@varimp[[1L]]@data)[2L]

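
The review summary above flags a risk of indexing an empty list at `x@varimp[[1L]]`. One possible guard is sketched below in base R; `first_measure_name()` is a hypothetical helper, and plain data frames stand in for the `@data` slot of `VariableImportance` objects.

```r
# Hypothetical helper (not part of the PR): return the name of the second
# column of the first element, or NULL when the list is empty.
first_measure_name <- function(varimp_list) {
  if (length(varimp_list) == 0L) {
    # list()[[1L]] would fail with "subscript out of bounds", so return early.
    return(NULL)
  }
  names(varimp_list[[1L]])[2L]
}

# Empty input: no error, NULL is returned.
first_measure_name(list())

# Non-empty input: the second column name is returned.
first_measure_name(list(data.frame(variable = "x1", Coefficient = 1.2)))
```

Whether returning `NULL` or raising an informative error is the better choice depends on how the plotting code treats a missing `ylab`.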

    VariableImportance(
      data.table(
        variable = names(.coef),
        Coefficient = unname(.coef)
      )
    )

The column name for the importance measure is hardcoded to "Coefficient". However, this method supports multiple types of importance measures including "p-value" and "edf". Using the type argument as the column name would provide more accurate labels in plots and summaries.

    .vi_dt <- data.table(variable = names(.coef), importance = unname(.coef))
    setnames(.vi_dt, "importance", type)
    VariableImportance(.vi_dt)

    VariableImportance(
      data.table(
        variable = names(.coef),
        Coefficient = unname(.coef)
      )
    )

The column name for the importance measure is hardcoded to "Coefficient". Since this method also supports "p-value" as an importance type, it is better to use the type variable to name the column dynamically.

    .vi_dt <- data.table(variable = names(.coef), importance = unname(.coef))
    setnames(.vi_dt, "importance", type)
    VariableImportance(.vi_dt)

…t in LightRuleFit to propagate to LightGBM and GLMNET calls.
No description provided.