Skip to content

Conversation

@icfaust
Copy link
Contributor

@icfaust icfaust commented Dec 4, 2024

Description

This refactors and standardizes SVM algorithms to follow other sklearnex estimators and adds array API zero copy GPU support while reducing the code ~300-400 lines. This required the following changes:

  • Removed the SVMType object from onedal.svm which is not necessary for proper operation
  • The __init__ signature is standardized for all onedal SVM estimators, unused kwargs are removed for oneDAL calls before use
  • gamma keyword in onedal python estimator defaults to 'auto' rather than scale. It is implicitly expected that the user will calculate gamma before passing the value to the onedal python estimator
  • signature of fit in onedal SVM estimators add a class_count kwarg as oneDAL requires it to be defined beforehand. Calculating this in the onedal python estimator is scikit-learn conformance and is moved to the sklearnex estimator
  • All data validation is moved from the onedal estimator to the sklearnex estimator
  • support of csr_array is added (not just csr_matrix)
  • onedal SVM tests are completely rewritten to match the changes above, this includes moving some tests to the sklearnex estimator. No overall testing is lost. A refactor of the SVM testing should be done to see what is duplicating sklearn functionality and what utility the tests provide.
  • function get_sklearnex_version is removed as it is an unnecessary aliasing of daal_check_version
  • sklearnex.svm._common is renamed to sklearnex.svm._base to match scikit-learn
  • sklearnex.svm files containing sklearnex classes are centralized to sklearnex.svm._classes to minimize duplicated code and match scikit-learn
  • BaseSVM sklearnex object is greatly expanded to reduce code duplication and ease maintenance.
  • BaseSVC and BaseSVR classes are expanded to remove code duplication.
  • Function _svm_sample_weight_check replaces _get_sample_weight as _check_sample_weight central function in sklearnex.utils.validation is used instead. This function provides SVM-specific checks per class while maximally re-using available array API code in sklearnex. This should reduce maintenance
  • _compute_gamma_sigma is moved to sklearnex to match scikit-learn and is separated for easier maintenance
  • _onedal_cpu_supported and _onedal_gpu_supported use _n_jobs_supported_onedal_methods to define methods not including fit for oneDAL offloading checks. This reduces maintenance by making the n_jobs supporting list the single central location defining oneDAL supporting methods.
  • SVM method _validate_targets is defined locally with an array API-compliant version for Classification and Regression
  • Use of the _onedal_factory object allows for future easy SPMD support in the SVM algos, maximal code resuse, and follows precedence in the repository. This will minimize maintenance.
  • _save_attributes function now takes the xp array namespace to properly handle onedal data to sklearn data conversions
  • _onedal_ovr_decision_function is partially rewritten for array API support (fancy indexing issues make problems). Further performance optimization should be done given its nature
  • enable_array_api decorator used on SVM sklearnex estimators based on limitations in LabelEncoder accuracy_score and r2_score (> 1.5 sklearn)
  • _onedal_cpu_supported and _onedal_gpu_supported modified for array API support and for reusing maximal scikit-learn functionality for minimal maintenance
  • Certain sparse SVM tests deselections for 2025.0 removed
  • NuSVC-specific checks are rewritten in array API compliant manner. Likely needs a refactor to C++ code improve performance.
  • sklearnex design rule changed to allow continued support of target_offload for predict_proba/ probabilities
  • subtle change in sklearnex.utils.class_weight to guarantee numpy support when it supports __array_namespace__ but does not fully implement the array API standard (i.e. the device attribute).
  • SVM algos added to array_api.rst to show array API support in documentation
  • Removed negative weight support from SVM algos (missing check)
  • Added _onedal_validate_targets function which replicated sklearn's _validate_targets but with array API support and converts data to match X data dtype

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

@icfaust
Copy link
Contributor Author

icfaust commented Dec 4, 2024

/intelci: run

@icfaust
Copy link
Contributor Author

icfaust commented Dec 5, 2024

/intelci: run

sample_weight = _check_sample_weight(sample_weight, X)
# oneDAL only accepts sample_weights, apply class_weight directly

# due to the nature of how sklearn checks nu in NuSVC (by not checking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this perhaps have some misplaced parenthesis? Or is it perhaps missing some sentence?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nothing is missing.
Those are two different sentences that describe the code above (the first one) and below (the second one).

@david-cortes-intel
Copy link
Contributor

The CI issues:

E AssertionError: SVC failed when fitted on one label after sample_weight trimming. Error message is not explicit, it should have 'class'.

Looks like it could be solved by adding an extra check for single-class data in the patching conditions.

For the other issue:

WARNING: The candidate selected for download or install is a yanked version

It should be solvable by merging the latest master.

@david-cortes-intel
Copy link
Contributor

Looks like these changes will be required for sklearn1.8, as otherwise conformance tests throw errors about 'xp' argument in some methods.

@Vika-F
Copy link
Contributor

Vika-F commented Nov 26, 2025

/intelci: run

@david-cortes-intel
Copy link
Contributor

There will be some changes required for sklearn1.8 that generate merge conflicts with this PR:
#2801

Perhaps they could be all incorporated here instead if it makes the merging easier.

@Vika-F
Copy link
Contributor

Vika-F commented Nov 27, 2025

There will be some changes required for sklearn1.8 that generate merge conflicts with this PR: #2801

Perhaps they could be all incorporated here instead if it makes the merging easier.

@david-cortes-intel Ok, I will do that. Anyway I will be fixing pre-commit issues here.

@Vika-F
Copy link
Contributor

Vika-F commented Nov 27, 2025

/intelci: run

@Vika-F
Copy link
Contributor

Vika-F commented Nov 27, 2025

/intelci: run

@david-cortes-intel
Copy link
Contributor

/intelci: run

@david-cortes-intel
Copy link
Contributor

/intelci: run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants