[enhancement] Enable array API for SVM algorithms #2209
base: main
Conversation
/intelci: run

/intelci: run

    sample_weight = _check_sample_weight(sample_weight, X)
    # oneDAL only accepts sample_weights, apply class_weight directly

    # due to the nature of how sklearn checks nu in NuSVC (by not checking
Does this perhaps have some misplaced parenthesis? Or is it perhaps missing some sentence?
I think nothing is missing.
Those are two different sentences that describe the code above (the first one) and below (the second one).
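The quoted comment above notes that oneDAL only accepts per-sample weights, so any `class_weight` setting has to be folded into `sample_weight` before the oneDAL call. A minimal numpy-only sketch of that idea (the function name and signature are illustrative, not from the PR; sklearnex uses sklearn's `_check_sample_weight` and class-weight utilities instead):

```python
import numpy as np

def fold_class_weight(y, sample_weight=None, class_weight=None):
    """Illustrative: fold a 'balanced' class_weight into per-sample weights."""
    y = np.asarray(y)
    if sample_weight is None:
        sample_weight = np.ones(y.shape[0], dtype=np.float64)
    sample_weight = np.array(sample_weight, dtype=np.float64, copy=True)
    if class_weight == "balanced":
        # same formula scikit-learn documents: n_samples / (n_classes * bincount)
        classes, counts = np.unique(y, return_counts=True)
        per_class = y.shape[0] / (classes.shape[0] * counts)
        # multiply each sample's weight by the weight of its class
        sample_weight *= per_class[np.searchsorted(classes, y)]
    return sample_weight

# three samples of class 0, one of class 1 -> the minority class is upweighted
print(fold_class_weight([0, 0, 0, 1], class_weight="balanced"))
```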
The CI issues:
Looks like it could be solved by adding an extra check for single-class data in the patching conditions. For the other issue:
It should be solvable by merging the latest master.
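A minimal sketch of the single-class guard suggested above (names are hypothetical, not from the PR): report oneDAL support for `fit` only when the target contains at least two classes, and fall back to stock scikit-learn otherwise.

```python
import numpy as np

def supports_onedal_fit(y):
    """Illustrative patching condition: oneDAL SVM fitting needs >= 2 classes."""
    n_classes = np.unique(np.asarray(y)).shape[0]
    return n_classes >= 2

print(supports_onedal_fit([0, 1, 0, 1]))  # True
print(supports_onedal_fit([1, 1, 1]))     # False
```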
Looks like these changes will be required for sklearn 1.8, as otherwise conformance tests throw errors about the 'xp' argument in some methods.

/intelci: run
There will be some changes required for sklearn 1.8 that generate merge conflicts with this PR: Perhaps they could all be incorporated here instead if it makes the merging easier.
@david-cortes-intel Ok, I will do that. Anyway, I will be fixing pre-commit issues here.
/intelci: run

/intelci: run

/intelci: run

/intelci: run
Description
This refactors and standardizes the SVM algorithms to follow other sklearnex estimators and adds zero-copy array API GPU support, while reducing the code by ~300-400 lines. This required the following changes:
- `SVMType` object from `onedal.svm`, which is not necessary for proper operation
- `__init__` signature is standardized for all onedal SVM estimators; unused kwargs are removed for oneDAL calls before use
- `gamma` keyword in the onedal python estimator defaults to `'auto'` rather than `'scale'`. It is implicitly expected that the user will calculate `gamma` before passing the value to the onedal python estimator
- `fit` in onedal SVM estimators adds a `class_count` kwarg, as oneDAL requires it to be defined beforehand. Calculating this in the onedal python estimator is scikit-learn conformance and is moved to the sklearnex estimator
- `csr_array` support is added (not just `csr_matrix`)
- `get_sklearnex_version` is removed, as it is an unnecessary aliasing of `daal_check_version`
- `sklearnex.svm._common` is renamed to `sklearnex.svm._base` to match scikit-learn
- `sklearnex.svm` files containing sklearnex classes are centralized into `sklearnex.svm._classes` to minimize duplicated code and match scikit-learn
- The `BaseSVM` sklearnex object is greatly expanded to reduce code duplication and ease maintenance. The `BaseSVC` and `BaseSVR` classes are expanded to remove code duplication.
- `_svm_sample_weight_check` replaces `_get_sample_weight`, as the central `_check_sample_weight` function in `sklearnex.utils.validation` is used instead. This function provides SVM-specific checks per class while maximally re-using available array API code in sklearnex. This should reduce maintenance
- `_compute_gamma_sigma` is moved to sklearnex to match scikit-learn and is separated for easier maintenance
- `_onedal_cpu_supported` and `_onedal_gpu_supported` use `_n_jobs_supported_onedal_methods` to define methods (not including `fit`) for oneDAL offloading checks. This reduces maintenance by making the `n_jobs`-supporting list the single central location defining oneDAL-supported methods.
- The SVM method `_validate_targets` is defined locally with an array API-compliant version for classification and regression
- The `_onedal_factory` object allows for easy future SPMD support in the SVM algos and maximal code reuse, and follows precedent in the repository. This will minimize maintenance.
- The `_save_attributes` function now takes the `xp` array namespace to properly handle onedal-to-sklearn data conversions
- `_onedal_ovr_decision_function` is partially rewritten for array API support (fancy indexing causes problems). Further performance optimization should be done given its nature
- The `enable_array_api` decorator is used on SVM sklearnex estimators based on limitations in `LabelEncoder`, `accuracy_score` and `r2_score` (sklearn > 1.5)
- `_onedal_cpu_supported` and `_onedal_gpu_supported` are modified for array API support and for reusing maximal scikit-learn functionality for minimal maintenance
- `target_offload` for `predict_proba` / probabilities
- `sklearnex.utils.class_weight` guarantees numpy support when an array supports `__array_namespace__` but does not fully implement the array API standard (i.e. the `device` attribute)
- `array_api.rst` is updated to show array API support in the documentation
- `_onedal_validate_targets` function, which replicated sklearn's `_validate_targets` but with array API support and converts data to match the X data dtype

PR should start as a draft, then move to the ready-for-review state after CI is passed and all applicable checkboxes are closed.
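Regarding the `gamma` handling above: the resolution of `'scale'` / `'auto'` now happens in the sklearnex layer before the value reaches the onedal estimator. An illustrative sketch of that resolution (not the actual sklearnex code), following scikit-learn's documented semantics of `gamma='scale'` being `1 / (n_features * X.var())` and `gamma='auto'` being `1 / n_features`:

```python
import numpy as np

def resolve_gamma(gamma, X):
    """Illustrative: turn a gamma keyword into the numeric value oneDAL needs."""
    X = np.asarray(X, dtype=np.float64)
    n_features = X.shape[1]
    if gamma == "scale":
        var = X.var()
        # scikit-learn falls back to 1.0 when the variance is zero
        return 1.0 / (n_features * var) if var != 0 else 1.0
    if gamma == "auto":
        return 1.0 / n_features
    return float(gamma)  # a numeric gamma is passed through unchanged

X = np.array([[0.0, 1.0], [2.0, 3.0]])
print(resolve_gamma("auto", X))   # 0.5
print(resolve_gamma("scale", X))
```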
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a PR with a docs update doesn't require checkboxes for performance, while a PR with any change to actual code should have checkboxes and justify how this code change is expected to affect performance (or the justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing