-
Notifications
You must be signed in to change notification settings - Fork 3
Change sparse matrix to array #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #96 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 943 943
=========================================
Hits 943 943 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
gmmajal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the changes specified for the doctrings and further clarify if text_features needs to be modified?
| def predict(self, X) -> csr_array: | ||
| """ | ||
| Predicts binary concept match labels for each input text. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the docstrings for the predict() method we can replace "A sparse matrix of shape ..." with a sparse array for the sake of consistency.
| txt_vec = self.text_vectorizer_.transform([inp]) | ||
| else: | ||
| txt_vec = 0 | ||
| txt_feat = self.text_features_.transform([text])[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why over here(line 433) and in line 458, when the transform method is applied we access index 0. I see that it was modified for the text_vectorizer attribute i.e. index 0 is not accessed. Should it be also modified for the text_features attribute or does it have a different data structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at what the transform methods are doing for text_vectorizer and text_features, respectively. The one for text_vectorizer returns a sparse matrix whereas for text_features a numpy array is returned. The data structure does indeed seem to be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text vectorizer produces a csr_matrix from scikit-learn, so we can’t switch it to a sparray at this point.
|
We can’t safely transition from Specifically, scikit-learn’s PR #31072 (“First steps toward sparray migration pass 2”) is still open, indicating that full adoption isn’t complete yet. |
Good catch! I had a look at the pull request you referenced. The team at |
Replace
spmatrixwithsparrayfor SciPy compatibilityThis pull request updates all references to
scipy.sparse.spmatrixto use the newscipy.sparse.sparrayclass, in line with SciPy's ongoing deprecation ofspmatrix. This change ensures compatibility with recent and future versions of SciPy.Changes Made
spmatrixtype checks and imports withsparray.Related Issue
Fixes stwfsapy#89