Conversation

@Lakshmi-bashyam
Contributor

Replace spmatrix with sparray for SciPy compatibility

This pull request updates all references to scipy.sparse.spmatrix to use the new scipy.sparse.sparray class, in line with SciPy's ongoing deprecation of spmatrix. This change ensures compatibility with recent and future versions of SciPy.

Changes Made

  • Replaced all instances of spmatrix type checks and imports with sparray.
  • Modified test cases to make them compatible with the new change.

Related Issue

Fixes stwfsapy#89

@codecov

codecov bot commented Jul 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (cf652c4) to head (04e04fc).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master       #96   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines          943       943           
=========================================
  Hits           943       943           

@Lakshmi-bashyam Lakshmi-bashyam marked this pull request as ready for review July 29, 2025 11:22
@Lakshmi-bashyam Lakshmi-bashyam requested a review from gmmajal July 29, 2025 11:22
Contributor

@gmmajal gmmajal left a comment

Can you make the changes specified for the docstrings and clarify whether text_features needs to be modified?

def predict(self, X) -> csr_array:
    """
    Predicts binary concept match labels for each input text.
Contributor

I think in the docstring for the predict() method we can replace "A sparse matrix of shape ..." with "A sparse array of shape ..." for the sake of consistency.

    txt_vec = self.text_vectorizer_.transform([inp])
else:
    txt_vec = 0
txt_feat = self.text_features_.transform([text])[0]
Contributor

Can you explain why, here (line 433) and on line 458, we access index 0 after applying the transform method? I see that this was changed for the text_vectorizer attribute, i.e. index 0 is no longer accessed. Should it also be changed for the text_features attribute, or does that attribute return a different data structure?

Contributor

@gmmajal gmmajal Aug 13, 2025

I looked at what the transform methods do for text_vectorizer and text_features, respectively. The one for text_vectorizer returns a sparse matrix, whereas the one for text_features returns a NumPy array. So the data structures are indeed different.
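To make the difference concrete, here is a small sketch of how `[0]` behaves on the two kinds of return value. It uses stand-in data rather than the project's actual transformers, so the shapes and values are assumptions for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-ins for the output of transform() on a single input text:
# one row, three features.
dense_batch = np.array([[0.5, 0.0, 2.0]])
sparse_batch = csr_matrix(dense_batch)

# Indexing a NumPy array with [0] drops the leading axis.
dense_row = dense_batch[0]

# Indexing a csr_matrix with [0] keeps the result two-dimensional.
sparse_row = sparse_batch[0]

print(dense_row.shape, sparse_row.shape)
```

On a NumPy array the `[0]` is what turns a one-row batch into a flat feature vector, while on a `csr_matrix` it leaves a one-row matrix in place.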

Contributor Author

The text vectorizer produces a csr_matrix from scikit-learn, so we can’t switch it to a sparray at this point.

@Lakshmi-bashyam Lakshmi-bashyam marked this pull request as draft August 14, 2025 10:41
@Lakshmi-bashyam
Contributor Author

We can’t safely transition from spmatrix to the sparray hierarchy just yet. Our dependency on scikit-learn still poses compatibility risks. While scikit-learn has begun its migration toward supporting sparray, the internal transition is still in progress.

Specifically, scikit-learn’s PR #31072 (“First steps toward sparray migration pass 2”) is still open, indicating that full adoption isn’t complete yet.
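One well-known behavioral difference between the two hierarchies shows why mixing them can be risky: the `*` operator. The sketch below is a general SciPy illustration with made-up data, not a reproduction of any specific failure in this project or in scikit-learn.

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix

dense = np.array([[1, 2], [3, 4]])
m = csr_matrix(dense)
a = csr_array(dense)

# For spmatrix, * is matrix multiplication.
matrix_star = (m * m).toarray()

# For sparray, * is element-wise multiplication, as in NumPy.
array_star = (a * a).toarray()

# The @ operator means matrix multiplication in both hierarchies.
array_matmul = (a @ a).toarray()

print(matrix_star, array_star, array_matmul, sep="\n")
```

Code that uses `*` on sparse operands keeps running after a type swap but computes something different, which is why a partial migration across library boundaries calls for caution.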

@gmmajal
Contributor

gmmajal commented Aug 14, 2025

Good catch! I had a look at the pull request you referenced. As you mentioned, the scikit-learn team is still in the process of migrating. Their v1.8 milestone includes this pull request: scikit-learn/scikit-learn#31177. The release is expected around mid-November 2025; see https://github.com/scikit-learn/scikit-learn/milestone/66. We can wait until scikit-learn has a transition mechanism in place before migrating to sparray ourselves.

Development

Successfully merging this pull request may close these issues.

Migrate from spmatrix to sparray

2 participants