test: replace fetch_openml() with local data make_classification() #430
shamykyzer wants to merge 6 commits into main
|
Hello @rpreen, could you please have a look at this?
Can I widen the tolerances or use range checks, since this test verifies the factory pipeline rather than model performance?
Also, I believe the problem with the AdaBoost test is a pre-existing bug. Can you confirm that I am not misunderstanding? Thanks |
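To make the two options in the question concrete, here is a minimal sketch of a tolerance check next to a range check. The helper names and the metric values are hypothetical and are not taken from the repository's tests.

```python
import math


def within_tolerance(value: float, expected: float, abs_tol: float = 0.05) -> bool:
    """Tolerance check: value must lie within abs_tol of a pinned expected value."""
    return math.isclose(value, expected, abs_tol=abs_tol)


def within_range(value: float, low: float = 0.0, high: float = 1.0) -> bool:
    """Range check: value only has to be a plausible metric, e.g. a probability."""
    return low <= value <= high


# Hypothetical metric produced by a pipeline run (illustrative value only).
observed = 0.87

# A range check only confirms that the pipeline produced a sane number.
assert within_range(observed)

# A tolerance check still pins the result to an expected value.
assert within_tolerance(observed, expected=0.85, abs_tol=0.05)
```

A range check fits a test that only needs to show the pipeline runs end to end, while a tolerance check also guards against silent changes in behaviour.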
|
If a random seed is set, does the expected value in the test need its tolerance widened, or just a slight adjustment? If a slight tweak is needed because the test now uses synthetic data, that seems fine to me. The AdaBoost test should also have the seed set (they all should, really). It looks like the disclosive and non-disclosive tests were originally in a single function that set the seed to 42 (in fact, I explicitly added the seed in #318 because its absence was causing problems). In #367 that function was split into two, the random seed was removed, and some other parameters were changed, which may have altered the behaviour. Perhaps you can use the original as a guide. @jim-smith will know more than me, as I didn't work on that test. |
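As an illustration of why the seed matters, the sketch below fixes `random_state` for both the synthetic data and an AdaBoost model, so that repeated runs give the same predictions. The function name, dataset sizes, and model parameters are assumptions for illustration, not the values used in the actual test.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

SEED = 42  # the seed value mentioned in the discussion


def fit_and_predict(seed: int) -> np.ndarray:
    """Generate synthetic data and fit AdaBoost, both driven by one seed."""
    X, y = make_classification(
        n_samples=200,
        n_features=8,
        n_informative=4,
        random_state=seed,  # fixes the generated data
    )
    model = AdaBoostClassifier(n_estimators=10, random_state=seed)  # fixes the fit
    model.fit(X, y)
    return model.predict_proba(X)


first = fit_and_predict(SEED)
second = fit_and_predict(SEED)

# With the seed fixed in both places, repeated runs agree.
assert np.allclose(first, second)
```

With both sources of randomness pinned, an expected value can be checked with a tight tolerance instead of a widened one.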
|
Is there some way to avoid duplicating the data creation code? |
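One possible way to remove the duplication is a single helper that every test calls. This is a sketch only: the name `make_test_data` and its defaults are hypothetical and are not taken from the PR.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def make_test_data(n_samples: int = 100, n_features: int = 6, seed: int = 42):
    """Hypothetical shared helper: build a seeded synthetic train/test split."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=3,
        n_redundant=0,
        random_state=seed,
    )
    # Stratify so both splits keep the class balance of the generated data.
    return train_test_split(X, y, test_size=0.3, random_state=seed, stratify=y)


X_train, X_test, y_train, y_test = make_test_data()
```

Such a helper could be wrapped in a pytest fixture in a shared `conftest.py`, so individual tests request the data by name instead of rebuilding it.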
Replaces fetch_openml() calls in test fixtures with locally generated data using sklearn.datasets.make_classification() to avoid CI failures from network issues.
Closes [Tests] Replace network calls to fetch data from OpenML with local data #410