Datatype comparison bug 2021-12-01#90

Merged
fedshyvana merged 3 commits into mahmoodlab:master from andrew-weisman:datatype_comparison_bug-2021-12-01
Dec 2, 2021
@andrew-weisman
Contributor

Without "dtype=self.slide_data['slide_id'].dtype", read_csv() converts all-numeric columns to a numerical type. Even if we convert those columns back to objects later, we may lose zero-padding in the process, so the columns must be read in correctly from the start. When get_split_from_df() compares the individual train/val/test columns to self.slide_data['slide_id'], the comparison only works if both sides hold the same kind of value: objects (strings) will not match numbers, nor strings whose zero-padding has been lost. An example of this breaking is shown in https://github.com/andrew-weisman/clam_analysis/tree/main/datatype_comparison_bug-2021-12-01 (see the Jupyter notebook in that GitHub directory).
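
For illustration, here is a minimal sketch of the failure mode, separate from the linked notebook. The slide IDs, the in-memory CSV, and the count_matches() helper are hypothetical, and the isin() membership test is an assumption standing in for whatever comparison get_split_from_df() performs.

```python
import io

import pandas as pd

# Hypothetical slide table: IDs are strings that happen to be all digits.
slide_data = pd.DataFrame({"slide_id": ["001", "002", "010"]})

# Hypothetical splits file with train/val/test columns.
splits_csv = "train,val,test\n001,002,010\n"

# Default parsing: all-digit columns are inferred as integers,
# so "001" is read as the number 1.
inferred = pd.read_csv(io.StringIO(splits_csv))

# Passing the dtype of slide_id keeps the text exactly as written.
preserved = pd.read_csv(
    io.StringIO(splits_csv), dtype=slide_data["slide_id"].dtype
)


def count_matches(split_column):
    """Count slide IDs found in a split column (assumed isin-style check)."""
    return int(slide_data["slide_id"].isin(split_column.dropna()).sum())


inferred_matches = count_matches(inferred["train"])
preserved_matches = count_matches(preserved["train"])

# Converting back to strings afterwards does not restore the zero-padding.
roundtrip = inferred["train"].astype(str).tolist()

print(inferred_matches, preserved_matches, roundtrip)
```

With the default parsing, no slide ID matches the train column, and casting the column back to strings afterwards does not recover the original IDs. That is why the dtype has to be supplied at read time.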

@fedshyvana fedshyvana merged commit 5efe3ea into mahmoodlab:master Dec 2, 2021
@fedshyvana
Collaborator

Thanks Andrew, I did not anticipate slide IDs consisting only of numerical characters, but I suppose that is indeed possible.

@andrew-weisman
Contributor Author

andrew-weisman commented Dec 3, 2021 via email

doori pushed a commit to msk-mind/CLAM that referenced this pull request Jan 26, 2022
…son_bug-2021-12-01

Datatype comparison bug 2021-12-01