Skip to content

Conversation

@uridah
Copy link
Contributor

@uridah uridah commented Mar 10, 2017

In reference to #59
cc: @mjpieters

@soumith
Copy link
Contributor

soumith commented Mar 10, 2017

@uridah do you have a notebook that checks for sanity of this code, like the notebook that loads some CIFAR10 images and displays them:
https://github.com/pytorch/vision/blob/master/test/sanity_checks.ipynb

@uridah
Copy link
Contributor Author

uridah commented Mar 10, 2017

@soumith I don't but I can create one.

@mjpieters
Copy link

I don't know if PyTorch follows PEP8 as a styleguide? If so, you way want to run flake8 on your code and fix the issues that finds; I see a few whitespace issues in it. :-) If not, I'll leave it to the project owners to set the requirements.

@soumith
Copy link
Contributor

soumith commented Mar 10, 2017

thanks @mjpieters . We have a LINT check as part of the contbuild, and it is failing:

https://travis-ci.org/pytorch/vision/builds/209884377

Locally doing:

pip install flake8
flake8 .

will show exact LINT errors you need to fix @uridah

raise RuntimeError('Dataset not found or corrupted.' +
' You can use download=True to download it')

self.train_data = []

This comment was marked as off-topic.

self.dataset = dataset # training set or test set or extra set

# download and load the data
if self.dataset=='train':

This comment was marked as off-topic.


def __len__(self):
if self.dataset == 'train':
return 73257

This comment was marked as off-topic.

self.root = root
self.transform = transform
self.target_transform = target_transform
self.dataset = dataset # training set or test set or extra set

This comment was marked as off-topic.

@fmassa
Copy link
Member

fmassa commented Mar 10, 2017

@uridah I made some small inline comments (very minor and are only style changes). The PR looks good, thanks!

uridah added 2 commits March 14, 2017 03:04
- now using dictionary for urls, filenames and md5s
- updated len function
- renamed 'dataset' keyword to split
- fixed whitespaces using flake8
@uridah
Copy link
Contributor Author

uridah commented Mar 13, 2017

@fmassa Your comments very very useful and really helped concise the code
@soumith I added a new notebook named sanity_checks1.ipynb which is basically same as the sanity_checks.ipynb notebook, but in addition, is also calling SVHN datasets for transformations https://github.com/uridah/vision/blob/master/test/sanity_checks1.ipynb
Also, ran the code through flake8
@mjpieters

# reading(loading) mat file as array
loaded_mat = sio.loadmat(os.path.join(root, self.filename))

if self.split != 'test':

This comment was marked as off-topic.

self.test_labels = loaded_mat['y']
self.test_data = np.transpose(self.test_data, (3, 2, 1, 0))
else:
print ("Wrong dataset entered! Please use split=train or split=extra or split=test")

This comment was marked as off-topic.

self.target_transform = target_transform
self.split = split # training set or test set or extra set

if self.split in self.split_list:

This comment was marked as off-topic.

print ("Wrong dataset entered! Please use split=train or split=extra or split=test")

def __getitem__(self, index):
if self.split == 'train' or self.split == 'extra':

This comment was marked as off-topic.

return img, target

def __len__(self):
return len(self.train_data)

This comment was marked as off-topic.

@fmassa
Copy link
Member

fmassa commented Mar 13, 2017

Hi @uridah
I made some more comments. The code is looking very nice!

@uridah
Copy link
Contributor Author

uridah commented Mar 13, 2017

Thanks @fmassa. Updated according to your suggestions


if self.split not in self.split_list:
raise ValueError('Wrong split entered! Please use split=train or split=extra or split=test')
else:

This comment was marked as off-topic.

self.download()

if not self._check_integrity():
raise RuntimeError('Dataset not found or corrupted.' +

This comment was marked as off-topic.

@fmassa
Copy link
Member

fmassa commented Mar 13, 2017

About the check_sanity1.pynb file, maybe you could integrate your changes into the check_sanity.pynb file? What do you think?

@uridah
Copy link
Contributor Author

uridah commented Mar 13, 2017

I will, if you guys think it's good enough.

@fmassa
Copy link
Member

fmassa commented Mar 13, 2017

@uridah another thing, it would be great if you could add an entry in the doc in README.rst, like the one in Cifar10 for example. Maybe an example in the doc would be enough?
cc @soumith

@uridah
Copy link
Contributor Author

uridah commented Mar 15, 2017

here is the pull request for the change in documentation: #104
@fmassa

@fmassa
Copy link
Member

fmassa commented Mar 15, 2017

Hi @uridah
So, looking at the notebook that you added, I have the impression that the numbers are rotated. Maybe the x and y axis in the dataset are inverted, and you need to transpose them?
Also, could you please fix the indentation in the line I commented?
Once those are fixed, I think there is no need for the notebook (but @soumith knows better), so could you please remove it?
After that, the PR is good to be merged!

@uridah
Copy link
Contributor Author

uridah commented Mar 16, 2017

@fmassa as it turns out I needed it to transpose along 3,2,0,1 axis instead of 3,2,1,0. I have fixed that and updated the indentation and sanity_checks1.ipynb. Please have a look and let me know if anything else needs to be changed.
Also, I created a separate request to update the documentation but now I have merged it with this one.

@soumith soumith merged commit c80b3ae into pytorch:master Mar 16, 2017
@soumith
Copy link
Contributor

soumith commented Mar 16, 2017

Thanks Uridah, as you saw the last 4 commits, I made some minor changes to your PR. But it looked great. Merged into master now!!!

soumith pushed a commit that referenced this pull request Mar 16, 2017
@soumith soumith mentioned this pull request Mar 17, 2017
@vabh
Copy link
Contributor

vabh commented Jul 3, 2017

Hello,

It seems that the labels returned are in the range 1-10 (the data set assigns class 10 to the digit 0 and class d for all other digits, d [ http://ufldl.stanford.edu/housenumbers/ see section overview ]).

Given that some of the loss functions (CELoss, NLLLoss) expect the class labels to be in the range [0, C-1], shouldn't that be reflected here as well?

Another thing is that the the returned labels are of type ByteTensor and of size batchSize x 1. Again, CELoss, etc, expect it to be a 1d LongTensor.

I can make a PR if the current behaviour is inconsistent and should be made similar to the other data sets, for example as in CIFAR10

@soumith
Copy link
Contributor

soumith commented Jul 3, 2017

@vabh if you could make a PR to reflect this, that'd be great. Thanks.

@vabh
Copy link
Contributor

vabh commented Jul 3, 2017

@soumith I made one here: #194

rajveerb pushed a commit to rajveerb/vision that referenced this pull request Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants