preprocess with PIL the full val dataset and save binary #16529
luotao1 merged 4 commits into PaddlePaddle:develop from
test=develop
test=develop
@luotao1 Please check whether DATA_DIR is OK.
Could you add the command from
SIZE_FLOAT32 = 4
SIZE_INT64 = 8

DATA_DIR = '/data/ILSVRC2012'
If I run cd build; python ../paddle/fluid/inference/tests/api/preprocess.py, where will the output data be written?
Currently it outputs to /data/ILSVRC2012/data.bin. I also wanted to ask about this: where should I put the output data.bin?
/data/ILSVRC2012/data.bin is not a good choice, since we don't have write permission for /data. How about ./data/ILSVRC2012/data.bin?
@lidanqing-intel you can put it in .cache like V1 did:
OK, I will use .cache.
@luotao1 The data.bin path will be ~/.cache/int8_full_val_bin/data.bin. Is that OK? I am worried that putting it directly in ~/.cache/ may cause some misunderstanding.
Since V1 puts its data into ~/.cache/paddle/dataset/int8/download, how about putting it into ~/.cache/paddle/dataset/int8/download/int8_full_val.bin?
Yes! Agree.
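For reference, the agreed location could be built as in the minimal sketch below. The constant names CACHE_DIR and OUTPUT_FILE and the helper ensure_cache_dir are hypothetical and only illustrate the path from this thread; they are not taken from the merged script.

```python
import os

# Assumption: directory and file name are the ones proposed in this thread.
CACHE_DIR = os.path.expanduser('~/.cache/paddle/dataset/int8/download')
OUTPUT_FILE = os.path.join(CACHE_DIR, 'int8_full_val.bin')


def ensure_cache_dir(path=CACHE_DIR):
    """Create the cache directory if it is missing and return its path."""
    os.makedirs(path, exist_ok=True)  # exist_ok requires Python 3.2+
    return path
```

Expanding ~ at run time keeps the output under the user's home directory, so no write permission on /data is needed.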
I see the unzipped dataset is not in this location.
OK, I will add the wget and unzip part. I may rename the script to full_ILSVRC2012_val.py, but since it is about preprocessing, maybe 'full_ILSVRC2012_val_preprocess.py' is better? What do you think?
I think it's OK.
Do I need to provide an option to download only 100 val images, or is downloading the full val set enough?
Downloading only the full val set is enough.
OK.
@bingyanghuang The generated file is
test=develop
Done
with open(file_list) as flist:
    lines = [line.strip() for line in flist]
    num_images = len(lines)
    if not os.path.exists(output_file):
Checking only whether the output file exists is not enough. Generating data.bin takes a long time, so the user may stop the run, or an error such as "no space left" may occur, before it finishes. Such an interruption leaves an incomplete file in the folder, and the next run of this Python script will then skip generating a new output binary file.
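One common way to avoid a leftover partial file is to write to a temporary file and rename it only after the write succeeds. The sketch below is a suggestion under assumptions, not what the PR implements: write_atomically and write_fn are hypothetical names, and os.replace requires Python 3.3 or later.

```python
import os
import tempfile


def write_atomically(output_file, write_fn):
    """Write through a temporary file so output_file is never left partial.

    write_fn is a hypothetical callback that receives an open binary file
    object and writes the full payload into it.
    """
    out_dir = os.path.dirname(os.path.abspath(output_file))
    # Keep the temporary file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            write_fn(tmp)
        os.replace(tmp_path, output_file)  # replaces the target in one step
    except BaseException:
        # Covers Ctrl-C as well as errors such as "no space left".
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, os.path.exists(output_file) is true only after a complete write, so the existing check in the script would remain valid.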
num_images = len(lines)
if not os.path.exists(output_file):
    print(
        'Preprocessing to binary file...<num_images><all images><all labels>...\n'
This print message is hard to understand.
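The message appears to describe the file layout: an image count, then all images, then all labels. The sketch below illustrates one such layout under assumptions that this thread does not confirm: the count and labels are little-endian int64, the pixels are float32 (suggested only by the SIZE_INT64 and SIZE_FLOAT32 constants), and every image has the same number of floats. write_dataset and read_label are hypothetical helpers.

```python
import struct

SIZE_FLOAT32 = 4
SIZE_INT64 = 8


def write_dataset(f, images, labels):
    """Write <num_images><all images><all labels> to a binary file object.

    images: list of equal-length lists of floats; labels: list of ints.
    Assumption: little-endian, int64 count and labels, float32 pixels.
    """
    num_images = len(images)
    f.write(struct.pack('<q', num_images))
    for img in images:
        f.write(struct.pack('<%df' % len(img), *img))
    f.write(struct.pack('<%dq' % num_images, *labels))


def read_label(f, index, num_images, floats_per_image):
    """Seek directly to one label using the fixed-size layout."""
    offset = (SIZE_INT64
              + num_images * floats_per_image * SIZE_FLOAT32
              + index * SIZE_INT64)
    f.seek(offset)
    return struct.unpack('<q', f.read(SIZE_INT64))[0]
```

Because every field has a fixed size, a reader can compute the offset of any image or label without scanning the file.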
print(
    'Preprocessing to binary file...<num_images><all images><all labels>...\n'
)
with open(output_file, "w+b") as of:
Please print a progress message every 1000 images.
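A minimal sketch of such progress reporting is shown below. convert_with_progress, process, and log are hypothetical names used for illustration; the real script's loop body is not shown in this thread.

```python
def convert_with_progress(lines, process, every=1000, log=print):
    """Call process(line) for each line, logging progress periodically.

    A message is logged after every `every` items and once at the end.
    """
    total = len(lines)
    for idx, line in enumerate(lines, start=1):
        process(line)
        if idx % every == 0 or idx == total:
            log('%d/%d images processed' % (idx, total))
```

Passing log as a parameter keeps the loop easy to test and lets the caller redirect messages to a logger instead of stdout.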
luotao1 left a comment
I will merge it first; please refine it later.
test=develop
Preprocess the full ILSVRC2012 validation set with Python PIL and save it to a binary file, to align with test_calibration INT8v1.
Provide full val data for analyzer_int8_resnet50_test in PR #16399.