
fix dataset reading and add support for full dataset#16559

Merged
luotao1 merged 13 commits intoPaddlePaddle:developfrom
wojtuss:wojtuss/fix-int8-test
Apr 2, 2019
Conversation

@wojtuss wojtuss commented Mar 29, 2019

In this patch, we fix a bug in reading dataset and add support for reading a whole imagenet dataset preprocessed using the tool from #16529.

Most changes come from the diff between #16532 and merged #16399.

Additional methods are added to tester_helper.h. They could be refactored to reuse existing methods, but that would require more changes in other tests, so it is left for further refactoring.

test=develop

"depthwise_conv_mkldnn_pass", "conv_bn_fuse_pass",
"conv_eltwiseadd_bn_fuse_pass", "conv_bias_mkldnn_fuse_pass",
"conv_elementwise_add_mkldnn_fuse_pass", "conv_relu_mkldnn_fuse_pass",
"fc_fuse_pass", "is_test_pass"});
Contributor

Will the SetPasses fix be in another PR?

Author

yes

Author

@wojtuss wojtuss Apr 1, 2019

@luotao1 , it turned out that no special fix is required for this and the call to SetPasses() can simply be removed now. Some other modifications must have fixed the accuracy problem. Of course, the repeated-passes problem would still benefit from a cleanup, but it is not critical here. We will prepare a PR with a cleanup later.

Contributor

We will prepare a PR with a cleanup later.

Got it. Does the new PR fix #16559 (comment) as well?

Author

@wojtuss wojtuss Apr 1, 2019

I will adjust the fps calculation and remove the redundant PredictionRun method in this PR. (WIP)

Author

Done.

@@ -148,7 +151,7 @@ inference_analysis_api_test_with_fake_data(test_analyzer_mobilenet_depthwise_con
if(WITH_MKLDNN)
set(INT8_DATA_DIR "${INFERENCE_DEMO_INSTALL_DIR}/int8")
Contributor

Please change INT8_DATA_DIR if you change the data. Otherwise, since INT8_DATA_DIR already exists on CI, it will cause an error.

Author

@wojtuss wojtuss Mar 29, 2019

With this patch, we keep the data.bin file name but change the archive name to imagenet_val_100_tail.tar.gz, so there should be no conflict of archive names. Or might there be a conflict over data.bin between several builds?

Contributor

If INT8_DATA_DIR already exists on CI, it will not download imagenet_val_100_tail.tar.gz.

Author

Do you mean that several CI builds share the same dataset directory?

Contributor

Our CI caches INFERENCE_DEMO_INSTALL_DIR for speedup.

Contributor

set(INT8_DATA_DIR "${INFERENCE_DEMO_INSTALL_DIR}/int8v2"), or another name.

Author

Done.

DEFINE_int32(iterations, 0, "number of batches to process"); // setting to 0,
// means process
// the whole
// dataset
Contributor

The comment on lines 46-49 should be on one line.

Author

Done.
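The intended semantics of the flag (0 means "process the whole dataset", a positive value caps the number of batches) can be sketched with a small helper. This is an illustrative reconstruction with names of our choosing, not the PR's actual code:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper (names are ours, not the PR's): an iterations value of 0
// means "process the whole dataset"; a positive value caps the batch count.
size_t EffectiveIterations(int flag_iterations, size_t dataset_batches) {
  if (flag_iterations > 0 &&
      static_cast<size_t>(flag_iterations) < dataset_batches) {
    return static_cast<size_t>(flag_iterations);
  }
  return dataset_batches;
}
```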

}
}

// With support for multiple batches (multiple outputs)
Contributor

I wonder why you rewrote this again. Could you instead enhance line 331 from predictor->Run(inputs[i], outputs, batch_size); to predictor->Run(inputs[i], &(*outputs)[i], FLAGS_batch_size);?

Author

The original PredictionRun method discards the outputs of all but the last iteration, and outputs is of type std::vector<PaddleTensor> *. To calculate average accuracy over all iterations we have to keep all the output data and make outputs of type std::vector<std::vector<PaddleTensor>> *. I could modify just the original function, but that would also require updates in several other test files. I thought it would be refactored later, but I will do it here if you wish.

Contributor

It's better to refactor here, thanks very much!

Author

@luotao1 , I realized that refactoring would require modification of the latency calculation formula. This could influence some latency statistics of other tests. I am not sure whether these statistics are being gathered after running the tests and whether changing them could break anything beyond the tests. I cannot verify that quickly enough so I left the refactoring for later.

Contributor

I wonder why refactoring requires modification of the latency calculation formula?

Yours:

  auto latency = elapsed_time / (iterations * num_times * FLAGS_batch_size);
  PrintTime(FLAGS_batch_size, num_times, FLAGS_paddle_num_threads, 0, latency, 1);

Ours:

  PrintTime(batch_size, num_times, num_threads, tid, elapsed_time / num_times, inputs.size());

Why don't you use ours?

Author

With the current implementation the fps value would be incorrect. fps is 1/latency, so latency must refer to a single frame (sample), not to a whole batch. The current formula does not handle that correctly.
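The per-sample convention can be made concrete. Assuming times are in milliseconds, fps follows as 1000/latency; the helper names below are ours for illustration, not the actual PrintTime code:

```cpp
#include <cassert>
#include <cmath>

// Per-sample latency in ms: total elapsed time divided by the total number of
// processed samples (batches * repeats * samples per batch); fps then follows
// as 1000 / latency. Helper names are illustrative, not the PR's code.
double SampleLatencyMs(double elapsed_ms, int iterations, int num_times,
                       int batch_size) {
  return elapsed_ms /
         (static_cast<double>(iterations) * num_times * batch_size);
}

double Fps(double sample_latency_ms) { return 1000.0 / sample_latency_ms; }
```

With 50000 batches of size 1 this reproduces the sample latency and fps pairs seen in the logs below (e.g. 13.2465 ms per sample giving roughly 75.49 fps).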

Contributor

If the current implementation's fps value is incorrect, you can correct it directly.

Author

Done.

Wojciech Uss added 2 commits April 1, 2019 02:05
<< "ms ======";
}
LOG(INFO) << "====== batch_size: " << batch_size << ", iterations: " << epoch
<< ", repetitons: " << repeat << " ======";
Contributor

repetitions

Author

Done.

LOG(INFO) << "FP32 & INT8 prediction run: batch_size " << FLAGS_batch_size
<< ", warmup batch size " << FLAGS_warmup_batch_size << ".";
PrintConfig(reinterpret_cast<const PaddlePredictor::Config *>(qconfig), true);
PrintConfig(reinterpret_cast<const PaddlePredictor::Config *>(config), true);
Contributor

Since line 502 is the same as line 503, how about moving line 503 after line 507?

Author

Done.

Contributor

luotao1 commented Apr 1, 2019

I tested this PR on a 6271 CPU with 50000 images generated by danqing.
Command:

./test_analyzer_int8_resnet50 --infer_model=/dev/shm/.cache/paddle/dataset/int8/download/mobilenetv1_fp32/model/ --infer_data=/dev/shm/.cache/paddle/dataset/int8/download/int8_full_val.bin --test_all_data --gtest_filter=Analyzer_int8_resnet50.quantization 2> mobilenet_v2.log

log:
mobilenet_v2.log


I0401 20:46:35.058284 164022 analysis_predictor.cc:429] == optimize end ==
I0401 20:46:35.059271 164022 tester_helper.h:323] Thread 0, number of threads 1, run 1 times...
I0401 20:57:37.384917 164022 helper.h:273] ====== threads: 1, thread id: 0 ======
I0401 20:57:37.384954 164022 helper.h:275] ====== batch_size: 1, iterations: 50000, repetitons: 1 ======
I0401 20:57:37.384959 164022 helper.h:277] ====== batch latency: 13.2465ms, number of samples: 50000, sample latency: 13.2465ms, fps: 75.4916 ======
I0401 20:57:37.388720 164022 tester_helper.h:508] --- INT8 prediction start ---
I0401 20:57:45.525290 164022 mkldnn_quantizer.cc:393] == optimize 2 end ==
I0401 20:57:45.526273 164022 tester_helper.h:323] Thread 0, number of threads 1, run 1 times...
I0401 21:01:20.693503 164022 helper.h:273] ====== threads: 1, thread id: 0 ======
I0401 21:01:20.693539 164022 helper.h:275] ====== batch_size: 1, iterations: 50000, repetitons: 1 ======
I0401 21:01:20.693543 164022 helper.h:277] ====== batch latency: 4.30334ms, number of samples: 50000, sample latency: 4.30334ms, fps: 232.377 ======
I0401 21:01:20.696801 164022 tester_helper.h:510] --- comparing outputs --- 
I0401 21:01:20.705694 164022 tester_helper.h:456] Avg top1 INT8 accuracy: 0.7036
I0401 21:01:20.705705 164022 tester_helper.h:458] Avg top1 FP32 accuracy: 0.0010
I0401 21:01:20.705708 164022 tester_helper.h:460] Accepted accuracy drop threshold: 0.01
F0401 21:01:20.705729 164022 tester_helper.h:461] Check failed: std::abs(avg_acc1_quant - avg_acc1_ref) <= FLAGS_quantized_accuracy (0.70258 vs. 0.01) 

It seems the throughput of FP32 is 75 and INT8 is 232? And the accuracy is larger than 0.7?

Contributor

luotao1 commented Apr 1, 2019

I tested this PR on Mar 29th and the accuracy was correct:
(screenshot: accuracy results)
You used SetPasses() in the PR.

Contributor

luotao1 commented Apr 1, 2019

Besides, could you fix the compiler warnings?

/Paddle/paddle/fluid/inference/tests/api/tester_helper.h: In function ‘void paddle::inference::PredictionRun(paddle::PaddlePredictor*, const std::vector<std::vector<paddle::PaddleTensor> >&, std::vector<std::vector<paddle::PaddleTensor> >*, int, int)’:
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:319:62: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (FLAGS_iterations > 0 && FLAGS_iterations < inputs.size())
                                                              ^
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:332:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < iterations; i++) {
                            ^
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:339:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < iterations; i++) {
                            ^
In file included from /Paddle/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc:18:0:
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h: In function ‘void paddle::inference::PredictionRun(paddle::PaddlePredictor*, const std::vector<std::vector<paddle::PaddleTensor> >&, std::vector<std::vector<paddle::PaddleTensor> >*, int, int)’:
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:319:62: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (FLAGS_iterations > 0 && FLAGS_iterations < inputs.size())
                                                              ^
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:332:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < iterations; i++) {
                            ^
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:339:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < iterations; i++) {
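A minimal way to address such sign-compare warnings (an illustrative pattern with names of our choosing, not the PR's actual diff) is to cast the flag once after it has been range-checked, and to keep the loop index as size_t:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The flag is int while inputs.size() is size_t; casting the range-checked
// flag to size_t makes the comparison warning-free under -Wsign-compare.
size_t ClampedIterations(int flags_iterations, const std::vector<int> &inputs) {
  size_t iterations = inputs.size();
  if (flags_iterations > 0 &&
      static_cast<size_t>(flags_iterations) < inputs.size()) {
    iterations = static_cast<size_t>(flags_iterations);
  }
  return iterations;
}
```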

Author

wojtuss commented Apr 1, 2019

I tested this PR on a 6271 CPU with 50000 images generated by danqing. [...] It seems the throughput of FP32 is 75 and INT8 is 232? And the accuracy is larger than 0.7?

As for accuracy, it is for 100 images only, so it may be larger than 0.7.

By default there is only one iteration with batch size 100 and no warmup phase, so the latency is not precise. We assumed that precise latency and accuracy would be measured on the whole dataset, where the warmup cost is negligible.

As for the FP32 accuracy, that was the problem with the passes I had on Friday. Today I built it from scratch and it worked fine.
I have the fix for the repeated passes ready and could add the commit to this PR, or submit it as a separate PR. What do you think?

Contributor

luotao1 commented Apr 1, 2019

Today I built it from scratch and it worked fine.

What is scratch?

Contributor

luotao1 commented Apr 1, 2019

I have the fix for the repeated passes ready and could add the commit to this PR, or submit it as a separate PR. What do you think?

A separate PR is better.

Besides, the throughput of FP32 is 75 and INT8 is 232, which differs from V1 in python/paddle/fluid/contrib/int8_inference/README.md.

Author

wojtuss commented Apr 1, 2019

"From scratch" means starting from a clean directory.

Oh, I have just noticed you have run this on the full dataset.

Yes, with this test we get the following accuracy on the whole dataset:
ResNet50:
Avg top1 INT8 accuracy: 0.7640
Avg top1 FP32 accuracy: 0.7663
MobileNet-v1:
Avg top1 INT8 accuracy: 0.7039
Avg top1 FP32 accuracy: 0.7078

For throughput, you have to set a larger batch_size.

Contributor

luotao1 commented Apr 1, 2019

"From scratch" means starting from clean directory.

I cleaned the directory, and I still must add SetPasses to get the right accuracy.

Author

wojtuss commented Apr 1, 2019

Then I will send a new PR with the fix for the passes. With it, the SetPasses problem should be fixed.

std::vector<std::vector<PaddleTensor>> analysis_outputs;
std::vector<std::vector<PaddleTensor>> quantized_outputs;
LOG(INFO) << "--- FP32 prediction start ---";
TestAnalysisPrediction(config, inputs, &analysis_outputs, true);
Contributor

Why add TestAnalysisPrediction? Could you use TestOneThreadPrediction directly? Then you could remove TestAnalysisPrediction and lines 502-503.

Author

To me, it seems cleaner this way for both AnalysisConfigs, with less pointer casting. I can change it if you like.

Contributor

Please change it. Thanks very much!

Author

Done.

Author

@wojtuss wojtuss left a comment

I have the fix for the repeated passes ready and could add the commit to this PR, or submit it as a separate PR. What do you think?

A separate PR is better.

Submitted #16606
Could you please verify that this fixes the issue in the test on your machine?


Author

wojtuss commented Apr 1, 2019

@luotao1 , when comparing latency, keep in mind that here the latency does not include loading data, as the inference starts after the whole dataset is loaded into memory.

Author

wojtuss commented Apr 1, 2019

@luotao1 , I have added a comment regarding accuracy drop for FP32 inference: #16606 (comment)

Contributor

luotao1 commented Apr 2, 2019

The resnet50 result:

I0402 00:12:28.451478 150483 helper.h:277] ====== batch latency: 75.9028ms, number of samples: 50000, sample latency: 75.9028ms, fps: 13.1747 ======
I0402 00:29:30.439829 150483 helper.h:277] ====== batch latency: 20.0641ms, number of samples: 50000, sample latency: 20.0641ms, fps: 49.8403 ======
I0402 00:29:30.444981 150483 tester_helper.h:510] --- comparing outputs ---
I0402 00:29:30.462234 150483 tester_helper.h:456] Avg top1 INT8 accuracy: 0.7648
I0402 00:29:30.462246 150483 tester_helper.h:458] Avg top1 FP32 accuracy: 0.7663
I0402 00:29:30.462249 150483 tester_helper.h:460] Accepted accuracy drop threshold: 0.01

Contributor

luotao1 commented Apr 2, 2019

I have added a comment regarding accuracy drop for FP32 inference

I will investigate it. Please resolve the conflict and use SetPasses in this PR so it can be merged first.

Author

wojtuss commented Apr 2, 2019

I have added a comment regarding accuracy drop for FP32 inference

I will investigate it. Please resolve the conflict and use SetPasses in this PR so it can be merged first.

Done.

Contributor

luotao1 commented Apr 2, 2019

PR_CI fails, maybe due to an error caused by #16584. You can change it like this:

--- a/paddle/fluid/inference/tests/api/CMakeLists.txt
+++ b/paddle/fluid/inference/tests/api/CMakeLists.txt
@@ -152,20 +152,20 @@ inference_analysis_api_test_with_fake_data(test_analyzer_mobilenet_depthwise_con
 if(WITH_MKLDNN)
   set(INT8_DATA_DIR "${INFERENCE_DEMO_INSTALL_DIR}/int8v2")
   if (NOT EXISTS ${INT8_DATA_DIR})
-    inference_download_and_uncompress(${INT8_DATA_DIR} ${INFERENCE_URL}"/int8" "imagenet_val_100_tail.tar.gz")
+    inference_download_and_uncompress(${INT8_DATA_DIR} "${INFERENCE_URL}/int8" "imagenet_val_100_tail.tar.gz")
   endif()

   #resnet50 int8
   set(INT8_RESNET50_MODEL_DIR "${INT8_DATA_DIR}/resnet50")
   if (NOT EXISTS ${INT8_RESNET50_MODEL_DIR})
-    inference_download_and_uncompress(${INT8_RESNET50_MODEL_DIR} ${INFERENCE_URL}"/int8" "resnet50_int8_model.tar.gz" )
+    inference_download_and_uncompress(${INT8_RESNET50_MODEL_DIR} "${INFERENCE_URL}/int8" "resnet50_int8_model.tar.gz" )
   endif()
   inference_analysis_api_int8_test(test_analyzer_int8_resnet50 ${INT8_RESNET50_MODEL_DIR} ${INT8_DATA_DIR} analyzer_int8_image_classification_tester.cc SERIAL)

   #mobilenet int8
   set(INT8_MOBILENET_MODEL_DIR "${INT8_DATA_DIR}/mobilenet")
   if (NOT EXISTS ${INT8_MOBILENET_MODEL_DIR})
-    inference_download_and_uncompress(${INT8_MOBILENET_MODEL_DIR} ${INFERENCE_URL}"/int8" "mobilenetv1_int8_model.tar.gz" )
+    inference_download_and_uncompress(${INT8_MOBILENET_MODEL_DIR} "${INFERENCE_URL}/int8" "mobilenetv1_int8_model.tar.gz" )
   endif()

Author

wojtuss commented Apr 2, 2019

Yes, that helped. Thank you.
Done.

Contributor

@luotao1 luotao1 left a comment

LGTM

@luotao1 luotao1 merged commit 9b6a029 into PaddlePaddle:develop Apr 2, 2019
luotao1 added a commit that referenced this pull request Apr 3, 2019