Time cost increases

Hi. Thanks for the codes and the detailed instruction.

I implemented sparse convolution into my encoder:
```python
with tf.variable_scope('featureEncoder'):
	auxiShape = (self.inputShape[0], self.inputShape[1], self.inputShape[2], 7)
	featureShape = (self.inputShape[0], self.inputShape[1], self.inputShape[2], 32)
	blockSize = 8
	blockStride = (8,8)
	blockOffset = (0,0)
	blockCount = (self.divup(self.inputShape[1], blockStride[0]), self.divup(self.inputShape[2], blockStride[1]))
	inBlockParams = { "dynamic_bsize": (blockSize, blockSize), "dynamic_boffset": blockOffset, "dynamic_bstride": blockStride }
	outBlockParams = { "dynamic_bsize": (blockSize, blockSize), "dynamic_boffset": blockOffset, "dynamic_bstride": blockStride }
	
	if not self.training:
		indices = sbnet_module.reduce_mask(self.mask, blockCount, tol=0.1, **inBlockParams)
	
		# stack active overlapping tiles to batch dimension
		stack = sbnet_module.sparse_gather(
			auxi, indices.bin_counts, indices.active_block_indices, transpose=False, **inBlockParams)
	else:
		stack = auxi
	# perform dense convolution on a sparse stack of tiles
	stack = self.conv_layer2(stack, 7, 32, name='1')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32,32, name='2')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32,32, name='3')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32,32, name='4')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32,32, name='5')
	stack = tf.nn.leaky_relu(stack)

	# write/scatter the tiles back on top of original tensor
	# note that the output tensor is reduced by 1 on each side due to 'VALID' convolution
	if not self.training:
		feature=sbnet_module.sparse_scatter(
			stack, indices.bin_counts, indices.active_block_indices,
			self.lastFeature, transpose=False, add=False, atomic=False, **outBlockParams)
		feature.set_shape(featureShape)
	else:
		feature=stack
```

`self.training` is set `False` when training and `True` when testing. Variable `mask` is generated outside the network and fed in via `tf.placeholder`. So does `self.lastFeature`.

I tried to measure the inference time with timeline:
```python
feed_dict = {model.source: src, model.target: tgt, model.batch_size:src_hdr.shape[0], model.mask:Mask, model.feature:Feature}
denoised_1_bd, Feature = sess.run([model.fake_image, model.feature], feed_dict, options=run_options, run_metadata=run_metadata)
tl = timeline.Timeline(run_metadata.step_stats)
ctf = tl.generate_chrome_trace_format(show_memory=True)
with open(os.path.join(errorlog_dir, 'timeline.json'),'w') as wd:
	wd.write(ctf)
```

![timeline](https://user-images.githubusercontent.com/28486541/127764964-e58c3a77-afdc-49ba-831d-d2d4f2ee0edc.png)

However, I can't find time records of layers under 'featureEncoder'. And there are two bars captioned unknown, the second of which is strangely long. Some Pooling and LeakyRelu‘s time is also strange, costing nearly 2ms.

![unknown](https://user-images.githubusercontent.com/28486541/127765056-52b29203-e8ef-40d5-8809-5bd9684499c5.png)

I wonder how I can get the proper time measurement. Thanks.

**My Environment**
TensorFlow Version: 1.15.0
Operating System: Ubuntu 16.04
Python Version: 3.6.13
CUDA Version: 10.0
CUDNN Version: 7.6.4
GPU Type: RTX 2080ti
Nvidia Driver Version: 460.67

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Time cost increases #28

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Time cost increases #28

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions