Time cost increases #28

@IwakuraRein

Hi. Thanks for the code and the detailed instructions.

I integrated sparse convolution into my encoder:

with tf.variable_scope('featureEncoder'):
	auxiShape = (self.inputShape[0], self.inputShape[1], self.inputShape[2], 7)
	featureShape = (self.inputShape[0], self.inputShape[1], self.inputShape[2], 32)
	blockSize = 8
	blockStride = (8,8)
	blockOffset = (0,0)
	blockCount = (self.divup(self.inputShape[1], blockStride[0]), self.divup(self.inputShape[2], blockStride[1]))
	inBlockParams = { "dynamic_bsize": (blockSize, blockSize), "dynamic_boffset": blockOffset, "dynamic_bstride": blockStride }
	outBlockParams = { "dynamic_bsize": (blockSize, blockSize), "dynamic_boffset": blockOffset, "dynamic_bstride": blockStride }
	
	if not self.training:
		indices = sbnet_module.reduce_mask(self.mask, blockCount, tol=0.1, **inBlockParams)
	
		# stack active overlapping tiles to batch dimension
		stack = sbnet_module.sparse_gather(
			auxi, indices.bin_counts, indices.active_block_indices, transpose=False, **inBlockParams)
	else:
		stack = auxi
	# perform dense convolution on a sparse stack of tiles
	stack = self.conv_layer2(stack, 7, 32, name='1')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32, 32, name='2')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32, 32, name='3')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32, 32, name='4')
	stack = tf.nn.leaky_relu(stack)
	stack = self.conv_layer2(stack, 32, 32, name='5')
	stack = tf.nn.leaky_relu(stack)

	# write/scatter the tiles back on top of original tensor
	# note that the output tensor is reduced by 1 on each side due to 'VALID' convolution
	if not self.training:
		feature = sbnet_module.sparse_scatter(
			stack, indices.bin_counts, indices.active_block_indices,
			self.lastFeature, transpose=False, add=False, atomic=False, **outBlockParams)
		feature.set_shape(featureShape)
	else:
		feature = stack

self.training is set to False when training and True when testing. The mask variable is generated outside the network and fed in via tf.placeholder, as is self.lastFeature.
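
For reference, self.divup is not shown in the snippet above. A minimal sketch of what it presumably does, assuming it is plain ceiling division on positive integers; the free functions `divup` and `block_count` are hypothetical stand-ins, not part of SBNet:

```python
def divup(a, b):
    """Ceiling division for positive integers: smallest n with n * b >= a."""
    return (a + b - 1) // b


def block_count(height, width, stride):
    """Blocks needed to tile a height x width image (hypothetical helper)."""
    return (divup(height, stride[0]), divup(width, stride[1]))


# Example: a 720 x 1280 frame with the 8 x 8 stride used above.
print(block_count(720, 1280, (8, 8)))  # (90, 160)
print(block_count(721, 1280, (8, 8)))  # (91, 160): a partial block still counts
```

With this definition, an input whose height or width is not a multiple of the stride still gets a block covering the remainder.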

I tried to measure the inference time with timeline:

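import os
import tensorflow as tf
from tensorflow.python.client import timeline
# Assumed setup, not shown in the original snippet: request a full trace.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
feed_dict = {model.source: src, model.target: tgt, model.batch_size: src_hdr.shape[0], model.mask: Mask, model.feature: Feature}
denoised_1_bd, Feature = sess.run([model.fake_image, model.feature], feed_dict, options=run_options, run_metadata=run_metadata)
tl = timeline.Timeline(run_metadata.step_stats)
ctf = tl.generate_chrome_trace_format(show_memory=True)
with open(os.path.join(errorlog_dir, 'timeline.json'), 'w') as wd:
	wd.write(ctf)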

[image: timeline]

However, I can't find any timing records for the layers under 'featureEncoder'. There are also two bars captioned "unknown", the second of which is unusually long. The times of some Pooling and LeakyRelu ops also look odd, at nearly 2 ms.
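
To check whether any 'featureEncoder' nodes are in the trace at all, rather than just hard to spot in the viewer, the JSON file can be searched directly. The sketch below is a hypothetical helper, not part of TensorFlow. It assumes the usual Chrome trace layout: a top-level "traceEvents" list whose complete events have "ph" equal to "X", a "dur" field in microseconds, and the graph node name under args["name"]. The same node may appear on more than one row of the trace, so the totals are only a rough guide.

```python
import json
from collections import defaultdict


def summarize_trace(trace, prefix=""):
    """Sum durations (microseconds) of complete events, grouped by node name."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":
            continue  # skip metadata and flow events
        # Assumed layout: node name in args["name"], else fall back to event name.
        node = event.get("args", {}).get("name", event.get("name", "unknown"))
        if node.startswith(prefix):
            totals[node] += event.get("dur", 0)
    # Longest total duration first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def summarize_trace_file(path, prefix=""):
    """Load a Chrome trace JSON file and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f), prefix)


# Synthetic trace for illustration only; these are not measured values.
sample = {"traceEvents": [
    {"ph": "M", "name": "process_name", "args": {"name": "/gpu:0"}},
    {"ph": "X", "name": "Conv2D", "dur": 120,
     "args": {"name": "featureEncoder/1/Conv2D", "op": "Conv2D"}},
    {"ph": "X", "name": "Conv2D", "dur": 80,
     "args": {"name": "featureEncoder/1/Conv2D", "op": "Conv2D"}},
    {"ph": "X", "name": "unknown", "dur": 1900, "args": {"name": "unknown"}},
    {"ph": "X", "name": "LeakyRelu", "dur": 30,
     "args": {"name": "featureEncoder/LeakyRelu", "op": "LeakyRelu"}},
]}
for node, dur in summarize_trace(sample, prefix="featureEncoder"):
    print(node, dur)
```

On the real file this would be called as `summarize_trace_file(os.path.join(errorlog_dir, 'timeline.json'), prefix='featureEncoder')`. An empty result would suggest those nodes were recorded under a different name, possibly as the "unknown" bars.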

[image: unknown]

How can I get a proper time measurement? Thanks.
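
One cross-check on the timeline numbers is plain wall-clock timing of repeated sess.run calls with tracing turned off, since tracing can add overhead and the first few calls often include one-time setup costs. A minimal sketch under those assumptions; `time_calls` and `median` are hypothetical helpers, and the workload is a stand-in for the real session call:

```python
import time


def time_calls(fn, warmup=5, iters=20):
    """Return per-call wall-clock times in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up results are discarded
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])


# Stand-in workload. In the real script this would be something like:
#   lambda: sess.run([model.fake_image, model.feature], feed_dict)
times = time_calls(lambda: sum(range(10000)), warmup=2, iters=5)
print("median ms:", median(times))
```

Reporting the median rather than the mean keeps a single slow call from dominating the result. Running it once with the sparse path and once with the dense path would show whether the overall time cost really increases.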

My Environment
TensorFlow Version: 1.15.0
Operating System: Ubuntu 16.04
Python Version: 3.6.13
CUDA Version: 10.0
cuDNN Version: 7.6.4
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 460.67
