Description
Describe the feature or idea you want to propose
While assessing #3232, I ran into another possible GPU speed-up whilst talking to the oracle.
The real overhead is here. This line is the big one:

```python
_output_convolution = np.squeeze(_output_convolution.numpy(), axis=-1)
```

`.numpy()` forces synchronisation and copies the convolution output from device to host (GPU to CPU) every time. Then the NumPy work (`+= bias`) runs on the CPU.
Then you call TensorFlow again in `_get_ppv` / `_get_max`, which means TF has to wrap/convert the NumPy array back into a tensor (often on CPU, sometimes with another host-to-device copy depending on placement).
That back-and-forth can easily dominate the runtime, and it happens `n_kernels * n_batches` times.
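To put rough numbers on the copy cost (a back-of-envelope sketch; the batch size, series length, and kernel/batch counts below are hypothetical, not measured from the code):

```python
# Hypothetical sizes for illustration only: 100 cases per batch,
# series length 500, float32 values, 10_000 kernels, 20 batches.
batch_size, series_length, n_kernels, n_batches = 100, 500, 10_000, 20

# Current code: every .numpy() call copies the full convolution output.
bytes_per_copy_now = batch_size * series_length * 4

# Proposed: only the small (batch_size, 2) [ppv, max] block crosses to the host.
bytes_per_copy_proposed = batch_size * 2 * 4

total_copies = n_kernels * n_batches  # one sync per kernel per batch
print(bytes_per_copy_now // bytes_per_copy_proposed)  # 250x less data per copy
```

The number of synchronisation points is the same either way; the win is that each copy shrinks by orders of magnitude and the bias/ppv/max compute stays on the device between them.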
What I’d change (minimal, high impact)
Keep everything as TF tensors until you have the 2 features, then convert that small 2-column array to NumPy once.
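As a minimal NumPy stand-in for that pattern (in the real code the per-batch work is TF ops on the device; `fake_batch_features` here is a hypothetical placeholder for the conv + bias + ppv/max pipeline), the idea is to materialise only the small per-batch feature blocks on the host and concatenate them at the end:

```python
import numpy as np

def fake_batch_features(xb):
    # Placeholder for the on-device work: conv, bias, then ppv and max.
    ppv = (xb > 0).mean(axis=1)           # proportion of positive values
    mx = xb.max(axis=1)                   # max pooling over time
    return np.stack([ppv, mx], axis=1)    # small (batch, 2) block

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))         # 10 cases, 50 time points
batches = [X[:5], X[5:]]                  # two hypothetical batches

# Only the tiny (batch, 2) blocks ever reach the host.
feats = np.concatenate([fake_batch_features(b) for b in batches], axis=0)
print(feats.shape)
```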
Describe your proposed solution
```python
import numpy as np
import tensorflow as tf


class RocketGPU:
    @staticmethod
    def _get_ppv(x):
        return tf.reduce_mean(tf.cast(x > 0, tf.float32), axis=1)

    @staticmethod
    def _get_max(x):
        return tf.reduce_max(x, axis=1)

    def _transform(self, X, y=None):
        tf.random.set_seed(self.random_state)
        X = X.transpose(0, 2, 1)
        X_tf = tf.convert_to_tensor(X)  # do this once
        batch_indices_list = self._generate_batch_indices(n=len(X))

        output_features = []
        for f in range(self.n_kernels):
            output_features_filter = []
            kernel = self._list_of_kernels[f]
            dilation = self._list_of_dilations[f]
            padding = self._list_of_paddings[f]
            bias = tf.convert_to_tensor(self._list_of_biases[f], dtype=X_tf.dtype)

            for batch_indices in batch_indices_list:
                idx = tf.convert_to_tensor(batch_indices, dtype=tf.int32)
                xb = tf.gather(X_tf, idx, axis=0)
                conv = tf.nn.conv1d(
                    input=xb,
                    stride=1,
                    filters=kernel,
                    dilations=dilation,
                    padding=padding,
                )
                conv = tf.squeeze(conv, axis=-1) + bias
                ppv = self._get_ppv(conv)
                mx = self._get_max(conv)
                feats = tf.stack([ppv, mx], axis=1).numpy()  # tiny copy back
                output_features_filter.append(feats)

            output_features.append(
                np.expand_dims(np.concatenate(output_features_filter, axis=0), axis=0)
            )

        output_rocket = np.concatenate(output_features, axis=0).swapaxes(0, 1)
        output_rocket = output_rocket.reshape(
            (output_rocket.shape[0], output_rocket.shape[1] * output_rocket.shape[2])
        )
        return output_rocket
```

Describe alternatives you've considered, if relevant
No response
Additional context
No response