System Info
- `transformers` version: 4.20.1
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu113 (False)
- Tensorflow version (GPU?): 2.8.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?: TF 2.8.2 (Colab) / TF 2.4 (Kaggle); TPU v2 and v3
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I tried to launch a script for a simple classification problem but got the error "socket close". I tried with DeBERTa small and base, so I doubt it is a memory error. Moreover, I tried on both Kaggle (TPUv3) and Colab (TPUv2). The same script with a RoBERTa base model works perfectly fine. The sequence length I used was 128.
I created the model using this:

```python
import tensorflow as tf
from transformers import TFAutoModel


def get_model() -> tf.keras.Model:
    backbone = TFAutoModel.from_pretrained(cfg.model_name)  # cfg holds the run config
    input_ids = tf.keras.layers.Input(
        shape=(cfg.max_length,),
        dtype=tf.int32,
        name="input_ids",
    )
    attention_mask = tf.keras.layers.Input(
        shape=(cfg.max_length,),
        dtype=tf.int32,
        name="attention_mask",
    )
    # the first output of the backbone is the last hidden state
    x = backbone({"input_ids": input_ids, "attention_mask": attention_mask})[0]
    x = x[:, 0, :]  # keep the first ([CLS]) token representation
    outputs = tf.keras.layers.Dense(1, activation="sigmoid", dtype="float32")(x)
    return tf.keras.Model(
        inputs=[input_ids, attention_mask],
        outputs=outputs,
    )
```
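For context, the model is built and compiled inside a standard `TPUStrategy` scope along these lines (a minimal sketch; the optimizer, loss, and dataset below are placeholders rather than my exact settings):

```python
import tensorflow as tf

# Minimal TPU driver sketch -- hyperparameters and dataset are placeholders.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope so that all
# variables are created on the TPU.
with strategy.scope():
    model = get_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # placeholder
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )

# model.fit(train_ds, validation_data=valid_ds, epochs=3)  # placeholder dataset
```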
It also seems that the embedding layer is not compatible with bfloat16:

```
InvalidArgumentError: Exception encountered when calling layer "embeddings" (type TFDebertaV2Embeddings).
cannot compute Mul as input #1(zero-based) was expected to be a bfloat16 tensor but is a float tensor
```
https://colab.research.google.com/drive/1T4GGCfYy7lAFrgapOtY0KBXPcnEPeTQz?usp=sharing
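As far as I can tell, the bfloat16 incompatibility can be reproduced with just the mixed-precision policy enabled; a minimal sketch (the checkpoint name and sequence length here are assumptions, not necessarily the exact ones from my run):

```python
import tensorflow as tf
from transformers import TFAutoModel

# Keras uses bfloat16 activations under this policy (as it does on TPU); the
# DeBERTa-v2 embeddings then appear to multiply a bfloat16 tensor by a
# float32 one, which raises the InvalidArgumentError quoted above.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = TFAutoModel.from_pretrained("microsoft/deberta-v3-small")  # assumed checkpoint
inputs = {
    "input_ids": tf.ones((1, 128), dtype=tf.int32),
    "attention_mask": tf.ones((1, 128), dtype=tf.int32),
}
outputs = model(inputs)  # fails inside TFDebertaV2Embeddings
```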
Expected behavior
Regular training, as with RoBERTa. On GPU, the same script works and uses 3 or 4 GB of memory.