System Info
- `transformers` version: 4.20.1
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu113 (False)
- Tensorflow version (GPU?): 2.8.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?: TF 2.8.2 (Colab) / TF 2.4 (Kaggle); TPU v2 and v3
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I tried to launch a script for a simple classification problem but got the error "socket close". I tried with DeBERTa small and base, so I doubt it is a memory error. Moreover, I tried on both Kaggle (TPUv3) and Colab (TPUv2). The same script with a RoBERTa base model works perfectly fine. The sequence length I used was 128.
I created the model using this:

```python
import tensorflow as tf
from transformers import TFAutoModel


def get_model() -> tf.keras.Model:
    backbone = TFAutoModel.from_pretrained(cfg.model_name)  # cfg holds the run config
    input_ids = tf.keras.layers.Input(
        shape=(cfg.max_length,),
        dtype=tf.int32,
        name="input_ids",
    )
    attention_mask = tf.keras.layers.Input(
        shape=(cfg.max_length,),
        dtype=tf.int32,
        name="attention_mask",
    )
    # the first output of the backbone is the last hidden state
    x = backbone({"input_ids": input_ids, "attention_mask": attention_mask})[0]
    x = x[:, 0, :]  # keep the first ([CLS]) token representation
    outputs = tf.keras.layers.Dense(1, activation="sigmoid", dtype="float32")(x)
    return tf.keras.Model(
        inputs=[input_ids, attention_mask],
        outputs=outputs,
    )
```
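For context, the model is built and compiled inside a standard `TPUStrategy` scope along these lines (a minimal sketch; the optimizer, loss, and dataset below are placeholders rather than my exact settings):

```python
import tensorflow as tf

# Minimal TPU driver sketch -- hyperparameters and dataset are placeholders.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope so that all
# variables are created on the TPU.
with strategy.scope():
    model = get_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # placeholder
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )

# model.fit(train_ds, validation_data=valid_ds, epochs=3)  # placeholder dataset
```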
It also seems that the embedding layer is not compatible with bfloat16:

```
InvalidArgumentError: Exception encountered when calling layer "embeddings" (type TFDebertaV2Embeddings).
cannot compute Mul as input #1(zero-based) was expected to be a bfloat16 tensor but is a float tensor
```
https://colab.research.google.com/drive/1T4GGCfYy7lAFrgapOtY0KBXPcnEPeTQz?usp=sharing
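As far as I can tell, the bfloat16 incompatibility can be reproduced with just the mixed-precision policy enabled; a minimal sketch (the checkpoint name and sequence length here are assumptions, not necessarily the exact ones from my run):

```python
import tensorflow as tf
from transformers import TFAutoModel

# Keras uses bfloat16 activations under this policy (as it does on TPU); the
# DeBERTa-v2 embeddings then appear to multiply a bfloat16 tensor by a
# float32 one, which raises the InvalidArgumentError quoted above.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = TFAutoModel.from_pretrained("microsoft/deberta-v3-small")  # assumed checkpoint
inputs = {
    "input_ids": tf.ones((1, 128), dtype=tf.int32),
    "attention_mask": tf.ones((1, 128), dtype=tf.int32),
}
outputs = model(inputs)  # fails inside TFDebertaV2Embeddings
```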
Expected behavior
Regular training, as with RoBERTa. On GPU, the same script works and uses 3 or 4 GB of memory.