Got 'Blas xGEMMBatched launch failed' using BERT + BiLSTM #497

**You must follow the issue template and provide as much information as possible. Otherwise, this issue will be ignored and closed.**

## Check List
Thanks for considering opening an issue. Before you submit your issue, please confirm these boxes are checked.

**You can post pictures, but if specific text or code is required to reproduce the issue, please provide the text in a plain text format for easy copy/paste.**

- [x] I have searched in [existing issues](https://github.com/BrikerMan/Kashgari/issues?utf8=%E2%9C%93&q=is%3Aissue+) but did not find the same one.
- [x] I have read the [documents](https://kashgari.bmio.net)

## Environment

- Debian 11
- Python 3.6.8
- requirements.txt (`conda list` output):

```txt
cudatoolkit               10.0.130                      0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn                     7.6.5                cuda10.0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
kashgari                  1.1.5                    pypi_0    pypi
keras                     2.3.1                    pypi_0    pypi
keras-applications        1.0.8                      py_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-bert                0.89.0                   pypi_0    pypi
keras-embed-sim           0.10.0                   pypi_0    pypi
keras-gpt-2               0.17.0                   pypi_0    pypi
keras-layer-normalization 0.16.0                   pypi_0    pypi
keras-multi-head          0.29.0                   pypi_0    pypi
keras-pos-embd            0.13.0                   pypi_0    pypi
keras-position-wise-feed-forward 0.8.0                    pypi_0    pypi
keras-preprocessing       1.1.2              pyhd3eb1b0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-self-attention      0.51.0                   pypi_0    pypi
keras-transformer         0.40.0                   pypi_0    pypi
numpy                     1.16.4                   pypi_0    pypi
numpy-base                1.19.2           py36hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard               1.14.0           py36hf484d3e_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                1.14.0          gpu_py36h57aa796_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorflow-addons         0.9.1                    pypi_0    pypi
tensorflow-estimator      1.14.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorflow-gpu            1.14.0                   pypi_0    pypi

```

Output of **nvidia-smi**:

```txt

Tue Jan 10 11:08:55 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    19W / 220W |    568MiB /  7979MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1288      G   /usr/lib/xorg/Xorg                223MiB |
|    0   N/A  N/A      1402      G   /usr/bin/gnome-shell               71MiB |
|    0   N/A  N/A      1698      G   ...b/firefox-esr/firefox-esr      171MiB |
|    0   N/A  N/A      1969      G   ...b/firefox-esr/firefox-esr        3MiB |
|    0   N/A  N/A      2217      G   ...RendererForSitePerProcess       90MiB |
|    0   N/A  N/A      9821      G   ...b/firefox-esr/firefox-esr        3MiB |
+-----------------------------------------------------------------------------+

```

My model:

```Python
import pandas as pd
import kashgari
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.classification import BiLSTM_Model
import numpy
import os

BERT_PATH = r'/chinese_L-12_H-768_A-12'


# Initialize the embeddings
embed = BERTEmbedding(BERT_PATH,
                     task=kashgari.CLASSIFICATION,
                     sequence_length=64, layer_nums=4)

tokenizer = embed.tokenizer

df = pd.read_excel('data.xlsx')
# Tokenize the text
df['cutted'] = df['review'].apply(lambda x: tokenizer.tokenize(x))
df["label"] = df['label'].astype("str")

# Prepare the training, validation, and test sets
train_x = list(df['cutted'][:int(len(df)*0.7)])
train_y = list(df['label'][:int(len(df)*0.7)])

valid_x = list(df['cutted'][int(len(df)*0.7):int(len(df)*0.85)])
valid_y = list(df['label'][int(len(df)*0.7):int(len(df)*0.85)])

test_x = list(df['cutted'][int(len(df)*0.85):])
test_y = list(df['label'][int(len(df)*0.85):])


# Initialize the model with the embedding
model = BiLSTM_Model(embed)

# Train for just one epoch first
model.fit(train_x, train_y, valid_x, valid_y, batch_size=12, epochs=1)

model.evaluate(test_x, test_y, batch_size=12)


```
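
For reference, the sequential 70% / 15% / 15% split used in the script can be written once as a small helper. This is only a sketch of the same slicing logic: `sequential_split` and its parameter names are hypothetical and not part of Kashgari or pandas.

```python
def sequential_split(items, train_cut=0.7, valid_cut=0.85):
    """Split a list into train/valid/test parts by position.

    train_cut and valid_cut are cumulative fractions, matching the
    int(len(df)*0.7) and int(len(df)*0.85) cut points in the script.
    """
    n = len(items)
    train_end = int(n * train_cut)
    valid_end = int(n * valid_cut)
    return items[:train_end], items[train_end:valid_end], items[valid_end:]


# Stand-in data; in the script this would be list(df['cutted']) or list(df['label']).
rows = list(range(20))
train_part, valid_part, test_part = sequential_split(rows)
print(len(train_part), len(valid_part), len(test_part))
```

Calling the helper once for `df['cutted']` and once for `df['label']` keeps the two lists aligned, because both use the same cut points.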

## Question

When training the model, I got the following error:

```txt
Traceback (most recent call last):
  File "train_model.py", line 41, in <module>
    model.fit(train_x, train_y, valid_x, valid_y, batch_size=12, epochs=1)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/kashgari/tasks/base_model.py", line 321, in fit
    **fit_kwargs)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[144,64,64], b.shape=[144,64,64], m=64, n=64, k=64, batch_size=144
         [[{{node Encoder-1-MultiHeadSelfAttention/Encoder-1-MultiHeadSelfAttention-Attention/MatMul}}]]
         [[metrics/acc/Identity/_1711]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[144,64,64], b.shape=[144,64,64], m=64, n=64, k=64, batch_size=144
         [[{{node Encoder-1-MultiHeadSelfAttention/Encoder-1-MultiHeadSelfAttention-Attention/MatMul}}]]
0 successful operations.
0 derived errors ignored.

```
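
The shapes in the error message can be traced back to the script settings. The sketch below is plain arithmetic, not a diagnosis. It assumes the checkpoint name `chinese_L-12_H-768_A-12` follows the usual BERT naming convention (hidden size 768, 12 attention heads) and that the tensors are float32.

```python
# Values taken from the script and from the BERT checkpoint name.
batch_size = 12        # model.fit(..., batch_size=12)
sequence_length = 64   # BERTEmbedding(..., sequence_length=64)
hidden_size = 768      # "H-768" in the checkpoint name (assumed convention)
num_heads = 12         # "A-12" in the checkpoint name (assumed convention)

# Multi-head attention splits the hidden size evenly across heads
# and runs one matrix product per (sample, head) pair.
head_dim = hidden_size // num_heads
gemm_batch = batch_size * num_heads

# Shape of one operand of the batched matrix multiplication.
operand_shape = [gemm_batch, sequence_length, head_dim]

# Size of one operand, assuming 4 bytes per element (float32).
operand_bytes = gemm_batch * sequence_length * head_dim * 4
operand_mib = operand_bytes / (1024 * 1024)

print(operand_shape)   # matches a.shape=[144,64,64] in the error
print(operand_mib)     # 2.25
```

Under these assumptions, `batch_size=144` in the error is 12 samples times 12 heads, and each operand is about 2.25 MiB, which is small compared with the 7979 MiB reported by nvidia-smi.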

