I was running the token classification example here: https://github.com/huggingface/transformers/blob/master/examples/token-classification/run.sh with flaubert-large-cased as the model name.
I tried using the model downloaded from https://huggingface.co/flaubert, but after 100 epochs the results are very bad: the model did not learn anything.
I don't understand what the problem is with the flaubert-large-cased model.
Note that the flaubert-base-cased model gives good results on the NER task.
Do you have any idea how I can fix this problem, please?