Hi,
I want to continue pretraining SciBERT on additional data, and I want to enlarge the vocabulary with 100 additional "domain-specific" terms, using the entries that are reserved for such usage.
So I've figured out a way to extract a list of terms from my data.
Let's suppose I have the following most relevant terms:
"polymer"
"materials"
"chemistry"
"polymers"
What should I do with terms such as "polymer" and "polymers"? Should I include both, or keep only the singular?
Does anybody have information or a recommendation on this?
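
To make the question concrete, here is a minimal sketch of how a WordPiece-style greedy longest-match tokenizer would treat the plural when only the singular is in the vocabulary. The toy vocabulary and the `wordpiece` helper are assumptions made up for this illustration; they are not SciBERT's actual vocabulary or code.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one lowercase word (illustrative helper)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary: the singular terms plus a generic "##s" suffix piece.
toy_vocab = {"polymer", "materials", "chemistry", "##s"}

singular_only = wordpiece("polymers", toy_vocab)
both_forms = wordpiece("polymers", toy_vocab | {"polymers"})

print(singular_only)
print(both_forms)
```

Under these assumptions, keeping only the singular means the plural is split into the singular plus a suffix piece, while adding both forms spends a second vocabulary entry to keep the plural as a single token. Whether that is worth one of the 100 entries presumably depends on how often the plural occurs in the data.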