When were the most recent publications of pre-training data included? #116

@kaisugi

I know that SciBERT was pre-trained on the Semantic Scholar corpus. I also know that the Semantic Scholar corpus is not publicly available.

I am wondering how recent the newest papers in the pre-training data are. For example, are papers from ACL 2018 included?
The Semantic Scholar Corpus paper was published around 2018, so I'm guessing that's right around the borderline between having a paper...
