Adding Microsoft CodeXGlue Datasets #2357
Merged
lhoestq
merged 63 commits into
huggingface:master
from
ncoop57:microsoft-codexglue-code-to-code-trans
Jun 8, 2021
Changes from 23 commits
02800cf
Microsoft Code X Glue datasets.
madlag 3529e93
Fix in READMEs.
madlag b2595ea
Changing language type to "code"
madlag 56792f9
Changing the dataset to original (=not in the datasets repository)
madlag 9d843c0
Remove template script
33514d3
Revert changes to dummy_data.py script
80dbab6
Remove additional readme template
b1082dd
Add contribution subsection to readmes
bc2e755
Fix camel case
29c03d3
Update desc and cites to use global vars
7284a61
Fix typos
5afdbe0
Merge branch 'huggingface:master' into microsoft-codexglue-code-to-co…
8f8d7e2
Remove extra lines and update contributions
c37ae3a
Fix typos and camel case
5392aa7
Fix styling
67afd5f
Merge branch 'microsoft-codexglue-code-to-code-trans' of https://gith…
22d2da5
Update datasets/code_x_glue_cc_clone_detection_poj_104/generated_defi…
a6cca2f
Add encodings to all open calls
f720467
Convert clone detection poj dataset to use yield instead of writing t…
f451cc1
Fix styling
38e4d76
Remove marker file being written
4f2e277
Fix styling
20c96c6
Merge branch 'huggingface:master' into microsoft-codexglue-code-to-co…
18104c4
Update datasets/code_x_glue_cc_clone_detection_big_clone_bench/README.md
55149c0
Update datasets/code_x_glue_cc_clone_detection_poj_104/README.md
a78cba1
Update datasets/code_x_glue_cc_clone_detection_poj_104/README.md
b4a130c
Update datasets/code_x_glue_cc_cloze_testing_all/README.md
2b0e821
Update datasets/code_x_glue_cc_cloze_testing_maxmin/README.md
f86a919
Update datasets/code_x_glue_tc_text_to_code/README.md
cc4bc03
Update datasets/code_x_glue_tc_text_to_code/README.md
b87aed6
Update datasets/code_x_glue_ct_code_to_text/README.md
96187d7
Update datasets/code_x_glue_ct_code_to_text/README.md
61c34ee
Update datasets/code_x_glue_tt_text_to_text/README.md
2fc0beb
Merge remote-tracking branch 'origin/master' into microsoft-codexglue…
8246dae
Add new TOC outline
3d77085
Fill in new README sections for big clone bench
29e91e9
Fill in new README sections for POJ
d1c5214
Fill in new README sections for the Cloze Test benchmarks
8454b03
Remove extra square bracket
9678ecd
Fill in new README sections for the Code Completion benchmarks
60628a9
Fill in new README sections for the Code Refinement dataset
60a618a
Fill in new README sections for the Code Translation dataset
13194f1
Fill in new README sections for the Code Defect Detection dataset
2fe2177
Change lang tag to code
32d952f
Fill in new README sections for the Code Docstring Generation dataset
b6efa8a
Update task tag
b6a38e7
Fill in new README sections for the Code Search dataset
67826ad
Update task tags
ef41c23
Fill in new README sections for the Code Generation dataset
70ba739
Fill in new README sections for the Code Documentation Translation da…
1920c9c
Fix heading format, update task tags, and add missing spaces for cont…
27c1c93
Rename sections to proper names
2825aaf
Add additional source data subsubsections and add new data on source …
3437146
Add missing subsubsections of Annotations
4186048
Fix missing source_data yaml tag
95fa8ee
Update source_data tag to valid one
5e10c18
Fix additional information subsection heading
8d91fb1
Fix heading format
905ce59
Fix language tag with code
98ed142
Hopefully fix codec issue
1805d08
Moving code_x_glue_cc_clone_detection_poj_104 to code_x_glue_cc_clone…
madlag 020899b
Merge branch 'huggingface:master' into microsoft-codexglue-code-to-co…
b95a4a9
Fix headings and remove special chars
116 changes: 116 additions & 0 deletions
datasets/code_x_glue_cc_clone_detection_big_clone_bench/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| --- | ||
| annotations_creators: | ||
| - found | ||
| language_creators: | ||
| - found | ||
| languages: | ||
| - code | ||
| licenses: | ||
| - other-C-UDA | ||
| multilinguality: | ||
| - monolingual | ||
| size_categories: | ||
| - n>1M | ||
| source_datasets: | ||
| - original | ||
| task_categories: | ||
| - text-classification | ||
| task_ids: | ||
| - semantic-similarity-classification | ||
| --- | ||
| # Dataset Card for "code_x_glue_cc_clone_detection_big_clone_bench" | ||
|
|
||
| ## Table of Contents | ||
| - [Dataset Description](#dataset-description) | ||
| - [Dataset Summary](#dataset-summary) | ||
| - [Dataset Structure](#dataset-structure) | ||
| - [Data Instances](#data-instances) | ||
| - [Data Fields](#data-fields) | ||
| - [Data Splits](#data-splits) | ||
| - [Additional Information](#additional-information) | ||
| - [Dataset Curators](#dataset-curators) | ||
| - [Licensing Information](#licensing-information) | ||
| - [Citation Information](#citation-information) | ||
|
|
||
| ## [Dataset Description](#dataset-description) | ||
|
|
||
| - **Homepage:** https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench | ||
|
|
||
| ### [Dataset Summary](#dataset-summary) | ||
|
|
||
| CodeXGLUE Clone-detection-BigCloneBench dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench | ||
|
|
||
| Given two code snippets as input, the task is binary classification (0/1), where 1 stands for semantic equivalence and 0 for anything else. Models are evaluated by F1 score. | ||
| The dataset is BigCloneBench, filtered following the paper Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree. | ||
|
|
||
| ## [Dataset Structure](#dataset-structure) | ||
|
|
||
| ### [Data Instances](#data-instances) | ||
|
|
||
| An example of 'test' looks as follows. | ||
| ``` | ||
| { | ||
| "func1": " @Test(expected = GadgetException.class)\n public void malformedGadgetSpecIsCachedAndThrows() throws Exception {\n HttpRequest request = createCacheableRequest();\n expect(pipeline.execute(request)).andReturn(new HttpResponse(\"malformed junk\")).once();\n replay(pipeline);\n try {\n specFactory.getGadgetSpec(createContext(SPEC_URL, false));\n fail(\"No exception thrown on bad parse\");\n } catch (GadgetException e) {\n }\n specFactory.getGadgetSpec(createContext(SPEC_URL, false));\n }\n", | ||
| "func2": " public InputStream getInputStream() throws TGBrowserException {\n try {\n if (!this.isFolder()) {\n URL url = new URL(this.url);\n InputStream stream = url.openStream();\n return stream;\n }\n } catch (Throwable throwable) {\n throw new TGBrowserException(throwable);\n }\n return null;\n }\n", | ||
| "id": 0, | ||
| "id1": 2381663, | ||
| "id2": 4458076, | ||
| "label": false | ||
| } | ||
| ``` | ||
|
|
||
| ### [Data Fields](#data-fields) | ||
|
|
||
| Each data field is explained below for each config. The data fields are the same among all splits. | ||
|
|
||
| #### default | ||
|
|
||
| |field name| type | description | | ||
| |----------|------|---------------------------------------------------| | ||
| |id |int32 | Index of the sample | | ||
| |id1 |int32 | The first function id | | ||
| |id2 |int32 | The second function id | | ||
| |func1 |string| The full text of the first function | | ||
| |func2 |string| The full text of the second function | | ||
| |label |bool | 1 if the functions are equivalent, 0 otherwise    | | ||
|
|
||
| ### [Data Splits](#data-splits) | ||
|
|
||
| | name |train |validation| test | | ||
| |-------|-----:|---------:|-----:| | ||
| |default|901028| 415416|415416| | ||
|
|
||
| ## [Additional Information](#additional-information) | ||
|
|
||
| ### [Dataset Curators](#dataset-curators) | ||
|
|
||
| https://github.com/microsoft, https://github.com/madlag | ||
|
|
||
| ### [Licensing Information](#licensing-information) | ||
|
|
||
| Computational Use of Data Agreement (C-UDA) License. | ||
|
|
||
| ### [Citation Information](#citation-information) | ||
|
|
||
| ``` | ||
| @inproceedings{svajlenko2014towards, | ||
| title={Towards a big data curated benchmark of inter-project code clones}, | ||
| author={Svajlenko, Jeffrey and Islam, Judith F and Keivanloo, Iman and Roy, Chanchal K and Mia, Mohammad Mamun}, | ||
| booktitle={2014 IEEE International Conference on Software Maintenance and Evolution}, | ||
| pages={476--480}, | ||
| year={2014}, | ||
| organization={IEEE} | ||
| } | ||
|
|
||
| @inproceedings{wang2020detecting, | ||
| title={Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree}, | ||
| author={Wang, Wenhan and Li, Ge and Ma, Bo and Xia, Xin and Jin, Zhi}, | ||
| booktitle={2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)}, | ||
| pages={261--271}, | ||
| year={2020}, | ||
| organization={IEEE} | ||
| } | ||
| ``` | ||
|
|
||
| ### Contributions | ||
| Thanks to @madlag (and partly also @ncoop57) for adding this dataset. | ||
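The README above states that models are evaluated by F1 score. As a rough illustration only, here is a minimal sketch of binary F1 on the positive (clone) class. The helper name `f1_score` is hypothetical and not part of this PR, and the official CodeXGLUE evaluator may average the score differently.

```python
from typing import Sequence


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Binary F1 for the positive class (True = the two functions are clones)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, assumed for illustration: one hit, one false alarm, one miss.
gold = [True, True, False, False]
pred = [True, False, True, False]
score = f1_score(gold, pred)
```

With one true positive, one false positive and one false negative, precision and recall are both 0.5, so the F1 score is also 0.5.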
95 changes: 95 additions & 0 deletions
...glue_cc_clone_detection_big_clone_bench/code_x_glue_cc_clone_detection_big_clone_bench.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| from typing import List | ||
|
|
||
| import datasets | ||
|
|
||
| from .common import TrainValidTestChild | ||
| from .generated_definitions import DEFINITIONS | ||
|
|
||
|
|
||
| _DESCRIPTION = """Given two codes as the input, the task is to do binary classification (0/1), where 1 stands for semantic equivalence and 0 for others. Models are evaluated by F1 score. | ||
| The dataset we use is BigCloneBench and filtered following the paper Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree.""" | ||
|
|
||
| _CITATION = """@inproceedings{svajlenko2014towards, | ||
| title={Towards a big data curated benchmark of inter-project code clones}, | ||
| author={Svajlenko, Jeffrey and Islam, Judith F and Keivanloo, Iman and Roy, Chanchal K and Mia, Mohammad Mamun}, | ||
| booktitle={2014 IEEE International Conference on Software Maintenance and Evolution}, | ||
| pages={476--480}, | ||
| year={2014}, | ||
| organization={IEEE} | ||
| } | ||
|
|
||
| @inproceedings{wang2020detecting, | ||
| title={Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree}, | ||
| author={Wang, Wenhan and Li, Ge and Ma, Bo and Xia, Xin and Jin, Zhi}, | ||
| booktitle={2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)}, | ||
| pages={261--271}, | ||
| year={2020}, | ||
| organization={IEEE} | ||
| }""" | ||
|
|
||
|
|
||
| class CodeXGlueCcCloneDetectionBigCloneBench(TrainValidTestChild): | ||
| _DESCRIPTION = _DESCRIPTION | ||
| _CITATION = _CITATION | ||
|
|
||
| _FEATURES = { | ||
| "id": datasets.Value("int32"), # Index of the sample | ||
| "id1": datasets.Value("int32"), # The first function id | ||
| "id2": datasets.Value("int32"), # The second function id | ||
| "func1": datasets.Value("string"), # The full text of the first function | ||
| "func2": datasets.Value("string"), # The full text of the second function | ||
| "label": datasets.Value("bool"), # 1 if the functions are equivalent, 0 otherwise | ||
| } | ||
|
|
||
| _SUPERVISED_KEYS = ["label"] | ||
|
|
||
| def generate_urls(self, split_name): | ||
| yield "index", f"{split_name}.txt" | ||
| yield "data", "data.jsonl" | ||
|
|
||
| def _generate_examples(self, split_name, file_paths): | ||
| import json | ||
|
|
||
| js_all = {} | ||
|
|
||
| with open(file_paths["data"], encoding="utf-8") as f: | ||
| for line in f: | ||
| entry = json.loads(line) | ||
| js_all[int(entry["idx"])] = entry["func"] | ||
|
|
||
| with open(file_paths["index"], encoding="utf-8") as f: | ||
| for idx, line in enumerate(f): | ||
| line = line.strip() | ||
| idx1, idx2, label = [int(i) for i in line.split("\t")] | ||
| func1 = js_all[idx1] | ||
| func2 = js_all[idx2] | ||
|
|
||
| yield idx, dict(id=idx, id1=idx1, id2=idx2, func1=func1, func2=func2, label=(label == 1)) | ||
|
|
||
|
|
||
| CLASS_MAPPING = { | ||
| "CodeXGlueCcCloneDetectionBigCloneBench": CodeXGlueCcCloneDetectionBigCloneBench, | ||
| } | ||
|
|
||
|
|
||
| class CodeXGlueCcCloneDetectionBigCloneBenchMain(datasets.GeneratorBasedBuilder): | ||
| BUILDER_CONFIG_CLASS = datasets.BuilderConfig | ||
| BUILDER_CONFIGS = [ | ||
| datasets.BuilderConfig(name=name, description=info["description"]) for name, info in DEFINITIONS.items() | ||
| ] | ||
|
|
||
| def _info(self): | ||
| name = self.config.name | ||
| info = DEFINITIONS[name] | ||
| if info["class_name"] in CLASS_MAPPING: | ||
| self.child = CLASS_MAPPING[info["class_name"]](info) | ||
| else: | ||
| raise RuntimeError(f"Unknown python class for dataset configuration {name}") | ||
| ret = self.child._info() | ||
| return ret | ||
|
|
||
| def _split_generators(self, dl_manager: datasets.DownloadManager) -> List[datasets.SplitGenerator]: | ||
| return self.child._split_generators(dl_manager=dl_manager) | ||
|
|
||
| def _generate_examples(self, split_name, file_paths): | ||
| return self.child._generate_examples(split_name, file_paths) |
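The `_generate_examples` method in this file joins two inputs: `data.jsonl`, which maps each `idx` to a function body, and a per-split index file with tab-separated `idx1`, `idx2`, `label` rows. A standalone sketch of that join, using in-memory files and made-up contents (the helper name `join_pairs` and the sample data are assumptions for illustration, not part of this PR):

```python
import io
import json
from typing import Dict, Iterator, TextIO, Tuple


def join_pairs(data_file: TextIO, index_file: TextIO) -> Iterator[Tuple[int, Dict]]:
    """Yield (row_id, example) pairs by looking up both function ids of each index row."""
    # First pass: build an id -> source text lookup from the JSON Lines file.
    functions = {}
    for line in data_file:
        entry = json.loads(line)
        functions[int(entry["idx"])] = entry["func"]

    # Second pass: each index row references two functions and carries a 0/1 label.
    for row_id, line in enumerate(index_file):
        idx1, idx2, label = (int(part) for part in line.strip().split("\t"))
        yield row_id, {
            "id": row_id,
            "id1": idx1,
            "id2": idx2,
            "func1": functions[idx1],
            "func2": functions[idx2],
            "label": label == 1,  # 1 in the index file means "clone"
        }


# Made-up inputs mirroring the two file layouts.
data = io.StringIO(
    '{"idx": "10", "func": "int a() {}"}\n'
    '{"idx": "20", "func": "int b() {}"}\n'
)
index = io.StringIO("10\t20\t1\n20\t10\t0\n")
examples = list(join_pairs(data, index))
```

Holding every function in a dictionary keeps the lookup simple, at the cost of loading all of `data.jsonl` into memory before the first example is yielded.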
75 changes: 75 additions & 0 deletions
datasets/code_x_glue_cc_clone_detection_big_clone_bench/common.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| from typing import List | ||
|
|
||
| import datasets | ||
|
|
||
|
|
||
| # Citation, taken from https://github.com/microsoft/CodeXGLUE | ||
| _DEFAULT_CITATION = """@article{CodeXGLUE, | ||
| title={CodeXGLUE: A Benchmark Dataset and Open Challenge for Code Intelligence}, | ||
| year={2020},}""" | ||
|
|
||
|
|
||
| class Child: | ||
| _DESCRIPTION = None | ||
| _FEATURES = None | ||
| _CITATION = None | ||
| SPLITS = {"train": datasets.Split.TRAIN} | ||
| _SUPERVISED_KEYS = None | ||
|
|
||
| def __init__(self, info): | ||
| self.info = info | ||
|
|
||
| def homepage(self): | ||
| return self.info["project_url"] | ||
|
|
||
| def _info(self): | ||
| # This is the description that will appear on the datasets page. | ||
| return datasets.DatasetInfo( | ||
| description=self.info["description"] + "\n\n" + self._DESCRIPTION, | ||
| features=datasets.Features(self._FEATURES), | ||
| homepage=self.homepage(), | ||
| citation=self._CITATION or _DEFAULT_CITATION, | ||
| supervised_keys=self._SUPERVISED_KEYS, | ||
| ) | ||
|
|
||
| def _split_generators(self, dl_manager: datasets.DownloadManager) -> List[datasets.SplitGenerator]: | ||
| SPLITS = self.SPLITS | ||
| _URL = self.info["raw_url"] | ||
| urls_to_download = {} | ||
| for split in SPLITS: | ||
| if split not in urls_to_download: | ||
| urls_to_download[split] = {} | ||
|
|
||
| for key, url in self.generate_urls(split): | ||
| if not url.startswith("http"): | ||
| url = _URL + "/" + url | ||
| urls_to_download[split][key] = url | ||
|
|
||
| downloaded_files = {} | ||
| for k, v in urls_to_download.items(): | ||
| downloaded_files[k] = dl_manager.download_and_extract(v) | ||
|
|
||
| return [ | ||
| datasets.SplitGenerator( | ||
| name=SPLITS[k], | ||
| gen_kwargs={"split_name": k, "file_paths": downloaded_files[k]}, | ||
| ) | ||
| for k in SPLITS | ||
| ] | ||
|
|
||
| def check_empty(self, entries): | ||
| all_empty = all(v == "" for v in entries.values()) | ||
| all_non_empty = all(v != "" for v in entries.values()) | ||
|
|
||
| if not all_non_empty and not all_empty: | ||
| raise RuntimeError("Parallel data files should have the same number of lines.") | ||
|
|
||
| return all_empty | ||
|
|
||
|
|
||
| class TrainValidTestChild(Child): | ||
| SPLITS = { | ||
| "train": datasets.Split.TRAIN, | ||
| "valid": datasets.Split.VALIDATION, | ||
| "test": datasets.Split.TEST, | ||
| } |
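The URL handling in `_split_generators` can be summarised as follows: for each split, every `(key, url)` pair yielded by `generate_urls` is kept unchanged if it already starts with `http`, and is otherwise appended to the configured `raw_url`. A small sketch of that resolution step in isolation (the function name `resolve_urls` and the `example.com` base URL are assumptions for illustration; the download step is left out):

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def resolve_urls(
    raw_url: str,
    splits: Iterable[str],
    generate_urls: Callable[[str], Iterator[Tuple[str, str]]],
) -> Dict[str, Dict[str, str]]:
    """Build a {split: {key: absolute_url}} mapping from relative file names."""
    resolved: Dict[str, Dict[str, str]] = {}
    for split in splits:
        resolved[split] = {}
        for key, url in generate_urls(split):
            # Relative names are resolved against the dataset's raw base URL.
            if not url.startswith("http"):
                url = raw_url + "/" + url
            resolved[split][key] = url
    return resolved


def big_clone_bench_urls(split_name: str) -> Iterator[Tuple[str, str]]:
    # Same two entries as generate_urls in the loader script of this PR.
    yield "index", f"{split_name}.txt"
    yield "data", "data.jsonl"


urls = resolve_urls(
    "https://example.com/dataset",  # placeholder base URL
    ["train", "valid", "test"],
    big_clone_bench_urls,
)
```

Every split points at the same `data.jsonl`, so the shared file is referenced three times in the mapping while each split gets its own index file.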
1 change: 1 addition & 0 deletions
datasets/code_x_glue_cc_clone_detection_big_clone_bench/dataset_infos.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"default": {"description": "CodeXGLUE Clone-detection-BigCloneBench dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench\n\nGiven two codes as the input, the task is to do binary classification (0/1), where 1 stands for semantic equivalence and 0 for others. Models are evaluated by F1 score.\nThe dataset we use is BigCloneBench and filtered following the paper Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree.", "citation": "@inproceedings{svajlenko2014towards,\n title={Towards a big data curated benchmark of inter-project code clones},\n author={Svajlenko, Jeffrey and Islam, Judith F and Keivanloo, Iman and Roy, Chanchal K and Mia, Mohammad Mamun},\n booktitle={2014 IEEE International Conference on Software Maintenance and Evolution},\n pages={476--480},\n year={2014},\n organization={IEEE}\n}\n\n@inproceedings{wang2020detecting,\n title={Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree},\n author={Wang, Wenhan and Li, Ge and Ma, Bo and Xia, Xin and Jin, Zhi},\n booktitle={2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)},\n pages={261--271},\n year={2020},\n organization={IEEE}\n}", "homepage": "https://github.com/madlag/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench", "license": "", "features": {"id": {"dtype": "int32", "id": null, "_type": "Value"}, "id1": {"dtype": "int32", "id": null, "_type": "Value"}, "id2": {"dtype": "int32", "id": null, "_type": "Value"}, "func1": {"dtype": "string", "id": null, "_type": "Value"}, "func2": {"dtype": "string", "id": null, "_type": "Value"}, "label": {"dtype": "bool", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": {"input": "label", "output": ""}, "builder_name": "code_x_glue_cc_clone_detection_big_clone_bench_main", "config_name": "default", "version": {"version_str": "0.0.0", "description": null, 
"major": 0, "minor": 0, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 2888035757, "num_examples": 901028, "dataset_name": "code_x_glue_cc_clone_detection_big_clone_bench_main"}, "validation": {"name": "validation", "num_bytes": 1371399694, "num_examples": 415416, "dataset_name": "code_x_glue_cc_clone_detection_big_clone_bench_main"}, "test": {"name": "test", "num_bytes": 1220662901, "num_examples": 415416, "dataset_name": "code_x_glue_cc_clone_detection_big_clone_bench_main"}}, "download_checksums": {"https://raw.githubusercontent.com/madlag/CodeXGLUE/main/Code-Code/Clone-detection-BigCloneBench/dataset/train.txt": {"num_bytes": 17043552, "checksum": "29119bfa94673374249c3424809fbe6baaa1f0e87a13e3c727bbd6cdf1224b77"}, "https://raw.githubusercontent.com/madlag/CodeXGLUE/main/Code-Code/Clone-detection-BigCloneBench/dataset/data.jsonl": {"num_bytes": 15174797, "checksum": "d8bc51e62deddcc45bd26c5b57f5add2a2cf377f13b9f6c2fb656fbc8fca4dd2"}, "https://raw.githubusercontent.com/madlag/CodeXGLUE/main/Code-Code/Clone-detection-BigCloneBench/dataset/valid.txt": {"num_bytes": 7861019, "checksum": "e59e8c1321df59b6ab0143165cb603030c55800c00e2d782e06810517b8de1e4"}, "https://raw.githubusercontent.com/madlag/CodeXGLUE/main/Code-Code/Clone-detection-BigCloneBench/dataset/test.txt": {"num_bytes": 7876506, "checksum": "a6c0cf79be34e582fdc64007aa894ed094e4f9ff2e5395a8d2b5c39eeef2737a"}}, "download_size": 47955874, "post_processing_size": null, "dataset_size": 5480098352, "size_in_bytes": 5528054226}} |
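As a quick consistency check on the numbers in `dataset_infos.json`, the aggregate sizes can be recomputed from the per-split and per-file byte counts listed above. This is only an arithmetic sketch; the variable names are chosen for illustration, and the file names stand in for the full download URLs.

```python
# Byte counts copied from the "splits" section of dataset_infos.json.
split_bytes = {
    "train": 2888035757,
    "validation": 1371399694,
    "test": 1220662901,
}

# Byte counts copied from the "download_checksums" section, keyed by file name.
download_bytes = {
    "train.txt": 17043552,
    "data.jsonl": 15174797,
    "valid.txt": 7861019,
    "test.txt": 7876506,
}

# Example counts copied from the "splits" section.
split_examples = {"train": 901028, "validation": 415416, "test": 415416}

dataset_size = sum(split_bytes.values())
download_size = sum(download_bytes.values())
size_in_bytes = dataset_size + download_size
total_examples = sum(split_examples.values())
```

The sums reproduce the recorded `dataset_size` (5480098352), `download_size` (47955874) and `size_in_bytes` (5528054226), and the total of 1731860 examples is consistent with the `n>1M` size category in the README front matter.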
Binary file added
BIN
+4 KB
datasets/code_x_glue_cc_clone_detection_big_clone_bench/dummy/default/0.0.0/dummy_data.zip
Binary file not shown.
12 changes: 12 additions & 0 deletions
datasets/code_x_glue_cc_clone_detection_big_clone_bench/generated_definitions.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| DEFINITIONS = { | ||
| "default": { | ||
| "class_name": "CodeXGlueCcCloneDetectionBigCloneBench", | ||
| "dataset_type": "Code-Code", | ||
| "description": "CodeXGLUE Clone-detection-BigCloneBench dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench", | ||
| "dir_name": "Clone-detection-BigCloneBench", | ||
| "name": "default", | ||
| "project_url": "https://github.com/madlag/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench", | ||
| "raw_url": "https://raw.githubusercontent.com/madlag/CodeXGLUE/main/Code-Code/Clone-detection-BigCloneBench/dataset", | ||
| "sizes": {"test": 415416, "train": 901028, "validation": 415416}, | ||
| } | ||
| } |