Commit cfae8ac

Fix language tags in dataset cards (#4826)
1 parent dd4a28d commit cfae8ac

7 files changed: +792 -39 lines changed

datasets/arsentd_lev/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ language_creators:
 - found
 language:
 - apc
-- apj
+- ajp
 license:
 - other
 multilinguality:

datasets/bible_para/README.md

Lines changed: 1 addition & 1 deletion
@@ -50,8 +50,8 @@ language:
 - id
 - is
 - it
+- ja
 - jak
-- jap
 - jiv
 - kab
 - kbh

datasets/open_subtitles/README.md

Lines changed: 63 additions & 2 deletions
@@ -61,7 +61,6 @@ language:
 - uk
 - ur
 - vi
-- ze
 - zh
 language_bcp47:
 - pt-BR
@@ -141,7 +140,69 @@ E.g.
 
 ### Languages
 
-[More Information Needed]
+The languages in the dataset are:
+- af
+- ar
+- bg
+- bn
+- br
+- bs
+- ca
+- cs
+- da
+- de
+- el
+- en
+- eo
+- es
+- et
+- eu
+- fa
+- fi
+- fr
+- gl
+- he
+- hi
+- hr
+- hu
+- hy
+- id
+- is
+- it
+- ja
+- ka
+- kk
+- ko
+- lt
+- lv
+- mk
+- ml
+- ms
+- nl
+- no
+- pl
+- pt
+- pt_br: Portuguese (Brazil) (pt-BR)
+- ro
+- ru
+- si
+- sk
+- sl
+- sq
+- sr
+- sv
+- ta
+- te
+- th
+- tl
+- tr
+- uk
+- ur
+- vi
+- ze_en: English constituent of Bilingual Chinese-English (subtitles displaying two languages at once, one per line)
+- ze_zh: Chinese constituent of Bilingual Chinese-English (subtitles displaying two languages at once, one per line)
+- zh_cn: Simplified Chinese (zh-CN, `zh-Hans`)
+- zh_tw: Traditional Chinese (zh-TW, `zh-Hant`)
 
 ## Dataset Structure
 
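The new Languages section in `datasets/open_subtitles/README.md` lists several dataset-specific codes (`pt_br`, `zh_cn`, `zh_tw`, `ze_en`, `ze_zh`) next to their standard equivalents. As a minimal sketch of how a consumer might normalize them, the Python snippet below maps those codes to BCP 47 tags. The names `DATASET_CODE_TO_BCP47` and `to_bcp47` are hypothetical and not part of this commit, and the targets chosen for `ze_en` and `ze_zh` are an assumption based on the descriptions in the card.

```python
# Hypothetical mapping from the dataset-specific codes in the card to BCP 47 tags.
# Only the pt_br, zh_cn and zh_tw targets are stated in the card itself.
DATASET_CODE_TO_BCP47 = {
    "pt_br": "pt-BR",
    "zh_cn": "zh-CN",
    "zh_tw": "zh-TW",
    "ze_en": "en",  # assumption: English side of the bilingual subtitles
    "ze_zh": "zh",  # assumption: Chinese side of the bilingual subtitles
}


def to_bcp47(code: str) -> str:
    """Return a BCP 47 tag for a code from the card's language list.

    Codes without a special mapping (e.g. "fr") are returned unchanged.
    """
    return DATASET_CODE_TO_BCP47.get(code, code)


if __name__ == "__main__":
    for code in ("fr", "pt_br", "zh_tw", "ze_en"):
        print(code, "->", to_bcp47(code))
```

Plain two-letter codes pass through untouched, so the helper can be applied to every entry in the list without special-casing.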