Fix authentication issues #6127

albertvillanova · 2023-08-07T15:41:25Z

This PR fixes 3 authentication issues:

Fix authentication when passing token.
Fix authentication in Audio.decode_example and Image.decode_example.
Fix authentication to resolve data_files in repositories without script.

This PR also fixes our CI so that we properly test when passing token and we do not use the token stored in HfFolder.

Fix #6126.

Details

Fix authentication when passing `token`

See: c0a77dc

The root cause was that the token was set on an already instantiated DownloadConfig and was therefore not propagated to self._storage_options:

download_config.token = token

As this usage pattern is very common, the fix overrides DownloadConfig.__setattr__ so that assigning the token after instantiation also propagates it.
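A minimal sketch of the idea, not the actual patch in c0a77dc. The class name `SketchDownloadConfig`, the public `storage_options` field and the `"hf"` key are assumptions made for illustration; the real DownloadConfig may store these differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchDownloadConfig:
    """Hypothetical stand-in for DownloadConfig (illustration only)."""

    token: Optional[str] = None
    # Assumed layout: filesystem options keyed by protocol name.
    storage_options: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Cover the case where the token is passed to the constructor.
        if self.token is not None:
            self.storage_options.setdefault("hf", {})["token"] = self.token

    def __setattr__(self, name: str, value: Any) -> None:
        # Store the attribute as usual.
        super().__setattr__(name, value)
        # Mirror a later `config.token = ...` into the storage options.
        if name == "token" and value is not None:
            options = self.__dict__.get("storage_options")
            if options is not None:  # not yet set while __init__ is running
                options.setdefault("hf", {})["token"] = value


download_config = SketchDownloadConfig()
download_config.token = "hf_dummy_token"  # the common usage pattern
print(download_config.storage_options)  # {'hf': {'token': 'hf_dummy_token'}}
```

Without the `__setattr__` override, the assignment on the second-to-last line would update `token` only and leave the storage options stale, which is the failure mode described above.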

This fixes authentication issues in the following functions:

load_dataset and load_dataset_builder
Dataset.push_to_hub and DatasetDict.push_to_hub
inspect.get_dataset_config_info, inspect.get_dataset_infos and inspect.get_dataset_split_names

Fix authentication in `Audio.decode_example` and `Image.decode_example`.

See: 58e62af

The token was not set because the code tried to parse the repo_id from an HTTP URL ("http://...") instead of an HFFileSystem URL ("hf://").
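To illustrate why the URL scheme matters, here is a rough sketch, not the code from 58e62af. The URL shape `hf://datasets/<namespace>/<name>[@<revision>]/<path>`, the helper names and the `token_per_repo_id` mapping are assumptions for this example.

```python
import re
from typing import Dict, Optional

# Assumed URL shape: hf://datasets/<namespace>/<name>[@<revision>]/<path>
_HF_DATASET_URL = re.compile(
    r"^hf://datasets/(?P<repo_id>[^/@]+/[^/@]+)(?:@(?P<revision>[^/]+))?/(?P<path>.+)$"
)


def repo_id_from_hf_url(url: str) -> Optional[str]:
    """Return the repo_id for an hf:// dataset URL, or None if it does not match."""
    match = _HF_DATASET_URL.match(url)
    return match.group("repo_id") if match else None


def token_for_url(url: str, token_per_repo_id: Dict[str, str]) -> Optional[str]:
    """Look up the token for the repository a file URL points to (hypothetical helper)."""
    repo_id = repo_id_from_hf_url(url)
    return token_per_repo_id.get(repo_id) if repo_id is not None else None


tokens = {"user/private_ds": "hf_dummy_token"}
# An hf:// URL is parsed and the token is found.
print(token_for_url("hf://datasets/user/private_ds@main/data/audio.wav", tokens))
# An HTTP URL does not match the hf:// pattern, so no token is returned.
print(token_for_url("http://example.com/user/private_ds/audio.wav", tokens))
```

Matching against the wrong pattern fails silently: the lookup simply returns no token, and the request is then sent unauthenticated.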

Fix authentication to resolve `data_files` in repositories without script

See: e4684fc

This is fixed by passing download_config to the function create_builder_configs_from_metadata_configs.

HuggingFaceDocBuilderDev · 2023-08-07T15:49:10Z

The documentation is not available anymore as the PR was closed or merged.

github-actions · 2023-08-07T15:49:17Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006103 / 0.011353 (-0.005250)	0.003588 / 0.011008 (-0.007420)	0.080335 / 0.038508 (0.041827)	0.059634 / 0.023109 (0.036525)	0.356093 / 0.275898 (0.080195)	0.407376 / 0.323480 (0.083896)	0.005343 / 0.007986 (-0.002643)	0.002928 / 0.004328 (-0.001400)	0.062580 / 0.004250 (0.058330)	0.047544 / 0.037052 (0.010491)	0.364305 / 0.258489 (0.105816)	0.421463 / 0.293841 (0.127623)	0.027249 / 0.128546 (-0.101298)	0.008010 / 0.075646 (-0.067636)	0.262543 / 0.419271 (-0.156728)	0.044978 / 0.043533 (0.001445)	0.339344 / 0.255139 (0.084205)	0.395288 / 0.283200 (0.112088)	0.021425 / 0.141683 (-0.120258)	1.439767 / 1.452155 (-0.012387)	1.498081 / 1.492716 (0.005365)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.196976 / 0.018006 (0.178970)	0.435383 / 0.000490 (0.434893)	0.004559 / 0.000200 (0.004359)	0.000071 / 0.000054 (0.000016)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.023653 / 0.037411 (-0.013759)	0.072944 / 0.014526 (0.058418)	0.083651 / 0.176557 (-0.092906)	0.144590 / 0.737135 (-0.592545)	0.084844 / 0.296338 (-0.211494)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.398752 / 0.215209 (0.183543)	3.959539 / 2.077655 (1.881884)	1.935277 / 1.504120 (0.431157)	1.751994 / 1.541195 (0.210799)	1.828386 / 1.468490 (0.359896)	0.500492 / 4.584777 (-4.084284)	3.086630 / 3.745712 (-0.659082)	2.851664 / 5.269862 (-2.418198)	1.869792 / 4.565676 (-2.695885)	0.058509 / 0.424275 (-0.365766)	0.006500 / 0.007607 (-0.001107)	0.467468 / 0.226044 (0.241424)	4.686168 / 2.268929 (2.417240)	2.427632 / 55.444624 (-53.016993)	2.193194 / 6.876477 (-4.683283)	2.408574 / 2.142072 (0.266501)	0.592173 / 4.805227 (-4.213054)	0.125381 / 6.500664 (-6.375283)	0.060679 / 0.075469 (-0.014790)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.236066 / 1.841788 (-0.605722)	18.591689 / 8.074308 (10.517381)	14.138774 / 10.191392 (3.947382)	0.147455 / 0.680424 (-0.532968)	0.016921 / 0.534201 (-0.517280)	0.328129 / 0.579283 (-0.251154)	0.348872 / 0.434364 (-0.085491)	0.380311 / 0.540337 (-0.160026)	0.532901 / 1.386936 (-0.854035)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.005914 / 0.011353 (-0.005438)	0.003614 / 0.011008 (-0.007394)	0.062857 / 0.038508 (0.024349)	0.060633 / 0.023109 (0.037524)	0.419684 / 0.275898 (0.143786)	0.449025 / 0.323480 (0.125546)	0.004595 / 0.007986 (-0.003391)	0.002861 / 0.004328 (-0.001467)	0.063253 / 0.004250 (0.059003)	0.048770 / 0.037052 (0.011718)	0.419838 / 0.258489 (0.161349)	0.465183 / 0.293841 (0.171342)	0.027350 / 0.128546 (-0.101196)	0.008065 / 0.075646 (-0.067582)	0.068321 / 0.419271 (-0.350950)	0.041083 / 0.043533 (-0.002449)	0.400831 / 0.255139 (0.145692)	0.449286 / 0.283200 (0.166086)	0.020472 / 0.141683 (-0.121210)	1.437215 / 1.452155 (-0.014940)	1.503679 / 1.492716 (0.010963)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.230764 / 0.018006 (0.212758)	0.420774 / 0.000490 (0.420285)	0.004012 / 0.000200 (0.003812)	0.000069 / 0.000054 (0.000014)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.026009 / 0.037411 (-0.011402)	0.077943 / 0.014526 (0.063417)	0.087281 / 0.176557 (-0.089276)	0.139422 / 0.737135 (-0.597713)	0.089090 / 0.296338 (-0.207248)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.417298 / 0.215209 (0.202088)	4.152303 / 2.077655 (2.074648)	2.179996 / 1.504120 (0.675877)	2.020619 / 1.541195 (0.479424)	2.085241 / 1.468490 (0.616751)	0.501111 / 4.584777 (-4.083666)	3.079849 / 3.745712 (-0.665863)	2.820607 / 5.269862 (-2.449255)	1.863988 / 4.565676 (-2.701688)	0.057662 / 0.424275 (-0.366613)	0.006778 / 0.007607 (-0.000830)	0.498661 / 0.226044 (0.272616)	4.986503 / 2.268929 (2.717574)	2.620676 / 55.444624 (-52.823949)	2.297546 / 6.876477 (-4.578931)	2.458148 / 2.142072 (0.316075)	0.599490 / 4.805227 (-4.205738)	0.125102 / 6.500664 (-6.375562)	0.061411 / 0.075469 (-0.014059)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.323816 / 1.841788 (-0.517971)	18.462614 / 8.074308 (10.388306)	13.845826 / 10.191392 (3.654434)	0.146115 / 0.680424 (-0.534309)	0.016862 / 0.534201 (-0.517339)	0.335449 / 0.579283 (-0.243834)	0.343792 / 0.434364 (-0.090572)	0.394068 / 0.540337 (-0.146269)	0.536378 / 1.386936 (-0.850558)

github-actions · 2023-08-07T15:52:18Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006825 / 0.011353 (-0.004527)	0.004005 / 0.011008 (-0.007003)	0.085504 / 0.038508 (0.046996)	0.077252 / 0.023109 (0.054143)	0.351891 / 0.275898 (0.075993)	0.383404 / 0.323480 (0.059924)	0.004153 / 0.007986 (-0.003833)	0.003344 / 0.004328 (-0.000985)	0.064936 / 0.004250 (0.060685)	0.057653 / 0.037052 (0.020601)	0.368155 / 0.258489 (0.109666)	0.406122 / 0.293841 (0.112282)	0.032049 / 0.128546 (-0.096497)	0.008698 / 0.075646 (-0.066949)	0.292394 / 0.419271 (-0.126878)	0.053634 / 0.043533 (0.010101)	0.358273 / 0.255139 (0.103134)	0.378441 / 0.283200 (0.095242)	0.026928 / 0.141683 (-0.114755)	1.458718 / 1.452155 (0.006563)	1.536231 / 1.492716 (0.043515)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.213956 / 0.018006 (0.195950)	0.458620 / 0.000490 (0.458130)	0.002718 / 0.000200 (0.002519)	0.000078 / 0.000054 (0.000023)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.027870 / 0.037411 (-0.009541)	0.083922 / 0.014526 (0.069396)	0.152056 / 0.176557 (-0.024501)	0.151584 / 0.737135 (-0.585552)	0.095698 / 0.296338 (-0.200641)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.407762 / 0.215209 (0.192553)	4.074324 / 2.077655 (1.996669)	2.089929 / 1.504120 (0.585809)	1.920024 / 1.541195 (0.378829)	2.013410 / 1.468490 (0.544920)	0.486056 / 4.584777 (-4.098721)	3.656869 / 3.745712 (-0.088843)	3.304008 / 5.269862 (-1.965854)	2.074363 / 4.565676 (-2.491313)	0.057293 / 0.424275 (-0.366982)	0.007240 / 0.007607 (-0.000367)	0.482696 / 0.226044 (0.256652)	4.833251 / 2.268929 (2.564322)	2.570391 / 55.444624 (-52.874233)	2.220619 / 6.876477 (-4.655857)	2.426316 / 2.142072 (0.284243)	0.584811 / 4.805227 (-4.220416)	0.134907 / 6.500664 (-6.365757)	0.061115 / 0.075469 (-0.014354)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.251969 / 1.841788 (-0.589818)	19.601611 / 8.074308 (11.527303)	14.190217 / 10.191392 (3.998825)	0.166296 / 0.680424 (-0.514128)	0.018334 / 0.534201 (-0.515867)	0.395172 / 0.579283 (-0.184111)	0.410440 / 0.434364 (-0.023924)	0.462263 / 0.540337 (-0.078074)	0.645504 / 1.386936 (-0.741432)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006991 / 0.011353 (-0.004362)	0.004084 / 0.011008 (-0.006924)	0.065208 / 0.038508 (0.026700)	0.077809 / 0.023109 (0.054699)	0.386472 / 0.275898 (0.110574)	0.418686 / 0.323480 (0.095206)	0.005346 / 0.007986 (-0.002640)	0.003416 / 0.004328 (-0.000912)	0.066209 / 0.004250 (0.061958)	0.057517 / 0.037052 (0.020465)	0.407684 / 0.258489 (0.149195)	0.425438 / 0.293841 (0.131597)	0.032166 / 0.128546 (-0.096380)	0.008662 / 0.075646 (-0.066985)	0.071712 / 0.419271 (-0.347560)	0.049764 / 0.043533 (0.006231)	0.394882 / 0.255139 (0.139743)	0.403589 / 0.283200 (0.120389)	0.023688 / 0.141683 (-0.117995)	1.468488 / 1.452155 (0.016334)	1.533118 / 1.492716 (0.040401)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.252949 / 0.018006 (0.234943)	0.447355 / 0.000490 (0.446865)	0.011721 / 0.000200 (0.011521)	0.000107 / 0.000054 (0.000052)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.031444 / 0.037411 (-0.005968)	0.089390 / 0.014526 (0.074864)	0.100103 / 0.176557 (-0.076454)	0.153301 / 0.737135 (-0.583835)	0.101336 / 0.296338 (-0.195003)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.408574 / 0.215209 (0.193365)	4.073135 / 2.077655 (1.995480)	2.086550 / 1.504120 (0.582430)	1.930651 / 1.541195 (0.389457)	2.013548 / 1.468490 (0.545058)	0.477235 / 4.584777 (-4.107542)	3.547545 / 3.745712 (-0.198167)	3.321957 / 5.269862 (-1.947905)	2.057705 / 4.565676 (-2.507971)	0.056730 / 0.424275 (-0.367545)	0.007882 / 0.007607 (0.000275)	0.487297 / 0.226044 (0.261253)	4.874184 / 2.268929 (2.605255)	2.631129 / 55.444624 (-52.813496)	2.235755 / 6.876477 (-4.640722)	2.463329 / 2.142072 (0.321257)	0.578308 / 4.805227 (-4.226919)	0.132726 / 6.500664 (-6.367938)	0.064883 / 0.075469 (-0.010586)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.347564 / 1.841788 (-0.494223)	20.192973 / 8.074308 (12.118665)	14.563553 / 10.191392 (4.372161)	0.168244 / 0.680424 (-0.512180)	0.018638 / 0.534201 (-0.515563)	0.394789 / 0.579283 (-0.184494)	0.419677 / 0.434364 (-0.014687)	0.480274 / 0.540337 (-0.060063)	0.641204 / 1.386936 (-0.745732)

github-actions · 2023-08-07T17:16:15Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.005939 / 0.011353 (-0.005413)	0.003457 / 0.011008 (-0.007551)	0.079985 / 0.038508 (0.041477)	0.056492 / 0.023109 (0.033383)	0.312356 / 0.275898 (0.036458)	0.354038 / 0.323480 (0.030558)	0.004551 / 0.007986 (-0.003435)	0.002828 / 0.004328 (-0.001501)	0.062369 / 0.004250 (0.058119)	0.044712 / 0.037052 (0.007660)	0.318244 / 0.258489 (0.059755)	0.361977 / 0.293841 (0.068136)	0.026460 / 0.128546 (-0.102086)	0.007928 / 0.075646 (-0.067719)	0.261378 / 0.419271 (-0.157894)	0.044209 / 0.043533 (0.000676)	0.313931 / 0.255139 (0.058792)	0.339553 / 0.283200 (0.056354)	0.019776 / 0.141683 (-0.121907)	1.443126 / 1.452155 (-0.009029)	1.508149 / 1.492716 (0.015432)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.183801 / 0.018006 (0.165795)	0.427967 / 0.000490 (0.427477)	0.002028 / 0.000200 (0.001828)	0.000062 / 0.000054 (0.000007)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.023697 / 0.037411 (-0.013715)	0.072128 / 0.014526 (0.057602)	0.083701 / 0.176557 (-0.092855)	0.142821 / 0.737135 (-0.594315)	0.082276 / 0.296338 (-0.214063)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.434427 / 0.215209 (0.219218)	4.325962 / 2.077655 (2.248308)	2.277115 / 1.504120 (0.772995)	2.093736 / 1.541195 (0.552541)	2.127984 / 1.468490 (0.659494)	0.502336 / 4.584777 (-4.082441)	3.023243 / 3.745712 (-0.722469)	2.805154 / 5.269862 (-2.464708)	1.821273 / 4.565676 (-2.744403)	0.057480 / 0.424275 (-0.366795)	0.006365 / 0.007607 (-0.001242)	0.508258 / 0.226044 (0.282213)	5.087950 / 2.268929 (2.819022)	2.705029 / 55.444624 (-52.739596)	2.378392 / 6.876477 (-4.498085)	2.515380 / 2.142072 (0.373307)	0.589283 / 4.805227 (-4.215944)	0.125719 / 6.500664 (-6.374945)	0.061074 / 0.075469 (-0.014395)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.221895 / 1.841788 (-0.619893)	18.025917 / 8.074308 (9.951609)	13.556901 / 10.191392 (3.365509)	0.142614 / 0.680424 (-0.537809)	0.016731 / 0.534201 (-0.517469)	0.328374 / 0.579283 (-0.250910)	0.342553 / 0.434364 (-0.091811)	0.374502 / 0.540337 (-0.165836)	0.534173 / 1.386936 (-0.852763)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.005817 / 0.011353 (-0.005536)	0.003500 / 0.011008 (-0.007509)	0.062240 / 0.038508 (0.023732)	0.058128 / 0.023109 (0.035019)	0.424014 / 0.275898 (0.148116)	0.468453 / 0.323480 (0.144973)	0.004641 / 0.007986 (-0.003345)	0.002821 / 0.004328 (-0.001508)	0.062180 / 0.004250 (0.057930)	0.047578 / 0.037052 (0.010526)	0.427367 / 0.258489 (0.168878)	0.467889 / 0.293841 (0.174048)	0.027144 / 0.128546 (-0.101403)	0.007969 / 0.075646 (-0.067678)	0.067764 / 0.419271 (-0.351508)	0.040719 / 0.043533 (-0.002814)	0.423663 / 0.255139 (0.168524)	0.458556 / 0.283200 (0.175356)	0.019196 / 0.141683 (-0.122487)	1.471546 / 1.452155 (0.019392)	1.547541 / 1.492716 (0.054825)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.228777 / 0.018006 (0.210770)	0.406663 / 0.000490 (0.406173)	0.003688 / 0.000200 (0.003488)	0.000075 / 0.000054 (0.000021)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025494 / 0.037411 (-0.011917)	0.076339 / 0.014526 (0.061814)	0.084233 / 0.176557 (-0.092324)	0.136995 / 0.737135 (-0.600140)	0.085443 / 0.296338 (-0.210895)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.420441 / 0.215209 (0.205232)	4.187018 / 2.077655 (2.109363)	2.142139 / 1.504120 (0.638019)	1.974530 / 1.541195 (0.433335)	2.027321 / 1.468490 (0.558831)	0.498116 / 4.584777 (-4.086661)	2.988514 / 3.745712 (-0.757198)	2.782046 / 5.269862 (-2.487816)	1.821725 / 4.565676 (-2.743951)	0.057711 / 0.424275 (-0.366564)	0.006664 / 0.007607 (-0.000944)	0.491015 / 0.226044 (0.264971)	4.921037 / 2.268929 (2.652108)	2.574964 / 55.444624 (-52.869661)	2.251703 / 6.876477 (-4.624774)	2.361154 / 2.142072 (0.219082)	0.593362 / 4.805227 (-4.211865)	0.126107 / 6.500664 (-6.374557)	0.061840 / 0.075469 (-0.013630)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.327459 / 1.841788 (-0.514328)	18.062960 / 8.074308 (9.988652)	13.669253 / 10.191392 (3.477861)	0.130719 / 0.680424 (-0.549705)	0.016564 / 0.534201 (-0.517637)	0.335821 / 0.579283 (-0.243462)	0.341691 / 0.434364 (-0.092673)	0.392651 / 0.540337 (-0.147686)	0.529650 / 1.386936 (-0.857286)

github-actions · 2023-08-07T17:41:27Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009625 / 0.011353 (-0.001728)	0.005354 / 0.011008 (-0.005654)	0.114350 / 0.038508 (0.075842)	0.086637 / 0.023109 (0.063528)	0.465381 / 0.275898 (0.189483)	0.490411 / 0.323480 (0.166931)	0.006575 / 0.007986 (-0.001411)	0.004287 / 0.004328 (-0.000041)	0.093134 / 0.004250 (0.088884)	0.060209 / 0.037052 (0.023156)	0.459570 / 0.258489 (0.201080)	0.523320 / 0.293841 (0.229479)	0.047943 / 0.128546 (-0.080603)	0.014764 / 0.075646 (-0.060882)	0.383887 / 0.419271 (-0.035384)	0.069864 / 0.043533 (0.026331)	0.469122 / 0.255139 (0.213983)	0.509953 / 0.283200 (0.226753)	0.037800 / 0.141683 (-0.103883)	1.877589 / 1.452155 (0.425434)	2.014913 / 1.492716 (0.522197)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.309146 / 0.018006 (0.291140)	0.644390 / 0.000490 (0.643900)	0.005017 / 0.000200 (0.004817)	0.000102 / 0.000054 (0.000048)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.032964 / 0.037411 (-0.004447)	0.103236 / 0.014526 (0.088711)	0.119950 / 0.176557 (-0.056607)	0.207674 / 0.737135 (-0.529461)	0.117278 / 0.296338 (-0.179060)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.605464 / 0.215209 (0.390255)	6.027805 / 2.077655 (3.950150)	2.719725 / 1.504120 (1.215605)	2.262752 / 1.541195 (0.721558)	2.330310 / 1.468490 (0.861820)	0.862537 / 4.584777 (-3.722240)	5.347080 / 3.745712 (1.601368)	4.792170 / 5.269862 (-0.477691)	3.103694 / 4.565676 (-1.461983)	0.103646 / 0.424275 (-0.320629)	0.009411 / 0.007607 (0.001804)	0.743052 / 0.226044 (0.517008)	7.289684 / 2.268929 (5.020755)	3.436530 / 55.444624 (-52.008094)	2.722440 / 6.876477 (-4.154036)	2.952380 / 2.142072 (0.810308)	1.047688 / 4.805227 (-3.757539)	0.212724 / 6.500664 (-6.287940)	0.081473 / 0.075469 (0.006004)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.714437 / 1.841788 (-0.127351)	24.384330 / 8.074308 (16.310022)	22.444162 / 10.191392 (12.252770)	0.226264 / 0.680424 (-0.454160)	0.030530 / 0.534201 (-0.503671)	0.473999 / 0.579283 (-0.105284)	0.575005 / 0.434364 (0.140641)	0.542789 / 0.540337 (0.002451)	0.776079 / 1.386936 (-0.610857)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009366 / 0.011353 (-0.001987)	0.005239 / 0.011008 (-0.005769)	0.085116 / 0.038508 (0.046608)	0.089600 / 0.023109 (0.066491)	0.485778 / 0.275898 (0.209880)	0.540054 / 0.323480 (0.216574)	0.006290 / 0.007986 (-0.001695)	0.004054 / 0.004328 (-0.000274)	0.083535 / 0.004250 (0.079284)	0.067200 / 0.037052 (0.030148)	0.519520 / 0.258489 (0.261031)	0.544049 / 0.293841 (0.250208)	0.054300 / 0.128546 (-0.074246)	0.013650 / 0.075646 (-0.061996)	0.102515 / 0.419271 (-0.316757)	0.063054 / 0.043533 (0.019522)	0.491724 / 0.255139 (0.236585)	0.547498 / 0.283200 (0.264298)	0.039266 / 0.141683 (-0.102416)	1.801226 / 1.452155 (0.349071)	1.861778 / 1.492716 (0.369061)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.313009 / 0.018006 (0.295003)	0.587695 / 0.000490 (0.587205)	0.004972 / 0.000200 (0.004772)	0.000110 / 0.000054 (0.000055)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.029230 / 0.037411 (-0.008181)	0.091154 / 0.014526 (0.076628)	0.110505 / 0.176557 (-0.066052)	0.164204 / 0.737135 (-0.572932)	0.107812 / 0.296338 (-0.188526)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.610535 / 0.215209 (0.395326)	6.162517 / 2.077655 (4.084862)	2.866718 / 1.504120 (1.362598)	2.542412 / 1.541195 (1.001218)	2.584136 / 1.468490 (1.115645)	0.874319 / 4.584777 (-3.710458)	5.257184 / 3.745712 (1.511472)	4.705840 / 5.269862 (-0.564022)	2.971708 / 4.565676 (-1.593969)	0.099026 / 0.424275 (-0.325249)	0.009142 / 0.007607 (0.001535)	0.728660 / 0.226044 (0.502615)	7.560922 / 2.268929 (5.291994)	3.439521 / 55.444624 (-52.005103)	2.854730 / 6.876477 (-4.021746)	3.088951 / 2.142072 (0.946879)	0.973621 / 4.805227 (-3.831606)	0.209792 / 6.500664 (-6.290872)	0.081107 / 0.075469 (0.005638)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.716809 / 1.841788 (-0.124978)	24.386927 / 8.074308 (16.312619)	20.715524 / 10.191392 (10.524131)	0.260831 / 0.680424 (-0.419592)	0.030701 / 0.534201 (-0.503500)	0.490018 / 0.579283 (-0.089265)	0.590424 / 0.434364 (0.156060)	0.589942 / 0.540337 (0.049604)	0.798094 / 1.386936 (-0.588842)

github-actions · 2023-08-08T06:57:20Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006592 / 0.011353 (-0.004761)	0.003880 / 0.011008 (-0.007128)	0.083761 / 0.038508 (0.045253)	0.075966 / 0.023109 (0.052857)	0.315291 / 0.275898 (0.039393)	0.355920 / 0.323480 (0.032440)	0.004972 / 0.007986 (-0.003014)	0.003053 / 0.004328 (-0.001275)	0.063553 / 0.004250 (0.059302)	0.050794 / 0.037052 (0.013742)	0.317681 / 0.258489 (0.059192)	0.361991 / 0.293841 (0.068150)	0.028119 / 0.128546 (-0.100427)	0.008203 / 0.075646 (-0.067443)	0.271756 / 0.419271 (-0.147516)	0.046701 / 0.043533 (0.003168)	0.316520 / 0.255139 (0.061381)	0.350499 / 0.283200 (0.067300)	0.022399 / 0.141683 (-0.119284)	1.416017 / 1.452155 (-0.036138)	1.503087 / 1.492716 (0.010371)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.208250 / 0.018006 (0.190244)	0.470345 / 0.000490 (0.469856)	0.003687 / 0.000200 (0.003487)	0.000073 / 0.000054 (0.000019)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.026163 / 0.037411 (-0.011248)	0.083315 / 0.014526 (0.068789)	0.088541 / 0.176557 (-0.088015)	0.150078 / 0.737135 (-0.587057)	0.088862 / 0.296338 (-0.207476)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.404911 / 0.215209 (0.189702)	4.059257 / 2.077655 (1.981602)	1.890987 / 1.504120 (0.386867)	1.726608 / 1.541195 (0.185413)	1.767479 / 1.468490 (0.298989)	0.518826 / 4.584777 (-4.065951)	3.212145 / 3.745712 (-0.533567)	3.029933 / 5.269862 (-2.239929)	2.000203 / 4.565676 (-2.565474)	0.059631 / 0.424275 (-0.364644)	0.006707 / 0.007607 (-0.000900)	0.485741 / 0.226044 (0.259697)	4.871938 / 2.268929 (2.603010)	2.418856 / 55.444624 (-53.025769)	2.084847 / 6.876477 (-4.791630)	2.207992 / 2.142072 (0.065920)	0.614354 / 4.805227 (-4.190873)	0.128932 / 6.500664 (-6.371732)	0.062342 / 0.075469 (-0.013127)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.325792 / 1.841788 (-0.515995)	19.718995 / 8.074308 (11.644687)	15.278535 / 10.191392 (5.087143)	0.146719 / 0.680424 (-0.533705)	0.017718 / 0.534201 (-0.516483)	0.335709 / 0.579283 (-0.243574)	0.378060 / 0.434364 (-0.056304)	0.391135 / 0.540337 (-0.149202)	0.548045 / 1.386936 (-0.838891)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006504 / 0.011353 (-0.004849)	0.003742 / 0.011008 (-0.007266)	0.064405 / 0.038508 (0.025897)	0.077618 / 0.023109 (0.054509)	0.365325 / 0.275898 (0.089427)	0.408109 / 0.323480 (0.084629)	0.004909 / 0.007986 (-0.003076)	0.002972 / 0.004328 (-0.001356)	0.063933 / 0.004250 (0.059682)	0.052916 / 0.037052 (0.015863)	0.370891 / 0.258489 (0.112402)	0.412134 / 0.293841 (0.118293)	0.028171 / 0.128546 (-0.100375)	0.008150 / 0.075646 (-0.067497)	0.069248 / 0.419271 (-0.350024)	0.042353 / 0.043533 (-0.001180)	0.368117 / 0.255139 (0.112978)	0.397548 / 0.283200 (0.114348)	0.022967 / 0.141683 (-0.118716)	1.472740 / 1.452155 (0.020586)	1.524028 / 1.492716 (0.031311)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.256854 / 0.018006 (0.238848)	0.471499 / 0.000490 (0.471009)	0.009609 / 0.000200 (0.009409)	0.000109 / 0.000054 (0.000054)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.027978 / 0.037411 (-0.009433)	0.086741 / 0.014526 (0.072215)	0.091189 / 0.176557 (-0.085368)	0.146117 / 0.737135 (-0.591018)	0.092358 / 0.296338 (-0.203980)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.426356 / 0.215209 (0.211147)	4.263782 / 2.077655 (2.186127)	2.178198 / 1.504120 (0.674078)	2.015405 / 1.541195 (0.474211)	2.055966 / 1.468490 (0.587476)	0.507531 / 4.584777 (-4.077246)	3.175967 / 3.745712 (-0.569745)	3.055697 / 5.269862 (-2.214165)	1.987663 / 4.565676 (-2.578014)	0.058452 / 0.424275 (-0.365823)	0.006944 / 0.007607 (-0.000663)	0.502534 / 0.226044 (0.276489)	5.024693 / 2.268929 (2.755765)	2.754971 / 55.444624 (-52.689653)	2.470845 / 6.876477 (-4.405632)	2.698675 / 2.142072 (0.556602)	0.602357 / 4.805227 (-4.202871)	0.129490 / 6.500664 (-6.371174)	0.065127 / 0.075469 (-0.010342)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.398487 / 1.841788 (-0.443301)	19.692279 / 8.074308 (11.617971)	15.124064 / 10.191392 (4.932672)	0.148938 / 0.680424 (-0.531486)	0.017418 / 0.534201 (-0.516783)	0.340480 / 0.579283 (-0.238803)	0.377223 / 0.434364 (-0.057141)	0.405303 / 0.540337 (-0.135034)	0.548923 / 1.386936 (-0.838013)

github-actions · 2023-08-08T13:29:26Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006433 / 0.011353 (-0.004920)	0.004002 / 0.011008 (-0.007006)	0.084130 / 0.038508 (0.045622)	0.070628 / 0.023109 (0.047519)	0.312372 / 0.275898 (0.036474)	0.343993 / 0.323480 (0.020513)	0.003936 / 0.007986 (-0.004050)	0.003336 / 0.004328 (-0.000993)	0.064715 / 0.004250 (0.060465)	0.052511 / 0.037052 (0.015458)	0.314092 / 0.258489 (0.055603)	0.363152 / 0.293841 (0.069311)	0.030898 / 0.128546 (-0.097648)	0.008396 / 0.075646 (-0.067250)	0.288083 / 0.419271 (-0.131188)	0.051654 / 0.043533 (0.008122)	0.315252 / 0.255139 (0.060113)	0.346756 / 0.283200 (0.063556)	0.025167 / 0.141683 (-0.116515)	1.487265 / 1.452155 (0.035110)	1.557528 / 1.492716 (0.064812)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.206517 / 0.018006 (0.188510)	0.458359 / 0.000490 (0.457869)	0.003719 / 0.000200 (0.003519)	0.000070 / 0.000054 (0.000016)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.029631 / 0.037411 (-0.007780)	0.083856 / 0.014526 (0.069330)	0.340431 / 0.176557 (0.163875)	0.153864 / 0.737135 (-0.583271)	0.095951 / 0.296338 (-0.200388)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.379182 / 0.215209 (0.163973)	3.783396 / 2.077655 (1.705741)	1.835932 / 1.504120 (0.331813)	1.667563 / 1.541195 (0.126369)	1.739309 / 1.468490 (0.270818)	0.478957 / 4.584777 (-4.105820)	3.521974 / 3.745712 (-0.223738)	3.237635 / 5.269862 (-2.032227)	2.000300 / 4.565676 (-2.565377)	0.056389 / 0.424275 (-0.367887)	0.007242 / 0.007607 (-0.000365)	0.452642 / 0.226044 (0.226598)	4.524339 / 2.268929 (2.255411)	2.346210 / 55.444624 (-53.098414)	1.957196 / 6.876477 (-4.919281)	2.180051 / 2.142072 (0.037979)	0.570205 / 4.805227 (-4.235022)	0.131346 / 6.500664 (-6.369318)	0.059327 / 0.075469 (-0.016142)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.244709 / 1.841788 (-0.597079)	19.566277 / 8.074308 (11.491969)	14.172598 / 10.191392 (3.981206)	0.166493 / 0.680424 (-0.513931)	0.018281 / 0.534201 (-0.515920)	0.391608 / 0.579283 (-0.187675)	0.402642 / 0.434364 (-0.031722)	0.464974 / 0.540337 (-0.075364)	0.637565 / 1.386936 (-0.749371)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006929 / 0.011353 (-0.004424)	0.004114 / 0.011008 (-0.006894)	0.064589 / 0.038508 (0.026081)	0.083334 / 0.023109 (0.060225)	0.391280 / 0.275898 (0.115382)	0.426157 / 0.323480 (0.102678)	0.005336 / 0.007986 (-0.002650)	0.003395 / 0.004328 (-0.000934)	0.064560 / 0.004250 (0.060310)	0.057094 / 0.037052 (0.020042)	0.398959 / 0.258489 (0.140470)	0.432470 / 0.293841 (0.138629)	0.031412 / 0.128546 (-0.097134)	0.008670 / 0.075646 (-0.066976)	0.071249 / 0.419271 (-0.348022)	0.048934 / 0.043533 (0.005401)	0.384207 / 0.255139 (0.129068)	0.407992 / 0.283200 (0.124792)	0.024492 / 0.141683 (-0.117191)	1.467788 / 1.452155 (0.015634)	1.541011 / 1.492716 (0.048295)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.279607 / 0.018006 (0.261600)	0.448899 / 0.000490 (0.448410)	0.020990 / 0.000200 (0.020790)	0.000132 / 0.000054 (0.000078)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.030313 / 0.037411 (-0.007099)	0.089209 / 0.014526 (0.074684)	0.101024 / 0.176557 (-0.075532)	0.153468 / 0.737135 (-0.583667)	0.103219 / 0.296338 (-0.193120)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.429176 / 0.215209 (0.213967)	4.302234 / 2.077655 (2.224580)	2.291103 / 1.504120 (0.786983)	2.126257 / 1.541195 (0.585062)	2.207090 / 1.468490 (0.738600)	0.484643 / 4.584777 (-4.100134)	3.557429 / 3.745712 (-0.188283)	3.253804 / 5.269862 (-2.016058)	2.026087 / 4.565676 (-2.539589)	0.057793 / 0.424275 (-0.366482)	0.007761 / 0.007607 (0.000154)	0.504819 / 0.226044 (0.278775)	5.046868 / 2.268929 (2.777940)	2.773149 / 55.444624 (-52.671475)	2.398036 / 6.876477 (-4.478440)	2.608094 / 2.142072 (0.466021)	0.630499 / 4.805227 (-4.174729)	0.135496 / 6.500664 (-6.365168)	0.061329 / 0.075469 (-0.014140)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.327124 / 1.841788 (-0.514664)	19.889796 / 8.074308 (11.815488)	14.196100 / 10.191392 (4.004708)	0.161963 / 0.680424 (-0.518461)	0.018529 / 0.534201 (-0.515672)	0.392325 / 0.579283 (-0.186958)	0.404836 / 0.434364 (-0.029528)	0.475898 / 0.540337 (-0.064439)	0.633563 / 1.386936 (-0.753373)

github-actions · 2023-08-08T15:24:59Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006390 / 0.011353 (-0.004963)	0.003683 / 0.011008 (-0.007325)	0.081274 / 0.038508 (0.042766)	0.062193 / 0.023109 (0.039083)	0.355360 / 0.275898 (0.079462)	0.396471 / 0.323480 (0.072992)	0.003569 / 0.007986 (-0.004416)	0.003928 / 0.004328 (-0.000400)	0.062292 / 0.004250 (0.058041)	0.049700 / 0.037052 (0.012648)	0.354604 / 0.258489 (0.096115)	0.419436 / 0.293841 (0.125595)	0.027151 / 0.128546 (-0.101395)	0.007954 / 0.075646 (-0.067692)	0.262231 / 0.419271 (-0.157041)	0.045483 / 0.043533 (0.001950)	0.354285 / 0.255139 (0.099146)	0.385178 / 0.283200 (0.101978)	0.021183 / 0.141683 (-0.120500)	1.420785 / 1.452155 (-0.031370)	1.531545 / 1.492716 (0.038829)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.202298 / 0.018006 (0.184292)	0.442172 / 0.000490 (0.441683)	0.003565 / 0.000200 (0.003366)	0.000074 / 0.000054 (0.000020)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.024229 / 0.037411 (-0.013183)	0.074352 / 0.014526 (0.059826)	0.087530 / 0.176557 (-0.089026)	0.146478 / 0.737135 (-0.590658)	0.085145 / 0.296338 (-0.211194)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.388395 / 0.215209 (0.173186)	3.877623 / 2.077655 (1.799968)	1.882444 / 1.504120 (0.378324)	1.707871 / 1.541195 (0.166676)	1.772132 / 1.468490 (0.303642)	0.491937 / 4.584777 (-4.092840)	3.057947 / 3.745712 (-0.687765)	2.822390 / 5.269862 (-2.447471)	1.879719 / 4.565676 (-2.685957)	0.056830 / 0.424275 (-0.367445)	0.006415 / 0.007607 (-0.001192)	0.458945 / 0.226044 (0.232900)	4.594502 / 2.268929 (2.325574)	2.339677 / 55.444624 (-53.104948)	1.983750 / 6.876477 (-4.892727)	2.173792 / 2.142072 (0.031719)	0.580390 / 4.805227 (-4.224838)	0.124568 / 6.500664 (-6.376096)	0.061694 / 0.075469 (-0.013775)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.265108 / 1.841788 (-0.576680)	18.415254 / 8.074308 (10.340946)	13.963829 / 10.191392 (3.772437)	0.148926 / 0.680424 (-0.531498)	0.016919 / 0.534201 (-0.517282)	0.331082 / 0.579283 (-0.248201)	0.345777 / 0.434364 (-0.088587)	0.381123 / 0.540337 (-0.159214)	0.543297 / 1.386936 (-0.843639)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006121 / 0.011353 (-0.005232)	0.003717 / 0.011008 (-0.007291)	0.063653 / 0.038508 (0.025144)	0.063723 / 0.023109 (0.040613)	0.360233 / 0.275898 (0.084335)	0.398353 / 0.323480 (0.074873)	0.004696 / 0.007986 (-0.003290)	0.002876 / 0.004328 (-0.001452)	0.063057 / 0.004250 (0.058806)	0.050258 / 0.037052 (0.013206)	0.362946 / 0.258489 (0.104457)	0.403260 / 0.293841 (0.109419)	0.027738 / 0.128546 (-0.100809)	0.008025 / 0.075646 (-0.067621)	0.068781 / 0.419271 (-0.350491)	0.042114 / 0.043533 (-0.001419)	0.363546 / 0.255139 (0.108407)	0.385640 / 0.283200 (0.102440)	0.021757 / 0.141683 (-0.119926)	1.482364 / 1.452155 (0.030209)	1.571859 / 1.492716 (0.079143)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.235628 / 0.018006 (0.217622)	0.439909 / 0.000490 (0.439419)	0.003070 / 0.000200 (0.002870)	0.000075 / 0.000054 (0.000020)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.027045 / 0.037411 (-0.010366)	0.080413 / 0.014526 (0.065887)	0.088953 / 0.176557 (-0.087603)	0.141907 / 0.737135 (-0.595228)	0.090604 / 0.296338 (-0.205735)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.423250 / 0.215209 (0.208041)	4.216510 / 2.077655 (2.138855)	2.162946 / 1.504120 (0.658826)	2.014561 / 1.541195 (0.473366)	2.086347 / 1.468490 (0.617857)	0.496591 / 4.584777 (-4.088186)	3.089594 / 3.745712 (-0.656118)	2.853640 / 5.269862 (-2.416221)	1.878149 / 4.565676 (-2.687527)	0.056914 / 0.424275 (-0.367361)	0.006762 / 0.007607 (-0.000845)	0.493470 / 0.226044 (0.267426)	4.929966 / 2.268929 (2.661037)	2.640885 / 55.444624 (-52.803739)	2.335950 / 6.876477 (-4.540527)	2.565866 / 2.142072 (0.423793)	0.585433 / 4.805227 (-4.219794)	0.124969 / 6.500664 (-6.375695)	0.062361 / 0.075469 (-0.013108)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.369144 / 1.841788 (-0.472644)	19.037582 / 8.074308 (10.963274)	14.069141 / 10.191392 (3.877749)	0.146469 / 0.680424 (-0.533954)	0.016911 / 0.534201 (-0.517290)	0.336802 / 0.579283 (-0.242482)	0.336411 / 0.434364 (-0.097953)	0.392360 / 0.540337 (-0.147977)	0.536078 / 1.386936 (-0.850858)

* Fix hf_token fixture
* Do not store token but pass it explicitly
* Fix test with no token
* Fix style
* Test private load_dataset_builder and get_dataset_config_info
* Fix DownloadConfig to pass token to storage_options
* Set config HUB_DATASETS_HFFS_URL
* Use HUB_DATASETS_HFFS_URL in Audio/Image decode_example
* Pass download_config create_builder_configs_from_metadata_configs
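For readers who want to see the idea behind "Fix DownloadConfig to pass token to storage_options", here is a minimal sketch. It is not the code from this PR: `DownloadConfigSketch` is a hypothetical stand-in class, and keeping the token under a flat `"token"` key in `storage_options` is an assumption made only for illustration. It shows how overriding `__setattr__` lets a token assigned after construction still reach the storage options.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DownloadConfigSketch:
    """Hypothetical stand-in for DownloadConfig (illustration only)."""

    token: Optional[str] = None
    # Assumption: the token is mirrored under a flat "token" key.
    storage_options: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Constructor path: propagate a token passed at creation time.
        self._sync_token()

    def __setattr__(self, name: str, value: Any) -> None:
        # Perform the normal attribute assignment first.
        super().__setattr__(name, value)
        # Late-assignment path: `config.token = ...` on an existing instance.
        # During __init__, storage_options may not exist yet, so guard for it.
        if name == "token" and "storage_options" in self.__dict__:
            self._sync_token()

    def _sync_token(self) -> None:
        # Keep storage_options consistent with the current token value.
        if self.token is None:
            self.storage_options.pop("token", None)
        else:
            self.storage_options["token"] = self.token


# Usage: the pattern that previously lost the token.
download_config = DownloadConfigSketch()
download_config.token = "hf_dummy_token"  # placeholder value, not a real token
```

With the override in place, the late assignment updates `storage_options` as well, so code that reads credentials from there sees the same token.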

albertvillanova added 4 commits August 7, 2023 17:30

Fix hf_token fixture

7c19c5b

Do not store token but pass it explicitly

9adcc2c

Fix test with no token

de3f003

Fix style

9c7a0d5

Test private load_dataset_builder and get_dataset_config_info

c65806b

Fix DownloadConfig to pass token to storage_options

c0a77dc

albertvillanova added 2 commits August 8, 2023 08:42

Set config HUB_DATASETS_HFFS_URL

dbf2cce

Use HUB_DATASETS_HFFS_URL in Audio/Image decode_example

58e62af

Pass download_config create_builder_configs_from_metadata_configs

e4684fc
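The commit "Use HUB_DATASETS_HFFS_URL in Audio/Image decode_example" concerns which URL form the repo_id is parsed from. The sketch below illustrates the difference under stated assumptions: the template `hf://datasets/{repo_id}@{revision}/{path}` and the helper `repo_id_from_hffs_url` are hypothetical and are not taken from this PR. The point is only that a pattern written for an `hf://` URL yields no repo_id when applied to an HTTP URL, which would leave the token unset.

```python
import re
from typing import Optional

# Assumed URL layout, for illustration only:
#   hf://datasets/{repo_id}@{revision}/{path}
_HFFS_URL_RE = re.compile(
    r"^hf://datasets/(?P<repo_id>[^@]+)@(?P<revision>[^/]+)/(?P<path>.+)$"
)


def repo_id_from_hffs_url(url: str) -> Optional[str]:
    """Return the repo_id if `url` matches the assumed hf:// layout, else None."""
    match = _HFFS_URL_RE.match(url)
    return match.group("repo_id") if match else None


# An hf:// URL matches, so a repo_id (and hence a token lookup) is possible.
hffs_result = repo_id_from_hffs_url("hf://datasets/user/private-ds@main/data/a.wav")

# An HTTP URL does not match the hf:// pattern, so no repo_id is found.
http_result = repo_id_from_hffs_url(
    "https://huggingface.co/datasets/user/private-ds/resolve/main/data/a.wav"
)
```

Here `hffs_result` is `"user/private-ds"` and `http_result` is `None`.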

albertvillanova marked this pull request as ready for review August 8, 2023 13:47

albertvillanova mentioned this pull request Aug 8, 2023

Private datasets do not load when passing token #6126

Closed

albertvillanova changed the title ~~Fix authentication when passing token~~ Fix authentication issues Aug 8, 2023

albertvillanova merged commit 12cfc11 into main Aug 8, 2023

albertvillanova deleted the fix-6126 branch August 8, 2023 15:16

This was referenced Aug 9, 2023

upgrade datasets to 2.14 huggingface/dataset-viewer#1550

Closed

Update datasets to 2.14.4 huggingface/dataset-viewer#1652

Closed

dwyatte mentioned this pull request Jan 15, 2024

Support cloud storage in load_dataset #5281

Open
