-
Notifications
You must be signed in to change notification settings - Fork 3k
Closed
Description
Describe the bug
The .map function cannot hash under python 3.9. Tried to use the solution here, but still get the same message:
Parameter 'function'=<function map_to_pred at 0x7fa0b49ead30> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Steps to reproduce the bug
def map_to_pred(batch):
"""
Perform inference on an audio batch
Parameters:
batch (dict): A dictionary containing audio data and other related information.
Returns:
dict: The input batch dictionary with added prediction and transcription fields.
"""
audio = batch['audio']
input_features = processor(
audio['array'], sampling_rate=audio['sampling_rate'], return_tensors="pt").input_features
input_features = input_features.to('cuda')
with torch.no_grad():
predicted_ids = model.generate(input_features)
preds = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
batch['prediction'] = processor.tokenizer._normalize(preds)
batch["transcription"] = processor.tokenizer._normalize(batch['transcription'])
return batch
MODEL_CARD = "openai/whisper-small"
MODEL_NAME = MODEL_CARD.rsplit('/', maxsplit=1)[-1]
model = WhisperForConditionalGeneration.from_pretrained(MODEL_CARD)
processor = AutoProcessor.from_pretrained(
MODEL_CARD, language="english", task="transcribe")
model = torch.compile(model)
dt = load_dataset("audiofolder", data_dir=config['DATA']['dataset'], split="test")
dt = dt.cast_column("audio", Audio(sampling_rate=16000))
result = coraal_dt.map(map_to_pred, num_proc=16)Expected behavior
Hashed and cached dataset starts inferencing
Environment info
transformersversion: 4.35.0- Platform: Linux-5.14.0-284.30.1.el9_2.x86_64-x86_64-with-glibc2.34
- Python version: 3.9.18
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.0
- Accelerate version: 0.24.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Metadata
Metadata
Assignees
Labels
No labels