Closed
Description
Is your feature request related to a problem? Please describe.
When prepare_data runs in FineTuningDataModule, the dataset_kwargs parameter is not passed to prepare_packed_sequence_data. It may be needed later, when the dataset is tokenized (tokenize_dataset).
Describe the solution you'd like
I suggest passing dataset_kwargs in both prepare_packed_sequence_data calls:
if not self.train_path_packed.is_file():
    prepare_packed_sequence_data(
        input_path=self.train_path,
        output_path=self.train_path_packed,
        packed_sequence_size=self.packed_sequence_size,
        tokenizer=self.tokenizer,
        max_seq_length=self.seq_length,
        seed=self.seed,
        output_metadata_path=self.pack_metadata,
        dataset_kwargs=self.dataset_kwargs,  # here
    )
if not self.validation_path_packed.is_file():
    prepare_packed_sequence_data(
        input_path=self.validation_path,
        output_path=self.validation_path_packed,
        packed_sequence_size=self.packed_sequence_size,
        tokenizer=self.tokenizer,
        max_seq_length=self.seq_length,
        seed=self.seed,
        output_metadata_path=self.pack_metadata,
        dataset_kwargs=self.dataset_kwargs,  # here
    )
Describe alternatives you've considered
No
Additional context
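To illustrate the intended behavior, here is a minimal, self-contained sketch. It does not use the real library: SketchDataModule and prepare_packed_sequence_data_stub are hypothetical stand-ins, and the "example_option" key is made up. It only shows the user-supplied dataset_kwargs reaching the packing step for both the training and validation splits.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional

# Records every call so the forwarding can be inspected afterwards.
calls: List[Dict[str, Any]] = []


def prepare_packed_sequence_data_stub(
    input_path: Path,
    output_path: Path,
    dataset_kwargs: Optional[Dict[str, Any]] = None,
) -> None:
    """Hypothetical stand-in for the packing function; it only logs its arguments."""
    calls.append(
        {
            "input_path": input_path,
            "output_path": output_path,
            "dataset_kwargs": dataset_kwargs,
        }
    )


@dataclass
class SketchDataModule:
    """Hypothetical, stripped-down data module (not the real FineTuningDataModule)."""

    train_path: Path
    validation_path: Path
    train_path_packed: Path
    validation_path_packed: Path
    dataset_kwargs: Optional[Dict[str, Any]] = None

    def prepare_data(self) -> None:
        splits = (
            (self.train_path, self.train_path_packed),
            (self.validation_path, self.validation_path_packed),
        )
        for src, dst in splits:
            # Pack only when the packed file does not exist yet.
            if not dst.is_file():
                prepare_packed_sequence_data_stub(
                    input_path=src,
                    output_path=dst,
                    dataset_kwargs=self.dataset_kwargs,  # forwarded, as proposed
                )


# Usage: the paths are placeholders that are assumed not to exist on disk.
dm = SketchDataModule(
    train_path=Path("no_such_dir/train.jsonl"),
    validation_path=Path("no_such_dir/validation.jsonl"),
    train_path_packed=Path("no_such_dir/train_packed.npy"),
    validation_path_packed=Path("no_such_dir/validation_packed.npy"),
    dataset_kwargs={"example_option": True},  # hypothetical key
)
dm.prepare_data()
```

After prepare_data runs, calls holds one entry per split, and each entry carries the same dataset_kwargs dictionary that was given to the data module.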