Skip to content

Conversation

@albertvillanova
Copy link
Member

Perform refactoring to decouple extract functionality.

@albertvillanova albertvillanova marked this pull request as ready for review May 3, 2021 13:06
@albertvillanova albertvillanova added this to the 1.7 milestone May 3, 2021
@albertvillanova albertvillanova added the refactoring Restructuring existing code without changing its external behavior label May 3, 2021
@albertvillanova albertvillanova modified the milestones: 1.7, 1.8 May 31, 2021
@albertvillanova albertvillanova modified the milestones: 1.8, 1.9 Jun 8, 2021
@albertvillanova
Copy link
Member Author

albertvillanova commented Jul 6, 2021

Hi @lhoestq,

Once that #2578 has been merged, I would like to ask you to have a look at this PR: it implements the same logic as the one in #2578 but for all the other file compression formats.

Thanks.

@albertvillanova albertvillanova requested a review from lhoestq July 6, 2021 10:48
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so clean ! Thanks :)

if self._do_extract(output_path, force_extract):
try:
self.extractor.extract(input_path, output_path, extractor=extractor)
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only catch only the exception that happens when the archive formatted is not identified ? Which errors would you like to catch here ?

Otherwise this could catch unexpected errors.

You can also add the message of the original error to the EnvironmentError as additional information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right: in the original code there was no exception catch.

Indeed, in the original code the raise of EnvironmentError was dead code:

  • previously if none of the if conditions was met there was a return (return output_path),
    • See
      if (
      not is_zipfile(output_path)
      and not tarfile.is_tarfile(output_path)
      and not is_gzip(output_path)
      and not is_xz(output_path)
      and not is_rarfile(output_path)
      and not ZstdExtractor.is_extractable(output_path)
      ):
      return output_path
  • and this part of the code was in the else clause after all of the if conditions repeated again: the else clause was not possible to be executed because of the previous condition I mentioned above
    • See:
      if tarfile.is_tarfile(output_path):
      tar_file = tarfile.open(output_path)
      tar_file.extractall(output_path_extracted)
      tar_file.close()
      elif is_gzip(output_path):
      os.rmdir(output_path_extracted)
      with gzip.open(output_path, "rb") as gzip_file:
      with open(output_path_extracted, "wb") as extracted_file:
      shutil.copyfileobj(gzip_file, extracted_file)
      elif is_zipfile(output_path): # put zip file to the last, b/c it is possible wrongly detected as zip
      with ZipFile(output_path, "r") as zip_file:
      zip_file.extractall(output_path_extracted)
      zip_file.close()
      elif is_xz(output_path):
      os.rmdir(output_path_extracted)
      with lzma.open(output_path) as compressed_file:
      with open(output_path_extracted, "wb") as extracted_file:
      shutil.copyfileobj(compressed_file, extracted_file)
      elif is_rarfile(output_path):
      if config.RARFILE_AVAILABLE:
      import rarfile
      rf = rarfile.RarFile(output_path)
      rf.extractall(output_path_extracted)
      rf.close()
      else:
      raise EnvironmentError("Please pip install rarfile")
      elif ZstdExtractor.is_extractable(output_path):
      os.rmdir(output_path_extracted)
      ZstdExtractor.extract(output_path, output_path_extracted)
      else:
      raise EnvironmentError("Archive format of {} could not be identified".format(output_path))

I would suggest just remove this exception raise and the corresponding try and the exception catch.

@albertvillanova
Copy link
Member Author

I think all is done @lhoestq ;)

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks !

@lhoestq lhoestq merged commit 7353a41 into huggingface:master Jul 8, 2021
@albertvillanova albertvillanova modified the milestones: 1.9, 1.10 Jul 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring Restructuring existing code without changing its external behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants