Description
I have occasionally run into this error when scraping several Fandom wikis lately:
[fandom][error] An unexpected error occurred: KeyError - 'metadata'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[fandom][debug]
Traceback (most recent call last):
  File "/home/[redacted]/programming/gallery-dl/gallery_dl/job.py", line 153, in run
    for msg in extractor:
               ^^^^^^^^^
  File "/home/[redacted]/programming/gallery-dl/gallery_dl/extractor/wikimedia.py", line 107, in items
    self.prepare_image(image)
    ~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/home/[redacted]/programming/gallery-dl/gallery_dl/extractor/wikimedia.py", line 85, in prepare_image
    for m in image["metadata"] or ()}
             ~~~~~^^^^^^^^^^^^
KeyError: 'metadata'
Upon closer inspection, this appears to be expected behavior, although it is documented only in the changelog for MediaWiki 1.34:
In the response to queries that use prop=imageinfo, entries for non-existing files (indicated by the filemissing field) now omit the following fields, since they are meaningless in this context: timestamp, userhidden, user, userid, anon, size, width, height, pagecount, duration, commenthidden, parsedcomment, comment, thumburl, thumbwidth, thumbheight, thumbmime, thumberror, url, sha1, metadata, extmetadata, commonmetadata, mime, mediadtype, bitdepth. Clients that process these fields should first check if filemissing is set. Fields that are supported even if the file is missing include: canonicaltitle, archivename (deleted files only), descriptionurl, descriptionshorturl.
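For illustration, the check the changelog asks for could look roughly like the sketch below. It is not gallery-dl's actual code: `metadata_dict` and the sample entries are hypothetical, and only the `for m in image["metadata"] or ()` pattern is taken from the traceback above. Testing for the presence of the `filemissing` key, rather than its value, is an assumption meant to work regardless of how the API encodes the flag.

```python
def metadata_dict(image):
    """Return a name -> value mapping, or None if the file is missing.

    Hypothetical helper: 'image' is one entry of an imageinfo list.
    """
    # Entries flagged with 'filemissing' omit 'metadata', 'url', etc.,
    # so check the flag before touching any of those fields.
    if "filemissing" in image:
        return None
    # 'metadata' may also be present but null; fall back to an empty tuple.
    return {m["name"]: m["value"] for m in image.get("metadata") or ()}


# Made-up sample entries, shaped like the fields named in the changelog.
present = {
    "canonicaltitle": "File:Example.png",
    "url": "https://example.org/Example.png",
    "metadata": [{"name": "width", "value": 64}],
}
missing = {"canonicaltitle": "File:Example.png", "filemissing": ""}

print(metadata_dict(present))
print(metadata_dict(missing))
```

Returning `None` for a missing file, instead of an empty dict, lets the caller tell "file missing" apart from "file present but without metadata".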
So far I have only seen this happen with image-revisions greater than 1. Here's a quick fix that I've been using personally, since I noticed that entries with filemissing set seem to always be returned as the last element of imageinfo:
diff --git a/gallery_dl/extractor/wikimedia.py b/gallery_dl/extractor/wikimedia.py
index 2e8136f1..a103a06b 100644
--- a/gallery_dl/extractor/wikimedia.py
+++ b/gallery_dl/extractor/wikimedia.py
@@ -104,6 +104,12 @@ class WikimediaExtractor(BaseExtractor):
             yield Message.Directory, info
 
             for info["num"], image in enumerate(images, 1):
+                # https://www.mediawiki.org/wiki/Release_notes/1.34
+                if "filemissing" in image:
+                    self.log.warning(
+                        "File %s (or its revision) is missing",
+                        image["canonicaltitle"].partition(":")[2])
+                    continue
                 self.prepare_image(image)
                 image.update(info)
                 yield Message.Url, image["url"], image

This would break the continuity of the sequence number if invalid entries appear in the middle of imageinfo, and I'm not sure if that would be acceptable.
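
The numbering trade-off can be made concrete with a small standalone sketch. This is not a proposed patch: `number_images` and its `contiguous` flag are hypothetical names. It assumes only that missing entries carry a `filemissing` key. With `contiguous=False` it mirrors the quick fix above (numbers follow the position in imageinfo, so gaps can appear), while `contiguous=True` counts only the entries that are actually yielded.

```python
def number_images(images, contiguous=True):
    """Yield (num, image) pairs, skipping entries flagged 'filemissing'.

    contiguous=True  -> num counts only the yielded entries (1, 2, 3, ...)
    contiguous=False -> num is the position in the original list (gaps possible)
    """
    num = 0
    for position, image in enumerate(images, 1):
        if "filemissing" in image:
            continue
        num += 1
        yield (num if contiguous else position), image


# Made-up imageinfo list with a missing entry in the middle.
images = [
    {"url": "https://example.org/a.png"},
    {"canonicaltitle": "File:B.png", "filemissing": ""},
    {"url": "https://example.org/c.png"},
]

print([n for n, _ in number_images(images)])
print([n for n, _ in number_images(images, contiguous=False)])
```

Which variant is preferable depends on whether num is meant to identify the revision's position in the API response or simply to order the downloaded files.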