[Bellazon] Duplicated files from quote blocks, and a couple of other unexpected behaviors #8247

@taskhawk

On Bellazon it seems common to quote a previous post in the thread to comment on particular pictures or videos. The quoted files are included again in the quote block, so they become part of the new post and end up being downloaded twice.

For example:
https://www.bellazon.com/main/topic/66334-charly-jordan/page/3/#findComment-4602713

Three files are downloaded from this post, two of which were already downloaded from the quoted post.

I guess technically this could be solved with a download archive, but there's no good key for this at the moment. {filename} seems to be user-generated, so there's a risk of collisions, however small the chance may be.

I noticed the image URLs end in what looks like a hash (IMG_4918.JPG.81beb237ad51d0bbfcca73fc60eb838c.JPG), so that could work with something like "archive-format": "{thread[id]}-{hash}". I tested from different IPs and devices and the value stays the same, so it's not a sneaky download key as I first thought.
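To illustrate the idea, here is a minimal sketch of how such a key could be derived from the file name. It assumes the name always has the shape original-name.32-hex-characters.extension, as in the example above. The helper names split_hashed_name and archive_key are hypothetical and are not part of the downloader.

```python
import re

# Assumed shape: <original name>.<32 hex characters>.<extension>
# (based only on the single example file name in this report)
_HASHED_NAME = re.compile(
    r"^(?P<name>.+)\.(?P<hash>[0-9a-f]{32})\.(?P<ext>[^.]+)$",
    re.IGNORECASE,
)


def split_hashed_name(filename):
    """Return (name, hash, extension), or None if the shape doesn't match."""
    match = _HASHED_NAME.match(filename)
    if match is None:
        return None
    return (
        match.group("name"),
        match.group("hash").lower(),
        match.group("ext").lower(),
    )


def archive_key(thread_id, filename):
    """Build a '{thread id}-{hash}' style key (hypothetical helper)."""
    parts = split_hashed_name(filename)
    if parts is None:
        return None
    return f"{thread_id}-{parts[1]}"


# A quoted copy of the same file would produce the same key,
# so an archive lookup on this key would skip it.
print(archive_key(66334, "IMG_4918.JPG.81beb237ad51d0bbfcca73fc60eb838c.JPG"))
```

File names that don't match the assumed shape return None, so a caller would need some fallback key for those.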

However, that's a less useful archive format, at least for me, because I can no longer easily tell which entry corresponds to which file. Knowing that is useful because it lets me quickly modify or remove specific entries if I ever need to. In my case I'm using:

"archive-format": "{thread[id]}_{post[id]}_{num:>02}",
"filename": "{post[id]}_{num:>02}.{extension}",

So I'm thinking that the best approach would be to simply ignore any files inside a quote block, at least for thread runs. Including them may still be desirable when downloading individual posts, though.
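As a rough sketch of what "ignore files inside a quote block" could mean in practice, the snippet below collects image URLs from a post's HTML while skipping anything nested inside a quote. It assumes quotes are rendered as blockquote elements, which I haven't verified against the site's actual markup, and it only handles img tags, not videos. The class and function names are made up for illustration and this is not the extractor's real code.

```python
from html.parser import HTMLParser


class UnquotedImageCollector(HTMLParser):
    """Collect <img> sources that are not nested inside a <blockquote>."""

    def __init__(self):
        super().__init__()
        self.quote_depth = 0  # > 0 while inside one or more quote blocks
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1
        elif tag == "img" and self.quote_depth == 0:
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1


def images_outside_quotes(post_html):
    """Return image URLs found outside quote blocks (hypothetical helper)."""
    collector = UnquotedImageCollector()
    collector.feed(post_html)
    collector.close()
    return collector.urls


# Made-up post markup: two quoted images and one new image.
sample = (
    '<blockquote><img src="a.jpg"><img src="b.jpg"></blockquote>'
    '<p><img src="c.jpg"></p>'
)
print(images_outside_quotes(sample))
```

Tracking a depth counter rather than a boolean means quotes nested inside other quotes are also skipped. An option could switch this filtering off for single-post downloads.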


I noticed a couple of quirks in the output of the thread run, curiously right alongside the previous example. First:

https://www.bellazon.com/main/topic/66334-charly-jordan/page/3/#findComment-4602714

This post doesn't have any files, but when it gets processed it tries to download something and shows the following:

...
./bellazon/Charly Jordan/4602713_03.jpg
[downloader.http][warning] HTML response
[download][error] Failed to download 4602714_01.part
./bellazon/Charly Jordan/4616745_01.jpg
...

Second, right after that post, there's this one:

https://www.bellazon.com/main/topic/66334-charly-jordan/page/3/#findComment-4603172

It's a single-image post, and the image is exactly the same as one near the end of a previous post, so it's a duplicate. In this case, however, it wasn't processed at all and no file was downloaded. Its line should have appeared in the output excerpt above, right after the "Failed to download" message, as:

./bellazon/Charly Jordan/4603172_01.jpg
