[tiktok] Remove yt-dlp dependency, add story, liked posts, saved posts, reposts support #8715
…passes. I think the original yt-dlp solution assumes that if a device ID works once, it will always work. Plus, my approach would cause needless retries in certain cases if `hasMorePrevious` does end up being wrong, which the original algorithm accounts for. So let's copy the original algorithm here, too.
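A rough sketch of the retry idea, assuming a hypothetical `fetch_page(cursor, device_id)` callable and generic `items`/`cursor`/`has_more` keys. These are not TikTok's real field names, and this is not the yt-dlp code itself. Each attempt uses a fresh device ID instead of assuming that one which worked once keeps working, and an empty page is retried a few times before being treated as the end, rather than trusting a single pagination flag.

```python
import random
import string


def new_device_id():
    # Hypothetical format: a random 19-digit numeric string.
    return "".join(random.choices(string.digits, k=19))


def paginate(fetch_page, max_retries=3):
    """Yield items page by page, retrying empty pages with fresh device IDs.

    fetch_page(cursor, device_id) is a hypothetical callable returning a dict
    with "items", "cursor" and "has_more" keys.
    """
    cursor = 0
    while True:
        for _ in range(max_retries):
            # Never reuse a device ID just because it worked before.
            page = fetch_page(cursor, new_device_id())
            if page["items"]:
                break
        else:
            # Every attempt came back empty: treat this as the end of the list.
            return
        yield from page["items"]
        if not page["has_more"]:
            return
        cursor = page["cursor"]
```

The retry count bounds the number of wasted requests when a list really is exhausted.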
It acts as a nice guard against that account suddenly having lots of posts to extract
[tiktok] Fix avatar naming convention to match that of posts
This was largely achieved using the story/batch/item_list endpoint
@mikf This ended up being quite a large PR, but I'd say the code is at least a bit cleaner than it was when I originally opened this PR. I have also updated the PR description. It's ready for review now (whenever you have time).

A nice-to-have would be a date range option for TikTok (as requested in issue #7966). I don't know how the filter option works, but I think the approach outlined in that issue should work for individual posts, right? It would still be nice to have it for user extraction so that we can stop item list requests early where possible. I was able to maintain the index range support thanks to a [potential] misuse of the Perhaps copying the approach used by yt-dlp could be suitable here? Even supporting only
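A minimal sketch of how a date range could cut item list requests short. It assumes posts arrive in descending date order and carry a Unix timestamp under a `createTime` key; that key, the function name and its parameters are assumptions. Because the input is consumed lazily, returning on the first post older than `date_min` means no further pages would be requested. Pinned posts that appear out of chronological order would break the descending-order assumption, so they would need separate handling.

```python
def posts_in_range(posts, date_min=None, date_max=None):
    """Yield posts whose timestamp lies in [date_min, date_max].

    Assumes `posts` is a lazy iterable in descending date order and that each
    post has a Unix timestamp under the (assumed) "createTime" key.
    """
    for post in posts:
        ts = post["createTime"]
        if date_max is not None and ts > date_max:
            continue  # too new: keep scanning towards older posts
        if date_min is not None and ts < date_min:
            return  # too old: every later post is older still, so stop here
        yield post
```

If `posts` is a generator that fetches pages on demand, stopping here also stops the paging.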
I'll try to take a closer look next week / over Christmas & New Year's.
We should aim to avoid having pinned posts returned before non-pinned ones
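One way to do this, sketched under the assumption that pinned posts are flagged with an `isPinnedItem` key and that all other posts arrive newest first by `createTime` (both key names are assumptions, and `merge_pinned` is a hypothetical helper). Pinned posts are held back and each one is released just before the first regular post that is not newer than it.

```python
def merge_pinned(posts, pinned_key="isPinnedItem", time_key="createTime"):
    """Yield posts newest first, moving pinned posts to their chronological spot.

    Assumes non-pinned posts arrive in descending order of `time_key`.
    """
    held = []  # pinned posts not yet emitted, kept newest first
    for post in posts:
        if post.get(pinned_key):
            held.append(post)
            held.sort(key=lambda p: p[time_key], reverse=True)
            continue
        # Emit every held pinned post that is at least as new as this one.
        while held and held[0][time_key] >= post[time_key]:
            yield held.pop(0)
        yield post
    # Pinned posts older than every regular post come last.
    yield from held
```

Only the (usually few) pinned posts are buffered, so the rest of the stream stays lazy.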
I will stop adding new features to this already huge PR now. Any new features need to be added in follow-up PRs.
KeyboardInterrupt & SystemExit inherit from BaseException (not Exception) and therefore don't need special handling
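A quick illustration of why no special handling is needed: `KeyboardInterrupt` and `SystemExit` derive from `BaseException`, so an `except Exception` block (like the hypothetical `run` wrapper below) lets them propagate while still catching ordinary errors.

```python
def run(task):
    """Call task(), recovering from ordinary errors only."""
    try:
        return task()
    except Exception as exc:  # KeyboardInterrupt/SystemExit are not caught here
        print("recovered from:", type(exc).__name__)
        return None
```

`run(lambda: 1 / 0)` prints `recovered from: ZeroDivisionError` and returns `None`, whereas a task raising `KeyboardInterrupt` still aborts the caller.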
- rename 'tiktok-user-extractor' to 'ytdl'
There are still several things I'd like to improve, like using the data from API responses directly instead of using it only to build post URLs and extracting the same data from or, but it works well enough to merge this as is, if everyone is OK with that.
That's a good point; I didn't think of that 😅 Works for me 👍 Let's get this merged. It would probably be easier to review future changes with smaller PRs anyway. Thanks for improving and reviewing the code!

This PR removes the yt-dlp dependency completely from the TikTok extractor, whilst maintaining support for it as a fallback option. This involved adding the ability to extract video URLs directly from the rehydration data, as well as porting the user profile extraction code from yt-dlp. If a user wants to extract using yt-dlp, they must assign `"ytdl"` to the new `tiktok-user-extractor` option, and they can also do the same for videos by assigning `"ytdl"` to the `videos` option. This also lets us perform additional filtering that yt-dlp doesn't, including differentiating between video and photo posts, without requiring all of the extra requests (per post) needed to do so previously.
This PR also introduces story support. I probably should've done it as a separate PR, but adding it affected how I wanted to approach the removal of yt-dlp. Support for stories is twofold:

- … `false` to the new `posts` option). You can disable story extraction by assigning `false` to the `stories` option.
- … `https://www.tiktok.com/following` link (requires cookies).

The `photos` option has been added. It works in the same way as `videos` and `audio`, i.e. it can be used to filter out photo posts if set to `false`. This is probably a bit counterintuitive for a program like `gallery-dl`, but I figured that people would want to do this, especially now that we are separating our extractor from `yt-dlp` and adding features that `yt-dlp` does not have.

`order-posts` support for user profiles has also been added, but it only works if the user provides cookies; without cookies, the extractor can only extract posts in descending order. It supports `"desc"` and `"asc"`/`"reverse"`, as well as `"popular"` ordering, which is also available in the web app.

On top of all of this, liked post, saved post, and repost extraction have all been added (only in descending order, as the TikTok API doesn't appear to support anything else right now).
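A sketch of how the `order-posts` value and the `videos`/`photos` filters could be interpreted. It assumes, as described above, that only descending order is available without cookies and that any value other than `false` (including `"ytdl"` for `videos`) keeps that post type. Whether the real extractor silently falls back to `"desc"` or warns is an assumption here, and the function names are hypothetical.

```python
# Hypothetical mapping of accepted order-posts values to an ordering.
ORDERS = {"desc": "desc", "asc": "asc", "reverse": "asc", "popular": "popular"}


def resolve_order(value, has_cookies):
    """Return the ordering to request for an order-posts value.

    Without cookies only descending order is available, so this sketch
    falls back to "desc" in that case.
    """
    try:
        order = ORDERS[value]
    except KeyError:
        raise ValueError(f"unsupported order-posts value: {value!r}") from None
    return order if has_cookies else "desc"


def wanted(kind, options):
    """Return True if posts of `kind` ("video" or "photo") should be kept.

    `options` mirrors the videos/photos settings; anything other than False
    (including "ytdl" for videos) keeps that post type.
    """
    return options.get(kind + "s", True) is not False
```

Treating `"reverse"` as an alias keeps it consistent with `"asc"` rather than adding a separate code path.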
This PR also fixes a small issue with avatars. Previously, they were stored in a folder named after the user identifier given directly in the URL instead of the user's unique ID, and that same identifier was also used in the avatar's file name. I've now changed this to match the behaviour of video, photo and audio downloads (which always try to use the unique ID where available), although I have kept the previous behaviour as a fallback in case a unique ID can't be found for whatever reason.
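The naming fallback, sketched with a hypothetical `avatar_path` helper. The `uniqueId` key and the `<owner>/<owner>_avatar.<ext>` layout are assumptions for illustration, not gallery-dl's actual filename format.

```python
def avatar_path(user, url_identifier, ext="jpg"):
    """Build an avatar path, preferring the unique ID over the URL identifier.

    `user` is a dict of profile data; the "uniqueId" key is an assumption.
    """
    owner = user.get("uniqueId") or url_identifier
    # Hypothetical layout: <owner>/<owner>_avatar.<ext>
    return f"{owner}/{owner}_avatar.{ext}"
```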
I have updated the TikTok tests and added some more.
Closes: #7246
Closes: #8035
Closes: #8466 (this should already have been fixed by the recent bump of the bundled yt-dlp version)