[francetv] stop using retired FranceTV API and enable new one #29996
Conversation
youtube_dl/YoutubeDL.py
```python
    except (OSError, IOError):
        self.report_error('Cannot write subtitles file ' + sub_filename)
        return
elif sub_info.get('downloader') is not None:
```
Would `callable(sub_info.get('downloader'))` be a safer test?
Good one. Done.
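A quick sketch of why `callable()` is the safer guard here: any truthy but non-function value would pass an `is not None` check and then crash when invoked. The `sub_info` shape below is a simplified stand-in for the real subtitle dict, not the actual youtube-dl code path.

```python
# Hypothetical simplification of the subtitle-writing branch under discussion.
def write_subtitles(sub_info):
    downloader = sub_info.get('downloader')
    if callable(downloader):
        return downloader()            # safe: it really is invocable
    if downloader is not None:
        return 'skipped: not callable'  # junk value would have crashed an `is not None` branch
    return sub_info.get('data')         # fall back to inline subtitle data

assert write_subtitles({'downloader': lambda: 'WEBVTT'}) == 'WEBVTT'
assert write_subtitles({'downloader': 'oops'}) == 'skipped: not callable'
assert write_subtitles({'data': 'WEBVTT'}) == 'WEBVTT'
```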
You might also consider this, if your other changes haven't populated the properties mentioned.
Hi @sarnoud, I reviewed and tested your code with the fix for the FranceTV extractor.
Everything is OK. But there is a bug with the info returned as JSON: it is not serializable. This issue breaks the youtube-dl hook with mpv (MPlayer).
Presumably. Also at l.2077: …
Good catches! Submitted changes.
After both these commits, the JSON output is correct. But after some analysis, I don't understand your code for getting the subtitle downloader. With this lambda, …
Yes, you are correct. There were two ways to get captions: via a URL (the 'url' attribute) or by providing the data directly (the 'data' attribute). The issue with captions now is that they are delivered as an M3U8 stream that needs to be downloaded. So there would be two options: (i) download captions all the time and put them in 'data', or (ii) provide a lambda that will only do the download when needed. The lambda is also the thing that was not getting properly serialized in the json.dump calls, BTW. Does that make sense?
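The deferred-download idea described above can be sketched like this. Everything here is hypothetical illustration: `fetch_m3u8_captions` stands in for the real network call, and the `sub_info` keys mirror the 'url'/'data' convention mentioned in the comment.

```python
calls = []  # records downloads so we can see when they happen

def fetch_m3u8_captions(url):
    # stand-in for downloading and reassembling the M3U8 caption stream
    calls.append(url)
    return 'WEBVTT\n\n00:00.000 --> 00:01.000\nBonjour'

sub_info = {
    'ext': 'vtt',
    # option (ii): a zero-argument lambda, evaluated only on demand
    'downloader': lambda: fetch_m3u8_captions('https://example.invalid/captions.m3u8'),
}

# Nothing has been downloaded yet:
assert calls == []

# Only when the core decides to write subtitles is the lambda evaluated:
data = sub_info['downloader']()
assert calls == ['https://example.invalid/captions.m3u8']
assert data.startswith('WEBVTT')
```

The trade-off, as noted below, is that a function value like this cannot pass through `json.dump` unchanged.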
Possibly the subtitle generator should be evaluated when generating JSON rather than just emitting "not serializable"?
Technically possible, but the download itself is quite involved. See for yourself with something like: … I would advise against doing it for every JSON serialization.
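For concreteness, this is the failure mode being discussed: `json.dumps` raises `TypeError` on a function value. A `default=` hook (an assumed mitigation, not what the PR does) can substitute a placeholder without evaluating, i.e. without downloading, the lambda.

```python
import json

# A lambda buried in the info dict is not JSON-serializable:
info = {'id': 'abc', 'subtitles': {'fr': [{'downloader': lambda: 'WEBVTT'}]}}

try:
    json.dumps(info)
    raised = False
except TypeError:
    raised = True
assert raised

# Defensive option: replace non-serializable values with a placeholder
# instead of calling them (which would trigger the download every time).
dumped = json.dumps(info, default=lambda o: '<not serializable>')
assert '<not serializable>' in dumped
```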
@sarnoud thanks a lot for the fix! Works well 👍
As this is adding a new element to the extractor-core API, I think it needs a bit more thought.

Presumably the reason for not downloading the captions in the extractor is to avoid unnecessarily downloading them without second-guessing the core logic (…). So instead of returning the actual subtitles, we return a downloader function, essentially a closure, and on first glance this seems like a cleaner solution than the …

However, I believe that these are really two alternatives of the same sort. One is a URL that has to be downloaded as a plain text page; the new case is a URL that has to be downloaded as M3U8. Surely we should deal with both cases in the same logic? And, just as the core shouldn't be playing with random IEs, I'm not sure that an extractor should know about the downloader's …

Here's my proposal that addresses both concerns and (if it works) has some other benefits:

Provided that there aren't significant unwanted side-effects in using the …
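The "deal with both cases in the same logic" suggestion could look roughly like the dispatcher below: extractors declare only *where* the captions live, and the core picks the fetch strategy. All names here (`fetch_plain`, `fetch_m3u8`, `resolve_subtitle`, the `protocol` key) are hypothetical stand-ins, not actual youtube-dl API.

```python
def fetch_plain(url):
    # stand-in for an ordinary one-shot text download
    return 'WEBVTT (plain from %s)' % url

def fetch_m3u8(url):
    # stand-in for downloading and joining fragmented HLS captions
    return 'WEBVTT (reassembled from %s)' % url

def resolve_subtitle(entry):
    if 'data' in entry:                        # already inline, nothing to fetch
        return entry['data']
    url = entry['url']
    if entry.get('protocol') == 'm3u8_native':  # the new FranceTV case
        return fetch_m3u8(url)
    return fetch_plain(url)                     # the classic 'url' case

assert resolve_subtitle({'data': 'WEBVTT'}) == 'WEBVTT'
assert 'reassembled' in resolve_subtitle({'url': 'u', 'protocol': 'm3u8_native'})
assert 'plain' in resolve_subtitle({'url': 'u'})
```

The benefit over the closure approach is that the info dict stays plain data (so JSON serialization keeps working) while the extractor still never triggers an eager download.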
@dirkf I understand your motivation to encourage the …
…eep the download logic in YoutubeDl
Sure, understandable point. There's a halfway house: instead of replacing the existing …
Unfortunately it's not easy to scan the extractors for use of …
Yep.
It's good for me (if it works with FranceTV). If the real maintainers reappear they might prefer the two-stage approach, but it would be easy to switch to that.
Thanks @dirkf. What would be the next step? Would you approve it to trigger the workflow and get it merged?
I found two issues. It's possible that I've not correctly built the binary from git, but:
For example: … and then in mpv: … I can manually get the correct file by using the -f flag, but it no longer does the right thing automatically. The subtitles look like this: …
Does FranceTV have a separate programme page for the AD version (as with the BBC)? If so, the AD should only be used with that page's URL. Otherwise the AD should only be fetched if selected with …
How does that manifest itself? Although I'm no VTT expert, the text you quote looks like what is specified in RFC 8216 section 3.5 and its sources. From the UK I get 403 on both the M3U8 and MPD sources, so I can't test further. In …
For this issue, we must set … With this modification, the HLS audio with no audio description is always chosen as the "best audio" format. I also added a format note for audio description, to display it with the -F (list formats) flag.
There can be several audio tracks for the same programme on the same web page: French, French with audio description, and sometimes the original version ("Version Originale"), which could be English, Italian, etc. On the France.tv web page you can make a selection, but if you use the browser-based player it defaults to the French audio track even if the original programme was in Italian, for example. The audio-described track is never the default as far as I can tell. When previously using the -F and then -f flags to pick streams, I found it a bit unreliable, as you could often get French no matter which stream you picked with youtube-dl, even though the web player would play the Italian, for example. Montalbano sounds really strange in French...
I can confirm that in the past the VO audio tracks were often NOT present in the -F listings, or if it seemed like they were, then when you downloaded and played them they were all French audio even if they said otherwise. Yet the VO audio tracks were playable from the site, which means they were available. This just means that the files were not being picked up correctly by the francetv IE in the past. My thought is that the new PR introduced here will pick up these VO files in the new MPD formats that weren't being picked up by the old francetv IE. So far with my limited testing of the new PR, … The bad part is that the current … The only way to solve this is to add "descriptive" text to the AD and VO format ids. But that's only possible if that type of text is available in the source manifests.
I followed this issue and commit a bit on the … The ability to download subtitles isn't really a 'fix' if they're out of order and don't work. I'm going to clone sarnaud's PR and see for myself where things stand in this regard. I presume the old HLS subtitles work fine though, since they were working before …
Apparently yt-dlp has a whole module for reassembling WebVTT from fragment downloads, which might address the several TODO items in the code related to WebVTT.
The subtitle downloading and reassembling code was submitted to youtube-dl first in #6144. Of course, further improvements have been made to it in yt-dlp/yt-dlp#247. I hope either of these PRs and the discussions in them are helpful to you guys.
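To illustrate why reassembly needs a dedicated module at all: each HLS subtitle fragment repeats the `WEBVTT` header (and typically an `X-TIMESTAMP-MAP` line), so naive concatenation yields invalid VTT. The sketch below only strips the repeated headers; the real implementations referenced above also remap cue timestamps and deduplicate cues, which this deliberately omits.

```python
def join_vtt_fragments(fragments):
    # Naive join: keep one WEBVTT header, drop per-fragment header lines.
    out = ['WEBVTT']
    for frag in fragments:
        for line in frag.splitlines():
            if line.startswith('WEBVTT') or line.startswith('X-TIMESTAMP-MAP'):
                continue
            out.append(line)
    return '\n'.join(out)

frags = [
    'WEBVTT\nX-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:0\n\n00:00.000 --> 00:01.000\nBonjour',
    'WEBVTT\nX-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:90000\n\n00:01.000 --> 00:02.000\nSalut',
]
joined = join_vtt_fragments(frags)
assert joined.count('WEBVTT') == 1
assert 'Bonjour' in joined and 'Salut' in joined
```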
Done |
Thanks to both of you for that! I just cloned sarnaud's PR repo, but my guess is it would make more sense to clone yt-dlp to pick up the VTT ordering fixes already there and then try to merge sarnaud's PR #29996 into it, rather than the other way around, due to the number of files changed in yt-dlp #247. @pukkandan, are you going to merge sarnaud's #29996 into yt-dlp, or wait on it? My merge skills are a little rusty. I presume I just need to add a remote for sarnaud's fork, fetch his PR branch, merge it into yt-dlp master (or my own branch of it), and fix conflicts if they arise. Sound right? Thanks
```python
info = {
    'title': None,
    'subtitle': None,
    'image': None,
    'subtitles': {},
    'duration': None,
    'videos': [],
    'formats': [],
}
```
It seems rather silly to stuff extracted information into an info dictionary only to extract each key back one by one at the end. The churn of changing formats into info['formats'] later in the extractor inflates the diff size and makes code harder to review without actually accomplishing anything.
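The two styles under discussion are equivalent in result; the dict-accumulator version just renames every later use of `formats` to `info['formats']`, inflating the diff. A minimal illustration (variable names are made up, not the extractor's):

```python
# Dict-accumulator style: collect into a dict, then unpack it again.
info = {'title': None, 'formats': []}
info['title'] = 'Example'
info['formats'].append({'format_id': 'hls-1'})
result_a = {'title': info['title'], 'formats': info['formats']}

# Plain locals, assembled once at the end: same result, smaller diff.
title = 'Example'
formats = [{'format_id': 'hls-1'}]
result_b = {'title': title, 'formats': formats}

assert result_a == result_b
```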
```python
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': self._live_title(info['title']) if is_live else info['title'],
'description': clean_html(info.get('synopsis')),
```
This is no longer extracted. The info dictionary issue is masking a bug here.
```python
'duration': int_or_none(info.get('real_duration')) or parse_duration(info.get('duree')),
'thumbnail': info.get('image'),
'duration': int_or_none(info.get('duration')),
'timestamp': int_or_none(try_get(info, lambda x: x['diffusion']['timestamp'])),
```
Likewise.
```python
self._sort_formats(formats)
for f in info['formats']:
    preference = 100
    if f['format_id'].startswith('dash-audio_qtz=96000') or (f['format_id'].find('Description') >= 0):
```
```diff
- if f['format_id'].startswith('dash-audio_qtz=96000') or (f['format_id'].find('Description') >= 0):
+ if f.get('language') == 'qtz':
```
should work as well, does it not?
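A sketch of the preference tweak this suggestion enables: demote audio-description tracks (France TV tags them with the language code 'qtz') so default format selection never picks them, and label them for `-F` output. The field names follow the usual youtube-dl format-dict conventions, but the values and the demotion logic here are illustrative, not the PR's exact code.

```python
formats = [
    {'format_id': 'dash-audio_fre=96000', 'language': 'fre'},
    {'format_id': 'dash-audio_qtz=96000', 'language': 'qtz'},  # audio description
]

for f in formats:
    if f.get('language') == 'qtz':
        f['preference'] = -10                      # never the default pick
        f['format_note'] = 'audio description'     # visible in -F listings
    else:
        f['preference'] = 100

best = max(formats, key=lambda f: f['preference'])
assert best['language'] == 'fre'
assert formats[1]['format_note'] == 'audio description'
```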
@sarnoud We could have a fix for DRM videos here: …
Original PR: ytdl-org/youtube-dl#29996 Closes: #970, ytdl-org/youtube-dl#29956, ytdl-org/youtube-dl#29957, ytdl-org/youtube-dl#29969, ytdl-org/youtube-dl#29990, ytdl-org/youtube-dl#30010 Authored by: fstirlitz, sarnoud
@sarnoud I know that youtube-dl isn't meant to bypass DRM, but it did so before with France TV videos, so… Is there a way to implement such a command in the francetv.py extractor?
(Based on the …)
Sorry, I got lost. Is there a latest patch, and a how-to, for downloading e.g. https://www.france.tv/france-2/journal-20h00/2822769-edition-du-vendredi-22-octobre-2021.html?
@sarnoud If you’re still around: since I updated Python to 3.10, your patch no longer works when I replace the original files with your patched ones:
Please follow the guide below
Put an x into all the boxes [ ] relevant to your pull request (like this: [x]). Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl, each piece of code must be in the public domain or released under the Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
The initial goal is to fix the newly broken francetv extractor (see issue #29956 (comment))
A few changes: