
Conversation


@sarnoud sarnoud commented Sep 20, 2021

Please follow the guide below

  • You will be asked some questions; please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like this: [x])
  • Use the Preview tab to see how your pull request will actually look

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • [x] I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • [x] Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

The initial goal is to fix the newly broken francetv extractor (see issue #29956 (comment))

A few changes:

  1. Extractor fixes for the new backend
  2. Removed obsolete extractors from the francetv.py file (they now all redirect to the main site)
  3. Fixed tests
  4. Implemented the ability to extract caption streams from m3u8 (behind an include_subtitles flag; see the sketch below)
  5. Implemented a new field 'downloader', a lambda function to retrieve subtitles (alongside 'data' and 'url')
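
Roughly, items 4 and 5 fit together like this (a sketch assembled from the code discussed later in this thread, not the exact diff; the per-iteration st=st binding is made explicit here):

formats, subtitles = self._extract_m3u8_formats(
    sign(video_url, format_id), video_id, 'mp4',
    entry_protocol='m3u8_native', m3u8_id=format_id,
    fatal=False, include_subtitles=True)
# item 5: attach a deferred download hook to each caption entry
for lang, sts in subtitles.items():
    for st in sts:
        st['downloader'] = lambda ydl, filename, st=st: PROTOCOL_MAP[
            'm3u8_native'](ydl, ydl.params).download(filename, st)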

@sarnoud sarnoud changed the title Sarnoud francetv [francetv] stop using retired Francetv API Sep 20, 2021
@sarnoud sarnoud changed the title [francetv] stop using retired Francetv API [francetv] stop using retired FranceTV API and enable new one Sep 20, 2021
    except (OSError, IOError):
        self.report_error('Cannot write subtitles file ' + sub_filename)
        return
elif sub_info.get('downloader') is not None:
Contributor

Would callable(sub_info.get('downloader')) be a safer test?

Author

Good one. Done.

@dirkf
Contributor

dirkf commented Sep 20, 2021

You might also consider this, if your other changes haven't populated the properties mentioned.

@lcheylus

lcheylus commented Sep 20, 2021

Hi @sarnoud,

I reviewed and tested your code with the fix for the FranceTV extractor:

  • got the Git repository from your sarnoud-francetv branch
  • installed/tested it in a virtual env (Python 3.9 on Debian testing).

Everything is OK:

  • extraction of video formats (HLS and DASH)
  • getting subtitles
  • tests with test/test_download.py for the FranceTV, FranceTVSite and FranceTVInfo extractors.

But there is a bug with the info returned as JSON => not serializable:

$ youtube-dl -j https://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2019_3569073.html
WARNING: [FranceTV] Unknown MIME type application/mp4 in DASH manifest
Traceback (most recent call last):
  File "/home/fox/dev/youtube-dl.git/env/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2021.6.6', 'console_scripts', 'youtube-dl')()
  File "/home/fox/dev/youtube-dl.git/env/lib/python3.9/site-packages/youtube_dl-2021.6.6-py3.9.egg/youtube_dl/__init__.py", line 475, in main
    _real_main(argv)
  File "/home/fox/dev/youtube-dl.git/env/lib/python3.9/site-packages/youtube_dl-2021.6.6-py3.9.egg/youtube_dl/__init__.py", line 465, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/fox/dev/youtube-dl.git/env/lib/python3.9/site-packages/youtube_dl-2021.6.6-py3.9.egg/youtube_dl/YoutubeDL.py", line 2070, in download
    res = self.extract_info(
  File "/home/fox/dev/youtube-dl.git/env/lib/python3.9/site-packages/youtube_dl-2021.6.6-py3.9.egg/youtube_dl/YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
(...)
  File "/home/fox/dev/youtube-dl.git/env/lib/python3.9/site-packages/youtube_dl-2021.6.6-py3.9.egg/youtube_dl/YoutubeDL.py", line 1774, in __forced_printings
    self.to_stdout(json.dumps(info_dict))
  File "/usr/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type function is not JSON serializable

This issue breaks the youtube-dl hook in mpv (MPlayer).

@dirkf
Contributor

dirkf commented Sep 20, 2021

Presumably YoutubeDL.py l.1774 needs a default parameter to json.dumps() like the one added to json.dump() at utils.py l.1833?

            # json.dumps may need a default=lambda x: ...
            self.to_stdout(json.dumps(info_dict))

Also at l.2077:

                    self.to_stdout(json.dumps(res))
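
A minimal sketch of that fix, assuming a default callback that simply stringifies whatever the encoder cannot handle (the exact callback may differ):

            # repr() turns e.g. the subtitle downloader lambda into a
            # string instead of letting json raise TypeError
            self.to_stdout(json.dumps(info_dict, default=repr))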

@sarnoud
Author

sarnoud commented Sep 20, 2021

Good catches! Submitted changes.

@lcheylus

After both of these commits, the JSON output is correct.

But after some analysis, I don't understand your code to get the subtitle downloader (youtube_dl/extractor/francetv.py, lines 209-211):

for lang, sts in info['subtitles'].items():
    for st in sts:
        st['downloader'] = lambda ydl, filename: PROTOCOL_MAP['m3u8_native'](ydl, ydl.params).download(filename, st)

With this lambda, st['downloader'] is a reference to a Python function => not JSON serializable.
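
A two-line demonstration of the problem (the error message is exactly the one in the traceback above):

import json

json.dumps({'downloader': lambda ydl, filename: None})
# raises: TypeError: Object of type function is not JSON serializable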

@sarnoud
Author

sarnoud commented Sep 20, 2021

Yes, you are correct.
Check the change in YoutubeDL.py:1882.

There were two ways to get captions: via a URL (the 'url' attribute) or by providing the data directly (the 'data' attribute).

The issue with captions now is that they are delivered as an m3u8 stream that needs to be downloaded. So there were two options: (i) download the captions every time and put them in 'data', or (ii) provide a lambda that only does the download when needed.
I implemented (ii), and this is basically the 'downloader' attribute.

                elif callable(sub_info.get('downloader')):
                    sub_info.get('downloader')(self, encodeFilename(sub_filename))

The lambda is also the thing that was not getting properly serialized in the json.dumps calls, BTW.

Does that make sense?
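
For illustration, a subtitle entry under this scheme might look like the following (a hedged sketch: 'url', 'data' and 'downloader' are the fields being discussed; everything else is illustrative):

subtitles = {
    'fr': [{
        'ext': 'vtt',
        'url': None,   # pre-existing: plain HTTP(S) URL to fetch
        'data': None,  # pre-existing: the payload embedded directly
        # new, option (ii): a callable invoked as downloader(ydl, filename)
        # only when the subtitles are actually wanted
        'downloader': lambda ydl, filename: ...,
    }],
}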

@dirkf
Contributor

dirkf commented Sep 20, 2021

Possibly the subtitle generator should be evaluated when generating JSON rather than just emitting "not serializable"?

@sarnoud
Author

sarnoud commented Sep 20, 2021

Technically possible, but the download itself is quite involved. See for yourself with something like:
python3 -m youtube_dl --all-subs --skip-download https://www.france.tv/france-2/les-invisibles/les-invisibles-saison-1/2748331-pachelbel.html

I would advise against doing it on every JSON serialization.

@renalid
Contributor

renalid commented Sep 21, 2021

@sarnoud thanks a lot for the fix! Works well 👍
Do you think there is a way to get (back) the description of the video?

@dirkf
Contributor

dirkf commented Sep 21, 2021

...
There were two ways to get captions: via a URL (the 'url' attribute) or by providing the data directly (the 'data' attribute).

The issue with captions now is that they are delivered as an m3u8 stream that needs to be downloaded. So there were two options: (i) download the captions every time and put them in 'data', or (ii) provide a lambda that only does the download when needed.
I implemented (ii), and this is basically the 'downloader' attribute.
...

As this is adding a new element to the extractor-core API, I think it needs a bit more thought.

Presumably the reason for not downloading the captions in the extractor is to avoid unnecessarily downloading them without second-guessing the core logic (YoutubeDL.py) that makes that decision?

So instead of returning the actual subtitles, we return a downloader function, essentially a closure, and at first glance this seems like a cleaner solution than the ['url'] mechanism: as currently implemented, that mechanism has the core logic find some random extractor instance to use for downloading the subtitles.

However, I believe that these are really two alternatives of the same sort. One is a URL that has to be downloaded as a plain-text page; the new case is a URL that has to be downloaded as M3U8. Surely we should deal with both cases in the same logic? And, just as the core shouldn't be playing with random IEs, I'm not sure that an extractor should know about the downloader's PROTOCOL_MAP, as the downloader lambda does.

Here's my proposal that addresses both concerns and (if it works) has some other benefits:

  • scrap the ['downloader'] mechanism (sorry) -- benefit - no need to worry about unserializable items for JSON;
  • from the extractor, pass the m3u8 URL as st['url'] and also set st['protocol'] as 'm3u8_native' -- benefit - the subtitle link is available in JSON outputs;
  • in the core method _process_info() where it handles subtitles, remove this
            ie = self.get_info_extractor(info_dict['extractor_key'])
  • in the conditional branch where sub_info['data'] is None, replace the try statement with something like this:
                        # essentially the lambda downloader here
                        # pass 1st self.params to respect --external_downloader, --hls-prefer-native?
                        fd = get_suitable_downloader(sub_info, self.params)(self, self.params)
                        try:
                            if self.params.get('verbose'):
                                self.to_screen('[debug] Invoking subtitle downloader on %r' % sub_info.get('url'))
                            # the FD is supposed to encodeFilename() ?
                            if not fd.download(sub_filename, sub_info):
                                # depending on the FD, it may catch errors and return False, or not
                                raise DownloadError('subtitle download failed')
                        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error, OSError, IOError, YoutubeDLError) as err:
                            self.report_warning('Unable to download subtitle for "%s": %s' %
                                                (sub_lang, error_to_compat_str(err)))
                            continue

Provided that there aren't significant unwanted side-effects in using the FileDownloader (but it's basically working like the lambda was), I think that this would be a better structure.
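
On the extractor side, that proposal would look roughly like this (a sketch; subtitles and sub_url are illustrative names, not the PR's actual variables):

# describe the resource instead of attaching a downloader lambda, and
# let the core pick a suitable FileDownloader from the protocol
subtitles.setdefault('fr', []).append({
    'ext': 'vtt',
    'url': sub_url,              # the m3u8 playlist carrying the captions
    'protocol': 'm3u8_native',   # tells the core how the URL is fetched
})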

@chzurb

chzurb commented Sep 21, 2021

@dirkf I understand your motivation to encourage the download aspect of @sarnoud's PR changes to be re-engineered back into ['url'], but one benefit of leaving them as he has implemented them, at least initially, is that it doesn't break backward compatibility or introduce any potential regression in the existing use of ['url'] by every other IE. Perhaps it's better to let it come in the way he has it now, as a separate ['downloader'], which allows it to get some real-world production use in isolation, from only francetv IE users. The integration back into ['url'] might then be worked on in parallel, as a later PR in and of itself, like you suggest. Just my 2 cents.

@dirkf
Contributor

dirkf commented Sep 21, 2021

Sure, understandable point.

There's a halfway house: instead of replacing the existing try statement, insert my proposed replacement before it, behind a test elif sub_info.get('protocol'):. Then, as you suggest, a subsequent PR could combine the last two conditional branches, based on the valuable debugging by FranceTV users.
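
The resulting branch structure would be roughly this (a sketch of the halfway house, reusing the names from the proposal above):

if sub_info.get('data') is not None:
    # existing branch: payload provided directly by the extractor
    ...
elif sub_info.get('protocol'):
    # new branch: hand the URL to a FileDownloader chosen by protocol
    fd = get_suitable_downloader(sub_info, self.params)(self, self.params)
    if not fd.download(sub_filename, sub_info):
        raise DownloadError('subtitle download failed')
else:
    # existing branch: plain HTTP(S) download of sub_info['url']
    ...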

@sarnoud
Author

sarnoud commented Sep 21, 2021

I implemented @dirkf's proposal - which I agree makes sense.

@chzurb you're right that there is a risk of regression, as it touches the core logic - even if it shouldn't. I tried downloading subtitles from a couple of sites and it seems to work - but that's definitely not exhaustive.

@dirkf
Contributor

dirkf commented Sep 21, 2021

Unfortunately it's not easy to scan the extractors for uses of ['url'] in subtitles, as the same key is heavily used throughout (131 of 787 extractors mention the string). Maybe the old FranceTV was the only one, or one of a few?

@sarnoud
Author

sarnoud commented Sep 21, 2021

Yep.
So are we willing to go with it?

@dirkf
Contributor

dirkf commented Sep 21, 2021

It's good for me (if it works with FranceTV). If real maintainers reappear they might prefer the 2-stage approach, but it would be easy to switch to that.

@sarnoud
Author

sarnoud commented Sep 21, 2021

Thanks @dirkf. What would be the next step? Would you approve to trigger the workflow and get it merged?

@ajt-en-france

ajt-en-france commented Sep 23, 2021

I implemented @dirkf's proposal - which I agree makes sense.

@chzurb you're right that there is a risk of regression, as it touches the core logic - even if it shouldn't. I tried downloading subtitles from a couple of sites and it seems to work - but that's definitely not exhaustive.

I found two issues. It's possible that I've not correctly built the binary from git, but:

  1. It pulls the first audio file, which may be the audio description, not the regular audio track
  2. The subtitles it pulls are malformed and I can't use them

For example:

$ youtube-dl -F https://www.france.tv/france-2/les-invisibles/les-invisibles-saison-1/2764261-garenne.html
[FranceTVSite] 2764261-garenne: Downloading webpage
[FranceTV] 9f3d9cfc-e977-4480-9a19-8c574e10821b: Downloading desktop video JSON
[FranceTV] 9f3d9cfc-e977-4480-9a19-8c574e10821b: Downloading mobile video JSON
[FranceTV] 9f3d9cfc-e977-4480-9a19-8c574e10821b: Downloading signed dash manifest URL
[FranceTV] 9f3d9cfc-e977-4480-9a19-8c574e10821b: Downloading MPD manifest
WARNING: [FranceTV] Unknown MIME type application/mp4 in DASH manifest
[FranceTV] 9f3d9cfc-e977-4480-9a19-8c574e10821b: Downloading signed hls manifest URL
[FranceTV] 9f3d9cfc-e977-4480-9a19-8c574e10821b: Downloading m3u8 information
[info] Available formats for 9f3d9cfc-e977-4480-9a19-8c574e10821b:
format code                          extension  resolution note
hls-audio-aacl-96-Audio_Description  mp4        audio only [qtz] 
hls-audio-aacl-96-Audio_Français     mp4        audio only [fr] 
dash-audio_fre=96000                 m4a        audio only [fr] DASH audio   96k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-audio_qtz=96000                 m4a        audio only [qtz] DASH audio   96k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-video=400000                    mp4        384x216    DASH video  400k , mp4_dash container, avc1.42C01E, 25fps, video only
hls-522                              mp4        384x216     522k , avc1.42C01E, 25.0fps, video only
dash-video=950000                    mp4        640x360    DASH video  950k , mp4_dash container, avc1.4D401F, 25fps, video only
hls-1105                             mp4        640x360    1105k , avc1.4D401F, 25.0fps, video only
dash-video=1400000                   mp4        960x540    DASH video 1400k , mp4_dash container, avc1.4D401F, 25fps, video only
hls-1582                             mp4        960x540    1582k , avc1.4D401F, 25.0fps, video only
dash-video=2000000                   mp4        1280x720   DASH video 2000k , mp4_dash container, avc1.64001F, 25fps, video only
hls-2218                             mp4        1280x720   2218k , avc1.64001F, 25.0fps, video only (best)

and then in mpv:

$ mpv Les_invisibles_-_Garenne.mp4 
 (+) Video --vid=1 (*) (h264 1280x720 25.000fps)
 (+) Audio --aid=1 --alang=qtz (*) (aac 2ch 48000Hz)
File tags:
 Title: Les invisibles - Garenne
AO: [pulse] 48000Hz stereo 2ch float
VO: [gpu] 1280x720 yuv420p

I can manually get the correct file by using the -f flag, but it no longer does the right thing automatically.

The subtitles look like this:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000

00:00:00.000 --> 00:00:33.033
...

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:1800000,LOCAL:00:00:00.000

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:2700000,LOCAL:00:00:00.000

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:3600000,LOCAL:00:00:00.000

00:00:03.300 --> 00:00:05.733
-Voilà.
-Allez, les filles !

00:00:06.166 --> 00:00:07.500
-Hello !

00:00:07.766 --> 00:00:10.366
-Et voilà les plus belles !

@dirkf
Contributor

dirkf commented Sep 23, 2021

...
I found two issues. It's possible that I've not correctly built the binary from git, but:

1. It pulls the first audio file, which may be the audio description, not the regular audio track

Does FranceTV have a separate programme page for the AD version (as with BBC)? If so, the AD should only be used with that page's URL. Otherwise the AD should only be fetched if selected with -f.

2. The subtitles it pulls are malformed and I can't use them

How does that manifest itself? Although I'm no VTT expert, the text you quote looks like what is specified in RFC8216 section 3.5 and its sources.

From the UK I get 403 on both M3U8 and MPD sources, so I can't test further.

In extractor/francetv.py, now that _extract_m3u8_formats() can return either False or (formats, subtitles), it would be good to test before assuming that its result is a tuple, as here:

            elif ext == 'm3u8':
                res = self._extract_m3u8_formats(
                    sign(video_url, format_id), video_id, 'mp4',
                    entry_protocol='m3u8_native', m3u8_id=format_id,
                    fatal=False, include_subtitles=True)
                if not res:
                    continue
                format, subtitle = res

@lcheylus

lcheylus commented Sep 23, 2021

I found two issues. It's possible that I've not correctly built the binary from git, but:

1. It pulls the first audio file, which may be the audio description, not the regular audio track

For this issue, we must set format['preference'] before sorting formats (extractor/francetv.py, line 202):

for f in info['formats']:
    if f['format_id'].startswith('dash-audio_qtz') or f['format_id'].find('Audio Description') > 0:
        f['preference'] = -1
        f['format_note'] = "Audio description"
    elif f['format_id'].startswith('dash-audio'):
        f['preference'] = 10
    elif f['format_id'].startswith('hls-audio'):
        f['preference'] = 20
    else:
        f['preference'] = 100

self._sort_formats(info['formats'])

With this modification, HLS audio without audio description is always chosen as the "best audio" format. I also added a format note for audio description so it is displayed with the -F (list formats) flag.

@ajt-en-france

ajt-en-france commented Sep 24, 2021

...
I found two issues. It's possible that I've not correctly built the binary from git, but:

1. It pulls the first audio file, which may be the audio description, not the regular audio track

Does FranceTV have a separate programme page for the AD version (as with BBC)? If so, the AD should only be used with that page's URL. Otherwise the AD should only be fetched if selected with -f.

There can be several audio tracks for the same programme on the same web page: French, French with audio description, and sometimes the original version ("Version Originale"), which could be English, Italian, etc. On the France.tv web page you can make a selection, but if you use the browser-based player it defaults to the French audio track even if the original programme was in Italian, for example. The audio-described track is never the default as far as I can tell.

When previously using the -F and then -f flags to pick streams, I found it a bit unreliable, as you could often get French no matter which stream you picked with youtube-dl, even though the web player would play the Italian, for example... Montalbano sounds really strange in French...

@boulderob

There can be several audio tracks for the same programme on the same web page: French, French with audio description, and sometimes the original version ("Version Originale"), which could be English, Italian, etc. On the France.tv web page you can make a selection, but if you use the browser-based player it defaults to the French audio track even if the original programme was in Italian, for example. The audio-described track is never the default as far as I can tell.

When previously using the -F and then -f flags to pick streams, I found it a bit unreliable, as you could often get French no matter which stream you picked with youtube-dl, even though the web player would play the Italian, for example... Montalbano sounds really strange in French...

I can confirm that in the past the VO audio tracks were often NOT present in the -F listings, or, if it seemed like they were, when you downloaded and played them they were all French audio even if they said otherwise. Yet the VO audio tracks were playable from the site, which means they were available. This just means the files were not being picked up correctly by the francetv IE in the past.

My thought is that the new PR introduced here will pick up these VO files in the new MPD formats that weren't being picked up by the old francetv IE. So far, with my limited testing of the new PR, qtz either picks up the AD or the VO depending on the video source. That's a good sign.

The bad part is that the current qtz format IDs don't always include the text "VO" or "Audio Description". So you have to guess, when downloading with -f after viewing the specs with -F, what it's going to be. Usually this is pretty easy, because VO sources are usually not AD and vice versa, based on my past usage of the site anyway.

The only way to solve this is to add descriptive text to the AD and VO format IDs. But that's only possible if that type of text is available in the source manifests.

@boulderob

The subtitles look like this:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000

00:00:00.000 --> 00:00:33.033
...

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:1800000,LOCAL:00:00:00.000

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:2700000,LOCAL:00:00:00.000

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:3600000,LOCAL:00:00:00.000

00:00:03.300 --> 00:00:05.733
-Voilà.
-Allez, les filles !

00:00:06.166 --> 00:00:07.500
-Hello !

00:00:07.766 --> 00:00:10.366
-Et voilà les plus belles !

I followed this issue and commit a bit on the yt-dlp site a few months ago to fix the broken subtitles, but I haven't tested it yet. If you can download subtitles but they are out of sync, it really is still broken, because they are unusable. I would think the order of the subtitle segments is already fixed in the manifest, so I'm not sure how they get out of sync during a download.

The ability to download subtitles isn't really a 'fix' if they're out of order and don't work. I'm going to clone sarnoud's PR and see for myself where things stand in this regard. I presume the old HLS subtitles work fine, though, since they were working before.

@dirkf
Contributor

dirkf commented Sep 25, 2021

The ability to download subtitles isn't really a 'fix' if they're out of order and don't work. I'm going to clone sarnoud's PR and see for myself where things stand in this regard. I presume the old HLS subtitles work fine, though, since they were working before.

Apparently yt-dlp has a whole module for reassembling WebVTT from fragment downloads, which might address the several ToDo items in the code related to WebVTT.

@pukkandan
Contributor

pukkandan commented Sep 25, 2021

The subtitle downloading and reassembling code was submitted to youtube-dl first in #6144. Of course, further improvements have been made to it in yt-dlp/yt-dlp#247. Hope either of these PRs and the discussions in them are helpful to you guys.

@sarnoud
Author

sarnoud commented Sep 25, 2021

    * "url": A URL pointing to the subtitles resource

      With "url", a "protocol" entry (as for "formats" above)
      may be provided to indicate how the URL should be
      processed; by default it is a file downloaded by HTTP(S)

Done

@boulderob

boulderob commented Sep 25, 2021

The ability to download subtitles isn't really a 'fix' if they're out of order and don't work. I'm going to clone sarnoud's PR and see for myself where things stand in this regard. I presume the old HLS subtitles work fine, though, since they were working before.

Apparently yt-dlp has a whole module for reassembling WebVTT from fragment downloads, which might address the several ToDo items in the code related to WebVTT.

The subtitle downloading and reassembling code was submitted to youtube-dl first in #6144. Of course, further improvements have been made to it in yt-dlp/yt-dlp#247. Hope either of these PRs and the discussions in them are helpful to you guys.

Thanks to both of you for that! I just cloned sarnoud's PR repo, but my guess is it would make more sense to clone yt-dlp to pick up the VTT ordering fixes already there and then try to merge sarnoud's PR #29996 into it, rather than the other way around, due to the number of files changed in yt-dlp #247. @pukkandan are you going to merge sarnoud's #29996 into yt-dlp, or wait on it?

My merge skills are a little rusty. I presume I just need to add a remote for sarnoud's fork, fetch his PR branch, merge it into yt-dlp master (or my own branch of it), and fix conflicts if they arise. Sound right?

Thanks
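
For reference, that workflow would look something like this (a sketch; the remote name and fork URL are assumptions):

# add sarnoud's fork as a remote and fetch his PR branch
git remote add sarnoud https://github.com/sarnoud/youtube-dl.git
git fetch sarnoud sarnoud-francetv
# merge it into a local working branch based on master
git checkout -b francetv-merge master
git merge sarnoud/sarnoud-francetv   # fix conflicts if they arise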

Comment on lines +95 to +103
info = {
    'title': None,
    'subtitle': None,
    'image': None,
    'subtitles': {},
    'duration': None,
    'videos': [],
    'formats': [],
}
Contributor

It seems rather silly to stuff extracted information into an info dictionary only to extract each key back one by one at the end. The churn of changing formats into info['formats'] later in the extractor inflates the diff size and makes code harder to review without actually accomplishing anything.

    'id': video_id,
-   'title': self._live_title(title) if is_live else title,
+   'title': self._live_title(info['title']) if is_live else info['title'],
    'description': clean_html(info.get('synopsis')),
Contributor

This is no longer extracted. The info dictionary issue is masking a bug here.

-   'duration': int_or_none(info.get('real_duration')) or parse_duration(info.get('duree')),
    'thumbnail': info.get('image'),
+   'duration': int_or_none(info.get('duration')),
    'timestamp': int_or_none(try_get(info, lambda x: x['diffusion']['timestamp'])),
Contributor

Likewise.

self._sort_formats(formats)
for f in info['formats']:
    preference = 100
    if f['format_id'].startswith('dash-audio_qtz=96000') or (f['format_id'].find('Description') >= 0):
Contributor

Suggested change
-    if f['format_id'].startswith('dash-audio_qtz=96000') or (f['format_id'].find('Description') >= 0):
+    if f.get('language') == 'qtz':

should work as well, does it not?

@Trit34

Trit34 commented Oct 1, 2021

@sarnoud We could have a fix for DRM videos here:
#29956 (comment)

@Trit34

Trit34 commented Oct 10, 2021

@sarnoud I know that youtube-dl is not meant to bypass DRM, but it did so before with France TV videos, so… Is there a way to code such a command in the francetv.py extractor?

ffmpeg -i "https://replayftv-vh.akamaihd.net/i/streaming-adaptatif_media-secure_france-dom-tom/2021/S40/J3/1045549522-615df4c0e3747-,standard1,standard2,standard3,standard4,qaa,.mp4.csmil/master.m3u8?hdnea=exp=1633877480~acl=%2fi%2fstreaming-adaptatif_media-secure_france-dom-tom%2f2021%2fS40%2fJ3%2f1045549522-615df4c0e3747-,standard1,standard2,standard3,standard4,qaa,.mp4.csmil*~hmac=326b54f56efb3fcb82c8c94862f25cac02cc295ed78901c87e90f41cee7fee8a" -c copy -bsf:a aac_adtstoasc "Murdoch.mp4"

(Based on the https://replayftv-vh.akamaihd.net/i/streaming-adaptatif_media-secure_france-dom-tom/YEAR/SWEEK/JDAY/PLURIMEDIA_ID-MANIFEST_ID_REVERSED-,standard1,standard2,standard3,standard4,qaa,.mp4.csmil/master.m3u8 pattern given by TiA4f8R: #29956 (comment))

@kdliss

kdliss commented Oct 23, 2021

Sorry - I got lost. Is there any latest patch and how-to for downloading e.g. https://www.france.tv/france-2/journal-20h00/2822769-edition-du-vendredi-22-octobre-2021.html
(I am sitting in China and have no direct stream or tunnel.)
Thanks

@Trit34

Trit34 commented Dec 13, 2021

@sarnoud If you’re still around: since I updated Python to 3.10, your patch no longer works when I replace the original files with your patched ones:

$ youtube-dl -F https://www.france.tv/documentaires/art-culture/2941931-rochefort-noiret-marielle-les-copains-d-abord.html
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 33, in <module>
    sys.exit(load_entry_point('youtube-dl==2021.6.6', 'console_scripts', 'youtube-dl')())
  File "/usr/bin/youtube-dl", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 162, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/lib/python3.10/site-packages/youtube_dl/__init__.py", line 3, in <module>
    from .common import FileDownloader
ModuleNotFoundError: No module named 'youtube_dl.common'
