@Kagami
Contributor

@Kagami Kagami commented Apr 26, 2016

Fixes #9321

This is WIP, I don't know how to handle HLS subtitles yet. But otherwise everything seems to be working.

@Kagami Kagami force-pushed the vlive-hls branch 4 times, most recently from f5f93ec to bcd4e31 April 26, 2016 17:01
@Kagami Kagami changed the title from "Add support for live vlive.tv videos" to "[vlive] Add support for live videos" Apr 26, 2016
@Kagami
Contributor Author

Kagami commented Apr 27, 2016

Added correct sub URLs, though both ffmpeg with -i url.m3u8 out.vtt and youtube-dl with --write-sub can't currently download them. Any ideas?

@Kagami
Contributor Author

Kagami commented Apr 27, 2016

Got it: ffmpeg actually downloads HLS WebVTT fine, just in a slightly strange way. However, youtube-dl --write-sub currently can't handle such streams. So, in order to support it properly, we need to:

  • Indicate HLS/stream in the subtitle entry, e.g. {'ext': 'vtt', 'url': 'http://', 'hls': True}
  • Add support for dumping HLS to the --write-sub routine
  • Add an option to not download the video with --write-sub, because doing both in one run doesn't make sense in our case: subtitle writing runs until the stream ends, so the user would need to run youtube-dl twice with different options; or maybe spawn --write-sub in a separate process?

What do you think?

@yan12125
Collaborator

FYI: Seems #6144 implements downloading subtitles from m3u8 playlists.

@Kagami
Contributor Author

Kagami commented Apr 27, 2016

Thanks, I didn't know about it. Though it seems to be stalled? If that would be easier for you, I can comment out the subtitle field for now and make a separate PR. Grabbing live video streams is useful even without subs.

@yan12125
Collaborator

In view-source:http://static.vlive.tv/desktop/resources/generated/js/page/2016033117/video.min.js:

            case "VOD_ON_AIR":
                if (a.vodId && a.vodInKey) {
                    this._vodOnAir(a)
                } else {
                    if (a.isLive) {
                        this._liveEnd()
                    } else {
                        this._comingSoon()
                    }
                }
                break

Maybe youtube-dl should follow the same logic?
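
For illustration, the same branching could look roughly like this in Python. This is only a sketch: the function name classify_vod_on_air and the returned labels are hypothetical, and only the keys vodId, vodInKey and isLive come from the JavaScript above.

```python
def classify_vod_on_air(params):
    """Mirror the VOD_ON_AIR branch of video.min.js (hypothetical helper)."""
    if params.get('vodId') and params.get('vodInKey'):
        # Corresponds to this._vodOnAir(a) in the player code.
        return 'vod'
    if params.get('isLive'):
        # Corresponds to this._liveEnd(a).
        return 'live_ended'
    # Corresponds to this._comingSoon().
    return 'coming_soon'


print(classify_vod_on_air({'vodId': 'x', 'vodInKey': 'y'}))
print(classify_vod_on_air({'isLive': True}))
```

An extractor could map these labels to a normal download, an "ended" error, or a "not started yet" error, but that mapping is a design choice not settled in this thread.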

                        for f in formats],
                'vie': [{'ext': 'vtt', 'url': self._get_sub_url(f['url'], 'vie')}
                        for f in formats],
            }
Collaborator

Not all live videos have subtitles. For example http://www.vlive.tv/video/7871/1996-%EC%95%88%EC%8A%B9%EC%A4%80-%EC%9C%A4%EC%A0%95%EC%9E%AC-%EB%9D%BC%EC%9D%B4%EB%B8%8C does not have a subtitle when I just checked. By the way, I guess _get_live_sub_url is a better function name.

Contributor Author

I haven't found anything similar in the Live API; it seems the only way to determine subtitle status is to download playlist.m3u8 and parse it manually. It looks like this:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Vietnamese",FORCED=NO,AUTOSELECT=YES,URI="subtitlelist_lvie.m3u8",LANGUAGE="vie"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",FORCED=NO,AUTOSELECT=YES,URI="subtitlelist_leng.m3u8",LANGUAGE="eng"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="zcn",FORCED=NO,AUTOSELECT=YES,URI="subtitlelist_lzcn.m3u8",LANGUAGE="zcn"
#EXT-X-STREAM-INF:BANDWIDTH=1464974,CODECS="avc1.66.22,mp4a.40.2",RESOLUTION=640x368,SUBTITLES="subs"
chunklist.m3u8?__agda__=1461853927_d2d91835e251112f931019235adf7e99

Are you ok with that?
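
A minimal sketch of that manual parsing, in case it helps the discussion. The helper name parse_subtitle_media, the example.com base URL and the hard-coded 'vtt' extension are assumptions for illustration; this is not the code from this PR or from #8820.

```python
import re
from urllib.parse import urljoin

SAMPLE = '''#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Vietnamese",FORCED=NO,AUTOSELECT=YES,URI="subtitlelist_lvie.m3u8",LANGUAGE="vie"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",FORCED=NO,AUTOSELECT=YES,URI="subtitlelist_leng.m3u8",LANGUAGE="eng"
'''

# KEY=value pairs, where the value is either quoted or runs up to the next comma.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_subtitle_media(playlist_text, base_url):
    """Return {language: [{'ext': 'vtt', 'url': ...}]} for SUBTITLES renditions."""
    subtitles = {}
    for line in playlist_text.splitlines():
        if not line.startswith('#EXT-X-MEDIA:'):
            continue
        attr_text = line.split(':', 1)[1]
        attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(attr_text)}
        if attrs.get('TYPE') != 'SUBTITLES' or 'URI' not in attrs:
            continue
        lang = attrs.get('LANGUAGE') or attrs.get('NAME')
        subtitles.setdefault(lang, []).append({
            'ext': 'vtt',  # assumed: the thread says these are WebVTT streams
            'url': urljoin(base_url, attrs['URI']),
        })
    return subtitles


subs = parse_subtitle_media(SAMPLE, 'http://example.com/live/400.stream/playlist.m3u8')
for lang in sorted(subs):
    print(lang, subs[lang][0]['url'])
```

A playlist without any TYPE=SUBTITLES entries yields an empty dict, which would cover live videos that have no subtitles.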

Collaborator

Parsing should be the way. It's better than hard-coded values in general. An approach is modifying _extract_m3u8_formats so that it can handle subtitles, too.

Collaborator

> An approach is modifying _extract_m3u8_formats so that it can handle subtitles, too.

I already proposed this in #8820 as a first step to handle WEBVTT m3u8 subtitles.

Contributor Author

It would be great if we could get it merged; I don't want to reimplement code that's already written.

@yan12125
Collaborator

For your previous question on how to download live subtitles, there are two options:

  • Implemented it in pure Python HlsFD
  • Create a thread for downloading subtitles.

There's an issue with the second approach. We need to add an option to disable subtitle downloading in ffmpeg to prevent duplicated traffic.
If you already have an implementation, please don't include it in this PR. Open a new one - it's not a good idea to add too many features in a single PR.

@Kagami
Contributor Author

Kagami commented Apr 28, 2016

> Implemented it in pure Python HlsFD
> Create a thread for downloading subtitles

If we don't spawn another thread for --write-sub, we won't be able to download the video, so I think we need to spawn a thread anyway?

> We need to add an option to disable subtitle downloading in ffmpeg to prevent duplicated traffic

What do you mean by duplicated? ffmpeg only follows the chunklist.m3u8 stream, which doesn't contain subtitles as far as I can tell.

> Open a new one

Do you want me to leave subtitle URLs handling in this PR? --write-sub won't work, though someone might use those URLs with youtube-dl --dump-json.

Also, I'm not sure about the subtitle streams for the various qualities. The Flash player uses 1200.stream/subtitlelist_l<lang>.m3u8 with 1200.stream/playlist.m3u8 and 400.stream/subtitlelist_l<lang>.m3u8 with 400.stream/playlist.m3u8, though 1200.stream/subtitlelist_l<lang>.m3u8 and 400.stream/subtitlelist_l<lang>.m3u8 seem to be the same. Do we need to provide them all or only one?

@yan12125
Collaborator

Haven't looked into the spec of M3U8 manifests. If each video segment corresponds to exactly one subtitle segment, then the subtitle can be downloaded immediately after the video segment.

> What do you mean by duplicated?

I have seen "ffmpeg can handle subtitles" somewhere. I found it's not merged yet (https://trac.ffmpeg.org/ticket/2833), so the poster may be using a custom fork of ffmpeg.

> Do you want me to leave subtitle URLs handling in this PR?

Comment it out and add some comments.

@Kagami
Contributor Author

Kagami commented Apr 28, 2016

Btw, the ffmpeg HLS dumper seems to be very slow; audio goes out of sync for high-res streams. I get much better results with livestreamer "hlsvariant://..." -o out.ts. We might want to implement --hls-prefer-livestreamer, or improve hls.py, though I don't know the protocol well and don't have much time, so I would like to use external downloaders if you don't mind.

@yan12125
Collaborator

Could you try again, passing --hls-prefer-native to youtube-dl? By default youtube-dl uses ffmpeg to download HLS streams, and --hls-prefer-native selects the embedded pure-Python (incomplete) implementation of the HLS protocol.

@Kagami
Contributor Author

Kagami commented Apr 28, 2016

I tried it, but the native downloader doesn't seem to work on vlive streams. You can try it right away with python -m youtube_dl http://www.vlive.tv/video/7880 --hls-prefer-native on my branch. Output:

$ python -m youtube_dl 'http://www.vlive.tv/video/7880' --hls-prefer-native -v
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['http://www.vlive.tv/video/7880', '--hls-prefer-native', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.04.24
[debug] Git HEAD: 14b6e54
[debug] Python version 3.4.3 - Linux-4.5.0-gentoo-x86_64-Intel-R-_Core-TM-_i7-3820_CPU_@_3.60GHz-with-gentoo-2.2
[debug] exe versions: ffmpeg N-79663-ge639f50, ffprobe N-79663-ge639f50, rtmpdump 2.4
[debug] Proxy map: {}
[vlive] 7880: Downloading webpage
[vlive] 7880: Downloading JSON status
[debug] Invoking downloader on 'http://vlive.hls.edgesuite.net/hk002/400.stream/playlist.m3u8?__gda__=1461877422_547a10c5467c1b3aabe2b6d13b08be52'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 1
[download] Destination: [V LIVE] MTV - Minh Hang on the stage-7880.mp4
[download] 100% of 211.00B in 00:00
[ffmpeg] Fixing malformated aac bitstream in "[V LIVE] MTV - Minh Hang on the stage-7880.mp4"
[debug] ffmpeg command line: ffmpeg -y -i 'file:[V LIVE] MTV - Minh Hang on the stage-7880.mp4' -c copy -f mp4 -bsf:a aac_adtstoasc 'file:[V LIVE] MTV - Minh Hang on the stage-7880.temp.mp4'
^C
ERROR: Interrupted by user

@Kagami
Contributor Author

Kagami commented Apr 28, 2016

If I pass it the chunklist.m3u8 URL instead, it downloads only 3 fragments (~5 seconds of video) and exits. Seems like it's not suited for live broadcasts?

@yan12125
Collaborator

Yep HlsFD does not support live streams yet. A thread is necessary.
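
To make the gap concrete: a live downloader has to re-fetch the media playlist repeatedly and pick out only the segments it has not handled yet. The bookkeeping part could be sketched as below. The helper new_segments and the sample playlists are hypothetical, this is not HlsFD's code, and the fetching and threading are left out.

```python
FIRST = '''#EXTM3U
#EXT-X-MEDIA-SEQUENCE:10
#EXTINF:2.0,
seg10.ts
#EXTINF:2.0,
seg11.ts
#EXTINF:2.0,
seg12.ts
'''

SECOND = '''#EXTM3U
#EXT-X-MEDIA-SEQUENCE:11
#EXTINF:2.0,
seg11.ts
#EXTINF:2.0,
seg12.ts
#EXTINF:2.0,
seg13.ts
'''


def new_segments(playlist_text, last_seq):
    """Return (fresh, last_seq): segments with a sequence number above last_seq.

    last_seq is the sequence number of the last segment already handled,
    or -1 on the first call.
    """
    first_seq = 0
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith('#EXT-X-MEDIA-SEQUENCE:'):
            first_seq = int(line.split(':', 1)[1])
        elif line and not line.startswith('#'):
            uris.append(line)
    fresh = [(first_seq + i, uri) for i, uri in enumerate(uris)
             if first_seq + i > last_seq]
    return fresh, (fresh[-1][0] if fresh else last_seq)


fresh, last_seq = new_segments(FIRST, -1)
print(fresh, last_seq)
fresh, last_seq = new_segments(SECOND, last_seq)
print(fresh, last_seq)
```

A real loop would call this after every playlist refresh and stop once the playlist carries an end marker; a downloader that reads the playlist only once would stop after the three segments listed at that moment.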


    def _live(self, video_id, webpage, live_params):
        formats = [{
            'url': vid['cdnUrl'],
Collaborator

Should extract chunklists via self._extract_m3u8_formats here.

@Kagami
Contributor Author

Kagami commented Apr 28, 2016

A bit unrelated question. Is it possible to get subtitle URL without downloading it, similar to --get-url for video?

@yan12125
Collaborator

No direct options. You can dump the JSON dict with -j and everything is in it.
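
As a rough sketch, the subtitle URLs can be pulled out of that JSON with a few lines of Python. The sample string below is a made-up, heavily trimmed stand-in for real -j output, and subtitle_urls is a hypothetical helper; only the "subtitles" mapping of language to a list of entries with ext and url follows youtube-dl's info dict layout.

```python
import json

# Hypothetical, trimmed stand-in for the JSON printed by `youtube-dl -j URL`.
DUMPED = ('{"id": "7880", "subtitles": {"eng": '
          '[{"ext": "vtt", "url": "http://example.com/subs_eng.vtt"}]}}')


def subtitle_urls(info):
    """Return a sorted list of (language, ext, url) tuples from an info dict."""
    result = []
    for lang, entries in sorted((info.get('subtitles') or {}).items()):
        for entry in entries:
            result.append((lang, entry.get('ext'), entry['url']))
    return result


for lang, ext, url in subtitle_urls(json.loads(DUMPED)):
    print(lang, ext, url)
```

In practice the JSON would be read from the output of youtube-dl -j instead of a hard-coded string.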

@Kagami
Contributor Author

Kagami commented Apr 28, 2016

@yan12125 addressed all comments. Removed subtitles handling because it depends on #8820 (#6144 also has similar code).

@@ -1,8 +1,12 @@
# coding: utf-8
from __future__ import division
Collaborator

Merge them: from __future__ import division, unicode_literals

            r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"[^"]+"\s*,\s*"([^"]+)"',
            webpage, 'key')
        video_params = self._search_regex(
            r'vlive\.tv\.video\.ajax\.request\.handler\.init\((.*)\)',
Collaborator

Better to use ([^;]+) instead of (.*). It reduces the probability of matching something unexpected.

Contributor Author

A semicolon might appear in the live params, so it's not safe to match like that.

Collaborator

@yan12125 yan12125 Apr 29, 2016

OK, then (.+)
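
A small sketch of the difference between the two capture groups, using a made-up page fragment in which one argument contains a semicolon. The argument values are assumptions; real init() calls on vlive pages will differ.

```python
import re

# Hypothetical page fragment; the second argument contains a semicolon.
WEBPAGE = ('vlive.tv.video.ajax.request.handler.init('
           '"7880", "a;b", "LIVE", "key");\nnext();')

PREFIX = r'vlive\.tv\.video\.ajax\.request\.handler\.init\('

# ([^;]+) cannot cross the semicolon, so the closing parenthesis is never reached.
narrow = re.search(PREFIX + r'([^;]+)\)', WEBPAGE)

# (.+) is greedy but stops at the newline, then backtracks to the last ')'.
greedy = re.search(PREFIX + r'(.+)\)', WEBPAGE)

print(narrow)
print(greedy.group(1))
```

Note that (.+) extends to the last closing parenthesis on the same line, so it could still capture too much if another call followed init() on that line.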

@Kagami
Contributor Author

Kagami commented Apr 29, 2016

@yan12125 addressed all comments. Tested on a few live streams; it seems to work as expected.

@Kagami
Contributor Author

Kagami commented Apr 29, 2016

There is an issue with the default format ids generated by _extract_m3u8_formats. It appends the bitrate after the dash, but the bitrate constantly changes in the case of vlive. I think we should delete everything after the - because there is only a single video stream for each m3u8.
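
To illustrate the idea, here is a sketch of a format id builder that leaves out the name or bitrate suffix for live streams. The function build_format_id and its live parameter are hypothetical and only approximate what _extract_m3u8_formats does.

```python
def build_format_id(m3u8_id, media_name, tbr, index, live=False):
    """Hypothetical helper: build a format id that stays stable for live streams."""
    parts = []
    if m3u8_id:
        parts.append(m3u8_id)
    if not live:
        # For non-live playlists, tell variants apart by name, bitrate or position.
        parts.append(media_name if media_name else '%d' % (tbr if tbr else index))
    return '-'.join(parts)


# The reported bandwidth differs between two requests for the same live stream.
print(build_format_id('hls', None, 1464, 0))
print(build_format_id('hls', None, 1398, 0))
print(build_format_id('hls', None, 1464, 0, live=True))
print(build_format_id('hls', None, 1398, 0, live=True))
```

With live=True the id depends only on m3u8_id, so the caller has to pass a distinct m3u8_id per playlist to keep ids unique.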

@yan12125
Collaborator

You can override format_id.

@Kagami
Contributor Author

Kagami commented Apr 29, 2016

Extended _extract_m3u8_formats.

@yan12125
Collaborator

What's the meaning of media_name_fallback? Not quite intuitive. Maybe live, which defaults to False, is better.

@Kagami
Contributor Author

Kagami commented Apr 29, 2016

Fixed.

@@ -1139,7 +1139,8 @@ def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
                 if m3u8_id:
                     format_id.append(m3u8_id)
                 last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
-                format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
+                if last_media_name or not live:
Collaborator

Should be and.

Collaborator

Also add a comment explaining why bitrates are not added for live streams.

Contributor Author

> Should be and

We may end up with identical format ids in that case. Though there might be several streams without NAME too. What about a simple counter? E.g. fid, fid-1, fid-2.

Collaborator

Such a feature is already implemented

Collaborator

@remitamine remitamine Apr 29, 2016

> Though there might be several streams without NAME too.

according to the standard NAME attribute is REQUIRED.

Contributor Author

> Such a feature is already implemented

Ah, that's cool, fixed.

> according to the standard NAME attribute is REQUIRED

Not in case of vlive... (See example playlist above.)

@yan12125
Collaborator

Looks good. The final bit: add 'is_live' field to the returned dict.

@Kagami
Contributor Author

Kagami commented Apr 29, 2016

Fixed and squashed.

@yan12125 yan12125 merged commit b24d633 into ytdl-org:master Apr 29, 2016
@yan12125
Collaborator

Merged. Thanks for the work and sorry for the rush.

@Kagami
Contributor Author

Kagami commented Apr 29, 2016

Thanks!

@Kagami Kagami deleted the vlive-hls branch April 29, 2016 11:34