From: bloomtom
Date: Thu, 02 Jan 2020 06:14:22 +0000
To: "ffmpeg-devel@ffmpeg.org"
Subject: [FFmpeg-devel] [PATCH] Removes linebreaks forbidden by the WEBVTT spec on encode
X-Patchwork-Id: 17128

libavformat/webvttenc.c:
The WebVTT spec does not allow a blank line inside packet (cue) data: two
or more consecutive line breaks (\r\n, \r or \n) signify the end of the
cue. With the previous behavior, the best case is that data ends up
orphaned outside the cues a spec-compliant parser reads; in the worst case,
some parsers simply refuse to process such .vtt files.

This patch wraps packet data writing in a new helper, webvtt_write_data(),
which skips line-break characters at the start and end of the packet data
and replaces any run of consecutive line breaks between other characters
with a single linefeed. The linefeed that terminates the cue is now written
only when the cleaned payload is non-empty.

tests/ref/fate/sub-webvttenc:
Updated to expect the new webvttenc behavior.

Signed-off-by: Tom Bloom
---
 libavformat/webvttenc.c      | 43 ++++++++++++++++++++++++++++++++++--
 tests/ref/fate/sub-webvttenc |  4 ----
 2 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/libavformat/webvttenc.c b/libavformat/webvttenc.c
index 61b7f54622..8da2818aec 100644
--- a/libavformat/webvttenc.c
+++ b/libavformat/webvttenc.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2013 Matthew Heaney
+ * Copyright (c) 2020 Thomas Bloom
  *
  * This file is part of FFmpeg.
  *
@@ -62,6 +63,42 @@ static int webvtt_write_header(AVFormatContext *ctx)
     return 0;
 }
 
+static int is_linebreak(char c)
+{
+    return c == '\n' || c == '\r';
+}
+
+static int webvtt_write_data(AVIOContext *pb, uint8_t *pkt, int pkt_len)
+{
+    int start = 0;
+    int written = 0;
+
+    // Fast forward to first non-linebreak.
+    while(start < pkt_len - 1 && is_linebreak(pkt[start])) {
+        start++;
+    }
+
+    for (int i = start; i < pkt_len; i++) {
+        while(is_linebreak(pkt[i])) {
+            if (i == pkt_len - 1) {
+                // Hit end with no stop in linebreaks.
+                return written;
+            }
+            else if (!is_linebreak(pkt[i+1])) {
+                // write a single linefeed to cover all skipped.
+                avio_printf(pb, "\n");
+                written++;
+            }
+            i++;
+        }
+
+        avio_write(pb, &pkt[i], 1);
+        written++;
+    }
+
+    return written;
+}
+
 static int webvtt_write_packet(AVFormatContext *ctx, AVPacket *pkt)
 {
     AVIOContext *pb = ctx->pb;
@@ -88,8 +125,10 @@ static int webvtt_write_packet(AVFormatContext *ctx, AVPacket *pkt)
 
     avio_printf(pb, "\n");
 
-    avio_write(pb, pkt->data, pkt->size);
-    avio_printf(pb, "\n");
+    if (webvtt_write_data(pb, pkt->data, pkt->size) > 0) {
+        // Data not empty. Write a linefeed to divide packets in output.
+        avio_printf(pb, "\n");
+    }
 
     return 0;
 }
diff --git a/tests/ref/fate/sub-webvttenc b/tests/ref/fate/sub-webvttenc
index 45ae0b6131..012f10a8ba 100644
--- a/tests/ref/fate/sub-webvttenc
+++ b/tests/ref/fate/sub-webvttenc
@@ -128,14 +128,12 @@ also hide these tags:
 but show this: {normal text}
 
 00:54.501 --> 01:00.500
-
 \ N is a forced line break
 \ h is a hard space
 Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
 The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
 
 00:54.501 --> 00:56.500
-
 \h\h\h\h\hA (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
@@ -147,13 +145,11 @@ A (No hard spaces followed by a letter)
 Show this: \TEST and this: \-)
 
 00:58.501 --> 01:00.500
-
 A letter followed by 05 hard spaces: A\h\h\h\h\h
 A letter followed by normal spaces: A
 A letter followed by no hard spaces: A
 05 hard spaces between letters: A\h\h\h\h\hA
 5 normal spaces between letters: A A
-
 ^--Forced line break
 
 01:00.501 --> 01:02.500
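For anyone who wants to check the collapsing rule in isolation, here is a minimal standalone sketch in C. It is not the patch's code: the function name collapse_linebreaks() and its buffer-based interface are hypothetical, chosen so the sketch runs without libavformat, whereas the patch writes through avio_write()/avio_printf(). Instead of the patch's look-ahead at pkt[i+1], it keeps a "pending break" flag, and, like webvtt_write_data(), it returns the number of bytes produced.

```c
#include <assert.h>
#include <stddef.h>

/* CR or LF, the same test as the patch's is_linebreak(). */
static int is_linebreak(char c)
{
    return c == '\n' || c == '\r';
}

/*
 * Hypothetical buffer-based variant of webvtt_write_data().
 * Copies `len` bytes of `in` into `out` (which must hold at least `len`
 * bytes; the result is never longer than the input), dropping leading and
 * trailing CR/LF and collapsing every interior run of CR/LF into one '\n'.
 * Returns the number of bytes written to `out`.
 */
static size_t collapse_linebreaks(const char *in, size_t len, char *out)
{
    size_t written = 0;
    int pending_break = 0; /* a CR/LF run follows already-emitted text */

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < len; i++) {
        if (is_linebreak(in[i])) {
            /* Breaks seen before any text are leading breaks: ignore them. */
            if (written > 0)
                pending_break = 1;
            continue;
        }
        /* Text after a run of breaks: emit one '\n' for the whole run.
         * A run at the very end never reaches here, so it is dropped. */
        if (pending_break) {
            out[written++] = '\n';
            pending_break = 0;
        }
        out[written++] = in[i];
    }
    return written;
}
```

With this sketch, the payload "\r\nline one\r\n\r\n\nline two\n\n" becomes the 17 bytes "line one\nline two", and a payload made only of line breaks yields 0, which is the case where the patched webvtt_write_packet() skips the cue-terminating linefeed.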