From patchwork Sat Mar 30 00:08:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Tomas_H=C3=A4rdin?= X-Patchwork-Id: 47665 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:9f96:b0:1a3:b6bb:3029 with SMTP id mm22csp2270747pzb; Fri, 29 Mar 2024 17:09:08 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWo1HXrpeaBvo3YR4wMwi66hgfpG5jRIp9Zsmo7jUd4hEIZ6AXZrA5looQztZ760nNd5BSKsrMreAy/54Z85pKAqAgvi0t34dGjHQ== X-Google-Smtp-Source: AGHT+IHP80xgSm+jBDfMMV0oJsMzTVwcOxFmDDra02reZVr5iQJcXXXhn17bvHF6HS7hgjNrTdiB X-Received: by 2002:a17:906:6dd2:b0:a46:485a:3163 with SMTP id j18-20020a1709066dd200b00a46485a3163mr1975969ejt.6.1711757348555; Fri, 29 Mar 2024 17:09:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1711757348; cv=none; d=google.com; s=arc-20160816; b=Fy7mYSaXNCLLKBc7bIZ/Xqnl2eK89wr6DX4J44Ogj/gYO6FlhB+pkM8p4OrEhC11v8 hIK3wDEW38K/mBXchXPMuXSTtv+6MvZ+AGkm1Gp3dWkGIUjhZu98tPmvdvOQhLiAIFOn vjBHjCTB5R0Lw/vdbgopSX7gJeRR0vTPhxV2PmO1AQZ/7fMC/w4frD4r4cjLwxlFxekn SCQ/2UvYKKhe86N6NrR588haJNczyrovGrHRK/IvuJAIXTYayZ4kBlv8ha1+tT+CR/b9 NB27LcMOD5KbjPT5MOLrITp+Sx4GNbCpxTqHPFXynbNi6BPGkLn7ucXb0cNSiPkrXSid OYgw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence:subject :mime-version:user-agent:references:in-reply-to:date:to:from :message-id:delivered-to; bh=1a8/pmOrwZ5F0pXfSRCuEC2ArLavSxU0GvkFNDl/LgU=; fh=e5zN9xSzcxLA6bGo3lF+CqTbY/oLwzApV03EO/RBfgQ=; b=Gt+g3IWntRa4qHG8yP3DmOrNEZ/Ik0ex40rE7HSd0rcPLimzBqtFhpvB5gu3r0bwt8 w9qSh5KAyXxoZWmry864DWRm94mwIBTfT0cba3zjGQFs75Qn+x8a9dGWpiiYD2NZZI8N oYOEgBNtQ4Tdnk3ZPTFpgoBhGhJjlK1TyK0bAaZO4EudmDEeOlRsm7HCzpAJb7Yiu8wf /9m0G7F2HeAur4ugV4Pbpbm3Zn4+hjd2UTYRrBKUub+ieTnlq+8YrGLZx7CmpSLj6dG/ ++Gu1SyCzisWfkws1/GBDU1reXkKcor9TDjmEeoe0Frl7WBOSbBJnY7TFHMmsbPLbv69 KojA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id dk17-20020a170907941100b00a4749f4632dsi2381943ejc.43.2024.03.29.17.09.08; Fri, 29 Mar 2024 17:09:08 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 41F2B68D466; Sat, 30 Mar 2024 02:09:05 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from glom.nmugroup.com (glom.nmugroup.com [193.183.80.6]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id D981A68D33F for ; Sat, 30 Mar 2024 02:08:58 +0200 (EET) Received: from localhost (localhost [127.0.0.1]) by glom.nmugroup.com (Postfix) with ESMTP id 6BEFA5429BA5 for ; Sat, 30 Mar 2024 01:08:58 +0100 (CET) Received: from [192.168.1.110] (217-211-185-91-no2430.tbcn.telia.com [217.211.185.91]) (Authenticated sender: git01) by glom.nmugroup.com (Postfix) with ESMTPSA id 3A2315429ACC for ; Sat, 30 Mar 2024 01:08:58 +0100 (CET) Message-ID: From: Tomas =?iso-8859-1?q?H=E4rdin?= To: FFmpeg development discussions and patches Date: Sat, 30 Mar 2024 01:08:57 +0100 In-Reply-To: References: User-Agent: Evolution 3.46.4-2 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: bbCCEN/lyEKX Here's an alternative first patch that rolls patch 1+3 into one. I'd like some feedback on this before I continue hacking on patch 2. While I don't like that we accept any old broken srt file, especially without knowing what software made it, I'm not completely opposed to compromising in this specific case. But I'd rather we didn't, and stuck to \r, \n and \r\n. What I really don't want is runs of \r being eaten without being "terminated" by a \n, because this messes up Mac support. /Tomas From 2ec68c51e4599b8493a2e103793f571451d872d3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= Date: Thu, 28 Mar 2024 20:30:37 +0100 Subject: [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings --- libavformat/subtitles.c | 53 +++++++++++++++++++++++++++++++++++++---- libavformat/subtitles.h | 5 ++++ 2 files changed, 53 insertions(+), 5 deletions(-) diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c index 3413763c7b..01187df6ab 100644 --- a/libavformat/subtitles.c +++ b/libavformat/subtitles.c @@ -22,6 +22,7 @@ #include "subtitles.h" #include "avio_internal.h" #include "libavutil/avstring.h" +#include "libavutil/intreadwrite.h" void ff_text_init_avio(void *s, FFTextReader *r, AVIOContext *pb) { @@ -106,6 +107,42 @@ int ff_text_peek_r8(FFTextReader *r) return c; } +int ff_text_peek_r16(FFTextReader *r) +{ + int c1, c2; + if (r->buf_pos < r->buf_len - 1) + return AV_RB16(&r->buf[r->buf_pos]); + + // missing one or two bytes + c1 = ff_text_r8(r); + if (avio_feof(r->pb)) + return 0; + + if (r->buf_pos == r->buf_len - 1) { + // missing one byte + r->buf[0] = r->buf[r->buf_pos]; + r->buf[1] = c1; + r->buf_pos = 0; + r->buf_len = 2; + return AV_RB16(r->buf); + } + + // missing two bytes + c2 = ff_text_r8(r); + if (avio_feof(r->pb)) { + r->buf[0] = c1; + r->buf_pos = 0; + r->buf_len = 1; + return 0; + } + + r->buf[0] = c1; + r->buf[1] = c2; + r->buf_pos = 0; + r->buf_len = 2; + return AV_RB16(r->buf); +} + AVPacket *ff_subtitles_queue_insert(FFDemuxSubtitlesQueue *q, const uint8_t *event, size_t len, int merge) { @@ -446,11 +483,12 @@ int ff_subtitles_read_chunk(AVIOContext *pb, AVBPrint *buf) ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) { size_t cur = 0; + unsigned char c; if (!size) return 0; buf[0] = '\0'; while (cur + 1 < size) { - unsigned char c = ff_text_r8(tr); + c = ff_text_r8(tr); if (!c) return ff_text_eof(tr) ? cur : AVERROR_INVALIDDATA; if (c == '\r' || c == '\n') @@ -458,9 +496,14 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) buf[cur++] = c; buf[cur] = '\0'; } - while (ff_text_peek_r8(tr) == '\r') - ff_text_r8(tr); - if (ff_text_peek_r8(tr) == '\n') - ff_text_r8(tr); + if (c == '\r') { + if (ff_text_peek_r8(tr) == '\n') + ff_text_r8(tr); + else if (ff_text_peek_r16(tr) == AV_RB16("\r\n")) { + // ticket5032-rrn.srt has \r\r\n + ff_text_r8(tr); + ff_text_r8(tr); + } + } return cur; } diff --git a/libavformat/subtitles.h b/libavformat/subtitles.h index 88665663c5..2a92044976 100644 --- a/libavformat/subtitles.h +++ b/libavformat/subtitles.h @@ -94,6 +94,11 @@ int ff_text_eof(FFTextReader *r); */ int ff_text_peek_r8(FFTextReader *r); +/** + * Like ff_text_peek_r8(), but peek two bytes and return them as a big-endian number. + */ +int ff_text_peek_r16(FFTextReader *r); + /** * Read the given number of bytes (in UTF-8). On error or EOF, \0 bytes are * written. -- 2.39.2 From patchwork Thu Mar 28 23:06:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Tomas_H=C3=A4rdin?= X-Patchwork-Id: 47611 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:9f96:b0:1a3:b6bb:3029 with SMTP id mm22csp1665200pzb; Thu, 28 Mar 2024 16:06:50 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXFIeMPsuu6YYPD0SA8jPjBteth2lWy/jknUyaRErCF5w1PnCNsfM5RvGtb0oYf54e33mzMvvwfPpEmk0Th+QVaYem2slAn06tnUg== X-Google-Smtp-Source: AGHT+IH1UXBnUAkIiGrAJw3eg0yCdcbwDWFV8SP03AJkgxQyLAf7uPsNgY3oGxjXJy1RmTRGFYhL X-Received: by 2002:a17:907:9041:b0:a4e:2653:d54c with SMTP id az1-20020a170907904100b00a4e2653d54cmr308486ejc.6.1711667210145; Thu, 28 Mar 2024 16:06:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1711667210; cv=none; d=google.com; s=arc-20160816; b=qhWIErBs5dHWhc04jFGAiACRky9D5jDa0NwTua6hbVJ3z86AwF7ilJtaf5N0wKd07I ooQaUKNUWHQq0X32Egx1MYkM4ghnGVgxFxCMCcEzByvN2uIHqSer61zXosmFMJ6+Lhwc ZEqXOU65EtLllUFhmkavQmIif3HL3/AjD5kfY3NsRsvzTvJeC5Eml4p2+JZ7byAiruCq eeSyx0NXW4/qcqGovQVGXiR5IsefMWXH93EDjcw8P2srhxKTWuGC3z7EmhR0cQkMwdoE vt5EMdJn4H+bkj4MreybCf6zFXPXvoLokztOfwiia4DwIj+UkyS485jxRU/F07AxfJjX ii0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence:subject :mime-version:user-agent:references:in-reply-to:date:to:from :message-id:delivered-to; bh=nFlPthoVEW+l2kdQFyMPhjE6Oj1Sx+WYhpzpB65G/z8=; fh=e5zN9xSzcxLA6bGo3lF+CqTbY/oLwzApV03EO/RBfgQ=; b=h71ihT7LDzTYpkBjXt0I+BEtFCCudKtyLt1AD33C0C5wpSh5Bb7xLJHOA7h/zzaby3 6s6Zwk2N5gMO+U5xUQjTXpiWs/I+JHkmi3w30FO4lJHpq06wy6lnLmSvAW7Y16sP2+kI W7XADeBKUW8Gs9Fxtxc7v6mvd1Nj5GZrUGsPqR41GCWSFpb42K+tz+7KB+iSItywuSjv 5wOfpqbWjyaUMl3Lk8qV/GuAvUkhjzwKC4pwtYOei90C7/wRjrfkZZ4JSKkl0m7OcIaT KIfeqZQh5bPOGiDt1kyNjOtyJfaSVs0bVXp7MIexGaC9OpKoLIEq9oXfiYWiCQBHzsCX thNw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id d24-20020a1709067f1800b00a4e086884eesi1132610ejr.850.2024.03.28.16.06.49; Thu, 28 Mar 2024 16:06:50 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 708B968D391; Fri, 29 Mar 2024 01:06:46 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from glom.nmugroup.com (glom.nmugroup.com [193.183.80.6]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 3C63368D509 for ; Fri, 29 Mar 2024 01:06:39 +0200 (EET) Received: from localhost (localhost [127.0.0.1]) by glom.nmugroup.com (Postfix) with ESMTP id C13D65429F80 for ; Fri, 29 Mar 2024 00:06:38 +0100 (CET) Received: from debian.lan (unknown [IPv6:2a00:66c0:a::72c]) (Authenticated sender: git01) by glom.nmugroup.com (Postfix) with ESMTPSA id 7CDCE5429E47 for ; Fri, 29 Mar 2024 00:06:37 +0100 (CET) Message-ID: <6e0b44bc6313c994833c2c7a6622e9d8c8fc324f.camel@haerdin.se> From: Tomas =?iso-8859-1?q?H=E4rdin?= To: FFmpeg development discussions and patches Date: Fri, 29 Mar 2024 00:06:36 +0100 In-Reply-To: References: User-Agent: Evolution 3.46.4-2 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: ixEj+kJPcZre Obviously the most controversial patch in this set. I'd like to know what program exactly that produced the broken file in ticket #5032 so that we can send the guilty party to parsing jail More seriously, eating any run of CR CR CR... is incredibly broken and is part of the reason for the convoluted parsing logic in srtdec before this patchset. A compromise solution could be to peek two bytes forward and see if the line ending is CR CR LF. My guess is the file was produced by running unix2dos on a file with CR LF, thus converting the LF to CR LF. webvttdec is better with this, probably because these kinds of line ending is explicitly banned by the WebVTT spec. That is, CR CR LF is to be treated as two line endings, no ifs or buts about it. /Tomas From 784672af12da1d5fefeefad83cffa4b31f8b50fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= Date: Thu, 28 Mar 2024 23:30:06 +0100 Subject: [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 This ticket should have been marked as wontfix so as to not encourage completely broken muxers. --- libavformat/subtitles.c | 10 ++-------- tests/fate/subtitles.mak | 3 --- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c index bda549abd0..11fa89f5eb 100644 --- a/libavformat/subtitles.c +++ b/libavformat/subtitles.c @@ -459,13 +459,7 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) buf[cur++] = c; buf[cur] = '\0'; } - // don't eat \n\n - if (c == '\r') { - // sub/ticket5032-rrn.srt has \r\r\n - while (ff_text_peek_r8(tr) == '\r') - ff_text_r8(tr); - if (ff_text_peek_r8(tr) == '\n') - ff_text_r8(tr); - } + if (c == '\r' && ff_text_peek_r8(tr) == '\n') + ff_text_r8(tr); return cur; } diff --git a/tests/fate/subtitles.mak b/tests/fate/subtitles.mak index 90412e9ac1..940cd9a9ec 100644 --- a/tests/fate/subtitles.mak +++ b/tests/fate/subtitles.mak @@ -73,9 +73,6 @@ fate-sub-srt: CMD = fmtstdout ass -i $(TARGET_SAMPLES)/sub/SubRip_capability_tes FATE_SUBTITLES_ASS-$(call DEMDEC, SRT, SUBRIP) += fate-sub-srt-badsyntax fate-sub-srt-badsyntax: CMD = fmtstdout ass -i $(TARGET_SAMPLES)/sub/badsyntax.srt -FATE_SUBTITLES-$(call ALLYES, SRT_DEMUXER SUBRIP_DECODER SRT_MUXER) += fate-sub-srt-rrn-remux -fate-sub-srt-rrn-remux: CMD = fmtstdout srt -i $(TARGET_SAMPLES)/sub/ticket5032-rrn.srt -c:s copy - FATE_SUBTITLES-$(call ALLYES, SRT_DEMUXER SUBRIP_DECODER SRT_MUXER) += fate-sub-srt-madness-timeshift fate-sub-srt-madness-timeshift: CMD = fmtstdout srt -itsoffset 3.14 -i $(TARGET_SAMPLES)/sub/madness.srt -c:s copy -- 2.39.2