From patchwork Sat Mar 30 00:08:57 2024
X-Patchwork-Submitter: Tomas Härdin
X-Patchwork-Id: 47665
From: Tomas Härdin
To: FFmpeg development discussions and patches
Date: Sat, 30 Mar 2024 01:08:57 +0100
Subject: [FFmpeg-devel] [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings

Here's an alternative first patch that rolls patch 1+3 into one. I'd
like some feedback on this before I continue hacking on patch 2.

While I don't like that we accept any old broken srt file, especially
without knowing what software made it, I'm not completely opposed to
compromising in this specific case. But I'd rather we didn't, and stuck
to \r, \n and \r\n.

What I really don't want is runs of \r being eaten without being
"terminated" by a \n, because this messes up Mac support.

/Tomas

From 2ec68c51e4599b8493a2e103793f571451d872d3 Mon Sep 17 00:00:00 2001
From: Tomas Härdin
Date: Thu, 28 Mar 2024 20:30:37 +0100
Subject: [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings

---
 libavformat/subtitles.c | 53 +++++++++++++++++++++++++++++++++++++----
 libavformat/subtitles.h |  5 ++++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c
index 3413763c7b..01187df6ab 100644
--- a/libavformat/subtitles.c
+++ b/libavformat/subtitles.c
@@ -22,6 +22,7 @@
 #include "subtitles.h"
 #include "avio_internal.h"
 #include "libavutil/avstring.h"
+#include "libavutil/intreadwrite.h"
 
 void ff_text_init_avio(void *s, FFTextReader *r, AVIOContext *pb)
 {
@@ -106,6 +107,42 @@ int ff_text_peek_r8(FFTextReader *r)
     return c;
 }
 
+int ff_text_peek_r16(FFTextReader *r)
+{
+    int c1, c2;
+    if (r->buf_pos < r->buf_len - 1)
+        return AV_RB16(&r->buf[r->buf_pos]);
+
+    // missing one or two bytes
+    c1 = ff_text_r8(r);
+    if (avio_feof(r->pb))
+        return 0;
+
+    if (r->buf_pos == r->buf_len - 1) {
+        // missing one byte
+        r->buf[0] = r->buf[r->buf_pos];
+        r->buf[1] = c1;
+        r->buf_pos = 0;
+        r->buf_len = 2;
+        return AV_RB16(r->buf);
+    }
+
+    // missing two bytes
+    c2 = ff_text_r8(r);
+    if (avio_feof(r->pb)) {
+        r->buf[0] = c1;
+        r->buf_pos = 0;
+        r->buf_len = 1;
+        return 0;
+    }
+
+    r->buf[0] = c1;
+    r->buf[1] = c2;
+    r->buf_pos = 0;
+    r->buf_len = 2;
+    return AV_RB16(r->buf);
+}
+
 AVPacket *ff_subtitles_queue_insert(FFDemuxSubtitlesQueue *q,
                                     const uint8_t *event, size_t len, int merge)
 {
@@ -446,11 +483,12 @@ int ff_subtitles_read_chunk(AVIOContext *pb, AVBPrint *buf)
 ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size)
 {
     size_t cur = 0;
+    unsigned char c;
     if (!size)
         return 0;
     buf[0] = '\0';
     while (cur + 1 < size) {
-        unsigned char c = ff_text_r8(tr);
+        c = ff_text_r8(tr);
         if (!c)
             return ff_text_eof(tr) ? cur : AVERROR_INVALIDDATA;
         if (c == '\r' || c == '\n')
@@ -458,9 +496,14 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size)
         buf[cur++] = c;
         buf[cur] = '\0';
     }
-    while (ff_text_peek_r8(tr) == '\r')
-        ff_text_r8(tr);
-    if (ff_text_peek_r8(tr) == '\n')
-        ff_text_r8(tr);
+    if (c == '\r') {
+        if (ff_text_peek_r8(tr) == '\n')
+            ff_text_r8(tr);
+        else if (ff_text_peek_r16(tr) == AV_RB16("\r\n")) {
+            // ticket5032-rrn.srt has \r\r\n
+            ff_text_r8(tr);
+            ff_text_r8(tr);
+        }
+    }
     return cur;
 }
diff --git a/libavformat/subtitles.h b/libavformat/subtitles.h
index 88665663c5..2a92044976 100644
--- a/libavformat/subtitles.h
+++ b/libavformat/subtitles.h
@@ -94,6 +94,11 @@ int ff_text_eof(FFTextReader *r);
  */
 int ff_text_peek_r8(FFTextReader *r);
 
+/**
+ * Like ff_text_peek_r8(), but peek two bytes and return them as a big-endian number.
+ */
+int ff_text_peek_r16(FFTextReader *r);
+
 /**
  * Read the given number of bytes (in UTF-8). On error or EOF, \0 bytes are
  * written.
-- 
2.39.2
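
For anyone who wants to try the line-ending rule outside of lavf, here is a rough standalone sketch of it. It is only an illustration under assumptions: read_line_sketch() is a hypothetical helper that works on an in-memory buffer rather than an FFTextReader, so the two-byte lookahead is a plain array access instead of ff_text_peek_r16(). It is not the code from the patch. The point it tries to show is that a \r is only merged with what follows when that is \n or \r\n, so a run of bare \r still yields one line per \r.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of FFmpeg: copy one line from src[0..len)
 * into out (always NUL-terminated) and return the number of input bytes
 * consumed, including the line terminator if one was found.
 * Accepted terminators: \n, \r, \r\n and \r\r\n.
 */
static size_t read_line_sketch(const char *src, size_t len,
                               char *out, size_t out_size)
{
    size_t pos = 0, cur = 0;

    assert(src && out && out_size > 0);

    /* Copy bytes until a terminator, end of input, or a full buffer. */
    while (pos < len && src[pos] != '\r' && src[pos] != '\n' &&
           cur + 1 < out_size)
        out[cur++] = src[pos++];
    out[cur] = '\0';

    /* End of input, or the output buffer filled up before a terminator. */
    if (pos == len || (src[pos] != '\r' && src[pos] != '\n'))
        return pos;

    if (src[pos] == '\n')
        return pos + 1;             /* \n */

    pos++;                          /* consume the first \r */
    if (pos < len && src[pos] == '\n')
        return pos + 1;             /* \r\n */
    if (pos + 1 < len && src[pos] == '\r' && src[pos + 1] == '\n')
        return pos + 2;             /* \r\r\n */
    return pos;                     /* lone \r; a following \r is a new line */
}
```

Calling it repeatedly on "a\r\rb" gives "a", then an empty line, then "b", while "a\r\r\nb" gives "a" followed directly by "b". That is the behaviour the patch aims for: the ticket5032-rrn.srt case is accepted without swallowing blank lines in \r-only files.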