
[FFmpeg-devel,1/2] doc/ffmpeg: rewrite the detailed description chapter

Message ID 20241004074613.21038-1-anton@khirnov.net
State New
Series [FFmpeg-devel,1/2] doc/ffmpeg: rewrite the detailed description chapter

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Anton Khirnov Oct. 4, 2024, 7:46 a.m. UTC
Split it into sections that describe in detail
* the components of the transcoding pipeline
* the main features it handles, in order of complexity
    * streamcopy
    * transcoding
    * filtering

Replace the current confusing/misleading diagrams with new ones that
actually reflect the program components and data flow between them.
---
 doc/ffmpeg.texi | 491 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 378 insertions(+), 113 deletions(-)

Patch

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index de140067ae..e17c17bcd7 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -87,140 +87,405 @@  The format option may be needed for raw input files.
 @chapter Detailed description
 @c man begin DETAILED DESCRIPTION
 
-The transcoding process in @command{ffmpeg} for each output can be described by
-the following diagram:
+@command{ffmpeg} builds a transcoding pipeline out of the components listed
+below. The program's operation then consists of input data chunks flowing from
+the sources down the pipes towards the sinks, while being transformed by the
+components they encounter along the way.
 
+The following kinds of components are available:
+@itemize
+@item
+@emph{Demuxers} (short for "demultiplexers") read an input source in order to
+extract
+
+@itemize
+@item
+global properties such as metadata or chapters;
+@item
+the list of input elementary streams and their properties.
+@end itemize
+
+One demuxer instance is created for each @option{-i} option, and sends encoded
+@emph{packets} to @emph{decoders} or @emph{muxers}.
+
+In other literature, demuxers are sometimes called @emph{splitters}, because
+their main function is splitting a file into elementary streams (though some
+files only contain one elementary stream).
+
+A schematic representation of a demuxer looks like this:
 @verbatim
- _______              ______________
-|       |            |              |
-| input |  demuxer   | encoded data |   decoder
-| file  | ---------> | packets      | -----+
-|_______|            |______________|      |
-                                           v
-                                       _________
-                                      |         |
-                                      | decoded |
-                                      | frames  |
-                                      |_________|
- ________             ______________       |
-|        |           |              |      |
-| output | <-------- | encoded data | <----+
-| file   |   muxer   | packets      |   encoder
-|________|           |______________|
-
-
+┌──────────┬───────────────────────┐
+│ demuxer  │                       │ packets for stream 0
+╞══════════╡ elementary stream 0   ├──────────────────────⮞
+│          │                       │
+│  global  ├───────────────────────┤
+│properties│                       │ packets for stream 1
+│   and    │ elementary stream 1   ├──────────────────────⮞
+│ metadata │                       │
+│          ├───────────────────────┤
+│          │                       │
+│          │     ...........       │
+│          │                       │
+│          ├───────────────────────┤
+│          │                       │ packets for stream N
+│          │ elementary stream N   ├──────────────────────⮞
+│          │                       │
+└──────────┴───────────────────────┘
+     ⯅
+     │
+     │ read from file, network stream,
+     │     grabbing device, etc.
+     │
 @end verbatim
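+
+The demuxer to use is normally detected automatically from the input. When that
+is not possible - e.g. for raw video, which has no headers to probe - it can be
+named explicitly with the input @option{-f} option, together with any stream
+properties the demuxer cannot discover on its own. For example (the file name
+and the geometry values are just placeholders):
+@example
+ffmpeg -f rawvideo -video_size 1280x720 -pixel_format yuv420p -framerate 25 \
+    -i INPUT.yuv OUTPUT.mp4
+@end example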
 
-@command{ffmpeg} calls the libavformat library (containing demuxers) to read
-input files and get packets containing encoded data from them. When there are
-multiple input files, @command{ffmpeg} tries to keep them synchronized by
-tracking lowest timestamp on any active input stream.
+@item
+@emph{Decoders} receive encoded (compressed) @emph{packets} for an audio, video,
+or subtitle elementary stream, and decode them into raw @emph{frames} (arrays of
+pixels for video, PCM for audio). A decoder is typically associated with (and
+receives its input from) an elementary stream in a @emph{demuxer}, but sometimes
+may also exist on its own (see @ref{Loopback decoders}).
+
+A schematic representation of a decoder looks like this:
+@verbatim
+          ┌─────────┐
+ packets  │         │ raw frames
+─────────⮞│ decoder ├────────────⮞
+          │         │
+          └─────────┘
+@end verbatim
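+
+The decoder is normally chosen automatically based on the codec of the input
+stream, but it can also be selected explicitly with the @option{-c} option
+placed before the corresponding @option{-i}. For example, the following (the
+file names are just placeholders) forces the native @code{vp9} decoder for all
+video streams of the input:
+@example
+ffmpeg -c:v vp9 -i INPUT.webm OUTPUT.mp4
+@end example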
+
+@item
+@emph{Filtergraphs} process and transform raw audio or video @emph{frames}. A
+filtergraph consists of one or more individual @emph{filters} linked into a
+graph. Filtergraphs come in two flavors - @emph{simple} and @emph{complex},
+configured with the @option{-filter} and @option{-filter_complex} options,
+respectively.
+
+A simple filtergraph is associated with an @emph{output elementary stream}; it
+receives the input to be filtered from a @emph{decoder} and sends filtered
+output to that output stream's @emph{encoder}.
+
+A simple video filtergraph that performs deinterlacing (using the @code{yadif}
+deinterlacer) followed by resizing (using the @code{scale} filter) can look like
+this:
+@verbatim
+
+             ┌────────────────────────┐
+             │  simple filtergraph    │
+ frames from ╞════════════════════════╡ frames for
+ a decoder   │  ┌───────┐  ┌───────┐  │ an encoder
+────────────⮞├─⮞│ yadif ├─⮞│ scale ├─⮞│────────────⮞
+             │  └───────┘  └───────┘  │
+             └────────────────────────┘
+@end verbatim
+
+A complex filtergraph is standalone and not associated with any specific stream.
+It may have multiple (or zero) inputs, potentially of different types (audio or
+video), each of which receives data either from a decoder or from another
+complex filtergraph's output. It also has one or more outputs that feed either an
+encoder or another complex filtergraph's input.
+
+The following example diagram represents a complex filtergraph with 3 inputs and
+2 outputs (all video):
+@verbatim
+          ┌─────────────────────────────────────────────────┐
+          │               complex filtergraph               │
+          ╞═════════════════════════════════════════════════╡
+ frames   ├───────┐  ┌─────────┐      ┌─────────┐  ┌────────┤ frames
+─────────⮞│input 0├─⮞│ overlay ├─────⮞│ overlay ├─⮞│output 0├────────⮞
+          ├───────┘  │         │      │         │  └────────┤
+ frames   ├───────┐╭⮞│         │    ╭⮞│         │           │
+─────────⮞│input 1├╯ └─────────┘    │ └─────────┘           │
+          ├───────┘                 │                       │
+ frames   ├───────┐ ┌─────┐ ┌─────┬─╯              ┌────────┤ frames
+─────────⮞│input 2├⮞│scale├⮞│split├───────────────⮞│output 1├────────⮞
+          ├───────┘ └─────┘ └─────┘                └────────┤
+          └─────────────────────────────────────────────────┘
+@end verbatim
+Frames from the second input are overlaid over those from the first. Frames from
+the third input are rescaled, then duplicated into two identical streams. One of
+them is overlaid over the combined first two inputs, with the result exposed as
+the filtergraph's first output. The other duplicate ends up being the
+filtergraph's second output.
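+
+A commandline constructing a filtergraph of this shape could look like the
+following (the input file names and the target resolution are placeholders):
+@example
+ffmpeg -i INPUT0.mkv -i INPUT1.mkv -i INPUT2.mkv \
+    -filter_complex "[0:v][1:v]overlay[bg];[2:v]scale=320:240,split[s0][s1];[bg][s0]overlay[out0]" \
+    -map "[out0]" OUTPUT0.mp4 -map "[s1]" OUTPUT1.mp4
+@end example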
+
+@item
+@emph{Encoders} receive raw audio, video, or subtitle @emph{frames} and encode
+them into encoded @emph{packets}. The encoding (compression) process is
+typically lossy - it degrades stream quality to make the output smaller; some
+encoders are @emph{lossless}, but at the cost of much higher output size. A
+video or audio encoder receives its input from some filtergraph's output, while
+subtitle encoders receive their input from a decoder (since subtitle filtering is
+not supported yet). Every encoder is associated with some muxer's @emph{output
+elementary stream} and sends its output to that muxer.
+
+A schematic representation of an encoder looks like this:
+@verbatim
+             ┌─────────┐
+ raw frames  │         │ packets
+────────────⮞│ encoder ├─────────⮞
+             │         │
+             └─────────┘
+@end verbatim
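+
+The encoder for each output stream is selected with the @option{-c} option
+placed among the output options, and encoder-specific settings are passed the
+same way. For example (the quality value is just a placeholder), the following
+encodes video with @code{libx264} and audio with the native @code{aac} encoder:
+@example
+ffmpeg -i INPUT.mkv -c:v libx264 -crf 23 -c:a aac OUTPUT.mp4
+@end example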
+
+@item
+@emph{Muxers} (short for "multiplexers") receive encoded @emph{packets} for
+their elementary streams from encoders (the @emph{transcoding} path) or directly
+from demuxers (the @emph{streamcopy} path), interleave them (when there is more
+than one elementary stream), and write the resulting bytes into the output file
+(or pipe, network stream, etc.).
+
+A schematic representation of a muxer looks like this:
+@verbatim
+                       ┌──────────────────────┬───────────┐
+ packets for stream 0  │                      │   muxer   │
+──────────────────────⮞│  elementary stream 0 ╞═══════════╡
+                       │                      │           │
+                       ├──────────────────────┤  global   │
+ packets for stream 1  │                      │properties │
+──────────────────────⮞│  elementary stream 1 │   and     │
+                       │                      │ metadata  │
+                       ├──────────────────────┤           │
+                       │                      │           │
+                       │     ...........      │           │
+                       │                      │           │
+                       ├──────────────────────┤           │
+ packets for stream N  │                      │           │
+──────────────────────⮞│  elementary stream N │           │
+                       │                      │           │
+                       └──────────────────────┴─────┬─────┘
+                                                    │
+                     write to file, network stream, │
+                         grabbing device, etc.      │
+                                                    │
+                                                    ▼
+@end verbatim
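+
+The muxer is normally chosen based on the output file name, but it can be
+forced with the output @option{-f} option - e.g. when writing to a pipe, where
+there is no name to guess from. For example, the following streamcopies the
+input into a Matroska stream sent to standard output:
+@example
+ffmpeg -i INPUT.mkv -c copy -f matroska pipe:1
+@end example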
+
+@end itemize
+
+@section Streamcopy
+The simplest pipeline in @command{ffmpeg} is single-stream
+@emph{streamcopy}, that is copying one @emph{input elementary stream}'s packets
+without decoding, filtering, or encoding them. As an example, consider an input
+file called @file{INPUT.mkv} with 3 elementary streams, from which we take the
+second and write it to file @file{OUTPUT.mp4}. A schematic representation of
+such a pipeline looks like this:
+@verbatim
+┌──────────┬─────────────────────┐
+│ demuxer  │                     │ unused
+╞══════════╡ elementary stream 0 ├────────╳
+│          │                     │
+│INPUT.mkv ├─────────────────────┤          ┌──────────────────────┬───────────┐
+│          │                     │ packets  │                      │   muxer   │
+│          │ elementary stream 1 ├─────────⮞│  elementary stream 0 ╞═══════════╡
+│          │                     │          │                      │OUTPUT.mp4 │
+│          ├─────────────────────┤          └──────────────────────┴───────────┘
+│          │                     │ unused
+│          │ elementary stream 2 ├────────╳
+│          │                     │
+└──────────┴─────────────────────┘
+@end verbatim
+
+The above pipeline can be constructed with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:1 -c copy OUTPUT.mp4
+@end example
+
+In this commandline
+@itemize
+
+@item
+there is a single input @file{INPUT.mkv};
+
+@item
+there are no input options for this input;
+
+@item
+there is a single output @file{OUTPUT.mp4};
+
+@item
+there are two output options for this output:
+
+@itemize
+@item
+@code{-map 0:1} selects the input stream to be used - from input with index 0
+(i.e. the first one) the stream with index 1 (i.e. the second one);
+
+@item
+@code{-c copy} selects the @code{copy} encoder, i.e. streamcopy with no decoding
+or encoding.
+@end itemize
+
+@end itemize
+
+Streamcopy is useful for changing the elementary stream count or container format,
+or for modifying container-level metadata. Since there is no decoding or encoding,
+it is very fast and there is no quality loss. However, it might not work in some
+cases because of a variety of factors (e.g. certain information required by the
+target container is not available in the source). Applying filters is obviously
+also impossible, since filters work on decoded frames.
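+
+For example, remuxing a file into a different container while setting a title
+(the metadata value is just a placeholder) needs nothing beyond streamcopy:
+@example
+ffmpeg -i INPUT.mp4 -map 0 -c copy -metadata title="Some Title" OUTPUT.mkv
+@end example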
+
+More complex streamcopy scenarios can be constructed - e.g. combining streams
+from two input files into a single output:
+@verbatim
+┌──────────┬────────────────────┐         ┌────────────────────┬───────────┐
+│ demuxer 0│                    │ packets │                    │   muxer   │
+╞══════════╡elementary stream 0 ├────────⮞│elementary stream 0 ╞═══════════╡
+│INPUT0.mkv│                    │         │                    │OUTPUT.mp4 │
+└──────────┴────────────────────┘         ├────────────────────┤           │
+┌──────────┬────────────────────┐         │                    │           │
+│ demuxer 1│                    │ packets │elementary stream 1 │           │
+╞══════════╡elementary stream 0 ├────────⮞│                    │           │
+│INPUT1.aac│                    │         └────────────────────┴───────────┘
+└──────────┴────────────────────┘
+@end verbatim
+that can be built by the commandline
+@example
+ffmpeg -i INPUT0.mkv -i INPUT1.aac -map 0:0 -map 1:0 -c copy OUTPUT.mp4
+@end example
+
+The output @option{-map} option is used twice here, creating two streams in the
+output file - one fed by the first input and one by the second. The single
+instance of the @option{-c} option selects streamcopy for both of those streams.
+You could also use multiple instances of this option together with
+@ref{Stream specifiers} to apply different values to each stream, as will be
+demonstrated in following sections.
+
+A converse scenario is splitting multiple streams from a single input into
+multiple outputs:
+@verbatim
+┌──────────┬─────────────────────┐          ┌───────────────────┬───────────┐
+│ demuxer  │                     │ packets  │                   │ muxer 0   │
+╞══════════╡ elementary stream 0 ├─────────⮞│elementary stream 0╞═══════════╡
+│          │                     │          │                   │OUTPUT0.mp4│
+│INPUT.mkv ├─────────────────────┤          └───────────────────┴───────────┘
+│          │                     │ packets  ┌───────────────────┬───────────┐
+│          │ elementary stream 1 ├─────────⮞│                   │ muxer 1   │
+│          │                     │          │elementary stream 0╞═══════════╡
+└──────────┴─────────────────────┘          │                   │OUTPUT1.mp4│
+                                            └───────────────────┴───────────┘
+@end verbatim
+built with
+@example
+ffmpeg -i INPUT.mkv -map 0:0 -c copy OUTPUT0.mp4 -map 0:1 -c copy OUTPUT1.mp4
+@end example
+Note how a separate instance of the @option{-c} option is needed for every
+output file even though their values are the same. This is because non-global
+options (which is most of them) only apply in the context of the file before
+which they are placed.
+
+These examples can of course be further generalized into arbitrary remappings
+of any number of inputs into any number of outputs.
+
+@section Transcoding
+@emph{Transcoding} is the process of decoding a stream and then encoding it
+again. Since encoding tends to be computationally expensive and in most cases
+degrades the stream quality (i.e. it is @emph{lossy}), you should only transcode
+when you need to and perform streamcopy otherwise. Typical reasons to transcode
+are:
+
+@itemize
+@item
+applying filters - e.g. resizing, deinterlacing, or overlaying video; resampling
+or mixing audio;
+
+@item
+feeding the stream to something that cannot decode the original codec.
+@end itemize
+Note that @command{ffmpeg} will transcode all audio, video, and subtitle streams
+unless you specify @option{-c copy} for them.
+
+Consider an example pipeline that reads an input file with one audio and one
+video stream, transcodes the video and copies the audio into a single output
+file. This can be schematically represented as follows:
+@verbatim
+┌──────────┬─────────────────────┐
+│ demuxer  │                     │       audio packets
+╞══════════╡ stream 0 (audio)    ├─────────────────────────────────────╮
+│          │                     │                                     │
+│INPUT.mkv ├─────────────────────┤ video    ┌─────────┐     raw        │
+│          │                     │ packets  │  video  │ video frames   │
+│          │ stream 1 (video)    ├─────────⮞│ decoder ├──────────────╮ │
+│          │                     │          │         │              │ │
+└──────────┴─────────────────────┘          └─────────┘              │ │
+                                                                     ▼ ▼
+                                                                     │ │
+┌──────────┬─────────────────────┐ video    ┌─────────┐              │ │
+│ muxer    │                     │ packets  │  video  │              │ │
+╞══════════╡ stream 0 (video)    │⮜─────────┤ encoder ├──────────────╯ │
+│          │                     │          │(libx264)│                │
+│OUTPUT.mp4├─────────────────────┤          └─────────┘                │
+│          │                     │                                     │
+│          │ stream 1 (audio)    │⮜────────────────────────────────────╯
+│          │                     │
+└──────────┴─────────────────────┘
+@end verbatim
+and implemented with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:v -map 0:a -c:v libx264 -c:a copy OUTPUT.mp4
+@end example
+Note how it uses stream specifiers @code{:v} and @code{:a} to select input
+streams and apply different values of the @option{-c} option to them; see the
+@ref{Stream specifiers} section for more details.
 
-Encoded packets are then passed to the decoder (unless streamcopy is selected
-for the stream, see further for a description). The decoder produces
-uncompressed frames (raw video/PCM audio/...) which can be processed further by
-filtering (see next section). After filtering, the frames are passed to the
-encoder, which encodes them and outputs encoded packets. Finally, those are
-passed to the muxer, which writes the encoded packets to the output file.
 
 @section Filtering
-Before encoding, @command{ffmpeg} can process raw audio and video frames using
-filters from the libavfilter library. Several chained filters form a filter
-graph. @command{ffmpeg} distinguishes between two types of filtergraphs:
-simple and complex.
+
+When transcoding, audio and video streams can be filtered before encoding, with
+either a @emph{simple} or @emph{complex} filtergraph.
 
 @subsection Simple filtergraphs
+
 Simple filtergraphs are those that have exactly one input and output, both of
-the same type. In the above diagram they can be represented by simply inserting
-an additional step between decoding and encoding:
+the same type (audio or video). They are configured with the per-stream
+@option{-filter} option (with @option{-vf} and @option{-af} aliases for
+@option{-filter:v} (video) and @option{-filter:a} (audio) respectively). Note
+that simple filtergraphs are tied to their output stream, so e.g. if you have
+multiple audio streams, @option{-af} will create a separate filtergraph for each
+one.
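+
+For example, assuming @file{INPUT.mkv} contains one video and two audio
+streams, the following streamcopies the video while sending each audio stream
+through its own instance of the @code{volume} filter:
+@example
+ffmpeg -i INPUT.mkv -map 0 -c:v copy -af volume=0.5 OUTPUT.mkv
+@end example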
 
+Taking the transcoding example from above, adding filtering (and omitting audio,
+for clarity) makes it look like this:
 @verbatim
- _________                        ______________
-|         |                      |              |
-| decoded |                      | encoded data |
-| frames  |\                   _ | packets      |
-|_________| \                  /||______________|
-             \   __________   /
-  simple     _\||          | /  encoder
-  filtergraph   | filtered |/
-                | frames   |
-                |__________|
-
+┌──────────┬───────────────┐
+│ demuxer  │               │          ┌─────────┐
+╞══════════╡ video stream  │ packets  │  video  │ frames
+│INPUT.mkv │               ├─────────⮞│ decoder ├─────⮞───╮
+│          │               │          └─────────┘         │
+└──────────┴───────────────┘                              │
+                                  ╭───────────⮜───────────╯
+                                  │   ┌────────────────────────┐
+                                  │   │  simple filtergraph    │
+                                  │   ╞════════════════════════╡
+                                  │   │  ┌───────┐  ┌───────┐  │
+                                  ╰──⮞├─⮞│ yadif ├─⮞│ scale ├─⮞├╮
+                                      │  └───────┘  └───────┘  ││
+                                      └────────────────────────┘│
+                                                                │
+                                                                │
+┌──────────┬───────────────┐ video    ┌─────────┐               │
+│ muxer    │               │ packets  │  video  │               │
+╞══════════╡ video stream  │⮜─────────┤ encoder ├───────⮜───────╯
+│OUTPUT.mp4│               │          │         │
+│          │               │          └─────────┘
+└──────────┴───────────────┘
 @end verbatim
 
-Simple filtergraphs are configured with the per-stream @option{-filter} option
-(with @option{-vf} and @option{-af} aliases for video and audio respectively).
-A simple filtergraph for video can look for example like this:
-
-@verbatim
- _______        _____________        _______        ________
-|       |      |             |      |       |      |        |
-| input | ---> | deinterlace | ---> | scale | ---> | output |
-|_______|      |_____________|      |_______|      |________|
-
-@end verbatim
-
-Note that some filters change frame properties but not frame contents. E.g. the
-@code{fps} filter in the example above changes number of frames, but does not
-touch the frame contents. Another example is the @code{setpts} filter, which
-only sets timestamps and otherwise passes the frames unchanged.
-
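+This pipeline can be built with a commandline like the following (the target
+resolution given to @code{scale} is just a placeholder):
+@example
+ffmpeg -i INPUT.mkv -map 0:v -vf yadif,scale=1280:720 -c:v libx264 OUTPUT.mp4
+@end example
+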
 @subsection Complex filtergraphs
+
 Complex filtergraphs are those which cannot be described as simply a linear
-processing chain applied to one stream. This is the case, for example, when the graph has
-more than one input and/or output, or when output stream type is different from
-input. They can be represented with the following diagram:
-
-@verbatim
- _________
-|         |
-| input 0 |\                    __________
-|_________| \                  |          |
-             \   _________    /| output 0 |
-              \ |         |  / |__________|
- _________     \| complex | /
-|         |     |         |/
-| input 1 |---->| filter  |\
-|_________|     |         | \   __________
-               /| graph   |  \ |          |
-              / |         |   \| output 1 |
- _________   /  |_________|    |__________|
-|         | /
-| input 2 |/
-|_________|
-
-@end verbatim
-
-Complex filtergraphs are configured with the @option{-filter_complex} option.
-Note that this option is global, since a complex filtergraph, by its nature,
-cannot be unambiguously associated with a single stream or file.
-
-The @option{-lavfi} option is equivalent to @option{-filter_complex}.
+processing chain applied to one stream. This is the case, for example, when the
+graph has more than one input and/or output, or when output stream type is
+different from input. Complex filtergraphs are configured with the
+@option{-filter_complex} option. Note that this option is global, since a
+complex filtergraph, by its nature, cannot be unambiguously associated with a
+single stream or file. Each instance of @option{-filter_complex} creates a new
+complex filtergraph, and there can be any number of them.
 
 A trivial example of a complex filtergraph is the @code{overlay} filter, which
 has two video inputs and one video output, containing one video overlaid on top
 of the other. Its audio counterpart is the @code{amix} filter.
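+
+For example, the following overlays a logo image in the top-left corner of a
+video (the file names and coordinates are just placeholders):
+@example
+ffmpeg -i MAIN.mkv -i LOGO.png -filter_complex "[0:v][1:v]overlay=10:10" OUTPUT.mkv
+@end example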
 
-@section Stream copy
-Stream copy is a mode selected by supplying the @code{copy} parameter to the
-@option{-codec} option. It makes @command{ffmpeg} omit the decoding and encoding
-step for the specified stream, so it does only demuxing and muxing. It is useful
-for changing the container format or modifying container-level metadata. The
-diagram above will, in this case, simplify to this:
-
-@verbatim
- _______              ______________            ________
-|       |            |              |          |        |
-| input |  demuxer   | encoded data |  muxer   | output |
-| file  | ---------> | packets      | -------> | file   |
-|_______|            |______________|          |________|
-
-@end verbatim
-
-Since there is no decoding or encoding, it is very fast and there is no quality
-loss. However, it might not work in some cases because of many factors. Applying
-filters is obviously also impossible, since filters work on uncompressed data.
-
+@anchor{Loopback decoders}
 @section Loopback decoders
 While decoders are normally associated with demuxer streams, it is also possible
 to create "loopback" decoders that decode the output from some encoder and allow