diff mbox series

[FFmpeg-devel] avformat/rtcenc: Add WHIP muxer support for subsecond latency streaming

Message ID 20230529115039.17409-1-lq@chinaffmpeg.org
State New
Headers show
Series [FFmpeg-devel] avformat/rtcenc: Add WHIP muxer support for subsecond latency streaming | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Steven Liu May 29, 2023, 11:50 a.m. UTC
Co-authored-by: winlin <winlinvip@gmail.com>
Co-authored-by: yangrtc <yangrtc@aliyun.com>
Co-authored-by: cloudwebrtc <duanweiwei1982@gmail.com>
Co-authored-by: Haibo Chen <495810242@qq.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
---
 configure                |    1 +
 doc/muxers.texi          |   50 +
 libavformat/Makefile     |    1 +
 libavformat/allformats.c |    1 +
 libavformat/http.c       |    6 +
 libavformat/http.h       |    2 +
 libavformat/rtcenc.c     | 2208 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 2269 insertions(+)
 create mode 100644 libavformat/rtcenc.c

Comments

Michael Niedermayer May 29, 2023, 1 p.m. UTC | #1
On Mon, May 29, 2023 at 07:50:39PM +0800, Steven Liu wrote:
> Co-authored-by: winlin <winlinvip@gmail.com>
> Co-authored-by: yangrtc <yangrtc@aliyun.com>
> Co-authored-by: cloudwebrtc <duanweiwei1982@gmail.com>
> Co-authored-by: Haibo Chen <495810242@qq.com>
> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> ---
>  configure                |    1 +
>  doc/muxers.texi          |   50 +
>  libavformat/Makefile     |    1 +
>  libavformat/allformats.c |    1 +
>  libavformat/http.c       |    6 +
>  libavformat/http.h       |    2 +
>  libavformat/rtcenc.c     | 2208 ++++++++++++++++++++++++++++++++++++++
>  7 files changed, 2269 insertions(+)
>  create mode 100644 libavformat/rtcenc.c

breaks build:

CC	libavformat/rtcenc.o
libavformat/rtcenc.c:30:2: error: #error "OpenSSL version 1.1.1b or newer is required"
 #error "OpenSSL version 1.1.1b or newer is required"
  ^~~~~
libavformat/rtcenc.c: In function ‘dtls_context_init’:
libavformat/rtcenc.c:210:34: error: implicit declaration of function ‘EVP_EC_gen’; did you mean ‘EVP_PBE_get’? [-Werror=implicit-function-declaration]
     ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
                                  ^~~~~~~~~~
                                  EVP_PBE_get
libavformat/rtcenc.c:210:32: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
                                ^
cc1: some warnings being treated as errors
ffbuild/common.mak:81: recipe for target 'libavformat/rtcenc.o' failed
make: *** [libavformat/rtcenc.o] Error 1

[...]
Kieran Kunhya May 30, 2023, 8:44 a.m. UTC | #2
On Mon, 29 May 2023 at 12:51, Steven Liu <lq@chinaffmpeg.org> wrote:

> Co-authored-by: winlin <winlinvip@gmail.com>
> Co-authored-by: yangrtc <yangrtc@aliyun.com>
> Co-authored-by: cloudwebrtc <duanweiwei1982@gmail.com>
> Co-authored-by: Haibo Chen <495810242@qq.com>
> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> ---
>  configure                |    1 +
>  doc/muxers.texi          |   50 +
>  libavformat/Makefile     |    1 +
>  libavformat/allformats.c |    1 +
>  libavformat/http.c       |    6 +
>  libavformat/http.h       |    2 +
>  libavformat/rtcenc.c     | 2208 ++++++++++++++++++++++++++++++++++++++
>  7 files changed, 2269 insertions(+)
>  create mode 100644 libavformat/rtcenc.c
>
> diff --git a/configure b/configure
> index 495493aa0e..526ddc0bd6 100755
> --- a/configure
> +++ b/configure
> @@ -3483,6 +3483,7 @@ ogg_demuxer_select="dirac_parse"
>  ogv_muxer_select="ogg_muxer"
>  opus_muxer_select="ogg_muxer"
>  psp_muxer_select="mov_muxer"
> +rtc_muxer_deps_any="openssl"
>  rtp_demuxer_select="sdp_demuxer"
>  rtp_mpegts_muxer_select="mpegts_muxer rtp_muxer"
>  rtpdec_select="asf_demuxer mov_demuxer mpegts_demuxer rm_demuxer
> rtp_protocol srtp"
> diff --git a/doc/muxers.texi b/doc/muxers.texi
> index 31fca17dd6..00052f51b4 100644
> --- a/doc/muxers.texi
> +++ b/doc/muxers.texi
> @@ -1333,6 +1333,56 @@ Set custom HTTP headers, can override built in
> default headers. Applicable only
>
>  @end table
>
> +@anchor{rtc}
> +@section rtc
> +
> +WebRTC (Real-Time Communication) muxer that supports sub-second latency
> streaming according to
> +the WHIP (WebRTC-HTTP ingestion protocol) specification.
> +
> +It uses HTTP as a signaling protocol to exchange SDP capabilities and ICE
> lite candidates. Then,
> +it uses STUN binding requests and responses to establish a session over
> UDP. Subsequently, it
> +initiates a DTLS handshake to exchange the SRTP encryption keys. Lastly,
> it splits video and
> +audio frames into RTP packets and encrypts them using SRTP.
> +
> +Ensure that you use H.264 without B frames and Opus for the audio codec.
> For example, to convert
> +an input file with @command{ffmpeg} to WebRTC:
> +@example
> +ffmpeg -re -i input.mp4 -acodec libopus -ar 48000 -ac 2 \
> +  -vcodec libx264 -profile:v baseline -tune zerolatency -threads 1 -bf 0 \
> +  -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream"
> +@end example
> +
> +For this example, we have employed low latency options, resulting in an
> end-to-end latency of
> +approximately 150ms.
> +
> +@subsection Options
> +
> +This muxer supports the following options:
> +
> +@table @option
> +
> +@item ice_arq_max @var{size}
> +Set the maximum number of retransmissions for the ICE ARQ mechanism.
> +Default value is 5.
> +
> +@item ice_arq_timeout @var{size}
> +Set the start timeout in milliseconds for the ICE ARQ mechanism.
> +Default value is 30.
> +
> +@item dtls_arq_max @var{size}
> +Set the maximum number of retransmissions for the DTLS ARQ mechanism.
> +Default value is 5.
> +
> +@item dtls_arq_timeout @var{size}
> +Set the start timeout in milliseconds for the DTLS ARQ mechanism.
> +Default value is 50.
> +
> +@item pkt_size @var{size}
> +Set the maximum size, in bytes, of RTP packets that send out.
> +Default value is 1500.
> +
> +@end table
> +
>  @anchor{ico}
>  @section ico
>
> diff --git a/libavformat/Makefile b/libavformat/Makefile
> index f8ad7c6a11..2a76e613b0 100644
> --- a/libavformat/Makefile
> +++ b/libavformat/Makefile
> @@ -493,6 +493,7 @@ OBJS-$(CONFIG_RSD_DEMUXER)               += rsd.o
>  OBJS-$(CONFIG_RPL_DEMUXER)               += rpl.o
>  OBJS-$(CONFIG_RSO_DEMUXER)               += rsodec.o rso.o pcm.o
>  OBJS-$(CONFIG_RSO_MUXER)                 += rsoenc.o rso.o rawenc.o
> +OBJS-$(CONFIG_RTC_MUXER)                 += rtcenc.o avc.o http.o srtp.o
>  OBJS-$(CONFIG_RTP_MPEGTS_MUXER)          += rtpenc_mpegts.o
>  OBJS-$(CONFIG_RTP_MUXER)                 += rtp.o         \
>                                              rtpenc_aac.o     \
> diff --git a/libavformat/allformats.c b/libavformat/allformats.c
> index efdb34e29d..56e852e5ea 100644
> --- a/libavformat/allformats.c
> +++ b/libavformat/allformats.c
> @@ -392,6 +392,7 @@ extern const AVInputFormat  ff_rpl_demuxer;
>  extern const AVInputFormat  ff_rsd_demuxer;
>  extern const AVInputFormat  ff_rso_demuxer;
>  extern const FFOutputFormat ff_rso_muxer;
> +extern const FFOutputFormat ff_rtc_muxer;
>  extern const AVInputFormat  ff_rtp_demuxer;
>  extern const FFOutputFormat ff_rtp_muxer;
>  extern const FFOutputFormat ff_rtp_mpegts_muxer;
> diff --git a/libavformat/http.c b/libavformat/http.c
> index 0817aafb5b..6f509656b5 100644
> --- a/libavformat/http.c
> +++ b/libavformat/http.c
> @@ -530,6 +530,12 @@ int ff_http_averror(int status_code, int
> default_averror)
>          return default_averror;
>  }
>
> +const char* ff_http_get_new_location(URLContext *h)
> +{
> +    HTTPContext *s = h->priv_data;
> +    return s->new_location;
> +}
> +
>  static int http_write_reply(URLContext* h, int status_code)
>  {
>      int ret, body = 0, reply_code, message_len;
> diff --git a/libavformat/http.h b/libavformat/http.h
> index 5f650ef143..d1b691826b 100644
> --- a/libavformat/http.h
> +++ b/libavformat/http.h
> @@ -62,4 +62,6 @@ int ff_http_do_new_request2(URLContext *h, const char
> *uri, AVDictionary **optio
>
>  int ff_http_averror(int status_code, int default_averror);
>
> +const char* ff_http_get_new_location(URLContext *h);
> +
>  #endif /* AVFORMAT_HTTP_H */
> diff --git a/libavformat/rtcenc.c b/libavformat/rtcenc.c
> new file mode 100644
> index 0000000000..65173d0073
> --- /dev/null
> +++ b/libavformat/rtcenc.c
> @@ -0,0 +1,2208 @@
> +/*
> + * WebRTC-HTTP ingestion protocol (WHIP) muxer
> + * Copyright (c) 2023 The FFmpeg Project
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> 02110-1301 USA
> + */
> +
> +#include "config.h"
> +
> +#ifndef CONFIG_OPENSSL
> +#error "DTLS is not supported, please enable openssl"
> +#else
> +#include <openssl/ssl.h>
> +#include <openssl/err.h>
> +#if OPENSSL_VERSION_NUMBER < 0x1010102fL
> +#error "OpenSSL version 1.1.1b or newer is required"
> +#endif
> +#endif
> +
> +#include "libavcodec/avcodec.h"
> +#include "libavutil/base64.h"
> +#include "libavutil/bprint.h"
> +#include "libavutil/crc.h"
> +#include "libavutil/hmac.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/random_seed.h"
> +#include "libavutil/time.h"
> +#include "avc.h"
> +#include "avio_internal.h"
> +#include "http.h"
> +#include "internal.h"
> +#include "mux.h"
> +#include "network.h"
> +#include "srtp.h"
> +
> +/**
> + * Maximum size limit of a Session Description Protocol (SDP),
> + * be it an offer or answer.
> + */
> +#define MAX_SDP_SIZE 8192
>

?


> +/**
> + * Maximum size of the buffer for sending and receiving UDP packets.
> + * Please note that this size does not limit the size of the UDP packet
> that can be sent.
> + * To set the limit for packet size, modify the `pkt_size` parameter.
> + * For instance, it is possible to set the UDP buffer to 4096 to send or
> receive packets,
> + * but please keep in mind that the `pkt_size` option limits the packet
> size to 1400.
> + */
> +#define MAX_UDP_BUFFER_SIZE 4096
> +/*
>
>


> +
> +/**
> + * Parses the ISOM AVCC format of extradata and extracts SPS/PPS.
> + *
> + * This function is used to parse SPS/PPS from the extradata in ISOM AVCC
> format.
> + * It can handle both ISOM and annexb formats but only parses data in
> ISOM format.
> + * If the extradata is in annexb format, this function ignores it, and
> uses the entire
> + * extradata as a sequence header with SPS/PPS. Refer to
> ff_isom_write_avcc.
> + *
> + * @param s                Pointer to the AVFormatContext
> + * @param extradata        Pointer to the extradata
> + * @param extradata_size   Size of the extradata
> + * @returns                Returns 0 if successful or AVERROR_xxx in case
> of an error.
> + */
> +static int isom_read_avcc(AVFormatContext *s, uint8_t *extradata, int
> extradata_size)
> +{
> +    int ret = 0;
> +    uint8_t version, nal_length_size, nb_sps, nb_pps;
> +    AVIOContext *pb;
> +    RTCContext *rtc = s->priv_data;
> +
> +    if (!extradata || !extradata_size)
> +        return 0;
> +
> +    /* Not H.264 ISOM format, may be annexb etc. */
> +    if (extradata_size < 4 || extradata[0] != 1) {
> +        if (!ff_avc_find_startcode(extradata, extradata +
> extradata_size)) {
> +            av_log(s, AV_LOG_ERROR, "Format must be ISOM or annexb\n");
> +            return AVERROR_INVALIDDATA;
> +        }
> +        return 0;
> +    }
> +
> +    /* Parse the SPS/PPS in ISOM format in extradata. */
> +    pb = avio_alloc_context(extradata, extradata_size, 0, NULL, NULL,
> NULL, NULL);
> +    if (!pb)
> +        return AVERROR(ENOMEM);
> +
> +    version = avio_r8(pb); /* version */
> +    avio_r8(pb); /* avc profile */
> +    avio_r8(pb); /* avc profile compat */
> +    avio_r8(pb); /* avc level */
> +    nal_length_size = avio_r8(pb); /* 6 bits reserved (111111) + 2 bits
> nal size length - 1 (11) */
> +    nb_sps = avio_r8(pb); /* 3 bits reserved (111) + 5 bits number of sps
> */
> +
> +    if (version != 1) {
> +        av_log(s, AV_LOG_ERROR, "Invalid version=%d\n", version);
> +        ret = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
> +    rtc->avc_nal_length_size = (nal_length_size & 0x03) + 1;
> +    if (rtc->avc_nal_length_size == 3) {
> +        av_log(s, AV_LOG_ERROR, "Invalid nal length size=%d\n",
> rtc->avc_nal_length_size);
> +        ret = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
> +    /* Read SPS */
> +    nb_sps &= 0x1f;
> +    if (nb_sps != 1 || avio_feof(pb)) {
> +        av_log(s, AV_LOG_ERROR, "Invalid number of sps=%d, eof=%d\n",
> nb_sps, avio_feof(pb));
> +        ret = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
> +    rtc->avc_sps_size = avio_rb16(pb); /* sps size */
> +    if (rtc->avc_sps_size <= 0 || avio_feof(pb)) {
> +        av_log(s, AV_LOG_ERROR, "Invalid sps size=%d, eof=%d\n",
> rtc->avc_sps_size, avio_feof(pb));
> +        ret = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
> +    rtc->avc_sps = av_malloc(rtc->avc_sps_size);
> +    if (!rtc->avc_sps) {
> +        ret = AVERROR(ENOMEM);
> +        goto end;
> +    }
> +
> +    ret = avio_read(pb, rtc->avc_sps, rtc->avc_sps_size); /* sps */
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to read sps, size=%d\n",
> rtc->avc_sps_size);
> +        goto end;
> +    }
> +
> +    /* Read PPS */
> +    nb_pps = avio_r8(pb); /* number of pps */
> +    if (nb_pps != 1 || avio_feof(pb)) {
> +        av_log(s, AV_LOG_ERROR, "Invalid number of pps=%d, eof=%d\n",
> nb_pps, avio_feof(pb));
> +        ret = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
> +    rtc->avc_pps_size = avio_rb16(pb); /* pps size */
> +    if (rtc->avc_pps_size <= 0 || avio_feof(pb)) {
> +        av_log(s, AV_LOG_ERROR, "Invalid pps size=%d, eof=%d\n",
> rtc->avc_pps_size, avio_feof(pb));
> +        ret = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
> +    rtc->avc_pps = av_malloc(rtc->avc_pps_size);
> +    if (!rtc->avc_pps) {
> +        ret = AVERROR(ENOMEM);
> +        goto end;
> +    }
> +
> +    ret = avio_read(pb, rtc->avc_pps, rtc->avc_pps_size); /* pps */
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to read pps, size=%d\n",
> rtc->avc_pps_size);
> +        goto end;
> +    }
> +
> +end:
> +    avio_context_free(&pb);
> +    return ret;
> +}
>

This code is duplicated from other parts of FFmpeg

 +/**

> + * Generate SDP offer according to the codec parameters, DTLS and ICE
> information.
> + * The below is an example of SDP offer:
> + *
> + *       v=0
> + *       o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1
> + *       s=FFmpegPublishSession
> + *       t=0 0
> + *       a=group:BUNDLE 0 1
> + *       a=extmap-allow-mixed
> + *       a=msid-semantic: WMS
> + *
> + *       m=audio 9 UDP/TLS/RTP/SAVPF 111
> + *       c=IN IP4 0.0.0.0
> + *       a=ice-ufrag:a174B
> + *       a=ice-pwd:wY8rJ3gNLxL3eWZs6UPOxy
> + *       a=fingerprint:sha-256
> EE:FE:A2:E5:6A:21:78:60:71:2C:21:DC:1A:2C:98:12:0C:E8:AD:68:07:61:1B:0E:FC:46:97:1E:BC:97:4A:54
> + *       a=setup:actpass
> + *       a=mid:0
> + *       a=sendonly
> + *       a=msid:FFmpeg audio
> + *       a=rtcp-mux
> + *       a=rtpmap:111 opus/48000/2
> + *       a=ssrc:4267647086 cname:FFmpeg
> + *       a=ssrc:4267647086 msid:FFmpeg audio
> + *
> + *       m=video 9 UDP/TLS/RTP/SAVPF 106
> + *       c=IN IP4 0.0.0.0
> + *       a=ice-ufrag:a174B
> + *       a=ice-pwd:wY8rJ3gNLxL3eWZs6UPOxy
> + *       a=fingerprint:sha-256
> EE:FE:A2:E5:6A:21:78:60:71:2C:21:DC:1A:2C:98:12:0C:E8:AD:68:07:61:1B:0E:FC:46:97:1E:BC:97:4A:54
> + *       a=setup:actpass
> + *       a=mid:1
> + *       a=sendonly
> + *       a=msid:FFmpeg video
> + *       a=rtcp-mux
> + *       a=rtcp-rsize
> + *       a=rtpmap:106 H264/90000
> + *       a=fmtp:106
> level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
> + *       a=ssrc:107169110 cname:FFmpeg
> + *       a=ssrc:107169110 msid:FFmpeg video
> + *
>

This is not a legal SDP, each line must end only with one space (yes, this
is regularly wrong).


>
>
> +
> +/**
> + * Opens the UDP transport and completes the ICE handshake, using fast
> retransmit to
> + * handle packet loss for the binding request.
> + *
> + * To initiate a fast retransmission of the STUN binding request during
> ICE, we wait only
> + * for a successful local ICE process i.e., when a binding response is
> received from the
> + * server. Since the server's binding request may not arrive, we do not
> always wait for it.
> + * However, we will always respond to the server's binding request during
> ICE, DTLS or
> + * RTP streaming.
> + *
> + * @param s Pointer to the AVFormatContext
> + * @return Returns 0 if the handshake was successful or AVERROR_xxx in
> case of an error
> + */
> +static int ice_handshake(AVFormatContext *s)
> +{
> +    int ret, size;
> +    char url[256], tmp[16];
> +    char req_buf[MAX_UDP_BUFFER_SIZE], res_buf[MAX_UDP_BUFFER_SIZE];
> +    RTCContext *rtc = s->priv_data;
> +    int fast_retries = rtc->ice_arq_max, timeout = rtc->ice_arq_timeout;
> +
> +    /* Build UDP URL and create the UDP context as transport. */
> +    ff_url_join(url, sizeof(url), "udp", NULL, rtc->ice_host,
> rtc->ice_port, NULL);
> +    ret = ffurl_alloc(&rtc->udp_uc, url, AVIO_FLAG_WRITE,
> &s->interrupt_callback);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to open udp://%s:%d\n",
> rtc->ice_host, rtc->ice_port);
> +        goto end;
> +    }
> +
> +    av_opt_set(rtc->udp_uc->priv_data, "connect", "1", 0);
> +    av_opt_set(rtc->udp_uc->priv_data, "fifo_size", "0", 0);
> +    /* Set the max packet size to the buffer size. */
> +    snprintf(tmp, sizeof(tmp), "%d", rtc->pkt_size);
> +    av_opt_set(rtc->udp_uc->priv_data, "pkt_size", tmp, 0);
> +
> +    ret = ffurl_connect(rtc->udp_uc, NULL);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to connect udp://%s:%d\n",
> rtc->ice_host, rtc->ice_port);
> +        goto end;
> +    }
> +
> +    /* Make the socket non-blocking, set to READ and WRITE mode after
> connected */
> +    ff_socket_nonblock(ffurl_get_file_handle(rtc->udp_uc), 1);
> +    rtc->udp_uc->flags |= AVIO_FLAG_READ | AVIO_FLAG_NONBLOCK;
> +
> +    /* Build the STUN binding request. */
> +    ret = ice_create_request(s, req_buf, sizeof(req_buf), &size);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to create STUN binding request,
> size=%d\n", size);
> +        goto end;
> +    }
> +
> +    /* Fast retransmit the STUN binding request. */
> +    while (1) {
> +        ret = ffurl_write(rtc->udp_uc, req_buf, size);
> +        if (ret < 0) {
> +            av_log(s, AV_LOG_ERROR, "Failed to send STUN binding request,
> size=%d\n", size);
> +            goto end;
> +        }
> +
> +        /* Wait so that the server can process the request and no need
> ARQ then. */
> +#if ICE_PROCESSING_TIMEOUT > 0
> +        av_usleep(ICE_PROCESSING_TIMEOUT * 10000);
> +#endif
> +
> +        /* Read the STUN binding response. */
> +        ret = ffurl_read(rtc->udp_uc, res_buf, sizeof(res_buf));
> +        if (ret < 0) {
> +            /* If max retries is 6 and start timeout is 21ms, the total
> timeout
> +             * is about 21 + 42 + 84 + 168 + 336 + 672 = 1263ms. */
> +            av_usleep(timeout * 1000);
> +            timeout *= 2;
> +
> +            if (ret == AVERROR(EAGAIN) && fast_retries) {
> +                fast_retries--;
> +                continue;
> +            }
> +
> +            av_log(s, AV_LOG_ERROR, "Failed to read STUN binding
> response, retries=%d\n", rtc->ice_arq_max);
> +            goto end;
> +        }
> +
> +        /* If got any binding response, the fast retransmission is done.
> */
> +        if (ice_is_binding_response(res_buf, ret))
> +            break;
> +
> +        /* When a binding request is received, it is necessary to respond
> immediately. */
> +        if (ice_is_binding_request(res_buf, ret) && (ret =
> ice_handle_binding_request(s, res_buf, ret)) < 0)
> +            goto end;
> +    }
> +
> +    /* Wait just for a small while to get the possible binding request
> from server. */
> +    fast_retries = rtc->ice_arq_max / 2;
> +    timeout = rtc->ice_arq_timeout;
> +    while (fast_retries) {
> +        ret = ffurl_read(rtc->udp_uc, res_buf, sizeof(res_buf));
> +        if (ret < 0) {
> +            /* If max retries is 6 and start timeout is 21ms, the total
> timeout
> +             * is about 21 + 42 + 84 = 147ms. */
> +            av_usleep(timeout * 1000);
> +            timeout *= 2;
> +
> +            if (ret == AVERROR(EAGAIN)) {
> +                fast_retries--;
> +                continue;
> +            }
> +
> +            av_log(s, AV_LOG_ERROR, "Failed to read STUN binding request,
> retries=%d\n", rtc->ice_arq_max);
> +            goto end;
> +        }
> +
> +        /* When a binding request is received, it is necessary to respond
> immediately. */
> +        if (ice_is_binding_request(res_buf, ret)) {
> +            if ((ret = ice_handle_binding_request(s, res_buf, ret)) < 0)
> +                goto end;
> +            break;
> +        }
> +    }
> +
> +    av_log(s, AV_LOG_INFO, "WHIP: ICE STUN ok, url=udp://%s:%d,
> location=%s, username=%s:%s, req=%dB, res=%dB, arq=%d\n",
> +        rtc->ice_host, rtc->ice_port, rtc->whip_resource_url ?
> rtc->whip_resource_url : "",
> +        rtc->ice_ufrag_remote, rtc->ice_ufrag_local, size, ret,
> +        rtc->ice_arq_max - fast_retries);
> +    ret = 0;
> +
> +end:
> +    return ret;
> +}
> +
> +/**
> + * Establish the SRTP context using the keying material exported from
> DTLS.
> + *
> + * Create separate SRTP contexts for sending video and audio, as their
> sequences differ
> + * and should not share a single context. Generate a single SRTP context
> for receiving
> + * RTCP only.
> + *
> + * @return 0 if OK, AVERROR_xxx on error
> + */
> +static int setup_srtp(AVFormatContext *s)
> +{
> +    int ret;
> +    char recv_key[DTLS_SRTP_MASTER_KEY_LEN],
> send_key[DTLS_SRTP_MASTER_KEY_LEN];
> +    char buf[AV_BASE64_SIZE(DTLS_SRTP_MASTER_KEY_LEN)];
> +    const char* suite = "AES_CM_128_HMAC_SHA1_80";
> +    RTCContext *rtc = s->priv_data;
> +
> +    /* As DTLS client, the send key is client master key plus salt. */
> +    memcpy(send_key, rtc->dtls_ctx.dtls_srtp_material, 16);
> +    memcpy(send_key + 16, rtc->dtls_ctx.dtls_srtp_material + 32, 14);
> +
> +    /* As DTLS client, the recv key is server master key plus salt. */
> +    memcpy(recv_key, rtc->dtls_ctx.dtls_srtp_material + 16, 16);
> +    memcpy(recv_key + 16, rtc->dtls_ctx.dtls_srtp_material + 46, 14);
> +
> +    /* Setup SRTP context for outgoing packets */
> +    if (!av_base64_encode(buf, sizeof(buf), send_key, sizeof(send_key))) {
> +        av_log(s, AV_LOG_ERROR, "Failed to encode send key\n");
> +        ret = AVERROR(EIO);
> +        goto end;
> +    }
> +
> +    ret = ff_srtp_set_crypto(&rtc->srtp_audio_send, suite, buf);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to set crypto for audio send\n");
> +        goto end;
> +    }
> +
> +    ret = ff_srtp_set_crypto(&rtc->srtp_video_send, suite, buf);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to set crypto for video send\n");
> +        goto end;
> +    }
> +
> +    ret = ff_srtp_set_crypto(&rtc->srtp_rtcp_send, suite, buf);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to set crypto for rtcp send\n");
> +        goto end;
> +    }
> +
> +    /* Setup SRTP context for incoming packets */
> +    if (!av_base64_encode(buf, sizeof(buf), recv_key, sizeof(recv_key))) {
> +        av_log(s, AV_LOG_ERROR, "Failed to encode recv key\n");
> +        ret = AVERROR(EIO);
> +        goto end;
> +    }
> +
> +    ret = ff_srtp_set_crypto(&rtc->srtp_recv, suite, buf);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to set crypto for recv\n");
> +        goto end;
> +    }
> +
> +    av_log(s, AV_LOG_INFO, "WHIP: SRTP setup done, suite=%s, key=%luB\n",
> suite, sizeof(send_key));
> +
> +end:
> +    return ret;
> +}
> +
> +/**
> + * Creates dedicated RTP muxers for each stream in the AVFormatContext to
> build RTP
> + * packets from the encoded frames.
> + *
> + * The corresponding SRTP context is utilized to encrypt each stream's
> RTP packets. For
> + * example, a video SRTP context is used for the video stream.
> Additionally, the
> + * "on_rtp_write_packet" callback function is set as the write function
> for each RTP
> + * muxer to send out encrypted RTP packets.
> + *
> + * @return 0 if OK, AVERROR_xxx on error
> + */
> +static int create_rtp_muxer(AVFormatContext *s)
> +{
> +    int ret, i, is_video, buffer_size, max_packet_size;
> +    AVFormatContext *rtp_ctx = NULL;
> +    AVDictionary *opts = NULL;
> +    uint8_t *buffer = NULL;
> +    char buf[64];
> +    RTCContext *rtc = s->priv_data;
> +
> +    const AVOutputFormat *rtp_format = av_guess_format("rtp", NULL, NULL);
> +    if (!rtp_format) {
> +        av_log(s, AV_LOG_ERROR, "Failed to guess rtp muxer\n");
> +        ret = AVERROR(ENOSYS);
> +        goto end;
> +    }
> +
> +    /* The UDP buffer size, may greater than MTU. */
> +    buffer_size = MAX_UDP_BUFFER_SIZE;
> +    /* The RTP payload max size. Reserved some bytes for SRTP checksum
> and padding. */
> +    max_packet_size = rtc->pkt_size - DTLS_SRTP_CHECKSUM_LEN;
> +
> +    for (i = 0; i < s->nb_streams; i++) {
> +        rtp_ctx = avformat_alloc_context();
> +        if (!rtp_ctx) {
> +            ret = AVERROR(ENOMEM);
> +            goto end;
> +        }
> +
> +        rtp_ctx->oformat = rtp_format;
> +        if (!avformat_new_stream(rtp_ctx, NULL)) {
> +            ret = AVERROR(ENOMEM);
> +            goto end;
> +        }
> +        /* Pass the interrupt callback on */
> +        rtp_ctx->interrupt_callback = s->interrupt_callback;
> +        /* Copy the max delay setting; the rtp muxer reads this. */
> +        rtp_ctx->max_delay = s->max_delay;
> +        /* Copy other stream parameters. */
> +        rtp_ctx->streams[0]->sample_aspect_ratio =
> s->streams[i]->sample_aspect_ratio;
> +        rtp_ctx->flags |= s->flags & AVFMT_FLAG_BITEXACT;
> +        rtp_ctx->strict_std_compliance = s->strict_std_compliance;
> +
> +        /* Set the synchronized start time. */
> +        rtp_ctx->start_time_realtime = s->start_time_realtime;
> +
> +        avcodec_parameters_copy(rtp_ctx->streams[0]->codecpar,
> s->streams[i]->codecpar);
> +        rtp_ctx->streams[0]->time_base = s->streams[i]->time_base;
> +
> +        buffer = av_malloc(buffer_size);
> +        if (!buffer) {
> +            ret = AVERROR(ENOMEM);
> +            goto end;
> +        }
> +
> +        rtp_ctx->pb = avio_alloc_context(buffer, buffer_size, 1, s, NULL,
> on_rtp_write_packet, NULL);
> +        if (!rtp_ctx->pb) {
> +            ret = AVERROR(ENOMEM);
> +            goto end;
> +        }
> +        rtp_ctx->pb->max_packet_size = max_packet_size;
> +        rtp_ctx->pb->av_class = &ff_avio_class;
> +
> +        is_video = s->streams[i]->codecpar->codec_type ==
> AVMEDIA_TYPE_VIDEO;
> +        snprintf(buf, sizeof(buf), "%d", is_video?
> rtc->video_payload_type : rtc->audio_payload_type);
> +        av_dict_set(&opts, "payload_type", buf, 0);
> +        snprintf(buf, sizeof(buf), "%d", is_video? rtc->video_ssrc :
> rtc->audio_ssrc);
> +        av_dict_set(&opts, "ssrc", buf, 0);
> +
> +        ret = avformat_write_header(rtp_ctx, &opts);
> +        if (ret < 0) {
> +            av_log(s, AV_LOG_ERROR, "Failed to write rtp header\n");
> +            goto end;
> +        }
> +
> +        ff_format_set_url(rtp_ctx, av_strdup(s->url));
> +        s->streams[i]->time_base = rtp_ctx->streams[0]->time_base;
> +        s->streams[i]->priv_data = rtp_ctx;
> +        rtp_ctx = NULL;
> +    }
> +
> +    av_log(s, AV_LOG_INFO, "WHIP: Create RTP muxer OK, buffer_size=%d,
> max_packet_size=%d\n",
> +        buffer_size, max_packet_size);
> +
> +end:
> +    if (rtp_ctx)
> +        avio_context_free(&rtp_ctx->pb);
> +    avformat_free_context(rtp_ctx);
> +    av_dict_free(&opts);
> +    return ret;
> +}
> +
> +/**
> + * Callback triggered by the RTP muxer when it creates and sends out an
> RTP packet.
> + *
> + * This function modifies the video STAP packet, removing the markers,
> and updating the
> + * NRI of the first NALU. Additionally, it uses the corresponding SRTP
> context to encrypt
> + * the RTP packet, where the video packet is handled by the video SRTP
> context.
> + */
> +static int on_rtp_write_packet(void *opaque, uint8_t *buf, int buf_size)
> +{
> +    int ret, cipher_size, is_rtcp, is_video;
> +    uint8_t payload_type, nalu_header;
> +    char cipher[MAX_UDP_BUFFER_SIZE];
> +    AVFormatContext *s = opaque;
> +    RTCContext *rtc = s->priv_data;
> +    struct SRTPContext *srtp;
> +
> +    /* Ignore if not RTP or RTCP packet. */
> +    if (buf_size < 12 || (buf[0] & 0xC0) != 0x80)
> +        return 0;
> +
> +    /* Only support audio, video and rtcp. */
> +    is_rtcp = buf[1] >= 192 && buf[1] <= 223;
> +    payload_type = buf[1] & 0x7f;
> +    is_video = payload_type == rtc->video_payload_type;
> +    if (!is_rtcp && payload_type != rtc->video_payload_type &&
> payload_type != rtc->audio_payload_type)
> +        return 0;
> +
> +    /**
> +     * For video, the STAP-A with SPS/PPS should:
> +     * 1. The marker bit should be 0, never be 1.
> +     * 2. The NRI should equal to the first NALU's.
> +     */
> +    if (is_video && buf_size > 12) {
> +        nalu_header = buf[12] & 0x1f;
> +        if (nalu_header == NALU_TYPE_STAP_A) {
> +            /* Reset the marker bit to 0. */
> +            if (buf[1] & 0x80)
> +                buf[1] &= 0x7f;
> +
> +            /* Reset the NRI to the first NALU's NRI. */
> +            if (buf_size > 15 && (buf[15]&0x60) != (buf[12]&0x60))
> +                buf[12] = (buf[12]&0x80) | (buf[15]&0x60) |
> (buf[12]&0x1f);
> +        }
> +    }
> +
> +    /* Get the corresponding SRTP context. */
> +    srtp = is_rtcp ? &rtc->srtp_rtcp_send : (is_video?
> &rtc->srtp_video_send : &rtc->srtp_audio_send);
> +
> +    /* Encrypt by SRTP and send out. */
> +    cipher_size = ff_srtp_encrypt(srtp, buf, buf_size, cipher,
> sizeof(cipher));
> +    if (cipher_size <= 0 || cipher_size < buf_size) {
> +        av_log(s, AV_LOG_WARNING, "Failed to encrypt packet=%dB,
> cipher=%dB\n", buf_size, cipher_size);
> +        return 0;
> +    }
> +
> +    ret = ffurl_write(rtc->udp_uc, cipher, cipher_size);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to write packet=%dB, ret=%d\n",
> cipher_size, ret);
> +        return ret;
> +    }
> +
> +    return ret;
> +}
> +
> +/**
> + * Inserts the SPS/PPS data before each IDR (Instantaneous Decoder
> Refresh) frame.
> + *
> + * The SPS/PPS is parsed from the extradata. If it's in ISOM format, the
> SPS/PPS is
> + * multiplexed to the data field of the packet. If it's in annexb format,
> then the entire
> + * extradata is set to the data field of the packet.
> + */
> +static int insert_sps_pps_packet(AVFormatContext *s, AVPacket *pkt)
> +{
> +    int ret, is_idr, size, i;
> +    uint8_t *p;
> +    AVPacket* extra = NULL;
> +    AVStream *st = s->streams[pkt->stream_index];
> +    AVFormatContext *rtp_ctx = st->priv_data;
> +    RTCContext *rtc = s->priv_data;
> +
> +    is_idr = (pkt->flags & AV_PKT_FLAG_KEY) && st->codecpar->codec_type
> == AVMEDIA_TYPE_VIDEO;
> +    if (!is_idr || !st->codecpar->extradata)
> +        return 0;
> +
> +    extra = av_packet_alloc();
> +    if (!extra)
> +        return AVERROR(ENOMEM);
> +
> +    size = !rtc->avc_nal_length_size ? st->codecpar->extradata_size :
> +            rtc->avc_nal_length_size * 2 + rtc->avc_sps_size +
> rtc->avc_pps_size;
> +    ret = av_new_packet(extra, size);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to allocate extra packet\n");
> +        goto end;
> +    }
> +
> +    /* Encode SPS/PPS in annexb format. */
> +    if (!rtc->avc_nal_length_size) {
> +        memcpy(extra->data, st->codecpar->extradata, size);
> +    } else {
> +        /* Encode SPS/PPS in ISOM format. */
> +        p = extra->data;
> +        for (i = 0; i < rtc->avc_nal_length_size; i++) {
> +            *p++ = rtc->avc_sps_size >> (8 * (rtc->avc_nal_length_size -
> i - 1));
> +        }
> +        memcpy(p, rtc->avc_sps, rtc->avc_sps_size);
> +        p += rtc->avc_sps_size;
> +
> +        /* Encode PPS in ISOM format. */
> +        for (i = 0; i < rtc->avc_nal_length_size; i++) {
> +            *p++ = rtc->avc_pps_size >> (8 * (rtc->avc_nal_length_size -
> i - 1));
> +        }
> +        memcpy(p, rtc->avc_pps, rtc->avc_pps_size);
> +        p += rtc->avc_pps_size;
> +    }
> +
> +    /* Setup packet and feed it to chain. */
> +    extra->pts = pkt->pts;
> +    extra->dts = pkt->dts;
> +    extra->stream_index = pkt->stream_index;
> +    extra->time_base = pkt->time_base;
> +
> +    ret = ff_write_chained(rtp_ctx, 0, extra, s, 0);
> +    if (ret < 0)
> +        goto end;
> +
> +end:
> +    av_packet_free(&extra);
> +    return ret;
> +}
> +
> +/**
> + * RTC is connectionless, for it's based on UDP, so it check whether
> sesison is
> + * timeout. In such case, publishers can't republish the stream util the
> session
> + * is timeout.
> + * This function is called to notify the server that the stream is ended,
> server
> + * should expire and close the session immediately, so that publishers
> can republish
> + * the stream quickly.
> + */
> +static int whip_dispose(AVFormatContext *s)
> +{
> +    int ret;
> +    char buf[MAX_URL_SIZE];
> +    URLContext *whip_uc = NULL;
> +    RTCContext *rtc = s->priv_data;
> +
> +    if (!rtc->whip_resource_url)
> +        return 0;
> +
> +    ret = ffurl_alloc(&whip_uc, rtc->whip_resource_url,
> AVIO_FLAG_READ_WRITE, &s->interrupt_callback);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to alloc WHIP delete context:
> %s\n", s->url);
> +        goto end;
> +    }
> +
> +    av_opt_set(whip_uc->priv_data, "chunked_post", "0", 0);
> +    av_opt_set(whip_uc->priv_data, "method", "DELETE", 0);
> +    ret = ffurl_connect(whip_uc, NULL);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to DELETE url=%s\n",
> rtc->whip_resource_url);
> +        goto end;
> +    }
> +
> +    while (1) {
> +        ret = ffurl_read(whip_uc, buf, sizeof(buf));
> +        if (ret == AVERROR_EOF) {
> +            ret = 0;
> +            break;
> +        }
> +        if (ret < 0) {
> +            av_log(s, AV_LOG_ERROR, "Failed to read response from DELETE
> url=%s\n", rtc->whip_resource_url);
> +            goto end;
> +        }
> +    }
> +
> +    av_log(s, AV_LOG_INFO, "WHIP: Dispose resource %s ok\n",
> rtc->whip_resource_url);
> +
> +end:
> +    ffurl_closep(&whip_uc);
> +    return ret;
> +}
> +
> +static av_cold int rtc_init(AVFormatContext *s)
> +{
> +    int ret;
> +    RTCContext *rtc = s->priv_data;
> +
> +    if ((ret = whip_init(s)) < 0)
> +        return ret;
> +
> +    if ((ret = parse_codec(s)) < 0)
> +        return ret;
> +
> +    if ((ret = generate_sdp_offer(s)) < 0)
> +        return ret;
> +
> +    if ((ret = exchange_sdp(s)) < 0)
> +        return ret;
> +
> +    if ((ret = parse_answer(s)) < 0)
> +        return ret;
> +
> +    if ((ret = ice_handshake(s)) < 0)
> +        return ret;
> +
> +    /* Now UDP URL context is ready, setup the DTLS transport. */
> +    rtc->dtls_ctx.udp_uc = rtc->udp_uc;
> +
> +    if ((ret = dtls_context_handshake(&rtc->dtls_ctx)) < 0)
> +        return ret;
> +
> +    if ((ret = setup_srtp(s)) < 0)
> +        return ret;
> +
> +    if ((ret = create_rtp_muxer(s)) < 0)
> +        return ret;
> +
> +    return ret;
> +}
> +
> +static int rtc_write_packet(AVFormatContext *s, AVPacket *pkt)
> +{
> +    int ret;
> +    RTCContext *rtc = s->priv_data;
> +    AVStream *st = s->streams[pkt->stream_index];
> +    AVFormatContext *rtp_ctx = st->priv_data;
> +
> +    /* TODO: Send binding request every 1s as WebRTC heartbeat. */
> +    /* TODO: Receive packets from the server such as ICE binding
> requests, DTLS messages,
> +     * and RTCP like PLI requests, then respond to them.*/
> +
> +    /* For audio OPUS stream, correct the timestamp. */
> +    if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
> +        pkt->dts = pkt->pts = rtc->audio_jitter_base;
> +        rtc->audio_jitter_base += 960;
> +    }
>

Correct how?

+    ret = insert_sps_pps_packet(s, pkt);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to insert SPS/PPS packet\n");
> +        return ret;
> +    }
> +
> +    ret = ff_write_chained(rtp_ctx, 0, pkt, s, 0);
> +    if (ret < 0) {
> +        if (ret == AVERROR(EINVAL)) {
> +            av_log(s, AV_LOG_WARNING, "Ignore failed to write packet=%dB,
> ret=%d\n", pkt->size, ret);
> +            ret = 0;
> +        } else
> +            av_log(s, AV_LOG_ERROR, "Failed to write packet, size=%d\n",
> pkt->size);
> +        return ret;
> +    }
> +
> +    return ret;
> +}
> +
> +static av_cold void rtc_deinit(AVFormatContext *s)
> +{
> +    int i, ret;
> +    RTCContext *rtc = s->priv_data;
> +
> +    ret = whip_dispose(s);
> +    if (ret < 0)
> +        av_log(s, AV_LOG_WARNING, "Failed to dispose resource, ret=%d\n",
> ret);
> +
> +    for (i = 0; i < s->nb_streams; i++) {
> +        AVFormatContext* rtp_ctx = s->streams[i]->priv_data;
> +        if (!rtp_ctx)
> +            continue;
> +
> +        av_write_trailer(rtp_ctx);
> +        avio_context_free(&rtp_ctx->pb);
> +        avformat_free_context(rtp_ctx);
> +        s->streams[i]->priv_data = NULL;
> +    }
> +
> +    av_freep(&rtc->avc_sps);
> +    av_freep(&rtc->avc_pps);
> +    av_freep(&rtc->sdp_offer);
> +    av_freep(&rtc->sdp_answer);
> +    av_freep(&rtc->whip_resource_url);
> +    av_freep(&rtc->ice_ufrag_remote);
> +    av_freep(&rtc->ice_pwd_remote);
> +    av_freep(&rtc->ice_protocol);
> +    av_freep(&rtc->ice_host);
> +    ffurl_closep(&rtc->udp_uc);
> +    ff_srtp_free(&rtc->srtp_audio_send);
> +    ff_srtp_free(&rtc->srtp_video_send);
> +    ff_srtp_free(&rtc->srtp_rtcp_send);
> +    ff_srtp_free(&rtc->srtp_recv);
> +    dtls_context_deinit(&rtc->dtls_ctx);
> +}
> +
> +#define OFFSET(x) offsetof(RTCContext, x)
> +#define DEC AV_OPT_FLAG_DECODING_PARAM
> +static const AVOption options[] = {
> +    { "ice_arq_max",        "Maximum number of retransmissions for the
> ICE ARQ mechanism",      OFFSET(ice_arq_max),        AV_OPT_TYPE_INT,    {
> .i64 = 5 },       -1, INT_MAX, DEC },
> +    { "ice_arq_timeout",    "Start timeout in milliseconds for the ICE
> ARQ mechanism",          OFFSET(ice_arq_timeout),    AV_OPT_TYPE_INT,    {
> .i64 = 30 },      -1, INT_MAX, DEC },
> +    { "dtls_arq_max",       "Maximum number of retransmissions for the
> DTLS ARQ mechanism",     OFFSET(dtls_arq_max),       AV_OPT_TYPE_INT,    {
> .i64 = 5 },       -1, INT_MAX, DEC },
> +    { "dtls_arq_timeout",   "Start timeout in milliseconds for the DTLS
> ARQ mechanism",         OFFSET(dtls_arq_timeout),   AV_OPT_TYPE_INT,    {
> .i64 = 50 },     -1, INT_MAX, DEC },
> +    { "pkt_size",           "The maximum size, in bytes, of RTP packets
> that send out",         OFFSET(pkt_size),           AV_OPT_TYPE_INT,    {
> .i64 = 1500 },    -1, INT_MAX, DEC },
> +    { NULL },
> +};
> +
> +static const AVClass rtc_muxer_class = {
> +    .class_name = "WebRTC muxer",
> +    .item_name  = av_default_item_name,
> +    .option     = options,
> +    .version    = LIBAVUTIL_VERSION_INT,
> +};
> +
> +const FFOutputFormat ff_rtc_muxer = {
> +    .p.name             = "rtc",
> +    .p.long_name        = NULL_IF_CONFIG_SMALL("WHIP WebRTC muxer"),
> +    .p.audio_codec      = AV_CODEC_ID_OPUS,
> +    .p.video_codec      = AV_CODEC_ID_H264,
> +    .p.flags            = AVFMT_GLOBALHEADER | AVFMT_NOFILE,
> +    .p.priv_class       = &rtc_muxer_class,
> +    .priv_data_size     = sizeof(RTCContext),
> +    .init               = rtc_init,
> +    .write_packet       = rtc_write_packet,
> +    .deinit             = rtc_deinit,
> +};
> --
> 2.40.0
>

This patchset is massive and unmaintainable. Not to mention only implements
a small portion of webrtc, that will inevitably get patched to add more
features and become a giant sprawl.

Kieran
Derek Buitenhuis May 31, 2023, 8:46 p.m. UTC | #3
Hello, quick skim below, followe by some waffling at the end.

I have not bothere to mention any code style nits, to keep it technical.

On 5/29/2023 12:50 PM, Steven Liu wrote:
> @@ -3483,6 +3483,7 @@ ogg_demuxer_select="dirac_parse"
>  ogv_muxer_select="ogg_muxer"
>  opus_muxer_select="ogg_muxer"
>  psp_muxer_select="mov_muxer"
> +rtc_muxer_deps_any="openssl"

Given we support a bunch of SSL libraries, how exactly does this work with that? Can
it be abstracted out? OpenSSL poses licensing issues for many.

Further, I think this is missing a bunch of dependencies like HTTP and AVC?
> +#ifndef CONFIG_OPENSSL
> +#error "DTLS is not supported, please enable openssl"
> +#else
> +#include <openssl/ssl.h>
> +#include <openssl/err.h>
> +#if OPENSSL_VERSION_NUMBER < 0x1010102fL
> +#error "OpenSSL version 1.1.1b or newer is required"
> +#endif
> +#endif

This should be in configure.

> +/*
> + * Supported DTLS cipher suites for FFmpeg as a DTLS client.
> + * These cipher suites are used to negotiate with DTLS servers.
> + *
> + * It is advisable to use a limited number of cipher suites to reduce
> + * the size of DTLS UDP packets.
> + */
> +#define DTLS_CIPHER_SUTES "ECDHE-ECDSA-AES128-GCM-SHA256"\
> +    ":ECDHE-RSA-AES128-GCM-SHA256"\
> +    ":ECDHE-ECDSA-AES128-SHA"\
> +    ":ECDHE-RSA-AES128-SHA"\
> +    ":ECDHE-ECDSA-AES256-SHA"\
> +    ":ECDHE-RSA-AES256-SHA"

The reason for this specific subset should be documented.

> +/**
> + * STAP-A stands for Single-Time Aggregation Packet.
> + * The NALU type for STAP-A is 24 (0x18).
> + */
> +#define NALU_TYPE_STAP_A 24

Why is a NALU type being defined in a protocol implementation file?

> +/**
> + * Wait for a small timeout in milliseconds to allow for the server to process
> + * the Interactive Connectivity Establishment (ICE) request. If we immediately
> + * read the response after sending the request, we may receive nothing and need
> + * to immediately retry. To lessen the likelihood of retries, we can send the
> + * request and wait for a small amount of time for the server to process it
> + * before reading the response.
> + */
> +#define ICE_PROCESSING_TIMEOUT 10

Are we really hardcoding sleeps? Surely a better way exists. What do other implementations
do here?

> +/**
> + * Wait for a short timeout in milliseconds to allow the server to process
> + * the Datagram Transport Layer Security (DTLS) request. If we immediately
> + * read the response after sending the request, we may receive nothing and
> + * need to immediately retry. To reduce the likelihood of retries, we can
> + * send the request and wait a short amount of time for the server to
> + * process it before attempting to read the response.
> + */
> +#define DTLS_PROCESSING_TIMEOUT 30

Same comment as above.

> +/**
> + * The maximum number of retries for Datagram Transport Layer Security (DTLS) EAGAIN errors.
> + * When we send a DTLS request and receive no response, we may encounter an EAGAIN error.
> + * In this situation, we wait briefly and attempt to read the response again.
> + * We limit the maximum number of times we retry this loop.
> + */
> +#define DTLS_EAGAIN_RETRIES_MAX 5

Should this be an option?

> +/**
> + * The DTLS timer's base timeout in microseconds. Its purpose is to minimize the unnecessary
> + * retransmission of ClientHello.
> + */
> +#define DTLS_SSL_TIMER_BASE 400 * 1000

Is this from a spec or arbitrary?

> +typedef struct DTLSContext {
> +    /* For av_log to write log to this category. */
> +    void *log_avcl;

Why is this void *?

> +    /**
> +     * This represents the material used to build the SRTP master key. It is
> +     * generated by DTLS and has the following layout:
> +     *          16B         16B         14B             14B
> +     *      client_key | server_key | client_salt | server_salt
> +     */
> +    uint8_t dtls_srtp_material[DTLS_SRTP_MASTER_KEY_LEN * 2];

*2?

> +/**
> + * Generate a self-signed certificate and private key for DTLS.
> + */

Does WHIP allow use of arbitrary certs (i.e. non-self-signed)?

> +static av_cold int dtls_context_init(DTLSContext *ctx)
> +{
> +    int ret = 0, serial, expire_day, i, n = 0;
> +    AVBPrint fingerprint;
> +    unsigned char md[EVP_MAX_MD_SIZE];
> +    const char *aor = "ffmpeg.org", *curve = NULL;

I am not sure the the FFmpeg community wants to advertise itself as a CN.

Mayb lavf?

> +    curve = "prime256v1";

For this and all other occurrences: They should be intialized at declaration
time, like the rest of the codebase.

> +    ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
> +    if (!dtls_pkey) {
> +        av_log(s1, AV_LOG_ERROR, "DTLS: EVP_EC_gen curve=%s failed\n", curve);

For this and all ther logs: Something more descriptive than "function_name failed:"
should be used.

> +    serial = (int)av_get_random_seed();

For all uses of av_get_random_seed: Why are we using a seed and not a PRNG?

> +    if (!av_bprint_is_complete(&fingerprint)) {
> +        av_log(s1, AV_LOG_ERROR, "Fingerprint %d exceed max %d, %s\n", ret, MAX_SDP_SIZE, fingerprint.str);
> +        ret = AVERROR(EIO);
> +        goto end;
> +    }

Is this actually possible to trigger?

> +    if (!fingerprint.str || !strlen(fingerprint.str)) {
> +        av_log(s1, AV_LOG_ERROR, "Fingerprint is empty\n");
> +        ret = AVERROR(EINVAL);
> +        goto end;
> +    }

Same as above.

> +    /* never exceed the max timeout. */
> +    timeout_us = FFMIN(timeout_us, 30 * 1000 * 1000); /* in us */

Is this max based on anything?

> +    /* Change_cipher_spec(20), alert(21), handshake(22), application_data(23) */
> +    if (length >= 1)
> +        content_type = (uint8_t)data[0];
> +    if (length >= 13)
> +        size = (uint16_t)(data[11])<<8 | (uint16_t)data[12];

nit: AV_WB16?

> +    if (length >= 14)
> +        handshake_type = (uint8_t)data[13];

Casting uint8_t to uint8_t? Above, also.

> +    av_log(s1, AV_LOG_VERBOSE, "WHIP: DTLS state %s %s, done=%u, arq=%u, r0=%d, r1=%d, len=%u, cnt=%u, size=%u, hs=%u\n",
> +        "Active", (incoming? "RECV":"SEND"), ctx->dtls_done_for_us, ctx->dtls_arq_packets, r0, r1, length,
> +        content_type, size, handshake_type);

Does the use of this trace callback affec perf in any way? I have to assume not,
but thought I would check.

> +    /* For ECDSA, we could set the curves list. */
> +    if (SSL_CTX_set1_curves_list(dtls_ctx, "P-521:P-384:P-256") != 1) {

Where did this list come from?


> +    /* Only support SRTP_AES128_CM_SHA1_80, please read ssl/d1_srtp.c *> +    if (SSL_CTX_set_tlsext_use_srtp(dtls_ctx, "SRTP_AES128_CM_SHA1_80")) {

What is ssl/d1_srtp.c? It is not from FFmpeg.

> +        /* Wait so that the server can process the request and no need ARQ then. */
> +#if DTLS_PROCESSING_TIMEOUT > 0
> +        av_usleep(DTLS_PROCESSING_TIMEOUT * 10000);
> +#endif

This #if is a no-op.

> +
> +        for (j = 0; j <= DTLS_EAGAIN_RETRIES_MAX && !res_size; j++) {

Does this not make DTLS_EAGAIN_RETRIES_MAX a misnomer (due to use of <=)?

> +            /* Ignore other packets, such as ICE indication, except DTLS. */
> +            if (ret > 0 && (ret < 13 || buf[0] <= 19 || buf[0] >= 64))
> +                continue;

Magic numbers?

> +    if (!ctx->udp_uc) {
> +        av_log(s1, AV_LOG_ERROR, "DTLS: No UDP context\n");

Kind of odd this would be possible?

> +    av_log(s1, AV_LOG_INFO, "WHIP: DTLS handshake done=%d, arq=%d, srtp_material=%luB, cost=%dms\n",
> +        ctx->dtls_done_for_us, ctx->dtls_arq_packets, sizeof(ctx->dtls_srtp_material),
> +        (int)(av_gettime() - starttime) / 1000);

Bit verbose (here and other uses of INFO)?

Why are we logging the cost/time diff?

Also we do not use %lu, I think.
> +    /* The SPS/PPS of AVC video */
> +    uint8_t *avc_sps;
> +    int avc_sps_size;
> +    uint8_t *avc_pps;
> +    int avc_pps_size;
> +    /* The size of NALU in ISOM format. */
> +    int avc_nal_length_size;
> +

Will get into this in my waffling at the end...

> +    /* The SRTP send context, to encrypt outgoing packets. */
> +    struct SRTPContext srtp_audio_send;
> +    struct SRTPContext srtp_video_send;
> +    struct SRTPContext srtp_rtcp_send;
> +    /* The SRTP receive context, to decrypt incoming packets. */
> +    struct SRTPContext srtp_recv;

Not typedefed?

> +static int on_rtp_write_packet(void *opaque, uint8_t *buf, int buf_size);

Why not reorder the file instead of a same-file proto?

> +    rtc->dtls_ctx.dtls_arq_max = rtc->dtls_arq_max;
> +    rtc->dtls_ctx.dtls_arq_timeout = rtc->dtls_arq_timeout;
> +    rtc->dtls_ctx.pkt_size = rtc->pkt_size;

Why are these duplicated instead of the struct pointer embedded?



> +static int isom_read_avcc(AVFormatContext *s, uint8_t *extradata, int  extradata_size)

AS Kieran noted, duplicated code.
> +/**
> + * Generate SDP offer according to the codec parameters, DTLS and ICE information.
> + * The below is an example of SDP offer:

This comment and function need some serious work:
  * The example and comment explain exactly nothing about the SDP text being generated. It
    is an opaque blob.
  * It is illegal SDP.
  * We already have SDP facilties you should use.


> +    snprintf(rtc->ice_ufrag_local, sizeof(rtc->ice_ufrag_local), "%08x",
> +             av_get_random_seed());
> +    snprintf(rtc->ice_pwd_local, sizeof(rtc->ice_pwd_local), "%08x%08x%08x%08x",
> +             av_get_random_seed(), av_get_random_seed(), av_get_random_seed(),
> +             av_get_random_seed());

Use the PRNG... calling av_get_random_seed 5 times is a bit whack.

Also, is it limited to hex characters?

> +    rtc->audio_ssrc = av_get_random_seed();
> +    rtc->video_ssrc = av_get_random_seed();

Same as above.

> +    rtc->audio_payload_type = 111;
> +    rtc->video_payload_type = 106;

Magic numbers. I assume this is H.264 and Opus? Do not harcode.

> +
> +    av_bprintf(&bp, ""
> +        "v=0\r\n"
> +        "o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1\r\n"

Hardcoding 127.0.0.1?

4489045141692799359?

> +    if (rtc->video_par) {
> +        profile = rtc->video_par->profile < 0 ? 0x42 : rtc->video_par->profile;
> +        level = rtc->video_par->level < 0 ? 30 : rtc->video_par->level;
> +        profile_iop = profile & FF_PROFILE_H264_CONSTRAINED;

Do not hardcode H264.

> +        av_bprintf(&bp, ""
> +            "m=video 9 UDP/TLS/RTP/SAVPF %u\r\n"
> +            "c=IN IP4 0.0.0.0\r\n"
> +            "a=ice-ufrag:%s\r\n"
> +            "a=ice-pwd:%s\r\n"
> +            "a=fingerprint:sha-256 %s\r\n"
> +            "a=setup:active\r\n"
> +            "a=mid:1\r\n"
> +            "a=sendonly\r\n"
> +            "a=msid:FFmpeg video\r\n"
> +            "a=rtcp-mux\r\n"
> +            "a=rtcp-rsize\r\n"
> +            "a=rtpmap:%u H264/90000\r\n"

Same.

> +            "a=fmtp:%u level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=%02x%02x%02x\r\n"

Same.

> +            "a=ssrc:%u cname:FFmpeg\r\n"
> +            "a=ssrc:%u msid:FFmpeg video\r\n",
> +            rtc->video_payload_type,
> +            rtc->ice_ufrag_local,
> +            rtc->ice_pwd_local,
> +            rtc->dtls_ctx.dtls_fingerprint,
> +            rtc->video_payload_type,
> +            rtc->video_payload_type,
> +            profile & (~FF_PROFILE_H264_CONSTRAINED),

Same.

> + * Exchange SDP offer with WebRTC peer to get the answer.
> + * The below is an example of SDP answer:

Same comment as my before: This is an opaque blob. 

> +/**
> + * Parses the ICE ufrag, pwd, and candidates from the SDP answer.

We have SDP facilities already.

> +    /* Write 20 bytes header */
> +    avio_wb16(pb, 0x0001); /* STUN binding request */
> +    avio_wb16(pb, 0);      /* length */
> +    avio_wb32(pb, STUN_MAGIC_COOKIE); /* magic cookie */
> +    avio_wb32(pb, av_get_random_seed()); /* transaction ID */
> +    avio_wb32(pb, av_get_random_seed()); /* transaction ID */
> +    avio_wb32(pb, av_get_random_seed()); /* transaction ID */

Use the PRNG.
> +    av_opt_set(rtc->udp_uc->priv_data, "connect", "1", 0);
> +    av_opt_set(rtc->udp_uc->priv_data, "fifo_size", "0", 0);
> +    /* Set the max packet size to the buffer size. */
> +    snprintf(tmp, sizeof(tmp), "%d", rtc->pkt_size);
> +    av_opt_set(rtc->udp_uc->priv_data, "pkt_size", tmp, 0);

This is as good of a place as any to ask: Do we plan to handle passing
AVOptions down to subcontexts?

> +static int setup_srtp(AVFormatContext *s)

> +    const char* suite = "AES_CM_128_HMAC_SHA1_80";

Source?

Same as before, re: subcontext AVOptions?


> +    /* Ignore if not RTP or RTCP packet. */
> +    if (buf_size < 12 || (buf[0] & 0xC0) != 0x80)
> +        return 0;
> +
> +    /* Only support audio, video and rtcp. */
> +    is_rtcp = buf[1] >= 192 && buf[1] <= 223;

Magic numbers.

> +    /**
> +     * For video, the STAP-A with SPS/PPS should:
> +     * 1. The marker bit should be 0, never be 1.
> +     * 2. The NRI should equal to the first NALU's.
> +     */
> +    if (is_video && buf_size > 12) {
> +        nalu_header = buf[12] & 0x1f;
> +        if (nalu_header == NALU_TYPE_STAP_A) {
> +            /* Reset the marker bit to 0. */
> +            if (buf[1] & 0x80)
> +                buf[1] &= 0x7f;
> +
> +            /* Reset the NRI to the first NALU's NRI. */
> +            if (buf_size > 15 && (buf[15]&0x60) != (buf[12]&0x60))
> +                buf[12] = (buf[12]&0x80) | (buf[15]&0x60) | (buf[12]&0x1f);
> +        }
> +    }

This seems hardcoded to H.264.


> +/**
> + * Inserts the SPS/PPS data before each IDR (Instantaneous Decoder Refresh) frame.
> + *
> + * The SPS/PPS is parsed from the extradata. If it's in ISOM format, the SPS/PPS is
> + * multiplexed to the data field of the packet. If it's in annexb format, then the entire
> + * extradata is set to the data field of the packet.

Why do we do it different ways instea of always munging it into the same way?

Also, do we have a BSF for this?

> +    /* For audio OPUS stream, correct the timestamp. */
> +    if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
> +        pkt->dts = pkt->pts = rtc->audio_jitter_base;
> +        rtc->audio_jitter_base += 960;
> +    }

1. Do not hardcode opus.
2. 960 is not always true for opus, it just happens to be the libopus default.

> +    ret = insert_sps_pps_packet(s, pkt);
> +    if (ret < 0) {
> +        av_log(s, AV_LOG_ERROR, "Failed to insert SPS/PPS packet\n");
> +        return ret;
> +    }

Do not hardcode H264.

> +static const AVClass rtc_muxer_class = {
> +    .class_name = "WebRTC muxer",

WHIP or WebRTC?

[...]

OK. So now on to my waffling.

This patch, to me, seems almost like a proof of concept rather than a fully fleshed out
(read: pushable) patch. I have few concerns:

1. If it is pushed in such a state, lacking so many things, my hunch, based on experience,
   is that it will stay that way: incomplete, just being maintained, rather than finished.
   Users will rightfully ask why it is like this, and will ask for it to be completed, but
   I expect nobody will. But there will be a blog post about how "FFmpeg has WebRTC now",
   like we saw with IPFS.
2. It is monolithic - WHIP, DTLS, ICE, STUN, SRTP, RTP, and SPD should really all be separate
   layers. Typing that out really does make one realize WebRTC suffers from deep IETF
   committee-ness...
3. The codec parts should probably be separated out like they are for RTP.
4. If our SDP facilities are not good enough for WHIP, they should be improved so that they
   are.

- Derek
diff mbox series

Patch

diff --git a/configure b/configure
index 495493aa0e..526ddc0bd6 100755
--- a/configure
+++ b/configure
@@ -3483,6 +3483,7 @@  ogg_demuxer_select="dirac_parse"
 ogv_muxer_select="ogg_muxer"
 opus_muxer_select="ogg_muxer"
 psp_muxer_select="mov_muxer"
+rtc_muxer_deps_any="openssl"
 rtp_demuxer_select="sdp_demuxer"
 rtp_mpegts_muxer_select="mpegts_muxer rtp_muxer"
 rtpdec_select="asf_demuxer mov_demuxer mpegts_demuxer rm_demuxer rtp_protocol srtp"
diff --git a/doc/muxers.texi b/doc/muxers.texi
index 31fca17dd6..00052f51b4 100644
--- a/doc/muxers.texi
+++ b/doc/muxers.texi
@@ -1333,6 +1333,56 @@  Set custom HTTP headers, can override built in default headers. Applicable only
 
 @end table
 
+@anchor{rtc}
+@section rtc
+
+WebRTC (Real-Time Communication) muxer that supports sub-second latency streaming according to
+the WHIP (WebRTC-HTTP ingestion protocol) specification.
+
+It uses HTTP as a signaling protocol to exchange SDP capabilities and ICE lite candidates. Then,
+it uses STUN binding requests and responses to establish a session over UDP. Subsequently, it
+initiates a DTLS handshake to exchange the SRTP encryption keys. Lastly, it splits video and
+audio frames into RTP packets and encrypts them using SRTP.
+
+Ensure that you use H.264 without B frames and Opus for the audio codec. For example, to convert
+an input file with @command{ffmpeg} to WebRTC:
+@example
+ffmpeg -re -i input.mp4 -acodec libopus -ar 48000 -ac 2 \
+  -vcodec libx264 -profile:v baseline -tune zerolatency -threads 1 -bf 0 \
+  -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream"
+@end example
+
+For this example, we have employed low latency options, resulting in an end-to-end latency of
+approximately 150ms.
+
+@subsection Options
+
+This muxer supports the following options:
+
+@table @option
+
+@item ice_arq_max @var{size}
+Set the maximum number of retransmissions for the ICE ARQ mechanism.
+Default value is 5.
+
+@item ice_arq_timeout @var{size}
+Set the start timeout in milliseconds for the ICE ARQ mechanism.
+Default value is 30.
+
+@item dtls_arq_max @var{size}
+Set the maximum number of retransmissions for the DTLS ARQ mechanism.
+Default value is 5.
+
+@item dtls_arq_timeout @var{size}
+Set the start timeout in milliseconds for the DTLS ARQ mechanism.
+Default value is 50.
+
+@item pkt_size @var{size}
+Set the maximum size, in bytes, of RTP packets that send out.
+Default value is 1500.
+
+@end table
+
 @anchor{ico}
 @section ico
 
diff --git a/libavformat/Makefile b/libavformat/Makefile
index f8ad7c6a11..2a76e613b0 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -493,6 +493,7 @@  OBJS-$(CONFIG_RSD_DEMUXER)               += rsd.o
 OBJS-$(CONFIG_RPL_DEMUXER)               += rpl.o
 OBJS-$(CONFIG_RSO_DEMUXER)               += rsodec.o rso.o pcm.o
 OBJS-$(CONFIG_RSO_MUXER)                 += rsoenc.o rso.o rawenc.o
+OBJS-$(CONFIG_RTC_MUXER)                 += rtcenc.o avc.o http.o srtp.o
 OBJS-$(CONFIG_RTP_MPEGTS_MUXER)          += rtpenc_mpegts.o
 OBJS-$(CONFIG_RTP_MUXER)                 += rtp.o         \
                                             rtpenc_aac.o     \
diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index efdb34e29d..56e852e5ea 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -392,6 +392,7 @@  extern const AVInputFormat  ff_rpl_demuxer;
 extern const AVInputFormat  ff_rsd_demuxer;
 extern const AVInputFormat  ff_rso_demuxer;
 extern const FFOutputFormat ff_rso_muxer;
+extern const FFOutputFormat ff_rtc_muxer;
 extern const AVInputFormat  ff_rtp_demuxer;
 extern const FFOutputFormat ff_rtp_muxer;
 extern const FFOutputFormat ff_rtp_mpegts_muxer;
diff --git a/libavformat/http.c b/libavformat/http.c
index 0817aafb5b..6f509656b5 100644
--- a/libavformat/http.c
+++ b/libavformat/http.c
@@ -530,6 +530,12 @@  int ff_http_averror(int status_code, int default_averror)
         return default_averror;
 }
 
+const char* ff_http_get_new_location(URLContext *h)
+{
+    HTTPContext *s = h->priv_data;
+    return s->new_location;
+}
+
 static int http_write_reply(URLContext* h, int status_code)
 {
     int ret, body = 0, reply_code, message_len;
diff --git a/libavformat/http.h b/libavformat/http.h
index 5f650ef143..d1b691826b 100644
--- a/libavformat/http.h
+++ b/libavformat/http.h
@@ -62,4 +62,6 @@  int ff_http_do_new_request2(URLContext *h, const char *uri, AVDictionary **optio
 
 int ff_http_averror(int status_code, int default_averror);
 
+const char* ff_http_get_new_location(URLContext *h);
+
 #endif /* AVFORMAT_HTTP_H */
diff --git a/libavformat/rtcenc.c b/libavformat/rtcenc.c
new file mode 100644
index 0000000000..65173d0073
--- /dev/null
+++ b/libavformat/rtcenc.c
@@ -0,0 +1,2208 @@ 
+/*
+ * WebRTC-HTTP ingestion protocol (WHIP) muxer
+ * Copyright (c) 2023 The FFmpeg Project
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#ifndef CONFIG_OPENSSL
+#error "DTLS is not supported, please enable openssl"
+#else
+#include <openssl/ssl.h>
+#include <openssl/err.h>
+#if OPENSSL_VERSION_NUMBER < 0x1010102fL
+#error "OpenSSL version 1.1.1b or newer is required"
+#endif
+#endif
+
+#include "libavcodec/avcodec.h"
+#include "libavutil/base64.h"
+#include "libavutil/bprint.h"
+#include "libavutil/crc.h"
+#include "libavutil/hmac.h"
+#include "libavutil/opt.h"
+#include "libavutil/random_seed.h"
+#include "libavutil/time.h"
+#include "avc.h"
+#include "avio_internal.h"
+#include "http.h"
+#include "internal.h"
+#include "mux.h"
+#include "network.h"
+#include "srtp.h"
+
+/**
+ * Maximum size limit of a Session Description Protocol (SDP),
+ * be it an offer or answer.
+ */
+#define MAX_SDP_SIZE 8192
+/**
+ * Maximum size of the buffer for sending and receiving UDP packets.
+ * Please note that this size does not limit the size of the UDP packet that can be sent.
+ * To set the limit for packet size, modify the `pkt_size` parameter.
+ * For instance, it is possible to set the UDP buffer to 4096 to send or receive packets,
+ * but please keep in mind that the `pkt_size` option limits the packet size to 1400.
+ */
+#define MAX_UDP_BUFFER_SIZE 4096
+/*
+ * Supported DTLS cipher suites for FFmpeg as a DTLS client.
+ * These cipher suites are used to negotiate with DTLS servers.
+ *
+ * It is advisable to use a limited number of cipher suites to reduce
+ * the size of DTLS UDP packets.
+ */
+#define DTLS_CIPHER_SUTES "ECDHE-ECDSA-AES128-GCM-SHA256"\
+    ":ECDHE-RSA-AES128-GCM-SHA256"\
+    ":ECDHE-ECDSA-AES128-SHA"\
+    ":ECDHE-RSA-AES128-SHA"\
+    ":ECDHE-ECDSA-AES256-SHA"\
+    ":ECDHE-RSA-AES256-SHA"
+/**
+ * The size of the Secure Real-time Transport Protocol (SRTP) master key material
+ * that is exported by Secure Sockets Layer (SSL) after a successful Datagram
+ * Transport Layer Security (DTLS) handshake. This material consists of a key
+ * of 16 bytes and a salt of 14 bytes.
+ *
+ * The material is exported by SSL in the following format: client_key (16 bytes) |
+ * server_key (16 bytes) | client_salt (14 bytes) | server_salt (14 bytes).
+ */
+#define DTLS_SRTP_MASTER_KEY_LEN 30
+/**
+ * The maximum size of the Secure Real-time Transport Protocol (SRTP) HMAC checksum
+ * and padding that is appended to the end of the packet. To calculate the maximum
+ * size of the User Datagram Protocol (UDP) packet that can be sent out, subtract
+ * this size from the `pkt_size`.
+ */
+#define DTLS_SRTP_CHECKSUM_LEN 16
+/**
+ * STAP-A stands for Single-Time Aggregation Packet.
+ * The NALU type for STAP-A is 24 (0x18).
+ */
+#define NALU_TYPE_STAP_A 24
+
+/**
+ * Wait for a small timeout in milliseconds to allow for the server to process
+ * the Interactive Connectivity Establishment (ICE) request. If we immediately
+ * read the response after sending the request, we may receive nothing and need
+ * to immediately retry. To lessen the likelihood of retries, we can send the
+ * request and wait for a small amount of time for the server to process it
+ * before reading the response.
+ */
+#define ICE_PROCESSING_TIMEOUT 10
+/**
+ * Wait for a short timeout in milliseconds to allow the server to process
+ * the Datagram Transport Layer Security (DTLS) request. If we immediately
+ * read the response after sending the request, we may receive nothing and
+ * need to immediately retry. To reduce the likelihood of retries, we can
+ * send the request and wait a short amount of time for the server to
+ * process it before attempting to read the response.
+ */
+#define DTLS_PROCESSING_TIMEOUT 30
+/**
+ * The maximum number of retries for Datagram Transport Layer Security (DTLS) EAGAIN errors.
+ * When we send a DTLS request and receive no response, we may encounter an EAGAIN error.
+ * In this situation, we wait briefly and attempt to read the response again.
+ * We limit the maximum number of times we retry this loop.
+ */
+#define DTLS_EAGAIN_RETRIES_MAX 5
+
+/**
+ * The DTLS timer's base timeout in microseconds. Its purpose is to minimize the unnecessary
+ * retransmission of ClientHello.
+ */
+#define DTLS_SSL_TIMER_BASE 400 * 1000
+
+/* The magic cookie for Session Traversal Utilities for NAT (STUN) messages. */
+#define STUN_MAGIC_COOKIE 0x2112A442
+
+/* STUN Attribute, comprehension-required range (0x0000-0x7FFF) */
+enum StunAttr {
+    STUN_ATTR_USERNAME                  = 0x0006, /// shared secret response/bind request
+    STUN_ATTR_USE_CANDIDATE             = 0x0025, /// bind request
+    STUN_ATTR_MESSAGE_INTEGRITY         = 0x0008, /// bind request/response
+    STUN_ATTR_FINGERPRINT               = 0x8028, /// rfc5389
+};
+
+typedef struct DTLSContext {
+    /* For av_log to write log to this category. */
+    void *log_avcl;
+
+    /* The private key for DTLS handshake. */
+    EVP_PKEY *dtls_pkey;
+    /* The SSL certificate used for fingerprint in SDP and DTLS handshake. */
+    X509 *dtls_cert;
+    /* The fingerprint of certificate, used in SDP offer. */
+    char *dtls_fingerprint;
+
+    /**
+     * This represents the material used to build the SRTP master key. It is
+     * generated by DTLS and has the following layout:
+     *          16B         16B         14B             14B
+     *      client_key | server_key | client_salt | server_salt
+     */
+    uint8_t dtls_srtp_material[DTLS_SRTP_MASTER_KEY_LEN * 2];
+
+    /* Whether the timer should be reset. */
+    int dtls_should_reset_timer;
+    /* Whether the DTLS is done at least for us. */
+    int dtls_done_for_us;
+    /* The number of packets retransmitted for DTLS. */
+    int dtls_arq_packets;
+
+    /* The UDP transport is used for delivering ICE, DTLS and SRTP packets. */
+    URLContext *udp_uc;
+
+    /* The maximum number of retries for DTLS transmission. */
+    int dtls_arq_max;
+    /* The step start timeout in ms for DTLS transmission. */
+    int dtls_arq_timeout;
+    /* The size of RTP packet, should generally be set to MTU. */
+    int pkt_size;
+} DTLSContext;
+
+/**
+ * Generate a self-signed certificate and private key for DTLS.
+ */
+static av_cold int dtls_context_init(DTLSContext *ctx)
+{
+    int ret = 0, serial, expire_day, i, n = 0;
+    AVBPrint fingerprint;
+    unsigned char md[EVP_MAX_MD_SIZE];
+    const char *aor = "ffmpeg.org", *curve = NULL;
+    X509_NAME* subject = NULL;
+    EVP_PKEY *dtls_pkey = NULL;
+    X509 *dtls_cert = NULL;
+    void *s1 = ctx->log_avcl;
+
+    /* To prevent a crash during cleanup, always initialize it. */
+    av_bprint_init(&fingerprint, 1, MAX_SDP_SIZE);
+
+    ctx->dtls_cert = dtls_cert = X509_new();
+    if (!dtls_cert) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    /* Should use the curves in ClientHello.supported_groups, for example:
+     *      Supported Group: x25519 (0x001d)
+     *      Supported Group: secp256r1 (0x0017)
+     *      Supported Group: secp384r1 (0x0018)
+     * Note that secp256r1 in openssl is called NID_X9_62_prime256v1 or prime256v1 in string,
+     * not NID_secp256k1 or secp256k1 in string
+     */
+    curve = "prime256v1";
+    ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
+    if (!dtls_pkey) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: EVP_EC_gen curve=%s failed\n", curve);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    /* Generate a self-signed certificate. */
+    subject = X509_NAME_new();
+    if (!subject) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    serial = (int)av_get_random_seed();
+    if (ASN1_INTEGER_set(X509_get_serialNumber(dtls_cert), serial) != 1) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set serial\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (X509_NAME_add_entry_by_txt(subject, "CN", MBSTRING_ASC, aor, strlen(aor), -1, 0) != 1) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set CN\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (X509_set_issuer_name(dtls_cert, subject) != 1) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set issuer\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+    if (X509_set_subject_name(dtls_cert, subject) != 1) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set subject name\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    expire_day = 365;
+    if (!X509_gmtime_adj(X509_get_notBefore(dtls_cert), 0)) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set notBefore\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+    if (!X509_gmtime_adj(X509_get_notAfter(dtls_cert), 60*60*24*expire_day)) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set notAfter\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (X509_set_version(dtls_cert, 2) != 1) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set version\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (X509_set_pubkey(dtls_cert, dtls_pkey) != 1) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to set public key\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (!X509_sign(dtls_cert, dtls_pkey, EVP_sha1())) {
+        av_log(s1, AV_LOG_ERROR, "WHIP: Failed to sign certificate\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    /* Generate the fingerpint of certficate. */
+    if (X509_digest(dtls_cert, EVP_sha256(), md, &n) != 1) {
+        av_log(s1, AV_LOG_ERROR, "Failed to generate fingerprint\n");
+        ret = AVERROR(EIO);
+        goto end;
+    }
+    for (i = 0; i < n; i++) {
+        av_bprintf(&fingerprint, "%02X", md[i]);
+        if (i < n - 1)
+            av_bprintf(&fingerprint, ":");
+    }
+    if (!av_bprint_is_complete(&fingerprint)) {
+        av_log(s1, AV_LOG_ERROR, "Fingerprint %d exceed max %d, %s\n", ret, MAX_SDP_SIZE, fingerprint.str);
+        ret = AVERROR(EIO);
+        goto end;
+    }
+    if (!fingerprint.str || !strlen(fingerprint.str)) {
+        av_log(s1, AV_LOG_ERROR, "Fingerprint is empty\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ctx->dtls_fingerprint = av_strdup(fingerprint.str);
+    if (!ctx->dtls_fingerprint) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    av_log(s1, AV_LOG_INFO, "DTLS: Curve=%s, fingerprint %s\n", curve, ctx->dtls_fingerprint);
+
+end:
+    X509_NAME_free(subject);
+    av_bprint_finalize(&fingerprint, NULL);
+    return ret;
+}
+
+/**
+ * Cleanup the DTLS context.
+ */
+static av_cold void dtls_context_deinit(DTLSContext *ctx)
+{
+    X509_free(ctx->dtls_cert);
+    EVP_PKEY_free(ctx->dtls_pkey);
+    av_freep(&ctx->dtls_fingerprint);
+}
+
+/**
+ * Callback function to print the OpenSSL SSL status.
+ */
+static void openssl_on_info(const SSL *dtls, int where, int ret)
+{
+    int w, r1;
+    const char *method = "undefined", *alert_type, *alert_desc;
+    DTLSContext *ctx = (DTLSContext*)SSL_get_ex_data(dtls, 0);
+    void *s1 = ctx->log_avcl;
+
+    w = where & ~SSL_ST_MASK;
+    if (w & SSL_ST_CONNECT)
+        method = "SSL_connect";
+    else if (w & SSL_ST_ACCEPT)
+        method = "SSL_accept";
+
+    r1 = SSL_get_error(dtls, ret);
+    if (where & SSL_CB_LOOP) {
+        av_log(s1, AV_LOG_VERBOSE, "DTLS: method=%s state=%s(%s), where=%d, ret=%d, r1=%d\n",
+            method, SSL_state_string(dtls), SSL_state_string_long(dtls), where, ret, r1);
+    } else if (where & SSL_CB_ALERT) {
+        method = (where & SSL_CB_READ) ? "read":"write";
+
+        alert_type = SSL_alert_type_string_long(ret);
+        alert_desc = SSL_alert_desc_string(ret);
+
+        if (!av_strcasecmp(alert_type, "warning") && !av_strcasecmp(alert_desc, "CN"))
+            av_log(s1, AV_LOG_WARNING, "DTLS: SSL3 alert method=%s type=%s, desc=%s(%s), where=%d, ret=%d, r1=%d\n",
+                method, alert_type, alert_desc, SSL_alert_desc_string_long(ret), where, ret, r1);
+        else
+            av_log(s1, AV_LOG_ERROR, "DTLS: SSL3 alert method=%s type=%s, desc=%s(%s), where=%d, ret=%d, r1=%d\n",
+                method, alert_type, alert_desc, SSL_alert_desc_string_long(ret), where, ret, r1);
+    } else if (where & SSL_CB_EXIT) {
+        if (!ret)
+            av_log(s1, AV_LOG_WARNING, "DTLS: Fail method=%s state=%s(%s), where=%d, ret=%d, r1=%d\n",
+                method, SSL_state_string(dtls), SSL_state_string_long(dtls), where, ret, r1);
+        else if (ret < 0)
+            if (r1 != SSL_ERROR_NONE && r1 != SSL_ERROR_WANT_READ && r1 != SSL_ERROR_WANT_WRITE)
+                av_log(s1, AV_LOG_ERROR, "DTLS: Error method=%s state=%s(%s), where=%d, ret=%d, r1=%d\n",
+                    method, SSL_state_string(dtls), SSL_state_string_long(dtls), where, ret, r1);
+            else
+                av_log(s1, AV_LOG_VERBOSE, "DTLS: method=%s state=%s(%s), where=%d, ret=%d, r1=%d\n",
+                    method, SSL_state_string(dtls), SSL_state_string_long(dtls), where, ret, r1);
+    }
+}
+
+static unsigned int openssl_dtls_timer_cb(SSL *dtls, unsigned int previous_us)
+{
+    DTLSContext *ctx = (DTLSContext*)SSL_get_ex_data(dtls, 0);
+    void *s1 = ctx->log_avcl;
+
+    /* Double the timeout, note that it may be 0. */
+    unsigned int timeout_us = previous_us * 2;
+
+    /* If previous_us is 0, for example, the HelloVerifyRequest, we should respond it ASAP.
+     * when got ServerHello, we should reset the timer. */
+    if (!previous_us || ctx->dtls_should_reset_timer)
+        timeout_us =  DTLS_SSL_TIMER_BASE + ctx->dtls_arq_timeout * 1000; /* in us */
+
+    /* never exceed the max timeout. */
+    timeout_us = FFMIN(timeout_us, 30 * 1000 * 1000); /* in us */
+
+    av_log(s1, AV_LOG_VERBOSE, "DTLS: ARQ timer cb timeout=%ums, previous=%ums\n",
+        timeout_us / 1000, previous_us / 1000);
+
+    return timeout_us;
+}
+
+static void openssl_state_trace(DTLSContext *ctx, uint8_t *data, int length, int incoming, int r0, int r1)
+{
+    uint8_t content_type = 0;
+    uint16_t size = 0;
+    uint8_t handshake_type = 0;
+    void *s1 = ctx->log_avcl;
+
+    /* Change_cipher_spec(20), alert(21), handshake(22), application_data(23) */
+    if (length >= 1)
+        content_type = (uint8_t)data[0];
+    if (length >= 13)
+        size = (uint16_t)(data[11])<<8 | (uint16_t)data[12];
+    if (length >= 14)
+        handshake_type = (uint8_t)data[13];
+
+    av_log(s1, AV_LOG_VERBOSE, "WHIP: DTLS state %s %s, done=%u, arq=%u, r0=%d, r1=%d, len=%u, cnt=%u, size=%u, hs=%u\n",
+        "Active", (incoming? "RECV":"SEND"), ctx->dtls_done_for_us, ctx->dtls_arq_packets, r0, r1, length,
+        content_type, size, handshake_type);
+}
+
+/**
+ * Initializes DTLS context for client role using ECDHE.
+ */
+static av_cold int openssl_init_dtls_context(DTLSContext *ctx, SSL_CTX *dtls_ctx)
+{
+    int ret = 0;
+    void *s1 = ctx->log_avcl;
+    EVP_PKEY *dtls_pkey = ctx->dtls_pkey;
+    X509 *dtls_cert = ctx->dtls_cert;
+
+    /* For ECDSA, we could set the curves list. */
+    if (SSL_CTX_set1_curves_list(dtls_ctx, "P-521:P-384:P-256") != 1) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: SSL_CTX_set1_curves_list failed\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    /* We use "ALL", while you can use "DEFAULT" means "ALL:!EXPORT:!LOW:!aNULL:!eNULL:!SSLv2" */
+    if (SSL_CTX_set_cipher_list(dtls_ctx, DTLS_CIPHER_SUTES) != 1) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: SSL_CTX_set_cipher_list failed\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+    /* Setup the certificate. */
+    if (SSL_CTX_use_certificate(dtls_ctx, dtls_cert) != 1) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: SSL_CTX_use_certificate failed\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+    if (SSL_CTX_use_PrivateKey(dtls_ctx, dtls_pkey) != 1) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: SSL_CTX_use_PrivateKey failed\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+    /* The depth count is "level 0:peer certificate", "level 1: CA certificate",
+     * "level 2: higher level CA certificate", and so on. */
+    SSL_CTX_set_verify_depth(dtls_ctx, 4);
+    /* Whether we should read as many input bytes as possible (for non-blocking reads) or not. */
+    SSL_CTX_set_read_ahead(dtls_ctx, 1);
+    /* Only support SRTP_AES128_CM_SHA1_80, please read ssl/d1_srtp.c */
+    if (SSL_CTX_set_tlsext_use_srtp(dtls_ctx, "SRTP_AES128_CM_SHA1_80")) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: SSL_CTX_set_tlsext_use_srtp failed\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+end:
+    return ret;
+}
+
+/**
+ * After creating a DTLS context, initialize the DTLS SSL object.
+ */
+static av_cold int openssl_init_dtls_ssl(DTLSContext *ctx, SSL *dtls)
+{
+    int ret = 0;
+
+    /* Setup the callback for logging. */
+    SSL_set_ex_data(dtls, 0, ctx);
+    SSL_set_info_callback(dtls, openssl_on_info);
+
+    /* Set dtls fragment size */
+    SSL_set_options(dtls, SSL_OP_NO_QUERY_MTU);
+    /* Avoid dtls negotiate failed, limit the max size of DTLS fragment. */
+    SSL_set_mtu(dtls, ctx->pkt_size);
+
+    /* Set the callback for ARQ timer. */
+    DTLS_set_timer_cb(dtls, openssl_dtls_timer_cb);
+
+    /* Setup DTLS as active, which is client role. */
+    SSL_set_connect_state(dtls);
+    SSL_set_max_send_fragment(dtls, ctx->pkt_size);
+
+    return ret;
+}
+
+/**
+ * Drives the SSL context by attempting to read packets to send from SSL, sending them
+ * over UDP, and then reading packets from UDP to feed back to SSL.
+ */
+static int openssl_drive_context(DTLSContext *ctx, SSL *dtls, BIO *bio_in, BIO *bio_out, int loop)
+{
+    int ret, i, j, r0, r1, req_size, res_size = 0;
+    uint8_t *data = NULL, req_ct = 0, req_ht = 0, res_ct = 0, res_ht = 0;
+    char buf[MAX_UDP_BUFFER_SIZE];
+    void *s1 = ctx->log_avcl;
+
+    /* Drive the SSL context by state change, arq or response messages. */
+    r0 = SSL_do_handshake(dtls);
+    r1 = SSL_get_error(dtls, r0);
+
+    /* Handshake successfully done */
+    if (r0 == 1) {
+        ctx->dtls_done_for_us = 1;
+        return 0;
+    }
+
+    /* Handshake failed with fatal error */
+    if (r0 < 0 && r1 != SSL_ERROR_WANT_READ) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: Start handshake failed, loop=%d, r0=%d, r1=%d\n", loop, r0, r1);
+        return AVERROR(EIO);
+    }
+
+    /* Fast retransmit the request util got response. */
+    for (i = 0; i <= ctx->dtls_arq_max && !res_size; i++) {
+        req_size = BIO_get_mem_data(bio_out, (char**)&data);
+        openssl_state_trace(ctx, data, req_size, 0, r0, r1);
+        ret = ffurl_write(ctx->udp_uc, data, req_size);
+        BIO_reset(bio_out);
+        req_ct = req_size > 0 ? data[0] : 0;
+        req_ht = req_size > 13 ? data[13] : 0;
+        if (ret < 0) {
+            av_log(s1, AV_LOG_ERROR, "DTLS: Send request failed, loop=%d, content=%d, handshake=%d, size=%d\n",
+                loop, req_ct, req_ht, req_size);
+            return ret;
+        }
+
+        /* Wait so that the server can process the request and no need ARQ then. */
+#if DTLS_PROCESSING_TIMEOUT > 0
+        av_usleep(DTLS_PROCESSING_TIMEOUT * 10000);
+#endif
+
+        for (j = 0; j <= DTLS_EAGAIN_RETRIES_MAX && !res_size; j++) {
+            ret = ffurl_read(ctx->udp_uc, buf, sizeof(buf));
+
+            /* Ignore other packets, such as ICE indication, except DTLS. */
+            if (ret > 0 && (ret < 13 || buf[0] <= 19 || buf[0] >= 64))
+                continue;
+
+            /* Got DTLS response successfully. */
+            if (ret > 0) {
+                res_size = ret;
+                ctx->dtls_should_reset_timer = 1;
+                break;
+            }
+
+            /* Fatal error or timeout. */
+            if (ret != AVERROR(EAGAIN)) {
+                av_log(s1, AV_LOG_ERROR, "DTLS: Read response failed, loop=%d, content=%d, handshake=%d\n",
+                    loop, req_ct, req_ht);
+                return ret;
+            }
+
+            /* DTLSv1_handle_timeout is called when a DTLS handshake timeout expires. If no timeout
+             * had expired, it returns 0. Otherwise, it retransmits the previous flight of handshake
+             * messages and returns 1. If too many timeouts had expired without progress or an error
+             * occurs, it returns -1. */
+            r0 = DTLSv1_handle_timeout(dtls);
+            if (!r0) {
+                av_usleep(DTLS_SSL_TIMER_BASE + ctx->dtls_arq_timeout * 1000);
+                continue; /* no timeout had expired. */
+            }
+            if (r0 != 1) {
+                r1 = SSL_get_error(dtls, r0);
+                av_log(s1, AV_LOG_ERROR, "DTLS: Handle timeout, loop=%d, content=%d, handshake=%d, r0=%d, r1=%d\n",
+                    loop, req_ct, req_ht, r0, r1);
+                return AVERROR(EIO);
+            }
+
+            ctx->dtls_arq_packets++;
+            break;
+        }
+    }
+
+    /* Trace the response packet, feed to SSL. */
+    BIO_reset(bio_in);
+    openssl_state_trace(ctx, buf, res_size, 1, r0, SSL_ERROR_NONE);
+    res_ct = res_size > 0 ? buf[0]: 0;
+    res_ht = res_size > 13 ? buf[13] : 0;
+    if ((r0 = BIO_write(bio_in, buf, res_size)) <= 0) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: Feed response failed, loop=%d, content=%d, handshake=%d, size=%d, r0=%d\n",
+            loop, res_ct, res_ht, res_size, r0);
+        return AVERROR(EIO);
+    }
+
+    return ret;
+}
+
+/**
+ * DTLS handshake with server, as a client in active mode, using openssl.
+ *
+ * This function initializes the SSL context as the client role using OpenSSL and
+ * then performs the DTLS handshake until success. Upon successful completion, it
+ * exports the SRTP material key.
+ *
+ * @return 0 if OK, AVERROR_xxx on error
+ */
+static int dtls_context_handshake(DTLSContext *ctx)
+{
+    int ret, loop;
+    SSL_CTX *dtls_ctx = NULL;
+    SSL *dtls = NULL;
+    const char* dst = "EXTRACTOR-dtls_srtp";
+    BIO *bio_in = NULL, *bio_out = NULL;
+    int64_t starttime = av_gettime();
+    void *s1 = ctx->log_avcl;
+
+    dtls_ctx = SSL_CTX_new(DTLS_client_method());
+    if (!dtls_ctx) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    if (!ctx->udp_uc) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: No UDP context\n");
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    ret = openssl_init_dtls_context(ctx, dtls_ctx);
+    if (ret < 0) {
+        av_log(s1, AV_LOG_ERROR, "Failed to initialize DTLS context\n");
+        goto end;
+    }
+
+    /* The dtls should not be created unless the dtls_ctx has been initialized. */
+    dtls = SSL_new(dtls_ctx);
+    if (!dtls) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    bio_in = BIO_new(BIO_s_mem());
+    if (!bio_in) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    bio_out = BIO_new(BIO_s_mem());
+    if (!bio_out) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    SSL_set_bio(dtls, bio_in, bio_out);
+
+    ret = openssl_init_dtls_ssl(ctx, dtls);
+    if (ret < 0) {
+        av_log(s1, AV_LOG_ERROR, "Failed to initialize SSL context\n");
+        goto end;
+    }
+
+    for (loop = 0; loop < 64 && !ctx->dtls_done_for_us; loop++) {
+        ret = openssl_drive_context(ctx, dtls, bio_in, bio_out, loop);
+        if (ret < 0) {
+            av_log(s1, AV_LOG_ERROR, "Failed to drive SSL context, cost=%dms\n",
+                (int)(av_gettime() - starttime) / 1000);
+            goto end;
+        }
+    }
+    if (!ctx->dtls_done_for_us) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: Handshake failed, loop=%d\n", loop);
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    /* Export SRTP master key after DTLS done */
+    ret = SSL_export_keying_material(dtls, ctx->dtls_srtp_material, sizeof(ctx->dtls_srtp_material),
+        dst, strlen(dst), NULL, 0, 0);
+    if (!ret) {
+        av_log(s1, AV_LOG_ERROR, "DTLS: SSL export key r0=%lu, ret=%d\n", ERR_get_error(), ret);
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    av_log(s1, AV_LOG_INFO, "WHIP: DTLS handshake done=%d, arq=%d, srtp_material=%luB, cost=%dms\n",
+        ctx->dtls_done_for_us, ctx->dtls_arq_packets, sizeof(ctx->dtls_srtp_material),
+        (int)(av_gettime() - starttime) / 1000);
+
+end:
+    SSL_free(dtls);
+    SSL_CTX_free(dtls_ctx);
+    return ret;
+}
+
+typedef struct RTCContext {
+    AVClass *av_class;
+
+    /* Parameters for the input audio and video codecs. */
+    AVCodecParameters *audio_par;
+    AVCodecParameters *video_par;
+
+    /* The SPS/PPS of AVC video */
+    uint8_t *avc_sps;
+    int avc_sps_size;
+    uint8_t *avc_pps;
+    int avc_pps_size;
+    /* The size of NALU in ISOM format. */
+    int avc_nal_length_size;
+
+    /* The ICE username and pwd fragment generated by the muxer. */
+    char ice_ufrag_local[9];
+    char ice_pwd_local[33];
+    /* The SSRC of the audio and video stream, generated by the muxer. */
+    uint32_t audio_ssrc;
+    uint32_t video_ssrc;
+    /* The PT(Payload Type) of stream, generated by the muxer. */
+    uint8_t audio_payload_type;
+    uint8_t video_payload_type;
+    /**
+     * This is the SDP offer generated by the muxer based on the codec parameters,
+     * DTLS, and ICE information.
+     */
+    char *sdp_offer;
+
+    /* The ICE username and pwd from remote server. */
+    char *ice_ufrag_remote;
+    char *ice_pwd_remote;
+    /**
+     * This represents the ICE candidate protocol, priority, host and port.
+     * Currently, we only support one candidate and choose the first UDP candidate.
+     * However, we plan to support multiple candidates in the future.
+     */
+    char *ice_protocol;
+    char *ice_host;
+    int ice_port;
+
+    /* The SDP answer received from the WebRTC server. */
+    char *sdp_answer;
+    /* The resource URL returned in the Location header of WHIP HTTP response. */
+    char *whip_resource_url;
+
+    /* The DTLS context. */
+    DTLSContext dtls_ctx;
+
+    /* The SRTP send context, to encrypt outgoing packets. */
+    struct SRTPContext srtp_audio_send;
+    struct SRTPContext srtp_video_send;
+    struct SRTPContext srtp_rtcp_send;
+    /* The SRTP receive context, to decrypt incoming packets. */
+    struct SRTPContext srtp_recv;
+
+    /* The time jitter base for audio OPUS stream. */
+    int64_t audio_jitter_base;
+
+    /* The UDP transport is used for delivering ICE, DTLS and SRTP packets. */
+    URLContext *udp_uc;
+
+    /* The maximum number of retries for ICE transmission. */
+    int ice_arq_max;
+    /* The step start timeout in ms for ICE transmission. */
+    int ice_arq_timeout;
+    /* The maximum number of retries for DTLS transmission. */
+    int dtls_arq_max;
+    /* The step start timeout in ms for DTLS transmission. */
+    int dtls_arq_timeout;
+    /* The size of RTP packet, should generally be set to MTU. */
+    int pkt_size;
+} RTCContext;
+
+static int on_rtp_write_packet(void *opaque, uint8_t *buf, int buf_size);
+
+/**
+ * Initialize and check the options for the WebRTC muxer.
+ */
+static av_cold int whip_init(AVFormatContext *s)
+{
+    int ret, ideal_pkt_size = 532;
+    RTCContext *rtc = s->priv_data;
+
+    /* Use the same logging context as AV format. */
+    rtc->dtls_ctx.log_avcl = s;
+    rtc->dtls_ctx.dtls_arq_max = rtc->dtls_arq_max;
+    rtc->dtls_ctx.dtls_arq_timeout = rtc->dtls_arq_timeout;
+    rtc->dtls_ctx.pkt_size = rtc->pkt_size;
+
+    if ((ret = dtls_context_init(&rtc->dtls_ctx)) < 0) {
+        av_log(s, AV_LOG_ERROR, "WHIP: Failed to init DTLS context\n");
+        return ret;
+    }
+
+    av_log(s, AV_LOG_INFO, "WHIP: Init ice_arq_max=%d, ice_arq_timeout=%d, dtls_arq_max=%d, dtls_arq_timeout=%d pkt_size=%d\n",
+        rtc->ice_arq_max, rtc->ice_arq_timeout, rtc->dtls_arq_max, rtc->dtls_arq_timeout, rtc->pkt_size);
+
+    if (rtc->pkt_size < ideal_pkt_size)
+        av_log(s, AV_LOG_WARNING, "WHIP: pkt_size=%d(<%d) is too small, may cause packet loss\n",
+            rtc->pkt_size, ideal_pkt_size);
+
+    return 0;
+}
+
+/**
+ * Parses the ISOM AVCC format of extradata and extracts SPS/PPS.
+ *
+ * This function is used to parse SPS/PPS from the extradata in ISOM AVCC format.
+ * It can handle both ISOM and annexb formats but only parses data in ISOM format.
+ * If the extradata is in annexb format, this function ignores it, and uses the entire
+ * extradata as a sequence header with SPS/PPS. Refer to ff_isom_write_avcc.
+ *
+ * @param s                Pointer to the AVFormatContext
+ * @param extradata        Pointer to the extradata
+ * @param extradata_size   Size of the extradata
+ * @returns                Returns 0 if successful or AVERROR_xxx in case of an error.
+ */
+static int isom_read_avcc(AVFormatContext *s, uint8_t *extradata, int  extradata_size)
+{
+    int ret = 0;
+    uint8_t version, nal_length_size, nb_sps, nb_pps;
+    AVIOContext *pb;
+    RTCContext *rtc = s->priv_data;
+
+    if (!extradata || !extradata_size)
+        return 0;
+
+    /* Not H.264 ISOM format, may be annexb etc. */
+    if (extradata_size < 4 || extradata[0] != 1) {
+        if (!ff_avc_find_startcode(extradata, extradata + extradata_size)) {
+            av_log(s, AV_LOG_ERROR, "Format must be ISOM or annexb\n");
+            return AVERROR_INVALIDDATA;
+        }
+        return 0;
+    }
+
+    /* Parse the SPS/PPS in ISOM format in extradata. */
+    pb = avio_alloc_context(extradata, extradata_size, 0, NULL, NULL, NULL, NULL);
+    if (!pb)
+        return AVERROR(ENOMEM);
+
+    version = avio_r8(pb); /* version */
+    avio_r8(pb); /* avc profile */
+    avio_r8(pb); /* avc profile compat */
+    avio_r8(pb); /* avc level */
+    nal_length_size = avio_r8(pb); /* 6 bits reserved (111111) + 2 bits nal size length - 1 (11) */
+    nb_sps = avio_r8(pb); /* 3 bits reserved (111) + 5 bits number of sps */
+
+    if (version != 1) {
+        av_log(s, AV_LOG_ERROR, "Invalid version=%d\n", version);
+        ret = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
+    rtc->avc_nal_length_size = (nal_length_size & 0x03) + 1;
+    if (rtc->avc_nal_length_size == 3) {
+        av_log(s, AV_LOG_ERROR, "Invalid nal length size=%d\n", rtc->avc_nal_length_size);
+        ret = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
+    /* Read SPS */
+    nb_sps &= 0x1f;
+    if (nb_sps != 1 || avio_feof(pb)) {
+        av_log(s, AV_LOG_ERROR, "Invalid number of sps=%d, eof=%d\n", nb_sps, avio_feof(pb));
+        ret = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
+    rtc->avc_sps_size = avio_rb16(pb); /* sps size */
+    if (rtc->avc_sps_size <= 0 || avio_feof(pb)) {
+        av_log(s, AV_LOG_ERROR, "Invalid sps size=%d, eof=%d\n", rtc->avc_sps_size, avio_feof(pb));
+        ret = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
+    rtc->avc_sps = av_malloc(rtc->avc_sps_size);
+    if (!rtc->avc_sps) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    ret = avio_read(pb, rtc->avc_sps, rtc->avc_sps_size); /* sps */
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to read sps, size=%d\n", rtc->avc_sps_size);
+        goto end;
+    }
+
+    /* Read PPS */
+    nb_pps = avio_r8(pb); /* number of pps */
+    if (nb_pps != 1 || avio_feof(pb)) {
+        av_log(s, AV_LOG_ERROR, "Invalid number of pps=%d, eof=%d\n", nb_pps, avio_feof(pb));
+        ret = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
+    rtc->avc_pps_size = avio_rb16(pb); /* pps size */
+    if (rtc->avc_pps_size <= 0 || avio_feof(pb)) {
+        av_log(s, AV_LOG_ERROR, "Invalid pps size=%d, eof=%d\n", rtc->avc_pps_size, avio_feof(pb));
+        ret = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
+    rtc->avc_pps = av_malloc(rtc->avc_pps_size);
+    if (!rtc->avc_pps) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    ret = avio_read(pb, rtc->avc_pps, rtc->avc_pps_size); /* pps */
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to read pps, size=%d\n", rtc->avc_pps_size);
+        goto end;
+    }
+
+end:
+    avio_context_free(&pb);
+    return ret;
+}
+
+/**
+ * Parses video SPS/PPS from the extradata of codecpar and checks the codec.
+ * Currently only supports video(h264) and audio(opus). Note that only baseline
+ * and constrained baseline profiles of h264 are supported.
+ *
+ * If the profile is less than 0, the function considers the profile as baseline.
+ * It may need to parse the profile from SPS/PPS. This situation occurs when ingesting
+ * desktop and transcoding.
+ *
+ * @param s Pointer to the AVFormatContext
+ * @returns Returns 0 if successful or AVERROR_xxx in case of an error.
+ */
+static int parse_codec(AVFormatContext *s)
+{
+    int i, ret;
+    RTCContext *rtc = s->priv_data;
+
+    for (i = 0; i < s->nb_streams; i++) {
+        AVCodecParameters *par = s->streams[i]->codecpar;
+        const AVCodecDescriptor *desc = avcodec_descriptor_get(par->codec_id);
+        switch (par->codec_type) {
+        case AVMEDIA_TYPE_VIDEO:
+            if (rtc->video_par) {
+                av_log(s, AV_LOG_ERROR, "Only one video stream is supported by RTC\n");
+                return AVERROR(EINVAL);
+            }
+            rtc->video_par = par;
+
+            if (par->codec_id != AV_CODEC_ID_H264) {
+                av_log(s, AV_LOG_ERROR, "Unsupported video codec %s by RTC, choose h264\n",
+                       desc ? desc->name : "unknown");
+                return AVERROR_PATCHWELCOME;
+            }
+
+            if (par->video_delay > 0) {
+                av_log(s, AV_LOG_ERROR, "Unsupported B frames by RTC\n");
+                return AVERROR_PATCHWELCOME;
+            }
+
+            ret = isom_read_avcc(s, par->extradata, par->extradata_size);
+            if (ret < 0) {
+                av_log(s, AV_LOG_ERROR, "Failed to parse SPS/PPS from extradata\n");
+                return ret;
+            }
+            break;
+        case AVMEDIA_TYPE_AUDIO:
+            if (rtc->audio_par) {
+                av_log(s, AV_LOG_ERROR, "Only one audio stream is supported by RTC\n");
+                return AVERROR(EINVAL);
+            }
+            rtc->audio_par = par;
+
+            if (par->codec_id != AV_CODEC_ID_OPUS) {
+                av_log(s, AV_LOG_ERROR, "Unsupported audio codec %s by RTC, choose opus\n",
+                    desc ? desc->name : "unknown");
+                return AVERROR_PATCHWELCOME;
+            }
+
+            if (par->ch_layout.nb_channels != 2) {
+                av_log(s, AV_LOG_ERROR, "Unsupported audio channels %d by RTC, choose stereo\n",
+                    par->ch_layout.nb_channels);
+                return AVERROR_PATCHWELCOME;
+            }
+
+            if (par->sample_rate != 48000) {
+                av_log(s, AV_LOG_ERROR, "Unsupported audio sample rate %d by RTC, choose 48000\n", par->sample_rate);
+                return AVERROR_PATCHWELCOME;
+            }
+            break;
+        default:
+            av_log(s, AV_LOG_ERROR, "Codec type '%s' for stream %d is not supported by RTC\n",
+                   av_get_media_type_string(par->codec_type), i);
+            return AVERROR_PATCHWELCOME;
+        }
+    }
+
+    return 0;
+}
+
+/**
+ * Generate SDP offer according to the codec parameters, DTLS and ICE information.
+ * The below is an example of SDP offer:
+ *
+ *       v=0
+ *       o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1
+ *       s=FFmpegPublishSession
+ *       t=0 0
+ *       a=group:BUNDLE 0 1
+ *       a=extmap-allow-mixed
+ *       a=msid-semantic: WMS
+ *
+ *       m=audio 9 UDP/TLS/RTP/SAVPF 111
+ *       c=IN IP4 0.0.0.0
+ *       a=ice-ufrag:a174B
+ *       a=ice-pwd:wY8rJ3gNLxL3eWZs6UPOxy
+ *       a=fingerprint:sha-256 EE:FE:A2:E5:6A:21:78:60:71:2C:21:DC:1A:2C:98:12:0C:E8:AD:68:07:61:1B:0E:FC:46:97:1E:BC:97:4A:54
+ *       a=setup:actpass
+ *       a=mid:0
+ *       a=sendonly
+ *       a=msid:FFmpeg audio
+ *       a=rtcp-mux
+ *       a=rtpmap:111 opus/48000/2
+ *       a=ssrc:4267647086 cname:FFmpeg
+ *       a=ssrc:4267647086 msid:FFmpeg audio
+ *
+ *       m=video 9 UDP/TLS/RTP/SAVPF 106
+ *       c=IN IP4 0.0.0.0
+ *       a=ice-ufrag:a174B
+ *       a=ice-pwd:wY8rJ3gNLxL3eWZs6UPOxy
+ *       a=fingerprint:sha-256 EE:FE:A2:E5:6A:21:78:60:71:2C:21:DC:1A:2C:98:12:0C:E8:AD:68:07:61:1B:0E:FC:46:97:1E:BC:97:4A:54
+ *       a=setup:actpass
+ *       a=mid:1
+ *       a=sendonly
+ *       a=msid:FFmpeg video
+ *       a=rtcp-mux
+ *       a=rtcp-rsize
+ *       a=rtpmap:106 H264/90000
+ *       a=fmtp:106 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
+ *       a=ssrc:107169110 cname:FFmpeg
+ *       a=ssrc:107169110 msid:FFmpeg video
+ *
+ * Note that we don't use av_sdp_create to generate SDP offer because it doesn't
+ * support DTLS and ICE information.
+ *
+ * @return 0 if OK, AVERROR_xxx on error
+ */
+static int generate_sdp_offer(AVFormatContext *s)
+{
+    int ret = 0, profile, level, profile_iop;
+    AVBPrint bp;
+    RTCContext *rtc = s->priv_data;
+
+    /* To prevent a crash during cleanup, always initialize it. */
+    av_bprint_init(&bp, 1, MAX_SDP_SIZE);
+
+    if (rtc->sdp_offer) {
+        av_log(s, AV_LOG_ERROR, "SDP offer is already set\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    snprintf(rtc->ice_ufrag_local, sizeof(rtc->ice_ufrag_local), "%08x",
+             av_get_random_seed());
+    snprintf(rtc->ice_pwd_local, sizeof(rtc->ice_pwd_local), "%08x%08x%08x%08x",
+             av_get_random_seed(), av_get_random_seed(), av_get_random_seed(),
+             av_get_random_seed());
+
+    rtc->audio_ssrc = av_get_random_seed();
+    rtc->video_ssrc = av_get_random_seed();
+
+    rtc->audio_payload_type = 111;
+    rtc->video_payload_type = 106;
+
+    av_bprintf(&bp, ""
+        "v=0\r\n"
+        "o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1\r\n"
+        "s=FFmpegPublishSession\r\n"
+        "t=0 0\r\n"
+        "a=group:BUNDLE 0 1\r\n"
+        "a=extmap-allow-mixed\r\n"
+        "a=msid-semantic: WMS\r\n");
+    if (!av_bprint_is_complete(&bp)) {
+        av_log(s, AV_LOG_ERROR, "Offer %d exceed max %d, %s\n", ret, MAX_SDP_SIZE, bp.str);
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    if (rtc->audio_par) {
+        av_bprintf(&bp, ""
+            "m=audio 9 UDP/TLS/RTP/SAVPF %u\r\n"
+            "c=IN IP4 0.0.0.0\r\n"
+            "a=ice-ufrag:%s\r\n"
+            "a=ice-pwd:%s\r\n"
+            "a=fingerprint:sha-256 %s\r\n"
+            "a=setup:active\r\n"
+            "a=mid:0\r\n"
+            "a=sendonly\r\n"
+            "a=msid:FFmpeg audio\r\n"
+            "a=rtcp-mux\r\n"
+            "a=rtpmap:%u opus/%d/%d\r\n"
+            "a=ssrc:%u cname:FFmpeg\r\n"
+            "a=ssrc:%u msid:FFmpeg audio\r\n",
+            rtc->audio_payload_type,
+            rtc->ice_ufrag_local,
+            rtc->ice_pwd_local,
+            rtc->dtls_ctx.dtls_fingerprint,
+            rtc->audio_payload_type,
+            rtc->audio_par->sample_rate,
+            rtc->audio_par->ch_layout.nb_channels,
+            rtc->audio_ssrc,
+            rtc->audio_ssrc);
+        if (!av_bprint_is_complete(&bp)) {
+            av_log(s, AV_LOG_ERROR, "Offer %d exceed max %d, %s\n", ret, MAX_SDP_SIZE, bp.str);
+            ret = AVERROR(EIO);
+            goto end;
+        }
+    }
+
+    if (rtc->video_par) {
+        profile = rtc->video_par->profile < 0 ? 0x42 : rtc->video_par->profile;
+        level = rtc->video_par->level < 0 ? 30 : rtc->video_par->level;
+        profile_iop = profile & FF_PROFILE_H264_CONSTRAINED;
+        av_bprintf(&bp, ""
+            "m=video 9 UDP/TLS/RTP/SAVPF %u\r\n"
+            "c=IN IP4 0.0.0.0\r\n"
+            "a=ice-ufrag:%s\r\n"
+            "a=ice-pwd:%s\r\n"
+            "a=fingerprint:sha-256 %s\r\n"
+            "a=setup:active\r\n"
+            "a=mid:1\r\n"
+            "a=sendonly\r\n"
+            "a=msid:FFmpeg video\r\n"
+            "a=rtcp-mux\r\n"
+            "a=rtcp-rsize\r\n"
+            "a=rtpmap:%u H264/90000\r\n"
+            "a=fmtp:%u level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=%02x%02x%02x\r\n"
+            "a=ssrc:%u cname:FFmpeg\r\n"
+            "a=ssrc:%u msid:FFmpeg video\r\n",
+            rtc->video_payload_type,
+            rtc->ice_ufrag_local,
+            rtc->ice_pwd_local,
+            rtc->dtls_ctx.dtls_fingerprint,
+            rtc->video_payload_type,
+            rtc->video_payload_type,
+            profile & (~FF_PROFILE_H264_CONSTRAINED),
+            profile_iop,
+            level,
+            rtc->video_ssrc,
+            rtc->video_ssrc);
+        if (!av_bprint_is_complete(&bp)) {
+            av_log(s, AV_LOG_ERROR, "Offer %d exceed max %d, %s\n", ret, MAX_SDP_SIZE, bp.str);
+            ret = AVERROR(EIO);
+            goto end;
+        }
+    }
+
+    rtc->sdp_offer = av_strdup(bp.str);
+    if (!rtc->sdp_offer) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+    av_log(s, AV_LOG_VERBOSE, "WHIP: Generated offer: %s\n", rtc->sdp_offer);
+
+end:
+    av_bprint_finalize(&bp, NULL);
+    return ret;
+}
+
+/**
+ * Exchange SDP offer with WebRTC peer to get the answer.
+ * The below is an example of SDP answer:
+ *
+ *       v=0
+ *       o=SRS/6.0.42(Bee) 107408542208384 2 IN IP4 0.0.0.0
+ *       s=SRSPublishSession
+ *       t=0 0
+ *       a=ice-lite
+ *       a=group:BUNDLE 0 1
+ *       a=msid-semantic: WMS live/show
+ *
+ *       m=audio 9 UDP/TLS/RTP/SAVPF 111
+ *       c=IN IP4 0.0.0.0
+ *       a=ice-ufrag:ex9061f9
+ *       a=ice-pwd:bi8k19m9n836187b00d1gm3946234w85
+ *       a=fingerprint:sha-256 68:DD:7A:95:27:BD:0A:99:F4:7A:83:21:2F:50:15:2A:1D:1F:8A:D8:96:24:42:2D:A1:83:99:BF:F1:E2:11:A2
+ *       a=setup:passive
+ *       a=mid:0
+ *       a=recvonly
+ *       a=rtcp-mux
+ *       a=rtcp-rsize
+ *       a=rtpmap:111 opus/48000/2
+ *       a=candidate:0 1 udp 2130706431 172.20.10.7 8000 typ host generation 0
+ *
+ *       m=video 9 UDP/TLS/RTP/SAVPF 106
+ *       c=IN IP4 0.0.0.0
+ *       a=ice-ufrag:ex9061f9
+ *       a=ice-pwd:bi8k19m9n836187b00d1gm3946234w85
+ *       a=fingerprint:sha-256 68:DD:7A:95:27:BD:0A:99:F4:7A:83:21:2F:50:15:2A:1D:1F:8A:D8:96:24:42:2D:A1:83:99:BF:F1:E2:11:A2
+ *       a=setup:passive
+ *       a=mid:1
+ *       a=recvonly
+ *       a=rtcp-mux
+ *       a=rtcp-rsize
+ *       a=rtpmap:106 H264/90000
+ *       a=fmtp:106 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01e
+ *       a=candidate:0 1 udp 2130706431 172.20.10.7 8000 typ host generation 0
+ *
+ * @return 0 if OK, AVERROR_xxx on error
+ */
+static int exchange_sdp(AVFormatContext *s)
+{
+    int ret;
+    char buf[MAX_URL_SIZE];
+    AVBPrint bp;
+    RTCContext *rtc = s->priv_data;
+    /* The URL context is an HTTP transport layer for the WHIP protocol. */
+    URLContext *whip_uc = NULL;
+
+    /* To prevent a crash during cleanup, always initialize it. */
+    av_bprint_init(&bp, 1, MAX_SDP_SIZE);
+
+    ret = ffurl_alloc(&whip_uc, s->url, AVIO_FLAG_READ_WRITE, &s->interrupt_callback);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to alloc HTTP context: %s\n", s->url);
+        goto end;
+    }
+
+    if (!rtc->sdp_offer || !strlen(rtc->sdp_offer)) {
+        av_log(s, AV_LOG_ERROR, "No offer to exchange\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    snprintf(buf, sizeof(buf),
+             "Cache-Control: no-cache\r\n"
+             "Content-Type: application/sdp\r\n");
+    av_opt_set(whip_uc->priv_data, "headers", buf, 0);
+    av_opt_set(whip_uc->priv_data, "chunked_post", "0", 0);
+    av_opt_set_bin(whip_uc->priv_data, "post_data", rtc->sdp_offer, (int)strlen(rtc->sdp_offer), 0);
+
+    ret = ffurl_connect(whip_uc, NULL);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to request url=%s, offer: %s\n", s->url, rtc->sdp_offer);
+        goto end;
+    }
+
+    if (ff_http_get_new_location(whip_uc)) {
+        rtc->whip_resource_url = av_strdup(ff_http_get_new_location(whip_uc));
+        if (!rtc->whip_resource_url) {
+            ret = AVERROR(ENOMEM);
+            goto end;
+        }
+    }
+
+    while (1) {
+        ret = ffurl_read(whip_uc, buf, sizeof(buf));
+        if (ret == AVERROR_EOF) {
+            /* Reset the error because we read all response as answer util EOF. */
+            ret = 0;
+            break;
+        }
+        if (ret <= 0) {
+            av_log(s, AV_LOG_ERROR, "Failed to read response from url=%s, offer is %s, answer is %s\n",
+                s->url, rtc->sdp_offer, rtc->sdp_answer);
+            goto end;
+        }
+
+        av_bprintf(&bp, "%.*s", ret, buf);
+        if (!av_bprint_is_complete(&bp)) {
+            av_log(s, AV_LOG_ERROR, "Answer %d exceed max size %d, %s\n", ret, MAX_SDP_SIZE, bp.str);
+            ret = AVERROR(EIO);
+            goto end;
+        }
+    }
+
+    rtc->sdp_answer = av_strdup(bp.str);
+    if (!rtc->sdp_answer) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+    av_log(s, AV_LOG_VERBOSE, "WHIP: Got answer: %s\n", rtc->sdp_answer);
+
+end:
+    ffurl_closep(&whip_uc);
+    av_bprint_finalize(&bp, NULL);
+    return ret;
+}
+
+/**
+ * Parses the ICE ufrag, pwd, and candidates from the SDP answer.
+ *
+ * This function is used to extract the ICE ufrag, pwd, and candidates from the SDP answer.
+ * It returns an error if any of these fields is NULL. The function only uses the first
+ * candidate if there are multiple candidates. However, support for multiple candidates
+ * will be added in the future.
+ *
+ * @param s Pointer to the AVFormatContext
+ * @returns Returns 0 if successful or AVERROR_xxx if an error occurs.
+ */
+static int parse_answer(AVFormatContext *s)
+{
+    int ret = 0;
+    AVIOContext *pb;
+    char line[MAX_URL_SIZE];
+    const char *ptr;
+    int i;
+    RTCContext *rtc = s->priv_data;
+
+    if (!rtc->sdp_answer || !strlen(rtc->sdp_answer)) {
+        av_log(s, AV_LOG_ERROR, "No answer to parse\n");
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    pb = avio_alloc_context(rtc->sdp_answer, strlen(rtc->sdp_answer), 0, NULL, NULL, NULL, NULL);
+    if (!pb)
+        return AVERROR(ENOMEM);
+
+    for (i = 0; !avio_feof(pb); i++) {
+        ff_get_chomp_line(pb, line, sizeof(line));
+        if (av_strstart(line, "a=ice-ufrag:", &ptr) && !rtc->ice_ufrag_remote) {
+            rtc->ice_ufrag_remote = av_strdup(ptr);
+            if (!rtc->ice_ufrag_remote) {
+                ret = AVERROR(ENOMEM);
+                goto end;
+            }
+        } else if (av_strstart(line, "a=ice-pwd:", &ptr) && !rtc->ice_pwd_remote) {
+            rtc->ice_pwd_remote = av_strdup(ptr);
+            if (!rtc->ice_pwd_remote) {
+                ret = AVERROR(ENOMEM);
+                goto end;
+            }
+        } else if (av_strstart(line, "a=candidate:", &ptr) && !rtc->ice_protocol) {
+            ptr = av_stristr(ptr, "udp");
+            if (ptr && av_stristr(ptr, "host")) {
+                char protocol[17], host[129];
+                int priority, port;
+                ret = sscanf(ptr, "%16s %d %128s %d typ host", protocol, &priority, host, &port);
+                if (ret != 4) {
+                    av_log(s, AV_LOG_ERROR, "Failed %d to parse line %d %s from %s\n",
+                        ret, i, line, rtc->sdp_answer);
+                    ret = AVERROR(EIO);
+                    goto end;
+                }
+
+                if (av_strcasecmp(protocol, "udp")) {
+                    av_log(s, AV_LOG_ERROR, "Protocol %s is not supported by RTC, choose udp, line %d %s of %s\n",
+                        protocol, i, line, rtc->sdp_answer);
+                    ret = AVERROR(EIO);
+                    goto end;
+                }
+
+                rtc->ice_protocol = av_strdup(protocol);
+                rtc->ice_host = av_strdup(host);
+                rtc->ice_port = port;
+                if (!rtc->ice_protocol || !rtc->ice_host) {
+                    ret = AVERROR(ENOMEM);
+                    goto end;
+                }
+            }
+        }
+    }
+
+    if (!rtc->ice_pwd_remote || !strlen(rtc->ice_pwd_remote)) {
+        av_log(s, AV_LOG_ERROR, "No remote ice pwd parsed from %s\n", rtc->sdp_answer);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (!rtc->ice_ufrag_remote || !strlen(rtc->ice_ufrag_remote)) {
+        av_log(s, AV_LOG_ERROR, "No remote ice ufrag parsed from %s\n", rtc->sdp_answer);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    if (!rtc->ice_protocol || !rtc->ice_host || !rtc->ice_port) {
+        av_log(s, AV_LOG_ERROR, "No ice candidate parsed from %s\n", rtc->sdp_answer);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    av_log(s, AV_LOG_INFO, "WHIP: SDP offer=%luB, answer=%luB, ufrag=%s, pwd=%luB, transport=%s://%s:%d\n",
+        strlen(rtc->sdp_offer), strlen(rtc->sdp_answer), rtc->ice_ufrag_remote, strlen(rtc->ice_pwd_remote),
+        rtc->ice_protocol, rtc->ice_host, rtc->ice_port);
+
+end:
+    avio_context_free(&pb);
+    return ret;
+}
+
+/**
+ * Creates and marshals an ICE binding request packet.
+ *
+ * This function creates and marshals an ICE binding request packet. The function only
+ * generates the username attribute and does not include goog-network-info, ice-controlling,
+ * use-candidate, and priority. However, some of these attributes may be added in the future.
+ *
+ * @param s Pointer to the AVFormatContext
+ * @param buf Pointer to memory buffer to store the request packet
+ * @param buf_size Size of the memory buffer
+ * @param request_size Pointer to an integer that receives the size of the request packet
+ * @return Returns 0 if successful or AVERROR_xxx if an error occurs.
+ */
+static int ice_create_request(AVFormatContext *s, uint8_t *buf, int buf_size, int *request_size)
+{
+    int ret, size, crc32;
+    char username[128];
+    AVIOContext *pb = NULL;
+    AVHMAC *hmac = NULL;
+    RTCContext *rtc = s->priv_data;
+
+    pb = avio_alloc_context(buf, buf_size, 1, NULL, NULL, NULL, NULL);
+    if (!pb)
+        return AVERROR(ENOMEM);
+
+    hmac = av_hmac_alloc(AV_HMAC_SHA1);
+    if (!hmac) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    /* Write 20 bytes header */
+    avio_wb16(pb, 0x0001); /* STUN binding request */
+    avio_wb16(pb, 0);      /* length */
+    avio_wb32(pb, STUN_MAGIC_COOKIE); /* magic cookie */
+    avio_wb32(pb, av_get_random_seed()); /* transaction ID */
+    avio_wb32(pb, av_get_random_seed()); /* transaction ID */
+    avio_wb32(pb, av_get_random_seed()); /* transaction ID */
+
+    /* The username is the concatenation of the two ICE ufrag */
+    ret = snprintf(username, sizeof(username), "%s:%s", rtc->ice_ufrag_remote, rtc->ice_ufrag_local);
+    if (ret <= 0 || ret >= sizeof(username)) {
+        av_log(s, AV_LOG_ERROR, "Failed to build username %s:%s, max=%lu, ret=%d\n",
+            rtc->ice_ufrag_remote, rtc->ice_ufrag_local, sizeof(username), ret);
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    /* Write the username attribute */
+    avio_wb16(pb, STUN_ATTR_USERNAME); /* attribute type username */
+    avio_wb16(pb, ret); /* size of username */
+    avio_write(pb, username, ret); /* bytes of username */
+    ffio_fill(pb, 0, (4 - (ret % 4)) % 4); /* padding */
+
+    /* Write the use-candidate attribute */
+    avio_wb16(pb, STUN_ATTR_USE_CANDIDATE); /* attribute type use-candidate */
+    avio_wb16(pb, 0); /* size of use-candidate */
+
+    /* Build and update message integrity */
+    avio_wb16(pb, STUN_ATTR_MESSAGE_INTEGRITY); /* attribute type message integrity */
+    avio_wb16(pb, 20); /* size of message integrity */
+    ffio_fill(pb, 0, 20); /* fill with zero to directly write and skip it */
+    size = avio_tell(pb);
+    buf[2] = (size - 20) >> 8;
+    buf[3] = (size - 20) & 0xFF;
+    av_hmac_init(hmac, rtc->ice_pwd_remote, strlen(rtc->ice_pwd_remote));
+    av_hmac_update(hmac, buf, size - 24);
+    av_hmac_final(hmac, buf + size - 20, 20);
+
+    /* Write the fingerprint attribute */
+    avio_wb16(pb, STUN_ATTR_FINGERPRINT); /* attribute type fingerprint */
+    avio_wb16(pb, 4); /* size of fingerprint */
+    ffio_fill(pb, 0, 4); /* fill with zero to directly write and skip it */
+    size = avio_tell(pb);
+    buf[2] = (size - 20) >> 8;
+    buf[3] = (size - 20) & 0xFF;
+    /* Refer to the av_hash_alloc("CRC32"), av_hash_init and av_hash_final */
+    crc32 = av_crc(av_crc_get_table(AV_CRC_32_IEEE_LE), 0xFFFFFFFF, buf, size - 8) ^ 0xFFFFFFFF;
+    avio_skip(pb, -4);
+    avio_wb32(pb, crc32 ^ 0x5354554E); /* xor with "STUN" */
+
+    *request_size = size;
+
+end:
+    avio_context_free(&pb);
+    av_hmac_free(hmac);
+    return ret;
+}
+
+/**
+ * Create an ICE binding response.
+ *
+ * This function generates an ICE binding response and writes it to the provided
+ * buffer. The response is signed using the local password for message integrity.
+ *
+ * @param s Pointer to the AVFormatContext structure.
+ * @param tid Pointer to the transaction ID of the binding request. The tid_size should be 12.
+ * @param tid_size The size of the transaction ID, should be 12.
+ * @param buf Pointer to the buffer where the response will be written.
+ * @param buf_size The size of the buffer provided for the response.
+ * @param response_size Pointer to an integer that will store the size of the generated response.
+ * @return Returns 0 if successful or AVERROR_xxx if an error occurs.
+ */
+static int ice_create_response(AVFormatContext *s, char *tid, int tid_size, uint8_t *buf, int buf_size, int *response_size) {
+    int ret = 0, size, crc32;
+    AVIOContext *pb = NULL;
+    AVHMAC *hmac = NULL;
+    RTCContext *rtc = s->priv_data;
+
+    if (tid_size != 12) {
+        av_log(s, AV_LOG_ERROR, "Invalid transaction ID size. Expected 12, got %d\n", tid_size);
+        return AVERROR(EINVAL);
+    }
+
+    pb = avio_alloc_context(buf, buf_size, 1, NULL, NULL, NULL, NULL);
+    if (!pb)
+        return AVERROR(ENOMEM);
+
+    hmac = av_hmac_alloc(AV_HMAC_SHA1);
+    if (!hmac) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    /* Write 20 bytes header */
+    avio_wb16(pb, 0x0101); /* STUN binding response */
+    avio_wb16(pb, 0);      /* length */
+    avio_wb32(pb, STUN_MAGIC_COOKIE); /* magic cookie */
+    avio_write(pb, tid, tid_size); /* transaction ID */
+
+    /* Build and update message integrity */
+    avio_wb16(pb, STUN_ATTR_MESSAGE_INTEGRITY); /* attribute type message integrity */
+    avio_wb16(pb, 20); /* size of message integrity */
+    ffio_fill(pb, 0, 20); /* fill with zero to directly write and skip it */
+    size = avio_tell(pb);
+    buf[2] = (size - 20) >> 8;
+    buf[3] = (size - 20) & 0xFF;
+    av_hmac_init(hmac, rtc->ice_pwd_local, strlen(rtc->ice_pwd_local));
+    av_hmac_update(hmac, buf, size - 24);
+    av_hmac_final(hmac, buf + size - 20, 20);
+
+    /* Write the fingerprint attribute */
+    avio_wb16(pb, STUN_ATTR_FINGERPRINT); /* attribute type fingerprint */
+    avio_wb16(pb, 4); /* size of fingerprint */
+    ffio_fill(pb, 0, 4); /* fill with zero to directly write and skip it */
+    size = avio_tell(pb);
+    buf[2] = (size - 20) >> 8;
+    buf[3] = (size - 20) & 0xFF;
+    /* Refer to the av_hash_alloc("CRC32"), av_hash_init and av_hash_final */
+    crc32 = av_crc(av_crc_get_table(AV_CRC_32_IEEE_LE), 0xFFFFFFFF, buf, size - 8) ^ 0xFFFFFFFF;
+    avio_skip(pb, -4);
+    avio_wb32(pb, crc32 ^ 0x5354554E); /* xor with "STUN" */
+
+    *response_size = size;
+
+end:
+    avio_context_free(&pb);
+    av_hmac_free(hmac);
+    return ret;
+}
+
+static int ice_is_binding_request(char *buf, int buf_size) {
+    return buf_size > 1 && buf[0] == 0x00 && buf[1] == 0x01;
+}
+
+static int ice_is_binding_response(char *buf, int buf_size) {
+    return buf_size > 1 && buf[0] == 0x01 && buf[1] == 0x01;
+}
+
+/**
+ * This function handles incoming binding request messages by responding to them.
+ * If the message is not a binding request, it will be ignored.
+ */
+static int ice_handle_binding_request(AVFormatContext *s, char *buf, int buf_size) {
+    int ret = 0, size;
+    char tid[12];
+    uint8_t res_buf[MAX_UDP_BUFFER_SIZE];
+    RTCContext *rtc = s->priv_data;
+
+    /* Ignore if not a binding request. */
+    if (!ice_is_binding_request(buf, buf_size))
+        return ret;
+
+    if (buf_size < 20) {
+        av_log(s, AV_LOG_ERROR, "Invalid STUN message size. Expected at least 20, got %d\n", buf_size);
+        return AVERROR(EINVAL);
+    }
+
+    /* Parse transaction id from binding request in buf. */
+    memcpy(tid, buf + 8, 12);
+
+    /* Build the STUN binding response. */
+    ret = ice_create_response(s, tid, sizeof(tid), res_buf, sizeof(res_buf), &size);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to create STUN binding response, size=%d\n", size);
+        return ret;
+    }
+
+    ret = ffurl_write(rtc->udp_uc, res_buf, size);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to send STUN binding response, size=%d\n", size);
+        return ret;
+    }
+
+    return 0;
+}
+
+/**
+ * Opens the UDP transport and completes the ICE handshake, using fast retransmit to
+ * handle packet loss for the binding request.
+ *
+ * To initiate a fast retransmission of the STUN binding request during ICE, we wait only
+ * for a successful local ICE process i.e., when a binding response is received from the
+ * server. Since the server's binding request may not arrive, we do not always wait for it.
+ * However, we will always respond to the server's binding request during ICE, DTLS or
+ * RTP streaming.
+ *
+ * @param s Pointer to the AVFormatContext
+ * @return Returns 0 if the handshake was successful or AVERROR_xxx in case of an error
+ */
+static int ice_handshake(AVFormatContext *s)
+{
+    int ret, size;
+    char url[256], tmp[16];
+    char req_buf[MAX_UDP_BUFFER_SIZE], res_buf[MAX_UDP_BUFFER_SIZE];
+    RTCContext *rtc = s->priv_data;
+    int fast_retries = rtc->ice_arq_max, timeout = rtc->ice_arq_timeout;
+
+    /* Build UDP URL and create the UDP context as transport. */
+    ff_url_join(url, sizeof(url), "udp", NULL, rtc->ice_host, rtc->ice_port, NULL);
+    ret = ffurl_alloc(&rtc->udp_uc, url, AVIO_FLAG_WRITE, &s->interrupt_callback);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to open udp://%s:%d\n", rtc->ice_host, rtc->ice_port);
+        goto end;
+    }
+
+    av_opt_set(rtc->udp_uc->priv_data, "connect", "1", 0);
+    av_opt_set(rtc->udp_uc->priv_data, "fifo_size", "0", 0);
+    /* Set the max packet size to the buffer size. */
+    snprintf(tmp, sizeof(tmp), "%d", rtc->pkt_size);
+    av_opt_set(rtc->udp_uc->priv_data, "pkt_size", tmp, 0);
+
+    ret = ffurl_connect(rtc->udp_uc, NULL);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to connect udp://%s:%d\n", rtc->ice_host, rtc->ice_port);
+        goto end;
+    }
+
+    /* Make the socket non-blocking, set to READ and WRITE mode after connected */
+    ff_socket_nonblock(ffurl_get_file_handle(rtc->udp_uc), 1);
+    rtc->udp_uc->flags |= AVIO_FLAG_READ | AVIO_FLAG_NONBLOCK;
+
+    /* Build the STUN binding request. */
+    ret = ice_create_request(s, req_buf, sizeof(req_buf), &size);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to create STUN binding request, size=%d\n", size);
+        goto end;
+    }
+
+    /* Fast retransmit the STUN binding request. */
+    while (1) {
+        ret = ffurl_write(rtc->udp_uc, req_buf, size);
+        if (ret < 0) {
+            av_log(s, AV_LOG_ERROR, "Failed to send STUN binding request, size=%d\n", size);
+            goto end;
+        }
+
+        /* Wait so that the server can process the request and no need ARQ then. */
+#if ICE_PROCESSING_TIMEOUT > 0
+        av_usleep(ICE_PROCESSING_TIMEOUT * 10000);
+#endif
+
+        /* Read the STUN binding response. */
+        ret = ffurl_read(rtc->udp_uc, res_buf, sizeof(res_buf));
+        if (ret < 0) {
+            /* If max retries is 6 and start timeout is 21ms, the total timeout
+             * is about 21 + 42 + 84 + 168 + 336 + 672 = 1263ms. */
+            av_usleep(timeout * 1000);
+            timeout *= 2;
+
+            if (ret == AVERROR(EAGAIN) && fast_retries) {
+                fast_retries--;
+                continue;
+            }
+
+            av_log(s, AV_LOG_ERROR, "Failed to read STUN binding response, retries=%d\n", rtc->ice_arq_max);
+            goto end;
+        }
+
+        /* If got any binding response, the fast retransmission is done. */
+        if (ice_is_binding_response(res_buf, ret))
+            break;
+
+        /* When a binding request is received, it is necessary to respond immediately. */
+        if (ice_is_binding_request(res_buf, ret) && (ret = ice_handle_binding_request(s, res_buf, ret)) < 0)
+            goto end;
+    }
+
+    /* Wait just for a small while to get the possible binding request from server. */
+    fast_retries = rtc->ice_arq_max / 2;
+    timeout = rtc->ice_arq_timeout;
+    while (fast_retries) {
+        ret = ffurl_read(rtc->udp_uc, res_buf, sizeof(res_buf));
+        if (ret < 0) {
+            /* If max retries is 6 and start timeout is 21ms, the total timeout
+             * is about 21 + 42 + 84 = 147ms. */
+            av_usleep(timeout * 1000);
+            timeout *= 2;
+
+            if (ret == AVERROR(EAGAIN)) {
+                fast_retries--;
+                continue;
+            }
+
+            av_log(s, AV_LOG_ERROR, "Failed to read STUN binding request, retries=%d\n", rtc->ice_arq_max);
+            goto end;
+        }
+
+        /* When a binding request is received, it is necessary to respond immediately. */
+        if (ice_is_binding_request(res_buf, ret)) {
+            if ((ret = ice_handle_binding_request(s, res_buf, ret)) < 0)
+                goto end;
+            break;
+        }
+    }
+
+    av_log(s, AV_LOG_INFO, "WHIP: ICE STUN ok, url=udp://%s:%d, location=%s, username=%s:%s, req=%dB, res=%dB, arq=%d\n",
+        rtc->ice_host, rtc->ice_port, rtc->whip_resource_url ? rtc->whip_resource_url : "",
+        rtc->ice_ufrag_remote, rtc->ice_ufrag_local, size, ret,
+        rtc->ice_arq_max - fast_retries);
+    ret = 0;
+
+end:
+    return ret;
+}
+
+/**
+ * Establish the SRTP context using the keying material exported from DTLS.
+ *
+ * Create separate SRTP contexts for sending video and audio, as their sequences differ
+ * and should not share a single context. Generate a single SRTP context for receiving
+ * RTCP only.
+ *
+ * @return 0 if OK, AVERROR_xxx on error
+ */
+static int setup_srtp(AVFormatContext *s)
+{
+    int ret;
+    char recv_key[DTLS_SRTP_MASTER_KEY_LEN], send_key[DTLS_SRTP_MASTER_KEY_LEN];
+    char buf[AV_BASE64_SIZE(DTLS_SRTP_MASTER_KEY_LEN)];
+    const char* suite = "AES_CM_128_HMAC_SHA1_80";
+    RTCContext *rtc = s->priv_data;
+
+    /* As DTLS client, the send key is client master key plus salt. */
+    memcpy(send_key, rtc->dtls_ctx.dtls_srtp_material, 16);
+    memcpy(send_key + 16, rtc->dtls_ctx.dtls_srtp_material + 32, 14);
+
+    /* As DTLS client, the recv key is server master key plus salt. */
+    memcpy(recv_key, rtc->dtls_ctx.dtls_srtp_material + 16, 16);
+    memcpy(recv_key + 16, rtc->dtls_ctx.dtls_srtp_material + 46, 14);
+
+    /* Setup SRTP context for outgoing packets */
+    if (!av_base64_encode(buf, sizeof(buf), send_key, sizeof(send_key))) {
+        av_log(s, AV_LOG_ERROR, "Failed to encode send key\n");
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    ret = ff_srtp_set_crypto(&rtc->srtp_audio_send, suite, buf);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to set crypto for audio send\n");
+        goto end;
+    }
+
+    ret = ff_srtp_set_crypto(&rtc->srtp_video_send, suite, buf);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to set crypto for video send\n");
+        goto end;
+    }
+
+    ret = ff_srtp_set_crypto(&rtc->srtp_rtcp_send, suite, buf);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to set crypto for rtcp send\n");
+        goto end;
+    }
+
+    /* Setup SRTP context for incoming packets */
+    if (!av_base64_encode(buf, sizeof(buf), recv_key, sizeof(recv_key))) {
+        av_log(s, AV_LOG_ERROR, "Failed to encode recv key\n");
+        ret = AVERROR(EIO);
+        goto end;
+    }
+
+    ret = ff_srtp_set_crypto(&rtc->srtp_recv, suite, buf);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to set crypto for recv\n");
+        goto end;
+    }
+
+    av_log(s, AV_LOG_INFO, "WHIP: SRTP setup done, suite=%s, key=%luB\n", suite, sizeof(send_key));
+
+end:
+    return ret;
+}
+
+/**
+ * Creates dedicated RTP muxers for each stream in the AVFormatContext to build RTP
+ * packets from the encoded frames.
+ *
+ * The corresponding SRTP context is utilized to encrypt each stream's RTP packets. For
+ * example, a video SRTP context is used for the video stream. Additionally, the
+ * "on_rtp_write_packet" callback function is set as the write function for each RTP
+ * muxer to send out encrypted RTP packets.
+ *
+ * @return 0 if OK, AVERROR_xxx on error
+ */
+static int create_rtp_muxer(AVFormatContext *s)
+{
+    int ret, i, is_video, buffer_size, max_packet_size;
+    AVFormatContext *rtp_ctx = NULL;
+    AVDictionary *opts = NULL;
+    uint8_t *buffer = NULL;
+    char buf[64];
+    RTCContext *rtc = s->priv_data;
+
+    const AVOutputFormat *rtp_format = av_guess_format("rtp", NULL, NULL);
+    if (!rtp_format) {
+        av_log(s, AV_LOG_ERROR, "Failed to guess rtp muxer\n");
+        ret = AVERROR(ENOSYS);
+        goto end;
+    }
+
+    /* The UDP buffer size, may greater than MTU. */
+    buffer_size = MAX_UDP_BUFFER_SIZE;
+    /* The RTP payload max size. Reserved some bytes for SRTP checksum and padding. */
+    max_packet_size = rtc->pkt_size - DTLS_SRTP_CHECKSUM_LEN;
+
+    for (i = 0; i < s->nb_streams; i++) {
+        rtp_ctx = avformat_alloc_context();
+        if (!rtp_ctx) {
+            ret = AVERROR(ENOMEM);
+            goto end;
+        }
+
+        rtp_ctx->oformat = rtp_format;
+        if (!avformat_new_stream(rtp_ctx, NULL)) {
+            ret = AVERROR(ENOMEM);
+            goto end;
+        }
+        /* Pass the interrupt callback on */
+        rtp_ctx->interrupt_callback = s->interrupt_callback;
+        /* Copy the max delay setting; the rtp muxer reads this. */
+        rtp_ctx->max_delay = s->max_delay;
+        /* Copy other stream parameters. */
+        rtp_ctx->streams[0]->sample_aspect_ratio = s->streams[i]->sample_aspect_ratio;
+        rtp_ctx->flags |= s->flags & AVFMT_FLAG_BITEXACT;
+        rtp_ctx->strict_std_compliance = s->strict_std_compliance;
+
+        /* Set the synchronized start time. */
+        rtp_ctx->start_time_realtime = s->start_time_realtime;
+
+        avcodec_parameters_copy(rtp_ctx->streams[0]->codecpar, s->streams[i]->codecpar);
+        rtp_ctx->streams[0]->time_base = s->streams[i]->time_base;
+
+        buffer = av_malloc(buffer_size);
+        if (!buffer) {
+            ret = AVERROR(ENOMEM);
+            goto end;
+        }
+
+        rtp_ctx->pb = avio_alloc_context(buffer, buffer_size, 1, s, NULL, on_rtp_write_packet, NULL);
+        if (!rtp_ctx->pb) {
+            ret = AVERROR(ENOMEM);
+            goto end;
+        }
+        rtp_ctx->pb->max_packet_size = max_packet_size;
+        rtp_ctx->pb->av_class = &ff_avio_class;
+
+        is_video = s->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO;
+        snprintf(buf, sizeof(buf), "%d", is_video? rtc->video_payload_type : rtc->audio_payload_type);
+        av_dict_set(&opts, "payload_type", buf, 0);
+        snprintf(buf, sizeof(buf), "%d", is_video? rtc->video_ssrc : rtc->audio_ssrc);
+        av_dict_set(&opts, "ssrc", buf, 0);
+
+        ret = avformat_write_header(rtp_ctx, &opts);
+        if (ret < 0) {
+            av_log(s, AV_LOG_ERROR, "Failed to write rtp header\n");
+            goto end;
+        }
+
+        ff_format_set_url(rtp_ctx, av_strdup(s->url));
+        s->streams[i]->time_base = rtp_ctx->streams[0]->time_base;
+        s->streams[i]->priv_data = rtp_ctx;
+        rtp_ctx = NULL;
+    }
+
+    av_log(s, AV_LOG_INFO, "WHIP: Create RTP muxer OK, buffer_size=%d, max_packet_size=%d\n",
+        buffer_size, max_packet_size);
+
+end:
+    if (rtp_ctx)
+        avio_context_free(&rtp_ctx->pb);
+    avformat_free_context(rtp_ctx);
+    av_dict_free(&opts);
+    return ret;
+}
+
+/**
+ * Callback triggered by the RTP muxer when it creates and sends out an RTP packet.
+ *
+ * This function modifies the video STAP packet, removing the markers, and updating the
+ * NRI of the first NALU. Additionally, it uses the corresponding SRTP context to encrypt
+ * the RTP packet, where the video packet is handled by the video SRTP context.
+ */
+static int on_rtp_write_packet(void *opaque, uint8_t *buf, int buf_size)
+{
+    int ret, cipher_size, is_rtcp, is_video;
+    uint8_t payload_type, nalu_header;
+    char cipher[MAX_UDP_BUFFER_SIZE];
+    AVFormatContext *s = opaque;
+    RTCContext *rtc = s->priv_data;
+    struct SRTPContext *srtp;
+
+    /* Ignore if not RTP or RTCP packet. */
+    if (buf_size < 12 || (buf[0] & 0xC0) != 0x80)
+        return 0;
+
+    /* Only support audio, video and rtcp. */
+    is_rtcp = buf[1] >= 192 && buf[1] <= 223;
+    payload_type = buf[1] & 0x7f;
+    is_video = payload_type == rtc->video_payload_type;
+    if (!is_rtcp && payload_type != rtc->video_payload_type && payload_type != rtc->audio_payload_type)
+        return 0;
+
+    /**
+     * For video, the STAP-A with SPS/PPS should:
+     * 1. The marker bit should be 0, never be 1.
+     * 2. The NRI should equal to the first NALU's.
+     */
+    if (is_video && buf_size > 12) {
+        nalu_header = buf[12] & 0x1f;
+        if (nalu_header == NALU_TYPE_STAP_A) {
+            /* Reset the marker bit to 0. */
+            if (buf[1] & 0x80)
+                buf[1] &= 0x7f;
+
+            /* Reset the NRI to the first NALU's NRI. */
+            if (buf_size > 15 && (buf[15]&0x60) != (buf[12]&0x60))
+                buf[12] = (buf[12]&0x80) | (buf[15]&0x60) | (buf[12]&0x1f);
+        }
+    }
+
+    /* Get the corresponding SRTP context. */
+    srtp = is_rtcp ? &rtc->srtp_rtcp_send : (is_video? &rtc->srtp_video_send : &rtc->srtp_audio_send);
+
+    /* Encrypt by SRTP and send out. */
+    cipher_size = ff_srtp_encrypt(srtp, buf, buf_size, cipher, sizeof(cipher));
+    if (cipher_size <= 0 || cipher_size < buf_size) {
+        av_log(s, AV_LOG_WARNING, "Failed to encrypt packet=%dB, cipher=%dB\n", buf_size, cipher_size);
+        return 0;
+    }
+
+    ret = ffurl_write(rtc->udp_uc, cipher, cipher_size);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to write packet=%dB, ret=%d\n", cipher_size, ret);
+        return ret;
+    }
+
+    return ret;
+}
+
+/**
+ * Inserts the SPS/PPS data before each IDR (Instantaneous Decoder Refresh) frame.
+ *
+ * The SPS/PPS is parsed from the extradata. If it's in ISOM format, the SPS/PPS is
+ * multiplexed to the data field of the packet. If it's in annexb format, then the entire
+ * extradata is set to the data field of the packet.
+ */
+static int insert_sps_pps_packet(AVFormatContext *s, AVPacket *pkt)
+{
+    int ret, is_idr, size, i;
+    uint8_t *p;
+    AVPacket* extra = NULL;
+    AVStream *st = s->streams[pkt->stream_index];
+    AVFormatContext *rtp_ctx = st->priv_data;
+    RTCContext *rtc = s->priv_data;
+
+    is_idr = (pkt->flags & AV_PKT_FLAG_KEY) && st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO;
+    if (!is_idr || !st->codecpar->extradata)
+        return 0;
+
+    extra = av_packet_alloc();
+    if (!extra)
+        return AVERROR(ENOMEM);
+
+    size = !rtc->avc_nal_length_size ? st->codecpar->extradata_size :
+            rtc->avc_nal_length_size * 2 + rtc->avc_sps_size + rtc->avc_pps_size;
+    ret = av_new_packet(extra, size);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to allocate extra packet\n");
+        goto end;
+    }
+
+    /* Encode SPS/PPS in annexb format. */
+    if (!rtc->avc_nal_length_size) {
+        memcpy(extra->data, st->codecpar->extradata, size);
+    } else {
+        /* Encode SPS/PPS in ISOM format. */
+        p = extra->data;
+        for (i = 0; i < rtc->avc_nal_length_size; i++) {
+            *p++ = rtc->avc_sps_size >> (8 * (rtc->avc_nal_length_size - i - 1));
+        }
+        memcpy(p, rtc->avc_sps, rtc->avc_sps_size);
+        p += rtc->avc_sps_size;
+
+        /* Encode PPS in ISOM format. */
+        for (i = 0; i < rtc->avc_nal_length_size; i++) {
+            *p++ = rtc->avc_pps_size >> (8 * (rtc->avc_nal_length_size - i - 1));
+        }
+        memcpy(p, rtc->avc_pps, rtc->avc_pps_size);
+        p += rtc->avc_pps_size;
+    }
+
+    /* Setup packet and feed it to chain. */
+    extra->pts = pkt->pts;
+    extra->dts = pkt->dts;
+    extra->stream_index = pkt->stream_index;
+    extra->time_base = pkt->time_base;
+
+    ret = ff_write_chained(rtp_ctx, 0, extra, s, 0);
+    if (ret < 0)
+        goto end;
+
+end:
+    av_packet_free(&extra);
+    return ret;
+}
+
+/**
+ * RTC is connectionless, for it's based on UDP, so it check whether sesison is
+ * timeout. In such case, publishers can't republish the stream util the session
+ * is timeout.
+ * This function is called to notify the server that the stream is ended, server
+ * should expire and close the session immediately, so that publishers can republish
+ * the stream quickly.
+ */
+static int whip_dispose(AVFormatContext *s)
+{
+    int ret;
+    char buf[MAX_URL_SIZE];
+    URLContext *whip_uc = NULL;
+    RTCContext *rtc = s->priv_data;
+
+    if (!rtc->whip_resource_url)
+        return 0;
+
+    ret = ffurl_alloc(&whip_uc, rtc->whip_resource_url, AVIO_FLAG_READ_WRITE, &s->interrupt_callback);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to alloc WHIP delete context: %s\n", s->url);
+        goto end;
+    }
+
+    av_opt_set(whip_uc->priv_data, "chunked_post", "0", 0);
+    av_opt_set(whip_uc->priv_data, "method", "DELETE", 0);
+    ret = ffurl_connect(whip_uc, NULL);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to DELETE url=%s\n", rtc->whip_resource_url);
+        goto end;
+    }
+
+    while (1) {
+        ret = ffurl_read(whip_uc, buf, sizeof(buf));
+        if (ret == AVERROR_EOF) {
+            ret = 0;
+            break;
+        }
+        if (ret < 0) {
+            av_log(s, AV_LOG_ERROR, "Failed to read response from DELETE url=%s\n", rtc->whip_resource_url);
+            goto end;
+        }
+    }
+
+    av_log(s, AV_LOG_INFO, "WHIP: Dispose resource %s ok\n", rtc->whip_resource_url);
+
+end:
+    ffurl_closep(&whip_uc);
+    return ret;
+}
+
+static av_cold int rtc_init(AVFormatContext *s)
+{
+    int ret;
+    RTCContext *rtc = s->priv_data;
+
+    if ((ret = whip_init(s)) < 0)
+        return ret;
+
+    if ((ret = parse_codec(s)) < 0)
+        return ret;
+
+    if ((ret = generate_sdp_offer(s)) < 0)
+        return ret;
+
+    if ((ret = exchange_sdp(s)) < 0)
+        return ret;
+
+    if ((ret = parse_answer(s)) < 0)
+        return ret;
+
+    if ((ret = ice_handshake(s)) < 0)
+        return ret;
+
+    /* Now UDP URL context is ready, setup the DTLS transport. */
+    rtc->dtls_ctx.udp_uc = rtc->udp_uc;
+
+    if ((ret = dtls_context_handshake(&rtc->dtls_ctx)) < 0)
+        return ret;
+
+    if ((ret = setup_srtp(s)) < 0)
+        return ret;
+
+    if ((ret = create_rtp_muxer(s)) < 0)
+        return ret;
+
+    return ret;
+}
+
+static int rtc_write_packet(AVFormatContext *s, AVPacket *pkt)
+{
+    int ret;
+    RTCContext *rtc = s->priv_data;
+    AVStream *st = s->streams[pkt->stream_index];
+    AVFormatContext *rtp_ctx = st->priv_data;
+
+    /* TODO: Send binding request every 1s as WebRTC heartbeat. */
+    /* TODO: Receive packets from the server such as ICE binding requests, DTLS messages,
+     * and RTCP like PLI requests, then respond to them.*/
+
+    /* For audio OPUS stream, correct the timestamp. */
+    if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
+        pkt->dts = pkt->pts = rtc->audio_jitter_base;
+        rtc->audio_jitter_base += 960;
+    }
+
+    ret = insert_sps_pps_packet(s, pkt);
+    if (ret < 0) {
+        av_log(s, AV_LOG_ERROR, "Failed to insert SPS/PPS packet\n");
+        return ret;
+    }
+
+    ret = ff_write_chained(rtp_ctx, 0, pkt, s, 0);
+    if (ret < 0) {
+        if (ret == AVERROR(EINVAL)) {
+            av_log(s, AV_LOG_WARNING, "Ignore failed to write packet=%dB, ret=%d\n", pkt->size, ret);
+            ret = 0;
+        } else
+            av_log(s, AV_LOG_ERROR, "Failed to write packet, size=%d\n", pkt->size);
+        return ret;
+    }
+
+    return ret;
+}
+
+static av_cold void rtc_deinit(AVFormatContext *s)
+{
+    int i, ret;
+    RTCContext *rtc = s->priv_data;
+
+    ret = whip_dispose(s);
+    if (ret < 0)
+        av_log(s, AV_LOG_WARNING, "Failed to dispose resource, ret=%d\n", ret);
+
+    for (i = 0; i < s->nb_streams; i++) {
+        AVFormatContext* rtp_ctx = s->streams[i]->priv_data;
+        if (!rtp_ctx)
+            continue;
+
+        av_write_trailer(rtp_ctx);
+        avio_context_free(&rtp_ctx->pb);
+        avformat_free_context(rtp_ctx);
+        s->streams[i]->priv_data = NULL;
+    }
+
+    av_freep(&rtc->avc_sps);
+    av_freep(&rtc->avc_pps);
+    av_freep(&rtc->sdp_offer);
+    av_freep(&rtc->sdp_answer);
+    av_freep(&rtc->whip_resource_url);
+    av_freep(&rtc->ice_ufrag_remote);
+    av_freep(&rtc->ice_pwd_remote);
+    av_freep(&rtc->ice_protocol);
+    av_freep(&rtc->ice_host);
+    ffurl_closep(&rtc->udp_uc);
+    ff_srtp_free(&rtc->srtp_audio_send);
+    ff_srtp_free(&rtc->srtp_video_send);
+    ff_srtp_free(&rtc->srtp_rtcp_send);
+    ff_srtp_free(&rtc->srtp_recv);
+    dtls_context_deinit(&rtc->dtls_ctx);
+}
+
+#define OFFSET(x) offsetof(RTCContext, x)
+#define DEC AV_OPT_FLAG_DECODING_PARAM
+static const AVOption options[] = {
+    { "ice_arq_max",        "Maximum number of retransmissions for the ICE ARQ mechanism",      OFFSET(ice_arq_max),        AV_OPT_TYPE_INT,    { .i64 = 5 },       -1, INT_MAX, DEC },
+    { "ice_arq_timeout",    "Start timeout in milliseconds for the ICE ARQ mechanism",          OFFSET(ice_arq_timeout),    AV_OPT_TYPE_INT,    { .i64 = 30 },      -1, INT_MAX, DEC },
+    { "dtls_arq_max",       "Maximum number of retransmissions for the DTLS ARQ mechanism",     OFFSET(dtls_arq_max),       AV_OPT_TYPE_INT,    { .i64 = 5 },       -1, INT_MAX, DEC },
+    { "dtls_arq_timeout",   "Start timeout in milliseconds for the DTLS ARQ mechanism",         OFFSET(dtls_arq_timeout),   AV_OPT_TYPE_INT,    { .i64 = 50 },     -1, INT_MAX, DEC },
+    { "pkt_size",           "The maximum size, in bytes, of RTP packets that send out",         OFFSET(pkt_size),           AV_OPT_TYPE_INT,    { .i64 = 1500 },    -1, INT_MAX, DEC },
+    { NULL },
+};
+
+static const AVClass rtc_muxer_class = {
+    .class_name = "WebRTC muxer",
+    .item_name  = av_default_item_name,
+    .option     = options,
+    .version    = LIBAVUTIL_VERSION_INT,
+};
+
+const FFOutputFormat ff_rtc_muxer = {
+    .p.name             = "rtc",
+    .p.long_name        = NULL_IF_CONFIG_SMALL("WHIP WebRTC muxer"),
+    .p.audio_codec      = AV_CODEC_ID_OPUS,
+    .p.video_codec      = AV_CODEC_ID_H264,
+    .p.flags            = AVFMT_GLOBALHEADER | AVFMT_NOFILE,
+    .p.priv_class       = &rtc_muxer_class,
+    .priv_data_size     = sizeof(RTCContext),
+    .init               = rtc_init,
+    .write_packet       = rtc_write_packet,
+    .deinit             = rtc_deinit,
+};