From patchwork Thu Jan 6 14:24:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Romain Beauxis X-Patchwork-Id: 33124 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:cd86:0:0:0:0:0 with SMTP id d128csp1695875iog; Thu, 6 Jan 2022 06:24:22 -0800 (PST) X-Google-Smtp-Source: ABdhPJzoIR/0vlMgYnAe1EfPFWUs4Eh8B9rElWnPajI3B0QPbxbdE7dLWGnyLKDPsdR2NU4p7rKj X-Received: by 2002:a50:ed06:: with SMTP id j6mr5782239eds.21.1641479061867; Thu, 06 Jan 2022 06:24:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1641479061; cv=none; d=google.com; s=arc-20160816; b=jXVZWaycrFo/XEvzFbCDw9ceaFurRdL4cG8GwnPALtRqd/aKt+l4F9odxsXjvX7cZF 2jKuKZO7ecN04B2Txj3atJkvcxt9F94m3THV3yoBKolF5O4iFtYaqUz74PEH6EXAeVNh 0Sn030ey9TB3VbSVA20jC9l1W2Ukl5L2PTNqHfSW98mPxKNIaQcJlevTjLXZmEdt4owP uwkHDKjMwiCdTohenoqj6grv9ty2Ezj3lccQo0842201BM0kw+2xvHZ2yymQYXGmcUno 3ZOUMfAXHauzJekxUTETcqyvMdxRmOaZiqcQbgDUln5VVg3p7/yj2vEI6IUYz3EO809D gMjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:to:date:message-id:mime-version:from :delivered-to; bh=h1Mu8/8DEjaO8+QZ4QUaxlMZtf5UkR2QAPrWrVYt/KA=; b=aFPlXDZOEpJ11E2U8DiFrq7ZzdmR/WfRPKAGvqcIpM3iIRU/as7C6SD/W+2KcvsPy6 6lMB4TPrDpOkG4zwjNPX/qZgW7vEBDMDK2OBmDkNzszp2w2euWT93fU3e16ZrnTbVAhC 0OGWOh/eDAPPS7G8JCmjlhuCrnpYTZ6FeuasKqqKv+VZU8BJa3wRRbOYiACkeSc6bExf 0Q2tozHFTHBOFcFBfsx/b9F+l8aqcMZY5KyequvZMu92nzSQVoL8RNbcKsojOfI8DxYw Y1sqXm4HeGYSHVAzbfYlaU5g0IyCVooye9K1JWeh7lwrzhWUV+/CYtt0uShZ9xtMDa5Q TJfA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id w12si1228693edj.233.2022.01.06.06.24.20; Thu, 06 Jan 2022 06:24:21 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 1798468A7FD; Thu, 6 Jan 2022 16:24:17 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-qk1-f178.google.com (mail-qk1-f178.google.com [209.85.222.178]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 2FEF06800E0 for ; Thu, 6 Jan 2022 16:24:11 +0200 (EET) Received: by mail-qk1-f178.google.com with SMTP id f138so2738069qke.10 for ; Thu, 06 Jan 2022 06:24:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:message-id:date:cc:to; bh=9IlDBr9ruBjEvdQbwYZiFCJfL2nazNewlJuxVde7G1U=; b=qgTZXV/d/kKrnMzjnYbIRWXqdTOlcGBAhQpnOh38VX/Aw7Yu2oLF0/g+8zEKICs3s6 VmIuPioEYmPsmmBsBeB6w7qhf/NopYWl5kv5KXXTSDhxXMf3JW82vBL4V+Nv5BmYWYiv 66lAzCsqPE2W91yKqahaaxc3UgrKTB/BnAzPz6s+O1CNYlYNOUgU+6VgIFhS3zgj80Bz 0P2xlFQJL/0GuwG7FvNG/wWyu8b5r2aSraFZVmeZwbY8NQ6+cBb66hzWhxIeI8jOszzx LwDe+z0FCE2h3nypc19OS4GfrrO/xiWjCjL3BbyzbQp4HJ5bxh3pEPCRCjTnaul846T4 MjiQ== X-Gm-Message-State: AOAM5315e0E0H/Kqs+z66Vj1IKU0Z7ojQ6d13VciEtSz6+PMsGdBUFy+ sQRXNzHTgNXMPrJdjQJ6PGTLirib7KEZOA== X-Received: by 2002:a37:b001:: with SMTP id z1mr41450404qke.252.1641479049435; Thu, 06 Jan 2022 06:24:09 -0800 (PST) Received: from smtpclient.apple ([172.58.129.153]) by smtp.gmail.com with ESMTPSA id br43sm1599883qkb.57.2022.01.06.06.24.07 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 06 Jan 2022 06:24:08 -0800 (PST) From: Romain Beauxis Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.40.0.1.81\)) Message-Id: <03970EF9-6434-417E-B32D-70E37743B16E@rastageeks.org> Date: Thu, 6 Jan 2022 08:24:05 -0600 To: ffmpeg-devel@ffmpeg.org X-Mailer: Apple Mail (2.3693.40.0.1.81) Subject: [FFmpeg-devel] [PATCH v9 1/3] libavdevice/avfoundation.m: use AudioConvert, extend supported formats X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: thilo.borgmann@mail.de, Aman Karmani , epirat07@gmail.com Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 7sU7X8dlrseT * Implement support for AudioConverter * Switch to AudioConverter's API to convert unsupported PCM formats (non-interleaved, non-packed) to supported formats * Minimize data copy. This fixes: https://trac.ffmpeg.org/ticket/9502 API ref: https://developer.apple.com/documentation/audiotoolbox/audio_converter_services Signed-off-by: Romain Beauxis --- This is the first patch of a series of 3 that fix, cleanup and enhance the avfoundation implementation for libavdevice. These patches come from an actual user-facing application relying on libavdevice’s implementation of avfoundation audio input. Without them, Avfoundation is practically unusable as it will: * Refuse to process certain specific audio input format that are actually returned by the OS for some users (packed PCM audio) * Drop audio frames, resulting in corrupted audio input. This might have been unnoticed with video frames but this makes avfoundation essentially unusable for audio. The patches are now being included in our production build so they are tested and usable in production. Changelog for this patch: * v2: None * v3: None * v4: None * v5: Fix indentation/wrapping * v6: None * v7: Removed use of kAudioConverterPropertyCalculateOutputBufferSize to calculate output buffer size. The calculation is trivial and this call was randomly failing for no reason * v8: None * v9: None libavdevice/avfoundation.m | 255 +++++++++++++++++++++---------------- 1 file changed, 145 insertions(+), 110 deletions(-) diff --git a/libavdevice/avfoundation.m b/libavdevice/avfoundation.m index 0cd6e646d5..738cd93375 100644 --- a/libavdevice/avfoundation.m +++ b/libavdevice/avfoundation.m @@ -111,16 +111,11 @@ int num_video_devices; - int audio_channels; - int audio_bits_per_sample; - int audio_float; - int audio_be; - int audio_signed_integer; - int audio_packed; - int audio_non_interleaved; - - int32_t *audio_buffer; - int audio_buffer_size; + UInt32 audio_buffers; + UInt32 audio_channels; + UInt32 input_bytes_per_sample; + UInt32 output_bytes_per_sample; + AudioConverterRef audio_converter; enum AVPixelFormat pixel_format; @@ -299,7 +294,10 @@ static void destroy_context(AVFContext* ctx) ctx->avf_delegate = NULL; ctx->avf_audio_delegate = NULL; - av_freep(&ctx->audio_buffer); + if (ctx->audio_converter) { + AudioConverterDispose(ctx->audio_converter); + ctx->audio_converter = NULL; + } pthread_mutex_destroy(&ctx->frame_lock); @@ -673,6 +671,10 @@ static int get_audio_config(AVFormatContext *s) AVFContext *ctx = (AVFContext*)s->priv_data; CMFormatDescriptionRef format_desc; AVStream* stream = avformat_new_stream(s, NULL); + AudioStreamBasicDescription output_format = {0}; + int audio_bits_per_sample, audio_float, audio_be; + int audio_signed_integer, audio_packed, audio_non_interleaved; + int must_convert = 0; if (!stream) { return 1; @@ -690,60 +692,97 @@ static int get_audio_config(AVFormatContext *s) avpriv_set_pts_info(stream, 64, 1, avf_time_base); format_desc = CMSampleBufferGetFormatDescription(ctx->current_audio_frame); - const AudioStreamBasicDescription *basic_desc = CMAudioFormatDescriptionGetStreamBasicDescription(format_desc); + const AudioStreamBasicDescription *input_format = CMAudioFormatDescriptionGetStreamBasicDescription(format_desc); - if (!basic_desc) { + if (!input_format) { unlock_frames(ctx); av_log(s, AV_LOG_ERROR, "audio format not available\n"); return 1; } + if (input_format->mFormatID != kAudioFormatLinearPCM) { + unlock_frames(ctx); + av_log(s, AV_LOG_ERROR, "only PCM audio format are supported at the moment\n"); + return 1; + } + stream->codecpar->codec_type = AVMEDIA_TYPE_AUDIO; - stream->codecpar->sample_rate = basic_desc->mSampleRate; - stream->codecpar->channels = basic_desc->mChannelsPerFrame; + stream->codecpar->sample_rate = input_format->mSampleRate; + stream->codecpar->channels = input_format->mChannelsPerFrame; stream->codecpar->channel_layout = av_get_default_channel_layout(stream->codecpar->channels); - ctx->audio_channels = basic_desc->mChannelsPerFrame; - ctx->audio_bits_per_sample = basic_desc->mBitsPerChannel; - ctx->audio_float = basic_desc->mFormatFlags & kAudioFormatFlagIsFloat; - ctx->audio_be = basic_desc->mFormatFlags & kAudioFormatFlagIsBigEndian; - ctx->audio_signed_integer = basic_desc->mFormatFlags & kAudioFormatFlagIsSignedInteger; - ctx->audio_packed = basic_desc->mFormatFlags & kAudioFormatFlagIsPacked; - ctx->audio_non_interleaved = basic_desc->mFormatFlags & kAudioFormatFlagIsNonInterleaved; - - if (basic_desc->mFormatID == kAudioFormatLinearPCM && - ctx->audio_float && - ctx->audio_bits_per_sample == 32 && - ctx->audio_packed) { - stream->codecpar->codec_id = ctx->audio_be ? AV_CODEC_ID_PCM_F32BE : AV_CODEC_ID_PCM_F32LE; - } else if (basic_desc->mFormatID == kAudioFormatLinearPCM && - ctx->audio_signed_integer && - ctx->audio_bits_per_sample == 16 && - ctx->audio_packed) { - stream->codecpar->codec_id = ctx->audio_be ? AV_CODEC_ID_PCM_S16BE : AV_CODEC_ID_PCM_S16LE; - } else if (basic_desc->mFormatID == kAudioFormatLinearPCM && - ctx->audio_signed_integer && - ctx->audio_bits_per_sample == 24 && - ctx->audio_packed) { - stream->codecpar->codec_id = ctx->audio_be ? AV_CODEC_ID_PCM_S24BE : AV_CODEC_ID_PCM_S24LE; - } else if (basic_desc->mFormatID == kAudioFormatLinearPCM && - ctx->audio_signed_integer && - ctx->audio_bits_per_sample == 32 && - ctx->audio_packed) { - stream->codecpar->codec_id = ctx->audio_be ? AV_CODEC_ID_PCM_S32BE : AV_CODEC_ID_PCM_S32LE; + audio_bits_per_sample = input_format->mBitsPerChannel; + audio_float = input_format->mFormatFlags & kAudioFormatFlagIsFloat; + audio_be = input_format->mFormatFlags & kAudioFormatFlagIsBigEndian; + audio_signed_integer = input_format->mFormatFlags & kAudioFormatFlagIsSignedInteger; + audio_packed = input_format->mFormatFlags & kAudioFormatFlagIsPacked; + audio_non_interleaved = input_format->mFormatFlags & kAudioFormatFlagIsNonInterleaved; + + ctx->input_bytes_per_sample = input_format->mBitsPerChannel >> 3; + ctx->output_bytes_per_sample = ctx->input_bytes_per_sample; + ctx->audio_channels = input_format->mChannelsPerFrame; + + if (audio_non_interleaved) { + ctx->audio_buffers = input_format->mChannelsPerFrame; } else { - unlock_frames(ctx); - av_log(s, AV_LOG_ERROR, "audio format is not supported\n"); - return 1; + ctx->audio_buffers = 1; + } + + if (audio_non_interleaved || !audio_packed) { + must_convert = 1; + } + + output_format.mBitsPerChannel = input_format->mBitsPerChannel; + output_format.mChannelsPerFrame = ctx->audio_channels; + output_format.mFramesPerPacket = 1; + output_format.mBytesPerFrame = output_format.mChannelsPerFrame * ctx->input_bytes_per_sample; + output_format.mBytesPerPacket = output_format.mFramesPerPacket * output_format.mBytesPerFrame; + output_format.mFormatFlags = kAudioFormatFlagIsPacked | audio_be; + output_format.mFormatID = kAudioFormatLinearPCM; + output_format.mReserved = 0; + output_format.mSampleRate = input_format->mSampleRate; + + if (audio_float && + audio_bits_per_sample == 32) { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_F32BE : AV_CODEC_ID_PCM_F32LE; + output_format.mFormatFlags |= kAudioFormatFlagIsFloat; + } else if (audio_float && + audio_bits_per_sample == 64) { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_F64BE : AV_CODEC_ID_PCM_F64LE; + output_format.mFormatFlags |= kAudioFormatFlagIsFloat; + } else if (audio_signed_integer && + audio_bits_per_sample == 8) { + stream->codecpar->codec_id = AV_CODEC_ID_PCM_S8; + output_format.mFormatFlags |= kAudioFormatFlagIsSignedInteger; + } else if (audio_signed_integer && + audio_bits_per_sample == 16) { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_S16BE : AV_CODEC_ID_PCM_S16LE; + output_format.mFormatFlags |= kAudioFormatFlagIsSignedInteger; + } else if (audio_signed_integer && + audio_bits_per_sample == 24) { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_S24BE : AV_CODEC_ID_PCM_S24LE; + output_format.mFormatFlags |= kAudioFormatFlagIsSignedInteger; + } else if (audio_signed_integer && + audio_bits_per_sample == 32) { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_S32BE : AV_CODEC_ID_PCM_S32LE; + output_format.mFormatFlags |= kAudioFormatFlagIsSignedInteger; + } else if (audio_signed_integer && + audio_bits_per_sample == 64) { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_S64BE : AV_CODEC_ID_PCM_S64LE; + output_format.mFormatFlags |= kAudioFormatFlagIsSignedInteger; + } else { + stream->codecpar->codec_id = audio_be ? AV_CODEC_ID_PCM_S32BE : AV_CODEC_ID_PCM_S32LE; + ctx->output_bytes_per_sample = 4; + output_format.mBitsPerChannel = 32; + output_format.mFormatFlags |= kAudioFormatFlagIsSignedInteger; + must_convert = 1; } - if (ctx->audio_non_interleaved) { - CMBlockBufferRef block_buffer = CMSampleBufferGetDataBuffer(ctx->current_audio_frame); - ctx->audio_buffer_size = CMBlockBufferGetDataLength(block_buffer); - ctx->audio_buffer = av_malloc(ctx->audio_buffer_size); - if (!ctx->audio_buffer) { + if (must_convert) { + OSStatus ret = AudioConverterNew(input_format, &output_format, &ctx->audio_converter); + if (ret != noErr) { unlock_frames(ctx); - av_log(s, AV_LOG_ERROR, "error allocating audio buffer\n"); + av_log(s, AV_LOG_ERROR, "Error while allocating audio converter\n"); return 1; } } @@ -1048,6 +1087,7 @@ static int copy_cvpixelbuffer(AVFormatContext *s, static int avf_read_packet(AVFormatContext *s, AVPacket *pkt) { + OSStatus ret; AVFContext* ctx = (AVFContext*)s->priv_data; do { @@ -1091,7 +1131,7 @@ static int avf_read_packet(AVFormatContext *s, AVPacket *pkt) status = copy_cvpixelbuffer(s, image_buffer, pkt); } else { status = 0; - OSStatus ret = CMBlockBufferCopyDataBytes(block_buffer, 0, pkt->size, pkt->data); + ret = CMBlockBufferCopyDataBytes(block_buffer, 0, pkt->size, pkt->data); if (ret != kCMBlockBufferNoErr) { status = AVERROR(EIO); } @@ -1105,21 +1145,60 @@ static int avf_read_packet(AVFormatContext *s, AVPacket *pkt) } } else if (ctx->current_audio_frame != nil) { CMBlockBufferRef block_buffer = CMSampleBufferGetDataBuffer(ctx->current_audio_frame); - int block_buffer_size = CMBlockBufferGetDataLength(block_buffer); - if (!block_buffer || !block_buffer_size) { - unlock_frames(ctx); - return AVERROR(EIO); - } + size_t input_size = CMBlockBufferGetDataLength(block_buffer); + int buffer_size = input_size / ctx->audio_buffers; + int nb_samples = input_size / (ctx->audio_channels * ctx->input_bytes_per_sample); + int output_size = nb_samples * ctx->output_bytes_per_sample * ctx->audio_channels; - if (ctx->audio_non_interleaved && block_buffer_size > ctx->audio_buffer_size) { - unlock_frames(ctx); - return AVERROR_BUFFER_TOO_SMALL; + status = av_new_packet(pkt, output_size); + if (status < 0) { + CFRelease(audio_frame); + return status; } - if (av_new_packet(pkt, block_buffer_size) < 0) { - unlock_frames(ctx); - return AVERROR(EIO); + if (ctx->audio_converter) { + size_t input_buffer_size = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * ctx->audio_buffers); + AudioBufferList *input_buffer = av_malloc(input_buffer_size); + + input_buffer->mNumberBuffers = ctx->audio_buffers; + + for (int c = 0; c < ctx->audio_buffers; c++) { + input_buffer->mBuffers[c].mNumberChannels = 1; + + ret = CMBlockBufferGetDataPointer(block_buffer, c * buffer_size, (size_t *)&input_buffer->mBuffers[c].mDataByteSize, NULL, (void *)&input_buffer->mBuffers[c].mData); + + if (ret != kCMBlockBufferNoErr) { + av_free(input_buffer); + unlock_frames(ctx); + return AVERROR(EIO); + } + } + + AudioBufferList output_buffer = { + .mNumberBuffers = 1, + .mBuffers[0] = { + .mNumberChannels = ctx->audio_channels, + .mDataByteSize = pkt->size, + .mData = pkt->data + } + }; + + ret = AudioConverterConvertComplexBuffer(ctx->audio_converter, nb_samples, input_buffer, &output_buffer); + av_free(input_buffer); + + if (ret != noErr) { + unlock_frames(ctx); + return AVERROR(EIO); + } + + pkt->size = output_buffer.mBuffers[0].mDataByteSize; + } else { + ret = CMBlockBufferCopyDataBytes(block_buffer, 0, pkt->size, pkt->data); + if (ret != kCMBlockBufferNoErr) { + unlock_frames(ctx); + return AVERROR(EIO); + } } CMItemCount count; @@ -1133,54 +1212,10 @@ static int avf_read_packet(AVFormatContext *s, AVPacket *pkt) pkt->stream_index = ctx->audio_stream_index; pkt->flags |= AV_PKT_FLAG_KEY; - if (ctx->audio_non_interleaved) { - int sample, c, shift, num_samples; - - OSStatus ret = CMBlockBufferCopyDataBytes(block_buffer, 0, pkt->size, ctx->audio_buffer); - if (ret != kCMBlockBufferNoErr) { - unlock_frames(ctx); - return AVERROR(EIO); - } - - num_samples = pkt->size / (ctx->audio_channels * (ctx->audio_bits_per_sample >> 3)); - - // transform decoded frame into output format - #define INTERLEAVE_OUTPUT(bps) \ - { \ - int##bps##_t **src; \ - int##bps##_t *dest; \ - src = av_malloc(ctx->audio_channels * sizeof(int##bps##_t*)); \ - if (!src) { \ - unlock_frames(ctx); \ - return AVERROR(EIO); \ - } \ - \ - for (c = 0; c < ctx->audio_channels; c++) { \ - src[c] = ((int##bps##_t*)ctx->audio_buffer) + c * num_samples; \ - } \ - dest = (int##bps##_t*)pkt->data; \ - shift = bps - ctx->audio_bits_per_sample; \ - for (sample = 0; sample < num_samples; sample++) \ - for (c = 0; c < ctx->audio_channels; c++) \ - *dest++ = src[c][sample] << shift; \ - av_freep(&src); \ - } - - if (ctx->audio_bits_per_sample <= 16) { - INTERLEAVE_OUTPUT(16) - } else { - INTERLEAVE_OUTPUT(32) - } - } else { - OSStatus ret = CMBlockBufferCopyDataBytes(block_buffer, 0, pkt->size, pkt->data); - if (ret != kCMBlockBufferNoErr) { - unlock_frames(ctx); - return AVERROR(EIO); - } - } - CFRelease(ctx->current_audio_frame); ctx->current_audio_frame = nil; + + unlock_frames(ctx); } else { pkt->data = NULL; unlock_frames(ctx);