From patchwork Mon Oct 24 20:03:51 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 38968
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 24 Oct 2022 17:03:51 -0300
Message-Id: <20221024200351.15126-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.38.1
Subject: [FFmpeg-devel] [PATCH] x86/intreadwrite: use intrinsics instead of inline asm for AV_ZERO128
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

When called inside a loop, the inline asm version results in one pxor
unnecessarily emitted per iteration, as the
contents of the __asm__() block are opaque to the compiler's instruction
scheduler. This is not the case with intrinsics, where pxor will be emitted
once with any half decent compiler.

The code can be adapted to also work with MSVC, but for now, it will work with
the same compilers previously supported (GCC, Clang, etc).

Signed-off-by: James Almer
---
 configure                    |  3 +++
 libavutil/x86/intreadwrite.h | 15 +++++++--------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index c5a466657f..5bb83f5b5a 100755
--- a/configure
+++ b/configure
@@ -2222,6 +2222,7 @@ HEADERS_LIST="
 
 INTRINSICS_LIST="
     intrinsics_neon
+    intrinsics_sse2
 "
 
 COMPLEX_FUNCS="
@@ -2636,6 +2637,7 @@ armv6t2_deps="arm"
 armv8_deps="aarch64"
 neon_deps_any="aarch64 arm"
 intrinsics_neon_deps="neon"
+intrinsics_sse2_deps="sse2"
 vfp_deps_any="aarch64 arm"
 vfpv3_deps="vfp"
 setend_deps="arm"
@@ -6207,6 +6209,7 @@ elif enabled loongarch; then
 fi
 
 check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
+check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"
 
 check_ldflags -Wl,--as-needed
 check_ldflags -Wl,-z,noexecstack
diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
index 40f375b013..4a03e60fc6 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -21,6 +21,9 @@
 #ifndef AVUTIL_X86_INTREADWRITE_H
 #define AVUTIL_X86_INTREADWRITE_H
 
+#if HAVE_INTRINSICS_SSE2
+#include
+#endif
 #include
 #include "config.h"
 #include "libavutil/attributes.h"
@@ -79,20 +82,16 @@ static av_always_inline void AV_COPY128(void *d, const void *s)
 
 #endif /* __SSE__ */
 
-#ifdef __SSE2__
+#if HAVE_INTRINSICS_SSE2 && defined(__SSE2__)
 
 #define AV_ZERO128 AV_ZERO128
 static av_always_inline void AV_ZERO128(void *d)
 {
-    struct v {uint64_t v[2];};
-
-    __asm__("pxor %%xmm0, %%xmm0 \n\t"
-            "movdqa %%xmm0, %0 \n\t"
-            : "=m"(*(struct v*)d)
-            :: "xmm0");
+    __m128i zero = _mm_setzero_si128();
+    _mm_store_si128(d, zero);
 }
 
-#endif /* __SSE2__ */
+#endif /* HAVE_INTRINSICS_SSE2 && defined(__SSE2__) */
 
 #endif /* HAVE_MMX */
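
For illustration only, not part of the patch: a minimal standalone sketch of
the two forms of the 128-bit zeroing helper, each called from a loop as the
commit message describes. The names block128, zero128_intrin, zero128_asm and
the zero_blocks_* callers are hypothetical and do not exist in FFmpeg; the
memset() fallback is an assumption added so the sketch also builds on targets
without SSE2.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: __m128i, _mm_setzero_si128, _mm_store_si128 */
#endif

/* Hypothetical 16-byte-aligned block type, used only for this sketch. */
typedef struct { _Alignas(16) uint8_t b[16]; } block128;

/* Intrinsics form, mirroring the new AV_ZERO128; d must be 16-byte aligned. */
static inline void zero128_intrin(void *d)
{
#if defined(__SSE2__)
    _mm_store_si128((__m128i *)d, _mm_setzero_si128());
#else
    memset(d, 0, 16);  /* assumed fallback so the sketch builds without SSE2 */
#endif
}

/* Inline asm form, mirroring the old AV_ZERO128 (GCC/Clang syntax only). */
static inline void zero128_asm(void *d)
{
#if defined(__SSE2__) && defined(__GNUC__)
    struct v { uint64_t v[2]; };
    __asm__("pxor   %%xmm0, %%xmm0 \n\t"
            "movdqa %%xmm0, %0     \n\t"
            : "=m"(*(struct v *)d)
            :: "xmm0");
#else
    memset(d, 0, 16);  /* assumed fallback, as above */
#endif
}

/* Hypothetical callers: zero n consecutive blocks in a loop. */
static void zero_blocks_intrin(block128 *blk, int n)
{
    for (int i = 0; i < n; i++)
        zero128_intrin(&blk[i]);
}

static void zero_blocks_asm(block128 *blk, int n)
{
    for (int i = 0; i < n; i++)
        zero128_asm(&blk[i]);
}
```

Both callers leave the buffer in the same state. Building with optimization
and inspecting the assembly (for example with gcc -O2 -S) should show whether
the register zeroing sits inside or outside each loop; the exact output
depends on the compiler and its version.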