From patchwork Sun Jul 10 16:08:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Triang3l X-Patchwork-Id: 36727 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:6da0:b0:8b:e47:9dbf with SMTP id wl32csp943259pzb; Sun, 10 Jul 2022 12:16:19 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tkblJ51qwxbfQXHsv+MfBb2cF/z4j94m8L8B8XZ1sBvgqn3iuhJFQGcB0DJZxM1JJFnEGd X-Received: by 2002:a05:6402:2cd:b0:43a:70f7:1af2 with SMTP id b13-20020a05640202cd00b0043a70f71af2mr19958209edx.357.1657480579649; Sun, 10 Jul 2022 12:16:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657480579; cv=none; d=google.com; s=arc-20160816; b=qC1c1xBi915fW+TF9uN8/aWM/11wjjm7rFhEoy/UyxszCsAFag4o68tGWxyczXevt3 1WltFDGOWC/7rXSGWdszWgkMSkCMZDyFg/kudFg6ZkNsSSIeNLsmqvxZFxx6fqp1ISih WDrSadycO/a7mvgKIIrpGRC/R7hwXTsuTxTd8rl5JbzuJDvt0rzuuNfCvKhy0J0+BRF5 JVahRgHLbAMHb3n12lxO5GYbAM+dtGN7PXMg1yFvJqKMgRQ/WkCAvlMS7j2Sn8Dhzc6T TkD9n7nNAnsI4svVJXtJ/u0D+KHh0RfUT+X1mxWdxTHKwrGmB/rwR0nIsdre3wyEDpUH woqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:content-language:mime-version:user-agent:date :message-id:from:to:dkim-signature:delivered-to; bh=mpU0Kf3jNX1uy+9mvpdkEYRhX6RmCj/XWY2IOH46R8E=; b=cK4q/ka613lduoBAj9qVE3Y2oIsywwO9vsmPjA9TNbSpED9LZPGU8Q/MRHo71++zKk bHYrl0wdzGwrgBk8dlT9hql7x8mvuh156WadreuFJhui8pwgAW6nroO/tEUX2vLfMt2p f2fmtnpFipkTuktNLm/P4N27fVvYROK8ZPZ2caIIVOKDj4N4N9gWRVE6F+nD6ZmnW6L7 YKK7PVyL1QQ2CrWedqajjISewrhPUQMydPFdu67UsputGrN3zaHANWZnH8ajJvIQmP4e kFz83nJcFc4ASgKxnEEz56J3/SbBcxqPFTMDlIDMaoA6xKre9evBwdz4kDG/xfxHXWHa mW0g== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@yandex.ru header.s=mail header.b=NBWGksTq; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=yandex.ru Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id f13-20020a170906824d00b007263b593ee9si6440147ejx.526.2022.07.10.12.16.18; Sun, 10 Jul 2022 12:16:19 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@yandex.ru header.s=mail header.b=NBWGksTq; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=yandex.ru Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 8F39068B71A; Sun, 10 Jul 2022 22:16:15 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from forward108p.mail.yandex.net (forward108p.mail.yandex.net [77.88.28.116]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 0FF8A68B71A for ; Sun, 10 Jul 2022 22:16:09 +0300 (EEST) Received: from myt5-e6c83385b9f2.qloud-c.yandex.net (myt5-e6c83385b9f2.qloud-c.yandex.net [IPv6:2a02:6b8:c12:3ca0:0:640:e6c8:3385]) by forward108p.mail.yandex.net (Yandex) with ESMTP id E605B2678CA6 for ; Sun, 10 Jul 2022 22:08:54 +0300 (MSK) Received: from myt5-89cdf5c4a3a5.qloud-c.yandex.net (myt5-89cdf5c4a3a5.qloud-c.yandex.net [2a02:6b8:c12:289b:0:640:89cd:f5c4]) by myt5-e6c83385b9f2.qloud-c.yandex.net (mxback/Yandex) with ESMTP id eBcbR5O1QH-8sfKFa8K; Sun, 10 Jul 2022 22:08:54 +0300 X-Yandex-Fwd: 2 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1657480134; bh=328pLjvHM+5Rb9nPyQp2tlfkPDqYG5aJoXoL9I9qWwE=; h=Date:Subject:From:To:Message-ID; b=NBWGksTqu6z49WipVjocFtwg7k0N7X+gnIqhouwXdOpMpeDKpuNFTwJ6CBw4IUnY9 OFAix0pasVkUx7avx2zB2sW7XEFGcFCcREkc4tUy8lXVmM0knda2DX3XZO7Dg0Vqlv ulJ9nCi5nRyS5ioEcXNUaTOV80ikWl0c1UyHlfEw= Authentication-Results: myt5-e6c83385b9f2.qloud-c.yandex.net; dkim=pass header.i=@yandex.ru Received: by myt5-89cdf5c4a3a5.qloud-c.yandex.net (smtp/Yandex) with ESMTPSA id cKFzuGwNQ1-8rdGHE2M; Sun, 10 Jul 2022 22:08:54 +0300 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client certificate not present) To: ffmpeg-devel@ffmpeg.org From: Triang3l Message-ID: <82e16738-3cb3-b4e8-cab5-10e9f58622c4@yandex.ru> Date: Sun, 10 Jul 2022 19:08:05 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 Content-Language: en-US Subject: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Kznt9N/nxONm Android, starting from version 6 (API level 23, from the year 2015), requires all shared libraries to be position-independent, and refuses to load shared libraries (which are the most common native binary type on Android as in Android applications, native code is placed in shared libraries accessed via the Java Native Interface) containing dynamic relocations. To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel` macro to obtain addresses of labels, such as lookup table constants, in a way that with CONFIG_PIC being 1, PC-relative addresses of labels are computed via the `adrp` instruction. This approach, however, is suitable only for labels defined in the same object file. For `adrp` to work directly between object files, the linker has to perform a text relocation. And in my scenario (libavcodec being a static library linked to a shared library, though I'm not sure if this is relevant), this resulted in the following LLVM linker errors, making my application not buildable for Android: "relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol ff_cos_32; recompile with -fPIC" "can't create dynamic relocation R_AARCH64_ADD_ABS_LO12_NC against symbol: ff_cos_32 in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output" This commit brings the solution that is already implemented in FFmpeg on AArch32 to AArch64 - a separate macro, `movrelx`, which emits instructions for computing the address of an external label through the Global Object Table (GOT). The same targets as `movrel` is implemented for are covered by this commit. For Apple targets, the instruction sequence is the one that is generated for referencing an extern variable in C code by Clang for the `aarch64-apple-darwin` target triple. For other targets (Linux), this is the sequence emitted by Clang for the `aarch64-unknown-linux-gnu` target, by GCC, and specified in the "Assembly expressions" section of the Arm Compiler Reference Guide. The hardware-assisted AddressSanitizer has no effect on the sequence - Clang generates the same `:got:` and `:got_lo12:` addressing regardless of whether `-fsanitize=hwaddress` is used. Windows has no concept of PIC, and Windows builds should be done with CONFIG_PIC being 0, so it doesn't need to be handled. The literal offset, however, is always applied using a separate `add` or `sub` instruction as the actual address is loaded indirectly for an extern object that is the whole lookup table itself. The only place where the offset is currently used with `movrelx` is VP9, with the offset being 256 or 512 bytes there. Unfortunately, that offset can't be moved to the positive immediate offset encoded in load/store instructions there without major restructuring, as the actual memory accesses are performed in a function that is common to different offset values, with the offset being pre-applied to one of its arguments instead. Without PIC though, `movrelx` is implemented exactly the same as `movrel`, with the offset applied directly to the `ldr` literal, so the non-PIC path is unaffected by this change. Testing was performed on my local build setup for Android based on ndk-build. Two things were tested: - Regression testing was done using the `libavcodec/tests/fft.c` test. `libavcodec` was built as a static library, and the test was built as a native executable (which, unlike shared libraries, isn't required to be position-independent). Both the executable without the changes and the executable with the new code were launched on a physical AArch64 device using Termux. As the length of the instruction sequences for `movrel` and `movrelx` without the offset is the same, comparing the two binaries in a diff tool has shown the expected 13 differences in the code - 12 in `fftN_neon` for different transform sizes, and 1 in `ff_fft_calc_neon`. The results for the FFT test were the same for both executables with different transform size values. - To check sufficiency and suitability for fixing the original issue, the `fft.c` test was converted into a shared library (with the `main` function renamed), and a proxy executable performing `dlopen` of the library and invoking the main test function from it via `dlsym`. Termux is built with `targetSdkVersion` 28, so the `dlopen` rule of Android API levels 23 and above disallowing dynamic relocations should apply to it. The testing device is running Android 11 (API level 30). The test was executed successfully, meaning that no relocations incompatible with PIC are required by libavcodec anymore. Signed-off-by: Triang3l ---  libavcodec/aarch64/fft_neon.S         |  4 ++--  libavcodec/aarch64/sbrdsp_neon.S      |  2 +-  libavcodec/aarch64/vp9mc_16bpp_neon.S |  4 ++--  libavcodec/aarch64/vp9mc_neon.S       |  4 ++--  libavutil/aarch64/asm.S               | 19 +++++++++++++++++++  5 files changed, 26 insertions(+), 7 deletions(-) diff --git a/libavcodec/aarch64/fft_neon.S b/libavcodec/aarch64/fft_neon.S index 9ff3f9c526..d52f14626b 100644 --- a/libavcodec/aarch64/fft_neon.S +++ b/libavcodec/aarch64/fft_neon.S @@ -353,7 +353,7 @@ function fft\n\()_neon, align=6          sub             x0,  x28, #\n4*2*8          ldp             x28, x30, [sp], #16          AARCH64_VALIDATE_LINK_REGISTER -        movrel          x4,  X(ff_cos_\n) +        movrelx         x4,  X(ff_cos_\n)          mov             x2,  #\n4>>1          b               fft_pass_neon  endfunc @@ -384,7 +384,7 @@ function ff_fft_calc_neon, export=1          movrel          x12, pmmp          ldr             x3,  [x3, x2, lsl #3]          movrel          x13, mppm -        movrel          x14, X(ff_cos_16) +        movrelx         x14, X(ff_cos_16)          ld1             {v31.16b}, [x11]          mov             x0,  x1          ld1             {v29.4s},  [x12]         // pmmp diff --git a/libavcodec/aarch64/sbrdsp_neon.S b/libavcodec/aarch64/sbrdsp_neon.S index d23717e760..bbbc3811bd 100644 --- a/libavcodec/aarch64/sbrdsp_neon.S +++ b/libavcodec/aarch64/sbrdsp_neon.S @@ -273,7 +273,7 @@ endfunc  .macro apply_noise_common          sxtw            x3, w3          sxtw            x5, w5 -        movrel          x7, X(ff_sbr_noise_table) +        movrelx         x7, X(ff_sbr_noise_table)          add             x3, x3, #1  1:      and             x3, x3, #0x1ff          add             x8, x7, x3, lsl #3 diff --git a/libavcodec/aarch64/vp9mc_16bpp_neon.S b/libavcodec/aarch64/vp9mc_16bpp_neon.S index 53b372c262..ed472eb144 100644 --- a/libavcodec/aarch64/vp9mc_16bpp_neon.S +++ b/libavcodec/aarch64/vp9mc_16bpp_neon.S @@ -287,7 +287,7 @@ do_8tap_h_size 16  .macro do_8tap_h_func type, filter, offset, size, bpp  function ff_vp9_\type\()_\filter\()\size\()_h_\bpp\()_neon, export=1          mvni            v31.8h, #((0xff << (\bpp - 8)) & 0xff), lsl #8 -        movrel          x6,  X(ff_vp9_subpel_filters), 256*\offset +        movrelx         x6,  X(ff_vp9_subpel_filters), 256*\offset          cmp             w5,  #8          add             x9,  x6,  w5, uxtw #4          mov             x5,  #2*\size @@ -574,7 +574,7 @@ do_8tap_4v avg  function ff_vp9_\type\()_\filter\()\size\()_v_\bpp\()_neon, export=1          uxtw            x4,  w4          mvni            v1.8h, #((0xff << (\bpp - 8)) & 0xff), lsl #8 -        movrel          x5,  X(ff_vp9_subpel_filters), 256*\offset +        movrelx         x5,  X(ff_vp9_subpel_filters), 256*\offset          add             x6,  x5,  w6, uxtw #4          mov             x5,  #\size  .if \size >= 8 diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S index abf2bae9db..48460ae485 100644 --- a/libavcodec/aarch64/vp9mc_neon.S +++ b/libavcodec/aarch64/vp9mc_neon.S @@ -353,7 +353,7 @@ do_8tap_h_size 16  .macro do_8tap_h_func type, filter, offset, size  function ff_vp9_\type\()_\filter\()\size\()_h_neon, export=1 -        movrel          x6,  X(ff_vp9_subpel_filters), 256*\offset +        movrelx         x6,  X(ff_vp9_subpel_filters), 256*\offset          cmp             w5,  #8          add             x9,  x6,  w5, uxtw #4          mov             x5,  #\size @@ -627,7 +627,7 @@ do_8tap_4v avg, 4, 3  .macro do_8tap_v_func type, filter, offset, size  function ff_vp9_\type\()_\filter\()\size\()_v_neon, export=1          uxtw            x4,  w4 -        movrel          x5,  X(ff_vp9_subpel_filters), 256*\offset +        movrelx         x5,  X(ff_vp9_subpel_filters), 256*\offset          cmp             w6,  #8          add             x6,  x5,  w6, uxtw #4          mov             x5,  #\size diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S index a7782415d7..a8207bf38a 100644 --- a/libavutil/aarch64/asm.S +++ b/libavutil/aarch64/asm.S @@ -229,6 +229,25 @@ ELF     .size   \name, . - \name  #endif  .endm +.macro  movrelx rd, val, offset=0 +#if CONFIG_PIC +#if defined(__APPLE__) +        adrp            \rd, \val@GOTPAGE +        ldr             \rd, [\rd, \val@GOTPAGEOFF] +#else +        adrp            \rd, :got:\val +        ldr             \rd, [\rd, :got_lo12:\val] +#endif +    .if \offset > 0 +        add             \rd, \rd, \offset +    .elseif \offset < 0 +        sub             \rd, \rd, -(\offset) +    .endif +#else +        ldr             \rd, =\val+\offset +#endif +.endm +  #define GLUE(a, b) a ## b  #define JOIN(a, b) GLUE(a, b)  #define X(s) JOIN(EXTERN_ASM, s)