From: Clément Bœsch
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 2 Sep 2017 20:17:38 +0200
Message-Id: <20170902181740.23104-1-u@pkh.me>
X-Patchwork-Id: 4950
Subject: [FFmpeg-devel] [PATCH 1/3] checkasm: use perf API on Linux ARM*

From: Clément Bœsch

On ARM platforms, accessing the PMU registers requires special user access
permissions. Since there is no other way to get accurate timers, the current
implementation of timers in FFmpeg uses these registers. Unfortunately,
enabling user access to these registers on Linux is not trivial, and generally
involves compiling a random and unreliable GitHub kernel module, or somehow
patching your kernel.

Such a module is very unlikely to reach upstream anytime soon. Quoting Robin
Murphin from ARM:

> Say you do give userspace direct access to the PMU; now run two or more
> programs at once that believe they can use the counters for their own
> "minimal-overhead" profiling. Have fun interpreting those results...
>
> And that's not even getting into the implications of scheduling across
> different CPUs, CPUidle, etc.
> where the PMU state is completely beyond userspace's control. In general,
> the plan to provide userspace with something which might happen to just
> about work in a few corner cases, but is meaningless, misleading or
> downright broken in all others, is to never do so.

As a result, the alternative is to use the Linux Performance Monitoring API,
which makes use of these registers internally (assuming the PMU of your ARM
board is supported in the kernel, which is definitely not a given...).

While the Linux API is obviously cross-platform, it does have a significant
overhead which needs to be taken into account. As a result, that mode is only
weakly enabled, and only on ARM platforms.

Note on the inflexibility of the implementation: the timers (native FFmpeg vs
Linux API) are selected at compilation time to avoid the need for function
calls, which would have a negative impact on the cycle counters.
---
 configure                 |   3 ++
 tests/checkasm/checkasm.c | 107 +++++++++++++++++++++++++++++++++++++---------
 tests/checkasm/checkasm.h |  47 +++++++++++++++++---
 3 files changed, 132 insertions(+), 25 deletions(-)

diff --git a/configure b/configure
index 445d953e4f..dc65adfde0 100755
--- a/configure
+++ b/configure
@@ -448,6 +448,7 @@ Developer options (useful when working on FFmpeg itself):
   --libfuzzer=PATH         path to libfuzzer
   --ignore-tests=TESTS     comma-separated list (without "fate-" prefix in the name)
                            of tests whose result is ignored
+  --enable-linux-perf      enable Linux Performance Monitor API

 NOTE: Object files are built at the place where configure is launched.
 EOF
@@ -1694,6 +1695,7 @@ CONFIG_LIST="
     $SUBSYSTEM_LIST
     autodetect
     fontconfig
+    linux_perf
     memory_poisoning
     neon_clobber_test
     ossfuzz
@@ -5013,6 +5015,7 @@ case $target_os in
     linux)
         enable dv1394
         enable section_data_rel_ro
+        enabled_any arm aarch64 && enable_weak linux_perf
         ;;
     irix*)
         target_os=irix
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 9173ed19d9..bc5193c6e3 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -20,6 +20,14 @@
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
  */

+#include "config.h"
+
+#if CONFIG_LINUX_PERF
+# ifndef _GNU_SOURCE
+#  define _GNU_SOURCE // for syscall (performance monitoring API)
+# endif
+#endif
+
 #include <stdarg.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -190,8 +198,7 @@ typedef struct CheckasmFuncVersion {
     void *func;
     int ok;
     int cpu;
-    int iterations;
-    uint64_t cycles;
+    CheckasmPerf perf;
 } CheckasmFuncVersion;

 /* Binary search tree node */
@@ -212,7 +219,11 @@ static struct {
     int bench_pattern_len;
     int num_checked;
     int num_failed;
+
+    /* perf */
     int nop_time;
+    int sysfd;
+
     int cpu_flag;
     const char *cpu_flag_name;
     const char *test_name;
@@ -396,7 +407,6 @@ static const char *cpu_suffix(int cpu)
     return "c";
 }

-#ifdef AV_READ_TIME
 static int cmp_nop(const void *a, const void *b)
 {
     return *(const uint16_t*)a - *(const uint16_t*)b;
@@ -407,10 +417,13 @@ static int measure_nop_time(void)
 {
     uint16_t nops[10000];
     int i, nop_sum = 0;
+    av_unused const int sysfd = state.sysfd;

+    uint64_t t = 0;
     for (i = 0; i < 10000; i++) {
-        uint64_t t = AV_READ_TIME();
-        nops[i] = AV_READ_TIME() - t;
+        PERF_START(t);
+        PERF_STOP(t);
+        nops[i] = t;
     }

     qsort(nops, 10000, sizeof(uint16_t), cmp_nop);
@@ -430,8 +443,9 @@ static void print_benchs(CheckasmFunc *f)
         if (f->versions.cpu || f->versions.next) {
             CheckasmFuncVersion *v = &f->versions;
             do {
-                if (v->iterations) {
-                    int decicycles = (10*v->cycles/v->iterations - state.nop_time) / 4;
+                CheckasmPerf *p = &v->perf;
+                if (p->iterations) {
+                    int decicycles = (10*p->cycles/p->iterations - state.nop_time) / 4;
                     printf("%s_%s: %d.%d\n", f->name, cpu_suffix(v->cpu), decicycles/10, decicycles%10);
                 }
             } while ((v = v->next));
@@ -440,7 +454,6 @@ static void print_benchs(CheckasmFunc *f)
         print_benchs(f->child[1]);
     }
 }
-#endif

 /* ASCIIbetical sort except preserving natural order for numbers */
 static int cmp_func_names(const char *a, const char *b)
@@ -543,6 +556,63 @@ static void print_cpu_name(void)
     }
 }

+#if CONFIG_LINUX_PERF
+static int bench_init_linux(void)
+{
+    struct perf_event_attr attr = {
+        .type = PERF_TYPE_HARDWARE,
+        .size = sizeof(struct perf_event_attr),
+        .config = PERF_COUNT_HW_CPU_CYCLES,
+        .disabled = 1, // start counting only on demand
+        .exclude_kernel = 1,
+        .exclude_hv = 1,
+    };
+
+    printf("benchmarking with Linux Perf Monitoring API\n");
+
+    state.sysfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+    if (state.sysfd == -1) {
+        perror("syscall");
+        return -1;
+    }
+    return 0;
+}
+#endif
+
+static int bench_init_ffmpeg(void)
+{
+#ifdef AV_READ_TIME
+    printf("benchmarking with native FFmpeg timers\n");
+    return 0;
+#else
+    fprintf(stderr, "checkasm: --bench is not supported on your system\n");
+    return -1;
+#endif
+}
+
+static int bench_init(void)
+{
+#if CONFIG_LINUX_PERF
+    int ret = bench_init_linux();
+#else
+    int ret = bench_init_ffmpeg();
+#endif
+    if (ret < 0)
+        return ret;
+
+    state.nop_time = measure_nop_time();
+    printf("nop: %d.%d\n", state.nop_time/10, state.nop_time%10);
+    return 0;
+}
+
+static void bench_uninit(void)
+{
+#if CONFIG_LINUX_PERF
+    if (state.sysfd > 0)
+        close(state.sysfd);
+#endif
+}
+
 int main(int argc, char *argv[])
 {
     unsigned int seed = av_get_random_seed();
@@ -560,10 +630,8 @@ int main(int argc, char *argv[])

     while (argc > 1) {
         if (!strncmp(argv[1], "--bench", 7)) {
-#ifndef AV_READ_TIME
-            fprintf(stderr, "checkasm: --bench is not supported on your system\n");
-            return 1;
-#endif
+            if (bench_init() < 0)
+                return 1;
             if (argv[1][7] == '=') {
                 state.bench_pattern = argv[1] + 8;
                 state.bench_pattern_len = strlen(state.bench_pattern);
@@ -591,16 +659,13 @@ int main(int argc, char *argv[])
         ret = 1;
     } else {
         fprintf(stderr, "checkasm: all %d tests passed\n", state.num_checked);
-#ifdef AV_READ_TIME
         if (state.bench_pattern) {
-            state.nop_time = measure_nop_time();
-            printf("nop: %d.%d\n", state.nop_time/10, state.nop_time%10);
             print_benchs(state.funcs);
         }
-#endif
     }

     destroy_func_tree(state.funcs);
+    bench_uninit();
     return ret;
 }

@@ -678,11 +743,13 @@ void checkasm_fail_func(const char *msg, ...)
     }
 }

-/* Update benchmark results of the current function */
-void checkasm_update_bench(int iterations, uint64_t cycles)
+/* Get the benchmark context of the current function */
+CheckasmPerf *checkasm_get_perf_context(void)
 {
-    state.current_func_ver->iterations += iterations;
-    state.current_func_ver->cycles += cycles;
+    CheckasmPerf *perf = &state.current_func_ver->perf;
+    memset(perf, 0, sizeof(*perf));
+    perf->sysfd = state.sysfd;
+    return perf;
 }

 /* Print the outcome of all tests performed since the last time this function was called */
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 3165b21086..72a1fe9a7d 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -25,6 +25,14 @@

 #include <stdint.h>
 #include "config.h"
+
+#if CONFIG_LINUX_PERF
+#include <unistd.h> // read(3)
+#include <sys/ioctl.h>
+#include <asm/unistd.h>
+#include <linux/perf_event.h>
+#endif
+
 #include "libavutil/avstring.h"
 #include "libavutil/cpu.h"
 #include "libavutil/internal.h"
@@ -58,10 +66,12 @@ void checkasm_check_vp8dsp(void);
 void checkasm_check_vp9dsp(void);
 void checkasm_check_videodsp(void);

+struct CheckasmPerf;
+
 void *checkasm_check_func(void *func, const char *name, ...) av_printf_format(2, 3);
 int checkasm_bench_func(void);
 void checkasm_fail_func(const char *msg, ...) av_printf_format(1, 2);
-void checkasm_update_bench(int iterations, uint64_t cycles);
+struct CheckasmPerf *checkasm_get_perf_context(void);
 void checkasm_report(const char *name, ...) av_printf_format(1, 2);

 /* float compare utilities */
@@ -178,32 +188,59 @@ void checkasm_checked_call(void *func, ...);
 #define declare_new_float(ret, ...) declare_new(ret, __VA_ARGS__)
 #endif

+typedef struct CheckasmPerf {
+    int sysfd;
+    uint64_t cycles;
+    int iterations;
+} CheckasmPerf;
+
+#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF
+
+#if CONFIG_LINUX_PERF
+#define PERF_START(t) do { \
+    ioctl(sysfd, PERF_EVENT_IOC_RESET, 0); \
+    ioctl(sysfd, PERF_EVENT_IOC_ENABLE, 0); \
+} while (0)
+#define PERF_STOP(t) do { \
+    ioctl(sysfd, PERF_EVENT_IOC_DISABLE, 0); \
+    read(sysfd, &t, sizeof(t)); \
+} while (0)
+#else
+#define PERF_START(t) t = AV_READ_TIME()
+#define PERF_STOP(t) t = AV_READ_TIME() - t
+#endif
+
 /* Benchmark the function */
-#ifdef AV_READ_TIME
 #define bench_new(...)\
     do {\
         if (checkasm_bench_func()) {\
+            struct CheckasmPerf *perf = checkasm_get_perf_context();\
+            av_unused const int sysfd = perf->sysfd;\
             func_type *tfunc = func_new;\
             uint64_t tsum = 0;\
             int ti, tcount = 0;\
+            uint64_t t = 0; \
             for (ti = 0; ti < BENCH_RUNS; ti++) {\
-                uint64_t t = AV_READ_TIME();\
+                PERF_START(t);\
                 tfunc(__VA_ARGS__);\
                 tfunc(__VA_ARGS__);\
                 tfunc(__VA_ARGS__);\
                 tfunc(__VA_ARGS__);\
-                t = AV_READ_TIME() - t;\
+                PERF_STOP(t);\
                 if (t*tcount <= tsum*4 && ti > 0) {\
                     tsum += t;\
                     tcount++;\
                 }\
             }\
             emms_c();\
-            checkasm_update_bench(tcount, tsum);\
+            perf->cycles += tsum;\
+            perf->iterations += tcount;\
         }\
     } while (0)
 #else
 #define bench_new(...) while(0)
+#define PERF_START(t) while(0)
+#define PERF_STOP(t) while(0)
 #endif

 #endif /* TESTS_CHECKASM_CHECKASM_H */
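
The aggregation done by bench_new and print_benchs can be sketched in isolation. What follows is a minimal illustration, not code from the patch: PerfSketch, perf_sketch_accumulate and perf_sketch_decicycles are hypothetical names, the sample values are made up, and it assumes that the filtered sum and count (tsum and tcount) are what end up stored in the perf context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cycles/iterations pair of CheckasmPerf. */
typedef struct PerfSketch {
    uint64_t cycles;     /* sum of the accepted samples */
    int      iterations; /* number of accepted samples */
} PerfSketch;

/*
 * Accumulate timing samples using the same acceptance test as bench_new:
 * the first sample is always skipped (warm-up), and a sample is rejected
 * when it exceeds 4 times the running mean (t*tcount <= tsum*4).
 */
static void perf_sketch_accumulate(PerfSketch *p, const uint64_t *samples,
                                   size_t n)
{
    uint64_t tsum = 0;
    int tcount = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t t = samples[i];
        if (i > 0 && t * tcount <= tsum * 4) {
            tsum += t;
            tcount++;
        }
    }
    p->cycles     += tsum;
    p->iterations += tcount;
}

/*
 * Convert the accumulated counters to decicycles (tenths of a cycle) per
 * call, as print_benchs does: every sample times 4 calls, and nop_time is
 * the measurement overhead, itself expressed in decicycles.
 * Signed arithmetic is an assumption of this sketch, chosen so that an
 * overhead larger than the measurement does not wrap around.
 */
static int perf_sketch_decicycles(const PerfSketch *p, int nop_time)
{
    assert(p->iterations > 0);
    int64_t per_sample = (int64_t)(10 * p->cycles / (uint64_t)p->iterations);
    return (int)((per_sample - nop_time) / 4);
}
```

For instance, with the samples {1000, 400, 404, 5000, 396} and a nop_time of 40 decicycles, the first sample is skipped and 5000 is rejected as an outlier, leaving 1200 cycles over 3 iterations, which converts to 990 decicycles (99.0 cycles per call).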