Message ID | 20170918015213.7236-1-jamrial@gmail.com |
---|---|
State | Accepted |
Commit | 18821e3ba1baa8e0fe037e11c77459ebc73f7e37 |
Headers | show |
2017-09-18 3:52 GMT+02:00 James Almer <jamrial@gmail.com>: > From: Henrik Gramner <henrik@gramner.com> > > Tested with "checkasm --test=exrdsp -bench" > > Before: > reorder_pixels_c: 5187.8 > reorder_pixels_sse2: 377.0 > reorder_pixels_avx2: 331.3 > > After: > reorder_pixels_c: 5181.5 > reorder_pixels_sse2: 377.0 > reorder_pixels_avx2: 313.8 > > I don't have the same result using a start/stop timer, but your testing approach is probably better than mine. And like, you both think it's a better way to do it, it's ok for me ! Thanks Martin
On 9/18/2017 4:35 AM, Martin Vignali wrote: > 2017-09-18 3:52 GMT+02:00 James Almer <jamrial@gmail.com>: > >> From: Henrik Gramner <henrik@gramner.com> >> >> Tested with "checkasm --test=exrdsp -bench" >> >> Before: >> reorder_pixels_c: 5187.8 >> reorder_pixels_sse2: 377.0 >> reorder_pixels_avx2: 331.3 >> >> After: >> reorder_pixels_c: 5181.5 >> reorder_pixels_sse2: 377.0 >> reorder_pixels_avx2: 313.8 >> >> I don't have the same result using a start/stop timer, > but your testing approach is probably better than mine. I also had a hard time getting to notice a difference with star/stop_timer, and it's clear why seeing how little difference this change truly makes. You can build the above tool with "make checkasm", and the executable will be in the tests/checkasm folder. The results tend to be less variable and it's better detecting small differences between functions. > And like, you both think it's a better way to do it, it's ok for me ! > > Thanks > > Martin > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >
> > I also had a hard time getting to notice a difference with > star/stop_timer, and it's clear why seeing how little difference this > change truly makes. > > You can build the above tool with "make checkasm", and the executable > will be in the tests/checkasm folder. The results tend to be less > variable and it's better detecting small differences between functions. > > <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel> > Thanks for the explanation, and the checkasm patch for exrdsp (i missed it before) I have similar result than you using checkasm, Martin
diff --git a/libavcodec/x86/exrdsp.asm b/libavcodec/x86/exrdsp.asm index b91a7be20d..06c629e59e 100644 --- a/libavcodec/x86/exrdsp.asm +++ b/libavcodec/x86/exrdsp.asm @@ -39,16 +39,15 @@ cglobal reorder_pixels, 3,4,3, dst, src1, size, src2 neg sizeq ; size = offset for dst, src1, src2 .loop: -%if cpuflag(avx2) - vpermq m0, [src1q + sizeq], 0xd8; load first part - vpermq m1, [src2q + sizeq], 0xd8; load second part -%else mova m0, [src1q+sizeq] ; load first part movu m1, [src2q+sizeq] ; load second part -%endif SBUTTERFLY bw, 0, 1, 2 ; interleaved - mova [dstq+2*sizeq ], m0 ; copy to dst - mova [dstq+2*sizeq+mmsize], m1 + mova [dstq+2*sizeq ], xm0 ; copy to dst + mova [dstq+2*sizeq+16], xm1 +%if cpuflag(avx2) + vperm2i128 m0, m0, m1, q0301 + mova [dstq+2*sizeq+32], m0 +%endif add sizeq, mmsize jl .loop RET