Message ID | 06309fa1-a0e9-9a79-5e87-c82a03bd16b5@sky.com |
---|---|
State | Superseded |
Headers | show |
> avctx->width << 1, avctx->height); > + } else if (frame->format == AV_PIX_FMT_RGBA || frame->format == > AV_PIX_FMT_RGB0) { > + av_image_copy_plane(buf, lockBufferParams->pitch, > + frame->data[0], frame->linesize[0], > + avctx->width << 2, avctx->height); > + } else if (frame->format == AV_PIX_FMT_BGRA || frame->format == > AV_PIX_FMT_BGR0) { > + av_image_copy_plane(buf, lockBufferParams->pitch, > + frame->data[0], frame->linesize[0], > + avctx->width << 2, avctx->height); > } else { These are identical, so please put them into one if. Also, why is the twist from AV_PIX_FMT_RGBA to NV_ENC_BUFFER_FORMAT_ABGR necessary? The nvenc header describes it as "8 bit Packed A8B8G8R8", so did they mess it up?
Hi! 2016-09-07 2:23 GMT+02:00 Sven C. Dack <sven.c.dack@sky.com>: > On 07/09/16 01:08, Sven C. Dack wrote: >> >> On 07/09/16 00:43, Carl Eugen Hoyos wrote: >>> >>> Should be AV_PIX_FMT_RGB0... As pointed out by Timo, this should be AV_PIX_FMT_0BGR32... >>> ... and AV_PIX_FMT_BGR0 ... and this AV_PIX_FMT_0RGB32: It makes no difference on little-endian, but matches the documentation. Sorry for not realizing this yesterday! > + AV_PIX_FMT_RGB0, > + AV_PIX_FMT_BGR0, > + AV_PIX_FMT_RGBA, > + AV_PIX_FMT_BGRA, In any case, please remove AV_PIX_FMT_RGBA and AV_PIX_FMT_BGRA, the encoder cannot deal with transparency. Carl Eugen
On 9/7/16, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > Hi! > > 2016-09-07 2:23 GMT+02:00 Sven C. Dack <sven.c.dack@sky.com>: >> On 07/09/16 01:08, Sven C. Dack wrote: >>> >>> On 07/09/16 00:43, Carl Eugen Hoyos wrote: >>>> >>>> Should be AV_PIX_FMT_RGB0... > > As pointed out by Timo, this should be AV_PIX_FMT_0BGR32... > >>>> ... and AV_PIX_FMT_BGR0 > > ... and this AV_PIX_FMT_0RGB32: It makes no difference on > little-endian, but matches the documentation. > Sorry for not realizing this yesterday! > >> + AV_PIX_FMT_RGB0, >> + AV_PIX_FMT_BGR0, > >> + AV_PIX_FMT_RGBA, >> + AV_PIX_FMT_BGRA, > > In any case, please remove AV_PIX_FMT_RGBA and AV_PIX_FMT_BGRA, > the encoder cannot deal with transparency. And how do you know that?
On 07/09/16 09:23, Timo Rothenpieler wrote: >> avctx->width << 1, avctx->height); >> + } else if (frame->format == AV_PIX_FMT_RGBA || frame->format == >> AV_PIX_FMT_RGB0) { >> + av_image_copy_plane(buf, lockBufferParams->pitch, >> + frame->data[0], frame->linesize[0], >> + avctx->width << 2, avctx->height); >> + } else if (frame->format == AV_PIX_FMT_BGRA || frame->format == >> AV_PIX_FMT_BGR0) { >> + av_image_copy_plane(buf, lockBufferParams->pitch, >> + frame->data[0], frame->linesize[0], >> + avctx->width << 2, avctx->height); >> } else { > These are identical, so please put them into one if. > > Also, why is the twist from AV_PIX_FMT_RGBA to NV_ENC_BUFFER_FORMAT_ABGR > necessary? > > The nvenc header describes it as "8 bit Packed A8B8G8R8", so did they > mess it up? It is necessary in order to make it work. The twist here is intentional as I pointed out earlier. If you do it the other way around as described in the documentation then you get false and missing colours. I'd like to keep in the transparency channel unless you know there is an actual problem with it. The encoder may not use it, but it is no reason not to pass it on. Otherwise will RGBA/BGRA have to be converted into RGB0/BGR0 and you will again get a performance penalty. Sven
Hi! > Am 07.09.2016 um 11:36 schrieb Paul B Mahol <onemda@gmail.com>: > >> On 9/7/16, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: >> Hi! >> >> 2016-09-07 2:23 GMT+02:00 Sven C. Dack <sven.c.dack@sky.com>: >>>> On 07/09/16 01:08, Sven C. Dack wrote: >>>> >>>>> On 07/09/16 00:43, Carl Eugen Hoyos wrote: >>>>> >>>>> Should be AV_PIX_FMT_RGB0... >> >> As pointed out by Timo, this should be AV_PIX_FMT_0BGR32... >> >>>>> ... and AV_PIX_FMT_BGR0 >> >> ... and this AV_PIX_FMT_0RGB32: It makes no difference on >> little-endian, but matches the documentation. >> Sorry for not realizing this yesterday! >> >>> + AV_PIX_FMT_RGB0, >>> + AV_PIX_FMT_BGR0, >> >>> + AV_PIX_FMT_RGBA, >>> + AV_PIX_FMT_BGRA, >> >> In any case, please remove AV_PIX_FMT_RGBA and AV_PIX_FMT_BGRA, >> the encoder cannot deal with transparency. > > And how do you know that? If the encoder would deal with transparency (which it does not) RGB0 would be wrong but it seems everybody agrees it is correct. Carl Eugen
>> Also, why is the twist from AV_PIX_FMT_RGBA to NV_ENC_BUFFER_FORMAT_ABGR >> necessary? >> >> The nvenc header describes it as "8 bit Packed A8B8G8R8", so did they >> mess it up? > > It is necessary in order to make it work. The twist here is intentional > as I pointed out earlier. If you do it the other way around as described > in the documentation then you get false and missing colours. Carl already pointed you to the correct, native-endian pixel formats, which match with the nvenc documentation: https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/pixfmt.h#L320 > I'd like to keep in the transparency channel unless you know there is an > actual problem with it. The encoder may not use it, but it is no reason > not to pass it on. Otherwise will RGBA/BGRA have to be converted into > RGB0/BGR0 and you will again get a performance penalty. NVENC itself lists the alpha channel. So keeping it should be fine and save a conversion.
Hi! > Am 07.09.2016 um 11:40 schrieb "Sven C. Dack" <sven.c.dack@sky.com>: > > On 07/09/16 09:23, Timo Rothenpieler wrote: > Otherwise will RGBA/BGRA have to > be converted into RGB0/BGR0 > and you will again get a performance penalty. What makes you think so? (The problem is that your encoder now reports "I support transparency" and this will lead to funny effects depending in the colour space of your input file.) Please also implement my other suggestion to make documentation and implementation match (and to match existing and new colour spaces, there is already an endianness-depending pix_fmt). Carl Eugen
On 07/09/16 11:25, Carl Eugen Hoyos wrote: > Hi! > >> Am 07.09.2016 um 11:40 schrieb "Sven C. Dack" <sven.c.dack@sky.com>: >> >> On 07/09/16 09:23, Timo Rothenpieler wrote: >> Otherwise will RGBA/BGRA have to >> be converted into RGB0/BGR0 >> and you will again get a performance penalty. > What makes you think so? I have tested it. What makes you think it wouldn't? > (The problem is that your encoder now reports "I support transparency" and this will lead to funny effects depending in the colour space of your input file.) No, it doesn't. The encoder does deal with transparency. It effectively ignores it. So it is save to pass it on without FFmpeg needing to set the value to 0 but let the encoder do it. > > Please also implement my other suggestion to make documentation and implementation match (and to match existing and new colour spaces, there is already an endianness-depending pix_fmt). I have no idea what you are talking about. If you want to implement a new pixel format and colour space for what seems to be a mere twist in Nvidia's documentation then you are welcome to do so. I won't. Sven
2016-09-07 12:50 GMT+02:00 Sven C. Dack <sven.c.dack@sky.com>: > On 07/09/16 11:25, Carl Eugen Hoyos wrote: >> >>> Am 07.09.2016 um 11:40 schrieb "Sven C. Dack" <sven.c.dack@sky.com>: >>> >>> On 07/09/16 09:23, Timo Rothenpieler wrote: >>> Otherwise will RGBA/BGRA have to >>> be converted into RGB0/BGR0 >>> and you will again get a performance penalty. >> >> What makes you think so? > > I have tested it. What makes you think it wouldn't? This is a bug that should be fixed independently. >> (The problem is that your encoder now reports "I support >> transparency" and this will lead to funny effects depending >> in the colour space of your input file.) > > No, it doesn't. If your patch adds "RGBA" as supported colour space, the encoder (by definition) announces that it supports transparency. nvenc does not support transparency (if it would, it would be a - grave - bug to use "RGB0" in the patch). So please do not add it. Carl Eugen
Am 07.09.2016 um 13:27 schrieb Carl Eugen Hoyos: > 2016-09-07 12:50 GMT+02:00 Sven C. Dack <sven.c.dack@sky.com>: >> On 07/09/16 11:25, Carl Eugen Hoyos wrote: >>> >>>> Am 07.09.2016 um 11:40 schrieb "Sven C. Dack" <sven.c.dack@sky.com>: >>>> >>>> On 07/09/16 09:23, Timo Rothenpieler wrote: >>>> Otherwise will RGBA/BGRA have to >>>> be converted into RGB0/BGR0 >>>> and you will again get a performance penalty. >>> >>> What makes you think so? >> >> I have tested it. What makes you think it wouldn't? > > This is a bug that should be fixed independently. libavutil/pixfmt.h defines AV_PIX_FMT_RGB0 and the other ones like this: packed RGB 8:8:8, 32bpp, XRGBXRGB... X=unused/undefined So I would expect the Alpha-Channel to be anything, and converting from RGBA to RGB0 to be a no-op "conversion".
On 07/09/16 12:40, Timo Rothenpieler wrote: > libavutil/pixfmt.h defines AV_PIX_FMT_RGB0 and the other ones like this: > > packed RGB 8:8:8, 32bpp, XRGBXRGB... X=unused/undefined > > So I would expect the Alpha-Channel to be anything, and converting from > RGBA to RGB0 to be a no-op "conversion". It is not an issue. x11grab produces BGR0 and nvenc can handle it with the patch. It's giving me 100fp/s (up from 47fp/s) with a 1920x1080 monitor. I'd imagine people with 4K displays will be happy, too, although they will have to live with lower speeds of perhaps 30 fp/s. Would be interesting to know how it performs on 4K though. If there is really an RGBA/BGRA input then it needs to be convert to RGB0/BGR0. Until then is it a theoretical issue. Might be the module producing RGBA/BGRA can produce RGB0/BGR0, too. Sven
Am 07.09.2016 um 15:26 schrieb Sven C. Dack: > On 07/09/16 12:40, Timo Rothenpieler wrote: >> libavutil/pixfmt.h defines AV_PIX_FMT_RGB0 and the other ones like this: >> >> packed RGB 8:8:8, 32bpp, XRGBXRGB... X=unused/undefined >> >> So I would expect the Alpha-Channel to be anything, and converting from >> RGBA to RGB0 to be a no-op "conversion". > > It is not an issue. x11grab produces BGR0 and nvenc can handle it with > the patch. It's giving me 100fp/s (up from 47fp/s) with a 1920x1080 > monitor. I'd imagine people with 4K displays will be happy, too, > although they will have to live with lower speeds of perhaps 30 fp/s. > Would be interesting to know how it performs on 4K though. > > If there is really an RGBA/BGRA input then it needs to be convert to > RGB0/BGR0. Until then is it a theoretical issue. Might be the module > producing RGBA/BGRA can produce RGB0/BGR0, too. 0RGB/0BGR does not mean the alpha bits are zeroed. It means they are undefined, so you convert from ARGB to 0RGB by doing nothing. There is no performance to gain by supporting a format that falsely advertises support for an alpha channel. Also, the correct formats to use are AV_PIX_FMT_0RGB32, which corresponds to NV_ENC_BUFFER_FORMAT_ARGB, and AV_PIX_FMT_0BGR32 for ABGR. Will apply with those. For the future, please use git format-patch, and ideally also git send-email for your patches. Attaching the patches is just fine though, preferably only one per mail for patchwork to pick it up cleanly.
Sven C. Dack wrote: > It is not an issue. x11grab produces BGR0 and nvenc can handle it > with the patch. It's giving me 100fp/s (up from 47fp/s) with a > 1920x1080 monitor. I'd imagine people with 4K displays will be happy, > too, although they will have to live with lower speeds of perhaps 30 > fp/s. Would be interesting to know how it performs on 4K though. Maybe xrandr --output <output-name> --fb 3840x2160 --panning 3840x2160 I don't know what it is about x11grab/CSC with ffmpeg, but on my old CPU gstreamer is twice as fast.
2016-09-08 12:01 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: > I don't know what it is about x11grab/CSC with ffmpeg, but > on my old CPU gstreamer is twice as fast. x11grab or xcb? Carl Eugen
Carl Eugen Hoyos wrote: > 2016-09-08 12:01 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: > >> I don't know what it is about x11grab/CSC with ffmpeg, but >> on my old CPU gstreamer is twice as fast. > > x11grab or xcb? I guess xcb as I don't --enable anything related with configure.
2016-09-08 16:17 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: > Carl Eugen Hoyos wrote: >> >> 2016-09-08 12:01 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: >> >>> I don't know what it is about x11grab/CSC with ffmpeg, but >>> on my old CPU gstreamer is twice as fast. >> >> x11grab or xcb? > > I guess xcb as I don't --enable anything related with configure. Then please test if this is a regression over x11grab, use: configure --disable-libxcb --enable-x11grab --enable-gpl (It is very likely that nobody ever tested.) Carl Eugen
On 08/09/16 15:17, Andy Furniss wrote: > Carl Eugen Hoyos wrote: >> 2016-09-08 12:01 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: >> >>> I don't know what it is about x11grab/CSC with ffmpeg, but >>> on my old CPU gstreamer is twice as fast. >> >> x11grab or xcb? > > I guess xcb as I don't --enable anything related with configure. In case it got mistaken... The following is what gives the 100 fp/s (with a GTX 960): $ ffmpeg -f x11grab -framerate 200 -s hd1080 -i :0.0 -c:v hevc_nvenc -y test.mkv The mentioned 100 fp/s refers to the HEVS/H.265 hardware encoder (nvenc). X11grab is here only the input device. Without the encoder does it give me 160 fp/s: $ ffmpeg -f x11grab -s hd1080 -framerate 500 -i :0.0 -f null - Different sizes then give different results: - For hd720 is it 200 fp/s (with nvenc) and 420 fp/s (without nvenc). - For hd480 is it 460 fp/s (with nvenc) and 890 fp/s (without nvenc). Can you compare gstreamer to those numbers? Is anyone also capable of running this in a true 4K display with a Pascal card? There the speed matters the most. Sven
2016-09-08 16:52 GMT+02:00 Sven C. Dack <sven.c.dack@sky.com>:
> X11grab is here only the input device.
Of course.
I know that and I assume Andy also knows.
Still Andy claims that (our) xcb input is slower than GStreamer's x11
input and I would like him to test if the reason is our change to xcb.
Carl Eugen
Carl Eugen Hoyos wrote: > 2016-09-08 16:17 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: >> Carl Eugen Hoyos wrote: >>> >>> 2016-09-08 12:01 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: >>> >>>> I don't know what it is about x11grab/CSC with ffmpeg, but >>>> on my old CPU gstreamer is twice as fast. >>> >>> x11grab or xcb? >> >> I guess xcb as I don't --enable anything related with configure. > > Then please test if this is a regression over x11grab, use: > configure --disable-libxcb --enable-x11grab --enable-gpl > > (It is very likely that nobody ever tested.) It is a bit faster, comparing results I posted on users to new build - (vsync 2 for consistency as that's how the old tests were done it isn't really needed/makes no difference for -f null but was used as I previously also tested -f rawvideo -y /dev/null and when using -framerate with that it is essential). xcb -> x11grab. ffmpeg -vsync 2 -framerate 200 -f x11grab -s 1920x1080 -i :0.0 -vframes 1000 -f null - 138 -> 179 fps ffmpeg -vsync 2 -framerate 200 -f x11grab -s 1920x1080 -i :0.0 -vframes 1000 -pix_fmt yuv420p -f null - 47 -> 56 fps ffmpeg -vsync 2 -framerate 200 -f x11grab -s 1920x1080 -i :0.0 -vframes 1000 -pix_fmt nv12 -f null - 33 -> 37 fps
2016-09-08 19:16 GMT+02:00 Andy Furniss <adf.lists@gmail.com>:
> It is a bit faster
xcb is slower?
Carl Eugen
Carl Eugen Hoyos wrote: > 2016-09-08 19:16 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: > >> It is a bit faster > > xcb is slower? Yes (unless I am mixing things up) --disable-libxcb --enable-x11grab is faster than autodetect
Le tridi 23 fructidor, an CCXXIV, Carl Eugen Hoyos a écrit :
> xcb is slower?
Looking at the code, I see that xcbgrab allocates and frees the shared
memory segment for each frame, while x11grab only allocates it once at the
beginning. xcbgrab does that so that it can wrap the segment in a refcounted
buffer.
On the other hand, x11grab seems to be returning the shared memory segment
as the packet payload as is. If I am not mistaken, this is just invalid, and
works just by luck. It could be tested with the following procedure: grab a
packet, dump it, grab a second packet, dump the first one again.
Still, it is probably possible to do something faster than xcbgrab while
still correct: only allocate a new segment if the first one is still in use.
Regards,
Sven C. Dack wrote: > On 08/09/16 15:17, Andy Furniss wrote: >> Carl Eugen Hoyos wrote: >>> 2016-09-08 12:01 GMT+02:00 Andy Furniss <adf.lists@gmail.com>: >>> >>>> I don't know what it is about x11grab/CSC with ffmpeg, but >>>> on my old CPU gstreamer is twice as fast. >>> >>> x11grab or xcb? >> >> I guess xcb as I don't --enable anything related with configure. > > In case it got mistaken... > > The following is what gives the 100 fp/s (with a GTX 960): > $ ffmpeg -f x11grab -framerate 200 -s hd1080 -i :0.0 -c:v hevc_nvenc -y > test.mkv > > The mentioned 100 fp/s refers to the HEVS/H.265 hardware encoder > (nvenc). X11grab is here only the input device. > > Without the encoder does it give me 160 fp/s: > $ ffmpeg -f x11grab -s hd1080 -framerate 500 -i :0.0 -f null - > > Different sizes then give different results: > > - For hd720 is it 200 fp/s (with nvenc) and 420 fp/s (without nvenc). > - For hd480 is it 460 fp/s (with nvenc) and 890 fp/s (without nvenc). > > Can you compare gstreamer to those numbers? With gstreamer 1080p I can get around 350 fps testing like - gst-launch-1.0 -f ximagesrc use-damage=0 startx=0 starty=0 endx=1919 endy=1079 num-buffers=5000 ! queue ! videoconvert ! video/x-raw,framerate=500/1,format=BGRx ! fakesink Setting pipeline to PAUSED ... Pipeline is live and does not need PREROLL ... Setting pipeline to PLAYING ... New clock: GstSystemClock Got EOS from element "pipeline0". Execution ended after 0:00:14.205120141 Setting pipeline to PAUSED ... Setting pipeline to READY ... Setting pipeline to NULL ... Freeing pipeline ... 5000/14.205120141 = 351.98.... Of course I don't know if that's "real" as such. I do know that I have really grabbed and encoded 1080p60 with my AMD h/w and including nv12 conversion gives a sane looking result - gst-launch-1.0 -f ximagesrc use-damage=0 startx=0 starty=0 endx=1919 endy=1079 num-buffers=1000 ! queue ! videoconvert ! video/x-raw,framerate=100/1,format=NV12 ! fakesink Setting pipeline to PAUSED ... Pipeline is live and does not need PREROLL ... Setting pipeline to PLAYING ... New clock: GstSystemClock Got EOS from element "pipeline0". Execution ended after 0:00:14.419928745 Setting pipeline to PAUSED ... Setting pipeline to READY ... Setting pipeline to NULL ... Freeing pipeline ... 1000/14.419928745 = 69.3.... I also tried (fake) 2160p and got around 90fps BGRx and 17 nv12. My PC is old = Phenom II x4 965 3.4GHz Cpus, 1333 ram.
Andy Furniss wrote: > With gstreamer 1080p I can get around 350 fps testing like - > > gst-launch-1.0 -f ximagesrc use-damage=0 startx=0 starty=0 endx=1919 > endy=1079 num-buffers=5000 ! queue ! videoconvert ! > video/x-raw,framerate=500/1,format=BGRx ! fakesink Setting pipeline > to PAUSED ... Pipeline is live and does not need PREROLL ... Setting > pipeline to PLAYING ... New clock: GstSystemClock Got EOS from > element "pipeline0". Execution ended after 0:00:14.205120141 Setting > pipeline to PAUSED ... Setting pipeline to READY ... Setting pipeline > to NULL ... Freeing pipeline ... > > 5000/14.205120141 = 351.98.... > > Of course I don't know if that's "real" as such. Looking into this more with sysprof it seems with this test gstreamer is twice as fast because it doesn't copy from shm but ffmpeg does. Of course I may be misunderstanding, but >90% load according to sysprof is __memcpy_sse2_unaligned. ffmpeg is not using xcb for this test and like this both ffmpeg and gstreamer are using XShmGetImage. With gstreamer all calls to __memcpy_sse2_unaligned are from X with ffmpeg half are from X and half ffmpeg. OK I think, maybe the test was flawed, so I changed fakesink to filesink location=/mnt/ramdisk/out.bgr0 where ramdisk is tmpfs. Sure enough it's slower = 188fps which is more like ffmpeg (179 fps with -f null -). Unfortunately if I force ffmpeg to really output to tmpfs using -f rawvideo -y /mnt/ramdisk/out.bgr0 I only get 68fps so I am still slower. FWIW -f rawvideo -y /dev/null gives 90fps.
Le quintidi 25 fructidor, an CCXXIV, Andy Furniss a écrit : > Looking into this more with sysprof it seems with this test gstreamer is > twice as fast because it doesn't copy from shm but ffmpeg does. > > Of course I may be misunderstanding, but >90% load according to sysprof > is __memcpy_sse2_unaligned. Can you detect what call to memcpy() in the ffmpeg code is responsible for this? I had an inkling that it was about the packet -> frame conversion, but it seems rawdec does the right thing in this particular case. Still, I suggest you add a quick debug av_log() in libavcodec/rawdec.c to check that need_copy is false. Also, did you have a look at my last mail on the topic: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-September/199156.html about the speed difference between xcb and xlib? Regards,
On 10/09/16 18:35, Andy Furniss wrote: > Andy Furniss wrote: > >> With gstreamer 1080p I can get around 350 fps testing like - >> >> gst-launch-1.0 -f ximagesrc use-damage=0 startx=0 starty=0 endx=1919 >> endy=1079 num-buffers=5000 ! queue ! videoconvert ! >> video/x-raw,framerate=500/1,format=BGRx ! fakesink Setting pipeline >> to PAUSED ... Pipeline is live and does not need PREROLL ... Setting >> pipeline to PLAYING ... New clock: GstSystemClock Got EOS from >> element "pipeline0". Execution ended after 0:00:14.205120141 Setting >> pipeline to PAUSED ... Setting pipeline to READY ... Setting pipeline >> to NULL ... Freeing pipeline ... >> >> 5000/14.205120141 = 351.98.... >> >> Of course I don't know if that's "real" as such. > > Looking into this more with sysprof it seems with this test gstreamer is > twice as fast because it doesn't copy from shm but ffmpeg does. > > Of course I may be misunderstanding, but >90% load according to sysprof > is __memcpy_sse2_unaligned. > > ffmpeg is not using xcb for this test and like this both ffmpeg and > gstreamer are using XShmGetImage. > > With gstreamer all calls to __memcpy_sse2_unaligned are from X > with ffmpeg half are from X and half ffmpeg. > > OK I think, maybe the test was flawed, so I changed fakesink to filesink > location=/mnt/ramdisk/out.bgr0 where ramdisk is tmpfs. Sure enough it's > slower = 188fps which is more like ffmpeg (179 fps with -f null -). > > Unfortunately if I force ffmpeg to really output to tmpfs using > -f rawvideo -y /mnt/ramdisk/out.bgr0 I only get 68fps so I am still slower. > > FWIW -f rawvideo -y /dev/null gives 90fps. I'd be more interested in grabbing the desktop synchronous to the display refresh rate at this point. 60Hz is a key mark, going above it not so much. What would be the next mark anyway? 100Hz?
Nicolas George wrote: > Le quintidi 25 fructidor, an CCXXIV, Andy Furniss a écrit : >> Looking into this more with sysprof it seems with this test >> gstreamer is twice as fast because it doesn't copy from shm but >> ffmpeg does. >> >> Of course I may be misunderstanding, but >90% load according to >> sysprof is __memcpy_sse2_unaligned. > > Can you detect what call to memcpy() in the ffmpeg code is > responsible for this? Not with sysprof unfortunately. > I had an inkling that it was about the packet -> frame conversion, > but it seems rawdec does the right thing in this particular case. > Still, I suggest you add a quick debug av_log() in > libavcodec/rawdec.c to check that need_copy is false. I shall have go later this evening. > Also, did you have a look at my last mail on the topic: > http://ffmpeg.org/pipermail/ffmpeg-devel/2016-September/199156.html > about the speed difference between xcb and xlib? I did and thanks for taking an interest - as I don't really know much about the code I thought that one was more for Carl Eugen :-) But yes I get the version I am currently testing is unsafe - just it seems to be more like gstreamer in using XShmGetImage so I thought the sysprofs would be easier to compare.
Sven C. Dack wrote: > I'd be more interested in grabbing the desktop synchronous to the > display refresh rate at this point. 60Hz is a key mark, going above > it not so much. What would be the next mark anyway? 100Hz? That would be good - I am in a different situation to you though, in that my h/w won't currently take RGB anyway. Benching bgr0 is just a starting point. With nv12 conversion I can't do 60fps with ffmpeg but can with gstreamer. As for monitors - there are plenty that do 144Hz marketed at gamers and even at 60Hz there's increasing use of 1440p and 2160p screens to consider.
Andy Furniss wrote: > I do know that I have really grabbed and encoded 1080p60 with my AMD > h/w and including nv12 conversion gives a sane looking result - > > gst-launch-1.0 -f ximagesrc use-damage=0 startx=0 starty=0 endx=1919 > endy=1079 num-buffers=1000 ! queue ! videoconvert ! > video/x-raw,framerate=100/1,format=NV12 ! fakesink Setting pipeline > to PAUSED ... Pipeline is live and does not need PREROLL ... Setting > pipeline to PLAYING ... New clock: GstSystemClock Got EOS from > element "pipeline0". Execution ended after 0:00:14.419928745 Setting > pipeline to PAUSED ... Setting pipeline to READY ... Setting pipeline > to NULL ... Freeing pipeline ... > > 1000/14.419928745 = 69.3 Over the weekend I looked at the CSC aspect of this without using x11grab = benching bgr0 on tmpfs to nv12 and managed with a bit of luck to get ffmpeg to beat gstreamer. Starting point gstreamer bgr0 to nv12 = 70fps, to I420 68fps. ffmpeg benched using -f null as -f rawvideo to ram or /dev/null is slower and I suspect/hope for my intended usage = vaapi upload -f null will be more representative, but of course I don't know that. ffmpeg -f rawvideo -s 1920x1080 -pix_fmt bgr0 -i /mnt/ramdisk/out.bgr0 -pix_fmt nv12 -f null - =41 fps, yuv420p = 66fps So yuv420p is close to gstreamer but nv12 is poor. By chance I wondered how much worse it would be if I used -sws_flags as I have done in the past. Result it was faster, it turns out that +full_chroma_inp takes yuv420p from 66 to 84fps and nv12 to 47fps. The reason being that with no flags time is spent in bgr32toUV_half_c with flag above I don't use that and see various sse in use like ff_rgbatoUV_sse2. nv12 is still too slow though. Looking with sysprof I see that time is spent in yuv2nv12cX_c. Seemed slow when remembering yuv420p -> nv12 conversions from the past so I benched 1080p yuv420p -> nv12 and got > 500fps. Doing this didn't use yuv2nv12cX_c at all so I got to make a new command line - ffmpeg -f rawvideo -s 1920x1080 -pix_fmt bgr0 -i /mnt/ramdisk/out.bgr0 -vf scale=flags=+full_chroma_inp,format=yuv420p,format=nv12 -f null - = 78fps, nice. So at least I can beat gstreamer on CSC now. Testing the new commandline with x11grab gets me close to gst using the legacy x11grab = 65 fps. libxcb x11grab is 52 fps though, so it would be good if that can be fixed up.
Hi! 2016-09-08 19:34 GMT+02:00 Nicolas George <george@nsup.org>: > Still, it is probably possible to do something faster than xcbgrab while > still correct: only allocate a new segment if the first one is still in use. Will you try this? Thank you, Carl Eugen
--- a/libavcodec/nvenc.c +++ b/libavcodec/nvenc.c @@ -81,6 +81,10 @@ const enum AVPixelFormat ff_nvenc_pix_fmts[] = { AV_PIX_FMT_P010, AV_PIX_FMT_YUV444P, AV_PIX_FMT_YUV444P16, + AV_PIX_FMT_RGB0, + AV_PIX_FMT_BGR0, + AV_PIX_FMT_RGBA, + AV_PIX_FMT_BGRA, #if CONFIG_CUDA AV_PIX_FMT_CUDA, #endif @@ -1032,6 +1036,16 @@ static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx) ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_10BIT; break; + case AV_PIX_FMT_RGB0: + case AV_PIX_FMT_RGBA: + ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_ABGR; + break; + + case AV_PIX_FMT_BGR0: + case AV_PIX_FMT_BGRA: + ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_ARGB; + break; + default: av_log(avctx, AV_LOG_FATAL, "Invalid input pixel format\n");