#1
It's well known that StretchBlt is slow, especially in halftone mode. What's less well known is that it can bring Windows to its knees. You might find this incredible, but it's easy to prove. I made a test application that repeatedly calls StretchBlt to resize a bitmap from 6000 x 4500 down to 200 x 150 in halftone mode. The source bitmap is constant, and both the source and the destination are memory device contexts, i.e. the test does not repaint any windows. Running this test severely degrades the performance of all other applications, including the Task Manager, regardless of process priorities, and despite plenty of idle cores. Applications repaint themselves belatedly or not at all, and video applications drop their frame rates to nearly zero.

Why is this so? It turns out that every GDI operation involves acquiring a system-wide lock, called the GDI lock. This strategy works well enough provided GDI operations take very little time, which is usually the case. However, as soon as one process hogs the lock by doing long GDI operations, all other processes are screwed, because applications typically spend most of their time updating their windows, and windows can only be updated via GDI calls. The GDI lock is a system-wide bottleneck that potentially reduces Windows to cooperative multitasking, which fails when one process doesn't cooperate.

But why is a lock required at all when the application is only resizing bitmaps in memory? How can this possibly affect other applications? The issue is that all GDI objects, including device contexts, pens, brushes and so on, are maintained at the system level, not per-process. The purpose of the GDI lock is to protect GDI objects and attributes from being corrupted by simultaneous modification from multiple threads. In other words, even though my test application's source and destination bitmaps are in memory and invisible to other processes, the StretchBlt has to be serialized anyway, because it potentially affects GDI state and GDI state is global.

Microsoft has known about this issue all along, though they didn't publicize it for obvious reasons. They finally got around to doing something about it in Windows 7; however, it seems they were unable to get rid of the GDI lock altogether, so instead they substituted a large number of finer-grained locks for the one monolithic GDI lock. In theory this might help, but it remains to be seen whether it fixes the pathological case I'm describing. The consensus seems to be that Windows 7 generally exhibits poor 2D performance compared to XP, and my limited testing bears this out.

This issue has serious implications for FFRend.
It means that the Monitor bar potentially limits FFRend's overall throughput, because the monitor window's StretchBlt can block the rendering thread from blitting to the output window, particularly for large (HD or higher) frame sizes and smooth (halftone) monitor quality. Apparently DirectDraw also has to acquire the GDI lock, even if only Blt is called (as opposed to GetDC/ReleaseDC), because the issue occurs even though the output window uses DirectDraw instead of GDI, and even in full-screen exclusive mode. This last point is especially egregious. The whole point of full-screen exclusive mode is that the system allows one window to not cooperate with other windows on a given monitor, because the other windows will be covered anyway. Incredibly, my StretchBlt test starves FFRend even when FFRend is in full-screen exclusive mode and covering the test application's window. This seems totally wrong to me.

The only solution I can see is to avoid StretchBlt, but this means FFRend has to include its own bitmap resizing code. Why not roll my own file system too while I'm at it? Bitmap resizing is no joke when quality and performance are both goals. Bilinear interpolation is easy enough, but image quality degrades as the size difference between the source and destination bitmaps increases. Bicubic behaves better, but it's slow and complicated and full of floating point. It would probably have to be implemented in SSE2 assembler to perform well enough. And of course it would be nice if the code supported all the Freeframe bitmap formats, i.e. not only 32-bit but 24-bit and 5-6-5 too. Right.

Another possibility would be to still use StretchBlt, but pre-scale the frames before feeding them to StretchBlt, maybe only if the difference between the frame size and the monitor window size exceeds a certain threshold. Implementing a 2x2 averaging down-sample is a relatively simple matter.
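
For illustration, here is a rough sketch of what such a 2x2 averaging pre-scale might look like in portable C++. It is not FFRend's code: the `Frame32` struct, the function names, and the tightly packed 32-bit layout are assumptions made for the example, and the 24-bit and 5-6-5 formats are not handled. `halvingPasses` is one possible way to express the "threshold" idea: halve only while the result stays at least as large as the monitor window, so StretchBlt would be left with a much smaller final resize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame container (an assumption for this sketch):
// 32-bit pixels, 4 bytes per pixel, rows tightly packed with no padding.
struct Frame32 {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // width * height * 4 bytes
};

// Halve a frame in both axes by averaging each 2x2 block, channel by channel.
// Requires even dimensions.
Frame32 downsample2x2(const Frame32& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height * 4);

    Frame32 dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height * 4);

    const std::size_t srcStride = static_cast<std::size_t>(src.width) * 4;
    const std::size_t dstStride = static_cast<std::size_t>(dst.width) * 4;
    for (int y = 0; y < dst.height; ++y) {
        // Two adjacent source rows feed one destination row.
        const std::uint8_t* row0 =
            src.pixels.data() + static_cast<std::size_t>(2 * y) * srcStride;
        const std::uint8_t* row1 = row0 + srcStride;
        std::uint8_t* out = dst.pixels.data() + static_cast<std::size_t>(y) * dstStride;
        for (int x = 0; x < dst.width; ++x) {
            const std::size_t left = static_cast<std::size_t>(2 * x) * 4;
            for (int c = 0; c < 4; ++c) {
                const unsigned sum = row0[left + c] + row0[left + 4 + c] +
                                     row1[left + c] + row1[left + 4 + c];
                // Rounded average of the four samples.
                out[static_cast<std::size_t>(x) * 4 + c] =
                    static_cast<std::uint8_t>((sum + 2) / 4);
            }
        }
    }
    return dst;
}

// How many 2x2 passes can be applied while both dimensions stay even
// and the result is still at least as large as the target size.
int halvingPasses(int srcW, int srcH, int dstW, int dstH) {
    int passes = 0;
    while (srcW % 2 == 0 && srcH % 2 == 0 &&
           srcW / 2 >= dstW && srcH / 2 >= dstH) {
        srcW /= 2;
        srcH /= 2;
        ++passes;
    }
    return passes;
}
```

With the sizes from the test application (6000 x 4500 down to 200 x 150), `halvingPasses` allows two passes, stopping at 1500 x 1125 because 1125 is odd.
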
Of course this plan only works if the frame size is divisible by two in both axes (usually true), and the resulting image quality remains to be seen.

The test source code is available here: http://whorld.org/ffrend/temp/StretchBltTest.zip

Microsoft's admission of the problem can be found in an obscure MSDN blog post (Engineering Windows 7 Graphics Performance): http://blogs.msdn.com/b/e7/archive/2...rformance.aspx

Last edited by victimofleisure; 6th June 2012 at 04:15 AM.
#2
I assume using a modern technology like Direct2D for the last display stage instead of StretchBlt is out of the question?
If so, why? Pretty sure we have come across this limitation in OpenTzT. Sadly there is basically no more development going on with OpenTzT, so the move to more modern tech like Direct2D ain't gonna happen. Sort of frustrating and stupid that old apps run slower on Windows 7 than on XP, but then again, I don't think one should really complain about MS maintaining application compatibility for 15+ years.
#3
Can't you use Direct3D or OpenGL? Draw your 2D texture with the old crappy 2D functions you seem to like, then write it to a 3D texture, where you can display, scale, add FFGL effects and do whatever else you may wish to try with a modern approach.
__________________
Putting the cross into crossplatform www.vjstore.org Free Clips!! AVHire.net Equipment Rental for VJs by VJs
#4
It appears you get your jollies from insulting people. How pathetic. Your aim's off in this case because I couldn't care less what you think of my taste in functions.
#5
???
I was not insulting you, I was insulting GDI. It is both crappy (in performance terms, and isn't that what this thread is about?) and old. But the suggestion remains valid. Draw to a GDI bitmap and then copy it to 3D land, where resizing and loads of other functions which are impossibly slow in 2D graphics all become very easily handled by the GPU.
#6
So you want to rescale a 6000 x 4500 bitmap every frame? Why? It's not going to change between frames. Rescale it once and cache it.
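
A minimal sketch of that caching idea, in portable C++. Everything named here (`ScaledImage`, `ScaledImageCache`, the `sourceVersion` counter, the `Resizer` callback) is hypothetical and only for illustration; the callback stands in for whatever does the expensive resize, such as a halftone StretchBlt. Note that this only pays off when the source really is unchanged between frames, as in the test application; a monitor window showing live video would see a new version every frame and resize every time anyway.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical container for an already-scaled image.
struct ScaledImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;
};

// Keeps one scaled copy and rebuilds it only when the source
// or the requested size changes.
class ScaledImageCache {
public:
    // The resizer performs the expensive work for a given target size.
    using Resizer = std::function<ScaledImage(int dstW, int dstH)>;

    // sourceVersion is a caller-maintained counter (an assumption of this
    // sketch) that is bumped whenever the source bitmap changes.
    const ScaledImage& get(std::uint64_t sourceVersion, int dstW, int dstH,
                           const Resizer& resize) {
        assert(dstW > 0 && dstH > 0);
        const bool stale = !valid_ || sourceVersion != version_ ||
                           dstW != width_ || dstH != height_;
        if (stale) {
            cached_ = resize(dstW, dstH);  // the only place the slow path runs
            version_ = sourceVersion;
            width_ = dstW;
            height_ = dstH;
            valid_ = true;
        }
        return cached_;
    }

private:
    ScaledImage cached_;
    std::uint64_t version_ = 0;
    int width_ = 0;
    int height_ = 0;
    bool valid_ = false;
};
```
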