Hi! In this article I want to show how Intel GPA can be useful for Flash developers who work with Stage3D.
Intel GPA (Graphics Performance Analyzers) is a tool for those who really care about the performance and quality of their products. It also gives me inspiration to learn something new: I can run games or 3D applications and see how they work, their new techniques and ingenious solutions. I can dig around inside them day in and day out. I am crazy about this and have written a few articles where Intel GPA was used: “L.A. Noire”, “UnrealEngine”.
When you develop something new, you need a tool to play with: draw calls, render states, sampler states and performance. Starting from what you see there, you can make more effective decisions. It’s awesome!
With this tool you can easily analyze the graphics pipeline, discover the most expensive draw calls, see DirectX API calls, change states, show pixel history, work with render targets, see detailed activity of the entire frame or of selected draw calls, change textures and shaders, see shader constants, see scene overdraw, run experiments and measure the new performance of the entire frame, and much more.
When you work with Intel GPA, you need to know the term “erg”. It refers to any work item within a frame that potentially renders pixels, which includes draw calls, clears and other graphics API calls.
1) Stencil and Depth Buffers
I have a CSG example that runs on the GPU. I don’t share it, because one hole takes at least 5 draw calls (4 draw calls for the hole and 1 to normalize the depth buffer), and it’s hard to write a correct depth buffer that can be inserted into my scene render pipeline.
OK, here is my render:
Throughout the whole development process I used Intel GPA to view my depth and stencil buffers and play with render states:
As you can see, there is no hole in the depth buffer: I cleared the depth buffer and wrote the wall into it.
API Log displays a summary of all Microsoft DirectX API calls for the items in your erg selection set. It is especially useful for tracking down “expensive” ergs, because you can see which API calls are inside one or more ergs.
I had a very interesting situation. I spent a whole day trying to fix a bug: I set up a draw call (uploaded programs, index and vertex buffers, and textures), but after compiling I didn’t see the result on screen. With Intel GPA I discovered that my draw call was missing from the DirectX API calls. Then it suddenly occurred to me to check the enableErrorChecking property. My draw call failed validation in a separate render thread, and when enableErrorChecking is set to false, I get no feedback from the validation process. In the end it turned out that my vertex buffer was wrong.
2) Running Experiments:
2×2 Textures
Use the 2×2 Textures override mode to help identify potential performance bottlenecks in your use of texture maps. All textures in the scene are replaced with simple 2×2 pixel textures; Intel GPA Frame Analyzer usually uses a simple halftone or a colorized bitmap for this option. If this override mode significantly improves the frame rate, the GPU may be thrashing while loading texture maps from the CPU instead of using a cached copy of each texture on the GPU. If the total size of your texture maps is high for a scene, consider reducing the size of some of them so that all the texture maps fit into the GPU’s texture cache for that scene.
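To judge whether the total size of your texture maps is high, it helps to estimate it first. Below is a minimal Python sketch, not part of Intel GPA, that assumes uncompressed 4-byte texels (such as BGRA), power-of-two sizes and a full mip chain down to 1×1; the function names and the example scene are hypothetical.

```python
# Sketch: estimate GPU memory used by a scene's textures.
# Assumptions (not from Intel GPA): uncompressed 4-byte texels (e.g. BGRA),
# power-of-two sizes, and a full mip chain down to 1x1 when mipmapped.


def texture_bytes(width, height, bytes_per_texel=4, mipmapped=True):
    """Bytes for one texture, including every mip level if mipmapped."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_texel
        if not mipmapped or (w == 1 and h == 1):
            break
        # Each mip level halves both dimensions, never going below 1 texel.
        w = max(1, w // 2)
        h = max(1, h // 2)
    return total


def scene_texture_bytes(sizes, bytes_per_texel=4, mipmapped=True):
    """Sum texture_bytes over a list of (width, height) pairs."""
    return sum(texture_bytes(w, h, bytes_per_texel, mipmapped) for w, h in sizes)


if __name__ == "__main__":
    # Hypothetical scene: one 2048x2048 atlas plus four 512x512 textures.
    scene = [(2048, 2048)] + [(512, 512)] * 4
    mib = scene_texture_bytes(scene) / (1024 * 1024)
    print(f"Estimated texture memory: {mib:.1f} MiB")
```

A full mip chain adds roughly one third to the size of the top level, so halving a texture's width and height cuts its footprint to about a quarter.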
1×1 Scissor Rect
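The 1×1 Scissor Rect override mode is a DirectX API override that restricts rendering of the selected ergs to a single-pixel scissor rectangle. However, the implementation of this override mode is highly dependent on your specific graphics configuration; in particular, scissoring may occur either before or after the pixel shader stage, so on some configurations the pixel shaders may still run for every covered pixel.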
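Why the pipeline order matters can be shown with a toy model, which is not how any real driver works: a draw covers a rectangle of pixels, and the 1×1 scissor either rejects pixels before shading or only discards their output afterwards. The rectangle representation and function names are hypothetical.

```python
# Toy model of the 1x1 Scissor Rect experiment. Rectangles are hypothetical
# (x0, y0, x1, y1) tuples with exclusive max edges; real GPUs are more complex.


def area(rect):
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersect(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def pixel_work(draw_rect, scissor_rect, scissor_before_ps):
    """Return (pixels shaded, pixels written) for one draw under a scissor rect."""
    visible = area(intersect(draw_rect, scissor_rect))
    if scissor_before_ps:
        # Pixels outside the scissor are rejected before the pixel shader runs.
        return visible, visible
    # The pixel shader runs for every covered pixel; the scissor only discards output.
    return area(draw_rect), visible


if __name__ == "__main__":
    full_screen = (0, 0, 1024, 768)
    one_pixel = (0, 0, 1, 1)
    print("scissor before PS:", pixel_work(full_screen, one_pixel, True))
    print("scissor after PS: ", pixel_work(full_screen, one_pixel, False))
```

Under this model, when scissoring happens after the pixel shader, the experiment barely reduces shading work, so a small change in frame time would not by itself prove that the pixel shaders are cheap.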
Simple Pixel Shader
The Simple Pixel Shader experiment replaces the pixel shaders in your frame with a simple pixel shader, which writes a constant color to the render target for every selected erg. If the rendering time decreases significantly as a result of this experiment, you may want to analyze your shaders further to see whether you can reduce rendering time without detracting from the visual quality of your scene. Enabling this experiment for ergs that do not reference a pixel shader in your original scene may actually make rendering slower. This may seem counter-intuitive, but note that all ergs are now forced to use a pixel shader, and this pixel shader may be slower than the fixed-function path used when an erg has no explicit pixel shader.
Disable Erg(s)
Use this option to keep the selected ergs from being rendered and to test scene efficiency.
3) Modifying the Shader Code (and you can also see shader constants!)
For example, passing a 4×4 matrix to a vertex shader:
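In the shader constants, such a matrix should show up as four consecutive float4 registers. The Python sketch below emulates what the AGAL instruction m44 does with them, assuming the matrix was uploaded to vc0..vc3 (for example with Context3D.setProgramConstantsFromMatrix, whose transposedMatrix flag decides whether each register holds a row or a column of the Matrix3D). The register values are made up for illustration.

```python
# Emulation of the AGAL "m44 dst, src, vcN" instruction: the 4x4 matrix lives
# in four consecutive constant registers vcN..vcN+3, and each output
# component is a 4-component dot product with one of those registers.
from typing import List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]


def dp4(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def m44(src: Sequence[float], registers: List[Vec4], first: int) -> Vec4:
    """dst.x = dp4(src, vc[first]), dst.y = dp4(src, vc[first + 1]), and so on."""
    return tuple(dp4(src, registers[first + i]) for i in range(4))


if __name__ == "__main__":
    # Hypothetical constants: a translation by (10, 20, 30),
    # one matrix row per register vc0..vc3.
    vc = [
        (1.0, 0.0, 0.0, 10.0),
        (0.0, 1.0, 0.0, 20.0),
        (0.0, 0.0, 1.0, 30.0),
        (0.0, 0.0, 0.0, 1.0),
    ]
    print(m44((1.0, 2.0, 3.0, 1.0), vc, 0))  # (11.0, 22.0, 33.0, 1.0)
```

If the transformed positions look wrong, comparing the four registers in the constants view against the rows you expected is a quick way to spot a missing or extra transpose.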
4) Determining whether Texture Bandwidth is a Performance Bottleneck
If reducing the mip level for the selected ergs significantly reduces the rendering time, then texture bandwidth may be a bottleneck, and if the change in visual quality is acceptable, you can gain speed by using a smaller texture.
If there is little or no visual difference in the scene when using a smaller mip level, you are wasting texture bandwidth for this scene, and you can reclaim some texture memory by using a smaller texture. This usually happens when geometry is rendered smaller than expected. In standard filtering modes, the hardware always chooses the mip level that best matches the screen pixels being displayed, so the top-level mip will not be used if the geometry being rendered is significantly smaller than the resolution of the texture.
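A rough way to see why the top-level mip goes unused is to compute the level of detail as log2 of texels per screen pixel. The Python sketch below assumes a square power-of-two texture on an axis-aligned quad with uniform scale and trilinear filtering; real hardware derives the level per 2×2 pixel quad from screen-space derivatives, and the function names are hypothetical.

```python
# Sketch of why the top mip level goes unused for small on-screen geometry.
# Assumptions: square power-of-two texture, axis-aligned quad, uniform scale,
# trilinear filtering.
import math


def max_level(texture_size):
    """Index of the 1x1 level in a full mip chain (10 for 1024)."""
    return int(math.log2(texture_size))


def mip_lod(texture_size, screen_pixels):
    """Ideal level of detail: log2(texels per screen pixel), 0 = top level."""
    return max(0.0, math.log2(texture_size / screen_pixels))


def finest_level_sampled(texture_size, screen_pixels):
    """Finest mip level that trilinear filtering blends in."""
    return min(max_level(texture_size), math.floor(mip_lod(texture_size, screen_pixels)))


def unused_fraction(texture_size, screen_pixels):
    """Share of the mip chain's texels in levels finer than any that is sampled."""
    finest = finest_level_sampled(texture_size, screen_pixels)
    texels = [(texture_size >> lvl) ** 2 for lvl in range(max_level(texture_size) + 1)]
    return sum(texels[:finest]) / sum(texels)


if __name__ == "__main__":
    for on_screen in (1024, 512, 256, 64):
        print(on_screen, finest_level_sampled(1024, on_screen),
              f"{unused_fraction(1024, on_screen):.0%}")
```

For example, a 1024×1024 texture drawn about 256 pixels wide only samples level 2 and coarser levels, which is exactly the mip chain of a 256×256 texture, so roughly 94% of the chain's texels are never read.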
5) Minimizing Overdraw
If the pixel history shows that two ergs affecting a pixel are of the same type, you may improve performance by removing one of them. To see how much performance you can gain by removing an erg, select it and check the Disable Erg(s) check box in the Experiments tab.
6) Use Intel GPA System Analyzer in real time!
I always use this feature when I work with dynamic geometry or on a particle system engine. I can see:
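A toy model helps to reason about overdraw and pixel history. In this Python sketch every erg is a hypothetical named rectangle and pixel writes stand in for cost; real ergs are triangles that may fail depth or stencil tests, and the Disable Erg(s) experiment measures actual GPU time, not pixel writes.

```python
# Toy model of overdraw and per-pixel history. Each hypothetical erg is a
# (name, (x0, y0, x1, y1)) pair covering a rectangle, max edges exclusive.


def coverage_counts(ergs, width, height):
    """How many ergs write each pixel of a width x height render target."""
    counts = [[0] * width for _ in range(height)]
    for _, (x0, y0, x1, y1) in ergs:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                counts[y][x] += 1
    return counts


def overdraw_ratio(counts):
    """Pixel writes divided by pixels written at least once (1.0 = no overdraw)."""
    writes = sum(sum(row) for row in counts)
    covered = sum(1 for row in counts for c in row if c > 0)
    return writes / covered if covered else 0.0


def pixel_history(ergs, x, y):
    """Names of the ergs that touch pixel (x, y), in submission order."""
    return [name for name, (x0, y0, x1, y1) in ergs if x0 <= x < x1 and y0 <= y < y1]


def writes_saved_by_disabling(ergs, name, width, height):
    """Pixel writes removed if the named erg is disabled (a proxy, not GPU time)."""
    before = sum(map(sum, coverage_counts(ergs, width, height)))
    remaining = [e for e in ergs if e[0] != name]
    after = sum(map(sum, coverage_counts(remaining, width, height)))
    return before - after


if __name__ == "__main__":
    ergs = [("background", (0, 0, 8, 8)), ("panel", (2, 2, 6, 6)), ("panel_copy", (2, 2, 6, 6))]
    counts = coverage_counts(ergs, 8, 8)
    print("overdraw ratio:", overdraw_ratio(counts))
    print("history at (3, 3):", pixel_history(ergs, 3, 3))
    print("writes saved without panel_copy:", writes_saved_by_disabling(ergs, "panel_copy", 8, 8))
```

In the example, removing the duplicate panel saves 16 of 96 pixel writes, which is the kind of redundancy the pixel history reveals when two ergs of the same type touch a pixel.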
* Resource Creations, Index Buffer Creations, Vertex Buffer Creations, Texture Creations, Shader Creations
* Locks and lock time
* setRenderTarget calls per frame
How to start using Intel GPA with Stage3D
1) Download Intel GPA.
It contains Monitor, Frame Analyzer, Platform Analyzer and System Analyzer.
2) Run Intel GPA Monitor and enable “Auto-detect launched applications” in the preferences.
3) Just compile and run your SWF, and you will see:
Press Ctrl+F1 to switch modes, and you will see:
You can also run System Analyzer and watch how runtime performance changes as you switch states.
4) Capture a frame by pressing Ctrl+Shift+C.
5) Now you can close your application and run Frame Analyzer.
6) Select your captured frame and start analyzing it.
7) Use the online help.
Thanks for reading!