Intel GPA for Stage3D developers

Hi! In this article I want to show how Intel GPA can be useful for Flash developers who work with Stage3D.


Intel GPA is a tool for those who really care about the performance and quality of their products. It also inspires me to learn something new: I can run a game or a 3D application and see how it works, with its new techniques and ingenious solutions. I can dig around inside day in and day out. I am crazy about this and have a few articles where Intel GPA was used: “L.A. Noire”, “UnrealEngine”.

When you develop something new, you need a tool to play with: to experiment with draw calls, render states, and sampler states, and to see the effect on performance, so that you can make more effective decisions. It’s awesome!

With this tool you can easily analyze the graphics pipeline, discover the most expensive draw calls, see DirectX API calls, change states, view pixel history, inspect render targets, see detailed activity for the entire frame or for selected draw calls, change textures and shaders, see shader constants, view scene overdraw, run experiments and see how they change the performance of the entire frame, and much more.

When you work with Intel GPA, you need to know the term “erg”. It refers to any work item within a frame that potentially renders pixels, which includes draw calls, clears, and other graphics API calls.

Stencil and Depth buffers

I have a CSG example that runs on the GPU. I am not sharing it, because one hole takes a minimum of 5 draw calls (4 for the hole and 1 to normalize the depth buffer), and it is hard to write a correct depth buffer so that it fits into my scene render pipeline.
Here is the render:

Throughout the development process I used Intel GPA to view my depth and stencil buffers and to play with render states:

As you can see, there is no hole in the depth buffer: I cleared the depth buffer and wrote the wall into it.


API Log displays a summary of all Microsoft DirectX API calls for the items in your erg selection set; especially useful for tracking down “expensive” ergs by seeing what API calls are within one or more ergs.

I had a very interesting situation. I spent a whole day trying to fix a bug. I had set up a draw call (uploaded programs, index and vertex buffers, and textures), but after compiling I saw no result on screen. With Intel GPA I found that my draw call was absent from the DirectX API calls. Then I suddenly thought of the enableErrorChecking property. My draw call failed validation in a separate render thread, and when enableErrorChecking is set to false, there is no feedback from the validation process. In the end it turned out that my vertex buffer was wrong.


1) Discover the most expensive draw calls.

2) Running Experiments:
2×2 Textures
Use the 2×2 Textures override mode to help identify potential performance bottlenecks in your use of texture maps within the application. All textures for a scene are replaced with simple 2×2 pixel textures. Usually the Intel GPA Frame Analyzer uses a simple halftone or a colorized bitmap for this option. If using this override mode significantly improves the frame rate, then the GPU may be thrashing while loading texture maps from the CPU instead of using a cached version of that texture map from the GPU. If the total size of your texture maps is high for a scene, consider reducing one of the texture maps so that all the texture maps fit into the GPU’s texture cache for that scene.
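To judge whether the total size of your texture maps is "high", it helps to add it up. The sketch below is only a rough estimate under stated assumptions: uncompressed textures at 4 bytes per pixel with a full mip chain (compressed formats will be smaller), and a made-up list of texture sizes. The function names are illustrative and not part of Intel GPA or Stage3D.

```python
def texture_bytes(width, height, bytes_per_pixel=4, mipmapped=True):
    """Approximate memory for one uncompressed texture, optionally with all mip levels."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_pixel
        if not mipmapped or (w == 1 and h == 1):
            break
        # Each mip level halves both dimensions, down to 1x1.
        w = max(1, w // 2)
        h = max(1, h // 2)
    return total


def scene_texture_bytes(sizes):
    """Sum the estimate over a list of (width, height) pairs."""
    return sum(texture_bytes(w, h) for (w, h) in sizes)


# Hypothetical scene: three square textures.
scene = [(2048, 2048), (1024, 1024), (512, 512)]
print(f"scene textures: {scene_texture_bytes(scene) / (1024 * 1024):.1f} MiB")
print(f"one 2x2 override texture: {texture_bytes(2, 2)} bytes")
```

The gap between the two printed numbers is the reason the 2×2 override takes texture traffic almost entirely out of the picture.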

1×1 Scissor Rect
The 1×1 Scissor Rect override mode is a DirectX API override. However, the implementation of this override mode is highly dependent upon your specific graphics configuration; in particular, scissoring may occur either before or after the pixel shader stage.

Simple Pixel Shader
The Simple Pixel Shader experiment replaces the pixel shaders in your frame with a simple pixel shader, which writes a constant color to the render target for every selected erg. If the frame rate significantly increases as a result of this experiment, you may want to perform further analysis of your shaders to see whether you can reduce rendering time without detracting from the visual quality of your scene. Enabling this experiment for ergs that do not reference a pixel shader in your original scene may actually result in a slower rendering time. This may seem counter-intuitive, but realize that all ergs are now forced to use a pixel shader, and this pixel shader may be slower than the fixed function shader that is used when an erg does not have an explicit pixel shader associated with it.

Disable Erg(s)
Use this option to keep the selected ergs from being rendered. Use this option to test scene efficiency.
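Every experiment above comes down to comparing the frame time with and without the override. Here is a minimal sketch of that comparison. The timings are invented for illustration and the helper names are hypothetical; in practice you would read the real numbers from Frame Analyzer.

```python
def frame_ms_to_fps(frame_ms):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


def experiment_gain(baseline_ms, experiment_ms):
    """Fraction of the baseline frame time saved by an override (negative if slower)."""
    return (baseline_ms - experiment_ms) / baseline_ms


# Invented example values, not measurements.
baseline_ms = 20.0
with_override_ms = 15.0

gain = experiment_gain(baseline_ms, with_override_ms)
print(f"saved {gain:.0%} of the frame time, "
      f"{frame_ms_to_fps(baseline_ms):.0f} -> {frame_ms_to_fps(with_override_ms):.0f} fps")
```

A large gain points at the stage the override removed. A gain near zero, or a negative one, suggests the bottleneck is somewhere else.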

3) Modify the shader code and inspect the shader constants.
For example, passing a 4×4 matrix (m4x4) to the vertex shader:

4) Determining whether Texture Bandwidth is a Performance Bottleneck

If reducing the mip level for the selected ergs significantly reduces the rendering time, then texture bandwidth may be a bottleneck, and if the change in visual quality is acceptable, you can gain speed by using a smaller texture.

If there is little or no visual difference in the scene when using a smaller mip level, then you are wasting texture bandwidth for this scene, and you can regain some texture memory by using a smaller texture. This usually happens when geometry is rendered smaller than expected. The hardware (in standard filtering modes) always chooses the best match for the screen pixels being displayed. So, the top-level mip will not be used if the geometry being rendered is significantly smaller than the resolution of the texture.
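The "best match" rule can be approximated with a little arithmetic. This sketch assumes a square texture mapped uniformly onto a square screen area, and it ignores perspective, anisotropic filtering, and LOD bias, so treat the result as a rough guide rather than what the hardware computes exactly. The function name is illustrative.

```python
import math


def approx_mip_level(texture_size, screen_size):
    """Rough mip level the sampler would pick (0 is the top, full-resolution level)."""
    mip_count = int(math.log2(texture_size)) + 1
    texels_per_pixel = texture_size / screen_size
    # Magnified or 1:1 textures use the top level; minified ones drop one level per halving.
    level = math.log2(texels_per_pixel) if texels_per_pixel > 1 else 0.0
    return min(level, float(mip_count - 1))


# A 1024x1024 texture drawn into roughly 256x256 screen pixels.
print(approx_mip_level(1024, 256))
```

If the level stays at 2 or higher for everything that uses a texture, the two top mip levels are never sampled, and a texture a quarter of the size per side would look the same.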

5) Minimizing Overdraw

If you see that two ergs affecting the pixel are of the same type, you may improve performance by removing one of them. To see how much performance you can gain by removing the erg, select it, and check the Disable Erg(s) check box in the Experiments tab.
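To make the idea of overdraw concrete, here is a toy model that counts how many times each pixel is written. It treats every erg as an axis-aligned rectangle (x, y, width, height) on a tiny screen, which is a simplification made up for this sketch. Real ergs cover arbitrary triangles, and Intel GPA measures this for you.

```python
def overdraw_map(width, height, rects):
    """Count writes per pixel for a list of (x, y, w, h) rectangles."""
    counts = [[0] * width for _ in range(height)]
    for (x, y, w, h) in rects:
        # Clip each rectangle to the screen before counting.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                counts[row][col] += 1
    return counts


def average_overdraw(counts):
    """Average number of writes over the pixels that were touched at least once."""
    touched = [c for row in counts for c in row if c > 0]
    return sum(touched) / len(touched) if touched else 0.0


# Hypothetical frame: a full-screen background plus a smaller quad on top.
ergs = [(0, 0, 4, 4), (0, 0, 2, 2)]
print(average_overdraw(overdraw_map(4, 4, ergs)))
```

An average of 1.0 means every pixel was written exactly once. The further above 1.0 the value goes, the more work is spent on pixels that are later covered.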

6) Use Intel GPA System Analyzer in real time!
I always use this feature when I work with dynamic geometry or on a particle system engine. I can see:
* Resource Creations, Index Buffer Creations, Vertex Buffer Creations, Texture Creations, Shader Creations
* Locks and lock time
* setRenderTarget calls per frame

How to start using it with Stage3D

1) Download Intel GPA.
It contains: Monitor, Frame Analyzer, Platform Analyzer, System Analyzer.
2) Run Intel GPA Monitor and enable “Auto-detect launched applications” in the preferences.
3) Just compile and run your SWF, and you will see:

Press Ctrl+F1 to switch modes, and you will see:

You can run System Analyzer and see runtime performance changes by switching states.
4) Capture a frame by pressing Ctrl+Shift+C.
5) Now you can close your application and run Frame Analyzer.
6) Select your captured frame and start.
7) Use the online help.

Thanks for reading!

Visit Gonchar Website.

6 Responses to “Intel GPA for Stage3D developers”

  1. flare3d says:

    Awesome tool!, thanks for sharing!
    I have been using PIX from Microsoft (also very useful tool), but this one seems to have some really nice features.

  2. MMMaXXX says:

    Thanks, Seryoga!
    Useful news, as always.

  3. Gonchar says:

    thank you guys! I am glad that you like it!

  4. ben w says:

    well done for sharing, have been using this tool for a while now and glad someone took the time to explain it so more people can benefit (I was too lazy).

  5. pigiuz says:

    this is one of these posts that’ll make your server blow up 🙂
    thank you for sharing

  6. […] Each frame: 1) clear backbuffer by calling Context3D.clear 2) config GPU drawing state for the next drawTriangles calls(drawcall), for instance, setting up a blendMode, bind textures, bind shaders, bind constants data for shaders, depthbuffer compare and write mode, stencilbuffer actions, bind vertex buffers, index buffers and etc. * 3) Call drawTriangles() to draw the triangles defining the objects 4) Repeat until the scene is entirely rendered 5) Call the present() method to display the rendered scene on the stage. It copies the backbuffer to a framebuffer * It would be great for perfomance if you will batch the drawcalls by similar states, but it depends on the scene type. I will write a separate post about optimizations, but as an addition you can read my previous article “How to debug Stage3D with Intel GPA” […]
