Hi everyone! This is my experimental article, where I explain the core principles of the Stage3D API along with small but very important details.
The Stage3D API has been a part of the Flash API since the Flash Player 11.2 and Adobe AIR 3.2 releases. It is built on top of the OpenGL and Direct3D APIs to give you cross-platform access to the GPU. Stage3D supports OpenGL, Direct3D 9 and Direct3D 11: each GPU call is translated for the running device, and shader bytecode is compiled into the native shader language of the current device. The name “Stage3D API” refers to the group of classes in the Flash API for working with the GPU. The Stage3D class gives you access to a Context3D, which represents a rendering surface.
How to get a rendering surface?
You can’t create a Context3D object with the Context3D constructor. It is constructed for you and made available as a property of a Stage3D instance.
The rendering pipeline is:
– create a Context3D object through a Stage3D instance by calling requestContext3D()
– configure the backbuffer by calling configureBackBuffer()
– create and upload resources with the Context3D methods: createCubeTexture, createIndexBuffer, createProgram, createRectangleTexture, createTexture, createVertexBuffer
Then, for each frame:
1) Clear the backbuffer by calling Context3D.clear()
2) Configure the GPU drawing state for the next drawTriangles call (draw call): for instance, set the blend mode, bind textures, shaders and shader constant data, set the depth buffer compare and write modes and the stencil buffer actions, bind vertex and index buffers, etc. *
3) Call drawTriangles() to draw the triangles defining the objects
4) Repeat until the scene is entirely rendered
5) Call the present() method to display the rendered scene on the stage. It copies the backbuffer to the framebuffer
* Batching draw calls by similar state is great for performance, though it depends on the scene type. I will write a separate post about optimizations; in the meantime you can read my previous article “How to debug Stage3D with Intel GPA”
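The whole pipeline above can be sketched in ActionScript. This is a minimal sketch, with error handling, the actual draw state and the backbuffer size left as assumptions:

```actionscript
// Request a Context3D from the first Stage3D instance
stage.stage3Ds[0].addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
stage.stage3Ds[0].requestContext3D();

function onContextCreated(e:Event):void {
    var context3D:Context3D = stage.stage3Ds[0].context3D;
    // width, height, antialias level, enable depth & stencil
    context3D.configureBackBuffer(800, 600, 2, true);
    // ...create and upload buffers, textures and programs here...
    addEventListener(Event.ENTER_FRAME, render);
}

function render(e:Event):void {
    var context3D:Context3D = stage.stage3Ds[0].context3D;
    context3D.clear(0, 0, 0, 1);     // 1) clear the backbuffer
    // 2) set state and 3) drawTriangles, 4) repeated per batch:
    // context3D.setProgram(program);
    // context3D.drawTriangles(indexBuffer);
    context3D.present();             // 5) swap the backbuffer to the screen
}
```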
Content drawn to the Stage3D viewport is composited with other visible Flash graphics objects in a predefined order. The most distant are all StageVideo surfaces. Stage3D comes next, with traditional Flash display object content being rendered last, on top of all others. StageVideo and Stage3D layers are rendered with no transparency; thus a viewport completely obscures any other Stage3D or StageVideo viewports positioned underneath it. Display list content is rendered with transparency.
Flash has two threads: the main thread and a second thread for GPU rendering. A drawTriangles call takes all the state set for the draw call, copies it into a queue and immediately returns. The render thread then pulls from the queue, validates the states and sends them directly to the GPU. If you want to get error messages from this validation, you need to set enableErrorChecking = true; it moves the rendering pipeline to the main thread, so you can receive the errors.
After the present() call, the main and rendering threads must be synchronized. Keep in mind that either the CPU (physics computation, etc.) or the GPU can be the bottleneck that drops your FPS, because one thread will wait for the other.
When you request a Context3D you can specify a profile, which lets you target a particular feature-support level.
– Context3DProfile.BASELINE – Use the default feature support profile
– Context3DProfile.BASELINE_CONSTRAINED – Use a constrained feature support profile to target older GPUs. This profile primarily targets devices that only support PS_2.0-level shaders, such as the Intel GMA 9xx series. The same shaders and pipeline states can also run faster in the constrained profile because of the lower shader level and feature set
– Context3DProfile.BASELINE_EXTENDED – Use an extended feature support profile to target newer GPUs which support larger textures. This profile increases the maximum 2D Texture and RectangleTexture size to 4096×4096.
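A sketch of requesting the extended profile with a fallback to the default one (the fallback strategy here is an assumption; your application may instead disable features):

```actionscript
var s3d:Stage3D = stage.stage3Ds[0];

s3d.addEventListener(Event.CONTEXT3D_CREATE, onCreated);
// If the requested profile is not supported, Stage3D dispatches an ErrorEvent;
// retry with the default baseline profile.
s3d.addEventListener(ErrorEvent.ERROR, function(e:ErrorEvent):void {
    s3d.requestContext3D(Context3DRenderMode.AUTO, Context3DProfile.BASELINE);
});
s3d.requestContext3D(Context3DRenderMode.AUTO, Context3DProfile.BASELINE_EXTENDED);

function onCreated(e:Event):void {
    trace("Got Context3D, driver: " + s3d.context3D.driverInfo);
}
```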
* Today we have the Flash Player 14 Beta, which opens up huge opportunities for GPU rendering on the Flash Platform. Check out the latest online meeting with Adobe. It gives us the opportunity to create AAA-level graphics like Crysis 2, Starcraft II, etc.
GPU memory is very limited, so you must always think about your texture sizes: smaller textures save memory and lower texture bandwidth when sampled in shaders. Stage3D has a few texture types and texture formats.
– Context3DTextureFormat.BGRA – 8 bits per channel, commonly used for render targets = 32 bits per pixel
– Context3DTextureFormat.BGRA_PACKED – 4 bits per channel = 16 bits per pixel
– Context3DTextureFormat.BGR_PACKED – 5/6/5 bits = 16 bits per pixel
– Context3DTextureFormat.COMPRESSED – 4 bits per pixel!
– Context3DTextureFormat.COMPRESSED_ALPHA – 8 bits per pixel!
Stage3D supports runtime texture compression on desktop and web platforms, so you can simply specify the desired compressed texture format and upload raw image data with 8 bits per channel from BitmapData or ByteArray objects.
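For example, creating a BGRA texture and uploading it from a BitmapData might look like this (a sketch; context3D and the 256×256 size are assumptions):

```actionscript
// Texture dimensions must be powers of two
var bmd:BitmapData = new BitmapData(256, 256, true, 0xFFFF0000);

var texture:Texture = context3D.createTexture(
    256, 256,
    Context3DTextureFormat.BGRA,
    false                      // optimizeForRenderToTexture
);
texture.uploadFromBitmapData(bmd, 0); // 0 = top mip level

// For COMPRESSED / COMPRESSED_ALPHA data from an ATF file, use instead:
// texture.uploadCompressedTextureFromByteArray(atfBytes, 0);
```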
WARNING! Be careful with premultiplied alpha values when uploading from BitmapData or BitmapData.getPixels(): you must restore these values in the fragment shader or use an appropriate blend mode.
You will gain the best performance for your application if you use the COMPRESSED and COMPRESSED_ALPHA texture formats. The only limitation is that these textures must be square because of the block compression. All possible compressed 2D texture sizes: 4×4, 8×8, 16×16, 32×32, 64×64, 128×128, 256×256, 512×512, 1024×1024, 2048×2048. Please read “Adobe Texture Format” by Thibault Imbert for more detailed info, including how to create these textures for mobile platforms with the ATF SDK.
– Rectangle textures are available only with Adobe AIR, and they can have any size. The one limitation is that you can’t use the COMPRESSED and COMPRESSED_ALPHA texture formats with them
You can use each texture type as a render target to perform rendering into the texture instead of the backbuffer. For instance, if you want to apply a post effect to the whole scene image, you would use the following render pipeline:
1) clear the backbuffer
2) set the render target to the texture by calling setRenderToTexture()
3) clear the render target
4) do the drawcalls
5) switch back to the backbuffer by calling setRenderToBackBuffer()
6) draw a screen quad with the bound RT texture from step 2, with a special shader which reads the texel color and performs the post-effect calculations
WARNING! You can use only the Context3DTextureFormat.BGRA texture format for RT textures, i.e. 1 byte per channel.
You also need to pass optimizeForRenderToTexture=true to one of the texture creation methods: createTexture, createCubeTexture, createRectangleTexture.
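The render-to-texture steps above can be sketched like this (the RT size, context3D and the actual scene/quad drawing are assumptions):

```actionscript
// Render target texture, BGRA only, with the render-to-texture hint
var rt:Texture = context3D.createTexture(
    1024, 1024, Context3DTextureFormat.BGRA, true);

context3D.clear(0, 0, 0, 1);            // 1) clear the backbuffer
context3D.setRenderToTexture(rt, true); // 2) render into the texture (with depth & stencil)
context3D.clear(0, 0, 0, 1);            // 3) clear the render target
// 4) ...drawcalls for the scene...
context3D.setRenderToBackBuffer();      // 5) back to the backbuffer
context3D.setTextureAt(0, rt);          // bind the RT texture to fragment sampler 0
// 6) ...draw the fullscreen quad with the post-effect shader...
context3D.present();
```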
The backbuffer is your drawing surface, and it consists of two textures:
1) A color texture: 8 bits per channel = 32 bits per pixel
2) A combined depth & stencil buffer: 24 bits for the depth value and 8 bits for the stencil value
Stage3D uses double buffering: the main thread must wait until the finished backbuffer image has been copied or swapped to the framebuffer before it starts drawing the next frame. This waiting period can be several milliseconds, during which neither buffer can be touched. You can read more about it here.
This optimization helps to avoid image flickering.
Vsync is an optional GPU driver feature which synchronizes the frame rate with the vertical blanking interval of the monitor for smooth swapping from back to front (first we draw to the back buffer, then it swaps to the front). Without vsync you can get artifacts, for instance tearing: one part of the image shows the previous frame while the other part shows the current frame. Vsync produces a small latency penalty, because the program has to wait until the video controller has finished transmitting the image to the display before continuing.
The GPU only draws frames on a vsync (every 1/60th of a second). When Flash asks it to draw, it waits for the next vsync. This wait can be long for two reasons:
- The CPU work per frame is less than 1/60th of a second. In this case the GPU swap time represents “idle” time while we wait for a vsync. This is harmless. This is probably the case for DisplayList apps.
- Flash has its own, separate timer that drives the frame rate, and it can get offset from vsync. Imagine two lights blinking at different frequencies. At first they blink simultaneously, then they gradually drift apart, then they come together again, and then drift apart again. During some phases of this cycle, GPU swap will be small: Flash asks the GPU to draw just before a vsync. During other phases, GPU swap will be big: Flash asks the GPU to draw just after a vsync, and the GPU waits for the next one. This results in periodic dropped frames. Try changing your framerate to 20, 30 and 60.
(thanks to Adam Cath for the explanation)
Programs, VertexBuffer and IndexBuffer
Program3D is a GPU program that consists of vertex and fragment shader parts. The vertex shader operates on data defined in VertexBuffer3D objects and is responsible for projecting vertices into clip space and passing any required data to the fragment shader. The vertex shader runs for each vertex in the current drawcall, and the fragment shader runs for every rasterized fragment of a triangle in the drawcall. You can transfer data from the vertex shader to the fragment shader through varying registers. You can pass data to a shader through vertex buffers and shader constants. Constants can be passed directly to the vertex or fragment program. If you want to interpolate constants per pixel, just pass these values per vertex and transfer them through varying registers to the fragment shader; they will be automatically interpolated per pixel. You can also use these values with the ddx and ddy opcodes (available from the Flash Player 14 release; for now you can download it from Labs).
You need to set an IndexBuffer3D for each drawcall; it defines the order in which vertex data is assembled into triangles. A vertex consists of attributes, and attributes are just numbers. Vertex data can be bound to the vertex shader from several vertex buffer instances. This is really useful if you have dynamic and static parts of your vertex data, because you can upload only a particular piece of data to the GPU and save upload bandwidth. For instance, if you have vertices with position and UV attributes and you need to change and upload only the positions on each frame, you can hold two VertexBuffer3Ds, one for positions and one for UVs, and upload only the position values. You don’t need to upload the whole vertex data each frame. Also, please check the Context3DBufferUsage flags. I plan to write a separate article about this, drawing on my experience from the Realaxy project with its huge amount of dynamic geometry.
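A minimal sketch of creating the buffers and a program for a single colored triangle (assumes context3D and Adobe’s AGALMiniAssembler utility class are available; the projection matrix is expected in vertex constants vc0–vc3):

```actionscript
// 3 vertices, 6 floats each: x, y, z, r, g, b
var vertices:Vector.<Number> = Vector.<Number>([
    -0.5, -0.5, 0,   1, 0, 0,
     0.5, -0.5, 0,   0, 1, 0,
     0.0,  0.5, 0,   0, 0, 1
]);
var vb:VertexBuffer3D = context3D.createVertexBuffer(3, 6);
vb.uploadFromVector(vertices, 0, 3);

var ib:IndexBuffer3D = context3D.createIndexBuffer(3);
ib.uploadFromVector(Vector.<uint>([0, 1, 2]), 0, 3);

var assembler:AGALMiniAssembler = new AGALMiniAssembler();
var vertexShader:ByteArray = assembler.assemble(Context3DProgramType.VERTEX,
    "m44 op, va0, vc0 \n" + // project the position by the matrix in vc0-vc3
    "mov v0, va1"           // pass the color through varying register 0
);
var fragmentShader:ByteArray = assembler.assemble(Context3DProgramType.FRAGMENT,
    "mov oc, v0"            // output the per-pixel interpolated color
);
var program:Program3D = context3D.createProgram();
program.upload(vertexShader, fragmentShader);

// Per-drawcall binding: va0 = position (offset 0), va1 = color (offset 3)
context3D.setVertexBufferAt(0, vb, 0, Context3DVertexBufferFormat.FLOAT_3);
context3D.setVertexBufferAt(1, vb, 3, Context3DVertexBufferFormat.FLOAT_3);
context3D.setProgram(program);
context3D.drawTriangles(ib);
```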
By default, depth testing is performed during rendering, and you can specify the type of comparison used for it. The depth of the source pixel output from the fragment shader program is compared to the current value in the depth buffer. If the comparison evaluates to false, the source pixel is discarded. If true, the source pixel is processed by the next step in the rendering pipeline, the stencil test. In addition, the depth buffer is updated with the depth of the source pixel, as long as the depthMask parameter is set to true.
You can also perform a stencil test to discard source pixels. You specify an 8-bit stencil reference value that can be associated with each draw call. During rendering, the stencil reference value is tested against values stored previously in the stencil buffer. The result of the test can control the draw action and whether or how the stored stencil value is updated. In addition, depth testing controls whether stencil testing is performed: a failed depth test can also be used to control the action taken on the stencil buffer.
In the pixel processing pipeline, depth testing is performed first. If the depth test fails, a stencil buffer update action can be taken, but no further evaluation of the stencil buffer value can be made. If the depth test passes, then the stencil test is performed. Alternate actions can be taken depending on the outcome of the stencil test. The stencil reference value is set using setStencilReferenceValue(). The stencil buffer is very useful for masking; you can also implement shadows with it, the so-called “stencil shadows”.
You can also use setScissorRectangle() if you want to mask the drawing region with a rectangle.
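Typical per-drawcall state for depth, stencil and scissor might look like this (a sketch; context3D and the reference value 1 are assumptions):

```actionscript
// Write depth; pass if the source depth is <= the stored depth
context3D.setDepthTest(true, Context3DCompareMode.LESS_EQUAL);

// Masking: draw only where the stencil buffer equals the reference value
context3D.setStencilReferenceValue(1);
context3D.setStencilActions(
    Context3DTriangleFace.FRONT_AND_BACK,
    Context3DCompareMode.EQUAL,
    Context3DStencilAction.KEEP,  // action when both tests pass
    Context3DStencilAction.KEEP,  // action on depth fail
    Context3DStencilAction.KEEP   // action on depth pass / stencil fail
);

// Restrict drawing to a rectangular region of the viewport
context3D.setScissorRectangle(new Rectangle(0, 0, 256, 256));
```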
PowerVR SGX is the graphics processor in the iPhone 3GS, iPhone 4, third-generation iPod touch, and the iPad, and is designed to support OpenGL ES 2.0. The PowerVR SGX uses a technique known as tile based deferred rendering (TBDR). When you call OpenGL ES functions to submit rendering commands to the hardware, those commands are buffered until a large list of commands are accumulated. These commands are rendered by the hardware as a single operation. To render the image, the framebuffer is divided into tiles, and the commands are drawn once for each tile, with each tile rendering only the primitives that are visible within it. The key advantage to a deferred renderer is that it accesses memory very efficiently. Partitioning rendering into tiles allows the GPU to more effectively cache the pixel values from the framebuffer, making depth testing and blending more efficient.
Another advantage of deferred rendering is that it allows the GPU to perform hidden surface removal before fragments are processed. Pixels that are not visible are discarded without sampling textures or performing fragment processing, significantly reducing the calculations that the GPU must perform to render the tile. To gain the most benefit from this feature, draw as much of the frame with opaque content as possible and minimize use of blending and the kil instruction in AGAL shaders. Because the hardware performs hidden surface removal, it is not necessary for your application to sort primitives from front to back.
If your application uses the kil opcode in the fragment shader, these hardware depth-buffer optimizations must be disabled. An alternative to using an alpha test or kil is to use alpha blending with alpha forced to zero. This effectively eliminates any contribution to the framebuffer color while retaining the Z-buffer optimizations. It does, however, change the value stored in the depth buffer, and so may require back-to-front sorting of the transparent primitives.
Additional part: Setting Up
If you are new to Flash and came from another platform, please check out this brief info and the useful links about the Flash Platform, all in one place.
Supported Platforms: Web(Windows, Mac, Linux in Google Chrome browser), Desktop, iOS, Android
– Adobe Flash Professional CC http://www.adobe.com/products/flash.html
– Adobe Flash Builder http://www.adobe.com/products/flash-builder.html
– FlashDevelop http://www.flashdevelop.org/
– IntelliJ IDEA Ultimate Edition http://www.jetbrains.com/idea/
– FDT http://fdt.powerflasher.com/
– CodeDrive http://www.codedrive.com/
– Adobe AIR SDK http://www.adobe.com/devnet/air/air-sdk-download.html
– Apache Flex SDK http://flex.apache.org/
– Crossbridge https://github.com/adobe-flash/crossbridge (C/C++ to Flash)
– Flash http://forums.adobe.com/community/flash
– AIR http://forums.adobe.com/community/air
– Flash Player Beta http://forums.adobe.com/community/labs/flashruntimes/flashplayer
– AIR Beta http://forums.adobe.com/community/labs/flashruntimes/air
If you target only the web platform, you can use the Flex SDK or ASC2.0 (part of the Adobe AIR SDK & Compiler) to compile the swf file that you embed on an HTML page.
The difference between Flex SDK and ASC2.0:
– The Flex SDK contains the Flex Framework, which brings declarative GUI programming through MXML files, with lots of features and UI components that are easy to extend and skin. As a rule, you use Flex if you need a complex GUI for your application.
– ASC2.0 is the latest compiler. Please check out its features in “Introduction to ASC2.0”
If you target desktop or mobile platforms, you need to use the Adobe AIR SDK + (ASC2.0, or the Flex SDK if you need the Flex framework).
WARNING! If you want to use the latest Flex SDK with the latest version of the Adobe AIR SDK, you need to download the original Adobe AIR SDK without the new compiler (at the bottom of the page) and extract it inside the Flex SDK directory.