How much performance do conditionals and unused samplers/textures add to SM2/3 pixel shaders?


We have one pixel shader in HLSL that is used for slightly different things in a few places. As a result it has several conditional blocks, so that complex functionality can be left out in some cases. It also means that we pass textures in as sampler parameters that are not always used.

I do not know how much of a performance hit these two things add, but inefficiency is a concern, especially since we support SM 2.0 on integrated graphics chips. Does passing in a texture and not using it add any overhead? And does using an if just add a few instructions, or can it drastically affect things due to stalls and so on, as it can when optimizing for the CPU?

Setting a texture on the GPU takes some CPU time, but that is small compared with the overall cost of the draw call. More importantly, it should have no impact on shader execution if the shader never references the texture.

Now, there are three ways that branches can be handled:

First, if the branch condition is always going to be the same (that is, if it depends only on compile-time constants), then one side of the branch can be compiled out entirely. In many cases it is better to compile several versions of your shader if that makes it possible to eliminate significant branches in this way.
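As a rough sketch of what "several versions" can look like in a build step, the Python below generates one fxc.exe command line per combination of feature switches. The /T, /E, /D, /Fo and /Fc options are real fxc options, but the macro names, file names and entry point are hypothetical, and it assumes the shader source tests those macros with #if so that the disabled code is never compiled in.

```python
from itertools import product

# Hypothetical feature switches. Each one becomes a preprocessor macro that the
# shader source is assumed to test with #if, so a disabled path is compiled out.
FEATURES = ["USE_NORMAL_MAP", "USE_SPECULAR"]


def fxc_commands(source="effect.hlsl", entry="PSMain", profile="ps_2_0"):
    """Return one fxc.exe argument list per combination of feature switches."""
    commands = []
    for values in product((0, 1), repeat=len(FEATURES)):
        suffix = "".join(str(v) for v in values)
        args = ["fxc.exe", "/T", profile, "/E", entry]
        for name, value in zip(FEATURES, values):
            args += ["/D", f"{name}={value}"]  # define the macro for this variant
        # /Fo writes the compiled object, /Fc writes an assembly listing.
        args += ["/Fo", f"effect_{suffix}.pso", "/Fc", f"effect_{suffix}.asm", source]
        commands.append(args)
    return commands


if __name__ == "__main__":
    for command in fxc_commands():
        print(" ".join(command))
```

Two switches give four variants. The number doubles with every switch added, so this approach is probably only worth it for branches that are expensive enough to matter.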

Second, the shader can evaluate both sides of the branch and then select the correct result based on the condition, all without actually branching (it does this arithmetically). This works best when the code in the branch is small.
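To make the "evaluate both sides, then select" idea concrete, here is a small model of it in Python rather than shader code. The function names and the lighting formula are invented for illustration. The point is only that the branchless version computes both results and blends them with a 0/1 mask, which is roughly what a compiler can do for short branches.

```python
def shade_with_branch(lit, base, highlight):
    """Reference version: a real conditional chooses which side runs."""
    if lit > 0.5:
        return base + highlight
    return base


def shade_branchless(lit, base, highlight):
    """Compute both sides unconditionally, then select arithmetically."""
    then_value = base + highlight   # "if" side, always computed
    else_value = base               # "else" side, always computed
    mask = float(lit > 0.5)         # 1.0 when the condition holds, else 0.0
    return mask * then_value + (1.0 - mask) * else_value


if __name__ == "__main__":
    for lit in (0.0, 0.25, 0.75, 1.0):
        print(lit, shade_with_branch(lit, 0.5, 0.25), shade_branchless(lit, 0.5, 0.25))
```

Both functions return the same values, but the second always pays for both sides, which is why the approach suits small branches.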

Finally, it can use actual branch instructions. First, branch instructions have a modest instruction-count cost. Then there is the pipeline: x86 has a long serial pipeline, which you can easily stall. The GPU has a completely different, parallel pipeline.

The GPU evaluates groups of fragments (pixels) in parallel, executing the fragment program once for several fragments at a time. If all the fragments in a group take the same branch, you pay only the execution cost of that branch. If they take two (or more) branches, the shader must be executed multiple times for that group of fragments to cover all the branches.

Because fragment groups cover contiguous on-screen areas, it helps if your branch conditions are similarly coherent across on-screen areas.
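The effect of screen-space coherence can be illustrated with a toy cost model. Everything here is an assumption made for the example: the 4x4 group size, the costs of 10 and 2 for the two sides of the branch, and the rule that a group pays for every side that at least one of its fragments takes. Real hardware differs in group size and scheduling, so treat the numbers as illustrative only.

```python
def group_cost(taken, cost_if, cost_else):
    """Cost of one fragment group: each side that any fragment takes is executed."""
    cost = 0
    if any(taken):        # at least one fragment takes the "if" side
        cost += cost_if
    if not all(taken):    # at least one fragment takes the "else" side
        cost += cost_else
    return cost


def frame_cost(mask, tile, cost_if=10, cost_else=2):
    """Sum group costs over a 2D boolean mask split into tile x tile groups."""
    height, width = len(mask), len(mask[0])
    total = 0
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            taken = [mask[j][i]
                     for j in range(y, min(y + tile, height))
                     for i in range(x, min(x + tile, width))]
            total += group_cost(taken, cost_if, cost_else)
    return total


SIZE, TILE = 8, 4
# Same number of fragments take the expensive side in both masks (32 of 64).
left_half = [[x < SIZE // 2 for x in range(SIZE)] for _ in range(SIZE)]
checkerboard = [[(x + y) % 2 == 0 for x in range(SIZE)] for y in range(SIZE)]

coherent = frame_cost(left_half, TILE)       # groups are all-true or all-false
scattered = frame_cost(checkerboard, TILE)   # every group contains both outcomes

if __name__ == "__main__":
    print("coherent:", coherent, "scattered:", scattered)
```

Under these assumptions the coherent mask costs 24 and the checkerboard costs 48, even though the same number of fragments take each side in both cases.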

Now, the shader compiler usually does a good job of choosing between the latter two methods (for the first method, the compiler will do the inlining, but you have to create the multiple shader versions yourself). But if you are optimizing performance, it can be useful to see the actual output of the compiler. To do this, use fxc.exe from the DirectX SDK utilities with the /Fc <file> option to get an assembly listing of the compiled shader.

(As this is performance advice: remember to always measure your performance, work out which limits you are hitting, and only then worry about optimizing. There is no point in optimizing your shader branches if they are not what is limiting you, for example.)
