GLSL (OpenGL Shading Language) compilation bug with a for loop on Adreno 205, Android.

One of the biggest advantages of the OpenGL API specification is that OpenGL is language agnostic.

That means it can be bound to almost any programming language, which makes it a very portable library.

However, there is a serious issue with OpenGL. Its shader language (GLSL) has no standard format for compiled shaders, so you can’t rely on binary files of compiled shaders to work on different devices.

Not only that, but compiling the same GLSL source code at run time on different devices might produce different results or even silent bugs (depending on the driver implementation).

My game Shotgun Practice was running perfectly on my device (Galaxy Note N7000) but didn’t work on my friend’s device (HTC Desire Z).

On my friend’s ‘HTC Desire Z’ Android device with the ‘Adreno 205’ GPU it had graphics artifacts.

After quite a few tests I found that a specific shader was the culprit: the vertex shader of skinned objects.

It took me a lot of tests because the driver for the HTC Desire Z didn’t report any error or warning upon compiling and validating the skinning shader.

Eventually it boiled down to the part of the code that transforms the vertices with the relevant bones.

Doesn’t work on HTC Desire Z

for (int i = 0; i < 4; ++i)
{
	mat4 m = BoneTransform[Index[i]];
	posOut += (w[i]*m*vec4(position, 1.0)).xyz;
	normalOut += (w[i]*m*vec4(normal, 0.0)).xyz;
}

Works on HTC Desire Z

mat4 m = BoneTransform[Index[0]];
posOut += (w[0]*m*vec4(position, 1.0)).xyz;
normalOut += (w[0]*m*vec4(normal, 0.0)).xyz;
m = BoneTransform[Index[1]];
posOut += (w[1]*m*vec4(position, 1.0)).xyz;
normalOut += (w[1]*m*vec4(normal, 0.0)).xyz;
m = BoneTransform[Index[2]];
posOut += (w[2]*m*vec4(position, 1.0)).xyz;
normalOut += (w[2]*m*vec4(normal, 0.0)).xyz;
m = BoneTransform[Index[3]];
posOut += (w[3]*m*vec4(position, 1.0)).xyz;
normalOut += (w[3]*m*vec4(normal, 0.0)).xyz;

As you can see, the code that doesn’t work has a ‘for loop’, and in the code that works I manually unrolled the ‘for loop’.

I also tested whether the issue was that ‘mat4 m’ was declared inside the ‘for loop’ block, or that the hard coded number of iterations caused a faulty loop unrolling.

Neither change helped. I don’t know exactly what the driver issue is, but I was told you should use ‘for loops’ very cautiously in GLSL meant for mobile devices.


Beware of ‘for loops’, and of branching in general, in GLSL meant for mobile devices.

But even worse, some drivers (hopefully only on old devices) might not warn you that the shader isn’t going to work on the device, even though it passed all the validation.

SoundPool doesn’t loop? Android

In my new small Android racing game called ‘Diesel Racer’ I need to play an engine sound in a loop.

I play my sounds in Java using SoundPool.

My first attempt to play a sound in a loop was calling the method setLoop just before playing the sound, like so:

Pool.setLoop(sampleId, -1);

That did not have any effect and the sound played just once.

I tried calling the same method after calling the play method, but that didn’t help either.

I later found that the play method has a ‘loop’ parameter as well. Since in the past I only needed to play a sound once, I was passing 0 to the ‘loop’ parameter, which sets the sound to play once.

Calling setLoop before calling play has no effect.

However, I am not sure why calling setLoop after play didn’t have any effect either.

In order to play the sound looping forever I just passed -1 to the ‘loop’ parameter when calling play.

(I am still not sure how I am supposed to use setLoop. The documentation describes its first parameter as the stream ID returned by play, not the sample ID I was passing, so that may be the reason.)

For the sake of completeness, here is the code I use to play a SoundPool sound looped:

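try {
	n.SoundID = Pool.load(a.assetManager.openFd(n.Path), Thread.NORM_PRIORITY);
	if (n.IsLoop)
		Pool.setOnLoadCompleteListener(new SoundPool.OnLoadCompleteListener() {
			public void onLoadComplete(SoundPool Pool, int sampleId, int status) {
				// Arguments: volume left/right 1, priority 0, loop -1 (forever), rate 1.
				Pool.play(sampleId, 1, 1, 0, -1, 1);
			}
		});
} catch (IOException e) {
	// The sound could not be opened; the handling is not shown here.
}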

Note that this code loops both ‘.ogg’ and ‘.wav’ files. I saw a post somewhere that said ‘.ogg’ might not loop with a SoundPool, but ‘.ogg’ does loop for me with this code.

Tessellation Simplified

I have implemented tessellation + displacement map for my game Shoe String Shooter.
To decide how much to tessellate a triangle I calculated its screen space area. The more screen space area it takes, the more I tessellated it.

This would theoretically make the triangles uniformly sized across the screen space. It made sense to me, but it has some major issues.

The displacement occurs along the triangle’s normal. If the triangle is facing the camera, its normal is facing the camera as well.
There is very little visible effect when the geometry is displaced towards the viewer. A flat triangle with a normal mapped texture would be (almost) just as good.

Another thing is that triangles which are at 90 degrees to the camera almost disappear in screen space and have a very small screen space area.
However, when displacing those triangles the geometry is very much visible, since the normal is tangential to the screen.

The simpler approach

The new approach I have taken is simpler and gives better results. It also requires only 3 control points instead of 6.

The first step is to tessellate every edge of the triangle according to how long the maximum displacement vector (in the direction of the normal) is in screen space.
With this metric, triangles that face the camera or are farther away tessellate less, and the conspicuous and closer triangles tessellate more.

This is not enough. Some triangles are very big in world space, and we no longer modulate the tessellation with the triangle’s area, so large triangles will appear coarser.

The solution is to modulate the displacement vector tessellation with the world space edge length. This achieves spatial uniformity in world space.
Tessellating according to edge length has some advantages compared to tessellating according to triangle area.
First, we only need 3 control points instead of 6.
Second, we tessellate more along the longer edges and less along the shorter edges. An area calculation cannot tell a golden triangle from a very narrow but long triangle of the same area.
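
To illustrate the second point, here is a small CPU-side sketch in Java. It is not part of the game code; the class name, the helper names and the two example triangles are made up for the illustration. It compares a compact triangle with a long sliver of the same area, and indexes edges the same way as the shader listing further down (the edge opposite corner i).

```java
// Illustration only: all names and example values here are hypothetical.
final class TriangleMetrics {
    /** Distance between two 3D points. */
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Triangle area as half the length of the cross product of two edges. */
    static double area(double[][] t) {
        double[] u = { t[1][0] - t[0][0], t[1][1] - t[0][1], t[1][2] - t[0][2] };
        double[] v = { t[2][0] - t[0][0], t[2][1] - t[0][1], t[2][2] - t[0][2] };
        double cx = u[1] * v[2] - u[2] * v[1];
        double cy = u[2] * v[0] - u[0] * v[2];
        double cz = u[0] * v[1] - u[1] * v[0];
        return 0.5 * Math.sqrt(cx * cx + cy * cy + cz * cz);
    }

    /** Length of the edge opposite corner i, for i = 0, 1, 2. */
    static double[] edgeLengths(double[][] t) {
        double[] len = new double[3];
        for (int i = 0; i < 3; i++) {
            len[i] = distance(t[(i + 1) % 3], t[(i + 2) % 3]);
        }
        return len;
    }

    public static void main(String[] args) {
        double[][] compact = { {0, 0, 0}, {4, 0, 0}, {0, 4, 0} };
        double[][] sliver  = { {0, 0, 0}, {32, 0, 0}, {0, 0.5, 0} };
        System.out.println("areas: " + area(compact) + " vs " + area(sliver));
        System.out.println("compact edges: " + java.util.Arrays.toString(edgeLengths(compact)));
        System.out.println("sliver edges:  " + java.util.Arrays.toString(edgeLengths(sliver)));
    }
}
```

Both triangles have the same area, so an area based metric would give them the same tessellation, while the per-edge lengths clearly separate the long edges of the sliver from its short one.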

The last step is to bound the tessellation amount since we don’t want unnecessary triangles on the geometry that is up close. We don’t set a constant tessellation bound, but instead set a bound modulated with the world space edge length.


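// w: world space corners, p: projected corners, q: projected corners after the
// maximum displacement along the normal (roles inferred from how they are used).
PatchTess ScreenSpaceTessellator(float3 w[3], float4 p[3], float4 q[3])
{
	PatchTess pt;
	float Res = 768.0;    // scale from projected units to pixels
	float Cell1 = 16.0;   // world space edge length per tessellation step
	float Cell2 = 8.0;    // screen space displacement (pixels) per tessellation step
	float MaxTes = 10.0;  // upper bound, scaled by the world space edge length

	unsigned int i=0;
	for (i=0; i<3; i++)
		pt.EdgeTess[i] = 1;
	pt.InsideTess = 1;
	float Tess[3] = {0, 0, 0};
	// IsScreenCull is defined elsewhere; culled patches keep a factor of 1.
	if (IsScreenCull (p[0], p[1], p[2]))
		return pt;
	for (i=0; i<3; i++)
	{
		// World space vector of the edge opposite corner i.
		float3 a1 = (w[(i+1)%3]-w[(i+2)%3]);
		Tess[i] = length(a1)/Cell1;
//		Tess[i] = max(Tess[i], 1);
		// Screen space displacement vectors at the two ends of that edge.
		float2 a2 = (q[(i+1)%3].xy-p[(i+1)%3].xy)*Res*0.5;
		float2 b2 = (q[(i+2)%3].xy-p[(i+2)%3].xy)*Res*0.5;
		Tess[i] *= 0.5*(length(a2)+length(b2))/Cell2;
		Tess[i] = min(max(Tess[i], 1), MaxTes*length(a1)/Cell1);
	}
	for (i=0; i<3; i++)
		pt.EdgeTess[i] = Tess[i];
	pt.InsideTess = (Tess[0]+Tess[1]+Tess[2])/3.0;
	return pt;
}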

Motion Blur of Skinned Meshes, Particles and a Bug Fix.

In the previous post about motion blur I presented my results even though I was not completely happy with them.

For the motion blur I was calculating screen space velocity vectors into an offscreen texture. These velocity values are then used by the post processing compute shader to calculate how much to push pixels around, so that pixels smear in places with high velocity.

It turns out I forgot to divide the screen space x and y coordinates by the w component, which stores a depth value used for the perspective projection. This resulted in a very strong blur for far objects, and a very slight blur for closer objects.
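
The fix can be sketched on the CPU in Java. This is not the shader from the game; the class and method names are made up, and positions are assumed to be clip space values stored as {x, y, z, w}.

```java
// Illustration only: names and the {x, y, z, w} layout are assumptions.
final class ScreenVelocity {
    /** Converts a clip space position {x, y, z, w} to normalized x/y by dividing by w. */
    static float[] toNdc(float[] clip) {
        return new float[] { clip[0] / clip[3], clip[1] / clip[3] };
    }

    /** Screen space velocity between the previous and the current frame. */
    static float[] velocity(float[] prevClip, float[] currClip) {
        float[] prev = toNdc(prevClip);
        float[] curr = toNdc(currClip);
        return new float[] { curr[0] - prev[0], curr[1] - prev[1] };
    }

    public static void main(String[] args) {
        // The same sideways clip space movement at a small w (near) and a large w (far).
        float[] near = velocity(new float[] {0f, 0f, 1f, 2f}, new float[] {1f, 0f, 1f, 2f});
        float[] far = velocity(new float[] {0f, 0f, 9f, 10f}, new float[] {1f, 0f, 9f, 10f});
        System.out.println("near: " + near[0] + ", far: " + far[0]);
    }
}
```

Without the division both points would report the same velocity. With it, the far point moves less on screen than the near one, which is what the blur should follow.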

Another thing I wanted to add is motion blur for skinned meshes. To calculate the velocity of each pixel for each object I was calculating each vertex’s position AND what its position was in the previous frame. I was considering world movement and camera movement.
However, for skinned meshes I didn’t include the object’s deformation in the previous frame, so a skinned mesh had only extrinsic blur and not intrinsic blur.

Similar to calculating the extrinsic velocity, in order to calculate the bone affected velocity of the vertices I had to calculate their position in the previous frame. To do that I simply sent a copy of the previous frame’s bone offsets. This might prove to be a performance issue since now I send twice the amount of bone offsets, but for now it’s good enough.
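
As a rough CPU-side sketch of the idea in Java (not the actual vertex shader), the same vertex is skinned once with the current bones and once with the previous frame’s bones, and the difference is the intrinsic movement. The class name, the helper names and the row-major 3x4 matrix layout are assumptions made for the example.

```java
// Illustration only: names and the 3x4 row-major bone layout are assumptions.
final class SkinnedVelocity {
    /** Applies a row-major 3x4 affine bone matrix to a point. */
    static float[] transform(float[] m, float[] p) {
        return new float[] {
            m[0] * p[0] + m[1] * p[1] + m[2] * p[2] + m[3],
            m[4] * p[0] + m[5] * p[1] + m[6] * p[2] + m[7],
            m[8] * p[0] + m[9] * p[1] + m[10] * p[2] + m[11]
        };
    }

    /** Weighted blend of the bone transformed positions. */
    static float[] skin(float[][] bones, int[] index, float[] weight, float[] p) {
        float[] out = new float[3];
        for (int i = 0; i < index.length; i++) {
            float[] t = transform(bones[index[i]], p);
            for (int k = 0; k < 3; k++) {
                out[k] += weight[i] * t[k];
            }
        }
        return out;
    }

    /** Movement of a vertex caused only by the bones changing between two frames. */
    static float[] deformationDelta(float[][] prevBones, float[][] currBones,
                                    int[] index, float[] weight, float[] p) {
        float[] prev = skin(prevBones, index, weight, p);
        float[] curr = skin(currBones, index, weight, p);
        return new float[] { curr[0] - prev[0], curr[1] - prev[1], curr[2] - prev[2] };
    }

    /** Builds a bone matrix that only translates. */
    static float[] translation(float x, float y, float z) {
        return new float[] { 1, 0, 0, x, 0, 1, 0, y, 0, 0, 1, z };
    }

    public static void main(String[] args) {
        float[][] prev = { translation(0, 0, 0), translation(0, 0, 0) };
        float[][] curr = { translation(0, 0, 0), translation(2, 0, 0) };
        float[] d = deformationDelta(prev, curr, new int[] {0, 1},
                                     new float[] {0.5f, 0.5f}, new float[] {1, 1, 1});
        System.out.println(d[0] + ", " + d[1] + ", " + d[2]);
    }
}
```

In the example the second bone moves 2 units along x and the vertex follows it with half weight, so the vertex moves 1 unit even though the object itself did not move.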

Finally, I also did motion blur for particles. I simply rendered the velocity vectors of the particles with blending into the velocity offscreen texture. It works well enough, but I didn’t put more effort into it than that. I might do something else later.

I think the next stage would be to combine motion blur with Depth Of Field. Hopefully it’s just a matter of pipelining the two filters.

Skinned motion blur.

Depth Of Field

In photography, DOF (Depth Of Field) is the range of distances in which the image blur is smaller than one pixel. That means that outside the DOF the image gets blurry.

As you might have already seen, I had a Blur Compute Shader implemented. So making a DOF is just a simple matter of doing blur adjusted to the depth of the pixel, right? Not really.
Doing DOF is more complex than I initially thought.

The first issue I encountered is that when I wanted to blur pixels that are in front of the DOF range, my blur wasn’t wide enough. I didn’t want to increase the kernel size, so I just sampled with steps of 2 pixels instead of 1.

That was simple enough, but then I realized something else. Even though I was sampling wider and even though I was performing a blur, there were still hard edges in areas with a depth discontinuity.
What do I mean? Imagine you look around the corner of a wall. The wall that is near you should be blurred, and the wall that is far should be sharp.
However, even though the nearer wall is blurred, there is still a hard edge because the blur amount is decided on a per pixel basis.
What I wanted to happen is for the closer wall to smear and blur over the far wall. To do that I had to consider a neighborhood of depth values when deciding on the blur factor, instead of just considering the depth value of the current pixel.
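
One possible way to do this is sketched below in Java on a single row of depth values. It is a simplified stand-in and not the compute shader from the game: the linear blur falloff, the rule of taking the maximum over nearer neighbors, and all names are assumptions.

```java
// Illustration only: the falloff, the max rule and all names are assumptions.
final class DofBlurFactor {
    /** Blur amount in [0, 1] from the depth of a single pixel (linear falloff). */
    static float blurFromDepth(float depth, float focus, float range) {
        return Math.min(1f, Math.abs(depth - focus) / range);
    }

    /** Blur amount that also lets nearer, blurrier neighbors raise a pixel's blur. */
    static float[] spreadBlur(float[] depth, float focus, float range, int radius) {
        float[] out = new float[depth.length];
        for (int i = 0; i < depth.length; i++) {
            float blur = blurFromDepth(depth[i], focus, range);
            int lo = Math.max(0, i - radius);
            int hi = Math.min(depth.length - 1, i + radius);
            for (int j = lo; j <= hi; j++) {
                // Only geometry in front of this pixel may smear over it.
                if (depth[j] < depth[i]) {
                    blur = Math.max(blur, blurFromDepth(depth[j], focus, range));
                }
            }
            out[i] = blur;
        }
        return out;
    }

    public static void main(String[] args) {
        // A near wall (depth 1) next to a far wall (depth 10), focused on the far wall.
        float[] depth = { 1f, 1f, 1f, 10f, 10f, 10f };
        float[] blur = spreadBlur(depth, 10f, 9f, 1);
        System.out.println(java.util.Arrays.toString(blur));
    }
}
```

In the example the first far-wall pixel next to the edge picks up the blur of the near wall, while far-wall pixels further from the edge stay sharp. A real filter would probably fade this with distance instead of using a hard maximum.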

Hard Edge

Soft Edge

The final step was to make the DOF adjust according to what the player is looking at. For that I sampled the 4×4 center pixels of the depth buffer inside a separate Compute Shader and wrote out the result into a 1×1 texture that will be read by the DOF Compute Shader as a shader resource.
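
A minimal Java sketch of that reduction follows. The post does not say how the 16 samples are combined, so the plain average used here is an assumption, as are the names and the row-major buffer layout.

```java
// Illustration only: the averaging, the names and the buffer layout are assumptions.
final class AutoFocus {
    /** Averages the 4x4 block at the center of a row-major depth buffer (needs width, height >= 4). */
    static float centerDepth(float[] depth, int width, int height) {
        int x0 = width / 2 - 2;
        int y0 = height / 2 - 2;
        float sum = 0f;
        for (int y = y0; y < y0 + 4; y++) {
            for (int x = x0; x < x0 + 4; x++) {
                sum += depth[y * width + x];
            }
        }
        return sum / 16f;
    }

    /** Builds a test buffer with one depth in the center 4x4 block and another elsewhere. */
    static float[] makeBuffer(int width, int height, float background, float center) {
        float[] depth = new float[width * height];
        java.util.Arrays.fill(depth, background);
        for (int y = height / 2 - 2; y < height / 2 + 2; y++) {
            for (int x = width / 2 - 2; x < width / 2 + 2; x++) {
                depth[y * width + x] = center;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        float[] depth = makeBuffer(8, 8, 5f, 2f);
        System.out.println("focus depth: " + centerDepth(depth, 8, 8));
    }
}
```

The single value returned here plays the role of the 1×1 texture: it is the focus depth that the blur pass compares every pixel against.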

Warning! fxc compilation bugs (DirectX)

I have been working on DX11 and tessellation lately. It turns out my tessellation shaders are broken when I let fxc optimize my shader. The “solution” is to specify the flag /Od, which tells fxc not to do any optimizations.
This will have to do for now, but you can see how deadly it can be. You might work on a problem for a long time and not solve it because of unsafe optimizations, and then suddenly you switch to a debug build and it works. Scary stuff.
I must note that I don’t get any warnings from fxc.

I tried to look at the assembly code, and the biggest thing I noticed is that the “Release” code has loop unrolling while the “Debug” code has real loops. I think most of the optimizations are the loop unrolling anyway.

Here are the unoptimized shader results compared to the optimized shader results: