Memory Usage and Normal Map Texture Formats in OpenGL ES2.


I am working on a 3D mobile racing game using OpenGL ES2.

For the cars in the game I am using normal maps to give them more details while using the same amount of geometry.

32 Bit RGBA uncompressed normal mapped car.

32 Bit RGBA uncompressed normal mapped car.

Normal maps are images that contain normal vectors per texel of the texture.

The normal vector is usually packed inside the 24 bit RGB values of the texture and is converted into a vector inside the fragment shader.

vec3 texNormal =texture2D(normalSampler, texOut).xyz;
texNormal = (texNormal*2.0)-1.0;

Here is an example of a test normal map my artist created for the car(it doesn’t have a lot of details it’s just a test):

Test Normal Map

Test Normal Map

Memory Usage

When you load a texture into OpenGL it doesn’t matter how much space it takes on the disk, what matters is how much space it takes in memory.

For instance an image could be saved as a PNG and take only 700KB on disk but when loaded to OpenGL it would take 4MB because it’s a 1024×1024 32 bit RGBA uncompressed format.

In my racing game I would have 4 different cars, each one with it’s own normal map. If each normal map is a 1024×1024 RGBA uncompressed image, it would take 16MB of memory with just the normal maps!

We would like to reduce the memory usage of these textures.

Compressed Texture

OpenGL ES2 supports loading compressed texture using glCompressedTexImage2D.

In the case of iOS the native supported compressed texture format is PVRTC.

The advantage of these compressed formats is that they are stored compressed in the OpenGL memory and they are only decoded in real time. So the memory usage is only of the compressed texture size.

A 4 bit PVRTC always give you a compression of 1/8 of the uncompressed 32 bit RGBA bitmap.

While PVRTC is great for diffuse or color textures, with normal maps the compression of the texture make it seem like there are lumps on the surface.

PVRTC compressed normal mapped car.

PVRTC compressed normal mapped car.

We might be missing the point of making things look better when we have such artifacts on the surface of our car.

16 Bit 565 RGB

Our original texture is 32 bit RGBA per pixels. Which means we have 4 bytes per pixel of the image.

There are uncompressed formats that use only 2 bytes per pixel.

For instance GL_UNSIGNED_SHORT_5_6_5.

We only need RGB since we don’t use the Alpha channel in our texture.

What will happen if we store the xyz components of the normal inside 16 bit RGB instead of 24 bit RGB?

16 bit RGB normal mapped car.

16 bit RGB normal mapped car.

Again, with the reduced accuracy of the normals the end result is not very good.

16 Bit Two Components Packing

Our normals are normalized to the length of 1. We can actually derive the z component of our normal from our x and y components using Pitagoras theorem.

We only need 16 bits to store the x and y components.

However, we do not have a pixel format of two 8 bit components.

We do have this format: GL_UNSIGNED_SHORT_4_4_4_4.

We are going to pack the two 8 bit RG components into the four 4 bit RGBA components of our new format.


	stbi_uc * Tmp = Data;
	Data = (unsigned char *)new unsigned short[w*h];
	unsigned int n = w*h;
	for (unsigned int i=0; i<n; i++)
		unsigned char r = Tmp[4*i];
		unsigned char g = Tmp[4*i+1];
		unsigned short v = (unsigned short)r;
		v = v<<8;
		v+=(unsigned short)g;
		((unsigned short*)Data)[i] = v;
	delete [] Tmp;

After packing the texture into 16 bits we need to change the code in our fragment shader to unpack the texel into a normal vector.

vec4 compressedNormal =texture2D(normalSampler, texOut);
vec3 texNormal;
texNormal.r = 2.0*(compressedNormal.r*240.0+compressedNormal.g*15.0)/255.0-1.0;
texNormal.g = 2.0*(compressedNormal.b*240.0+compressedNormal.a*15.0)/255.0-1.0;
texNormal.b = sqrt(1.0-texNormal.r*texNormal.r-texNormal.g*texNormal.g);

In this case we reduce the memory usage by half of the original 32 bit texture and we don’t compromise any quality.

We might be paying in GPU processing power since now our fragment shader is more complex.

However, I didn’t notice any difference in performance on my iPad New.

We also assume our z component is always positive so we cannot represent normals that go inward.

However, in most cases we do not need normals that go into the triangle or the surface.

16 bit packed normal mapped car.

16 bit packed normal mapped car.

Talk To Your Artist

Until now we were able to reduce the memory usage by a factor of 2 without sacrificing the quality.

However, this might not be enough.

There is another way to reduce memory usage and that is having the artist optimize the texture mapping.

In our case the bottom part of the car is rarely seen or is only partially seen.

We don’t need a lot of details in the bottom part of the car so the artist can allocate a much smaller area of the texture for the bottom part.

He might also separate the car into two surfaces(which means two passes) and to have different textures for each surface while the bottom part will have a much lower resolution texture.

In our game I have told the artist to reduce the texture size from 1024×1024 to 512×512 while taking into consideration that we don’t need a lot of details in the bottom part.

With both the 16 bit texture packing and the reduction of the resolution of the texture, we will get a factor of 8 in reducing the memory usage.

Projecting A Dynamic Decal onto a 3D Mesh.


For my new mobile racing game I have implemented decals for a few reasons.

Decals are useful to add extra details to a mesh when it is impossible to include all the details with one big UV mapped texture.

They are also useful for adding extra details dynamically without having to change the texture of the mesh they are being added to.

In this article I am going to focus on how to calculate the decal’s geometry so it can be used inside your code in real time.

How to Derive a Decal From a Mesh

The decal geometry is derived from the mesh’s geometry we are adding the decal into.

What we want to do is to project a 2D rectangle(or any convex polygon) onto the mesh’s surface and cut out the geometry of the projected rectangle.

Think of it as like using a cookie cutter to cut out a shape from the dough only our dough does not have to be flat.

A mesh in our case is made out of triangles.

What we want to do is cut all the triangles using our 2D rectangle(or “cookie cutter”) and then add the resulting triangles to our vertex buffer so we could send it for rendering.

We could go over all the triangles in our mesh and cut them one by one.

However, most of the triangles will be cut out completely so it will be a wasteful process.

Instead we need to find an optimized method to select only the relevant triangles. For instance this method: Using a Spatial Partitioning Grid to Improve Performance.

Making a Convex “Cookie Cutter” using Planes.

In order to cut out the geometry from the mesh’s surface using our convex shape we use 3D planes to cut out each triangle.

This is the reason why we can only cut a convex shape using this method.

Each edge in our “cookie cutter” shape will be represented by a plane that it’s normal is the cross product of a vector on the edge and a vector that represent the direction of the projection.

So the edge and the vector that represents the direction of the projection are both contained by the plane.

When we cut a triangle using a plane we keep the part of the triangle that is on one side of the plane and we throw the part of the triangle that is on the other side.

It is possible that the triangle is completely thrown away or kept whole.

There are two scenarios in the case where the triangle is cut into a smaller piece.

Either the piece that we keep is made out of the “tip” of the triangle(and hence made out of one triangle), or it is made out of the “base” of the triangle. And thus it is a quad and is made out of two triangles.

When I say “base” and “tip” it could be relative to any of the 3 vertices of the triangle.

A possible algorithm would be:

1) Find all relevant triangles.
2) If relevant triangles list is empty go to (12)
  3) Add top relevant triangle to cutList
  4) If cutList is empty go to (10)
    5) Get next plane in the convex shape and cut the triangle into 0 to 2 triangles.
    6) Add the cut out triangles(if any) to the cutList.
    7) If didn't reach end of plane list go to (4)
  8) Add cutList triangles to finalGeometry list(if any).
  9) clear cutList
  10) Pop top triangle of relevant triangles list.  
  11) go to (2)
12) Finish (finalGeomtry list contain all the relevant triangles).

 UV Mapping of The Decal

There is another issue we didn’t even talk about.

The decal need it’s own UV coordinates. It cannot use the mapping of the mesh since they are mapped over the entire mesh and we want to map an entire texture into our decal.

I will explain briefly how to calculate the UV coordinates for a quad based decal.

The UV coordinates correspond to the projection.

Each one of the 4 vertices in our quad will be mapped to 4 UV coordinates on the texture space.

However, when we cut the mesh’s triangles using planes we might get vertices that are anywhere in between and inside the projection of our quad.

In order to calculate the UV  coordinates we would need to project the new decal vertices back into the plane of our quad(or back into the “cookie cutter”).

Our projection is orthogonal and in our case it is in the direction of (0, -1, 0) so in order to project vertices back to the quad we simpley drop the y component.

If our quad was a rectangle we could have calculated the UVs from the distance of the projected vertex to the edges of the rectangle.

However, we want to solve this for a generic quad.

In order to do this we solve the following set of equations:

p = (v1*(1-t1)+v2*t1)*(1-t2)+(v3*(1-t1)+v4*t1)*t2;

This is a set of two equations where t1 and t2 are our variables(what we are trying to find), p is a 2D vector which is the point we projected back to the quad and v1, v2, v3 and v4 are 4 2D points of the quad itself.

To solve this we need to express t2 using t1 with one equation and then assign t2 to the other equation.

Then we will get a second order equation in t1.

However, we are suppose to get only one solution, right?

Well without getting into the details, we just need to get the most positive solution or just reduce the equation into a first order equation(where the factor that multiplies t1*t1 is nearly zero).

(Edit: We need to get the solution that is inside or closest to (0..1).

We do this by using the solution that is closest to 0.5.)

What will guide you which equation you need to solve is by testing every place that divides by an expression that it is not zero and test that a Square root is not negative.

When those expressions are zero(in the case of division) or negative(in the case of the Square root) you will know which equation you need to solve.

For each vertex in the decal you get it’s UV coordinates from solving the set of equations for the respective projection of the vertex into the quad(our p).

t1 and t2 are our UV coordinates.

Rendering The Decal

We render the decal in a separate pass than the mesh itself.

We would use similar shaders to the mesh but we would also use a blend mode to blend the decal into the mesh.

For instance normal blending but otherwise rendering the decal is very similar to rendering the mesh.

There is another difference though. Since the decal is derived from the mesh geometry, it will cause Z Fighting in the Z Buffer.

In order to have our decal render over the mesh we need to offset the decal’s depth.

In GLES2 we have a method called glPolygonOffset that does that.

For the sake of completion here is the relevant source code:


    class GenerateQuadMeshShader: public Graphics3D::GeometryShader {
        GenerateQuadMeshShader(std::vector<math::Triangle> & Triangles, Graphics2D::Position StartLeft, Graphics2D::Position StartRight, Graphics2D::Position EndLeft, Graphics2D::Position EndRight):Triangles(Triangles), StartLeft(StartLeft), StartRight(StartRight), EndLeft(EndLeft), EndRight(EndRight)
            StartLeft.y = 0;
            StartRight.y = 0;
            EndLeft.y = 0;
            EndRight.y = 0;


        void Init()

        Graphics3D::GeometryVertex GenerateVertex(unsigned int i, unsigned int n)
            Graphics3D::GeometryVertex v;
            float3 Vertex = Triangles[i/3].Vertex(i%3);
            Graphics2D::Position v1(Vertex.x, Vertex.y, Vertex.z);
            v.pos.x = v1.x;
            v.pos.y = v1.y;
            v.pos.z = v1.z;
            v1.y = 0;
//            double t1;
//            double t2 = (v1.x-(StartLeft.x*(1.0-t1)+StartRight.x*t1))/((EndLeft.x*(1.0-t1)+EndRight.x*t1)-(StartLeft.x*(1.0-t1)+StartRight.x*t1));
            double a = (StartLeft.x-StartRight.x)*(-EndLeft.z+EndRight.z+StartLeft.z-StartRight.z)-(StartLeft.z-StartRight.z)*(-EndLeft.x+EndRight.x+StartLeft.x-StartRight.x);
            double b = (v1.x-StartLeft.x)*(-EndLeft.z+EndRight.z+StartLeft.z-StartRight.z)-(v1.z-StartLeft.z)*(-EndLeft.x+EndRight.x+StartLeft.x-StartRight.x)+(StartLeft.x-StartRight.x)*(EndLeft.z-StartLeft.z)-(StartLeft.z-StartRight.z)*(EndLeft.x-StartLeft.x);
            double c = (v1.x-StartLeft.x)*(EndLeft.z-StartLeft.z)-(v1.z-StartLeft.z)*(EndLeft.x-StartLeft.x);
            double t1 = 0;
            if (fabs(a)<0.000001)
                t1 = -c/b;
                double delta = b*b-4*a*c;
                t1 = -b/(2.0*a);
                if (delta>0.)
                    t1 = (-b+sqrt(delta))/(2.0*a);
                    // Fix!
                    double Solution2 = (-b-sqrt(delta))/(2.0*a);
                    if (fabs(Solution2-0.5)<fabs(t1-0.5))
                        t1 = Solution2;
                    //if (t1<0.0)
                    //    t1 = (-b-sqrt(delta))/(2.0*a);
            double x1 = ((EndLeft.x*(1.0-t1)+EndRight.x*t1)-(StartLeft.x*(1.0-t1)+StartRight.x*t1));
            double t2 = 0;
            if (fabs(x1)<0.0001)
                double z1 = ((EndLeft.z*(1.0-t1)+EndRight.z*t1)-(StartLeft.z*(1.0-t1)+StartRight.z*t1));
                t2 = (v1.z-(StartLeft.z*(1.0-t1)+StartRight.z*t1))/z1;
                t2 = (v1.x-(StartLeft.x*(1.0-t1)+StartRight.x*t1))/x1;
            v.u = t1;
            v.v = t2;

            return v;
        unsigned int VertexAmount(unsigned int n)
            return (unsigned int)Triangles.size()*3;

        unsigned int SurfaceAmount()
            return 1;

        std::vector<math::Triangle> & Triangles;
        Graphics2D::Position StartLeft, StartRight, EndLeft, EndRight;

    class DecalMesh {
        DecalMesh (Graphics3D::Factory & f, Graphics3D::Mesh m1):Map(m1), f(f)
            std::vector <Graphics2D::Position> & p1 = m1->GetPosition(0);
            std::vector <unsigned int> & i1 = m1->GetIndex(0);
            for (unsigned int i=0; i<Triangles.size(); i++)
                unsigned int j = i*3;
                Triangles[i] = math::Triangle(float3(p1[i1[j]].x, p1[i1[j]].y, p1[i1[j]].z), float3(p1[i1[j+1]].x, p1[i1[j+1]].y, p1[i1[j+1]].z), float3(p1[i1[j+2]].x, p1[i1[j+2]].y, p1[i1[j+2]].z));

            DecalMat = f.LoadMaterial(0.0);
            DecalMat->SetResource (f.LoadResourceEx ("Resource/DecalSS.png", true));
            DecalMat->SetSpecular (f.LoadResourceEx ("Resource/White.IC", true));
            DecalMat->SetSpecular(Graphics3D::Float4D(1., 1., 0.8, 8.0));
            DecalMat->SetAmbient(Graphics3D::Float4D(1, 1, 1., 0));
            DecalMat->SetDiffuse(Graphics3D::Float4D(0, 0, 0, 1));


        void AddStripe (Graphics2D::Position StartLeft, Graphics2D::Position StartRight, Graphics2D::Position EndLeft, Graphics2D::Position EndRight)
            Graphics2D::Position Min = StartLeft;
            Graphics2D::Position Max = StartLeft;
            Min = PosMin(Min, StartRight);
            Max = PosMax(Max, StartRight);
            Min = PosMin(Min, EndLeft);
            Max = PosMax(Max, EndLeft);
            Min = PosMin(Min, EndRight);
            Max = PosMax(Max, EndRight);
            std::vector<unsigned int> List1 = Map.GetPositionRangeTriangles(Min, Max);
            std::vector<math::Triangle> FinalList;
            for (unsigned int i=0; i<List1.size(); i++)
                std::list<math::Triangle> TriangleList;

                std::vector<Graphics2D::Position> p1;
                p1[0] = StartLeft;
                p1[1] = EndLeft;
                p1[2] = EndRight;
                p1[3] = StartRight;
                std::vector<math::Plane> Planes;

                for (unsigned int k=0; k<p1.size(); k++)
                    p1[k].y = 0;
                for (unsigned int k=0; k<p1.size(); k++)
                    float3 a3(p1[k].x, p1[k].y, p1[k].z);
                    Graphics2D::Position q = Graphics2D::Position(0, 1, 0).Cross((p1[(k+1)%p1.size()]-p1[k]).Normalize()).Normalize();
                    Planes[k] = math::Plane(a3, float3(q.x, q.y, q.z));
                for (unsigned int j=0; j<Planes.size() && TriangleList.size()>0; j++)
                    std::list<math::Triangle> NextList;
                    math::Plane Plane1 = Planes[j];
                    std::list<math::Triangle>::iterator q;
                    for (q=TriangleList.begin(); q!=TriangleList.end(); q++)
                        math::Triangle t1 = *q;
                        math::Triangle ResultT1, ResultT2;
                        float3 Factor1, Factor2;
                        unsigned int Cycle1 = 0;
                        unsigned int Type = Plane1.Clip2(t1, ResultT1, ResultT2, Factor1, Factor2, Cycle1);
                        if (Type==0)
                        if (Type==3)
                        else if (Type==1)
                        } else
                    TriangleList = NextList;
                while (!TriangleList.empty())
            GenerateQuadMeshShader g(FinalList, StartLeft, StartRight, EndLeft, EndRight);
            Graphics3D::Mesh Result = f.GenerateMesh(g);
            CurrentMesh = Result;

        Graphics3D::Mesh CurrentMesh;
        Graphics3D::Material DecalMat;
        Graphics2D::Position PosMin(Graphics2D::Position p1, Graphics2D::Position p2)
            Graphics2D::Position Result = p1;
            Result.x = std::min(Result.x, p2.x);
            Result.y = std::min(Result.y, p2.y);
            Result.z = std::min(Result.z, p2.z);

            return Result;

        Graphics2D::Position PosMax(Graphics2D::Position p1, Graphics2D::Position p2)
            Graphics2D::Position Result = p1;
            Result.x = std::max(Result.x, p2.x);
            Result.y = std::max(Result.y, p2.y);
            Result.z = std::max(Result.z, p2.z);

            return Result;
        SurfaceTriangleMap Map;
        std::vector <math::Triangle> Triangles;
        Graphics3D::Factory & f;