Make a Custom NSToolbar Item in Xcode’s Interface Builder.

For my Mac app Pompi Video Editor I decided to build the interface for selecting video operations using NSToolbar.

NSToolbar lets you create an NSToolbarItem as a button with a label, an image, or both.

In Pompi Video Editor the toolbar looks like this:

ToolBarSample

This kind of toolbar can easily be made using Xcode’s Interface Builder.

In Interface Builder this toolbar looks like this:

ToolbarInterfaceBuilder

Notice that all the items are either an NSToolbarItem or a Flexible Space.

There is one exception though… the Custom View.

I wanted to make a custom button which also has a drop-down button next to it.

There is a way to add a custom NSToolbarItem using NSToolbarDelegate.

The problem with using NSToolbarDelegate is that you cannot make use of the order in which you placed the items in Interface Builder.

You have to build the list of items in code (or at least I didn’t find a way to avoid doing it in code with the delegate).
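For reference, here is roughly what the delegate route looks like: a minimal sketch with illustrative item identifiers (note how the order has to live in code, not in the nib):

#import <Cocoa/Cocoa.h>

@interface MyToolbarDelegate : NSObject <NSToolbarDelegate>
@end

@implementation MyToolbarDelegate

// The order of the items has to be spelled out in code here.
- (NSArray *)toolbarDefaultItemIdentifiers:(NSToolbar *)toolbar
{
    return @[@"CutItem", NSToolbarFlexibleSpaceItemIdentifier, @"TrimItem"];
}

- (NSArray *)toolbarAllowedItemIdentifiers:(NSToolbar *)toolbar
{
    return [self toolbarDefaultItemIdentifiers:toolbar];
}

// Build each item on demand from its identifier.
- (NSToolbarItem *)toolbar:(NSToolbar *)toolbar
     itemForItemIdentifier:(NSString *)itemIdentifier
 willBeInsertedIntoToolbar:(BOOL)flag
{
    NSToolbarItem * item = [[NSToolbarItem alloc] initWithItemIdentifier:itemIdentifier];
    [item setLabel:itemIdentifier];
    return item;
}

@end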

If you look at the screenshot above you will see that I also have a Custom View, and this Custom View does display in the app in the same position I placed it in the toolbar.

In order to achieve this you need to create an NSView in the same nib your NSToolbar is at.

The NSView’s content will be the custom button itself (in my case I put two buttons inside: one square and one narrower for additional options).

After that you only need to drag your off-window NSView under your NSToolbar in Interface Builder, and Xcode will create an NSToolbarItem with your own NSView underneath it.

You can then drag the item with the custom view into the default toolbar items area.

However, your custom view will not display in the toolbar this way and will just leave a blank space.

In order to make your custom NSView display, we need to set its parent’s (the NSToolbarItem’s) view property to the actual view. I didn’t find a way to do this from within Interface Builder, so I subclassed NSToolbarItem and did it in code.

For the sake of completeness, here is the custom NSToolbarItem code (the customView IBOutlet references the NSView that sits under the CustomToolbarItem):

 

// CustomToolbarItem.h
#import <Cocoa/Cocoa.h>

@interface CustomToolbarItem : NSToolbarItem
{
    // Connected in Interface Builder to the off-window NSView
    // that should appear inside this toolbar item.
    IBOutlet NSView * customView;
}

@end

// CustomToolbarItem.m
#import "CustomToolbarItem.h"

@implementation CustomToolbarItem

-(void)awakeFromNib
{
    [super awakeFromNib];
    // Hand the custom NSView to the toolbar item so it actually
    // renders instead of leaving a blank space.
    [self setView:customView];
}

@end

iOS Touches (touchesBegan, touchesMoved…) for Games

I have been working on a game called Flat Out Hockey for Android, iOS, Mac and OUYA.

GooglePlayFeatureFinal

For this game I needed both “touch buttons” (buttons that mimic gamepad buttons) and a touch area where you hold your finger and the character follows:

Nexus7Screenshot

 

It worked fine for the most part, but if I pressed several buttons at once or tried some crazy tapping and swiping, my touch area would stop responding.

All the touches in the game’s iOS version are handled using the touches functions (touchesBegan, touchesMoved, touchesCancelled, touchesEnded).

I go over all the UITouch objects of the UIEvent (using [event allTouches]) and track all the touches that have started, moved or ended (there can be multiple of them at once).

What I didn’t realize is that even when you are inside the touchesEnded function (for instance), you may get a UITouch that did not actually end.

It seems that UIEvent’s allTouches returns all the current touches regardless of which touches function was called, which means you always need to check the UITouchPhase of every UITouch!

This is quite simple and makes sense, but you can easily miss it… (like I did)

For the sake of completeness, here is the Objective-C part of my touches code:

 

// Tracks one active touch: the UITouch's hash identifies it across events,
// and originalIndex is the slot reported to the game code.
@interface TouchNode: NSObject
{
@public
    NSNumber * hash;
    unsigned int originalIndex;
}
@end

@implementation TouchNode
@end
// Compensates for iOS "Display Zoom" on iPhone 6/6 Plus, where nativeScale
// differs from scale. IS_ZOOMED_IPHONE_6(_PLUS) are project-defined macros.
-(double)zoomFactor
{
    if (IS_ZOOMED_IPHONE_6_PLUS || IS_ZOOMED_IPHONE_6)
    {
        double s = [[UIScreen mainScreen] nativeScale]/[[UIScreen mainScreen] scale];
        return s;
    }
    return 1.0;
}


-(void)touchesBegan:(NSSet *)touches withEvent:(UIEvent *)event
{
    // touchHash (an NSMutableArray of TouchNode) and retinaFactor (2.0 on
    // Retina displays, 1.0 otherwise) are instance variables; IsInput,
    // Touch and ReleaseTouch are the game's C input hooks.
    for (UITouch *touch in [event allTouches]) {
        CGPoint touchLocation = [touch locationInView:self.view];
        touchLocation.x*=[self zoomFactor];
        touchLocation.y*=[self zoomFactor];
        if (IsInput())
        {
            
            if (touch.phase == UITouchPhaseBegan)
            {
                TouchNode * n = [TouchNode new];
                n->hash = @(touch.hash);
                n->originalIndex = (unsigned int)[touchHash count];
                [touchHash addObject:n];
            }
            else
                continue;
            unsigned int i=0;
            for (; i<[touchHash count]; i++)
                if (touchHash[i]!=[NSNull null] && [((TouchNode*)touchHash[i])->hash integerValue]==touch.hash)
                    break;
            if (i<[touchHash count])
            {
                if (touch.phase == UITouchPhaseEnded)
                {
                    ReleaseTouch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y, ((TouchNode*)touchHash[i])->originalIndex);
                    touchHash[i] = [NSNull null];
                    [touchHash removeObjectAtIndex:i];
                }
                else
                    Touch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y, ((TouchNode*)touchHash[i])->originalIndex);
            }
            //            NSLog(@"%d", i);
        }
    }
}

-(void)touchesMoved:(NSSet *)touches withEvent:(UIEvent *)event
{
    for (UITouch *touch in [event allTouches]) {
        CGPoint touchLocation = [touch locationInView:self.view];
        touchLocation.x*=[self zoomFactor];
        touchLocation.y*=[self zoomFactor];
        if (IsInput())
        {
            if (touch.phase != UITouchPhaseMoved)
                continue;
            unsigned int i=0;
            for (; i<[touchHash count]; i++)
                if (touchHash[i]!=[NSNull null] && [((TouchNode*)touchHash[i])->hash integerValue]==touch.hash)
                    break;
            if (i<[touchHash count])
            {
                if (touch.phase == UITouchPhaseEnded)
                {
                    ReleaseTouch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y, ((TouchNode*)touchHash[i])->originalIndex);
                    touchHash[i] = [NSNull null];
                    [touchHash removeObjectAtIndex:i];
                }
                else
                    Touch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y, ((TouchNode*)touchHash[i])->originalIndex);
            }
            //            NSLog(@"%d", i);
        }
    }
}

-(void)touchesCancelled:(NSSet *)touches withEvent:(UIEvent *)event
{
    touchHash = [NSMutableArray new];
}

-(void)touchesEnded:(NSSet *)touches withEvent:(UIEvent *)event
{
    for (UITouch *touch in [event allTouches]) {
        CGPoint touchLocation = [touch locationInView:self.view];
        touchLocation.x*=[self zoomFactor];
        touchLocation.y*=[self zoomFactor];
        if (IsInput())
        {
            if (touch.phase != UITouchPhaseEnded)
                continue;
            unsigned int i=0;
            for (; i<[touchHash count]; i++)
                if (touchHash[i]!=[NSNull null] && [((TouchNode*)touchHash[i])->hash integerValue]==touch.hash)
                    break;
            if (i<[touchHash count])
            {
                // Count how many touches are still being tracked.
                unsigned int Count = 0;
                for (unsigned int j=0; j<[touchHash count]; j++)
                    if (touchHash[j]!=[NSNull null])
                        Count++;
                if (touch.phase == UITouchPhaseEnded)
                {
                    if (Count==1)
                        ReleaseTouch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y);
                    else
                    {
                        ReleaseTouch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y, ((TouchNode*)touchHash[i])->originalIndex);
                    }
                    touchHash[i] = [NSNull null];
                    [touchHash removeObjectAtIndex:i];
                }
                else
                    Touch(retinaFactor*touchLocation.x, retinaFactor*touchLocation.y, ((TouchNode*)touchHash[i])->originalIndex);
            }
            //            NSLog(@"%d", i);
        }
    }
}
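One detail the code above assumes but does not show: a UIView delivers multiple concurrent touches only if you opt in, so somewhere in the view controller’s setup there should be something along these lines:

- (void)viewDidLoad
{
    [super viewDidLoad];
    // Without this, UIKit delivers at most one touch at a time to the view.
    self.view.multipleTouchEnabled = YES;
    touchHash = [NSMutableArray new];
}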

What is “Performance is limited by CPU vertex processing”?

I was working on a different OpenGLES2 particle technique to draw fire for my new (WIP) game Dragons High.

I noticed that whenever the particle effect is enabled, the CPU stalls for 14 ms on glDrawElements for that specific particle VBO.

In this particle technique I am rendering the triangles of a certain mesh instanced 500 times.

Each instance has its own time and a few other parameters.

The vertex structure I used was:

// float3/float4 are the engine's own vector types (3 and 4 floats).
typedef struct _ParticleVertex
{
    float3 Pos;
//    float Alpha;    // alternative to color: only the alpha component
    float3 Velocity;
    float Time;
    float4 color;
} ParticleVertex;

Velocity is the particle’s initial velocity.
Time is the time the particle first spawns, and color is the particle’s color tint and transparency (via the alpha component).
Instead of color I can use Alpha, which holds only the alpha component of color.

If I used Alpha instead of color, the CPU would no longer stall on the particle draw call, and the command would take less than 1 ms to complete on the CPU side.

Both methods rendered the particles correctly:

FireTest

 

However, when I analyzed a frame with color in Xcode, I found the following warning:

“Performance is limited by CPU vertex processing”

If you know how GLES2 works, it sounds weird that the vertices would be processed on the CPU. The vertex shader in GLES2 runs on the GPU, and the analyzer cannot know whether I updated the VBO with vertex data that was preprocessed on the CPU.

When running the OpenGL ES Analyzer instrument in the Xcode profiler I found the following error:

“GL Error:Invalid Value”

This error pointed to glVertexAttribPointer with all the parameters set to 0.

It turns out that when creating the VBO there was a boolean parameter (mUseColor) that I used to signify that the vertex struct has color data.

When creating the VAO for this vertex struct I was adding 4 to the array length when mUseColor==true instead of 1.

This resulted in the VAO array having 3 more entries than it should. Since those entries were not initialized, they were left as 0 values.
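To illustrate, here is a hedged reconstruction of that bug; apart from mUseColor, the names and the attribute bookkeeping are made up for the example (assume attribs[] holds at least numAttribs zero-initialized records):

#include <OpenGLES/ES2/gl.h>

// Hypothetical attribute record describing one entry of the VAO array.
struct VertexAttrib
{
    GLuint       index;
    GLint        size;      // component count, must be 1..4
    GLenum       type;
    GLsizei      stride;
    const void * offset;
};

void SetupParticleVAO(const VertexAttrib * attribs, bool mUseColor)
{
    unsigned int numAttribs = 3;      // Pos, Velocity, Time
    if (mUseColor)
        numAttribs += 4;              // BUG: color is ONE vec4 attribute; should be += 1

    // attribs[] was filled for the real attributes only, so the 3 extra
    // entries are read as zero-initialized records.
    for (unsigned int i = 0; i < numAttribs; i++)
    {
        glEnableVertexAttribArray(attribs[i].index);
        // For a zeroed entry this becomes glVertexAttribPointer(0, 0, 0, ...),
        // which raises GL_INVALID_VALUE (size must be 1..4).
        glVertexAttribPointer(attribs[i].index, attribs[i].size, attribs[i].type,
                              GL_FALSE, attribs[i].stride, attribs[i].offset);
    }
}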

Why were the particles rendered correctly even though the VAO was incorrect? And why did the rendering cost so much CPU? It’s hard to tell, but it’s possible the iOS driver did something to compensate for the structural error in the VAO.

I guess a good practice is to clear all the errors before you profile… but it would be interesting to understand what was going on under the hood and what the CPU vertex processing Xcode warned about actually was.

 

std::deque 4K Bytes high memory cost on iOS, C++

Preface

The iOS build of my game Dragons High was using about 150MB of live memory.

I used the Allocations instrument to find out that I had more than 10,000 allocations of 4K bytes each.

All those allocations came from the same class: std::deque<unsigned int>.

In Dragons High there is a world terrain, and the terrain has collision geometry data.

Basically I test the dragon or other moving characters against the triangles of the terrain geometry so they won’t go through it.

In order to avoid going over all the triangles, I have a grid of lists (std::deque) of triangle indices, so if a character is inside a certain cell of the grid I only test against the triangles in that cell and not against all the triangles in the mesh.

std::deque Cost

It turns out that every std::deque<unsigned int> in my std::vector<std::deque<unsigned int>> was allocating 4K bytes, even when it held fewer than 20 items.

For my grid I allocated 128×128 cells, so as a result I had more than 50MB of usage just for this std::vector.

My solution was to use my own custom data structure built around a dynamically allocated unsigned int pointer.

This way I was able to shave off about 50MB of memory usage.

EDIT:

A better alternative to making your own custom class with a pointer is to use std::vector.

std::vector takes care of allocations for you, so it’s safer, and it does not have a noticeable memory footprint (for a small number of items).
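Here is a minimal sketch of that kind of grid built on std::vector; the class and the cell math are illustrative, not my actual engine code:

#include <cstdint>
#include <vector>

// Illustrative collision grid: each cell holds indices into the terrain's
// triangle list. An empty std::vector allocates nothing, and ~20 uint32_t
// entries cost on the order of 100 bytes, versus the fixed 4K-byte block
// that libc++'s std::deque allocates per non-empty deque.
class CollisionGrid
{
public:
    CollisionGrid(unsigned int width, unsigned int height, float cellSize)
        : mWidth(width), mCells(width * height), mCellSize(cellSize) {}

    void AddTriangle(float x, float z, uint32_t triIndex)
    {
        mCells[CellIndex(x, z)].push_back(triIndex);
    }

    const std::vector<uint32_t> & TrianglesAt(float x, float z) const
    {
        return mCells[CellIndex(x, z)];
    }

private:
    // Assumes non-negative coordinates within the grid bounds.
    unsigned int CellIndex(float x, float z) const
    {
        return (unsigned int)(z / mCellSize) * mWidth + (unsigned int)(x / mCellSize);
    }

    unsigned int mWidth;
    std::vector<std::vector<uint32_t>> mCells;
    float mCellSize;
};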

Rubinstein’s Fix for Depth-Pass (Z-Pass) Shadow Volumes.

Preface

In my new 3D mobile racing game Diesel Racer 2 I decided to use Shadow Volumes as the technique for generating shadows in real time.

There are several reasons why I preferred this technique over shadow mapping.

Shadow mapping requires render targets. Apart from taking more texture memory, some older mobile devices suffer a lot in performance when using render targets (and sampling them as textures).

Another reason is that it would be hard to get high resolution shadow maps on mobile GPUs.

Perhaps on more powerful devices, but I aimed to target weaker devices such as the 4th generation iPod touch.

How Does Shadow Volume Work?

You can read about Shadow Volumes on this Wikipedia page: http://en.wikipedia.org/wiki/Shadow_volume

In short, with Shadow Volume you would need to generate the geometry that contains the volume of the shadow cast by an object.

You do so by creating fins from the edges that connect two triangles where, from the point of view of the light source, one triangle is culled and the other is not.

The collection of all those culled/unculled edge fins then contains the volume of the shadow the light source casts from this object.

You would also need the caps of the shadow in order to have a completely closed geometry for the shadow volume, but the caps are not always required when rendering it.

Now that we have the shadow volume, we can find the shadows it casts by counting, for every pixel on an object’s surface, how many front-facing fins minus how many back-facing fins lie in front of it.

We only count the fins that are visible to the camera and not occluded by objects, or in other words, fins that pass the Z-Buffer test (at a specific pixel in the view).

If there are more front-facing fins than back-facing ones, the pixel is inside the shadow volume, and thus the object casting the shadow occludes the light source for this specific pixel.
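In practice this counting is typically done in the stencil buffer. Here is a minimal sketch of the counting pass in OpenGL ES 2, assuming two-sided stencil; DrawShadowVolumeFins is a hypothetical helper that just issues the fin draw calls:

#include <OpenGLES/ES2/gl.h>

void DrawShadowVolumeFins(void);   // hypothetical: issues the fin draw calls

// Z-Pass counting: after the scene's depth is laid down, rasterize the fins
// with depth and color writes off; increment stencil on front faces that
// pass the depth test, decrement on back faces. Nonzero stencil = in shadow.
void CountShadowVolumeZPass(void)
{
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);                 // fins must not write depth
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, 0xff);
    glDisable(GL_CULL_FACE);               // both windings in one pass
    glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_KEEP, GL_INCR_WRAP);
    glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_KEEP, GL_DECR_WRAP);
    DrawShadowVolumeFins();
}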

Z-Pass vs Z-Fail

The technique I described above is called the Z-Pass technique.

It works well enough unless the camera is inside the shadow volume.

In this case some of the volume’s fins are inside the viewing frustum and some are clipped away.

Moreover, some fins may be only partially inside the viewing frustum.

This causes an undercount of both the front-facing fins and the back-facing fins at a given pixel.

One might suggest adding geometry on the front clipping plane of the frustum to account for the fins that are behind the camera.

However, such geometry would be difficult to create, especially since its rasterization might not line up perfectly with a partially clipped fin (at the front clipping plane of the frustum).

The reason this technique is called the Z-Pass technique is that we only count the fins that pass the Z-Buffer test, i.e. those not occluded by objects in the scene.

There is another technique called Z-Fail (or “Carmack’s Reverse”).

Instead of counting the fins that pass the Z-Buffer test, we can count the fins that fail it (the fins that are behind the objects in the scene).

The reason this works is that we are really interested in the pixels on the surfaces of the scene’s objects, so it doesn’t matter much whether we count the fins in front of the surface or the fins behind it.

This solves the issue of having the camera inside the shadow volume, because Z-Buffer values are usually greater than the value of the front clipping plane (0, or -1 in OpenGL conventions), and even if the Z-Buffer had values right on the front clipping plane you wouldn’t see much anyway.

This technique requires the shadow volume to be capped at its far end; otherwise you get ghost shadows, since the end of the shadow volume is now more likely to be counted (because it is more likely to be occluded by objects).

Rubinstein’s Fix for Z-Pass

We mentioned that in Z-Pass there is an issue when the camera is inside the shadow volume.

We also mentioned that we could have fixed this issue if only we could create geometry on the front clipping plane to negate the clipped or partially clipped fins.

However, that is not entirely true.

We don’t need to create geometry on the front clipping plane, and yet we can still negate those back fins that have no counterpart.

Let’s say we have a closed mesh.

If we render this mesh with no Depth test (the Z-Buffer test set to always pass) and the camera is not inside it, then the front-facing fins and the back-facing fins cancel each other out, and we get 0 as the total count of front-facing minus back-facing fins.

What if the camera is inside this volume?

Then we would get a negative count of all the missing or partially missing fins!

So in order to fix the issue of rendering inside a shadow volume with Z-Pass, we just need to render the shadow volume again with no Depth test at all, adding the negative of the count!

The important thing to remember, though, is that in the fixing pass the shadow volume needs to be a closed shape (using caps).

Otherwise we would get ghost shadows just like in the Z-Fail technique.
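Continuing the stencil sketch from the Z-Pass section above, the fixing pass might look like this; DrawClosedShadowVolume is again a hypothetical helper, this time drawing the capped volume:

#include <OpenGLES/ES2/gl.h>

void DrawClosedShadowVolume(void);   // hypothetical: draws the capped volume

// The fix: render the CLOSED volume again with the depth test forced to
// pass, counting in the opposite direction. Outside the volume the front
// and back faces cancel to 0; inside, the result negates exactly the fins
// lost to front-plane clipping (total = DepthPass - Always = -DepthFail).
void CountShadowVolumeFixPass(void)
{
    glDisable(GL_DEPTH_TEST);               // "no Depth test at all"
    glDepthMask(GL_FALSE);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, 0xff);
    glDisable(GL_CULL_FACE);
    glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_KEEP, GL_DECR_WRAP);  // reversed
    glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_KEEP, GL_INCR_WRAP);
    DrawClosedShadowVolume();               // must include the caps
    glEnable(GL_DEPTH_TEST);
}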

Broken Z-Pass Shadow

Rubinstein’s Fix

Final Fixed Result

Conclusion

The technique I presented could be an alternative to “Carmack’s Reverse”.

However, I am not sure how much more beneficial it is.

It probably has its own benefits and might be useful for other things.

I am using it because it was easier for me to implement and because I wasn’t sure what was going on with the Z-Fail technique, although now it’s easier for me to understand both.

In any case, I hope this article was beneficial for you.

p.s. Please don’t forget to check out Diesel Racer 2 on Google Play, Amazon, the Apple App Store and OUYA.

p.p.s. The shadow fix technique might not be available on all platforms yet; it takes time to update every version.

Edit:

I was told on r/gamedev that this technique is almost identical to “Carmack’s Reverse”.

While “Carmack’s Reverse” does Depth-Fail, I am doing Always minus Depth-Pass, which is ultimately equal to Depth-Fail, since A = DP + DF.

However, in my method I can render only a small part of the volume mesh in the Always pass. Doing it with DF would require rendering the entire thing, since you are also rendering the shadows themselves rather than only fixing the DP count.

Here are the two meshes I use: one for the visuals and one for the Shadow Volume ‘A’ (Always) fix.

Visual Mesh

Fix Volume

Textures didn’t load on 64-bit devices due to byte alignment.

I submitted my game Diesel Racer 2 to the Apple App Store for review.

My game was rejected because on 64-bit iOS devices some of the textures were black.

The textures that were black were PowerVR compressed textures.

The reason they didn’t load was byte alignment.

In C++, structs may have members of different sizes. For instance, an unsigned short is usually 2 bytes while an int is 4 bytes.

Each member is aligned to its natural alignment, and the struct’s total size is padded up to a multiple of its largest member alignment. The catch is that on 32-bit iOS, 64-bit types such as uint64_t are 4-byte aligned inside structs, while on 64-bit iOS they are 8-byte aligned.

When loading a PVR textures the first thing I read is the file’s header.

This is the version 3 PVRTC header:

 

typedef struct _PVRTexHeaderV3{
    uint32_t    version;
    uint32_t    flags;
    uint64_t    pixelFormat;
    uint32_t    colourSpace;
    uint32_t    channelType;
    uint32_t    height;
    uint32_t    width;
    uint32_t    depth;
    uint32_t    numSurfaces;
    uint32_t    numFaces;
    uint32_t    numMipmaps;
    uint32_t    metaDataSize;
} PVRTexHeaderV3;

When packed to 4-byte alignment this struct is 52 bytes long.

However, when packed to 8-byte alignment this struct is 56 bytes long! In the file, though, the header occupies 52 bytes and does not have the 4 bytes of padding at the end that it has in memory.

The solution is to always read exactly 52 bytes from the file (unless one day we have 128-bit systems?).

We can read those 52 bytes as-is into the struct’s memory on 64-bit, since in this specific case the alignment only adds 4 bytes to the end of the struct.
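Another option is a packed copy of the struct that always matches the on-disk layout. A sketch (ReadPVRHeader is an illustrative helper, not my loader’s actual code):

#include <cstdint>
#include <cstdio>

// Force 1-byte packing so the in-memory layout matches the file exactly.
#pragma pack(push, 1)
typedef struct _PVRTexHeaderV3Packed{
    uint32_t    version;
    uint32_t    flags;
    uint64_t    pixelFormat;
    uint32_t    colourSpace;
    uint32_t    channelType;
    uint32_t    height;
    uint32_t    width;
    uint32_t    depth;
    uint32_t    numSurfaces;
    uint32_t    numFaces;
    uint32_t    numMipmaps;
    uint32_t    metaDataSize;
} PVRTexHeaderV3Packed;
#pragma pack(pop)

static_assert(sizeof(PVRTexHeaderV3Packed) == 52, "header must be 52 bytes on every ABI");

// Reads the 52-byte header; returns true on success.
bool ReadPVRHeader(FILE * f, PVRTexHeaderV3Packed * out)
{
    return fread(out, sizeof(PVRTexHeaderV3Packed), 1, f) == 1;
}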

Do note that byte alignment might add padding bytes in the middle of a struct on both 32-bit and 64-bit.

Here is such an example:

 

typedef struct _Foo{
    uint16_t    a;    // offset 0
    uint64_t    b;    // offset 8 on 64-bit, offset 4 on 32-bit iOS
} Foo;

On 64-bit systems, Foo will be 16 bytes long, and the extra 6 padding bytes will sit between a and b.

On 32-bit iOS, Foo will be 12 bytes long, with 2 padding bytes between a and b.

Apple has documentation about what to consider when porting your app from 32-bit to 64-bit; too bad I forgot to read it before I submitted the first build.

https://developer.apple.com/library/ios/documentation/General/Conceptual/CocoaTouch64BitGuide/ConvertingYourAppto64-Bit/ConvertingYourAppto64-Bit.html

However, I fixed it in the following build.

Diesel Racer 2! Now available.

IconV2_512

“A former illegal racing driver is living a peaceful life with her new family.

Until one day she realizes she has to race again for the same people who got her son into trouble.

The competition is young, the competition is fierce, but she races for her life.”

 

I have released the game “Diesel Racer 2” for several mobile and micro console platforms.

It is now available on Amazon, Google Play and OUYA (the OUYA build is not up to date). It will also be available on iOS soon.

The game is played with either the motion sensors (on mobile) or a gamepad (OUYA, FireTV).

It is a good-looking 3D racing game.

Originally I wanted Diesel Racer 2 to have a much larger scope but I had to cut out a lot of the features.

I would like to add a lot more features in the near future, so consider the current state a kind of alpha.

There are two tracks to race.

Features I would like to add:

  • More tracks.
  • Multiplayer.
  • Weapons and car upgrades.
  • Story mode.
  • Death match arena.
  • And more…