The previous article left us at the point where we could display tiles without wasting time drawing any of these out of the screen.
I will now explain the next step, which is to be more efficient at drawing the tiles.
Drawing tiles one by one
The code examples I gave in the previous articles were relying on a semi mythical function:
The first parameter of the function is a texture reference. It may be a value, a pointer, some obscure handle returned by the graphic API1you are using. The important point here is that for the graphic card, each different texture reference is a different 'object' (this will become important later during the optimization phase).
The second and third parameters are the coordinates of the tile on the screen. It can be expressed in screen coordinates, it may use homogeneous2 coordinates, it may have inverted values.
That does not matter, the concept is still valid :)
In almost every case3 we end-up with something like that:
Given a texture (any rectangular image with 4 corners) and a set of coordinates, we can send a command to display the entity on the screen.
The graphic card will then draw our square using two textured triangles4:
- the first one, drawn in red, will be using the vertices (0,1,2)
- the second one, drawn in green, will be using the vertices (1,2,3)
- the first one, on the top left, with texture coordinates 0.0/0.0
- the second one, on the top right, with texture coordinates 1.0/0.0
- the third one, on the bottom left, with texture coordinates 0.0/1.0
- the fourth one, on the bottom right, with texture coordinates 1.0/1.0)
The OpenGL equivalent would be:
// Draw a rectangle using two trianglesBy doing that for each tile with the correct texture reference and screen position we eventually end up with a screen on which our tilemap is displayed.
glBegin(GL_TRIANGLES);
// First triangle
glTexCoord2f( 0.0f, 0.0f );
glVertex3f( 0.0f, 0.0f, 0.0f ); // vertex 0 (top left)
glTexCoord2f( 1.0f, 0.0f );
glVertex3f( 1.0f, 0.0f, 0.0f ); // vertex 1 (rop right)
glTexCoord2f( 0.0f, 1.0f );
glVertex3f( 0.0f, 1.0f, 0.0f ); // vertex 2 (bottom left)
// Second triangle
glTexCoord2f( 1.0f, 0.0f );
glVertex3f( 1.0f, 0.0f, 0.0f ); // vertex 1 (top right)
glTexCoord2f( 0.0f, 1.0f );
glVertex3f( 0.0f, 1.0f, 0.0f ); // vertex 2 (bottom left)
glTexCoord2f( 1.0f, 1.0f );
glVertex3f( 1.0f, 1.0f, 0.0f ); // vertex 3 (bottom right)
glEnd();
Each of these drawing commands is called a Draw Call.
This approach has the merit of being simple to implement and easy to understand, but from a CPU and GPU point of view is far from being optimal because it generates many context changes and uses quite many draw calls.
Batching
You have to imagine that modern hardware is like a high speed train: When it has been going on for a while it's pretty efficient, but each stop at a station has a serious impact on the train's average speed.When you are drawing your tiles one by one you are more or less doing the same thing as stopping the train. The solution for that is to try to render as many tiles as possible in one single operation.
The general idea is that the less draw calls you perform, the higher the performance.
The main problem here is that you cannot change textures during drawing, so you need to have access to all the tiles graphics in one single texture: This is called a Texture Atlas (also called a texture dictionary).
If you remember our previous Space Station example, the texture atlas would look like that:
This atlas contains four sub-textures.
In my earlier example I gave sets of texture coordinates: They were all set to either 0 or 1. The reason was simply that we were using the whole surface of the image as our source texture for our tile.
When multiple images are packed together in a texture atlas, it become necessary to use fractional values to access each individual image in the larger texture:
- the starfield, on the top left, from 0.0/0.0 to 0.5/0.5
- the metal plate, on the top right, from 0.5/0.0 to 1.0/0.5
- the painted plate, on the bottom left, from 0.0/0.5 to 0.5/1.0
- the unused texture, on the bottom right, from 0.5/0.5 to 1.0/1.0
With all our images stored in one single texture it is now possible to draw the whole tilemap in much less DrawCalls, and ideally only one should be sufficient.
Strips
Recent hardware is able to perform complex operations using shaders of many kinds, and are able to handle sophisticated forms of indexing in order to describe the geometry to render efficiently.Earlier, the most efficient way to reduce the number of draw calls was to use triangle stripping.
I could just skip the explanation, but I believe that it does not hurt to know the history of things, and who knows: you may have the opportunity to use hardware on which this knowledge will come in handy.
So, how does stripping works?
Well, the concept is very simple:
- You start the strip by describing a triangle using three coordinates (0,1,2).
- You can just describe a second triangle by adding one new coordinate, forming the (1,2,3) triangle.
- Repeat the process by adding one new coordinate to create one new triangle.
So, let's go back to our space station. Do you think you could generate a strip for this tilemap?
Ok, I will admit that the solution may not be obvious, and as often it generally helps to try to solve the problem by small pieces.
If the source texture had all the right images at the right location we could just have done this, starting from the top left and proceed to the top right of the tilemap:
This is a strip made of 16 triangles drawing the 8 top tiles in one single operation.
Using OpenGL this would give something similar5 to this:
// Draw a bunch of rectangles using triangle stripping
glBegin(GL_TRIANGLE_STRIP);
// First triangle
glTexCoord2f( 0.0f, 0.0f );
glVertex3f( 0.0f, 0.0f, 0.0f ); // vertex 0
glTexCoord2f( 1.0f, 0.0f );
glVertex3f( 1.0f, 0.0f, 0.0f ); // vertex 1
glTexCoord2f( 0.0f, 1.0f );
glVertex3f( 0.0f, 1.0f, 0.0f ); // vertex 2
// Second triangle
glTexCoord2f( 1.0f, 1.0f );
glVertex3f( 1.0f, 1.0f, 0.0f ); // vertex 3
// Third triangle
glTexCoord2f( 2.0f, 0.0f );
glVertex3f( 2.0f, 0.0f, 0.0f ); // vertex 4
// Fourth triangle
glTexCoord2f( 2.0f, 1.0f );
glVertex3f( 2.0f, 1.0f, 0.0f ); // vertex 5
// Fifth triangle
glTexCoord2f( 3.0f, 0.0f );
glVertex3f( 3.0f, 0.0f, 0.0f ); // vertex 6
(...)
// Sixteenth triangle
glTexCoord2f( 16.0f, 1.0f );
glVertex3f( 16.0f, 1.0f, 0.0f ); // vertex 17
glEnd();
The problem is that this does not work when we use a texture atlas: It works for the first tile, but when we start drawing the second tile we are having problems with the texture coordinates:
- The display coordinates of vertices 2 and 3 are identical.
- The texture coordinates of vertices 2 and 3 are different.
To solve this problem we will have to use Degenerate Triangles7.
A degenerate triangle is simply speaking a triangle without area. To create them all you need to do is to give the same coordinate twice, this will "flatten" the triangle and the result is that nothing should be drawn.
These 'invisible' triangles can be used to reset the UVs to the values we want.
The code would then looks like this:
// Draw a bunch of rectangles using triangle stripping
glBegin(GL_TRIANGLE_STRIP);
// First tile (all UVs point to the first image)
glTexCoord2f( 0.0f, 0.0f );
glVertex3f( 0.0f, 0.0f, 0.0f ); // vertex 0
glTexCoord2f( 1.0f, 0.0f );
glVertex3f( 1.0f, 0.0f, 0.0f ); // vertex 1
glTexCoord2f( 0.0f, 1.0f );
glVertex3f( 0.0f, 1.0f, 0.0f ); // vertex 2
glTexCoord2f( 1.0f, 1.0f );
glVertex3f( 1.0f, 1.0f, 0.0f ); // vertex 3
// Second tile (all the UVs we now use are for the second image)
glTexCoord2f( 1.0f, 0.0f );
glVertex3f( 1.0f, 0.0f, 0.0f ); // vertex 2 (degenerated)
glTexCoord2f( 1.0f, 0.0f );
glVertex3f( 1.0f, 0.0f, 0.0f ); // vertex 2 (degenerated)
glTexCoord2f( 1.0f, 1.0f );
glVertex3f( 1.0f, 1.0f, 0.0f ); // vertex 3
glTexCoord2f( 2.0f, 0.0f );
glVertex3f( 2.0f, 0.0f, 0.0f ); // vertex 4
glTexCoord2f( 2.0f, 1.0f );
glVertex3f( 2.0f, 1.0f, 0.0f ); // vertex 5
(...)
glEnd();
The trick is that since the two degenerate triangles are not actually visible, it is possible to set UVs that point to the new part of the texture atlas we want to use for the next tile.
That's a bit ugly, adding the degenerate triangles does reduce the interest of using triangle strips in first place, but it normally works.
Index and Vertex buffers
I was planning to talk about index and vertex buffers as the second way to perform batching, but this article is already long enough and contains a lot to digest so I will introduce these in the next part.The more technical it gets, the more risks I introduce some silly mistake, so please don't hesitate to signal any unclear or blatantly wrong statement!
1. API stands for Application Programming Interface, a set of functions you are allowed to use to perform various operations.↩
2. Homogeneous coordinates are standardized values in the -1 to +1 range.↩
3. I'm referring to modern hardware. Older machines did not have graphic cards and would use totally different ways to achieve the same result.↩
4. This is called a Triangle Strip, each added vertex defines a new triangle.↩
5. The texture coordinates should be between 0 and 1 and using fractional values, but that would make the example less readable↩
6. Remember, UV is the common name to describe texture coordinates.↩
7. Degenerate triangles may not be accepted by all graphic apis.↩