  Tilemap tutorial (part 4)
Sun 2nd December 2012   
Welcome to the fourth part of the Tilemap tutorial.

The previous article explained the concept of batching and using texture atlases to draw a tilemap more efficiently.

This new article finally tackles the last topic: how to write a shader optimized for tilemap rendering.

Vertex buffers

Before starting on the shader itself, I would like to quickly address the concept of vertex and index buffers.

The main problem with the traditional way of using OpenGL is that a lot of redundant work is done: much of the vertex information is copied over and over again, even when it should be shared between different polygons.

The situation was not that bad in the early days, when flat or Gouraud1 shaded polygons were the most common. Performance really started to suffer when each vertex had to carry a position, a color, a normal, multiple sets of texture UVs, ...

A simple solution was to add an indirection level:
  • Store all the unique sets of values in a large array.
  • Define the polygons by using simple numerical indexes to indicate the location of the values to use in the large array.
If you want to know more about these, search for Vertex Arrays, Index Arrays or Vertex Buffer Objects, depending on your API of choice.
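
To make the idea more concrete, here is a minimal sketch of that indirection in Python (plain lists standing in for the real GPU-side buffers, no actual API calls):

# Four unique vertices (x, y), each stored only once
vertices = [
    (0.0, 0.0),  # 0: bottom left
    (1.0, 0.0),  # 1: bottom right
    (1.0, 1.0),  # 2: top right
    (0.0, 1.0),  # 3: top left
]

# Two triangles described by indexes into the vertex list;
# vertices 0 and 2 are shared instead of being duplicated
indexes = [0, 1, 2,
           0, 2, 3]

# "Drawing" just resolves each index back to its vertex
for i in range(0, len(indexes), 3):
    triangle = [vertices[indexes[i + j]] for j in range(3)]
    print(triangle)

The quad costs four vertices plus six small integers, instead of six full vertices with all their attributes duplicated.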

Tilemap as a sprite

If you go back to the previous articles, you will quickly realize that what we did with the CPU could just as well have been done on the GPU:
  • The entire drawing code is totally predictable and does not rely on any external information.
  • The only operations it performs are iterations on the horizontal and vertical axes, plus some array look-ups.
A modern GPU is able to perform all these operations very efficiently. The trick is to find how to present the data to the GPU in a form it can process efficiently.

The ultimate goal is to be able to draw the whole tilemap on the screen as one single quad, one single big sprite!

What we want is to have the same result as if we already had this tilemap available as a texture ready to display:


If we had this texture, then all we would have to do is draw two triangles, using the four corners of the texture as texture coordinates.
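
Just to make this concrete, here is a minimal sketch (plain Python data, and a hypothetical 128x128 texture, since our map is 8x8 tiles of 16x16 pixels) of the single quad we would send:

# Hypothetical full-tilemap texture: 8x8 tiles of 16x16 pixels = 128x128
# Each vertex: a screen position (x, y) and a texture coordinate (u, v) in [0, 1]
quad_vertices = [
    ((  0,   0), (0.0, 0.0)),  # top left
    ((128,   0), (1.0, 0.0)),  # top right
    ((128, 128), (1.0, 1.0)),  # bottom right
    ((  0, 128), (0.0, 1.0)),  # bottom left
]

# Two triangles built by indexing the four corners
quad_indexes = [0, 1, 2,
                0, 2, 3]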

Of course, the whole point of the exercise was to not have this tilemap texture in memory in the first place. How can we achieve the same result without having this big tilemap texture in memory?

CPU Shader?

Let's refresh your memory and bring back the very first code example I gave:

// Our tilemap is 8x8 tiles
tileMapWidth  = 8
tileMapHeight = 8

// '0' represents the deep space tiles, '1' and '2' are the metal plates
tileMap = [0,0,0,0,0,0,0,0]
          [0,2,1,0,0,2,2,0]
          [0,0,1,1,1,2,2,0]
          [0,0,1,0,0,1,1,0]
          [0,0,1,0,0,1,1,0]
          [0,0,1,0,0,1,1,0]
          [0,2,1,1,1,1,1,0]
          [0,0,0,0,0,0,0,0] As Bytes

// This returns an array of 3 tiles each containing a picture
tiles = ["tile_starfield",
         "tile_metal_plate",
         "tile_metal_plate_painted"] as Pictures

// We now iterate through all the cells, column by column, row by row
For row From 1 To tileMapHeight
    For column From 1 To tileMapWidth
        // Read the tilemap to find the number of the tile at this location
        tileIndex = tileMap[column][row]
        // From the index get the picture from the texture array
        tileToDraw = tiles[tileIndex]
        // Draw it at some magical coordinates that matches the tilemap ones :)
        DrawPicture(tileToDraw, column, row)
    Next column
Next row

Let's try to pretend that the CPU is actually a GPU, and replace the DrawPicture function with equivalent code using DrawPixel:

(previous code)

// Our tiles are 16x16 pixels
tileWidth  = 16
tileHeight = 16

For rowTilemap From 1 To tileMapHeight
    For columnTilemap From 1 To tileMapWidth
        // Read the tilemap to find the number of the tile at this location
        tileIndex = tileMap[columnTilemap][rowTilemap]
        // From the index get the picture from the texture array
        tileToDraw = tiles[tileIndex]
        // Draw each pixel of the tile
        For rowTile From 1 To tileHeight
            For columnTile From 1 To tileWidth
                // Get the color of this particular pixel in the tile
                pixelColor = tileToDraw[columnTile][rowTile]
                // Compute the screen coordinates
                xScreen = ((columnTilemap - 1) * tileWidth)  + columnTile
                yScreen = ((rowTilemap - 1) * tileHeight) + rowTile
                // Draw the pixel
                DrawPixel(pixelColor, xScreen, yScreen)
            Next columnTile
        Next rowTile
    Next columnTilemap
Next rowTilemap

This code is functionally equivalent to the previous one: we have simply inlined DrawPicture.

The problem is that it is not representative of the way a shader works on a graphics card. Not at all.

Reversed logic

What makes GPUs so much faster at drawing things on the screen is not that they are incredibly more powerful than CPUs.
The reason is simply that they have been designed to perform a small subset of operations very efficiently in parallel, using multiple units working on the same problem.
When implementing an operation on the GPU, you have to think about how the GPU works internally when it renders a triangle.

Simplified, the process looks a bit like this:
    - The GPU receives the information required to render a particular triangle
    - The camera position is applied to the vertex coordinates
    - Vertex coordinates are projected to screen space
    - A 2D rasterizer2 draws the pixels that fit inside the triangle shape
The original information can come from single triangles, strips, or arrays/buffers. It does not really matter: in the end, current GPUs are just very optimized triangle-drawing engines.

The vertex coordinate transformation phase is nothing more than vector and matrix operations. These can be controlled by the user with Vertex Shaders.
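
Purely as an illustration (a small Python sketch with placeholder values, not any real graphics API), this transformation stage boils down to a matrix multiplication, a perspective divide and a viewport mapping:

# Minimal sketch of the vertex transformation stage.
# The matrix (here the identity) stands in for a combined camera/projection
# matrix, and 640x480 is just a placeholder screen size.

def transform_vertex(matrix, vertex):
    # matrix is 4 rows of 4 values, vertex is (x, y, z, w)
    x, y, z, w = [sum(matrix[row][i] * vertex[i] for i in range(4)) for row in range(4)]
    # Perspective divide: clip space -> normalized device coordinates in [-1, 1]
    ndc_x, ndc_y = x / w, y / w
    # Viewport mapping: normalized device coordinates -> pixel coordinates
    return (ndc_x * 0.5 + 0.5) * 640, (ndc_y * 0.5 + 0.5) * 480

identity = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]
print(transform_vertex(identity, (0.5, -0.5, 0.0, 1.0)))  # -> (480.0, 120.0)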

I described the last operation in very vague terms. The reason is that many methods can be used for rasterizing3. This freedom leaves graphics card designers free to optimize the rendering.

The trick that allows this parallelization is that, basically, instead of starting from the texture and finding out where it should be displayed on the screen, the card finds out all the pixels that should be drawn on the screen, and from there works out how each individual pixel should be drawn by recomputing all the texture parameters.

Obviously this requires a lot of processing power, since the card cannot reuse information it has already computed for previously drawn pixels, but on the other hand it means that each pixel can theoretically be drawn separately by a different processing unit.

This is really all that Pixel Shaders are: small programs able to compute the final value of one individual pixel to draw on the screen.

Let's see if we can change our program to mimic this behavior.

CPU Shader!

First we need to write this single pixel shading function:

// Input:   horizontal and vertical screen coordinates
// Returns: color of the pixel at these coordinates
// Note:    uses the 'tileMap' and 'tiles' global values
Function ComputePixelColor(xScreen,yScreen)
    // Deduce the tile coordinates from the screen coordinates (integer division)
    columnTilemap = xScreen / tileWidth
    rowTilemap    = yScreen / tileHeight

    // Read the tilemap to find the number of the tile at this location
    tileIndex = tileMap[columnTilemap][rowTilemap]

    // From the index get the picture from the texture array
    tileToDraw = tiles[tileIndex]

    // Deduce the internal tile coordinates from the screen coordinates
    columnTile = xScreen Modulo tileWidth
    rowTile    = yScreen Modulo tileHeight

    // Get the color of this particular pixel in the tile
    pixelColor = tileToDraw[columnTile][rowTile]
    return pixelColor
EndFunction

What this function does is not very complicated.

If you know the dimensions of your tiles, given any screen coordinate it is trivial to find which tile it belongs to: just divide the x and y coordinates by the width and height of a tile (using integer division), and the numbers you get are the column and row in the tilemap.

By using a modulo4 instead of a division, the values we get are the x and y coordinates of the texel to select inside the tile texture itself.

Then the rest of the code is identical: Select the correct tile from the tilemap coordinates, and get the color in the tile at the given coordinates.
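
As a quick sanity check, here is the divide/modulo trick applied to one arbitrary pixel, in Python and with the same 16x16 tile size as above:

tile_width, tile_height = 16, 16

x_screen, y_screen = 35, 20  # an arbitrary pixel somewhere on the tilemap

# Integer division gives the tile coordinates in the tilemap,
# the remainder gives the texel coordinates inside that tile
column_tilemap, column_tile = divmod(x_screen, tile_width)   # -> (2, 3)
row_tilemap,    row_tile    = divmod(y_screen, tile_height)  # -> (1, 4)

print(column_tilemap, row_tilemap)  # which tile:  column 2, row 1
print(column_tile, row_tile)        # which texel: (3, 4) inside that tile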

Let's see how our main loop looks now:

// We now iterate through all the pixels of the tilemap
For yScreen From 1 To tileMapHeight*tileHeight
    For xScreen From 1 To tileMapWidth*tileWidth
        // Get the color of this particular pixel in the tile
        pixelColor = ComputePixelColor(xScreen,yScreen)
        // Draw the pixel
        DrawPixel(pixelColor,xScreen,yScreen)
    Next xScreen
Next yScreen

The code is now much simpler because all the hard work is done in the "shader" function.
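
Notice also that each iteration of this loop depends only on its own (x, y) pair, which is exactly what makes the approach GPU-friendly. A tiny Python sketch (with a trivial, made-up stand-in for the shader function) to illustrate the point:

tile_width, tile_height = 16, 16
tile_map_width, tile_map_height = 8, 8

def compute_pixel_color(x, y):
    # Made-up stand-in for the ComputePixelColor "shader" above; it only
    # reads (x, y) and constant data, and returns a fake "color"
    return (x // tile_width + y // tile_height) % 3

# Because every pixel depends only on its own coordinates and on read-only
# data, each one could be evaluated by a different worker: this is what the
# GPU does with its many shading units
width, height = tile_map_width * tile_width, tile_map_height * tile_height
frame = [[compute_pixel_color(x, y) for x in range(width)] for y in range(height)]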

Unfortunately there is still a problem: ComputePixelColor uses the texture array, while a real shader would not be able to access an unlimited number of textures.

To make it work the way a real GPU shader would, we need to modify the code so it uses the texture atlas we described in the previous article:


Let's modify the ComputePixelColor so it uses the texture atlas instead of the array of textures.

// Our atlas contains 2x2 textures
atlasWidth  = 2
atlasHeight = 2

// Input:   horizontal and vertical screen coordinates
// Returns: color of the pixel at these coordinates
// Note:    uses the 'tileMap' and 'textureAtlas' global values
Function ComputePixelColor(xScreen,yScreen)
    // Deduce the tile coordinates from the screen coordinates
    columnTilemap = xScreen / tileWidth
    rowTilemap    = yScreen / tileHeight

    // Read the tilemap to find the number of the tile at this location
    tileIndex = tileMap[columnTilemap][rowTilemap]

    // From the index get the top left coordinates of the texture in the atlas
    // (column and row in the atlas grid, scaled to pixels)
    xAtlas = (tileIndex Modulo atlasWidth) * tileWidth
    yAtlas = (tileIndex / atlasWidth) * tileHeight

    // Deduce the internal tile coordinates from the screen coordinates
    xAtlas = xAtlas + (xScreen Modulo tileWidth)
    yAtlas = yAtlas + (yScreen Modulo tileHeight)

    // Get the color of this particular pixel in the tile
    pixelColor = textureAtlas[xAtlas][yAtlas]
    return pixelColor
EndFunction

What this new variant of the code does is compute the texture coordinates of the top left pixel of the tile inside the atlas from the value of tileIndex, and then add the internal tile texture coordinates.
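
Here is a quick worked example in Python (same 2x2 atlas of 16x16 tiles), following one pixel that lands on tile index 2, the painted metal plate:

atlas_width, atlas_height = 2, 2   # the atlas is 2x2 tiles
tile_width, tile_height = 16, 16   # each tile is 16x16 pixels

tile_index = 2              # the tilemap lookup returned the painted plate
x_screen, y_screen = 35, 20

# Top left pixel of that tile inside the atlas
x_atlas = (tile_index % atlas_width) * tile_width    # -> 0
y_atlas = (tile_index // atlas_width) * tile_height  # -> 16

# Add the texel offset inside the tile
x_atlas += x_screen % tile_width   # -> 0 + 3 = 3
y_atlas += y_screen % tile_height  # -> 16 + 4 = 20

print(x_atlas, y_atlas)  # texel (3, 20) in the atlas texture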

This is all for today; as usual feel free to comment or ask questions!

The final article will show some actual GLSL5 shader code.


1. Shading method where the colors are interpolated linearly between vertices. Attributed to Henri Gouraud.
2. Rasterisation is the name used to describe any operation that converts geometrical shapes defined by points and lines (vector graphics) into actual pixels drawn on a bitmap.
3. The most common method is scanline rendering, but the Dreamcast™ console, for example, benefited from a totally different method called Tile Rendering.
4. The modulo is nothing more than the remainder of an integer division: 18 divided by 16 gives 1 with a remainder of 2; that remainder is the modulo.
5. OpenGL Shading Language