2011
← How fragile is this Internet thingCommunication protocol exploits →
  Simple C++ threading
Sun 20th February 2011   
For some reason, people tend to think that creating multi-threaded applications in C++ is a very difficult thing to do1.

It is in fact not that difficult, but there are basically two main issues to consider: How to make the code correct (it should not crash, deadlock, corrupt data, etc...) and how to make it efficiently use as many cores as possible2.

The usual suspects

One of the things I noticed is that quite often the code of the application is relatively decent, but the poor performance is due to the usage of inefficient functions called by the application.

Modern applications are not working in a vacuum; they have to allocate memory, load or save files, interact with the user through a GUI and input peripherals, they may use network connectivity, log information, encrypt data, etc... Gone are the days when a programmer would write every single component used by his application. Today you use APIs provided by third parties, and if these API's are not designed correctly your application is going to suffer.

We are now going to investigate the fancy world of Logging Systems. Let's examine how the typical logging code is done and what can be done about it.

Logs, the basic concept

Typically when you want to find out what your program is really doing you start adding printf all over the place3. Soon you realise it is not very practical, and you wish you had nicely formatted text files showing the log entries with accurate time stamps. Your program would look then like this:


#include "log.h"

void main()
{
Log("Starting application");
Log("Calling something");
Something();
Log("Calling something else");
SomethingElse();
Log("Quitting application");
}


2011-02-19 08:23:00 Starting application
2011-02-19 08:23:00 Calling something
2011-02-19 08:23:01 Calling something else
2011-02-19 08:23:03 Quitting application

This is a very basic system, but with that you have a pretty clear idea of what was called, when, and by extension - how long it took to run -. A slightly more advanced system would use millisecond timestamps, and would allow you to add categories (information, warning, error, ...) as well as advanced formatting to handle variables4.

The devil is in the implementation details

The performance of such a system may be good, or it may be terrible. It all depends of how the Log function was implemented.

The simplest way to implement it would be something like that:

void Log(const char* message)
{
handle=FileOpenForWrite("LOG.TXT");
FileWriteToEnd(handle,message,strlen(message));
FileFlushAndClose(handle);
}

Please note that this is not valid C++, it is just an example of the whole idea of adding a new line of text at the end of a text file. In the real world you would check for error conditions, use whatever file handling function you wish to, would handle the carriage return at the end, the timestamp, etc...


The subtlety here, is that most people write the Log system, test it, measure the performance on some small examples, find it satisfactory, and then run ahead using it without second thought. When they start to experience performance issues later on they will generally not think about suspecting the Log system because it was apparently good enough on the small initial tests.

What happens in the real world, is that two phenomenons happen: You may have reached some contention at the operating system level in the file buffering code, and you may be victim of thread synchronisation when you use the code in a threaded application.

So, what is wrong in this code?

Thread safety

If you use naïve code that shares common resources (in this case the file called "LOG.TXT") without any form of synchronisation, what will happen is that the instructions in the function will get interleaved. This will results in unpredictable results, which most probably would be different depending of the operating system in charge of the handling of the files, the performance of the processor, etc...

If you want to have a rough idea of what will happen, try to imagine you are having two threads trying to print "Hello world\r\n" on the screen at the same time.

Which result do you expect:
Hello world
Hello world
HHeelllloo wwoorrlldd
HeHelllloo worworlld
d
The truth is, you don't know because the result is not defined. Just don't do it. You have to protect the shared resources to avoid simultaneous access to it. Either make sure that only one thread is in charge of writing information to the file, or use some of the synchronisation primitives provided by your operating system like the critical sections and the mutex.

Performance and synchronization

Ideally you want to be able to use your logger without incurring heavy runtime performance hits. If you have to disable your logger in order to use your program then you are never going to use it.

So, how bad is the performance of our naïve Log function? Well, it depends of how often it is called, of how efficient your operating system is to write to your hard drive, but also of how much stress you have on your disk at the moment: If you are copying a multi-gigabyte file in the background it's possible that your Log function stalls for a while until it can actually write to disk.

If you want to be thread safe, you also need to have the synchronization primitive to avoid two simultaneous calls to Log to be interfering with each other, which in practice tend to make your threads to synchronize when they try to write to the log file. This is bad.

A stupid example

Let's say we want to write a stupidly inefficient, pointless and incorrect program to compute the factorial of the 50 first integers. We could do it like this:

#include "log.h"

void main()
{
Log("Computing factorial");

Log("Starting 50 threads to compute factorials");
int results[50];
for (int i=0;i<50;i++)
{
StartThread(ComputeFactorialThread,i,&results[i]);
}
Log("Waiting for all threads to finish");
WaitForAllThreadsBeingDone();

Log("Showing results");
int results[50];
for (int i=0;i<50;i++)
{
std::count "Factorial of " << i << " is equal to " << results[i];
}
Log("Quitting application");
}

void ComputeFactorialThread(int threadID,int n,int &result)
{
Log("Starting thread #" << threadID);
result=1;
while (n--)
{
Log("Adding value " << n);
result+=(n*n);
}
Log("Exiting thread #" << threadID);
}

Assuming the code is working (ie: that it does not overflow when computing 50!, that all the missing threading boilerplate is there and that the code actually compute a factorial), what it does is to start 50 threads, the first in charge of computing 1!, the second computing 2!, until the last one computing 50!.

These operations are very trivial, but since we are very insecure programmers we decided to log everything, so before multiplying each new value we decide to log the operation we are going to perform. Obviously it's never going to be fast, if only because just calling a function to display a text takes more time than multiplying two integers together. But that's not really the core of the problem: If the Log function was written correctly, it would just use the necessary time to receive the new text entry and return as fast as possible.

Instead every single Log operation is going to:
  • Try to lock the mutex (or wait until it gets available)
  • Open the file
  • Add a new line at the end of the file
  • Close the file while ensuring everything is really written to disk
  • Release the mutex
  • Return to the caller

Thread 1Thread 2Thread 3Thread 4
Wants the lockWants the lockWants the lockWants the lock
Has the lockWaitsWaitsWaits
Open File
Write To File
Close File
Releases the lock
WorkingHas the lock
Open File
Write To File
Close File
Releases the lock
WorkingHas the lock

When you have 50 threads running on micro-tasks, they will basically spend almost all their time trying to catch the mutex. As soon as it will be freed one other thread will have a chance to catch it while others still wait: Basically you just made your program single-threaded.

The solution

The solution, as often5 is pretty simple. Instead of writing to the file directly in the Log function, just make it write to a queue and have a worker thread regularly empty the queue and perform the actual writing to the disk.

At first it may sound just like deporting the problem somewhere else, but if you think about it this allows some pretty nifty optimizations: If you write each line of text as soon as you receive it, you spend your time opening and closing the file. If you do that in a worker thread instead, then you can write a bunch of entries in the log file in one single operation, that's a lot more efficient from a system and I/O point of view.

After this change, our new Log function looks like this:

void Log(const char* message)
{
AquireMutex();
Push(MainLogQueue,message);
ReleaseMutex();
}

Note: It is sometime possible to use a "Lock Free" container instead of the Lock/Push/Unlock. As long as the atomicity of the operation is guaranteed it is fine.
And of course we will need our worker thread as well:

void LogWriterThread()
{
while (IsRunning())
{
// Wait a bit
Wait(DELAY_BETWEEN_WRITES);

// Swap the queues
AquireMutex();
SwapQueue(LocalQueue,MainLogQueue);
ReleaseMutex();

// Is there something to write?
if (!LocalQueue.empty())
{
handle=FileOpenForWrite("LOG.TXT");
FileSeekToEnd(handle);
while (!LocalQueue.empty())
{
message=LocalQueue.pop_front();
FileWrite(handle,message,strlen(message));
}
FileFlush(handle);
FileClose(handle);
}
}
}

If you are wondering about the SwapQueue, it's an idiom I use very often when doing multi-threading programming. Instead of copying the content of the queue and clearing it, you just swap it out with an empty one: It takes almost no time if your container is implemented correctly. The result is that you keep the lock on the mutex for a very short period of time, so even during the regular moments when you flush the logged entries to disk, you do not lock out your other threads that try to write to the log queue.

The state of the art

I've done a small search on the examples of loggers you can find online, and they tend to be pretty bad. Most of them are pretty, seem to handle quite a lot of situations (multiple logs, template meta-programming to handle parameters, automatic creation and deletion of the logger instances, ...) but many would behave terribly in real multi-threaded usage. That being said, there are few that seem to use some form of asynchronous file writing, these ones are probably ok to use.

So, if you decided to use a logger you found on the internet, I really encourage you to take a bit of time trying to understand how it was implemented in order to avoid performance issues later on.

Better safe than sorry :)

Have fun !


1. Probably because unfortunately the efficient C++ programming style is not really what people got taught at universities. I wish university teachers had some actual experience in the real world instead of repeating the nonsensical crap found in most textbooks.
2. Obviously, if making your code correct leads to making it inefficiently use your cores then you should not bother trying to thread it at all: Just keep the code nice and simple - and single threaded.
3. Or std::cout, choose your poison
4. But this is just minor implementation details, it does not change anything to the base concept and the associated issues.
5. Some people tend to make things more complex than they really need to be. If you are not being paid by the number of lines of code you write, I advise you to try to write less code.
comments powered by Disqus