../
2016-05-31: The /* Programming Comments */ documents have moved, and will no longer be updated or maintained at this location! Please update your bookmarks: http://www.ccoderun.ca/programming/.

Summary

Tick, tock, tick, tock... How much can you get done in a second? While I know how long a second is, and I have some idea as to how long a millisecond is, I found I was always forgetting and having to look up the units of measurement smaller than milliseconds. Here is an easy place to look them up:

Worded differently in case it is useful to you, that same list becomes:

At the time I wrote this in mid-2013, on an Intel/AMD 3.4 GHz desktop running Ubuntu 13.04-64bit, nanoseconds is the finest scale available. And even then, as you'll see, there are some important limitations.

Available calls

Seconds

When working with seconds, there are a few common calls available. The standard time-related calls from 40 years ago in C of course had time(3) which worked with seconds as the basic unit of time. To observe 1 second elapse, the following calls can be made:

Milliseconds

The next scale down -- milliseconds -- has the following example calls:

Microseconds

The next scale down -- microseconds -- has the following example calls:

Note that usleep(3) has been obsoleted. See the man page for details.

Nanoseconds

And at the finest end of the scale we have nanoseconds:

How long is a nanosecond

Let's drop immediately down to the most precise unit of time available on the GNU C++11 compiler I'm using. A nanosecond is 1/1,000,000,000 of a second. To attempt to put some meaning behind this number, it means there are as many nanoseconds in 1 second as there are seconds in 31.7 years.

Place 2 high-resolution timestamps side-by-side using C++11 (though the same calls can be made with Boost) and see how much time elapses between 2 function calls:

for ( size_t iteration = 0; iteration < 500; iteration ++ ) { std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now(); std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now(); std::chrono::nanoseconds duration = end - start; std::cout << "time elapsed: " << duration.count() << " ns" << std::endl; }

On my 3.4 GHz desktop, I typically see between 40 ns and 43 ns have elapsed between the two consecutive calls to now():

Of course, these results are specific to systems similar to mine. I wouldn't expect a 1 GHz system to perform the same way.

Yielding

If a yield() call is introduced between the timestamps like this:

for ( size_t iteration = 0; iteration < 500; iteration ++ ) { std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now(); std::this_thread::yield(); std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now(); std::chrono::nanoseconds duration = end - start; std::cout << "time elapsed: " << duration.count() << " ns" << std::endl; }

I see results of ~ 440 ns between the two calls:

Remember that yield doesn't cause the thread to sleep. It just loses the remainder of the current timeslice. I suspect that on a busy system running a multi-threaded application, calling yield() could introduce much longer delays than when I tested this on my mostly-idle desktop.

Sleeping

Sleeping with nanoseconds resulted in surprising numbers. Since consecutive calls to get a timestamp showed me that at best I was seeing a minimum of 40 ns, I certianly didn't expect that sleeping for any number of ns less than 40 would be precise. But this code:

for ( size_t iteration = 0; iteration < 500; iteration ++ ) { std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now(); std::this_thread::sleep_for( std::chrono::nanoseconds( 1 ) ); std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now(); std::chrono::nanoseconds duration = end - start; std::cout << "time elapsed: " << duration.count() << " ns" << std::endl; }

...showed that attemping to sleep for 1 ns actually took between 54,000 and 55,000 nanoseconds! My results looked like this:

Turns out this is the price to pay to make a system call into the kernel, remove the thread from the run queue, sleep for the specified duration, and place the thread back onto a queue where it can run. To put the numbers into context:

Changing the sleep from 1 ns to 0 ns didn't have an effect. But using a negative value did have a big effect:

for ( size_t iteration = 0; iteration < 500; iteration ++ ) { std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now(); std::this_thread::sleep_for( std::chrono::nanoseconds( -1 ) ); std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now(); std::chrono::nanoseconds duration = end - start; std::cout << "time elapsed: " << duration.count() << " ns" << std::endl; }

The call to sleep_for( -1 ) seems to short circuit and (relatively) quickly return to the caller, because this gave me the following results:

So in essense, this causes a few CPU cycles to be wasted without having put the thread into proper sleep.

CPU Frequencies

Having an idle desktop while running these timing tests negatively affected the output. Modern CPUs of course use frequency scaling to save power when a system isn't in heavy use. I installed cpufrequtils and forced my system to use the highest cpu frequency:

sudo apt-get install cpufrequtils
sudo cpufreq-set -r -g performance

Once I did that, the number of nanoseconds between consecutive calls dropped from 40 ns to less than 20 ns:

The call to yield() dropped from ~ 440 ns to 175+ ns:

The call to sleep_for( 1 ns ) took 51,600+ ns instead of the previous 54,000+ ns:

And the call to sleep_for( -1 ns ) dropped from ~ 145 ns to ~ 60 ns:

Last modified: 2014-09-28
Stéphane Charette, stephanecharette@gmail.com
../