Summary
I'd never before tried to limit a transfer rate to a specific value. While I've worked in the past on devices that perform rate control, I hadn't been involved in that part of the code and didn't understand how it worked.
Turns out the solution is relatively easy to implement. It requires a bit of math, std::chrono, and breaking one large write call into smaller pieces.
From previous code, I know that 14 to 20 nanoseconds is the smallest gap between consecutive timestamps I can expect to see from Linux on a relatively new 3.4 GHz CPU. And from further testing I've since performed on Windows with Microsoft Visual Studio, the std::chrono timestamps there are much coarser.
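This isn't the original test code, but a minimal sketch of the kind of measurement involved: repeatedly take back-to-back timestamps and report the smallest non-zero delta the clock ever produced.

#include <algorithm>
#include <chrono>
#include <iostream>

int main()
{
    using clock = std::chrono::high_resolution_clock;

    // take pairs of consecutive timestamps until they differ, and track the smallest delta seen
    clock::duration smallest = clock::duration::max();
    for ( int i = 0; i < 1000; i ++ )
    {
        const auto t1 = clock::now();
        auto t2 = clock::now();
        while ( t2 == t1 )
        {
            t2 = clock::now();
        }
        smallest = std::min( smallest, t2 - t1 );
    }

    std::cout << "smallest observed delta: "
        << std::chrono::duration_cast<std::chrono::nanoseconds>(smallest).count()
        << " ns" << std::endl;

    return 0;
}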
Knowing this, I decided to use 1 millisecond time slices to control bandwidth. This gives 1000 possible corrections per second, which also helps avoid cramming all the writes into the early part of each second. All the math is done in nanoseconds, but the eventual goal is 1 millisecond time slices.
Reminder:
- 1 second == 1,000 milliseconds
- 1 second == 1,000,000 microseconds
- 1 second == 1,000,000,000 nanoseconds
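As a sanity check (not part of the original code), std::chrono can confirm these conversions at compile time:

#include <chrono>

// compile-time check of the conversions listed above
static_assert( std::chrono::nanoseconds(std::chrono::seconds(1)).count() == 1000000000, "1 s == 1,000,000,000 ns" );
static_assert( std::chrono::microseconds(std::chrono::seconds(1)).count() == 1000000, "1 s == 1,000,000 us" );
static_assert( std::chrono::milliseconds(std::chrono::seconds(1)).count() == 1000, "1 s == 1,000 ms" );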
There are 2 configuration values needed before we run the math:
- payload size; for this example, use 1200 byte buffers
- bandwidth in bps; for this example, use 10 Mbps
The trick is to figure out how many payload-sized writes to perform in each 1 millisecond time slice. Once that number of buffers has been written, we sleep until the start of the next time slice.
If the bandwidth is high enough, or the payload size is small enough, then many writes may be necessary per time slice.
The following code figures out the best length of each time slice, and scales it up when necessary to be greater than or equal to 1 millisecond:
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iostream>

const size_t payload_size = 1200;
const uint64_t bits_per_second = 10000000; // 10 Mbps
const uint64_t bytes_per_second = bits_per_second / 8;

uint64_t time_slice_in_ns = 1000000000; // 1 second == 1,000,000,000 ns
time_slice_in_ns *= payload_size;
time_slice_in_ns /= bytes_per_second;

// if the time slice is too short (less than 1 millisecond) we need to write multiple buffers
uint64_t factor = 1;
if ( time_slice_in_ns < 1000000 ) // 1,000,000 nanoseconds == 1 millisecond
{
    // figure out a multiplication factor to use to get us closer to 1 millisecond
    factor = static_cast<uint64_t>( std::ceil( 1000000.0 / time_slice_in_ns ) );
    time_slice_in_ns *= factor;
}
const size_t number_of_buffers_to_write = factor;

std::cout << "Need to write "
    << number_of_buffers_to_write << " "
    << payload_size << "-byte buffer(s) every "
    << time_slice_in_ns << " nanoseconds to achieve "
    << bits_per_second << " bps"
    << std::endl;
By the end of this code, time_slice_in_ns tells us the exact length of each time slice (always at least 1 millisecond, and close to 1 millisecond at higher rates), while number_of_buffers_to_write contains the exact number of payload-sized buffers which must be written during each time slice.
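With the values hard-coded above (1200-byte payload at 10 Mbps), the program should print something like:

Need to write 2 1200-byte buffer(s) every 1920000 nanoseconds to achieve 10000000 bps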
Several examples to show how this works:
1st example: Given a payload size of 1200 bytes and rate control of 10 Mbps:
- bits_per_second == 10,000,000
- bytes_per_second == 1,250,000
- time_slice_in_ns == 960,000
- since the time slice is less than 1 millisecond in length:
- factor == 2
- time_slice_in_ns == 1,920,000 (1.92 milliseconds)
- number_of_buffers_to_write == 2
- So 2 × 1200 bytes every 1.92 milliseconds works out to:
- 1,250,000 Bps
- 10 Mbps
- 1042 buffers per second, each of which is 1200 bytes in length.
2nd example: Given a payload size of 1000 bytes and rate control of 150 Mbps:
- bits_per_second == 150,000,000
- bytes_per_second == 18,750,000
- time_slice_in_ns == 53,333
- since the time slice is less than 1 millisecond:
- factor == 19
- time_slice_in_ns == 1,013,327 (1.013 milliseconds)
- number_of_buffers_to_write == 19
- So 19 × 1000 bytes every 1.013327 milliseconds works out to:
- 18,750,117 Bps
- just over 150 Mbps (the slight overshoot comes from rounding the time slice down to whole nanoseconds)
- 18750 buffers per second, each of which is 1000 bytes in length.
3rd example: Slower speeds will have a factor of 1. Given a payload size of 1234 bytes and rate control of 56 Kbps:
- bits_per_second == 56,000
- bytes_per_second == 7,000
- time_slice_in_ns == 176,285,714
- So 1234 bytes every 176.285714 milliseconds works out to:
- 7000 Bps
- 56 Kbps
- 5.7 buffers per second, each of which is 1234 bytes in length.
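To double-check these numbers, the earlier calculation can be wrapped in a small helper and run against each configuration. This is just a verification sketch; the show_rate_plan() name and structure are mine, not part of the original code:

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iostream>

// hypothetical helper wrapping the calculation shown earlier
void show_rate_plan( const size_t payload_size, const uint64_t bits_per_second )
{
    const uint64_t bytes_per_second = bits_per_second / 8;

    uint64_t time_slice_in_ns = 1000000000; // 1 second == 1,000,000,000 ns
    time_slice_in_ns *= payload_size;
    time_slice_in_ns /= bytes_per_second;

    uint64_t factor = 1;
    if ( time_slice_in_ns < 1000000 )
    {
        factor = static_cast<uint64_t>( std::ceil( 1000000.0 / time_slice_in_ns ) );
        time_slice_in_ns *= factor;
    }

    std::cout << factor << " x " << payload_size << " bytes every "
        << time_slice_in_ns << " ns" << std::endl;
}

int main()
{
    show_rate_plan( 1200, 10000000 );  // prints: 2 x 1200 bytes every 1920000 ns
    show_rate_plan( 1000, 150000000 ); // prints: 19 x 1000 bytes every 1013327 ns
    show_rate_plan( 1234, 56000 );     // prints: 1 x 1234 bytes every 176285714 ns
    return 0;
}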
Now that we know how much data to write and how frequently, the rest is quite simple: pick a starting timestamp, write the required number of buffers, then sleep until the next time slice begins. For example:
#include <chrono>
#include <thread>
...
// convert our time_slice_in_ns into a high resolution time duration
const std::chrono::high_resolution_clock::duration length_of_time_slice =
    std::chrono::duration_cast<std::chrono::high_resolution_clock::duration>(
        std::chrono::nanoseconds(time_slice_in_ns) );

// calculate the exact time at which the next time slice is scheduled to start
std::chrono::high_resolution_clock::time_point next_time_point =
    std::chrono::high_resolution_clock::now() + length_of_time_slice;

size_t buffer_counter = 0;
while ( not_done )
{
    buffer_counter ++;
    write_buffer( ... );

    if ( buffer_counter >= number_of_buffers_to_write )
    {
        // everything for this time slice has been written, so wait for the next one
        buffer_counter = 0;
        const std::chrono::high_resolution_clock::time_point now = std::chrono::high_resolution_clock::now();
        if ( now < next_time_point )
        {
            std::this_thread::sleep_until( next_time_point );
        }
        next_time_point += length_of_time_slice;
    }
}
Several things to note:
- This writes out a fixed number of buffers per time slice, and sleeps when necessary until the next time slice is scheduled to begin.
- Each buffer must be exactly payload_size bytes in length, otherwise the math falls apart. If buffers are of different lengths, track the number of bytes written instead of the number of buffers (see the sketch after this list).
- Using sleep_until() with an absolute time point, instead of sleep_for() with a fixed duration, means the time spent writing is automatically taken into account and the loop doesn't drift over time.
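As a rough illustration of that byte-counting variation, here is a sketch that mirrors the loop above. It assumes the same bytes_per_second, time_slice_in_ns, and length_of_time_slice values computed earlier, and a hypothetical write_buffer() that returns the number of bytes it actually wrote:

// how many bytes are allowed per time slice
const uint64_t bytes_per_time_slice = bytes_per_second * time_slice_in_ns / 1000000000;

std::chrono::high_resolution_clock::time_point next_time_point =
    std::chrono::high_resolution_clock::now() + length_of_time_slice;

uint64_t byte_counter = 0;
while ( not_done )
{
    // write_buffer() is assumed to return the number of bytes it wrote
    byte_counter += write_buffer( ... );

    if ( byte_counter >= bytes_per_time_slice )
    {
        byte_counter -= bytes_per_time_slice; // carry any overshoot into the next slice
        const std::chrono::high_resolution_clock::time_point now = std::chrono::high_resolution_clock::now();
        if ( now < next_time_point )
        {
            std::this_thread::sleep_until( next_time_point );
        }
        next_time_point += length_of_time_slice;
    }
}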