ios: Behavior of C++ std::thread

I am in the process of writing an article about using blocking vs non-blocking sockets. I am currently doing some experiments using threads and blocking sockets, and have turned up some interesting results that I am not sure how to explain.

The question I think I should ask is below. But any input on what is happening, what I should actually be asking, or what I need to time/measure/examine would be gratefully accepted.

Set Up

Experiments are running on Amazon:

Instance T    vCPUs     Memory (GiB)   Storage (GB) Network
c3.2xlarge     8          15           2 x 80 SSD     High

I am using siege to load test the server:

> wc data.txt
   0       1      32 data.txt
> siege --delay=0.001 --time=1m --concurrent=<concurrency> -H 'Content-Length: 32'  -q '<host>/message POST < data.txt'

The Servers:

I have four versions of the code, each the most basic type of HTTP server. No matter what you request, you get the same response (this is basically to test throughput).

Single Threaded.
Multi Threaded
Each accepted request is then handled by std::thread which is detached.
Multi Thread with Pool
A fixed-size thread pool of std::thread. Each accepted request creates a job that is added to a job queue for processing by the thread pool.
Multi Thread using std::async()
Each accepted request is executed via std::async(); the future is stored in a queue. A secondary thread waits for each future to complete before discarding it.
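
One small check that may be relevant to the std::async() version: as far as I understand it, the std::launch::async policy asks for the task to run as if on a new thread of execution, so the task should never run on the calling thread. The sketch below only tests that much. The function name asyncRunsOnAnotherThread is made up for illustration, and the sketch says nothing about whether the implementation recycles OS threads behind the scenes.

```cpp
#include <future>
#include <thread>

// Hypothetical helper (not part of the servers): runs a trivial task via
// std::async with the std::launch::async policy and reports whether it
// executed on a thread other than the caller's.
bool asyncRunsOnAnotherThread()
{
    std::thread::id const caller = std::this_thread::get_id();

    // The task simply reports the id of the thread it runs on.
    std::future<std::thread::id> task =
        std::async(std::launch::async, [] { return std::this_thread::get_id(); });

    return task.get() != caller;
}
```

Comparing the ids of two tasks run one after the other would not settle the re-use question, because a thread id may be handed out again once a thread has finished.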

Expectations

Single: Worst Performance
It should top out at a max rate.
Multi: Better than Single thread.
But when there is a large number of concurrent connections the performance should drop significantly. My experiments top out at 255 active connections (and thus 255 threads) on an 8 core system.
Thread Pool: Better than multi.
Because we create only as many threads as the hardware can naturally support there should be no degradation in performance.
Async: Similar to Thread Pool.
Though I expect this to be slightly more efficient than a handwritten thread pool.

Actual Results

Actual concurrent sizes tried.

1, 2, 4, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 255

I was surprised at the performance of the "Multi" Threaded version. So I doubled the size of the thread pool version to see what happened.

ThreadQueue     jobs(std::thread::hardware_concurrency());
// Changed this line to:
ThreadQueue     jobs(std::thread::hardware_concurrency() * 2);

That's why you see two lines for thread pool in the graphs.
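
A small caveat on sizing the pool this way: std::thread::hardware_concurrency() is only a hint and is allowed to return 0 when the value cannot be determined, which would create a pool with no threads. A minimal sketch of a guard follows; the helper name poolSize is hypothetical and not part of the code in this question.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical helper: picks a pool size as a multiple of the hardware
// thread count. hardware_concurrency() may return 0 when the value is not
// computable, so fall back to a single thread in that case.
std::size_t poolSize(std::size_t multiplier)
{
    std::size_t const hardware = std::thread::hardware_concurrency();
    std::size_t const base     = hardware == 0 ? 1 : hardware;
    return base * multiplier;
}
```

With this, the two pool variants would be constructed with poolSize(1) and poolSize(2).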

Need Help

It's not unexpected that the standard library std::async() is the best version. But I am totally dumbfounded by the Multi threaded version having basically the same performance.

This version (Multi Threaded) creates a new thread for every accepted incoming connection and then simply detaches the thread (allowing it to run to completion). As the concurrency reaches 255 we will have 255 background threads running in the process.
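
The figure of 255 threads is inferred from the siege concurrency rather than measured. One way to measure it might be a pair of atomic counters that each worker updates on entry and exit. This is only a sketch: LiveThreadCounter and peakThreadsFor are hypothetical names, and the demo joins its threads (instead of detaching them) so that it finishes deterministically.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical instrumentation: tracks how many threads are currently inside
// the worker and the highest value that count has reached.
class LiveThreadCounter
{
    std::atomic<int> live{0};
    std::atomic<int> peak{0};
  public:
    void enter()
    {
        int const now  = ++live;
        int       seen = peak.load();
        // Raise the recorded peak unless another thread already stored a larger one.
        while (seen < now && !peak.compare_exchange_weak(seen, now)) {}
    }
    void leave()         { --live; }
    int  current() const { return live.load(); }
    int  highest() const { return peak.load(); }
};

// Demo: starts `count` threads that each pass through the counter and
// returns the peak number that were inside it at the same time.
int peakThreadsFor(int count)
{
    LiveThreadCounter        counter;
    std::vector<std::thread> threads;
    for (int loop = 0; loop < count; ++loop)
    {
        threads.emplace_back([&counter] {
            counter.enter();
            std::this_thread::yield();  // stand-in for the real work
            counter.leave();
        });
    }
    for (auto& thread : threads)
    {
        thread.join();
    }
    return counter.highest();
}
```

In the servers, the enter() and leave() calls would sit at the start and end of the worker, with a single counter shared by all threads.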

So the question:

Given the short runtime of Socket::worker() I can not believe the cost of creating a thread is negligible in comparison to this work. Also because it maintains a similar performance to std::async() it seems to suggest that there is some re-use going on behind the scenes.
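
To put a number on the cost of creating a thread, a rough micro-benchmark along the following lines could be run on the same instance. It is a sketch under assumptions: averageThreadCostNs is a hypothetical name, and it times create plus join of a thread running an empty function, which is not quite the same as the create plus detach used in the server.

```cpp
#include <chrono>
#include <thread>

// Hypothetical micro-benchmark: average time in nanoseconds to create and
// join one thread that runs an empty function.
double averageThreadCostNs(int iterations)
{
    using Clock = std::chrono::steady_clock;

    auto const start = Clock::now();
    for (int loop = 0; loop < iterations; ++loop)
    {
        std::thread worker([] {});
        worker.join();
    }
    std::chrono::duration<double, std::nano> const elapsed = Clock::now() - start;

    return elapsed.count() / iterations;
}
```

Comparing this figure with the time spent in one call to the worker would show whether thread creation really is small next to the socket work.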

Does anybody have any knowledge about the standards requirements for thread re-use and what I should expect the re-use behavior to be?

At what point will the blocking model break down? At 255 concurrent requests I was not expecting the threading model to keep up. I obviously need to reset my expectations here.

Code

The socket wrapper code is a very thin layer over standard sockets (just throwing exceptions when things go wrong). The current code is here if needed but I don't think it matters.

The full source of this code is available here.

Sock::worker

This is the code common to all the servers. It receives an accepted socket object (via move) and writes the data object to that socket.

void worker(DataSocket&& accepted, ServerSocket& server, std::string const& data, int& finished)
{
    DataSocket  accept(std::move(accepted));
    HTTPServer  acceptHTTPServer(accept);
    try
    {
        std::string message;
        acceptHTTPServer.recvMessage(message);
        // std::cout << message << "\n";
        if (!finished && message == "Done")
        {
            finished = 1;
            server.stop();
            acceptHTTPServer.sendMessage("", "Stoped");
        }
        else
        {
            acceptHTTPServer.sendMessage("", data);
        }
    }
    catch(DropDisconnectedPipe const& e)
    {
        std::cerr << "Pipe Disconnected: " << e.what() << "\n";
    }
}

Single Thread

int main(int argc, char* argv[])
{
    // Builds a string that is sent back with each response.
    std::string data    = Sock::commonSetUp(argc, argv);

    Sock::ServerSocket   server(8080);
    int                  finished    = 0;
    while(!finished)
    {
        Sock::DataSocket  accept  = server.accept();
        // Simply sends "data" back over http.
        Sock::worker(std::move(accept), server, data, finished);
    }
}

Multi Thread

int main(int argc, char* argv[])
{
    std::string         data    = Sock::commonSetUp(argc, argv);
    Sock::ServerSocket  server(8080);
    int                 finished    = 0;

    while(!finished)
    {
        Sock::DataSocket  accept  = server.accept();

        std::thread work(Sock::worker, std::move(accept), std::ref(server), std::ref(data), std::ref(finished));
        work.detach();
    }
}

Multi Thread with Queue

int main(int argc, char* argv[])
{
    std::string data    = Sock::commonSetUp(argc, argv);
    Sock::ServerSocket   server(8080);
    int                  finished    = 0;

    std::cerr << "Concurrency: " << std::thread::hardware_concurrency() << "\n";
    ThreadQueue     jobs(std::thread::hardware_concurrency());

    while(!finished)
    {
        Sock::DataSocket  accept  = server.accept();

        jobs.startJob(WorkJob(std::move(accept), server, data, finished));
    }
}

Then the auxiliary code to control the pool:

class WorkJob
{
    Sock::DataSocket    accept;
    Sock::ServerSocket& server;
    std::string const&  data;
    int&                finished;
    public:
        WorkJob(Sock::DataSocket&& accept, Sock::ServerSocket& server, std::string const& data, int& finished)
            : accept(std::move(accept))
            , server(server)
            , data(data)
            , finished(finished)
        {}
        WorkJob(WorkJob&& rhs)
            : accept(std::move(rhs.accept))
            , server(rhs.server)
            , data(rhs.data)
            , finished(rhs.finished)
        {}
        void operator()()
        {
            Sock::worker(std::move(accept), server, data, finished);
        }
};
class ThreadQueue
{
    using WorkList = std::deque<WorkJob>;

    std::vector<std::thread>    threads;
    std::mutex                  safe;
    std::condition_variable     cond;
    WorkList                    work;
    int                         finished;

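    // Runs on each pool thread: waits for a job (or shutdown), then executes it.
    void doWork()
    {
        for(;;)
        {
            std::unique_lock<std::mutex>     lock(safe);
            // Wake when there is work to do or the pool is shutting down.
            cond.wait(lock, [this](){return !(this->work.empty() && !this->finished);});
            if (finished)
            {
                // Shutting down: leave without touching a possibly empty queue.
                return;
            }
            WorkJob job = std::move(work.front());
            work.pop_front();
            // Release the lock so other threads can fetch jobs while this one runs.
            lock.unlock();

            job();
        }
    }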

    public:
        void startJob(WorkJob&& item)
        {
            std::unique_lock<std::mutex>     lock(safe);
            work.push_back(std::move(item));
            cond.notify_one();
        }

        ThreadQueue(int count)
            : threads(count)
            , finished(false)
        {
            for(int loop = 0;loop < count; ++loop)
            {
                threads[loop] = std::thread(&ThreadQueue::doWork, this);
            }
        }
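        ~ThreadQueue()
        {
            {
                std::unique_lock<std::mutex>     lock(safe);
                finished = true;
            }
            cond.notify_all();
            // A joinable std::thread must be joined (or detached) before it is
            // destroyed, otherwise std::terminate() is called.
            for(auto& thread: threads)
            {
                thread.join();
            }
        }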
};
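
For experimenting with the pool without the socket code, a self-contained sketch of the same idea follows, with std::function<void()> jobs in place of WorkJob. SimplePool and runJobs are hypothetical names. Unlike the ThreadQueue above, this version finishes the jobs still in the queue before its threads exit, which is a design choice of the sketch and not something the original does.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for ThreadQueue that accepts any callable.
class SimplePool
{
    std::vector<std::thread>          threads;
    std::mutex                        safe;
    std::condition_variable           cond;
    std::deque<std::function<void()>> work;
    bool                              finished = false;

    void doWork()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(safe);
                cond.wait(lock, [this] { return finished || !work.empty(); });
                if (work.empty())
                {
                    return;  // shutting down and nothing left to run
                }
                job = std::move(work.front());
                work.pop_front();
            }
            job();  // run outside the lock
        }
    }

  public:
    explicit SimplePool(std::size_t count)
    {
        for (std::size_t loop = 0; loop < count; ++loop)
        {
            threads.emplace_back(&SimplePool::doWork, this);
        }
    }
    ~SimplePool()
    {
        {
            std::lock_guard<std::mutex> lock(safe);
            finished = true;
        }
        cond.notify_all();
        for (auto& thread : threads)
        {
            thread.join();
        }
    }
    void startJob(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(safe);
            work.push_back(std::move(job));
        }
        cond.notify_one();
    }
};

// Demo: pushes `jobCount` counting jobs through a pool of `threadCount`
// threads (threadCount must be at least 1) and returns how many ran.
int runJobs(int jobCount, std::size_t threadCount)
{
    std::atomic<int> done{0};
    {
        SimplePool pool(threadCount);
        for (int loop = 0; loop < jobCount; ++loop)
        {
            pool.startJob([&done] { ++done; });
        }
    }  // the destructor drains the queue and joins the threads
    return done.load();
}
```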

Async

int main(int argc, char* argv[])
{
    std::string         data     = Sock::commonSetUp(argc, argv);
    int                 finished = 0;
    Sock::ServerSocket  server(8080);
    FutureQueue         future(finished);


    while(!finished)
    {
        Sock::DataSocket  accept  = server.accept();

        future.addFuture([accept = std::move(accept), &server, &data, &finished]() mutable {Sock::worker(std::move(accept), server, data, finished);});
    }
}

Auxiliary class to tidy up the future.

class FutureQueue
{
    using MyFuture   = std::future<void>;
    using FutureList = std::list<MyFuture>;

    int&                        finished;
    FutureList                  futures;
    std::mutex                  mutex;
    std::condition_variable     cond;
    std::thread                 cleaner;

    void waiter()
    {
        // Keep tidying up futures until the server is asked to stop.
        while(!finished)
        {
            std::future<void>   next;
            {
                std::unique_lock<std::mutex> lock(mutex);
                cond.wait(lock, [this](){return !(this->futures.empty() && !this->finished);});
                if (!futures.empty())
                {
                    next = std::move(futures.front());
                    futures.pop_front();
                }
            }
            // Wait outside the lock so new futures can still be added.
            if (next.valid())
            {
                next.wait();
            }
        }
    }
    public:
        FutureQueue(int& finished)
            : finished(finished)
            , cleaner(&FutureQueue::waiter, this)
        {}
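        ~FutureQueue()
        {
            {
                // Wake the cleaner in case it is waiting on an empty queue.
                std::unique_lock<std::mutex> lock(mutex);
                cond.notify_all();
            }
            cleaner.join();
        }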

        template<typename T>
        void addFuture(T&& lambda)
        {
            std::unique_lock<std::mutex> lock(mutex);
            futures.push_back(std::async(std::launch::async, std::move(lambda)));
            cond.notify_one();
        }
};

So the throughput numbers look like this:

Saturday, June 18, 2016