I've been thinking about
std::async and how one should use it in future compiler implementations. However, right now I'm stuck on something that feels like a design flaw.
std::async is largely implementation-dependent, with probably two variants of
launch::async: one that launches each task in a new thread, and one that dispatches tasks to a thread pool/task scheduler.
However, depending on which of these variants is used to implement
std::async, the usage would vary greatly.
With the "thread-pool" variant you could launch a lot of small tasks without worrying much about overhead; however, what if one of the tasks blocks at some point?
With the "launch new thread" variant, blocking tasks would cause no such problems, but the overhead of launching and executing each task would be very high.
thread-pool:
+low-overhead, -never ever block
launch new thread:
+fine with blocks, -high overhead
So basically, depending on the implementation, the way we use
std::async would vary greatly. A program that works well with one compiler might perform horribly with another.
Is this by design? Or am I missing something? Would you consider this, as I do, a big problem?
In the current specification I am missing something like
std::oversubscribe(bool) to enable implementation-independent usage of
std::async.
EDIT: As far as I have read, the C++11 standard gives no hint as to whether tasks sent to
std::async may block or not.
std::async tasks launched with a policy of
std::launch::async run "as if in a new thread", so thread pools are not really supported: the runtime would have to tear down and recreate all the thread-local variables between task executions, which is not straightforward.
This also means that you can expect tasks started with a policy of
std::launch::async to run concurrently. There may be a start-up delay, and there will be task-switching if you have more running threads than processors, but they should be running, and not deadlock just because one happens to wait for another.
An implementation may choose to offer an extension that allows your tasks to run in a thread pool, in which case it is up to that implementation to document the semantics.
I would expect implementations to launch new threads, and leave thread pools to a future version of C++ that standardizes them. Are there any implementations that use a thread pool?
MSVC initially used a thread pool based on their Concurrency Runtime. According to STL Fixes In VS 2015, Part 2, this has been removed. The C++ specification left some room for implementers to do clever things, but I don't think it left quite enough room for this thread-pooling implementation. In particular, I think the spec still required that
thread_local objects be destroyed and rebuilt between tasks, which thread pooling with ConcRT would not have supported.