Does anybody knows if there is a sort of 'load-balancer' in the erlang standard library? I mean, if I have some really simple operations on a really large set of data, the overhead of constructing a process for every item will be larger than perform the operation sequentially. But if I can balance the work in the 'right number' of process, it will perform better, so I'm basically asking if there is an easy way to accomplish this task.
By the way, does anybody knows if an OTP application does some kind of balance load? I mean, in an OTP application there is the concept of a "worker process" (like a java-ish thread worker)?
plists module probably does what you want. It is basically a parallel implementation of the
lists module, design to be used as a drop-in replacement. However, you can also control how it parallelizes its operations, for example by defining how many worker processes should be spawned etc.
You probably would do it by calculating some number of workers depending on the length of the list or the load of the system etc.
From the website:
plists is a drop-in replacement for the Erlang module lists, making most list operations parallel. It can operate on each element in parallel, for IO-bound operations, on sublists in parallel, for taking advantage of multi-core machines with CPU-bound operations, and across erlang nodes, for parallizing inside a cluster. It handles errors and node failures. It can be configured, tuned, and tweaked to get optimal performance while minimizing overhead.
pg2 implements quite simple distributed process pool.
pg2:get_closest_pid/1 returns "closest" pid, i.e. random local process if available, otherwise random remote process.
pool implements load balancing between nodes started with module
There is no, in my view, usefull generic load-balancing tool in otp. And perhaps it only usefull to have one in specific cases. It is easy enough to implement one yourself. plists may be useful in the same cases. I do not believe in parallel-libraries as a substitute to the real thing. Amdahl will haunt you forever if you walk this path.
The right number of worker processes is equal to the number of schedulers. This may vary depending of what other work is done on the system. Use,
erlang:system_info(schedulers_online) -> NS
to get the number of schedulers.
The notion of overhead when flooding the system with an abundance of worker processes is somewhat faulty. There is overhead with new processes but not as much as with os-threads. The main overhead is message copying between processes, this can be alleviated with the use of binaries since only the reference to the binary is sent. With eterms the structure is first expanded then copied to the other process.
There is no way how to predict cost of work mechanically without measure it e.g do it. Some person must determine how to partition work for some class of tasks. In load balancer word I understand something very different than in your question.