
python - Huge file as the data source for mincemeat.py

Problem description:

I am planning to use mincemeat.py for my map reduce task on a ~100GB file. After seeing the example code from mincemeat, it seems I need to input an in-memory dictionary as the data source. So, what is the right way to provide my huge file as the data source for mincemeat?

Link to mincemeat: https://github.com/michaelfairley/mincemeatpy

Answer:

Looking at the example and the concept, ideally you would:

  1. Produce an iterator (or lazy, dict-like object) as the data source,
  2. Split the file into a number of moderately large files spread across several servers, and then
  3. Merge the results.