当前位置: 动力学知识库 > 问答 > 编程问答 >

algorithm - How to actually use merge sort for large data sets

问题描述:

How to actually use merge sort for large data sets?

Suppose that I have several sorted files with the following data:

1.txt

1

2

2

2.txt

3

4

5

3.txt

1

1

1

Suppose that we can't hold all files' contents in memory at the same time (let's say we can hold only two numbers from each file).

I heard that I can use some kind of R-way merge sort in this case but I don't understand how can I actually do it.

As you see, the first iteration will give us the following sorted sequence:

1 1 1 2 3 4

, so we flush it to the output file. However, we will get 1 again (from the 3.txt file) on the next iteration, so the whole resulting sequence is wrong!

网友答案:

Start by filling as many variables as you have files, one variable attached to one file. At each step find the lowest value of the three variables, and flush it to the output while filling it again from the same file.

| 1.txt | 2.txt | 3.txt |
| 1     | 3     | 1     | output 1 refill from file 1
| 2     | 3     | 1     | output 1 refill from file 3
| 2     | 3     | 1     | output 1 refill from file 3
| 2     | 3     | 1     | output 1 refill from file 3
| 2     | 3     | nil   | output 2 refill from file 1
| 2     | 3     | nil   | output 2 refill from file 1
| nil   | 3     | nil   | output 3 refill from file 2
| nil   | 4     | nil   | output 4 refill from file 2
| nil   | 5     | nil   | output 5 refill from file 2
| nil   | nil   | nil   | end
网友答案:

I heard that I can use some kind of R-way merge sort in this case but I don't understand how can I actually do it.

N-way merges are quite easy to explain. You open all the files, get the first element from each and put them into a heap. The algorithm then proceeds by getting the smallest element from the heap (pop), writing it to your output buffer and then read the next element from the file this item originated from. Repeat until all files are empty.

分享给朋友:
您可能感兴趣的文章:
随机阅读: