
node.js - Reading file in segments of X number of lines

Problem description:

I have a file with a lot of entries (10+ million), each representing a partial document that is being saved to a mongo database (based on some criteria, non-trivial).

To avoid overloading the database (which is doing other operations at the same time), I wish to read in chunks of X lines, wait for them to finish, read the next X lines, etc.

Is there any way to use any of the fs callback mechanisms to also "halt" progress at a certain point, without blocking the entire program? From what I can tell they all run from start to finish with no way of stopping them, unless you stop reading the file entirely.

The issue is that, because of the file size and how long the updates take, a LOT of the data would be held in memory, exceeding the 1 GB limit and crashing the program. Secondly, as I said, I don't want to queue 1 million updates and completely stress the mongo database.

Any and all suggestions welcome.

UPDATE: Final solution using line-reader (available via npm) below, in pseudo-code.

var lineReader = require('line-reader');

var filename = <wherever you get it from>;

lineReader.eachLine(filename, function(line, last, cb) {
  //
  // Do work here; `line` contains the line data and
  // `last` is true if it's the last line in the file.
  //

  function checkProcessed(callback) {
    if (doneProcessing()) { // Implement doneProcessing() to check whether whatever you are doing is done
      callback();
    } else {
      setTimeout(function() { checkProcessed(callback); }, 100); // Adjust the timeout to the expected time to process one line
    }
  }

  checkProcessed(cb); // Only ask line-reader for the next line once processing has caught up
});

This is implemented to make sure doneProcessing() returns true before attempting to work on more lines, which means you can effectively throttle whatever you are doing.
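In the snippet above, doneProcessing() is left as a stub. Purely as an illustration (not part of the original solution), one way to implement it is to count in-flight database operations and report "done" once the backlog is small enough; the names pendingOps, MAX_PENDING and saveToMongo below are hypothetical:

// Hypothetical counter-based implementation of doneProcessing().
// pendingOps, MAX_PENDING and saveToMongo are illustrative names only.
var pendingOps = 0;        // mongo operations started but not yet finished
var MAX_PENDING = 100;     // allow at most 100 outstanding operations

function startSave(doc) {
  pendingOps++;
  saveToMongo(doc, function(err) {  // your actual (non-trivial) save logic
    // error handling goes here
    pendingOps--;
  });
}

function doneProcessing() {
  // "Done enough" to let line-reader hand over the next line
  return pendingOps < MAX_PENDING;
}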

Answer:

I don't use MongoDB and I'm not an expert in using Lazy, but I think something like the code below might work or give you some ideas (note that I have not tested it).

var fs   = require('fs'),
    lazy = require('lazy');

var readStream = fs.createReadStream('yourfile.txt');

var file = lazy(readStream)
  .lines                     // ask to read the stream line by line
  .take(100)                 // and take 100 lines at a time
  .join(function(oneHundredLines) {
      readStream.pause();    // pause reading the stream
      writeToMongoDB(oneHundredLines, function(err) {
        // error checking goes here
        // resume the stream 1 second after MongoDB finishes saving
        setTimeout(function() { readStream.resume(); }, 1000);
      });
  });
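The answer above assumes a writeToMongoDB(lines, callback) helper. A minimal, untested sketch of what such a helper could look like with the official mongodb driver is shown below; db is assumed to be an already-connected Db instance, and parseLine and the collection name 'entries' are hypothetical placeholders for the non-trivial logic the question mentions:

// Sketch only: insertMany stands in for whatever non-trivial save logic
// the real application needs. db, parseLine and 'entries' are assumed names.
function writeToMongoDB(lines, callback) {
  var docs = lines.map(function(line) {
    return parseLine(line.toString()); // turn one line of the file into a partial document
  });
  db.collection('entries').insertMany(docs, function(err, result) {
    callback(err); // propagate any error so the caller can decide what to do
  });
}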