
amazon ec2 - MongoDB gets slower farther into a find() operation

Problem description:

I have MongoDB set up on an Amazon EC2 micro instance. There are about 7 million items in the DB. I'm trying to iterate over all of them and print out some information about each item. I'm using the Python driver (PyMongo) to do so.

import pymongo as p

db_client = p.MongoClient()
db = db_client.my_awesome_db
photo_collection = db.photos

for photo in photo_collection.find():
    print photo['attr']

I'm not storing anything in memory and the DB isn't being used by anything else.

Since the query was running long, I used limit() to estimate how long the full pass should take. The times grow faster than linearly as I increase the limit. For example,

  • limit -> time
  • 1,000 -> 1 second
  • 10,000 -> 10 seconds
  • 100,000 -> 720 seconds (~ 12 minutes)
  • 700,000 -> 9000 seconds
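
The timings above came from a rough wall-clock loop along these lines (a sketch, not the exact script; here the loop just touches the field instead of printing it, and the limits tested are the ones listed above):

import time
import pymongo as p

db_client = p.MongoClient()
photo_collection = db_client.my_awesome_db.photos

for n in [1000, 10000, 100000, 700000]:
    start = time.time()
    for photo in photo_collection.find().limit(n):
        photo['attr']  # access the field; the real run printed each value
    print n, '->', time.time() - start, 'seconds'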

This isn't ridiculous, but it's worse than linear (the jump from 10k to 100k seems especially bad). I can easily iterate over a 7-million-line file in a second, but at this rate it will take roughly 25 hours to iterate over the whole DB.

Do I have something configured wrong? Is find() not the correct function to use?
