I have MongoDB set up on an Amazon EC2 micro instance. There are about 7 million items in the DB. I'm trying to iterate over all of them and print out some information about each item, using the Python driver (PyMongo):
import pymongo as p

db_client = p.MongoClient()
db = db_client.my_awesome_db
photo_collection = db.photos

for photo in photo_collection.find():
    print(photo)  # print some information about each item
I'm not storing anything in memory and the DB isn't being used by anything else.
Since the query was running long, I used `limit()` to estimate how long the full iteration should take. The times I'm seeing grow faster than linearly as I increase the limit. For example,

This isn't ridiculous, but it's worse than linear (the jump from 10k to 100k seems particularly bad). I can easily iterate over a 7-million-line file in about a second, but at this rate it will take 25 hours to iterate over the whole collection.
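For reference, this is roughly how I'm measuring it (a simplified sketch; `time_iteration` is just a helper name I'm using here, and in practice the iterable I pass in is the cursor from `photo_collection.find().limit(n)`):

```python
import time

def time_iteration(iterable, limit):
    """Consume up to `limit` items from `iterable` and
    return (elapsed_seconds, items_consumed)."""
    start = time.perf_counter()
    count = 0
    for _ in iterable:
        count += 1
        if count >= limit:
            break
    return time.perf_counter() - start, count

# With PyMongo this would be, e.g.:
#   elapsed, n = time_iteration(photo_collection.find().limit(100_000), 100_000)
# Here's the same helper on a plain iterable, just to show the shape:
elapsed, n = time_iteration(range(1_000_000), 100_000)
print(n)
```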
Do I have something configured wrong? Is `find()` not the correct function to use?