I have approx. 10 million
Article objects in a Mongoid database. The huge number of
Article objects makes the queries quite time consuming to perform.
As exemplified below, I am registering for each week (e.g. 700 days from now .. 7 days from now, 0 days from now) how many articles are in the database.
But for every query I make, the time consumption is increased, and Mongoid's CPU usage quickly reaches +100%.
articles = Article.where(published: true).asc(:datetime)
days = Date.today.mjd - articles.first.datetime.to_date.mjd
days.step(0, -7) do |n|
current_date = Date.today - n.days
previous_articles = articles.lt(datetime: current_date)
previous_good_articles = previous_articles.where(good: true).size
previous_bad_articles = previous_articles.where(good: false).size
Is there a way to save the
Article objects to memory, so only need to call the database on the first line?
A MongoDB database is not build for that.
I think the best way is to run daily a script that creates your data for that day and save it in a Redis Database http://www.redis.io
Redis stores your data in the server memory, so you can access it every time of the day. And is very quick.
Don't Repeat Yourself (DRY) is a best-practice that applies not only to code but also to processing. Many applications have natural epochs for summarizing data, a day is a good choice in your question, and if the data is historical, it only has to be summarized once. So you reduce processing of 10 million Article document down to 700 day-summary documents. You need special code for merging in today if you want up to the moment accurate data, but the previous savings is well worth the effort.
I politely disagree with the statement, "A MongoDB database is not build for that." You can see from the above that it is all about not repeating processing. The 700 day-summary documents can be stored in any reasonable data store. Since you already are using MongoDB, simply use another MongoDB collection for the day summaries. There's no need to spin up another data store if you don't want to. The summary data will easily fit in memory, and the reduction in processing means that your working set size will no longer be blown out by the historical processing.