Bulk Upload Millions of Documents to Mongo

By Avital Oliver

Here's a short story: a production user shows us two benchmarks that insert 50000 documents into a collection. They seem to do the same thing but perform differently. And it looks scary: the version with Meteor's synchronous APIs (you know, like what every other language other than JavaScript has?) is slower.

Has Meteor's decision to expose synchronous APIs using fibers put apps in a performance trap? And what is the fastest way to insert 50000 documents into a collection in Meteor?

(tl;dr: fibers aren't the problem; you should use Mongo's bulk insert operation)

Are fibers inherently slow?

(No.)

A month ago we were contacted by the folks at Workpop, a production Meteor app. As part of their developer subscription, they wanted us to help them understand the server performance of their app. Workpop wanted to learn more about the performance of inserting a large set of documents into Mongo. They shared the following two benchmarks:

    // Assumes Test is a Mongo.Collection defined elsewhere in the app.

    // (1) -- insert 50000 documents asynchronously
    var counter = 0;
    var start = new Date();
    for (var i = 0; i < 50000; i++) {
      Test.insert({value: i}, function (err, res) {
        counter++;
        // The last callback to fire marks the end of the benchmark.
        if (counter === 50000) {
          var end = new Date();
          console.log((end - start) + "ms");
        }
      });
    }

    // (2) -- insert 50000 documents synchronously
    var start = new Date();
    for (var i = 0; i < 50000; i++) {
      Test.insert({value: i});
    }
    var end = new Date();
    console.log((end - start) + "ms");
  • (1) uses Meteor's asynchronous APIs, like what you get by default in Node. In the callback to Test.insert, we increment a counter. Once 50000 callbacks have executed, we know all the inserts are complete.
  • (2) uses Meteor's synchronous APIs, like what you'd get in Python, Ruby, Java, or every other popular language. Meteor's synchronous APIs are powered by fibers, and they let you write clear, concise code. In this case, we could use a simple for loop instead of thinking hard about how to write the callback function. Moreover, errors are automatically surfaced as thrown exceptions. These APIs don't block the entire process the way Node's built-in functions such as fs.readFileSync do: other fibers run while one is waiting for an operation to complete.
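Plain Node has no fibers, but async/await gives a similar feel and is a reasonable analogy for the last point (it is only an analogy, not how Meteor's fibers work). The sketch uses a hypothetical `fakeInsert` that completes on a later turn of the event loop: the loop reads like synchronous code and waits for each insert, yet an unrelated task keeps making progress in between.

```javascript
// Hypothetical stand-in for an asynchronous database insert: it completes
// on a later turn of the event loop instead of blocking the process.
function fakeInsert(store, doc) {
  return new Promise((resolve) => {
    setImmediate(() => {
      store.push(doc);
      resolve(store.length);
    });
  });
}

// Reads top to bottom like synchronous code: each insert finishes
// before the next one starts.
async function insertInOrder(store, log, n) {
  for (let i = 0; i < n; i++) {
    await fakeInsert(store, { value: i });
    log.push(`insert ${i} done`);
  }
}

// An unrelated task that also yields to the event loop between steps.
async function otherWork(log, n) {
  for (let i = 0; i < n; i++) {
    await new Promise((resolve) => setImmediate(resolve));
    log.push(`other ${i}`);
  }
}

const store = [];
const log = [];
const done = Promise.all([insertInOrder(store, log, 3), otherWork(log, 3)]);
done.then(() => console.log(log.join("\n")));
```

The printed log interleaves the two tasks: waiting on an insert pauses only the code that is waiting, not the whole process.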

The problem is that (2) turned out to be 2x-3x slower than (1). Does the conciseness you get from synchronous APIs lead to slower code? Are fibers inherently slow?

Apples and Oranges

At first glance, these two examples look like they do the same thing, just written with different APIs. But in fact, they do different things.

The key thing to understand is that when you're using synchronous APIs on top of fibers, each line of code will wait to finish executing before the next one runs (while letting other fibers run). Let's analyze each of the two benchmarks:

  • Code snippet (2) does the following: "Send a message to the Mongo process to insert a document; wait for it to complete and send the success back to the Meteor server's Node process; then do that 49999 more times". So the total benchmark time is about 50000 × (network round-trip time + time for Mongo to insert one document).

  • Code snippet (1) is different. The Node Mongo driver, which Meteor uses, implements `insert` by sending a message over a socket (follow. the. code). So this example queues up 50000 messages on a socket, which get sent to the Mongo process as fast as Mongo can read them. Mongo's inserts are most likely slower than the time it takes to enqueue each message, which means the network time barely affects the overall benchmark time. The total benchmark time is just a little over 50000 × (time for Mongo to insert one document).
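The two estimates can be written as a rough cost model. This is a sketch under simplifying assumptions: a fixed network round trip `rttMs` per request, a fixed per-document insert cost `insertMs` on the Mongo side, and, for the pipelined case, Mongo rather than the client being the bottleneck, so the round trip is paid roughly once. The parameter values in the usage lines are made up for illustration, not measurements.

```javascript
// Rough cost model (hypothetical parameters, not measurements).
// Sequential, as in snippet (2): every insert pays a full round trip
// plus Mongo's insert time before the next one is sent.
function sequentialMs(n, rttMs, insertMs) {
  return n * (rttMs + insertMs);
}

// Pipelined, as in snippet (1): messages are queued back to back, so
// Mongo's insert time dominates and the round trip is paid about once.
function pipelinedMs(n, rttMs, insertMs) {
  return rttMs + n * insertMs;
}

// Illustrative values only: 10 inserts, 2 ms round trip, 1 ms per insert.
console.log(sequentialMs(10, 2, 1)); // 30
console.log(pipelinedMs(10, 2, 1)); // 12
```

The gap between the two grows with `n`, roughly by one round trip per document.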

To summarize, code snippet (2) really should be much slower, since it waits for a round trip to Mongo on every insert. The benchmark result isn't surprising at all.
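One way to see this difference without a database is to count how many inserts are outstanding at once. The sketch uses a hypothetical in-memory `FakeCollection` whose `insert(doc, callback)` completes on a later turn of the event loop; it only mimics the callback shape of `Test.insert` and is not Meteor's implementation. The loop pattern from snippet (1) has every insert in flight together, while a pattern that waits for each callback, mirroring the waiting behavior of snippet (2), never has more than one.

```javascript
// Hypothetical in-memory collection with a callback-style insert.
// It tracks how many inserts are outstanding at the same time.
class FakeCollection {
  constructor() {
    this.docs = [];
    this.inFlight = 0;
    this.maxInFlight = 0;
  }

  insert(doc, callback) {
    this.inFlight++;
    this.maxInFlight = Math.max(this.maxInFlight, this.inFlight);
    setImmediate(() => {
      this.docs.push(doc);
      this.inFlight--;
      callback(null, this.docs.length);
    });
  }
}

// Pattern of snippet (1): fire every insert, count the callbacks.
function insertPipelined(coll, n, done) {
  let counter = 0;
  for (let i = 0; i < n; i++) {
    coll.insert({ value: i }, () => {
      counter++;
      if (counter === n) done();
    });
  }
}

// Waits for each callback before sending the next insert.
function insertSequential(coll, n, done) {
  let i = 0;
  const go = () => {
    coll.insert({ value: i }, () => {
      i++;
      if (i === n) done();
      else go();
    });
  };
  go();
}

const pipelined = new FakeCollection();
const sequential = new FakeCollection();
const runs = Promise.all([
  new Promise((resolve) => insertPipelined(pipelined, 1000, resolve)),
  new Promise((resolve) => insertSequential(sequential, 1000, resolve)),
]);
runs.then(() => {
  console.log("pipelined max in flight:", pipelined.maxInFlight); // 1000
  console.log("sequential max in flight:", sequential.maxInFlight); // 1
});
```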

Comparing Apples to Apples

Let's test this theory with some code. We can write a new benchmark that uses the asynchronous Mongo API but waits for the Mongo process to respond to each insert before Meteor's Node process sends the next one. This should lead to a benchmark that performs very similarly to (2) above. This is what it looks like:

    // (3) -- insert 50000 documents sequentially using the asynchronous API
    var i = 0;
    var go = function () {
      Test.insert({value: i}, function () {
        i++;
        if (i === 50000) {
          console.log((new Date() - start) + "ms");
        } else {
          // Send the next insert only after this one has completed.
          go();
        }
      });
    };
    var start = new Date();
    go();

On my computer, this benchmark (3) took 31 seconds, compared to 33 seconds for (2) above. So we have pretty much explained the gap. (For reference, (1) took 19 seconds.)

Why the 2 second difference? That would be for another blog post, but here's the short answer. When carefully measured, each call to a synchronous API took 0.025ms longer than the equivalent asynchronous call. 0.025ms: not too bad.

So what's the best way to insert 50000 documents?

The real answer is: none of the above. You should use Mongo's "batch insert" operation. There's even a great Meteor package that exposes it while keeping all of the rest of Meteor's Mongo machinery intact (latency compensation, client-side string ID generation, allow/deny rules): https://atmospherejs.com/mikowals/batch-insert. Inserting the same 50000 documents with this API takes 3.5 seconds. That's about 10 times faster than the synchronous code we started with and about five times faster than the asynchronous code.
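Most of the gain presumably comes from sending many documents per request instead of one per request. The sketch below is not the mikowals:batch-insert API; it is a generic helper, `insertInBatches` (a hypothetical name), that splits an array into chunks and hands each chunk to a caller-supplied `insertMany(docs, callback)` function, one request at a time. On a Meteor server that function could plausibly be backed by the raw driver collection, e.g. `Test.rawCollection().insertMany(...)`, assuming a Node MongoDB driver version that provides `insertMany`. The in-memory stand-in and the batch size of 1000 are arbitrary choices for illustration.

```javascript
// Hypothetical helper: insert `docs` in chunks of `batchSize`, sending one
// chunk at a time through a caller-supplied insertMany(docs, callback).
function insertInBatches(docs, batchSize, insertMany, done) {
  let offset = 0;
  let requests = 0;
  const next = () => {
    if (offset >= docs.length) {
      done(null, requests);
      return;
    }
    const batch = docs.slice(offset, offset + batchSize);
    offset += batch.length;
    requests++;
    insertMany(batch, (err) => {
      if (err) done(err);
      else next();
    });
  };
  next();
}

// In-memory stand-in for a driver's insertMany (hypothetical).
const stored = [];
const fakeInsertMany = (batch, callback) => {
  setImmediate(() => {
    stored.push(...batch);
    callback(null);
  });
};

const docs = Array.from({ length: 50000 }, (_, i) => ({ value: i }));
const finished = new Promise((resolve, reject) => {
  insertInBatches(docs, 1000, fakeInsertMany, (err, requests) => {
    if (err) return reject(err);
    console.log(`${stored.length} documents in ${requests} requests`);
    resolve(requests);
  });
});
```

With these settings the 50000 documents go out in 50 requests rather than 50000, so per-request costs such as the network round trip are paid far fewer times.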

And if you happen to be doing multiple operations that aren't all inserts, you should use Mongo's Bulk operation. I don't know of a Meteor package that exposes this at the moment, but it shouldn't be too hard to use collection.rawCollection().
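For mixed workloads, the MongoDB Node driver's `collection.bulkWrite()` accepts an array of operations such as `insertOne`, `updateOne`, and `deleteOne`. The sketch only builds such an array, using a hypothetical helper `toBulkOps` and a made-up list of changes; actually sending it from a Meteor server would mean something like `Test.rawCollection().bulkWrite(ops, ...)`, which assumes a driver version that supports `bulkWrite`.

```javascript
// Hypothetical description of pending changes, one entry per document.
const changes = [
  { kind: "insert", doc: { _id: "a", value: 1 } },
  { kind: "update", id: "b", set: { value: 2 } },
  { kind: "remove", id: "c" },
];

// Convert each change into the operation shape accepted by the MongoDB
// Node driver's collection.bulkWrite().
function toBulkOps(list) {
  return list.map((change) => {
    switch (change.kind) {
      case "insert":
        return { insertOne: { document: change.doc } };
      case "update":
        return { updateOne: { filter: { _id: change.id }, update: { $set: change.set } } };
      case "remove":
        return { deleteOne: { filter: { _id: change.id } } };
      default:
        throw new Error(`unknown change kind: ${change.kind}`);
    }
  });
}

const ops = toBulkOps(changes);
console.log(JSON.stringify(ops));
// On a Meteor server the array could then go out in a single request, e.g.
// Test.rawCollection().bulkWrite(ops, { ordered: false }, callback)
// (assumes a driver version that supports bulkWrite).
```

Passing `ordered: false` lets the server continue past a failed operation; keep the default ordered mode if later changes depend on earlier ones.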

The moral of the story

Here's the single most important thing to take away from this story: when you're using synchronous APIs on top of fibers, each line of code will wait to finish executing before the next one runs. Meteor makes it easy to use synchronous APIs, but you should be aware of how synchronous code executes. Luckily, Meteor also makes it easy to use asynchronous APIs: each Meteor API comes in both forms. And lastly, if you're doing a lot of small bits of work in parallel, consider using a different solution that enacts all changes in one bulk operation.

Looking for help with your production Meteor app?

We love helping production users better understand their apps, and this is a great way for us to improve Meteor. Do you have questions like these? Need production assistance? Let us know!


Source: https://blog.meteor.com/inserting-50000-documents-into-a-collection-slow-fast-and-fastest-1ec00d20bb23
