Why DocumentDb Makes Good Business Sense for Some Projects

The data explosion we've been experiencing for over a decade has been pushing the limits of data management technologies that have been in use since the 1970s. Relational Database Management Systems (RDBMS) have been the de facto standard for managing structured data, and for good reason. They've provided a reliable and cost-effective way to not only store and query but, more importantly, manage our data -- provided that the database has a well-designed schema.

However, what makes relational databases great can also be their Achilles' heel. Normalization makes the database our partner in managing data, but it also means queries have to join tables back together, so we can run into performance and scalability issues pretty quickly if the database supports a large application with lots of users and large volumes of data. With that said, a relational database could be a perfect solution for a smaller application.

Let me talk about our specific scenario so that I don't just make some generic and open-ended statements that are bound to spark heated debates. Let me define our application, its goals and objectives then tie them into why we needed a data backend that is not purely a relational database.

We've been hard at work building an application that will store really, really, really large amounts of data and support a very large body of users -- provided that we do our part right, both technology and marketing-wise. The application is called Ingrid. Maybe a quick introduction of Ingrid is in order here. I'll keep it short and sweet so that I don't turn this article into a shameless plug.

Ingrid will help you manage your "life"... well, that ought to do it for now...

Let's now define Ingrid from a technology standpoint. It's an application that runs in the cloud and has a web user interface. Like any good app these days, it has mobile apps to make it more appealing to those who don't always have the luxury of being in front of their computers with at least two really large monitors.

With that said, I had three primary concerns about the data backend:

  1. Performance
  2. Scalability
  3. Management

Because we primarily use the technology stack offered by Microsoft, our choice for a relational database has been SQL Server which, I think, is a great product/technology.

However, with SQL Server, scaling up can be expensive and scaling out pretty darn challenging.

I was also terribly concerned about managing the underlying platform. Call me a ______!!! (you're free to choose the adjective/noun combo of your choice) but I really don't like the idea of shifting some of my and my company's focus to making sure the underlying platform is up and running, which I see as a "necessary evil". It shouldn't be our job to make sure the underlying infrastructure is happy and healthy because we have bigger fish to fry, i.e. creating a really useful application that people will love.

While processing all these concerns in my mind, I was getting more and more intrigued by the NoSQL movement that has been gaining momentum. If you haven't already, you owe it to yourself to watch this fantastic "Introduction to NoSQL" by Martin Fowler. I also consulted some experts, e.g. Ike Ellis and a few more, to get their opinions. And while I was pondering these ideas, Microsoft released the first beta of Azure DocumentDb, which is now available as a service.

If one wants to consider the benefits of NoSQL databases, I think the following three are the most important ones:

  1. Performance
  2. Scalability
  3. Flexibility due to schema-less approach

As you can see, out of the three benefits NoSQL databases offer, the top two are directly addressing my top two concerns. There's only one more concern I needed to address which is management -- truth is I'm a very concerned person and I'm simplifying things a bit here. I'm also concerned about things like world peace, extinction of certain species and the social awkwardness problem the new generation seems to be suffering from.

There are a few pretty popular and very capable NoSQL platforms out there, e.g. MongoDB, RavenDB and CouchDB, to name a few. Some of these technologies are considered mainstream among NoSQL adopters. So, why did we choose DocumentDB then? The short answer is "it's a service". Here's what that means:

  1. You don't have to manage the underlying infrastructure e.g. availability of servers and resources, updates and upgrades, bandwidth, etc. are no longer your concern. Let truly qualified people worry about them and manage them.
  2. Pay for what you use. With your own server, you may be utilizing 50% of it but paying for 100% of it. With DocumentDb, I love the idea of paying only for what we use.
  3. Growth is less painful. Let's be honest, with growth, there's always pain and I really appreciate it if someone or something can take away some of that pain. I like the idea that when we need more storage for our data or faster response times, we won't have to consult database experts or purchase more hardware. We simply put a little bit more money into Microsoft's bank account.

So, let's sum this up...

I figured some non-technical people may read this article as well as techies. So, I'll try to explain why we picked DocumentDb in both technical and non-technical terms:

| Benefit | Technical Explanation | Non-Technical Explanation |
|---|---|---|
| Performance | More simultaneous hits to the database, smaller latency. Microsoft offers different performance levels so that you can pick the one that is right for your scenario. Here's the documentation about performance levels for DocumentDb. Also, because you're not normalizing your data, you won't have JOINs to slow down your database performance. | More customers don't make it go slower! |
| Scalability | There's really no limit to your database size. Collections in DocumentDb currently support up to 10GB of storage. As your needs grow, you can add more collections to your database. With some good design approaches, there would be no performance penalty. | You can keep more information in your database and you won't have to buy bigger hard drives. You'll just have to write bigger checks to Microsoft. |
| Management | Azure team will take care of things like upgrades to the platform, indexing of your data, system resource management. You can now focus on your data and your application. | You won't have to hire expensive database and network admins to manage your servers. |
| Schema-less Database | As your application and data structures evolve, you won't have to worry about modifying your database schema. You can have two similar records (they're called documents in DocumentDb) with different fields/properties stored side by side inside DocumentDb. | When you change the game on your developers because you're trying to respond to "market trends", your "computer guys" will be able to get the job done faster. |
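
To make the schema-less point a bit more concrete, here's a minimal sketch in plain Python. It does not use the DocumentDb SDK at all: the `collection` list is just an in-memory stand-in for a real collection, and the field names and the `find_by_type` and `contact_summary` helpers are made up for illustration. It only shows the idea that two "contact" documents with different fields can sit side by side, as long as the code reading them tolerates missing fields.

```python
import json

# In-memory stand-in for a DocumentDb collection (hypothetical data, not the SDK).
collection = [
    {"id": "1", "type": "contact", "name": "Ada", "email": "ada@example.com"},
    # Same kind of record, added later with extra fields and no email.
    {"id": "2", "type": "contact", "name": "Grace",
     "phones": ["555-0100"], "birthday": "1906-12-09"},
]


def find_by_type(docs, doc_type):
    """Return every document whose 'type' field matches doc_type."""
    return [d for d in docs if d.get("type") == doc_type]


def contact_summary(doc):
    """Build a display string, tolerating fields that a document may lack."""
    email = doc.get("email", "no email on file")
    return f"{doc['name']}: {email}"


contacts = find_by_type(collection, "contact")
summaries = [contact_summary(d) for d in contacts]
print(json.dumps(summaries, indent=2))
```

The flip side of this flexibility is that nothing stops a document from missing a field, so that check moves out of the database schema and into the application code, which is what `contact_summary` is doing with `doc.get`.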

It is important to note that DocumentDb is a new technology and offering from Microsoft. The DocumentDb team is still adding new features and tweaking the system performance. Being an early adopter, we did pay a price due to limited documentation and a lack of code samples and even best practices. I do have to mention, though, that the folks on the DocumentDb team have been pretty responsive to our questions.

One of the most challenging things I personally experienced was changing my mindset from relational database design to NoSQL. I even had the tendency to use collections in DocumentDb as "tables" and tried to come up with ways to enforce referential integrity. If you're new to the NoSQL way of doing things, it would be hugely beneficial to learn about how data are stored and managed in NoSQL databases. I'm currently writing an article about that which I will publish soon.
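
Here's a rough illustration of that mindset shift, again in plain Python rather than anything DocumentDb-specific. The `customers` and `orders` data and the `to_document` helper are hypothetical, and embedding is only one of several ways to model this. The sketch starts from two "tables" linked by a foreign key, which is how I instinctively wanted to design things, and folds them into a single document that can be read without a join.

```python
# Relational-style shape: two "tables" linked by a foreign key (hypothetical data).
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def to_document(customer, all_orders):
    """Fold a customer's orders into the customer record as an embedded list."""
    embedded = [
        {"order_id": o["order_id"], "total": o["total"]}
        for o in all_orders
        if o["customer_id"] == customer["customer_id"]
    ]
    return {
        "id": str(customer["customer_id"]),
        "name": customer["name"],
        "orders": embedded,
    }


# Document-style shape: one self-contained record, no join needed to read it.
doc = to_document(customers[0], orders)
order_total = sum(o["total"] for o in doc["orders"])
print(doc["name"], order_total)
```

The trade-off is that the database no longer enforces the link between a customer and their orders; the document itself is the unit of consistency, so keeping related data together (or in sync, if it's duplicated) becomes a design decision rather than a constraint.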

Hopefully, I was able to provide some useful information about why NoSQL -- Azure DocumentDb in particular -- can be a good solution for a growing application. Thank you all for reading this article...