An introduction to MongoDB

MongoDB is a cross-platform, document-oriented database. It is a NoSQL database that eschews the traditional table-based relational structure in favor of JSON-like documents with dynamic schemas, stored in a binary format called BSON (Binary JSON). This makes integrating data in certain types of applications easier and faster. MongoDB is free and open-source software, released under a combination of the GNU Affero General Public License and a commercial license. MongoDB is developed and supported by MongoDB Inc., formerly known as 10gen.

MongoDB's developers at 10gen define it as a scalable, open-source, high-performance, document-oriented database. In this post we will try to elaborate on each part of that definition.

What is a Document Oriented Database?
The first thing to know about document-oriented databases is that they are classified as NoSQL databases. NoSQL is one of the major types of database systems used in computing, the others being Relational Database Management Systems (RDBMS) and Online Analytical Processing (OLAP) systems.

Most database systems fall into one of these three categories. While RDBMS and OLAP are fairly mature, NoSQL is a much more recent development.

A little history:
Back in the day, when databases were nothing more than flat files, the major problem was the lack of standard data management techniques. This meant that enormous effort had to be put into writing code just to access or update the data in those files. This all changed with the introduction of the relational model for databases around 1970, after which we started querying the database instead of hand-coding every access.

RDBMS made it easy to maintain and update databases. No hard coding was required, so a lot of time and effort was saved. The problems started when organizations began gathering enormous amounts of data. Because relational databases are inherently hard to scale horizontally, they became painfully slow under that load, and this applies to all major RDBMS, be it Oracle or others. The limitation became more evident with the advent of Big Data and its growing importance.

The birth of NoSQL:

This inability of RDBMS to handle Big Data led to NoSQL databases, which improved on the main shortcoming of relational databases: horizontal scaling. Horizontal scalability means adding more power by adding more nodes. In simpler terms, if we add more computers to the network, the network grows more powerful. This was not the case with RDBMS. When the need for Big Data arose, more powerful resources were needed at a minimal cost. RDBMS thus gave way to NoSQL, where we can simply add more commodity computers as and when required to boost capacity.

NoSQL databases are further classified into three types. We outline the differences briefly here:

1. Key-Value Store: In this type of NoSQL database, information is stored as pairs of a key and its corresponding value. Examples include Memcached and Redis.

2. Tabular: These databases store data in a tabular structure of rows and columns, which makes them look superficially similar to an RDBMS. Major examples are BigTable from Google and HBase from Apache.

3. Document-Oriented Databases: Data is stored as documents instead of tables or key-value pairs. Examples include MongoDB and CouchDB.

Major differences between RDBMS and NoSQL Databases:

RDBMS support joins, whereas most NoSQL databases do not. Similarly, NoSQL databases generally lack support for complex, multi-step transactions: we cannot run a series of conditional operations and then roll all of them back if one of them fails. Another major feature missing from NoSQL databases is support for constraints, which have to be enforced at the application level instead of the database level.

These may seem like very important things to be missing, so one might ask: why should I choose NoSQL over RDBMS when it lacks such vital features? The answer is that NoSQL databases make up for them with additional features of their own, so that in practice we lose little functionality.

NoSQL databases provide much of the same querying capability as traditional databases. This is not a distinguishing feature, but I like to point it out in case the name NoSQL misleads. The improvements begin with the performance of these queries compared to RDBMS, combined with horizontal scalability. This allows NoSQL databases to handle much more data at a very fast pace.

When we compare the two, we find that although RDBMS offer much more functionality than NoSQL, they tend to lag behind NoSQL in performance on very large, distributed data sets. So there is certainly a trade-off when choosing which one to use.

How is querying different in MongoDB:

Doing a quick comparison, we find that a table in an RDBMS is comparable to a collection in MongoDB, and a row of a table is comparable to a document in a collection. A big advantage of document-oriented databases over RDBMS is that a single field can hold multiple values. One-to-many relationships are therefore very easy to implement in document-oriented databases such as MongoDB, whereas an RDBMS generally needs a separate table and a join. Another great feature of MongoDB is the ability to nest structures. This lets us use an embedded data model, where objects are embedded inside other objects. Say, for example, we have an entry for an employee named Jim who has two addresses on record. Each address is an independent object in itself, and MongoDB lets us store each one as its own embedded object with its own fields, such as phone number and street.
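To make the Jim example concrete, here is a minimal sketch of what such a document could look like, written as a plain JavaScript object so it runs without a database. Every field name and value other than _id is an assumption made up for illustration.

```javascript
// Hypothetical employee document; all field names except _id are illustrative.
const jim = {
  _id: 123,
  name: "Jim",
  // One-to-many: a single field holds an array of embedded address objects.
  addresses: [
    { type: "home", street: "12 Oak Street", city: "Springfield", phone: "555-0100" },
    { type: "office", street: "400 Main Street", city: "Springfield", phone: "555-0199" }
  ]
};

// Each embedded address is an ordinary object and can be read on its own.
const officePhone = jim.addresses.find((a) => a.type === "office").phone;
console.log(officePhone); // 555-0199
console.log(jim.addresses.length); // 2
```

In the mongo shell, an object like this could be stored as-is with db.employee.insertOne(jim) (or db.employee.insert(jim) in older shells); no separate address table is needed.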

Just as an RDBMS offers SQL to query and modify data, MongoDB and other NoSQL databases offer their own query languages. Here is an example:

db.employee.find({_id:123});

This query simply looks in the collection named “employee” within the current database for the document whose “_id” is 123.

Calling find() with no filter matches every document in the collection, much like SELECT * in an RDBMS, and the results can then be sorted. For example:

db.employee.find().sort({name:1})

This query finds every document in the collection “employee” and sorts the results by the “name” field in ascending order.

Main features of MongoDB:

Instead of taking a business subject and breaking it up into multiple relational structures, MongoDB can store the business subject in the minimal number of documents. For example, instead of storing title and author information in two distinct relational structures, title, author, and other title-related information can all be stored in a single document called Book, which is much more intuitive and usually easier to work with. Here we list some prominent features that set MongoDB apart from other databases.

1. Ad hoc queries:
MongoDB supports search by field, range queries and regular expression searches. Queries can return specific fields of documents and can also include user-defined JavaScript functions.
e.g. db.employee.find().sort({name:1})
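To show what field, range and regular expression filters mean, here is a small in-memory sketch in plain JavaScript. It is only an illustration of the matching rules for a handful of operators, not how MongoDB evaluates queries internally; the sample data and the matches and find helpers are assumptions. In the shell, the same filters would be passed straight to find, for example db.employee.find({ age: { $gte: 30, $lt: 40 } }).

```javascript
// In-memory illustration only; sample documents are made up.
const employees = [
  { _id: 1, name: "Jim", age: 34 },
  { _id: 2, name: "Jane", age: 41 },
  { _id: 3, name: "Bob", age: 27 }
];

// Operators supported by this sketch (a small subset of MongoDB's).
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $regex: (value, arg) => new RegExp(arg).test(String(value))
};

// Returns true when every condition in the filter holds for the document.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    if (condition === null || typeof condition !== "object") {
      return value === condition; // search by field: plain equality
    }
    // Operator object: every listed operator must be satisfied.
    return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
  });
}

const find = (docs, filter) => docs.filter((doc) => matches(doc, filter));

const byField = find(employees, { name: "Jane" }).map((d) => d._id);
const byRange = find(employees, { age: { $gte: 30, $lt: 40 } }).map((d) => d._id);
const byRegex = find(employees, { name: { $regex: "^J" } }).map((d) => d._id);
console.log(byField, byRange, byRegex); // ids 2; 1; 1 and 2
```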

2. Indexing:
Any field in a MongoDB document can be indexed (indices are conceptually similar to those in RDBMSes). Secondary indices are also available.

3. Master-Slave Replication:
MongoDB provides high availability with replica sets. A replica set consists of two or more copies of the data. Each replica set member may act in the role of primary or secondary replica at any time. The primary (or Master) replica performs all writes and reads by default. Secondary (or Slave) replicas maintain a copy of the data on the primary using built-in replication. When a primary replica fails, the replica set automatically conducts an election process to determine which secondary should become the primary. Secondaries can also perform read operations, but the data is eventually consistent by default.

4. Data Duplication:
MongoDB keeps duplicate copies of the data across a distributed network of machines, which is a necessity when dealing with Big Data. This helps keep the system up and running in case of a hardware failure.

5. Load balancing:
MongoDB scales horizontally using sharding. The user chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure. Automatic configuration is easy to deploy, and new machines can be added to a running database.
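The idea of splitting data into ranges based on a shard key can be sketched in a few lines of plain JavaScript. The shard names, the range boundaries and the employeeId key below are all hypothetical, and the routeToShard helper is an illustration rather than anything MongoDB exposes; in a real cluster the routing is handled for you.

```javascript
// Hypothetical shards, each owning a half-open range [min, max) of the shard key.
const shards = [
  { name: "shardA", min: 0, max: 1000 },
  { name: "shardB", min: 1000, max: 2000 },
  { name: "shardC", min: 2000, max: Infinity }
];

// Picks the shard whose range contains the document's shard key value.
function routeToShard(doc, shardKey) {
  const value = doc[shardKey];
  const shard = shards.find((s) => value >= s.min && value < s.max);
  if (!shard) {
    throw new Error(`No shard covers ${shardKey}=${value}`);
  }
  return shard.name;
}

const placements = [
  { employeeId: 42 },
  { employeeId: 1000 },
  { employeeId: 2500 }
].map((doc) => routeToShard(doc, "employeeId"));
console.log(placements.join(",")); // shardA,shardB,shardC
```

Because each document's location depends only on its shard key, a poorly chosen key (for example one where most values fall into a single range) would send most of the load to one shard.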

6. File storage:
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files. This function is called GridFS. MongoDB exposes functions for file manipulation and content to developers. GridFS is used, for example, in plugins for NGINX and lighttpd. Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. In a multi-machine MongoDB system, files can be distributed and copied multiple times between machines transparently, thus effectively creating a load-balanced and fault-tolerant system.
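The chunking described above can be sketched as follows. This is a simplified illustration that runs under Node.js (it relies on Buffer), not GridFS itself: the 4-byte chunk size is deliberately tiny so the result is easy to inspect, the toChunks and fromChunks helpers and the "file-1" id are made up, and only the chunk fields files_id, n and data mirror what GridFS chunk documents contain.

```javascript
// Tiny chunk size for readability; real GridFS chunks are far larger.
const CHUNK_SIZE_BYTES = 4;

// Splits a file's bytes into chunk documents that all point back to one file id.
function toChunks(filesId, bytes, chunkSize = CHUNK_SIZE_BYTES) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    chunks.push({ files_id: filesId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Reassembles the original bytes by concatenating chunks in sequence order.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const fileBytes = Buffer.from("hello GridFS");
const chunks = toChunks("file-1", fileBytes);
console.log(chunks.length); // 3
console.log(fromChunks(chunks).toString()); // hello GridFS
```

Since every chunk is an ordinary document, the chunks can be replicated and spread across machines like any other data, which is where the fault tolerance comes from.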

7. Server-side JavaScript execution:

JavaScript can be used in queries and aggregation functions (such as MapReduce), and can also be sent directly to the database to be executed.

8. Capped collections:
MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion order and, once the specified size has been reached, behaves like a circular queue.
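The circular-queue behaviour can be illustrated with a small in-memory stand-in. The CappedList class below is hypothetical and caps by number of documents, whereas MongoDB caps primarily by size in bytes. In the shell, a real capped collection would be created with something like db.createCollection("log", { capped: true, size: 100000 }), where the name and size are placeholder values.

```javascript
// In-memory stand-in for a capped collection (capped by document count here).
class CappedList {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // Appends in insertion order; once full, the oldest document is dropped.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Returns the documents in insertion order.
  find() {
    return [...this.docs];
  }
}

const log = new CappedList(3);
["start", "login", "query", "logout"].forEach((event) => log.insert({ event }));
const events = log.find().map((d) => d.event);
console.log(events.join(",")); // login,query,logout
```

This makes capped collections a natural fit for data such as recent log entries, where only the newest records matter.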

The success of MongoDB is reflected in the fact that, in such a short lifespan, it has been adopted by so many big brands and organizations. This is a testament to the performance and stability offered by MongoDB and by NoSQL databases overall.

Some famous organizations that use MongoDB:
MTV Networks, SAP AG, Craigslist, Foursquare, SourceForge, etc.

Availability:

MongoDB can be deployed on a wide range of operating systems, including Windows, Linux and OS X.

So, who should use MongoDB?
It is essential to mention here that although Big Data is the main reason MongoDB exists in the first place, it is equally capable of replacing an RDBMS in traditional applications. It is not, however, suitable for databases involving complex transactions.

Apart from its intended usage for Big Data, MongoDB can easily be deployed for common database scenarios such as user data storage, location data, form submission data, content management or logging data. With drivers available for most popular languages, including Java, JavaScript, Python, Ruby, PHP, C# and more, MongoDB is a very viable choice as a fast-performing database for most application needs.

December 29th, 2014|Technical|