Introduction
MongoDB is a free and open-source, document-oriented database. It is classified as a NoSQL database because it does not rely on a traditional table-based relational database structure; instead it uses JSON-like documents with dynamic schemas. Unlike relational databases, MongoDB does not require a predefined schema before you add data, and the schema can be altered at any time.
In the first part of this tutorial we will use the MongoDB repository to install the latest version of MongoDB. In the second part we will enable authentication to secure it on the local system. In the final step, we’ll show how to allow remote connections more securely if they are needed.
MongoDB Overview
• From “humongous”
• Document-oriented database, not relational
• Schema free
• Manages hierarchical collection of BSON (bee-son) documents
• Written in C++
• Has an official driver for C# with support from 10gen
• Scalable with high-performance (scales horizontally)
• Designed to address today’s workloads
• BASE rather than ACID compliant
• Replication
• Part of the “NoSQL” class of DBMS
• Website with list of all features – http://www.mongodb.org/
What is NoSQL?
• Class of DBMS that differ from relational model
• Do not expose the standard SQL interface
• May not require fixed table schemas, usually avoid join operations, and typically scale horizontally.
• Term coined by Carlo Strozzi in 1998 to name his lightweight,
open-source relational database that did not expose the
standard SQL interface.
• However, as Strozzi has said, the current NoSQL movement “departs from the relational model altogether; it should therefore have been called more appropriately ‘NoREL’, or something to that effect.”
• http://nosql-database.org/ for examples.
Why are these interesting?
New requirements are arising in environments where we have higher volumes of data with high operation rates, agile development and cloud computing. This reflects the growing interactivity of applications which are becoming more networked and social, driving more requests to the database where high-performance DBMS such as MongoDB become favorable. Not requiring a schema or migration scripts before you add data makes it fit well with agile development approaches. Each time you complete new features, the schema of your database often needs to change. If the database is large, this can mean a slow process.
ACID
Relational databases make the ACID promise:
– Atomicity – a transaction is all or nothing
– Consistency – only valid data is written to the database
– Isolation – pretend all transactions are happening serially and the data is correct
– Durability – what you write is what you get
• The problem is that ACID can give you more than you need, and it trips you up when you try to scale a system across multiple nodes.
• Down time is unacceptable so your system needs to be reliable. Reliability
requires multiple nodes to handle machine failures.
• To make scalable systems that can handle lots and lots of reads and writes
you need many more nodes.
• Once you try to scale ACID across many machines you hit problems with
network failures and delays. The algorithms don’t work in a distributed
environment at any acceptable speed.
CAP
If you can’t have all of the ACID guarantees it turns out you can have two of the following
three characteristics:
– Consistency – your data is correct all the time. What you write is what you read.
– Availability – you can read and write your data all the time
– Partition Tolerance – if one or more nodes fail, the system still works and becomes consistent when the system comes back on-line.
• In distributed systems, network partitioning is inevitable and must be tolerated, so essentially CAP means that we cannot have both consistency and 100% availability.
“If the network is broken, your database won’t work.”
• However, we do get to pick the definition of “won’t work”. It can either mean down
(unavailable) or inconsistent (stale data).
BASE
• The types of large systems based on CAP aren’t ACID, they are BASE (ha ha)
– Basically Available – system seems to work all the time
– Soft State – it doesn’t have to be consistent all the time
– Eventually Consistent – becomes consistent at some later time
• Many companies building big applications build them on CAP and BASE: Google, Yahoo,
Facebook, Amazon, eBay, etc.
• Amazon popularized the concept of “Eventual Consistency”. Their definition is:
the storage system guarantees that if no new updates are made to the object, eventually all
accesses will return the last updated value.
• A few examples of eventually consistent systems:
– Asynchronous master/slave replication on an RDBMS or MongoDB
– DNS
– memcached in front of mysql, caching reads
For more depth and different configuration examples: http://blog.mongodb.org/post/498145601/on-distributed-consistency-part-2-some-eventual
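Amazon’s definition above can be made concrete with a toy model. The following Python sketch (an illustration only, not real MongoDB replication code) queues writes on a primary in an oplog and applies them to secondaries later, so secondary reads can be stale until the system quiesces — at which point every replica returns the last updated value:

```python
from collections import deque

class Replica:
    """A single node holding a key-value copy of the data."""
    def __init__(self):
        self.data = {}

class ReplicaSet:
    """Toy model of asynchronous primary/secondary replication.

    Writes are acknowledged by the primary and queued in an oplog;
    secondaries apply the oplog later, so reads from them may be
    stale until the system quiesces (eventual consistency).
    """
    def __init__(self, num_secondaries=2):
        self.primary = Replica()
        self.secondaries = [Replica() for _ in range(num_secondaries)]
        self.oplog = deque()

    def write(self, key, value):
        self.primary.data[key] = value   # acknowledged immediately
        self.oplog.append((key, value))  # replicated asynchronously

    def replicate_once(self):
        """Apply one queued operation on every secondary."""
        if self.oplog:
            key, value = self.oplog.popleft()
            for s in self.secondaries:
                s.data[key] = value

    def quiesce(self):
        """With no new writes arriving, drain the oplog completely."""
        while self.oplog:
            self.replicate_once()

rs = ReplicaSet()
rs.write("x", 1)
rs.write("x", 2)
stale = rs.secondaries[0].data.get("x")  # stale read: replication has not run yet
rs.quiesce()
fresh = rs.secondaries[0].data.get("x")  # after quiescing, every replica sees 2
print(stale, fresh)
```

Once no new updates are made and the oplog drains, all accesses return the last updated value — exactly the eventual-consistency guarantee stated above.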
BSON
• Stands for Binary JSON
• Is a binary encoded serialisation of JSON-like documents.
• Like JSON, BSON supports the embedding of documents and
arrays within other documents and arrays. BSON also contains
extensions that allow representation of data types that are
not part of the JSON spec. For example, BSON has a Date type
and a BinData type.
• The driver performs translation from the language’s “object”
(ordered associative array) data representation to BSON, and
back:
• C#: new BsonDocument("x", 1) Javascript: {x: 1}
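To show what that binary encoding actually looks like, here is a minimal Python sketch of a BSON encoder restricted to int32 values (the function name is ours; a real driver handles the full type system). Per the BSON layout, a document is an int32 total length, a sequence of elements, and a 0x00 terminator; an int32 element is type byte 0x10, a NUL-terminated key, and a little-endian int32:

```python
import struct

def encode_int32_doc(pairs):
    """Minimal BSON encoder for documents whose values are all int32.

    Layout: <int32 total length> <elements> <0x00 terminator>;
    an int32 element is: 0x10 <cstring key> <int32 value, little-endian>.
    Sketch only - the real spec covers many more types (Date, BinData, ...).
    """
    body = b""
    for key, value in pairs:
        body += b"\x10" + key.encode("utf-8") + b"\x00"  # type byte + key
        body += struct.pack("<i", value)                 # little-endian int32
    total = 4 + len(body) + 1   # length field + elements + terminator
    return struct.pack("<i", total) + body + b"\x00"

doc = encode_int32_doc([("x", 1)])   # BSON for {x: 1}
print(doc.hex())  # 0c0000001078000100000000
```

The 12-byte result matches what a real driver produces for {x: 1}: length 0x0c, type 0x10, key "x" (0x78), value 1, terminator.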
A Many to Many Association
• In a relational DBMS use an intersection table and joins
• In MongoDB use either embedding or linking
BsonDocument user = new BsonDocument {
    { "name", "John" },
    { "roles", new BsonArray { "Admin", "User", "Engineer" } }
};
users.Insert(user);
// To get all Engineers
users.Find(Query.EQ("roles", "Engineer"));
• Embedding is the nesting of objects and arrays inside
a BSON document. Links are references between
documents.
• There are no joins in MongoDB – distributed joins would be difficult on a 1,000 server cluster. Embedding is a bit like “prejoined” data. Operations within a document are easy for the server to handle; these operations can be fairly rich. Links in contrast must be processed client-side by the application; the application does this by issuing a follow-up query.
• Generally, for “contains” relationships between entities, embedding should be chosen. Use linking when not using linking would result in duplication of data.
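The embedding-versus-linking trade-off can be sketched with plain in-memory data (a Python illustration with made-up collections, not driver code). With embedding, one query answers “all Engineers”; with linking, the application must issue a follow-up lookup to resolve the references, just as a MongoDB client would:

```python
# Toy in-memory "collections" for users and roles.

# Embedded: roles nested inside each user document.
users_embedded = [
    {"_id": 1, "name": "John", "roles": ["Admin", "User", "Engineer"]},
    {"_id": 2, "name": "Mary", "roles": ["User"]},
]
# One pass answers "all Engineers" directly ("prejoined" data).
engineers = [u["name"] for u in users_embedded if "Engineer" in u["roles"]]

# Linked: user documents hold references into a separate roles collection,
# and the application performs the "join" itself with a follow-up lookup.
roles = {10: "Engineer", 11: "User", 12: "Admin"}
users_linked = [
    {"_id": 1, "name": "John", "role_ids": [12, 11, 10]},
    {"_id": 2, "name": "Mary", "role_ids": [11]},
]
linked_engineers = [
    u["name"] for u in users_linked
    if any(roles[rid] == "Engineer" for rid in u["role_ids"])  # follow-up query
]
print(engineers, linked_engineers)
```

Both forms yield the same answer; the difference is where the work happens — inside one document on the server, or client-side across two queries.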
Replication through Replica Sets
• Replica sets are a form of asynchronous master/slave replication, adding automatic failover and automatic recovery of member nodes.
• A replica set consists of two or more nodes that are copies of each other. (i.e.: replicas)
• The replica set automatically elects a primary (master). No one member is intrinsically primary; that is, this is a shared-nothing design.
• Drivers (and mongos) can automatically detect when a replica set primary changes and will begin sending writes to the new primary. (Also works with sharding)
• Replica sets have several common uses (detail in next slide):
– Data Redundancy
– Automated Failover / High Availability
– Distributing read load
– Simplify maintenance (compared to “normal” master-slave)
– Disaster recovery
• http://www.mongodb.org/display/DOCS/Replica+Sets
Why Replica Sets
• Data Redundancy
– Replica sets provide an automated method for storing multiple copies of your data.
– Supported drivers allow for the control of “write concerns”. This allows for writes to be confirmed by multiple nodes
before returning a success message to the client.
• Automated Failover
– Replica sets will coordinate to have a single primary in a given set.
– Supported drivers will recognize the change of a primary within a replica set.
• In most cases, this means that the failure of a primary can be handled by the client without any configuration changes.
– A correctly configured replica set basically provides a “hot backup”. Recovering from backups is typically very time
consuming and can result in data loss. Having an active replica set is generally much faster than working with backups.
• Read Scaling
– By default, the primary node of a replica set is accessed for all reads and writes.
– Most drivers provide a slaveOkay method for identifying that a specific operation can be run on a secondary node.
When using slaveOkay, a system can share the read load amongst several nodes.
• Maintenance
– When performing tasks such as upgrades, backups and compaction, it is typically required to remove a node from
service.
– Replica sets allow for these maintenance tasks to be performed while operating a production system. As long as the production system can withstand the removal of a single node, it is possible to perform a “rolling” upgrade across the set.
• Disaster Recovery
– Replica sets allow for a “delayed secondary” node.
– This node can provide a window for recovering from disastrous events such as:
• bad deployments
• dropped collections and databases
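The “write concern” idea from the Data Redundancy bullet can be modeled in a few lines of Python (a toy sketch with invented class names, not the real driver API): a write only returns success once at least `w` nodes have acknowledged it, so a client can trade latency for stronger durability guarantees:

```python
class Node:
    """One member of the replica set, holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class ReplicaSetClient:
    """Toy model of a write concern: a write succeeds only once it has
    been confirmed by at least `w` nodes (primary included)."""
    def __init__(self, nodes, w=2):
        self.nodes = nodes
        self.w = w

    def write(self, key, value, failed=()):
        acks = 0
        for i, node in enumerate(self.nodes):
            if i in failed:        # node down: no acknowledgement
                continue
            node.data[key] = value
            acks += 1
        return acks >= self.w      # success message only with enough acks

nodes = [Node(), Node(), Node()]
client = ReplicaSetClient(nodes, w=2)
ok = client.write("x", 1)                        # all 3 nodes ack -> success
degraded = client.write("y", 2, failed={1, 2})   # only 1 ack, w=2 -> failure
print(ok, degraded)
```

With w=1 the second write would have been reported as a success even though only one node held it — which is exactly the durability risk write concerns exist to control.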
Horizontal Scalability
• Rather than buying bigger servers, MongoDB scales by adding additional servers (horizontal scaling) – capacity grows by adding more machines to the cluster, rather than by packing faster processors and more RAM into a single server (vertical scaling).
• MongoDB supports high transaction rate applications because as more servers are added, transactions are distributed across the larger cluster of nodes, increasing database capacity roughly linearly. With this model, additional capacity can be added without hitting the hard limits of a single machine.
• MongoDB achieves this through auto-sharding
Sharding
• For applications that outgrow the resources of a single database server, MongoDB can be converted into a sharded cluster, automatically managing failover and balancing of nodes, with few or no changes to the original application code.
• Each shard consists of one or more servers and stores data using mongod processes (mongod being the core MongoDB database process). In a production situation, each shard will consist of multiple replicated servers to ensure availability and automated failover. The set of servers/mongod processes within the shard comprises a replica set.
• Sharding offers:
– Automatic balancing for changes in load and data distribution
– Easy addition of new machines
– Scaling out to one thousand nodes
– No single points of failure
– Automatic failover
• http://www.mongodb.org/display/DOCS/Sharding+Introduction
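The core routing idea behind sharding can be sketched in Python (a toy model of range-based partitioning on a shard key; the class and its behavior are our illustration, not MongoDB internals). Each shard owns a contiguous range of shard-key values, and every read and write is routed by looking up which range the key falls into:

```python
import bisect

class ShardedCollection:
    """Toy range-based sharding on a shard key: each shard owns a
    contiguous key range, and reads/writes are routed by key."""
    def __init__(self, split_points):
        # split_points ["g", "p"] -> 3 shards: (-inf,"g"], ("g","p"], ("p",+inf)
        self.split_points = split_points
        self.shards = [{} for _ in range(len(split_points) + 1)]

    def _route(self, key):
        # Find which shard's range contains this shard-key value.
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, doc):
        self.shards[self._route(key)][key] = doc

    def find(self, key):
        return self.shards[self._route(key)].get(key)

coll = ShardedCollection(["g", "p"])
for name in ["alice", "harry", "zoe"]:
    coll.insert(name, {"name": name})
print([len(s) for s in coll.shards])  # each document landed on its own shard
print(coll.find("harry"))
```

In the real system the split points live on the config servers and are adjusted automatically (chunks split and migrate as data grows), which is what makes the balancing and easy addition of machines listed above possible.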
Large MongoDB Deployment example
1. One or more shards, each shard holds a portion of the total data (managed automatically). Reads and writes are automatically routed to the appropriate shard(s). Each shard is backed by a replica set – which just holds the data for that shard. A replica set is one or more servers, each holding copies of the same data. At any given time one is primary and the rest are secondaries. If the primary goes down one of the secondaries takes over automatically as primary. All writes and consistent reads go to the primary, and all eventually consistent reads are distributed amongst all the secondaries.
2. Multiple config servers, each one holds a copy of the meta data indicating which data lives on which shard.
3. One or more routers, each one acts as a server for one or more clients. Clients issue queries/updates to a router and the router routes them to the appropriate shard
while consulting the config servers.
4. One or more clients, each one is (part of) the user’s application and issues commands to a router via the mongo client library (driver) for its language. mongod is the server program (data or config). mongos is the router program.