Before we get started, let’s review the components of the setup we’ll be creating:
- Config Server - This stores metadata and configuration settings for the rest of the cluster.
- Query Router - The mongos process acts as an interface between the client application and the cluster shards. Since data is distributed among multiple servers, queries need to be routed to the shard where a given piece of information is stored. The query router is run on the application server.
- Shard - A shard is simply a database server that holds a portion of your data. Items in the database are divided among shards either by range or by hashing.
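To make the difference between range and hash partitioning concrete, here is a toy sketch in JavaScript (the language of the mongo shell). The function names and the hash itself are invented for illustration; MongoDB uses its own hash function and chunk ranges internally, so this only shows the general idea.

```javascript
// Toy illustration only: not MongoDB's real partitioning logic.

// Range partitioning: boundaries are sorted, exclusive upper bounds.
// Keys above the last boundary go to the final shard.
function rangeShard(key, boundaries) {
  for (var i = 0; i < boundaries.length; i++) {
    if (key < boundaries[i]) {
      return i;
    }
  }
  return boundaries.length;
}

// A simple string hash (hypothetical), kept as an unsigned 32-bit value.
function toyHash(key) {
  var s = String(key);
  var h = 0;
  for (var i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Hash partitioning: the hash of the key decides the shard.
function hashShard(key, shardCount) {
  return toyHash(key) % shardCount;
}

// Neighbouring keys stay together under range partitioning...
console.log([10, 11, 12].map(function (k) { return rangeShard(k, [100, 200]); }));
// ...but are usually scattered under hash partitioning.
console.log([10, 11, 12].map(function (k) { return hashShard(k, 3); }));
```

Range partitioning keeps neighbouring keys on the same shard, which helps range queries; hashing spreads them out, which helps distribute writes evenly.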
Rather than using a single config server, we’ll use a replica set of three config servers to protect the metadata. This enables primary-secondary replication among the three servers and automates failover: if the primary config server goes down, a new primary is elected and requests continue to be processed.
The steps below should be performed on each config server individually, unless otherwise specified.
Config server 1:
mongod --replSet mysetConfig --logpath "D:\ReplicaConfig\config1\Log\cfg-a.log" --dbpath D:\ReplicaConfig\config1\Data --port 57017 --configsvr --smallfiles
Config server 2:
mongod --replSet mysetConfig --logpath "D:\ReplicaConfig\config2\Log\cfg-a.log" --dbpath D:\ReplicaConfig\config2\Data --port 57018 --configsvr --smallfiles
Config server 3:
mongod --replSet mysetConfig --logpath "D:\ReplicaConfig\config3\Log\cfg-a.log" --dbpath D:\ReplicaConfig\config3\Data --port 57019 --configsvr --smallfiles
Initialize Replica Sets
We will use mongod to run the replica sets on the local machine. Each replica set has three nodes: one primary and two secondaries.
To enable sharding, start each member with --shardsvr, as below.
Replica set 1 with sharding:
mongod --dbpath D:\S1Replica\rs1\data --bind_ip 127.0.0.1 --port 27017 --logpath D:\S1Replica\rs1\log\mongod.log --replSet myset --shardsvr
mongod --dbpath D:\S1Replica\rs2\data --bind_ip 127.0.0.1 --port 27018 --logpath D:\S1Replica\rs2\log\mongod.log --replSet myset --shardsvr
mongod --dbpath D:\S1Replica\rs3\data --bind_ip 127.0.0.1 --port 27019 --logpath D:\S1Replica\rs3\log\mongod.log --replSet myset --shardsvr
Initiate the members of this replica set with a config file, as described in my previous post.
Replica set 2 with sharding:
mongod --dbpath D:\S2Replica\rs1\data --bind_ip 127.0.0.1 --port 27020 --logpath D:\S2Replica\rs1\log\mongod.log --replSet myset1 --shardsvr
mongod --dbpath D:\S2Replica\rs2\data --bind_ip 127.0.0.1 --port 27021 --logpath D:\S2Replica\rs2\log\mongod.log --replSet myset1 --shardsvr
mongod --dbpath D:\S2Replica\rs3\data --bind_ip 127.0.0.1 --port 27022 --logpath D:\S2Replica\rs3\log\mongod.log --replSet myset1 --shardsvr
Replica set 3 with sharding:
mongod --dbpath D:\S3Replica\rs1\data --bind_ip 127.0.0.1 --port 27023 --logpath D:\S3Replica\rs1\log\mongod.log --replSet myset2 --shardsvr
mongod --dbpath D:\S3Replica\rs2\data --bind_ip 127.0.0.1 --port 27024 --logpath D:\S3Replica\rs2\log\mongod.log --replSet myset2 --shardsvr
mongod --dbpath D:\S3Replica\rs3\data --bind_ip 127.0.0.1 --port 27025 --logpath D:\S3Replica\rs3\log\mongod.log --replSet myset2 --shardsvr
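Every replica set started so far, the config server set included, has to be initiated once before the query router can use it. The following is a minimal sketch of the configuration documents that rs.initiate() expects, built from the set names and ports used above. The helper name buildReplSetConfig and the replicaSets table are my own for illustration, and the host names are assumptions based on the commands in this post.

```javascript
// Replica sets started above: name, host, ports, and whether the set
// stores cluster metadata (config servers). Hosts are assumed values.
var replicaSets = [
  { name: "mysetConfig", host: "localhost", ports: [57017, 57018, 57019], configsvr: true },
  { name: "myset", host: "127.0.0.1", ports: [27017, 27018, 27019] },
  { name: "myset1", host: "127.0.0.1", ports: [27020, 27021, 27022] },
  { name: "myset2", host: "127.0.0.1", ports: [27023, 27024, 27025] }
];

// Build the document passed to rs.initiate() for one set.
function buildReplSetConfig(set) {
  var config = {
    _id: set.name,
    members: set.ports.map(function (port, i) {
      return { _id: i, host: set.host + ":" + port };
    })
  };
  if (set.configsvr) {
    config.configsvr = true; // marks a config server replica set
  }
  return config;
}

var configs = replicaSets.map(buildReplSetConfig);

// Outside the mongo shell (for example in Node), just print the documents.
if (typeof rs === "undefined") {
  console.log(JSON.stringify(configs, null, 2));
}
```

In the mongo shell, each document would be passed to rs.initiate() once, while connected to a member of the matching set, for example rs.initiate(configs[1]) after connecting with mongo --port 27017.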
Configure Query Router
Since we’re only configuring one query router, we’ll only need to do this once. However, it’s also possible to run several query routers against the same config servers.
mongos --port 30020 --logpath "D:\ReplicaMongos\rs1\Log\mongos-1.log" --configdb mysetConfig/localhost:57017,localhost:57018,localhost:57019
Add Shards to the Cluster
Now that the query router is able to communicate with the config servers, we must add the shards so that the query router knows which servers will host the distributed data and where any given piece of data is located.
Use the mongo shell to connect to the query router on the port it was started with:
mongo localhost:30020
From the mongos interface, add each shard individually:
sh.addShard("myset/127.0.0.1:27017,127.0.0.1:27018,127.0.0.1:27019")
sh.addShard("myset1/127.0.0.1:27020,127.0.0.1:27021,127.0.0.1:27022")
sh.addShard("myset2/127.0.0.1:27023,127.0.0.1:27024,127.0.0.1:27025")
- Connect to the mongo shell on your query router if you’re not already there:
mongo localhost:30020
- Switch to the exampleDB database we created previously:
use exampleDB
- Create a new collection called exampleCollection and hash its _id key. The _id key is already created by default as a basic index for new documents:
db.exampleCollection.ensureIndex( { _id : "hashed" } )
- Finally, shard the collection:
sh.shardCollection( "exampleDB.exampleCollection", { "_id" : "hashed" } )
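The steps above can also be collected into one small script. This is only a sketch under a few assumptions: it is run from a shell connected to mongos, it uses createIndex (the newer name for ensureIndex), and it calls sh.enableSharding() first, which MongoDB versions before 6.0 require before a collection in that database can be sharded. If the database was already enabled for sharding when it was created, that call can be dropped.

```javascript
// Names taken from the steps above.
var dbName = "exampleDB";
var collName = "exampleCollection";
var namespace = dbName + "." + collName;
var shardKey = { _id: "hashed" };

// The sh and db helpers only exist inside the mongo shell, so the calls
// are guarded; elsewhere the script just prints what it would do.
if (typeof sh !== "undefined" && typeof db !== "undefined") {
  sh.enableSharding(dbName); // assumption: not done earlier
  db.getSiblingDB(dbName).getCollection(collName).createIndex(shardKey);
  sh.shardCollection(namespace, shardKey);
} else {
  console.log("Would shard " + namespace + " on " + JSON.stringify(shardKey));
}
```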
Add documents to the collection, then check how they are distributed across the shards:
db.exampleCollection.getShardDistribution()
This will output information similar to the following:
Shard shard0000 at mongo-shard-1:27017
data : 8KiB docs : 265 chunks : 2
estimated data per chunk : 4KiB
estimated docs per chunk : 132
Shard shard0001 at mongo-shard-2:27017
data : 7KiB docs : 235 chunks : 2
estimated data per chunk : 3KiB
estimated docs per chunk : 117
Totals
data : 16KiB docs : 500 chunks : 4
Shard shard0000 contains 53% data, 53% docs in cluster, avg obj size on shard : 33B
Shard shard0001 contains 47% data, 47% docs in cluster, avg obj size on shard : 33B
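To produce a distribution report like this on your own cluster, the collection needs some data first. Here is a minimal sketch that builds 500 small test documents and, when run inside a shell connected to mongos, inserts them and prints the distribution. The helper buildTestDocs and the field name x are placeholders of mine, not part of the original setup.

```javascript
// Build `count` small test documents; the field "x" is an arbitrary placeholder.
function buildTestDocs(count) {
  var docs = [];
  for (var i = 0; i < count; i++) {
    docs.push({ x: i });
  }
  return docs;
}

var docs = buildTestDocs(500);

// Inside the mongo shell: insert the documents, then inspect how the
// hashed _id values spread them across the shards.
if (typeof db !== "undefined") {
  var coll = db.getSiblingDB("exampleDB").getCollection("exampleCollection");
  coll.insertMany(docs);
  coll.getShardDistribution();
} else {
  console.log("Built " + docs.length + " test documents");
}
```

The exact split will differ from run to run and from cluster to cluster, but with a hashed _id key each shard should end up with a roughly similar share of the documents.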