Wednesday, July 17, 2019

MongoDB: Step-by-step sharding with replica sets on a local machine

Before we get started, let’s review the components of the setup we’ll be creating:

  • Config Server - This stores metadata and configuration settings for the rest of the cluster.
  • Query Router - The mongos acts as an interface between the client application and the cluster shards. Since data is distributed among multiple servers, queries need to be routed to the shard where a given piece of information is stored. The query router is run on the application server.
  • Shard - A shard is simply a database server that holds a portion of your data. Items in the database are divided among shards either by range or hashing.
"A sharded MongoDB cluster"

Initialize Config Servers

Rather than using a single config server, we’ll use a replica set to ensure the integrity of the metadata. This enables primary-secondary replication among the three servers and automates failover, so that if your primary config server goes down, a new one will be elected and requests will continue to be processed.
The steps below should be performed on each config server individually, unless otherwise specified.
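
Each mongod expects its data and log directories to already exist, so create them first from a Windows command prompt. This is just a sketch assuming the paths used in this post; adjust it if your layout differs (the same goes for the D:\S1Replica, D:\S2Replica, D:\S3Replica and D:\ReplicaMongos directories used further down):

md D:\ReplicaConfig\config1\Data D:\ReplicaConfig\config1\Log
md D:\ReplicaConfig\config2\Data D:\ReplicaConfig\config2\Log
md D:\ReplicaConfig\config3\Data D:\ReplicaConfig\config3\Log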
Config server 1:
mongod --replSet mysetConfig --logpath "D:\ReplicaConfig\config1\Log\cfg-a.log" --dbpath D:\ReplicaConfig\config1\Data --port 57017 --configsvr --smallfiles
Config server 2:
mongod --replSet mysetConfig --logpath "D:\ReplicaConfig\config2\Log\cfg-b.log" --dbpath D:\ReplicaConfig\config2\Data --port 57018 --configsvr --smallfiles
Config server 3:
mongod --replSet mysetConfig --logpath "D:\ReplicaConfig\config3\Log\cfg-c.log" --dbpath D:\ReplicaConfig\config3\Data --port 57019 --configsvr --smallfiles
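
With all three config servers running, initiate the config server replica set before starting mongos. This is a minimal sketch using the hosts and ports above; connect to the first config server with the mongo shell and run:

mongo localhost:57017
rs.initiate({
  _id: "mysetConfig",
  configsvr: true,
  members: [
    { _id: 0, host: "localhost:57017" },
    { _id: 1, host: "localhost:57018" },
    { _id: 2, host: "localhost:57019" }
  ]
})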

Initialize Replica Sets

We will use mongod to run the replica sets on the local machine, with three nodes in each replica set: one node will be primary and the other two will be secondaries.
To enable sharding, each node must be started with the --shardsvr option, as shown below.

Replica Set 1 with sharding:

mongod --dbpath D:\S1Replica\rs1\data --bind_ip 127.0.0.1 --port 27017 --logpath D:\S1Replica\rs1\log\mongod.log --replSet myset --shardsvr
mongod --dbpath D:\S1Replica\rs2\data --bind_ip 127.0.0.1 --port 27018 --logpath D:\S1Replica\rs2\log\mongod.log --replSet myset --shardsvr
mongod --dbpath D:\S1Replica\rs3\data --bind_ip 127.0.0.1 --port 27019 --logpath D:\S1Replica\rs3\log\mongod.log --replSet myset --shardsvr


Use a config file to initiate the members of each replica set, as described in my previous post.
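
If you don’t have that config file handy, a minimal rs.initiate() for this first shard replica set looks like the sketch below (connect to the node on port 27017 with the mongo shell); repeat the same pattern for myset1 (ports 27020-27022) and myset2 (ports 27023-27025):

mongo localhost:27017
rs.initiate({
  _id: "myset",
  members: [
    { _id: 0, host: "127.0.0.1:27017" },
    { _id: 1, host: "127.0.0.1:27018" },
    { _id: 2, host: "127.0.0.1:27019" }
  ]
})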

Replica Set 2 with sharding:

mongod --dbpath D:\S2Replica\rs1\data --bind_ip 127.0.0.1 --port 27020 --logpath D:\S2Replica\rs1\log\mongod.log --replSet myset1 --shardsvr
mongod --dbpath D:\S2Replica\rs2\data --bind_ip 127.0.0.1 --port 27021 --logpath D:\S2Replica\rs2\log\mongod.log --replSet myset1 --shardsvr
mongod --dbpath D:\S2Replica\rs3\data --bind_ip 127.0.0.1 --port 27022 --logpath D:\S2Replica\rs3\log\mongod.log --replSet myset1 --shardsvr

Replica Set 3 with sharding:

mongod --dbpath D:\S3Replica\rs1\data --bind_ip 127.0.0.1 --port 27023 --logpath D:\S3Replica\rs1\log\mongod.log --replSet myset2 --shardsvr
mongod --dbpath D:\S3Replica\rs2\data --bind_ip 127.0.0.1 --port 27024 --logpath D:\S3Replica\rs2\log\mongod.log --replSet myset2 --shardsvr
mongod --dbpath D:\S3Replica\rs3\data --bind_ip 127.0.0.1 --port 27025 --logpath D:\S3Replica\rs3\log\mongod.log --replSet myset2 --shardsvr
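
Before wiring up the query router, it’s worth confirming that each replica set has elected a primary. A quick check from the mongo shell on any member (port 27017 shown here as an example):

mongo localhost:27017
rs.status().members.forEach(function (m) { print(m.name + "  " + m.stateStr) })

One member of each set should report PRIMARY and the other two SECONDARY.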

Configure Query Router


Since we’re only configuring one query router, we’ll only need to do this once. However, it’s also possible to run multiple query routers for redundancy.

mongos --port 37017 --logpath "D:\ReplicaMongos\rs1\Log\mongos-1.log" --configdb mysetConfig/localhost:57017,localhost:57018,localhost:57019

Add Shards to the Cluster

Now that the query router is able to communicate with the config servers, we must add the shards to the cluster so that the query router knows which servers host the distributed data and where any given piece of data is located.

Use the mongo shell to connect to the query router:

mongo localhost:37017  

From the mongos interface, add each shard individually:

sh.addShard("myset/127.0.0.1:27017,127.0.0.1:27018,127.0.0.1:27019")

sh.addShard("myset1/127.0.0.1:27020,127.0.0.1:27021,127.0.0.1:27022")

sh.addShard("myset2/127.0.0.1:27023,127.0.0.1:27024,127.0.0.1:27025")

Enable Sharding at the Collection Level

  1. Connect to the mongo shell on your query router if you’re not already there:
    mongo localhost:37017  
    
  2. Switch to the exampleDB database (it will be created on first use) and enable sharding on it:
    use exampleDB
    sh.enableSharding("exampleDB")
    
  3. Create a new collection called exampleCollection and hash its _id key. The _id key is already created by default as a basic index for new documents:
    db.exampleCollection.ensureIndex( { _id : "hashed" } )
    
  4. Finally, shard the collection:
    sh.shardCollection( "exampleDB.exampleCollection", { "_id" : "hashed" } )
    

    Add documents to the collection and then check how they are distributed across the shards.
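
    For example, a quick loop from the mongos shell inserts a batch of test documents (500 here, to roughly match the sample output below; the field name value is just an arbitrary example):

    for (var i = 0; i < 500; i++) { db.exampleCollection.insertOne({ value: i }); }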

  5. db.exampleCollection.getShardDistribution()
 
This will output information similar to the following:
Shard shard0000 at mongo-shard-1:27017
 data : 8KiB docs : 265 chunks : 2
 estimated data per chunk : 4KiB
 estimated docs per chunk : 132

Shard shard0001 at mongo-shard-2:27017
 data : 7KiB docs : 235 chunks : 2
 estimated data per chunk : 3KiB
 estimated docs per chunk : 117

Totals
 data : 16KiB docs : 500 chunks : 4
 Shard shard0000 contains 53% data, 53% docs in cluster, avg obj size on shard : 33B
 Shard shard0001 contains 47% data, 47% docs in cluster, avg obj size on shard : 33B






 
