How to set up a cluster and sharding in ArangoDB?

I want to use sharding in ArangoDB. I have set up the coordinators and DBServers as described in the 2.8.5 documentation. Could someone still explain this in more detail, and also how I can compare the performance of my queries before and after sharding?

1 answer

You can test your application against a local cluster in which all instances run on the same machine. If I understand correctly, that is what you have already set up?

An ArangoDB cluster consists of coordinator nodes and dbserver nodes. Coordinators do not keep user collections of their own on disk. Their role is to handle I/O with clients and to parse, optimize and distribute queries and user data to the dbserver nodes. Foxx services are also run by the coordinators. DBServers are the storage nodes in this setup; they hold the user data.

To compare the performance of clustered and non-clustered mode, import the same dataset into a clustered instance and into a non-clustered one, then compare the query execution times. Since a cluster setup can involve more network communication than a single server (for example, when a query joins collections), performance may differ. On a physically distributed cluster you may achieve higher throughput, because the cluster nodes are separate machines with their own I/O paths ending on separate physical hard drives.
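A minimal sketch of such a timing comparison in JavaScript, the language arangosh uses. The helper names `median` and `medianRuntime` and the number of runs are illustrative assumptions, and the workload is a stand-in so the snippet runs anywhere. In arangosh you would pass something like `() => db._query("FOR x IN sharded RETURN x").toArray()` and run it once against the single server and once against a coordinator.

```javascript
// Median of a list of numbers; used to damp outliers from cold caches
// or background activity.
function median(values) {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: run `fn` `runs` times and return the median
// wall-clock time in milliseconds.
function medianRuntime(fn, runs) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    times.push(Date.now() - start);
  }
  return median(times);
}

// Stand-in workload (assumption); replace with a real query in arangosh.
const workload = () => {
  let sum = 0;
  for (let i = 0; i < 100000; i++) sum += i;
  return sum;
};

console.log("median ms:", medianRuntime(workload, 5));
```

Run the same query with the same data on both setups and compare the two medians rather than single measurements.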

In a cluster, you create collections and specify the number of shards with the numberOfShards parameter; the shardKeys parameter controls how your documents are distributed across the shards. Choose this key so that the documents are spread evenly among the shards (i.e. not piled up in a single shard). numberOfShards can be an arbitrary value and does not have to match the number of dbserver nodes. It can even be larger, so that you can easily move a shard from one dbserver to a new one when you later scale your cluster to more nodes to handle higher load.
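To illustrate why the choice of shard key matters, the following sketch counts how a set of documents would spread over shards for a candidate key. The hash function is a simple stand-in, not the one ArangoDB uses internally, and the attribute names `id` and `country` are hypothetical. In arangosh the collection itself would be created with a call such as `db._create("customers", {numberOfShards: 4, shardKeys: ["country"]})`, where `customers` is again a made-up name.

```javascript
// Toy string hash (NOT ArangoDB's internal hash); it only serves to
// show how skewed key values lead to skewed shards.
function toyHash(value) {
  const s = String(value);
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h;
}

// Count how many documents land in each shard for a given shard key.
function shardDistribution(docs, shardKey, numberOfShards) {
  const counts = new Array(numberOfShards).fill(0);
  for (const doc of docs) {
    counts[toyHash(doc[shardKey]) % numberOfShards] += 1;
  }
  return counts;
}

// Hypothetical data: every document has the same country but a unique id.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ id: "user" + i, country: "DE" });
}

console.log("by country:", shardDistribution(docs, "country", 4)); // all documents in one shard
console.log("by id:     ", shardDistribution(docs, "id", 4));      // documents in several shards
```

A key with few distinct values, like `country` here, sends everything to the same shard, while a key with many distinct values spreads the documents out.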

When you design AQL queries for use in a cluster, it is important to use the explain command to check how the query is distributed across the cluster and where the filters are deployed:

    db._create("sharded", {numberOfShards: 2})
    db._explain("FOR x IN sharded RETURN x")

    Query string:
     FOR x IN sharded RETURN x

    Execution plan:
     Id   NodeType                  Est.   Comment
      1   SingletonNode                1   * ROOT
      2   EnumerateCollectionNode      1     - FOR x IN sharded   /* full collection scan */
      6   RemoteNode                   1     - REMOTE
      7   GatherNode                   1     - GATHER
      3   ReturnNode                   1     - RETURN x

    Indexes used:
     none

    Optimization rules applied:
     Id   RuleName
      1   scatter-in-cluster
      2   remove-unnecessary-remote-scatter

In this simple query, the RETURN and GATHER nodes run on the coordinator; the nodes above them, up to and including the REMOTE node, are deployed to the DB server.

In general, fewer REMOTE / SCATTER → GATHER pairs mean less cluster communication. The closer the FILTER nodes can be deployed to the *CollectionNodes, reducing the number of documents that have to be sent through the REMOTE nodes, the better the performance.
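To make this rule of thumb easy to check, one could count the cluster-related nodes in an execution plan. The sketch below assumes the plan is available as an object with a `nodes` array whose entries carry a `type` string, which is the shape I would expect from `db._createStatement({query: "..."}).explain().plan` in arangosh. The helper name `countClusterNodes` is made up, and the sample plan is written by hand from the explain output of the example query.

```javascript
// Hypothetical helper: tally the node types that imply traffic between
// the coordinator and the DB servers.
function countClusterNodes(plan) {
  const counts = { RemoteNode: 0, ScatterNode: 0, GatherNode: 0 };
  for (const node of plan.nodes) {
    if (Object.prototype.hasOwnProperty.call(counts, node.type)) {
      counts[node.type] += 1;
    }
  }
  return counts;
}

// Hand-written plan mirroring the explain output of
// "FOR x IN sharded RETURN x" (assumed object shape).
const samplePlan = {
  nodes: [
    { id: 1, type: "SingletonNode" },
    { id: 2, type: "EnumerateCollectionNode" },
    { id: 6, type: "RemoteNode" },
    { id: 7, type: "GatherNode" },
    { id: 3, type: "ReturnNode" }
  ]
};

console.log(countClusterNodes(samplePlan));
```

Comparing these counts between two formulations of the same query gives a rough indication of which one needs less communication inside the cluster.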


Source: https://habr.com/ru/post/1245121/

