Testing your application can be done with a local cluster where all instances run on the same machine, which is what you have already done, if I understand correctly.
An ArangoDB cluster consists of coordinator nodes and dbserver nodes. Coordinators do not have their own user-specific collections on disk. Their role is to handle I/O with the clients and to parse, optimize and distribute queries and user data to the dbserver nodes. Foxx services also run on the coordinators. DBServers are the storage nodes in this setup; they store the user data.
To compare the performance of clustered and non-clustered mode, import the same dataset into a cluster instance and a non-cluster instance and compare the query execution times. Since a cluster setup may involve more network communication than a single server (for example, when you do a join), performance may differ. On a physically distributed cluster you may achieve higher throughput, because the cluster nodes are separate machines with their own I/O paths that end on separate physical disks.
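A comparison like this could be driven by a small timing helper. The sketch below is only an illustration: `timeQuery` and the stand-in workload are hypothetical names, not part of ArangoDB. In arangosh you would pass a function that runs your real query (for example one wrapping `db._query(...).toArray()`) once against each setup.

```javascript
// Runs `runQuery` several times and returns timing statistics in milliseconds.
// `runQuery` is a caller-supplied function (hypothetical here); in arangosh it
// could wrap something like db._query(...).toArray().
function timeQuery(runQuery, runs = 5) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    runQuery();
    timings.push(Date.now() - start);
  }
  timings.sort((a, b) => a - b);
  // Middle element; for an even number of runs this is the upper median.
  const median = timings[Math.floor(timings.length / 2)];
  return { runs, min: timings[0], median, max: timings[timings.length - 1] };
}

// Stand-in workload so the sketch runs without a database.
let calls = 0;
const stats = timeQuery(() => {
  calls++;
  let sum = 0;
  for (let i = 0; i < 100000; i++) sum += i;
  return sum;
}, 5);
console.log(stats);
```

Running each query several times and looking at the median rather than a single run helps smooth out cache warm-up and network jitter.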
In a cluster, you create collections and specify the number of shards with the numberOfShards parameter; the shardKeys parameter controls how your documents are distributed across the shards. You should choose this key so that the documents are well distributed across the shards (i.e. not all concentrated in a single shard). numberOfShards can be an arbitrary value and does not have to match the number of dbserver nodes. It can even be larger, so you can easily move a shard from one dbserver to a new dbserver when you later scale your cluster to more nodes to handle higher load.
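To see why the choice of shard key matters, here is a small simulation that runs without a database. It is only a sketch: the attribute names (`customerId`, `country`) are made up, and `toyHash` is a simple stand-in, not the hash function ArangoDB actually uses. The options object has the shape you would pass as the second argument to `db._create()` in arangosh.

```javascript
// Options in the shape accepted by db._create(name, options) in arangosh.
// The attribute names are made up for illustration.
const collectionOptions = { numberOfShards: 4, shardKeys: ["customerId"] };

// Toy string hash (FNV-1a style). This is NOT ArangoDB's hash function.
function toyHash(text) {
  let h = 2166136261;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

// Counts how many documents land in each shard for the given shard keys.
function shardDistribution(docs, options) {
  const counts = new Array(options.numberOfShards).fill(0);
  for (const doc of docs) {
    const keyValue = options.shardKeys.map((k) => String(doc[k])).join("|");
    counts[toyHash(keyValue) % options.numberOfShards]++;
  }
  return counts;
}

// 1000 documents: customerId varies, country is the same for all of them.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ customerId: "c" + i, country: "DE" });
}

const good = shardDistribution(docs, collectionOptions);
const skewed = shardDistribution(docs, { numberOfShards: 4, shardKeys: ["country"] });
console.log("by customerId:", good);
console.log("by country:   ", skewed);
```

With a key that varies per document, every shard receives documents; with a key that has the same value everywhere, all documents end up in one shard, and a single dbserver does all the work.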
When you design AQL queries for use in a cluster, it is important to use the explain command to check how the query is distributed across the cluster and where filters can be deployed:
    db._create("sharded", {numberOfShards: 2})
    db._explain("FOR x IN sharded RETURN x")

    Query string:
     FOR x IN sharded RETURN x

    Execution plan:
     Id   NodeType                  Est.   Comment
      1   SingletonNode                1   * ROOT
      2   EnumerateCollectionNode      1     - FOR x IN sharded
      6   RemoteNode                   1     - REMOTE
      7   GatherNode                   1     - GATHER
      3   ReturnNode                   1     - RETURN x

    Indexes used:
     none

    Optimization rules applied:
     Id   RuleName
      1   scatter-in-cluster
      2   remove-unnecessary-remote-scatter
In this simple query, the RETURN and GATHER nodes run on the coordinator; the nodes above them, up to and including the REMOTE node, are deployed to the DB server.
In general, fewer REMOTE / SCATTER <-> GATHER pairs mean less cluster communication. The closer FILTER nodes can be deployed to the *CollectionNodes, reducing the number of documents that have to be sent through the REMOTE nodes, the better the performance.
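When comparing alternative formulations of a query, it can help to count these communication nodes per plan. The following sketch assumes a list of plan nodes shaped like `{ id, type }`, which is how I would expect the `plan.nodes` array of the JSON explain output to look; the helper name `countCommunicationNodes` is hypothetical, and the sample data is a hand-written copy of the plan shown above.

```javascript
// Node types that mark coordinator <-> DB server communication in a plan.
const COMMUNICATION_TYPES = ["RemoteNode", "ScatterNode", "GatherNode"];

// Counts communication nodes in a list of plan nodes ({ id, type } objects).
function countCommunicationNodes(nodes) {
  const counts = {};
  for (const type of COMMUNICATION_TYPES) counts[type] = 0;
  for (const node of nodes) {
    if (COMMUNICATION_TYPES.includes(node.type)) counts[node.type]++;
  }
  return counts;
}

// Hand-written copy of the plan shown above (ids and types only).
const planNodes = [
  { id: 1, type: "SingletonNode" },
  { id: 2, type: "EnumerateCollectionNode" },
  { id: 6, type: "RemoteNode" },
  { id: 7, type: "GatherNode" },
  { id: 3, type: "ReturnNode" },
];

const commCounts = countCommunicationNodes(planNodes);
console.log(commCounts);
```

A rewrite of a query that lowers these counts, or that moves a FILTER above the REMOTE node in the plan, is usually worth timing against the original.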