DatabaseLessLeasing failed, and the server is not in the majority cluster partition

I ran into a DatabaseLessLeasing problem. Our application is middleware: we do not have a database, and the application runs on WebLogic Server. We have 2 servers in one cluster. Both servers are up and running, but we only use one server for processing. When the primary server fails, the whole server and its services migrate to the secondary server. This works fine.

But we had a problem at the end of last year, when the secondary server's hardware was powered off and the secondary server was unavailable; we got the error below. When we went to Oracle, they suggested adding either one more server or a highly available database for storing the cluster leasing information that indicates which server is the primary. At the moment we cannot do that: installing a new server is a budget problem, and the client is not ready for it.

Our WebLogic configuration for the cluster:

  • one cluster with two managed servers
  • Cluster Messaging Mode: Multicast
  • Migration Basis: Consensus
  • Default Load Algorithm: Round Robin
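For reference, these settings correspond roughly to a cluster stanza like the following in the domain's config.xml (a sketch: the cluster name is illustrative, and the exact element set can vary by WebLogic version):

```xml
<!-- Illustrative fragment of a WebLogic domain config.xml -->
<cluster>
  <name>MyCluster</name>
  <!-- Cluster Messaging Mode: Multicast -->
  <cluster-messaging-mode>multicast</cluster-messaging-mode>
  <!-- Migration Basis: Consensus (database-less leasing) -->
  <migration-basis>consensus</migration-basis>
  <!-- Load Algorithm: Round Robin -->
  <default-load-algorithm>round-robin</default-load-algorithm>
</cluster>
```

In practice these values are usually set through the Administration Console or WLST rather than by editing config.xml directly.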

This is the log I found:

LOG: Critical Health BEA-310006 Critical Subsystem DatabaseLessLeasing has failed. Setting server state to FAILED. Reason: Server is not in the majority cluster partition

WebLogicServer BEA-000385 Server health failed. Reason: health of critical service "DatabaseLessLeasing" failed. WebLogicServer BEA-000365 Server state changed to FAILED

**Note:** One thing I remember is that the server was not down when this happened. Both servers were running, but suddenly one server tried to restart and could not; the restart failed. I saw the state shown as failedToRestart, and the application went down.

Can anybody help me with this?

Thanks

2 answers

Consensus leasing requires that a majority of the servers keep running. Whenever there is a network partition, the servers in the majority partition continue to operate, while the servers in the minority partition fail: they can neither contact the cluster leader nor elect a new one, because they do not have a majority of the servers. If the partition results in an equal split of servers, the partition containing the cluster leader survives and the other fails.

Because of this behavior, when automatic server migration is enabled, servers must be able to contact the cluster leader and renew their leases periodically. Servers shut themselves down if they cannot renew their lease, and the failed servers are then automatically migrated to machines in the majority partition.

A server that gets partitioned off (and is not part of the majority cluster partition) goes into the FAILED state. This behavior was introduced to avoid split-brain scenarios, where there are two partitions of the cluster and both consider themselves the real cluster. When a cluster is partitioned, the larger partition survives and the smaller one shuts down. When servers cannot reach the cluster master, they determine whether they are in the larger partition or not. If they are, they elect a new cluster master; if not, they all shut down when their leases expire.

The problematic case is a two-node cluster: when such a cluster is partitioned, which partition is the larger one? If the cluster master drops out of a two-node cluster, the remaining server cannot tell whether it is in the majority. In that case, if the remaining server is the cluster master it keeps running; if it is not, it shuts down.
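The survival rule described above can be sketched as a small simulation (plain Python as an illustrative model of the rule, not actual WebLogic code; all names are made up):

```python
def partition_survives(partition_size, total_servers, has_cluster_master):
    """Decide whether a cluster partition keeps running under consensus leasing.

    A partition survives if it holds a strict majority of the configured
    servers; on an exact 50/50 split, the partition containing the cluster
    master wins. Minority partitions shut down when their leases expire.
    """
    if 2 * partition_size > total_servers:    # strict majority: survive
        return True
    if 2 * partition_size == total_servers:   # even split: master's side wins
        return has_cluster_master
    return False                              # minority: shut down

# Two-node cluster split 1/1: only the side holding the cluster master survives.
print(partition_survives(1, 2, has_cluster_master=True))   # True
print(partition_survives(1, 2, has_cluster_master=False))  # False

# Three-node cluster losing one server: the 2-node side survives regardless.
print(partition_survives(2, 3, has_cluster_master=False))  # True
```

This makes the two-node fragility visible: a single lost server always produces an even split, so survival hinges entirely on where the cluster master happens to be.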

Usually this error appears when there are only 2 managed servers in one cluster.

To solve such problems, add another server: since the cluster consists of only two nodes, either server will drop out of the majority cluster partition as soon as it loses connectivity or drops broadcast messages, because there are then no other servers left in its partition.

For consensus leasing, it is always recommended to create a cluster with at least three nodes; this gives you some stability.

In that scenario, even if one server drops out of the cluster, the other two keep functioning correctly because they remain in the majority cluster partition. The third server either rejoins the cluster or is eventually restarted.

In a scenario where you have only 2 servers in the cluster, any drop out of the cluster restarts a server, because it is no longer part of the majority cluster partition; this ultimately leads to a very unstable environment.

Another possible cause is a communication problem between the managed servers. You should look for messages about lost cluster messages; in the case of unicast, this is something like "Lost 2 unicast message(s)". This can be caused by temporary network problems.
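A quick way to check for such symptoms is to scan the server logs for the relevant message codes. A minimal sketch (the sample log excerpt and the exact message texts are illustrative, not verbatim WebLogic output):

```python
import re

# Patterns that indicate cluster-communication trouble: the codes from the
# question's log plus the "lost ... message(s)" warning discussed above.
PATTERNS = [
    r"BEA-310006",                             # DatabaseLessLeasing failure
    r"BEA-000365",                             # server state changed to FAILED
    r"[Ll]ost \d+ (unicast|multicast) message", # dropped cluster messages
]

def find_cluster_issues(log_text):
    """Return the log lines matching any of the trouble patterns."""
    return [line for line in log_text.splitlines()
            if any(re.search(p, line) for p in PATTERNS)]

# Illustrative log excerpt (made up for this sketch):
sample = """\
<Info> <Cluster> <Removing server2 from the cluster view>
<Warning> <Cluster> <Lost 2 unicast message(s)>
<Critical> <Health> <BEA-310006> <Critical Subsystem DatabaseLessLeasing has failed>
"""
for hit in find_cluster_issues(sample):
    print(hit)
```

If the lost-message warnings cluster around the time of the FAILED transition, that points to a network partition rather than a genuine server failure.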


Make sure that the Node Manager for the secondary node in the cluster migration configuration is up and running.


Source: https://habr.com/ru/post/1481287/

