Enterprise Jenkins HA plugin does not work as it should

Question

I am trying to configure Enterprise Jenkins with a High Availability setup. The current setup consists of two Jenkins masters that share the same Jenkins home, say master1 and master2, with the jenkins-ha-monitor-1.1-1.1 RPM installed on both of these masters, say monitor1 and monitor2. With this setup, according to the documentation, the HA plugin should at least work properly. The promotion and demotion scripts are similar to those given in the documentation (only the IP and interface differ; the approach is the same), i.e.:

For demotion:

ifconfig eth0:2 down

For promotion:

ifconfig eth0:2 the.floating.ip

For the nodes to register correctly, I have to start master1, master2, monitor1 and monitor2 in that order. Watching the logs for both monitors, I see that when the services are started in this order, both nodes are correctly reported in the cluster by both monitoring services and in the HA status in the Jenkins console.

Now, when master1 is killed by sending it the KILL signal, monitor2 recognizes this and runs the promotion script. But the monitor continues to throw:

Oct 24, 2012 3:47:36 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspected cluster node failure: jenkins-master-1-285
Oct 24, 2012 3:47:39 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspected cluster node failure: jenkins-master-1-285

continuously, without running the demotion script. Since master2 has taken the floating IP through its promotion script, and master1 still holds this IP because the demotion script never ran, the setup ends up with two boxes claiming the same IP. Moreover, restarting master1 does nothing: master1 is not added back to the cluster as the second node, monitor1 keeps writing the above messages to its log, the floating IP continues to return “Unable to connect”, and master2 and monitor2 show the cluster as consisting of master2, monitor2 and monitor1. So my question is twofold: why isn't master1 taken back into the cluster? And why doesn't the demotion script run as it should?

Also, FYI, I tried running

service jenkins stop

and in this case the demotion script is executed, but similar problems arise again when

service jenkins start

is run on the master that was stopped earlier, because the promotion script is executed regardless of whether a primary Jenkins already exists. In this case, the two monitors register different clusters, i.e. monitor1: master1, monitor1 and monitor2: master2, monitor2. Running ifconfig shows that both masters have taken the floating IP at this point.

Any help is appreciated! Thanks!

cloudbees

yash.vyas Oct 25 '12 at 15:05

1 answer

Jesse Glick · Accepted Answer · 2012-11-07T16:18:00+0000

Still under investigation with support. The problem as initially reported (here) suggests that the two nodes communicate fine, but that promotion / demotion is not working correctly: either a bug in JGroups or in the way Jenkins high availability uses it.

In further tests, however, problems with UDP multicast were reported on RedHat / CentOS hosts. Work is currently underway on an alternative JGroups stack that does not rely on multicast (or UDP) at all, and instead uses the shared $JENKINS_HOME directory to register Jenkins and monitor instances (as TCP address:port entries).
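
To illustrate the general idea of discovery through a shared directory, here is a minimal shell sketch. It is not the actual JGroups stack or anything shipped with Jenkins: the ha-registry directory name, the REGISTRY_DIR variable and the three function names are all hypothetical, chosen only for this example. Each instance writes one file containing its address:port, and any instance can list the current members by reading the directory.

```shell
#!/bin/sh
# Hypothetical sketch of file-based member discovery in a shared directory.
# REGISTRY_DIR and the function names below are illustrative assumptions,
# not part of Jenkins, the HA plugin, or JGroups.

REGISTRY_DIR="${REGISTRY_DIR:-${JENKINS_HOME:-/tmp}/ha-registry}"

register_member() {
    # $1 = member name, $2 = TCP address, $3 = port
    mkdir -p "$REGISTRY_DIR" || return 1
    # Write to a hidden temp file, then rename, so that readers
    # never pick up a half-written entry.
    tmp="$REGISTRY_DIR/.$1.$$"
    printf '%s:%s\n' "$2" "$3" > "$tmp" && mv "$tmp" "$REGISTRY_DIR/$1"
}

unregister_member() {
    # $1 = member name
    rm -f "$REGISTRY_DIR/$1"
}

list_members() {
    # Print "name address:port" for every registered member.
    for f in "$REGISTRY_DIR"/*; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

For example, register_member master1 10.0.0.1 7800 followed by list_members would print a line for master1. Note that a process killed with the KILL signal cannot remove its own entry, so a real implementation would also need some way to detect and expire stale entries.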
