Why does the Rexster (and Titan) server stop responding?

Setup

I am implementing a recommendation system running on an Ubuntu 12.04 server using Titan Rexster (titan-server-0.4.4.zip) with the Elasticsearch backend. To connect to the Rexster server, I use the Bulbflow library for Python.

The beta seemed to work fine for 3 weeks, but with an increase in load (only about 10 users) the Rexster server stopped responding. I do not know whether my Rexster configuration is wrong or whether I am using the Bulbflow library incorrectly.

Rexster / Titan Configuration

Here is my rexster-cassandra-es.xml:

<?xml version="1.0" encoding="UTF-8"?>
<rexster>
  <http>
    <server-port>8182</server-port>
    <server-host>0.0.0.0</server-host>
    <base-uri>http://MY_IP</base-uri>
    <web-root>public</web-root>
    <character-set>UTF-8</character-set>
    <enable-jmx>false</enable-jmx>
    <enable-doghouse>true</enable-doghouse>
    <max-post-size>2097152</max-post-size>
    <max-header-size>8192</max-header-size>
    <upload-timeout-millis>30000</upload-timeout-millis>
    <thread-pool>
      <worker>
        <core-size>20</core-size>
        <max-size>40</max-size>
      </worker>
      <kernal>
        <core-size>10</core-size>
        <max-size>20</max-size>
      </kernal>
    </thread-pool>
    <io-strategy>leader-follower</io-strategy>
  </http>
  <rexpro>
    <server-port>8184</server-port>
    <server-host>0.0.0.0</server-host>
    <session-max-idle>1790000</session-max-idle>
    <session-check-interval>3000000</session-check-interval>
    <connection-max-idle>180000</connection-max-idle>
    <connection-check-interval>3000000</connection-check-interval>
    <enable-jmx>false</enable-jmx>
    <thread-pool>
      <worker>
        <core-size>8</core-size>
        <max-size>8</max-size>
      </worker>
      <kernal>
        <core-size>4</core-size>
        <max-size>4</max-size>
      </kernal>
    </thread-pool>
    <io-strategy>leader-follower</io-strategy>
  </rexpro>
  <shutdown-port>8183</shutdown-port>
  <shutdown-host>127.0.0.1</shutdown-host>
  <script-engines>
    <script-engine>
      <name>gremlin-groovy</name>
      <reset-threshold>-1</reset-threshold>
<imports>com.tinkerpop.gremlin.*,com.tinkerpop.gremlin.java.*,com.tinkerpop.gremlin.pipes.filter.*,com.tinkerpop.gremlin.pipes.sideeffect.*,com.tinkerpop.gremlin.pipes.transform.*,com.tinkerpop.blueprints.*,com.tinkerpop.blueprints.impls.*,com.tinkerpop.blueprints.impls.tg.*,com.tinkerpop.blueprints.impls.neo4j.*,com.tinkerpop.blueprints.impls.neo4j.batch.*,com.tinkerpop.blueprints.impls.orient.*,com.tinkerpop.blueprints.impls.orient.batch.*,com.tinkerpop.blueprints.impls.dex.*,com.tinkerpop.blueprints.impls.rexster.*,com.tinkerpop.blueprints.impls.sail.*,com.tinkerpop.blueprints.impls.sail.impls.*,com.tinkerpop.blueprints.util.*,com.tinkerpop.blueprints.util.io.*,com.tinkerpop.blueprints.util.io.gml.*,com.tinkerpop.blueprints.util.io.graphml.*,com.tinkerpop.blueprints.util.io.graphson.*,com.tinkerpop.blueprints.util.wrappers.*,com.tinkerpop.blueprints.util.wrappers.batch.*,com.tinkerpop.blueprints.util.wrappers.batch.cache.*,com.tinkerpop.blueprints.util.wrappers.event.*,com.tinkerpop.blueprints.util.wrappers.event.listener.*,com.tinkerpop.blueprints.util.wrappers.id.*,com.tinkerpop.blueprints.util.wrappers.partition.*,com.tinkerpop.blueprints.util.wrappers.readonly.*,com.tinkerpop.blueprints.oupls.sail.*,com.tinkerpop.blueprints.oupls.sail.pg.*,com.tinkerpop.blueprints.oupls.jung.*,com.tinkerpop.pipes.*,com.tinkerpop.pipes.branch.*,com.tinkerpop.pipes.filter.*,com.tinkerpop.pipes.sideeffect.*,com.tinkerpop.pipes.transform.*,com.tinkerpop.pipes.util.*,com.tinkerpop.pipes.util.iterators.*,com.tinkerpop.pipes.util.structures.*,org.apache.commons.configuration.*,com.thinkaurelius.titan.core.*,com.thinkaurelius.titan.core.attribute.*,com.thinkaurelius.titan.core.util.*,com.thinkaurelius.titan.example.*,org.apache.commons.configuration.*,com.tinkerpop.gremlin.Tokens.T,com.tinkerpop.gremlin.groovy.*</imports> 
      <static-imports>com.tinkerpop.blueprints.Direction.*,com.tinkerpop.blueprints.TransactionalGraph$Conclusion.*,com.tinkerpop.blueprints.Compare.*,com.thinkaurelius.titan.core.attribute.Geo.*,com.thinkaurelius.titan.core.attribute.Text.*,com.thinkaurelius.titan.core.TypeMaker$UniquenessConsistency.*,com.tinkerpop.blueprints.Query$Compare.*</static-imports>
    </script-engine>
  </script-engines>
  <security>
    <authentication>
      <type>none</type>
      <configuration>
        <users>
          <user>
            <username>rexster</username>
            <password>rexster</password>
          </user>
        </users>
      </configuration>
    </authentication>
  </security>
  <metrics>
    <reporter>
      <type>jmx</type>
    </reporter>
    <reporter>
      <type>http</type>
    </reporter>
    <reporter>
      <type>console</type>
      <properties>
        <rates-time-unit>SECONDS</rates-time-unit>
        <duration-time-unit>SECONDS</duration-time-unit>
        <report-period>10</report-period>
        <report-time-unit>MINUTES</report-time-unit>
        <includes>http.rest.*</includes>
        <excludes>http.rest.*.delete</excludes>
      </properties>
    </reporter>
  </metrics>
  <graphs>
    <graph>
      <graph-name>newspaper</graph-name>
      <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
      <!-- <graph-location>/tmp/titan</graph-location> -->
      <graph-read-only>false</graph-read-only>
      <properties>
        <storage.backend>cassandra</storage.backend>
        <storage.index.search.backend>elasticsearch</storage.index.search.backend>
        <storage.index.search.hostname>localhost</storage.index.search.hostname>
        <storage.index.search.client-only>true</storage.index.search.client-only>
        <storage.index.search.local-mode>false</storage.index.search.local-mode>
      </properties>
      <extensions>
        <allows>
          <allow>tp:gremlin</allow>
        </allows>
      </extensions>
    </graph>
  </graphs>
</rexster>

I changed the thread-pool core-size and max-size for both the worker and the kernal pools; without this change the Rexster server freezes (stops responding) even sooner.

What are appropriate values for core-size and max-size?

Using Bulbflow

To use Bulbflow, I create a new Graph object every time I need to run a query. There are many requests, so these objects are created very often.

Should I create a new Graph object for each new query?

Is it possible to create only one Graph object and reuse it whenever a new request is sent to the graph database, or will I run into session problems?
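For reference, the "only one Graph object" option could be sketched roughly as below. This is a minimal illustration, not Bulbs code: SharedGraph and FakeGraph are hypothetical names, FakeGraph is a stand-in so the sketch runs without a server, and the URL in the comment is assumed from the configuration above. Whether a Bulbs Graph can safely be shared between threads is not established here, so treat that as an open assumption.

```python
import threading


class SharedGraph(object):
    """Create the graph client lazily, once, and reuse it afterwards."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable that builds the client
        self._lock = threading.Lock()
        self._graph = None

    def get(self):
        # Double-checked locking: only the first caller builds the client.
        if self._graph is None:
            with self._lock:
                if self._graph is None:
                    self._graph = self._factory()
        return self._graph


class FakeGraph(object):
    """Stand-in for the real client (hypothetical, for illustration only)."""
    instances = 0

    def __init__(self):
        FakeGraph.instances += 1


# With Bulbs, the factory might instead be something like (assumed URL):
#   lambda: Graph(Config("http://localhost:8182/graphs/newspaper"))
shared = SharedGraph(FakeGraph)
first = shared.get()
second = shared.get()
```

Each request handler would call shared.get() instead of constructing a new object, so the constructor runs once per process rather than once per query.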

Error message

When everything gets stuck and I force the program to terminate (Ctrl-C), I get the following stack trace:

Exception happened during processing of request from ('my_ip', 57489)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 310, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 638, in __init__
    self.handle()
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 200, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 235, in handle_one_request
    return self.run_wsgi()
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 177, in run_wsgi
    execute(self.server.app)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 165, in execute
    application_iter = app(environ, start_response)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/user/dir/recommender/project/api/start.py", line 65, in put_user
    graphdb.insert_user(user_id)
  File "project/api/graphdb.py", line 14, in insert_user
    user_with_id = g.users.index.lookup(user_sqlid=user_id)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/titan/index.py", line 270, in lookup
    resp = self.client.lookup_vertex(self.index_name,key,value)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/titan/client.py", line 348, in lookup_vertex
    return self.request.get(path,params)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/rest.py", line 101, in get
    return self.request(GET, path, params)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/rest.py", line 184, in request
    http_resp = self.http.request(uri, method, body, headers)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1593, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1335, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1291, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 430, in readline
    data = recv(1)

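The trace ends in a blocking recv call: the client is waiting, with no time limit, for the first line of Rexster's HTTP response. A client-side timeout would not fix the server, but it turns such a hang into an error the application can handle. The sketch below shows the idea with the Python 3 standard library only; it is not Bulbs or httplib2 code. The silent local server, the helper names and the 0.5 second timeout are all assumptions made for illustration.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept one connection, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []  # keep the accepted socket open so the client sees silence

    def accept_once():
        try:
            conn, _ = srv.accept()
            held.append(conn)
        except OSError:
            pass  # listening socket was closed before anyone connected

    threading.Thread(target=accept_once, daemon=True).start()
    return srv, held


def get_with_timeout(host, port, path, timeout_s):
    """Return the HTTP status, or None if no reply arrives in time."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except socket.timeout:
        return None  # the server accepted the request but stayed silent
    finally:
        conn.close()


srv, held = start_silent_server()
port = srv.getsockname()[1]
result = get_with_timeout("127.0.0.1", port, "/graphs/newspaper", 0.5)
srv.close()
for c in held:
    c.close()
```

Here result is None because the stand-in server never answers; against a healthy server the function would return the status code instead.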
Recovery

To recover, I have to shut down Rexster/Titan and restart it. Whenever I stop the Rexster server (./bin/titan -c cassandra-es stop), I get the following output:

Killing Titan + Rexster (pid 26779)...
Rexster shutdown timeout exceeded (60 seconds)
Killing Cassandra (pid 26201)...

Rexster is completely stuck.

I look forward to any helpful guidance.

+6
2 answers

The following thread on the Titan mailing list may be useful to you: "The Rexster REST API stops responding". However, I do not think the problem was ever solved there, as the Titan and Rexster developers could not reproduce it.

That said, I highly recommend upgrading to Titan v1.0.0, which uses the TinkerPop 3.0+ Gremlin Server instead of TinkerPop 2.x Rexster. You will get fewer bugs, more features and, above all, much more expressive Gremlin queries (see the TinkerPop 3.0.1 documentation, the version used by Titan v1.0.0). Titan v0.4.4 is a very old release, and I don't think it is worth the effort to solve this particular problem on it, especially if you are new to graphs.
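To give a rough idea of what a client call might look like after such an upgrade, the sketch below builds (but does not send) an HTTP request carrying a Gremlin script with a parameter binding. Several things here are assumptions rather than facts from this thread: that Gremlin Server has its HTTP channelizer enabled (its default is WebSocket), that it listens on localhost:8182, and that a traversal source named g is bound on the server. The property key user_sqlid is taken from the stack trace in the question; build_gremlin_request is a hypothetical helper.

```python
import json
import urllib.request


def build_gremlin_request(url, script, bindings=None):
    """Build (but do not send) a POST carrying a Gremlin script as JSON."""
    payload = {"gremlin": script, "bindings": bindings or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# TinkerPop 3 style lookup by property; uid is passed as a binding
# instead of being concatenated into the script text.
req = build_gremlin_request(
    "http://localhost:8182",  # assumed host and port
    "g.V().has('user_sqlid', uid)",
    {"uid": 42},
)
body = json.loads(req.data.decode("utf-8"))

# Sending it needs a running server, for example:
#   urllib.request.urlopen(req, timeout=5)
```

Passing values as bindings keeps the script text constant across calls, which is generally preferable to building a new script string for every user id.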

+3

First, I recommend that you upgrade to Titan 1.0, where Rexster has been replaced by Gremlin Server along with some significant changes. If you still need to use Titan 0.4.4, I would suggest running it as a service. A session may be timing out, which would cause the instance to terminate all of its jobs.

See the Gremlin Server documentation.

+1

Source: https://habr.com/ru/post/986929/

