Find an effective way to integrate different language libraries into one project using Python as a “glue”,

I am going to participate in an NLP-related project and I need to use various libraries. Some of them are in java, others in C / C ++ (for tasks that require more speed), and finally, some of them are in Python. I was thinking of using Python as a “glue” and creating wrapper classes for every task I want to do based on a different language. To do this, the wrapper class, for example, will execute a java program and communicate with it using pipes. My questions:

  • Do you think this will work for complex and highly repetitive tasks? Or is pipe overhead too complicated?

  • Is there any other (preferably simple) architecture that you would suggest?

+6
source share
5 answers

I would just advise you not to do this.

Do not implement material in C / C ++ "for speed". An effective advantage is not , which can be as big as you expect; for example, compared to implementing in Java using the “best practice” design and specifications.

Do not try to glue many languages ​​together. You set yourself up for a lot of portability problems, debugging difficulties, and reliability issues; for example due to C / C ++ errors, JVM failed. In addition, there is an overhead of performance at the crossroads between languages, and there may be unexpected bottlenecks. (For example, you may find that your C / C ++ should start single-threaded due to threading issues, and therefore you cannot take advantage of Java multithreading in a typically multi-core system.)

Instead, I advise you to look for libraries that will allow you to implement the entire application in one language. If this is not possible, design it so that different language components are different executable files / processes, exchanging messages via RPC, messaging, etc.

+5
source

If you have problems exchanging data through pipes / sockets, this has nothing to do with how the CPU intensifies tasks, but how often you need to send information between processes and how much data they need to send. Setting up threads for your connection will have a small overhead.

Perhaps you can automatically wrap the C / C ++ code using Python ( SWIG , ctypesgen , Boost.Python ), so the only glue you need to write is then talking to Java.

You can also do it the other way - run Python code in the JVM using Jython to get the Python and Java code together, then talk to C / C ++.

+1
source

You should see Apache UIMA . It is designed specifically for this. On the project website:

Frames run components and are available for both Java and C ++. The Java Framework supports the launch of both Java and non-Java components (using the C ++ platform). The C ++ framework, in addition to supporting annotators written in C / C ++, also supports Perl, Python, and TCL annotations.

UIMA can manage pipes and annotators and build on a scale.

+1
source

I would look at Jepp or JPype instead of using IPC for this. I would avoid Jython, since loading C / C ++ libraries in Java would probably be harder than in CPython.

0
source

1) Do you think this will work for processor-demanding and very repetitive tasks? Or are too large flows added by the tube too heavy?

Depends on your task. If this is a typical NLP application in which you have a large model loaded into memory and exchange only small pieces of data (strings, label sequences / parse trees), this might work. Communication with pipes is difficult to obtain, however, since there are many buffering and synchronization problems that you must solve. Python is a very good glue language, but it does not solve everything.

2) Is there any other (preferably simple) architecture that you would suggest?

Build your NLP component services and connect to them through REST interfaces. These are off-the-shelf tools that do this, for example. CLAM Pyro and SPIRO provide the connection between Java and Python even more direct and can be easier to use than HTTP / REST (but YMMV).

Parts written in C / C ++ can also be integrated with CPython using Cython . Do not start implementing things in C or C ++, because you think they will be faster; you can also implement them in Python and then see if you can get the performance you want with NumPy and / or Cython.

0
source

Source: https://habr.com/ru/post/894465/


All Articles