NLP project, Python or C++

We are working on an Arabic natural language processing project, and we have narrowed our choice to writing the code in either Python or C++ (with the Boost library). We are weighing these points:

  • Python

    • Slower than C++ (though work on making Python faster is ongoing)
    • Better UTF-8 support
    • Faster for writing tests and trying out different algorithms
  • C++

    • Faster than Python
    • Familiar code: every programmer knows C or C-like code
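
To make the UTF-8 point concrete, here is a minimal Python 3 sketch. It is an illustration only, not something from the project: it assumes Python 3, where `str` holds Unicode code points rather than bytes, and `strip_diacritics` and `words_ending_with` are hypothetical helper names.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks such as Arabic short-vowel signs (harakat)."""
    # NFD splits precomposed characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def words_ending_with(text, suffix):
    """Return the whitespace-separated words of `text` that end with `suffix`."""
    return [word for word in text.split() if word.endswith(suffix)]


if __name__ == "__main__":
    # "kitab" (book) written with two vowel signs: kaf, kasra, ta, fatha, alef, ba
    word = "\u0643\u0650\u062a\u064e\u0627\u0628"
    bare = strip_diacritics(word)
    # Lengths are counted in code points; the UTF-8 encoding is longer in bytes.
    print(len(word), len(bare), len(bare.encode("utf-8")))
```

Note that NFD also decomposes letters such as alef with madda, so this simple filter would reduce them to a bare alef. Whether that is acceptable depends on the task.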

Once the project is finished, porting it to other programming languages should not be very difficult.

Which do you think is better suited to the project?

+3
5 answers

Python, , , ++. Python ++ , "" ++ .

-, ++ Python. , Python , ++. , dict std::map .

P.S. , C Python.

+8

, , NLP python, NLTK. NLP :


( )

. , Python, , . , , ing. Python, , , , Python:

import sys

# Print every whitespace-separated word on stdin that ends in "ing".
for line in sys.stdin:
    for word in line.split():
        if word.endswith('ing'):
            print(word)

[...]

C - , :

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
   int i = 0;
   int c = 1;
   char buffer[1024];

   while (c != EOF) {
       c = fgetc(stdin);
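       if ( (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ) {
           /* Store the character only while there is room for the terminator;
              excess characters of an over-long word are dropped. */
           if (i < (int) sizeof(buffer) - 1)
               buffer[i++] = (char) c;
           continue;
       } else {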
           if (i > 2 && (strncmp(buffer+i-3, "ing", 3) == 0 || strncmp(buffer+i-3, "ING", 3) == 0 ) ) {
               buffer[i] = 0;
               puts(buffer);
           }
           i = 0;
       }
   }
   return 0;
}

: ++/Boost, , - , Boost . , .

// char_sep_example_1.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

    int main()
    {
      std::string str = ";;Hello|world||-foo--bar;yow;baz|";
      typedef boost::tokenizer<boost::char_separator<char> > 
        tokenizer;
      boost::char_separator<char> sep("-;|");
      tokenizer tokens(str, sep);
      for (tokenizer::iterator tok_iter = tokens.begin();
           tok_iter != tokens.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
      std::cout << "\n";
      return EXIT_SUCCESS;
    }
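
For comparison, the same tokenization can be sketched in Python with the standard `re` module. This is only an illustrative sketch: `tokenize` is a hypothetical helper name, and empty tokens are dropped to mirror what `boost::char_separator` does by default.

```python
import re


def tokenize(text, separators="-;|"):
    """Split `text` on any single separator character and drop empty tokens."""
    # re.escape keeps characters such as '-' and '|' literal inside the class.
    pattern = "[" + re.escape(separators) + "]"
    return [token for token in re.split(pattern, text) if token]


if __name__ == "__main__":
    tokens = tokenize(";;Hello|world||-foo--bar;yow;baz|")
    print(" ".join("<" + token + ">" for token in tokens))
```
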
+9

/ . - (), Python ++, , - :

#include <string>
#include <iostream>

int main() { 
    std::string temp;
    while (std::cin>>temp) 
        if (temp.size()>2 && temp.substr(temp.size()-3, 3)=="ing")
           std::cout << temp << '\n';   // one word per line
}

, Python, : ++ "", , - ( , ++ ).

: , , , ++ , Python. , , .

: , ++ , :

for (std::string temp; std::cin>>temp; )
    temp.size()>2 && temp.substr(temp.size()-3, 3)=="ing" && std::cout << temp << '\n';

... ( ) : " ++ , Python".

+3

, C C-

C C- , ++. ++ , .

python, , .

, , , ( ) .

+2

C/++ - "" . LOC C/++, .

0

Source: https://habr.com/ru/post/1734598/

