Design Idea for Computational Linguistics Using Hadoop MapReduce

Question


I need to do a project for a computational linguistics course. Is there an interesting "linguistic" problem that is data-intensive enough to work on with Hadoop MapReduce? The solution or algorithm should analyze the data and give some insight into the "linguistic" domain. However, it must be applicable to large datasets so that I can use Hadoop for it. I know there is a Python natural language processing toolkit that can be used with Hadoop.


mapreduce hadoop nlp

Aditya andhalikar Mar 01 '10 at 2:31


4 answers

Alex Martelli · Answer 1 · 2010-03-01T03:11:17+0000

If you have large corpora in some "unusual" languages (in the sense of "those for which only a limited amount of computational linguistics has been done"), repeating existing computational linguistics work already done for very popular languages (such as English, Chinese, Arabic, ...) is a perfectly suitable project. This holds especially in an academic setting, but it can be quite suitable for industry too: back when I was in computational linguistics with IBM Research, I got interesting mileage out of putting together a corpus for Italian and repeating [[at the relatively new IBM Scientific Center in Rome]] work very similar to what the IBM Research team at Yorktown Heights [[of which I had been a part]] had already done for English.


Bob Futrelle · Answer 2 · 2010-03-01T14:30:38+0000

300M 60K OA, BioMed Central.


BioNLP.org

mrjf · Answer 3 · 2010-10-28T23:08:10+0000

N x N, MapReduce:

http://wordspace.collocations.de/doku.php/course:acl2010:start

Binary nerd · Answer 4 · 2010-03-01T07:05:22+0000

There is the Python toolkit NLTK, which can be used with dumbo on Hadoop.

PyCon 2010 had a good talk on exactly this topic. You can access the slides from the talk using the link below.

The Python and the Elephant: Large Scale Natural Language Processing with NLTK and Dumbo
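To give a rough idea of what a MapReduce job over text looks like, here is a minimal word-count sketch. It is not taken from the talk. The mapper and reducer follow the generator style that dumbo uses, but `run_local` is a hypothetical in-process driver written only so the example runs without a cluster, and the whitespace tokenizer is a stand-in (an NLTK tokenizer could be swapped in).

```python
from collections import defaultdict

PUNCTUATION = '.,;:!?"()[]'


def mapper(key, value):
    """Map step: key is a line offset (ignored), value is one line of text.

    Emits (word, 1) for every token. Tokenization here is a naive
    whitespace split, which is an assumption made for this sketch.
    """
    for token in value.lower().split():
        word = token.strip(PUNCTUATION)
        if word:
            yield word, 1


def reducer(key, values):
    """Reduce step: sum all the counts emitted for one word."""
    yield key, sum(values)


def run_local(lines):
    """Hypothetical local driver that imitates map, shuffle and reduce.

    On a real cluster the framework does the grouping; with dumbo the
    job would instead be started with dumbo.run(mapper, reducer).
    """
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for word, count in mapper(offset, line):
            groups[word].append(count)  # "shuffle": group values by key

    result = {}
    for word in sorted(groups):
        for out_key, out_value in reducer(word, groups[word]):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    sample = ["The cat sat.", "The dog sat, too."]
    for word, count in run_local(sample).items():
        print(word, count)
```

The same mapper/reducer shape extends to more linguistic tasks, for example emitting word pairs instead of single words to count co-occurrences.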
