The best data structure for representing English-verb-form information

Update: I could have formulated this question in abstract terms, but that would have been less illustrative. So please do not dismiss it as being too specific.

I need to create a data structure to store information about the forms of English verbs. In most cases, a verb can be in one of four forms: the base, the present participle, the past participle and the past simple, for example:

  • to take
  • taking
  • taken
  • took

It seems easy to define four types, one per form, and be done with it. However, there are a few exceptions that break this simple idea.

  • The third-person singular present form, which in our example is "he / she / it takes".
  • The copular verb "be" has several irregular forms: "am", "is" and "are" in the present tense, and "was" and "were" in the past tense.
  • Verbs like "can" that do not inflect in the third-person singular present: "she can".

What data structure would be efficient, precise and unambiguous for representing this information (including the exceptional cases), given the following requirements:

  • for an arbitrary form, answer the question of which conjugations the form belongs to
  • for an arbitrary conjugation and form, answer the question of whether the form is an instance of that conjugation

Update: by efficient, I meant that

  • answering these questions should be fast rather than slow
  • memory consumption should be low rather than high
  • the definition of the data structure should be concise rather than verbose (I understand that efficiency is a matter of trade-offs)
1 answer

Tables you will need:

Infinitives
-----------
InfinitiveId
Name

Tenses (i.e. Present, Preterite, Imperative, Past, Present continuous, Past continuous, Past perfect continuous, Future, etc.)
------
TenseId
Name

Subjects (i.e. I, you, he/she, we, plural you, they)
--------
SubjectId
Name

Conjugations
------------
InfinitiveId
TenseId
SubjectId
Conjugation

InfinitiveId can be a smallint as long as you do not have more than 32,000 verbs (which I highly doubt you will); otherwise make it an int. TenseId and SubjectId can be tinyint, and all Name fields can be varchar.
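As a rough illustration, the four tables could be sketched with Python's built-in sqlite3 module as below. Several things here are assumptions rather than part of the answer: the column lengths, the index on Conjugation, the helper names build_demo, conjugations_of and is_form, the "he/she/it" subject label, and the handful of sample rows. SQLite also accepts the tinyint/smallint type names without enforcing their ranges, so the sizes only document intent.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Infinitives (InfinitiveId SMALLINT PRIMARY KEY, Name VARCHAR(64) NOT NULL UNIQUE);
CREATE TABLE Tenses      (TenseId      TINYINT  PRIMARY KEY, Name VARCHAR(64) NOT NULL UNIQUE);
CREATE TABLE Subjects    (SubjectId    TINYINT  PRIMARY KEY, Name VARCHAR(64) NOT NULL UNIQUE);
CREATE TABLE Conjugations (
    InfinitiveId SMALLINT NOT NULL REFERENCES Infinitives(InfinitiveId),
    TenseId      TINYINT  NOT NULL REFERENCES Tenses(TenseId),
    SubjectId    TINYINT  NOT NULL REFERENCES Subjects(SubjectId),
    Conjugation  VARCHAR(64) NOT NULL,
    PRIMARY KEY (InfinitiveId, TenseId, SubjectId)
);
-- Assumed extra index so that lookups by form do not scan the whole table.
CREATE INDEX ConjugationsByForm ON Conjugations(Conjugation);
"""

SUBJECTS = ["I", "you", "he/she/it", "we", "plural you", "they"]
TENSES = ["Present", "Past"]
# Illustrative sample rows only: infinitive -> tense -> one form per subject.
SAMPLE = {
    "take": {"Present": ["take", "take", "takes", "take", "take", "take"]},
    "be": {"Present": ["am", "are", "is", "are", "are", "are"],
           "Past": ["was", "were", "was", "were", "were", "were"]},
}

def build_demo():
    """Create an in-memory database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Subjects VALUES (?, ?)", list(enumerate(SUBJECTS, 1)))
    conn.executemany("INSERT INTO Tenses VALUES (?, ?)", list(enumerate(TENSES, 1)))
    for inf_id, (name, tenses) in enumerate(SAMPLE.items(), 1):
        conn.execute("INSERT INTO Infinitives VALUES (?, ?)", (inf_id, name))
        for tense, forms in tenses.items():
            rows = [(inf_id, TENSES.index(tense) + 1, subj_id, form)
                    for subj_id, form in enumerate(forms, 1)]
            conn.executemany("INSERT INTO Conjugations VALUES (?, ?, ?, ?)", rows)
    return conn

JOINS = """
FROM Conjugations c
JOIN Infinitives i ON i.InfinitiveId = c.InfinitiveId
JOIN Tenses t ON t.TenseId = c.TenseId
JOIN Subjects s ON s.SubjectId = c.SubjectId
"""

def conjugations_of(conn, form):
    """Requirement 1: every (infinitive, tense, subject) that the form realises."""
    sql = ("SELECT i.Name, t.Name, s.Name" + JOINS +
           "WHERE c.Conjugation = ? ORDER BY c.InfinitiveId, c.TenseId, c.SubjectId")
    return conn.execute(sql, (form,)).fetchall()

def is_form(conn, form, infinitive, tense, subject):
    """Requirement 2: is the form this exact conjugation?"""
    sql = ("SELECT 1" + JOINS +
           "WHERE c.Conjugation = ? AND i.Name = ? AND t.Name = ? AND s.Name = ?")
    return conn.execute(sql, (form, infinitive, tense, subject)).fetchone() is not None
```

With the sample rows loaded, conjugations_of(conn, "was") would list both the first-person and the third-person past of "be", which is how the irregular cases from the question end up as ordinary rows.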

Your update on "efficiency" is completely pointless, because you have basically said: "I want the best of all worlds, although I understand that there must be trade-offs." You have not told us how you plan to use this. For example, will it be a public database that gets hit with a lot of traffic? Will it be used for grammar checking in a text editor, where only one person uses it at a time? We cannot tell you what the best trade-offs are if you do not tell us what you are doing with it.

Without knowing more, my suggestion would be to not worry about memory requirements at all. Just build one huge lookup table with every possibility (accepting that there will be a ton of repetition for verbs whose conjugations share the same form). I can't imagine that you have enough verbs to make it even close to what I would consider a large database.
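If the data lives in memory rather than in a database, the same "one huge lookup table" idea might look like the following minimal sketch. The names SUBJECTS, PARADIGMS, build_lookup and is_form, the "he/she/it" label, and the three sample verbs are all hypothetical; the point is only that every (verb, tense, subject) combination is expanded up front, repetition included.

```python
from collections import defaultdict

SUBJECTS = ("I", "you", "he/she/it", "we", "plural you", "they")

# Hypothetical sample data: infinitive -> tense -> one form per subject,
# listed in SUBJECTS order. Irregular verbs need no special casing.
PARADIGMS = {
    "take": {"Present": ["take", "take", "takes", "take", "take", "take"],
             "Past": ["took"] * 6},
    "be": {"Present": ["am", "are", "is", "are", "are", "are"],
           "Past": ["was", "were", "was", "were", "were", "were"]},
    "can": {"Present": ["can"] * 6,
            "Past": ["could"] * 6},
}

def build_lookup(paradigms):
    """Expand every combination into form -> {(infinitive, tense, subject)}."""
    by_form = defaultdict(set)
    for infinitive, tenses in paradigms.items():
        for tense, forms in tenses.items():
            for subject, form in zip(SUBJECTS, forms):
                by_form[form].add((infinitive, tense, subject))
    # Return a plain dict so that failed lookups do not create empty entries.
    return dict(by_form)

LOOKUP = build_lookup(PARADIGMS)

def conjugations_of(form):
    """Which conjugations does this form belong to? One dict lookup."""
    return LOOKUP.get(form, set())

def is_form(form, infinitive, tense, subject):
    """Is the form an instance of the given conjugation? One set membership test."""
    return (infinitive, tense, subject) in LOOKUP.get(form, set())
```

Both questions are answered with hash lookups, at the cost of storing "took" six times over; that is the repetition mentioned above.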

Edit:

A possible improvement to the above would be to add another table that contains the unique conjugated forms. Your Conjugations table could then refer to an identifier in that table, rather than repeating the actual text of the form over and over.
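A small sketch of that improvement, again in memory and under assumed names (FormTable, intern, build_rows are illustrative, not from the answer): each distinct form string is stored once and the conjugation rows hold only its integer identifier.

```python
class FormTable:
    """Stores each distinct form text once and hands out integer ids."""

    def __init__(self):
        self._ids = {}    # text -> id
        self._texts = []  # id -> text

    def intern(self, text):
        """Return the id for text, adding it only if it is new."""
        if text not in self._ids:
            self._ids[text] = len(self._texts)
            self._texts.append(text)
        return self._ids[text]

    def text(self, form_id):
        return self._texts[form_id]

    def __len__(self):
        return len(self._texts)


SUBJECTS = ("I", "you", "he/she/it", "we", "plural you", "they")

def build_rows(infinitive, tenses, forms):
    """Return {(infinitive, tense, subject): form_id} for one verb."""
    rows = {}
    for tense, per_subject in tenses.items():
        for subject, form in zip(SUBJECTS, per_subject):
            rows[(infinitive, tense, subject)] = forms.intern(form)
    return rows

forms = FormTable()
rows = build_rows(
    "take",
    {"Present": ["take", "take", "takes", "take", "take", "take"],
     "Past": ["took"] * 6},
    forms,
)
```

Here the twelve rows for "take" share three stored strings ("take", "takes", "took"); whether that saving matters depends on the usage questions raised earlier.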


Source: https://habr.com/ru/post/1492609/