How to omit "THE" in a search using PHP and MySQL

I am building an alphabetical search module for a project.

That is, it will look like this:

A B C D E F ... Z

When I press "A", the results should list the titles that start with "A", and likewise for every other letter.

Now my problem is this:

  • For example, there is a movie called "The Mummy".

  • At the moment, this movie is listed when I press "T".

  • But my client's requirement is that "The Mummy" should be listed when the user presses "M", not "T".

  • The reason is that words such as "a", "an", and "the" are articles and carry no meaning of their own.

I hope the problem is clear.

Any help would be appreciated.

3 answers

Assuming you don't want to change the contents of the table (and therefore accept slightly less efficient queries), the following should do the trick.
(If you are free to change the table, see the suggestions at the end of this answer.)

SELECT Title
FROM myTable
WHERE (Title LIKE 'x%' OR Title LIKE 'THE x%')
  -- AND Title NOT LIKE 'THE [^T]%'
ORDER BY Title

Notes:
- x stands for the desired letter (for example, LIKE 'A%').
- The additional condition "AND Title NOT LIKE 'THE [^T]%'" should be enabled only when x is the letter "T". For any other letter it must stay commented out, because it would also reject the titles matched by 'THE x%'.
- The negated character class [^T] (any character except T) is not supported everywhere; MySQL's LIKE, for one, understands only the % and _ wildcards. Where bracket classes are supported but negation is not, [^T] can be replaced with its positive equivalent, say [A-SU-Z0-9].

There are several other stop words to consider ("A", "AN", "OF" ...), but for books or movie titles, it is common practice to consider only "THE". If you must deal with other articles, the logic can be expanded, as in:

SELECT Title
FROM myTable
WHERE (Title LIKE 'x%'
       OR Title LIKE 'THE x%'
       OR Title LIKE 'A x%'
       OR Title LIKE 'AN x%')
  -- Add a guard only for the articles that start with the letter x.
  -- When x is T:
  -- AND Title NOT LIKE 'THE [^T]%'
  -- When x is A:
  -- AND Title NOT LIKE 'A [^A]%'
  -- AND Title NOT LIKE 'AN [^A]%'
ORDER BY Title
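Since the bracket classes used in the guards above are not available in every LIKE implementation, here is a minimal PHP sketch of the same filter written with plain % wildcards only. It rewrites the condition as "starts with the letter and not with an article, or starts with an article followed by the letter", which holds for every letter without special cases. The function name buildLetterFilter is hypothetical, the table and column names are taken from the queries above, and a case-insensitive collation is assumed.

```php
<?php
declare(strict_types=1);

/**
 * Build the query text and bound parameters for one letter of the A-Z bar.
 * buildLetterFilter is an illustrative name, not an existing API.
 *
 * @param string   $letter   single letter chosen by the user
 * @param string[] $articles leading words to skip, e.g. ['THE', 'A', 'AN']
 * @return array{0: string, 1: string[]} SQL text and its parameters, in order
 */
function buildLetterFilter(string $letter, array $articles = ['THE']): array
{
    if (preg_match('/^[A-Za-z]$/', $letter) !== 1) {
        throw new InvalidArgumentException('Expected a single letter A-Z.');
    }

    // Titles that start with the letter itself, but not with "<article> ".
    $plain = ['Title LIKE ?'];
    $params = [$letter . '%'];
    foreach ($articles as $article) {
        $plain[] = 'Title NOT LIKE ?';
        $params[] = $article . ' %';
    }

    // Titles where the letter follows a leading article.
    $afterArticle = [];
    foreach ($articles as $article) {
        $afterArticle[] = 'Title LIKE ?';
        $params[] = $article . ' ' . $letter . '%';
    }

    $where = '(' . implode(' AND ', $plain) . ')';
    if ($afterArticle !== []) {
        $where .= ' OR ' . implode(' OR ', $afterArticle);
    }

    return ["SELECT Title FROM myTable WHERE $where ORDER BY Title", $params];
}

[$sql, $params] = buildLetterFilter('T', ['THE', 'A']);
echo $sql, PHP_EOL;
print_r($params);
```

The returned pair is meant to be passed to a prepared statement (for example PDO::prepare followed by execute with the parameter array), so the letter never gets concatenated into the SQL text.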



There are better solutions if you can change the contents of the table. Several of them involve precomputing one or more additional columns (and maintaining them as records are added or changed).

  • See, for example, Cletus's answer in this post for the sort_column approach, where the extra column holds the title stripped of any unwanted leading noise word. Besides serving as the filter field for the OP's initial-letter search, this column can also be used to sort title lists more sensibly when they come from a filter unrelated to the initial letter or the start of the title (say, a search by year).
  • A variation on the option above is to store only the "effective" initial letter (ignoring the unwanted noise words), which makes the column smaller but less versatile.
  • The title column itself can be updated to hold a modified form of the title in which the leading noise word is moved to the end of the string, in parentheses. This practice is quite common in bibliographic catalogs.
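As a rough illustration of the three options above, the following PHP sketch derives all three stored values from one title. The function names, the array keys, and the article list are assumptions made for the example, and the mbstring extension is assumed to be available.

```php
<?php
declare(strict_types=1);

/**
 * Split a title into its leading article (if any) and the remainder.
 * Names and the default article list are illustrative assumptions.
 *
 * @param string[] $articles
 * @return array{0: string, 1: string} [article, rest]; article is '' when absent
 */
function splitLeadingArticle(string $title, array $articles = ['the', 'a', 'an']): array
{
    $title = trim($title);
    $alternatives = implode('|', array_map(fn ($a) => preg_quote($a, '/'), $articles));

    // The article must be followed by whitespace and more text,
    // so a title that consists only of "A" is left untouched.
    if (preg_match('/^(' . $alternatives . ')\s+(\S.*)$/iu', $title, $m) === 1) {
        return [$m[1], $m[2]];
    }

    return ['', $title];
}

/** Compute the value each of the three options would store. */
function titleForms(string $title): array
{
    [$article, $rest] = splitLeadingArticle($title);

    return [
        'sort_title' => mb_strtolower($rest),                    // full sort column
        'initial'    => mb_strtoupper(mb_substr($rest, 0, 1)),   // effective initial letter only
        'catalogue'  => $article === '' ? $rest : $rest . ' (' . $article . ')', // article moved to the end
    ];
}

print_r(titleForms('The Mummy'));
```

Which value to store depends on how the column will be used: the full sort title supports both filtering and ordering, while the single initial only supports the letter filter.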

What you are really asking here is how to remove "stop words" ("the" is just one example; you would also need to remove "of", "a", etc.). Trying to hardcode a set of stop words is a HUGE pain in the butt, and as your corpus changes, you will have to change the code.

Instead, you should try to use an algorithm that works out what the stop words are from your own data. Algorithms of this type are well known and are used by search engines. One that works very well is called TF-IDF.
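To give a feel for the idea, here is a deliberately simplified PHP sketch of the IDF half of TF-IDF, treating each title as one document. Words that occur in a large share of the titles get a low score and are candidate stop words. The function name, the sample titles, and the ASCII-only tokenizer are assumptions for the example; a real setup would need a much larger corpus and a threshold chosen by inspection.

```php
<?php
declare(strict_types=1);

/**
 * Inverse document frequency per word, one title per document:
 * idf(word) = log(N / number of titles containing the word).
 *
 * @param string[] $titles
 * @return array<string, float> words ordered from lowest to highest IDF
 */
function inverseDocumentFrequency(array $titles): array
{
    $docFreq = [];
    foreach ($titles as $title) {
        // Crude ASCII tokenizer: lowercase, split on anything that is not a letter or digit.
        $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            $docFreq[$word] = ($docFreq[$word] ?? 0) + 1;
        }
    }

    $idf = [];
    $total = count($titles);
    foreach ($docFreq as $word => $count) {
        $idf[$word] = log($total / $count);
    }

    asort($idf); // most widespread words (lowest IDF) first

    return $idf;
}

$titles = ['The Mummy', 'The Matrix', 'The Godfather', 'The Mummy Returns', 'Alien'];
$idf = inverseDocumentFrequency($titles);
echo array_key_first($idf), PHP_EOL; // the word found in the most titles
</gr_insert_placeholder>
```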


Basically, the way you do this is to have an extra column to sort by. If you have a movie table with a name column, add another column named sort_name. It should contain the name of the movie in lower case, with any words that you want to ignore (for example, "the", "a") stripped from the front.

Do not try to do this dynamically.

Whenever the name field is updated, you will also have to update the sort_name column. You can regenerate it at any time, and of course you should index it. Then just do:

 SELECT * FROM movies WHERE sort_name LIKE 'a%' 
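A minimal end-to-end sketch of this approach in PHP with PDO might look as follows. An in-memory SQLite database stands in for MySQL purely so the example runs without a server (this assumes the pdo_sqlite extension is installed); the makeSortName helper, the index name, and the sample rows are all illustrative assumptions.

```php
<?php
declare(strict_types=1);

// Lowercase the name and drop one leading article; the word list is an assumption.
function makeSortName(string $name): string
{
    return strtolower(preg_replace('/^(the|an|a)\s+(?=\S)/i', '', trim($name)));
}

// SQLite in memory is only a stand-in for the real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE movies (name TEXT NOT NULL, sort_name TEXT NOT NULL)');
$pdo->exec('CREATE INDEX idx_movies_sort_name ON movies (sort_name)');

// sort_name is computed in application code every time a row is written.
$insert = $pdo->prepare('INSERT INTO movies (name, sort_name) VALUES (?, ?)');
foreach (['The Mummy', 'Madagascar', 'Titanic', 'A Beautiful Mind'] as $name) {
    $insert->execute([$name, makeSortName($name)]);
}

// The letter filter is now a simple prefix match on the precomputed column.
$select = $pdo->prepare('SELECT name FROM movies WHERE sort_name LIKE ? ORDER BY sort_name');
$select->execute(['m%']);
$names = $select->fetchAll(PDO::FETCH_COLUMN);

print_r($names);
```

Because the pattern has no leading wildcard, a prefix match like this can typically be served from the index on sort_name, which is the main reason for precomputing the column rather than stripping articles inside the query.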

Source: https://habr.com/ru/post/1303639/

