Parsing text for contents of a limited size database table

I have a MySQL people table as part of a website, for example:

 | people_id | firstname | lastname  |
 |-----------|-----------|-----------|
 | 1         | John      | Lennon    |
 | 2         | Paul      | McCartney |
 | 3         | George    | Harrison  |
 | 4         | Ringo     | Starr     |
 | .         | .         | .         |

My table has about 2000 rows.

I also have a news section on the website. News items often mention "people", for example:

John Lennon and Paul McCartney wrote some of the most popular songs in rock history.

Is it possible (or reasonable / practical) to automatically analyze each news item, search it for "people" who are in the database, and turn their names into links? For example, the text above would be turned into this (or something functionally equivalent):

<a href="/people/1">John Lennon</a> and <a href="/people/2">Paul McCartney</a> wrote some of the most popular songs in rock history.

What would be the best way to do this? I have made some unsuccessful attempts using regular expressions in PHP, but I suspect that is not the best approach. I know little about JavaScript (and its frameworks), but I would be happy to use it if it makes sense here.

This is not an essential feature of the website (though I believe it would be a nice addition), so I would rather drop the feature than increase the page load time.

EDIT

In the original question, I left out some details to keep it short.

In fact, this is a website for a football club: all "people" are members of the website and can log in and add or edit news (for example, match reports), in which they often refer to other "people". So I am not the only one adding news; it can be added by any of the other users (circa 2000).

Although membership is restricted, in that people must be approved before joining, the system has to cope with complications such as people with unusual names, and there are several cases of more than one person sharing the same name.

I implemented a hacky solution in which a custom markup code marks people's names (for example, [p=1]John Lennon[/p]), but I found that out of the 2,000 users of the site, only a few use it.

For what it's worth, the website is www.ouafc.com, and an example of news is www.ouafc.com/news/312.

2 answers

I don't know much about PHP, but here's a quick JavaScript sketch that does it with jQuery 1.4:

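 <div id="maindiv">
   John Lennon and Paul McCartney wrote some of the most popular songs in the history of rock music.
 </div>
 <script>
 var myPage = {
   // name -> id lookup; built server-side from the database
   map: { "John Lennon": 1, "Paul McCartney": 2, "Rock Music": 3 },

   linkify: function (domEl) {
     var htmlcopy = domEl.html();

     function buildLink(txt, loc) {
       return '<a href="/blah/' + loc + '">' + txt + '</a>';
     }

     for (var name in myPage.map) {
       // escape regex metacharacters so unusual names cannot break the pattern
       var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
       var pattern = new RegExp(escaped, "gi");
       // "$&" inserts the matched text, so the capitalisation used on the page is kept
       htmlcopy = htmlcopy.replace(pattern, buildLink("$&", myPage.map[name]));
     }
     domEl.html(htmlcopy);
   }
 };

 // run once the DOM is ready
 $(document).ready(function () {
   myPage.linkify($("#maindiv"));
 });
 </script>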

myPage.map would be built on the server side from the database. The linkify call could also run in the callback of an Ajax request (which fetches the map), so that it does not hold up the rest of the page.
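A rough sketch of how such a map might be assembled from the table rows, assuming the rows arrive as objects with the column names from the question. The function name `buildNameMap` and the returned `ambiguous` list are made up for illustration. Names shared by more than one person are left out of the map, since a bare name cannot say which person was meant:

```javascript
// Hypothetical helper: rows look like { people_id, firstname, lastname }.
function buildNameMap(rows) {
  // Collect every id seen for each full name.
  const idsByName = new Map();
  for (const row of rows) {
    const name = `${row.firstname} ${row.lastname}`;
    const ids = idsByName.get(name) || [];
    ids.push(row.people_id);
    idsByName.set(name, ids);
  }

  const map = {};        // unique name -> people_id
  const ambiguous = [];  // names that belong to more than one person
  for (const [name, ids] of idsByName) {
    if (ids.length === 1) {
      map[name] = ids[0];
    } else {
      ambiguous.push(name);
    }
  }
  return { map, ambiguous };
}
```

The `map` part can be serialised with `JSON.stringify` and handed to the page; what to do with the ambiguous names (ignore them, or ask the author to disambiguate) is a separate decision.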


Your best bet is to have the author of the news item mark explicitly where a name appears. That is the only way to avoid missed or incorrectly parsed names, and it avoids the heavy processing cost of scanning each news item for every possible name in the database.

Maybe something with a Twitter-like syntax, such as:

 @[John Lennon] and @[Paul McCartney] wrote some of the most popular songs in the history of rock music. 

Then run it through a custom Markdown-style view filter when you want to display the news. The filter can parse these markers, look up the corresponding database record, and generate a link.

It would be more efficient to convert the @[] markers to links before inserting the news story into the database, but that is more tightly coupled: if a user is deleted or their identifier changes, you have a broken link. Storing the @[] markers also keeps editing stories easy.
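As an illustration of the display-time filter, here is a minimal JavaScript sketch. It assumes the stored story is plain text (so everything else is HTML-escaped) and that the name lookup is available as a `Map` from full name to id; `renderMentions`, `escapeHtml` and the `/people/` URL pattern taken from the question are illustrative choices, not an existing API. Markers whose name is not found are rendered as plain text:

```javascript
// Escape text so it is safe to place inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// story: plain text containing @[Full Name] markers.
// idsByName: Map of full name -> people_id (hypothetical shape).
function renderMentions(story, idsByName) {
  const marker = /@\[([^\]]+)\]/g;
  let html = "";
  let last = 0;
  let match;
  while ((match = marker.exec(story)) !== null) {
    // Text between the previous marker and this one.
    html += escapeHtml(story.slice(last, match.index));
    const name = match[1];
    const id = idsByName.get(name);
    html += id === undefined
      ? escapeHtml(name) // unknown name: drop the marker, keep the text
      : `<a href="/people/${encodeURIComponent(id)}">${escapeHtml(name)}</a>`;
    last = marker.lastIndex;
  }
  // Remaining text after the last marker.
  return html + escapeHtml(story.slice(last));
}
```

Because only the marked names are looked up, the cost depends on the number of markers in the story rather than on the number of people in the table.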

Update

If you must have automatic detection and conversion of names to links, it can be done, at a fairly serious performance cost that only grows as names are added:

 function linkify_names($news) {
     // query() stands in for whatever database helper you use
     $people = query('select people_id, firstname, lastname from people');
     $from = $to = array();
     foreach ($people as $person) {
         // escape regex metacharacters (and the "/" delimiter) in the name
         $name = preg_quote("$person->firstname $person->lastname", '/');
         // match [boundary]$name[boundary], case insensitive
         $from[] = "/(\b)($name)(\b)/i";
         // include boundaries in replacement; maintain case of found name
         $to[] = '$1<a href="/people/' . $person->people_id . '">$2</a>$3';
     }
     return preg_replace($from, $to, $news);
 }

The difference is that instead of looking only for names marked with @[], you have to take every name in the database and search the whole text for each one. You cannot rely on a simple regular expression to find names in the body of a news item.
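One way to soften the cost, shown here as a JavaScript sketch and not as a measured optimisation, is to combine all names into a single alternation so the text is scanned once instead of once per person. This is a different technique from the per-name loop above. It assumes the news text is plain text with no existing links, that names are unique, and that the lookup is a `Map` from full name to id; `linkifyNames` and `escapeRegExp` are illustrative names:

```javascript
// Escape regex metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// news: plain text. idsByName: Map of full name -> people_id (hypothetical shape).
function linkifyNames(news, idsByName) {
  if (idsByName.size === 0) return news;

  // Lower-cased lookup, because matching is case-insensitive.
  const idsByLowerName = new Map();
  for (const [name, id] of idsByName) {
    idsByLowerName.set(name.toLowerCase(), id);
  }

  // Longest names first, so a longer name wins over a shorter one it contains.
  const alternatives = [...idsByName.keys()]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");

  // Unicode-aware boundaries: no letter or digit directly before or after the name.
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}])(?:${alternatives})(?![\\p{L}\\p{N}])`,
    "giu"
  );

  return news.replace(pattern, (found) => {
    const id = idsByLowerName.get(found.toLowerCase());
    // Leave the text alone if the lookup fails for any reason.
    return id === undefined ? found : `<a href="/people/${id}">${found}</a>`;
  });
}
```

This still cannot tell two people with the same name apart, and it will happily link a name that happens to refer to someone else, which is why explicit markers remain the more reliable option.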


Source: https://habr.com/ru/post/1300191/

