My first thought was to use regular expressions to pull everything that might be a company name out of the article. Company names vary a lot, but almost always every word in the name begins with a capital letter, so I think this could work with only a few false positives (situations where a person happens to share a name with a company).
There is a reason we use a prefix like # or @ for tags and name references: it gives you a pattern to match against. I think you will shoot yourself in the foot if you allow "false positives" on this scale.
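To illustrate the problem, here is a minimal sketch of the capitalized-words approach. The pattern, the function name and the sample sentence are my own assumptions for illustration, not something taken from the question.

```python
import re

# Naive pattern (assumed for illustration): a run of two or more
# words that each start with a capital letter.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def capitalized_candidates(text):
    """Return every run of two or more capitalized words in text."""
    return CAPITALIZED_RUN.findall(text)


# Made-up sample sentence.
sample = "Shares of American Company rose after John Smith spoke in New York."
candidates = capitalized_candidates(sample)
print(candidates)
```

On this sentence the pattern returns the company, but also a person and a city, and nothing in the match tells you which is which.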
I would go with the standard format that articles use for tickers, in which the company name is followed by its stock ticker, such as American Company Co. (ACCO). This lets you simply search for the (*) references.
Unless you adhere to a format, you will have a hard time getting fast, relevant and accurate results.
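A rough sketch of searching for that format could look like the following. It assumes that a ticker is one to five uppercase letters in parentheses and that the company name is the run of capitalized words directly in front of it. Both rules, and the names `TICKER_REF` and `find_ticker_references`, are my own assumptions, so adjust them to the articles you actually have.

```python
import re

# Assumed format: "<Capitalized Words> (<TICKER>)",
# for example "American Company Co. (ACCO)".
TICKER_REF = re.compile(
    r"\b(?P<name>(?:[A-Z][\w&.,'-]*\s+)+)"  # capitalized words of the name
    r"\((?P<ticker>[A-Z]{1,5})\)"           # ticker inside parentheses
)


def find_ticker_references(text):
    """Return (company name, ticker) pairs found in text."""
    return [
        (m.group("name").strip(), m.group("ticker"))
        for m in TICKER_REF.finditer(text)
    ]


# Made-up sample sentence.
sample = (
    "Shares of American Company Co. (ACCO) rose, "
    "while John Smith of New York sold his stake."
)
print(find_ticker_references(sample))
```

Here only the American Company Co. / ACCO pair is returned, because the person and the city are not followed by a ticker. One limitation of this sketch: a capitalized word that happens to sit directly before the name, such as the first word of a sentence, is captured as part of the name.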
A comprehensive solution would be server-side processing for false positives: downloading a complete list of names and crunching for matches, with some kind of alert system in which the warnings get reviewed. But that is just too much overhead when a simple, long-established format is possible.
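For comparison, the list-matching step of that heavier approach might be sketched as below. The `KNOWN_COMPANIES` table is a hypothetical stand-in for the downloaded list, and the helper names are invented for this example. A real setup would also need the download job and the review workflow around it.

```python
# Hypothetical stand-in for a downloaded list of company names,
# keyed by normalized name.
KNOWN_COMPANIES = {
    "american company co.": "ACCO",
}


def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def split_candidates(candidates, known=KNOWN_COMPANIES):
    """Split candidates into confirmed matches and items needing review."""
    confirmed, needs_review = [], []
    for candidate in candidates:
        key = normalize(candidate)
        if key in known:
            confirmed.append((candidate, known[key]))
        else:
            needs_review.append(candidate)
    return confirmed, needs_review


confirmed, needs_review = split_candidates(
    ["American Company Co.", "John Smith", "New York"]
)
print(confirmed)
print(needs_review)
```

Everything that lands in `needs_review` still has to be looked at by someone, which is the overhead I mean.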
Jakub