Find phone numbers - search for numbers with and without phone extension

I have a table with 130,000 entries containing phone numbers. All numbers have the form +4311234567. A number always includes the international country code and the area code, followed by the phone number and sometimes an extension.

There is a web service that looks up the caller's number in the table, and it is already running. Now the client wants the service to also return a result when someone calls from a company whose number is in the database but whose extension is not.

An example of the table:

  id | telephonenumber | name
  1 | +431234567 | company A
  2 | +431234567890 | employee in company A
  3 | +4398765432 | company b

Now, if someone from company A calls from a different extension, for example +43123456777, the service should return id 1. The problem is that I do not know how many digits an extension has. It may have 3, 4, or more digits.

Is there some kind of pattern matching for strings that could do this?

The data is stored in a SQL Server 2005 database.

thanks

EDIT:
I receive the phone calls from the CRM system. I talked to the CRM administrator, and he is trying to send me the data in a different format:

  id | telephonenumber | extension | name
  1 | +431234567 | | company A
  2 | +431234567 | 890 | employee in company A
  3 | +4398765432 | | company b
+4
7 answers

Given that the number of digits in the extension can differ for each company, and the number of digits in the number can differ for each country and area code, this is not an easy problem to solve efficiently.

Even if you split the data table into a base number and an extension, you still have to split the incoming number into a base number and an extension, which I think is the hard part.

Here is what I would try:

Original format

  • Try matching the incoming number against the database.
    • If it matches one entry, you have your answer - a specific person.
    • If it matches more than one entry, something is wrong with the data, so fail.
    • Otherwise, you need to find the company:
  • Drop the last digit from the incoming number and try matching against the database again.
    • If the number of digits falls below some threshold (probably 6 digits), stop: the search has failed. This just limits the number of database lookups when the number is not found.
    • If it matches no entries, repeat this step.
    • If it matches more than one entry, something is wrong with the data, so fail.
    • If it matches one entry, you have the next best answer - the company.

For example, searching for "+43123456777":

  • +43123456777 matches 0 records.
  • +4312345677 matches 0 records.
  • +431234567 matches 1 record: "company A".

The main way this approach fails is when a company has extensions of varying length. For example, consider what happens if both 431234567890 and 43123456789 are valid numbers, but only the second is in the database. If the incoming number is 431234567890, then 43123456789 will be matched in error.
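The steps for the original format could be sketched roughly as follows. This is only an illustration in Python, not the code of the existing service: the `RECORDS` dictionary stands in for the table from the question, `lookup_by_truncation` is a hypothetical name, and the threshold is assumed to count characters including the leading "+". Because a dictionary cannot hold duplicate numbers, the "more than one entry" case cannot occur here.

```python
# Hypothetical in-memory stand-in for the table in the question.
RECORDS = {
    "+431234567": "company A",
    "+431234567890": "employee in company A",
    "+4398765432": "company b",
}

MIN_LENGTH = 6  # assumed cut-off, counted in characters including "+"


def lookup_by_truncation(incoming, records=RECORDS, min_length=MIN_LENGTH):
    """Return (matched_number, name), or None when nothing matches."""
    candidate = incoming
    while len(candidate) >= min_length:
        if candidate in records:
            return candidate, records[candidate]
        candidate = candidate[:-1]  # drop the last digit and try again
    return None


print(lookup_by_truncation("+43123456777"))
```

The same function also reproduces the failure mode described above: with only "+43123456789" stored, a call from "+431234567890" is attributed to that shorter number.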

Split format

It is a bit more complicated, but more reliable.

  • Try matching the incoming number against the database.
    • If it matches one record, you have your answer - the company.
    • If it matches more than one record, take the record without an extension and you have found the company.
    • Otherwise, you need to find the company's base number and the extension:
  • Drop the last digit from the incoming number and try matching against the database again.
    • If the number of digits falls below some threshold (probably 6 digits), stop: the search has failed. This just limits the number of database lookups when the number is not found.
    • If it matches no records, repeat this step.
    • If it matches one record, you have found your answer - the company.
    • If it matches more than one record, you have found the company's base number and therefore now know the extension, so you can try to find a specific person:
  • Strip the base number from the start of the original incoming number and use the remainder to search the extensions of the records with this base number.
    • If it matches one record, you have found a specific person.
    • If it does not match a specific person, take the record without an extension and you have found the company.

For example, searching for "+43123456777":

  • +43123456777 matches 0 records.
  • +4312345677 matches 0 records.
  • +431234567 matches 2 records: "(empty): company A" and "890: employee in company A".
  • Among these two matches, the extension "77" matches nothing, so return the record with the empty extension: "company A".
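A condensed sketch of the split-format lookup, again in Python and only as an illustration. The `Record` tuple, the `RECORDS` list and the name `lookup_split` are assumptions made for the example. The sketch folds the steps above into one loop: at the first length where the base number matches, the remaining digits are treated as the extension, and an unknown extension falls back to the record with the empty extension.

```python
from collections import namedtuple

# Hypothetical stand-in for the table in the split format from the question.
Record = namedtuple("Record", "id number extension name")

RECORDS = [
    Record(1, "+431234567", "", "company A"),
    Record(2, "+431234567", "890", "employee in company A"),
    Record(3, "+4398765432", "", "company b"),
]


def lookup_split(incoming, records=RECORDS, min_length=6):
    """Return the best matching Record, or None when no base number matches."""
    for length in range(len(incoming), min_length - 1, -1):
        base, extension = incoming[:length], incoming[length:]
        rows = [r for r in records if r.number == base]
        if not rows:
            continue  # no record has this base number; drop another digit
        for row in rows:
            if row.extension == extension:
                return row  # exact extension (or the company itself if empty)
        for row in rows:
            if row.extension == "":
                return row  # unknown extension: fall back to the company
        return None  # base number known, but no company record without extension
    return None


print(lookup_split("+43123456777").name)
```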

Implementation Notes

This algorithm, as noted above, has some performance issues. If you are searching a database with many rows, the cost grows linearly with the length of the phone number, especially when the database contains no similar numbers (for example, if the incoming number is from Kazakhstan but there are no Kazakhstani numbers in the database).

You can add some optimizations relatively easily. If most of the companies you deal with use 3- or 4-digit extensions, you can start by removing, say, 4 digits from the end and then do a binary chop until you get your answer. For a 15-digit number this would cut the work to 4 or 5 searches in many cases, and to no more than 6.

In addition, each time you narrow the selection, you can search only within the previous result set instead of the entire database.

Additional Implementation Notes

Having finally understood how the answer that inverts the lookup works (checking each stored number against the incoming number), I can see that it is a much simpler and more elegant solution. I wish I had thought of looking for the database numbers in the incoming number, rather than the other way round.

My only concern is that doing this against every telephonenumber in the database could place an excessive load on the server. I would suggest benchmarking that solution under peak load to see whether it causes problems. If not, use it. If so, consider using the simple form of my algorithm and repeating the load tests. If performance is still too slow, try my binary chop suggestion.

+2

Is there any way to determine which part of a stored number is the extension? Or are the "base" numbers also stored without an extension? If so, you could simply check whether a number in your database (without extension) is a prefix of the number being checked. A prefix is a substring that starts at the beginning of the string.

But if your database only contains numbers with extensions, and there is no way to find out how many digits belong to the extension, I believe you cannot find an exact solution.

+4

Instead of looking for a phone number in the database, you can invert the problem and check each number in the database to see whether it equals or is a prefix of the incoming number.

Assuming you get a phone number, for example +431234567891, from the caller ID, then

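 -- CHARINDEX(...) = 1 means the stored number occurs at position 1 of the incoming number, i.e. it is a prefix
 SELECT name, id FROM [Table] WHERE CHARINDEX(telephonenumber, '+431234567891') = 1;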

will return the company, and for +431234567890 it will return 2 records:

  • the company
  • the actual extension

If you can handle two rows being returned on the client side, the above should be all you need.
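To make the inverted check concrete, here is a small Python sketch of the same idea outside the database. The `RECORDS` list and the name `prefix_matches` are hypothetical, and ordering the hits longest first (so the most specific match comes first) is an addition of this sketch, not part of the query above.

```python
# Hypothetical stand-in for the table: (id, telephonenumber, name).
RECORDS = [
    (1, "+431234567", "company A"),
    (2, "+431234567890", "employee in company A"),
    (3, "+4398765432", "company b"),
]


def prefix_matches(incoming, records=RECORDS):
    """Return rows whose stored number equals or is a prefix of `incoming`."""
    hits = [row for row in records if incoming.startswith(row[1])]
    # Longest stored number first, so the most specific match leads the list.
    return sorted(hits, key=lambda row: len(row[1]), reverse=True)


print(prefix_matches("+431234567891"))
print(prefix_matches("+431234567890"))
```

With this ordering the client can simply take the first row when two rows come back.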

Preprocessing the data would be better (in terms of performance), but for that you would need to describe the data in more detail, for example:

  • are extensions only 3 or 4 digits long,
  • is the base number always 9 or 10 digits long,
  • do you always have at least one extension number for companies with extensions, etc.
+2

The number of digits in the extension is specific to the PBX. The number of digits in the area code + phone number depends on the country / operator.

One way to do this is to define additional rules, for example ...

+43123 | 12

... to say that every number starting with +43123 is a 12-digit number and that everything after it is an extension. This lets you use configurable (rather than hard-coded) data to indicate where the extension starts.
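Such a rule table might be applied like this. The sketch is in Python and rests on assumptions: `RULES` and `split_number` are hypothetical names, the length in a rule is taken to count digits only (not the leading "+"), and when several rules match, the longest prefix wins.

```python
# Hypothetical rule table: number prefix -> length of the base number in digits.
# The single entry mirrors the "+43123 | 12" example; counting digits without
# the leading "+" is an assumption of this sketch.
RULES = {"+43123": 12}


def split_number(incoming, rules=RULES):
    """Split an incoming number into (base_number, extension)."""
    # Try longer prefixes first so more specific rules take precedence.
    for prefix in sorted(rules, key=len, reverse=True):
        if incoming.startswith(prefix):
            cut = rules[prefix] + 1  # +1 skips the leading "+"
            return incoming[:cut], incoming[cut:]
    return incoming, ""  # no rule known: treat the whole number as the base


print(split_number("+431231234567890"))
```

The base number returned here can then be looked up directly, and the extension matched separately.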

Another way could be to insist that for any records with a number with extension there should also be a corresponding number without extension, as shown in the example of company A.

+1

Well, my understanding of the phone number system is that there are no two valid, complete numbers where one is a prefix of the other. The usual joke here is to give out your number as 11 05 32 or similar, where 110 is the German police emergency number.

So, if you can change the structure of the database and preprocess the data, you can look for numbers that share a prefix (sort them first; if a longer number starts with a shorter one, the longer one carries an extension). Every match gives you:

  • the base number (the shortest)
  • direct numbers with an extension (all the longer ones)

I would tag those in the database for faster searches if possible.
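The sort-and-compare preprocessing could look roughly like this in Python. It is a sketch under the assumption stated above, namely that a stored number which is a prefix of another stored number is that number's base; `tag_numbers` is a hypothetical helper name.

```python
def tag_numbers(numbers):
    """Map every stored number to its base number.

    A number that maps to itself is a base number; any other number is a
    direct number with an extension.
    """
    tagged = {}
    base = None
    # In sorted order, all numbers starting with a given prefix directly
    # follow that prefix, so one pass is enough.
    for number in sorted(set(numbers)):
        if base is None or not number.startswith(base):
            base = number  # no shorter stored number is a prefix of this one
        tagged[number] = base
    return tagged


print(tag_numbers(["+431234567890", "+4398765432", "+431234567"]))
```

The resulting mapping could be stored in an extra column so that the lookup at call time only needs to search the base numbers.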

This approach does not work when there is a standard default extension. Here, many companies publish something like 1234567-0 as their external number, where the 0 can be replaced by an extension of 2-4 digits. In those cases my approach falls short. For your example data it would work, though.

+1

If you are dealing with phone numbers from different countries, this will be nearly impossible. Lengths often vary, even within the same country. If you know what the lengths will be (or you are willing to maintain a list, as ChrisW suggests), you can use LEFT(field, x) to truncate the phone number before searching for the company's number. Note that if you do this in a join, it will probably be much slower, because the function has to run on every row.

+1

This is not possible without additional information: if your table is structured as described above, the system has no means of knowing which part is the base number and which part is the extension. Thus, it would return "company b" for any (unknown) number starting with "+439".

EDIT (@MarkBooth)

I agree that it is impossible without additional information. Just to clarify: let's say we have the following information in our database

 ...
 +43316852132 - ....
 +433168731 - Company A (reception)
 +433168739999 - Company A, Mr. X
 +433168911321 - ....
 ...

The structure of these numbers is +43 (316) 873 - 1, which the program does not know. Therefore, if the number +43316872133 (structured as +43 (316) 87 21 33), which is not in the database, calls, you (and therefore your software :)) cannot determine whether it belongs to company A without additional information.

The only solution is to maintain "base numbers" for companies against which you can perform a simple prefix search.

-1

Source: https://habr.com/ru/post/1308772/

