How does Firefox "awesome" bar match strings?

The question is how string matching is done to find the matching entries in firefox 3 url bar . Substituting substrings on each record can be slow. What algorithm can be used to quickly match anywhere?

+27
string algorithm firefox
Feb 12 '09 at 10:24
source share
4 answers

The usual way to quickly substring is to create a data structure that contains all the suffixes of all the strings you want to find. Depending on the organization, this may be called a "suffix" or a "suffix array". For example, if you have 1000 lines and each of them has a length of 50 characters, you have 1,000 x 50 non-trivial suffixes, i.e. Your suffix array will have 50,000 entries.

Then, to perform the mapping, you perform a binary search (if an array) or a tree search (if a tree) to find all these suffixes in the data structure, the beginning of which corresponds to the line written in the search window. Since this is the beginning of the suffix that you are matching, you can use standard search procedures (binary search, downstream) to quickly get the result. Each suffix is ​​associated with the lines in which it appears.

Example: you have two lines: "CAT" and "DOT". Your suffix array looks like this (pay attention to lexicography = alphabetical ordering):

#1 AT --> CAT #2 CAT --> CAT #3 DOT --> DOT #4 OT --> DOT #5 T --> DOT, CAT 

Note that there are six suffixes, but two of them are the same (the last “T” in “CAT” and “DOT”), and both are represented by the same entry (# 5).

Now that the user is entering a search, for example. "OT" (which must match "DOT"), you can do a simple search by lexicographic ordering during registration, because now you are looking for an initial match in an array of suffixes.

This is the basic principle for quick text search when the search pattern does not contain wildcards.

+27
Feb 12 '09 at 17:39
source share

The location algorithm in Firefox 3.0 is a bit more complicated. It will receive data from two (three for Firefox 3.5 and later) different requests:

  • For the first query, he checks the moz_inputhistory table to see if the current search string is stored in this table. These results are sorted by "rank", which is a number that determines how recently it is used. This number is decomposed once a day. This search makes the location bar adaptive to what you select over time (implemented in bug 95739 ).

  • In Firefox 3.5 and later, it checks the moz_keyword table for bookmarks with a keyword matching the search text.

  • The last request, it goes through each entry in moz_places, which will include all history visits and bookmarks. These results are ordered by frecency .

For all three of these cases, the following algorithm (called “searchable text” below) is used to match the tags, title, and URL. It is a little difficult to explain in words, so it would be easier to look at the source code .

  • The search string is divided into tokens defined by a space (each non-space word "is a token").
  • For each token, start comparing each character of the text with searchability and the token in Unicode, case insensitive, until it matches exactly. If one character set does not match, continue to the next "
  • If we match any searchable text, it will appear in the location bar.
If we don’t have enough results (by default 12), we will repeat the search for the query that goes through each bookmark and visit the history, and test the text with the ability to search in Unicode, case insensitive for each token anywhere (not only at word boundaries).

Hope this explains it in an understandable way!

+31
Jul 30 '09 at 18:30
source share

Awesomebar offers URLS using the Places distribution algorithm .

According to the Mozilla developer site:

The word "frecency" in itself is a combination of the words "frequency" and "regency".

+21
Feb 12 '09 at 10:38
source share

I think this remains for the underlying repository: the SQLite database in which Firefox stores the pages you visited offers a quick function to compare the substring.

I think Firefox issues an SQL query to the database. This is pretty fast because the database is cached in your memory.

+2
Feb 12 '09 at 10:31
source share



All Articles