Grouping Search Keywords

I have a log file containing search queries entered into my site search engine. I would like to “group” related search queries together for a report. I use Python for most of my webapp, so the solution can be based on Python, or I can load the strings into Postgres if it is easier to do with SQL.

Sample data:

dog food
good dog trainer
cat food
veterinarian

Groups should include:

cat:
cat food

dog:
dog food
good dog trainer

food:
dog food
cat food

etc.

Ideas? Maybe some kind of "indexing algorithm"?

5 answers
# read all queries, stripping trailing newlines
with open('data.txt', 'r') as f:
    raw = [line.strip() for line in f]

# generate set of all possible groupings
groups = set()
for line in raw:
    for word in line.split():
        groups.add(word)

# parse input into groups
for group in groups:
    print("Group '%s':" % group)
    for line in raw:
        # compare whole words so that 'cat' does not match e.g. 'education'
        if group in line.split():
            print(line)
    print()

# consider storing the groups in a dictionary instead of just printing them

Output for the sample data:

Group 'trainer':
good dog trainer

Group 'good':
good dog trainer

Group 'food':
dog food
cat food

Group 'dog':
dog food
good dog trainer

Group 'cat':
cat food

Group 'veterinarian':
veterinarian
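Following the comment at the end of the code above, here is a minimal sketch that stores the groups in a dictionary instead of printing them. It keeps this answer's approach (one pass over all queries for every candidate word) and assumes the queries are already a list of stripped strings; the function name `group_queries` is hypothetical.

```python
def group_queries(queries):
    """Map each word to the list of queries that contain it as a whole word."""
    # collect every distinct word, like the set-building loop above
    words = {word for query in queries for word in query.split()}
    # for each word, scan all queries again (simple, but O(words * queries))
    return {
        word: [query for query in queries if word in query.split()]
        for word in words
    }

queries = ['dog food', 'good dog trainer', 'cat food', 'veterinarian']
groups = group_queries(queries)
print(groups['food'])  # ['dog food', 'cat food']
```

With a real log, `queries` could be read as `[line.strip() for line in f if line.strip()]` from an open file handle.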

If you would rather do it in SQL, the simplest option is a LIKE query per keyword, for example:

SELECT * FROM queries WHERE querystring LIKE '%dog%';

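Keep in mind that this also matches queries where the keyword is only part of a longer word, such as "dogbah". To search for several keywords at once, combine the conditions with OR.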
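A `LIKE '%dog%'` pattern also catches a query like "dogbah". One way around that is to match on word boundaries. Below is a minimal Python sketch of the idea using the standard `re` module (the function name `matching_queries` is hypothetical): `\b` anchors each keyword to word boundaries, and joining several keywords with `|` plays the role of combining conditions with OR.

```python
import re

def matching_queries(queries, keywords):
    """Return the queries that contain any of the keywords as a whole word."""
    if not keywords:
        return []  # an empty alternation would match every query
    # re.escape keeps keywords with special characters literal;
    # \b...\b means 'dog' matches 'dog food' but not 'dogbah'
    alternatives = '|'.join(re.escape(k) for k in keywords)
    pattern = re.compile(r'\b(?:' + alternatives + r')\b', re.IGNORECASE)
    return [q for q in queries if pattern.search(q)]

queries = ['dog food', 'dogbah', 'good dog trainer', 'cat food']
print(matching_queries(queries, ['dog']))         # ['dog food', 'good dog trainer']
print(matching_queries(queries, ['dog', 'cat']))  # ['dog food', 'good dog trainer', 'cat food']
```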





Building the index (pseudocode):

create empty set S for name-value pairs.
for each line L parsed
  for each word W in line L
    seek W in set S -> Item
    if not found -> add word W -> (empty array) to set S
    add line L reference to array in Item
  endfor
endfor
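A minimal Python translation of the indexing pseudocode, assuming the lines have already been read into a list of strings. A `dict` plays the role of the set S of name-value pairs, and `build_index` is a hypothetical name. Unlike the pseudocode, this version skips repeated words within a line, so a query like "dog eat dog" is listed only once under "dog".

```python
from collections import defaultdict

def build_index(lines):
    """Map each word W to the list of lines it occurs in (the set S above)."""
    index = defaultdict(list)  # a missing word starts with an empty array
    for line in lines:
        # dict.fromkeys drops repeated words in a line but keeps their order
        for word in dict.fromkeys(line.split()):
            index[word].append(line)
    # return a plain dict so later lookups cannot silently add empty entries
    return dict(index)

queries = ['dog food', 'good dog trainer', 'cat food', 'veterinarian']
index = build_index(queries)
print(index['dog'])  # ['dog food', 'good dog trainer']
```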

Lookup (search for a word W):

seek W in set S into Item
if found return array from Item
else return empty array.
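The lookup step maps directly onto `dict.get` with a default. A minimal sketch, assuming the index is a `dict` from word to list of lines; `lookup` is a hypothetical name and the small index is made-up sample data.

```python
def lookup(index, word):
    """Return the lines recorded for word, or an empty list if it was never seen."""
    # dict.get avoids a KeyError for unknown words; list() returns a copy
    # so callers cannot accidentally modify the lists stored in the index
    return list(index.get(word, []))

index = {'dog': ['dog food', 'good dog trainer'], 'food': ['dog food', 'cat food']}
print(lookup(index, 'food'))    # ['dog food', 'cat food']
print(lookup(index, 'parrot'))  # []
```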

A modified version of @swanson's answer (not tested):

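from collections import defaultdict
from itertools import chain

# read the queries, dropping trailing newlines and blank lines
with open('data.txt') as f:
    lines = [line.strip() for line in f if line.strip()]

# generate set of all possible words
words = set(chain.from_iterable(line.split() for line in lines))

# parse input into groups, matching whole words only
# (a plain substring test would put 'cat' and 'education' together)
groups = defaultdict(list)
for line in lines:
    line_words = line.split()
    for word in words:
        if word in line_words:
            groups[word].append(line)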
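To turn a word-to-queries `groups` mapping into the report the question describes, here is a sketch that lists the largest groups first. The function name `format_report` and the `min_size` cut-off (for hiding one-query groups such as "veterinarian") are assumptions, not part of the answer above.

```python
def format_report(groups, min_size=1):
    """Render word -> queries groups as text, largest groups first."""
    out = []
    # sort by descending group size, breaking ties alphabetically
    for word, queries in sorted(groups.items(), key=lambda item: (-len(item[1]), item[0])):
        if len(queries) < min_size:
            continue  # skip groups too small to be interesting
        out.append(word + ':')
        out.extend('    ' + q for q in queries)
        out.append('')  # blank line between groups
    return '\n'.join(out)

groups = {
    'cat': ['cat food'],
    'dog': ['dog food', 'good dog trainer'],
    'food': ['dog food', 'cat food'],
}
print(format_report(groups, min_size=2))
```

Sorting by group size puts the broadest topics at the top of the report. A plain alphabetical order would be `sorted(groups.items())`.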

Source: https://habr.com/ru/post/1733063/

