Counting the number of hits for a given search query / term for one document in Oracle

I have a table containing snippets of text from documents, which I join against. Using Oracle Text, I can get a snippet of text containing my search query (using ctx_doc.snippet). However, I now need to report the number of times the search query was found in each document that matched my join against all the documents I have. I have more than 100K documents, but the join filters the result down to a subset.

Reading online, there is CTX_QUERY.COUNT_HITS, which I could use, but it gives a count across all documents. If COUNT_HITS took a textkey parameter, life would be good, but it does not.

How can I get the number of hits for a given query in a single document in Oracle?

+4
3 answers

You can continue to use CTX_DOC; the HIGHLIGHT procedure can be slightly contorted to do exactly what you are asking for.

Using this environment:

create table docs ( id number, text clob, primary key (id) );

Table created.

insert all
 into docs values (1, to_clob('a dog and a dog'))
 into docs values (2, to_clob('a dog and a cat'))
 into docs values (3, to_clob('just a cat'))
select * from dual;

3 rows created.

create index i_text_docs on docs(text) indextype is ctxsys.context;

Index created.

CTX_DOC.HIGHLIGHT has an OUT parameter of type HIGHLIGHT_TAB, which holds one entry per hit in the document, so its count is the number of hits.

declare
   l_highlight ctx_doc.highlight_tab;
begin
  ctx_doc.set_key_type('PRIMARY_KEY');

  for i in ( select * from docs where contains(text, 'dog') > 0 ) loop
     ctx_doc.highlight('I_TEXT_DOCS', i.id, 'dog', l_highlight);
     dbms_output.put_line('id: ' || i.id || ' hits: ' || l_highlight.count);
  end loop;

end;
/
id: 1 hits: 2
id: 2 hits: 1

PL/SQL procedure successfully completed.

Obviously, if you do this in a query, then the procedure is not the best thing in the world, but you can wrap it in a function if you want:

create or replace function docs_count (
        Pid in docs.id%type, Ptext in varchar2
         ) return integer is

   -- one entry per highlighted hit in the document
   l_highlight ctx_doc.highlight_tab;
begin
  -- the textkey passed to CTX_DOC is a primary key value
  ctx_doc.set_key_type('PRIMARY_KEY');
  ctx_doc.highlight('I_TEXT_DOCS', Pid, Ptext, l_highlight);
  return l_highlight.count;
end;
/

This can then be called as normal.

select id
     , to_char(text) as text
     , docs_count(id, 'dog') as dogs
     , docs_count(id, 'cat') as cats
  from docs;

        ID TEXT                  DOGS       CATS
---------- --------------- ---------- ----------
         1 a dog and a dog          2          0
         2 a dog and a cat          1          1
         3 just a cat               0          1
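
Since the question only needs counts for documents that match the search, the function call can be limited to those rows. This is a minimal sketch, assuming the docs table, the I_TEXT_DOCS index and the docs_count function defined above; the alias d is only for illustration.

```sql
-- Count hits only for documents that match the query, so that
-- docs_count (and therefore CTX_DOC.HIGHLIGHT) is not called for
-- documents that cannot contain the term.
select d.id
     , docs_count(d.id, 'dog') as dogs
  from docs d
 where contains(d.text, 'dog') > 0;
```

With the sample data above this should return only ids 1 and 2. The same query text is passed to both CONTAINS and docs_count, so the filter and the count refer to the same search.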

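Alternatively, plain SQL can do this without Oracle Text: use DBMS_LOB.GETLENGTH() instead of LENGTH(), and REPLACE() works on a CLOB. Something like the following: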

select (dbms_lob.getlength(text) - dbms_lob.getlength(replace(text, 'dog')))
         / length('dog')
  from docs;

As for length(replace(<original string>, <new string>)) - length(<original string>): when the column is a CLOB, use DBMS_LOB.GETLENGTH rather than LENGTH().

+1

If the document is stored as a "clob", string functions can count the matches directly. For example:

select t.*
from (select t.*,
             length(replace(t.doc, KEYWORD, KEYWORD || 'x')) - length(t.doc) as nummatches
      from table t
     ) t
order by nummatches desc;
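
As a concrete instance of the query above, here is a sketch against the docs table from the first answer, with 'dog' standing in for KEYWORD. Both substitutions are assumptions made for illustration. Appending a character to every occurrence makes the string one character longer per match, so the length difference is the match count.

```sql
select t.*
  from (select d.id,
               -- each occurrence of 'dog' becomes 'dogx', so the string
               -- grows by exactly one character per match
               length(replace(d.text, 'dog', 'dog' || 'x')) - length(d.text) as nummatches
          from docs d
       ) t
 order by nummatches desc;
```

Note that this counts raw substring occurrences and is case-sensitive, so 'dogma' would count as a match for 'dog'; an Oracle Text query matches indexed tokens instead, so the two approaches can disagree.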

0

, pl/sql (, ), , ( ) count_hits.

-1

Source: https://habr.com/ru/post/1547800/

