How to filter HBase Scan by part of a row key?

I have an HBase table with row keys that consist of a text id and a timestamp, like this:

...
string_id1.1470913344067
string_id1.1470913345067
string_id2.1470913344067
string_id2.1470913345067
...

How can I filter an HBase Scan (in Scala or Java) to get only the rows with a given string id and a timestamp greater than some value?

Thanks.

+7
3 answers

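You need a FilterList with two filters:
 - PrefixFilter (one argument: the row key prefix, in your case "string_id1.")
 - RowFilter (two arguments: the operator CompareOp.GREATER_OR_EQUAL and a comparator holding the lower bound, in your case "string_id1.1470913345000")

A FilterList created without arguments requires every filter to pass, so the scan returns only rows that start with the given string_id and whose key is not less than the lower bound. Row keys are compared lexicographically as bytes.

Example: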

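import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.{BinaryComparator, FilterList, PrefixFilter, RowFilter}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp
import org.apache.hadoop.hbase.util.Bytes

val s = new Scan()
s.addFamily(Bytes.toBytes(family))
// FilterList defaults to MUST_PASS_ALL, so a row must pass both filters
val filterList = new FilterList()
filterList.addFilter(new PrefixFilter(Bytes.toBytes(prefixOfRowKey)))  // e.g. "string_id1."
filterList.addFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes(valueForBinaryFilter))))  // e.g. "string_id1.1470913345000"
s.setFilter(filterList)
val scanner = table.getScanner(s)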
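
To see why a prefix check combined with a "greater or equal" comparison selects the wanted rows, here is a minimal plain-Java sketch that applies the same two checks to the sample keys from the question. It does not use the HBase API; the class and method names (`RowKeyFilterSketch`, `select`, and so on) are made up for illustration, and it only models unsigned byte-wise key comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyFilterSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // True if key starts with prefix (the check a prefix filter performs).
    static boolean hasPrefix(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) return false;
        }
        return true;
    }

    // Keeps the keys that pass both checks, i.e. "all filters must pass".
    static List<String> select(List<String> keys, String prefix, String lowerBound) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] lb = lowerBound.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            byte[] kb = k.getBytes(StandardCharsets.UTF_8);
            if (hasPrefix(kb, p) && compareBytes(kb, lb) >= 0) {
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "string_id1.1470913344067",
            "string_id1.1470913345067",
            "string_id2.1470913344067",
            "string_id2.1470913345067");
        // prints [string_id1.1470913345067]
        System.out.println(select(keys, "string_id1.", "string_id1.1470913345000"));
    }
}
```

The comparison is on bytes, not on numbers, so it matches numeric order only while all timestamps have the same number of digits, as the 13-digit millisecond values in the question do.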


-2

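Another option: you can use FuzzyRowFilter.

For example, if the row key has the format userId_actionId_timestamp (where userId has a fixed length of, say, 4 bytes), you can match every key of the form ????_login_. The FuzzyRowFilter for this: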

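import java.util.Arrays;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

FuzzyRowFilter rowFilter = new FuzzyRowFilter(
 Arrays.asList(
  new Pair<byte[], byte[]>(
    Bytes.toBytesBinary("\\x00\\x00\\x00\\x00_login_"),
    new byte[] {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0})));  // 1 = any byte, 0 = must match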

See HBase: The Definitive Guide → API.
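
The mask semantics can be illustrated with a small plain-Java sketch. This is a simplified model, not the HBase implementation: it assumes a mask byte of 1 means "any byte is accepted at this position" and 0 means "the byte must equal the pattern byte", and it checks only the leading bytes of the key. The names `FuzzyMatchSketch`, `matches` and `matchesLogin` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMatchSketch {

    // mask[i] == 1: position i is a wildcard; mask[i] == 0: rowKey[i] must equal pattern[i].
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have the same length");
        }
        if (rowKey.length < pattern.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) return false;
        }
        return true;
    }

    // Matches keys of the form ????_login_<anything>, with a 4-byte userId.
    static boolean matchesLogin(String rowKey) {
        // The bytes at the four wildcard positions are ignored by this sketch.
        byte[] pattern = "????_login_".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
        return matches(rowKey.getBytes(StandardCharsets.US_ASCII), pattern, mask);
    }

    public static void main(String[] args) {
        System.out.println(matchesLogin("ab12_login_1470913344067"));   // prints true
        System.out.println(matchesLogin("ab12_logout_1470913344067"));  // prints false
    }
}
```

Note that a fuzzy pattern like this fixes the action part of the key but puts no bound on the timestamp that follows it.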

+4

Suppose you somehow ended up with your rows in a monadic structure like a List or an RDD. Now you want only the lines with id = "string_id2" and timestamp > 1470913345000.

So what is the problem? Just filter that monadic structure by these two criteria.

val filtered = listOrRddOfLines
  .map { l =>
    // Row key format: <string_id>.<timestamp>
    val Array(idStr, timestampStr) = l.split('.')
    (idStr, timestampStr.toLong)
  }
  .filter { case (idStr, timestamp) =>
    idStr == "string_id2" && timestamp > 1470913345000L
  }
-2

Source: https://habr.com/ru/post/1650916/