HBase: get(...) vs scan and in-memory table

I am running MapReduce (MR) jobs on HBase.

The business logic in the reducer relies heavily on two tables: T1 (40k rows) and T2 (90k rows). I am currently following these steps:

1. In the reducer class constructor, I do the following:

    HBaseCRUD hbaseCRUD = new HBaseCRUD();
    HTableInterface t1 = hbaseCRUD.getTable("T1", "CF1", null, "C1", "C2");
    HTableInterface t2 = hbaseCRUD.getTable("T2", "CF1", null, "C1", "C2");

In reduce(...):

    String lowercase = ....;
    /* Start : HBase code */
    /*
     * TRY using get(...) on the table rather than a
     * Scan!
     */
    Scan scan = new Scan();
    scan.setStartRow(lowercase.getBytes());
    scan.setStopRow(lowercase.getBytes());
    /* scan will return a single row */
    ResultScanner resultScanner = t1.getScanner(scan);
    for (Result result : resultScanner) {
        /* business logic */
    }

Although I'm not sure the code above makes sense in the first place, I have a question: would get(...) provide any performance advantage over the scan?

    Get get = new Get(lowercase.getBytes());
    Result getResult = t1.get(get);

Since T1 and T2 will be (mostly) read-only, I think performance will improve if they are kept in memory. According to the HBase documentation, I will have to re-create tables T1 and T2. Please check whether my understanding is correct:

    public void createTables(String tableName, boolean readOnly,
            boolean blockCacheEnabled, boolean inMemory,
            String... columnFamilyNames) throws IOException {
        // TODO Auto-generated method stub
        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        /* not sure !!! */
        tableDesc.setReadOnly(readOnly);

        HColumnDescriptor columnFamily = null;
        if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {
            for (String columnFamilyName : columnFamilyNames) {
                columnFamily = new HColumnDescriptor(columnFamilyName);
                /*
                 * Start : Do these steps ensure that the column
                 * family (actually, the column data) is in-memory???
                 */
                columnFamily.setBlockCacheEnabled(blockCacheEnabled);
                columnFamily.setInMemory(inMemory);
                /*
                 * End : Do these steps ensure that the column family
                 * (actually, the column data) is in-memory???
                 */
                tableDesc.addFamily(columnFamily);
            }
        }
        hbaseAdmin.createTable(tableDesc);
        hbaseAdmin.close();
    }

Questions:

  • How can I verify that the columns are in memory (the describe statement and the web UI reflect the setting, of course) and that they are being accessed from memory and not from disk?
  • Is reading from memory or from disk transparent to the client? In other words, do I need to change the HTable access code in my reducer class? If so, what are the changes?
2 answers

Can get(...) provide any performance benefit over a scan?

A Get operates directly on one particular row, identified by the row key passed as a parameter to the Get instance. A Scan, by contrast, operates on all rows unless you turn it into a range query by specifying start and stop row keys on the Scan instance. Clearly, a Get is more efficient if you know beforehand which row to operate on: you can go directly to it and perform the required operation.
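The difference between a point lookup and a range read can be sketched with a toy model. The class below is not the HBase client API: `RowLookupSketch` and its methods are hypothetical names, and a `TreeMap` of strings stands in for a table whose rows are sorted by key. It assumes ASCII row keys, so that Java string ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for a table ordered by row key.
// Nothing here is part of the HBase client API.
final class RowLookupSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // "Get": a single lookup by exact row key; returns null if the row is absent.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // "Scan" with a range: visits rows with startRow <= key < stopRow, in key order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).values());
    }

    // "Scan" without a range: visits every row in the table.
    List<String> scanAll() {
        return new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        RowLookupSketch table = new RowLookupSketch();
        table.put("row-a", "1");
        table.put("row-b", "2");
        table.put("row-c", "3");

        System.out.println(table.get("row-b"));            // point lookup
        System.out.println(table.scan("row-a", "row-c"));  // bounded range
        System.out.println(table.scanAll());               // whole table
    }
}
```

When the row key is known in advance, the point lookup expresses the intent directly, while the range form only pays off when several adjacent rows are needed.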

How can I verify that the columns are in memory (the describe statement and the web UI reflect the setting, of course) and that they are being accessed from memory and not from disk?

You can use the isInMemory() method provided by HColumnDescriptor to check whether a particular column family (CF) is marked in-memory. However, you cannot find out whether the whole table is actually in memory, or whether a given read is served from disk or from memory. Although in-memory blocks have the highest priority in the cache, there is no 100% guarantee that everything is in memory all the time. An important point to note is that the data is persisted to disk even if the CF is in-memory.

Is reading from memory or from disk transparent to the client? In other words, do I need to change the HTable access code in my reducer class? If so, what are the changes?

Yes. It is completely transparent. You do not have to do anything.
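To illustrate why the caller does not need to change anything, here is a minimal sketch of a read-through cache. It is a toy model under stated assumptions, not HBase's block cache: `ReadThroughSketch` is a hypothetical class, the "disk" is just a map, and the cache uses plain LRU eviction rather than HBase's priority scheme.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a bounded LRU cache in front of a complete "disk" copy.
// This does not reproduce HBase's block cache or its in-memory priority.
final class ReadThroughSketch {
    private final Map<String, String> disk = new HashMap<>(); // always holds all data
    private final LinkedHashMap<String, String> cache;        // holds a bounded subset
    private int diskReads = 0;

    ReadThroughSketch(int capacity) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict once the cache is over capacity
            }
        };
    }

    // Writes always reach the durable copy.
    void write(String key, String value) {
        disk.put(key, value);
    }

    // Same call whether the value comes from the cache or from "disk".
    String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            diskReads++;
            value = disk.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        ReadThroughSketch store = new ReadThroughSketch(2);
        store.write("a", "1");
        store.write("b", "2");
        store.write("c", "3");
        for (String key : new String[] {"a", "a", "b", "c", "a"}) {
            System.out.println(key + " = " + store.read(key));
        }
        System.out.println("disk reads: " + store.diskReads());
    }
}
```

The caller only ever invokes `read`; whether a value was cached or evicted shows up in the `diskReads` counter, never in the result.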

  • There is a significant difference between the two with regard to implementation. Both are identical to the client.

Source: https://habr.com/ru/post/953615/