I am running a MapReduce (MR) job over HBase.
The business logic in the reducer makes heavy use of two tables: T1 (40k rows) and T2 (90k rows). I am currently following these steps:
1. In the constructor of the reducer class, I do the following:
    HBaseCRUD hbaseCRUD = new HBaseCRUD();
    HTableInterface t1 = hbaseCRUD.getTable("T1", "CF1", null, "C1", "C2");
    HTableInterface t2 = hbaseCRUD.getTable("T2", "CF1", null, "C1", "C2");
In reduce(...), I do the following:
    String lowercase = ....;
    Scan scan = new Scan();
    scan.setStartRow(lowercase.getBytes());
    scan.setStopRow(lowercase.getBytes());
    ResultScanner resultScanner = t1.getScanner(scan);
    for (Result result : resultScanner) {
    }
Although I'm not sure whether the code above is sensible in the first place, I have a question: would a get(...) provide any performance advantage over the scan?
    Get get = new Get(lowercase.getBytes());
    Result getResult = t1.get(get);
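To make the comparison concrete, here is a minimal sketch of the two access patterns. It does not use the HBase API: a java.util.TreeMap stands in for the sorted row keys of T1, and the class and method names (PointLookupVsScan, sampleTable, pointLookup, rangeScan) are hypothetical. It only illustrates the semantics of an exact-key lookup versus a half-open [startRow, stopRow) range read. It says nothing about HBase's actual performance, and HBase may treat a scan whose start and stop rows are equal differently from this stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PointLookupVsScan {

    // Stand-in for table T1: row keys held in sorted order (an assumption
    // made for illustration; this is a plain in-memory map, not HBase).
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("row1", "v1");
        table.put("row2", "v2");
        table.put("row3", "v3");
        return table;
    }

    // Analogue of a get: one lookup by exact row key.
    static String pointLookup(NavigableMap<String, String> table, String rowKey) {
        return table.get(rowKey);
    }

    // Analogue of a scan: every value whose key lies in [startRow, stopRow).
    static List<String> rangeScan(NavigableMap<String, String> table,
                                  String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> t1 = sampleTable();
        String rowKey = "row2";

        // Exact-key read.
        System.out.println(pointLookup(t1, rowKey));

        // Range read covering exactly one row: the stop key is the row key
        // followed by a zero character, the smallest key that sorts after it.
        System.out.println(rangeScan(t1, rowKey, rowKey + '\0'));

        // With a strictly half-open range, equal start and stop keys select
        // nothing in this stand-in.
        System.out.println(rangeScan(t1, rowKey, rowKey));
    }
}
```

In this sketch the exact-key read needs a single lookup, while the range read has to build and iterate a view even when it covers only one row. Whether the same difference is measurable in HBase is exactly what I am asking.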
Since T1 and T2 will be (mostly) read-only, I think performance will improve if they are kept in memory. According to the HBase documentation, I will have to recreate tables T1 and T2. Please check whether my understanding is correct:
public void createTables(String tableName, boolean readOnly, boolean blockCacheEnabled, boolean inMemory, String... columnFamilyNames) throws IOException {
Questions:
- How can I verify that the columns are in memory (of course, the describe statement and the browser reflect the setting) and that they are actually read from there rather than from disk?
- Is reading from memory or from disk transparent to the client? In simple words, do I need to change the HTable access code in my reducer class? If so, what changes are needed?