Friday, August 16, 2019

HBase Read

A read against HBase must be reconciled between the HFiles, the MemStore, and the BlockCache. The BlockCache is designed to keep frequently accessed data from the HFiles in memory so as to avoid disk reads.

There is a single BlockCache per RegionServer, and block caching can be enabled or disabled per column family. The BlockCache holds data in the form of blocks, the unit of data that HBase reads from disk in a single pass.

The HFile is physically laid out as a sequence of blocks plus an index over those blocks. This means reading a block from HBase requires only looking up that block's location in the index and retrieving it from disk.
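The index lookup described above can be sketched as a binary search over the first row key of each block. This is a simplified illustration, not the actual HFile format: the `BLOCK_INDEX` entries, their offsets, and the `locate_block` helper are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical, simplified block index sorted by row key:
# (first_row_key_in_block, file_offset, block_length_in_bytes)
BLOCK_INDEX = [
    ("apple", 0, 65536),
    ("grape", 65536, 65536),
    ("mango", 131072, 65536),
    ("peach", 196608, 40000),
]


def locate_block(index, row_key):
    """Return (offset, length) of the only block that could hold row_key, or None."""
    first_keys = [entry[0] for entry in index]
    # The candidate block is the last one whose first key is <= row_key.
    pos = bisect_right(first_keys, row_key) - 1
    if pos < 0:
        return None  # row_key sorts before the first block of this file
    _, offset, length = index[pos]
    return offset, length


print(locate_block(BLOCK_INDEX, "kiwi"))      # (65536, 65536)
print(locate_block(BLOCK_INDEX, "aardvark"))  # None
```

Only the one block returned by the lookup has to be fetched from disk; the rest of the file is never touched.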

Block: the smallest indexed unit of data and the smallest unit of data that can be read from disk. The default size is 64 KB.

Scenario where a smaller block size is preferred: random lookups. Smaller blocks create a larger index, which consumes more memory.

Scenario where a larger block size is preferred: frequent sequential scans. Larger blocks mean fewer index entries and thus a smaller index, which saves memory.

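To put rough numbers on this trade-off, the sketch below counts index entries for one HFile at a few block sizes. The 1 GiB file size is an assumed figure for illustration, and the count assumes one index entry per block and ignores any multi-level index structure.

```python
def index_entries(hfile_bytes, block_bytes):
    """Number of blocks (and so index entries) needed to cover the file."""
    # Integer ceiling division: a partially filled last block still needs an entry.
    return -(-hfile_bytes // block_bytes)


HFILE_SIZE = 1024 ** 3  # 1 GiB, an assumed file size for illustration

for block_kb in (16, 64, 256):
    entries = index_entries(HFILE_SIZE, block_kb * 1024)
    print(f"{block_kb:>3} KB blocks -> {entries} index entries")
```

Under these assumptions, going from 64 KB to 16 KB blocks quadruples the number of index entries (16384 to 65536), while 256 KB blocks cut it to 4096.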
Reading a row from HBase requires first checking the MemStore, then the BlockCache; finally, the HFiles on disk are accessed.
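The lookup order above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToyStore` and its methods are hypothetical names, the "HFile" is an in-memory dict of blocks, and the cache is a small LRU. A real read also merges cell versions across the MemStore and several HFiles, which is left out here.

```python
from collections import OrderedDict


class ToyStore:
    """Toy read path: MemStore first, then BlockCache, then HFile blocks."""

    def __init__(self, hfile_blocks, cache_capacity=2):
        self.memstore = {}                 # recent writes not yet flushed
        self.block_cache = OrderedDict()   # block_id -> block, kept in LRU order
        self.hfile_blocks = hfile_blocks   # block_id -> {row: value}; stands in for disk
        self.cache_capacity = cache_capacity
        self.disk_reads = 0

    def put(self, row, value):
        self.memstore[row] = value

    def _block_for(self, row):
        # Stand-in for the block index lookup (a real index stores key ranges).
        for block_id, block in self.hfile_blocks.items():
            if row in block:
                return block_id
        return None

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row], "memstore"
        block_id = self._block_for(row)
        if block_id is None:
            return None, "miss"
        if block_id in self.block_cache:
            self.block_cache.move_to_end(block_id)  # mark as most recently used
            return self.block_cache[block_id][row], "blockcache"
        # Cache miss: read the whole block from "disk" and cache it.
        self.disk_reads += 1
        block = self.hfile_blocks[block_id]
        self.block_cache[block_id] = block
        if len(self.block_cache) > self.cache_capacity:
            self.block_cache.popitem(last=False)    # evict least recently used block
        return block[row], "hfile"


store = ToyStore({0: {"row1": "a", "row2": "b"}, 1: {"row9": "z"}})
store.put("row5", "fresh")
print(store.get("row5"))  # ('fresh', 'memstore')
print(store.get("row1"))  # ('a', 'hfile')
print(store.get("row2"))  # ('b', 'blockcache')
```

The third read is served from the cache because `row2` lives in the same block that the read of `row1` already pulled in, which is the effect the BlockCache is meant to provide.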
