Presentation: My Life with HBase

HBase is an Open Source implementation of Google's Bigtable architecture. Its goal is to host very large tables - billions of rows, millions of columns - atop clusters of "commodity" hardware. This talk reports on findings from setting up HBase clusters of various sizes and uses.

Speakers

Lars George, CTO of WorldLingo, Apache Hadoop HBase Committer
Slides

“My Life with HBase”

Lars George, CTO of WorldLingo
Apache Hadoop HBase Committer
www.worldlingo.com / www.larsgeorge.com

WorldLingo

• Co-founded 1999
• Machine Translation Services
• Professional Human Translations
• Offices in the US and UK
• Microsoft Office provider since 2001
• Web-based services
• Customer projects
• Multilingual Archive

Multilingual Archive

• SOAP API
• Simple calls (see the sketch below):
  ◦ putDocument()
  ◦ getDocument()
  ◦ search()
  ◦ command()
  ◦ putTransformation()
  ◦ getTransformation()
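
The slide names the calls but not their signatures, so the following Java interface is a sketch only: the method names come from the slide, while the parameter and return types are invented for illustration.

    // Hypothetical shape of the Multilingual Archive SOAP API.
    // Only the method names come from the slide; parameter and
    // return types are invented for illustration.
    public interface MultilingualArchive {
        void putDocument(String id, byte[] document);
        byte[] getDocument(String id);
        String[] search(String query);
        String command(String name, String[] args);
        void putTransformation(String id, byte[] transformation);
        byte[] getTransformation(String id);
    }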

Multilingual Archive (cont.)

• Planned for a while, implemented as a customer project
• Scale:
  ◦ 500 million documents
  ◦ Random access
  ◦ “100%” uptime
• Technologies?
  ◦ Database
  ◦ Zip archives on the file system, or Hadoop

RDBMS Woes

• Scaling MySQL is hard, Oracle is expensive (and hard)
• Machine cost goes up faster than speed
• Turn off all relational features to scale
• Turn off secondary indexes too
• Tables can be a problem at sizes as low as 500GB
• Hard to read data quickly at these sizes
• Write speed degrades with table size
• Future growth uncertain

MySQL Limitations

• The master becomes a problem
• What if your write rate is more than a single machine can handle?
• All slaves must have the same write capacity as the master (can't cheap out on slaves)
• Single point of failure, no easy failover
• Can (sort of) solve this with sharding

Sharding

(diagram slide)

Sharding Problems

• Requires either a hashing function or a mapping table to determine the shard (see the sketch below)
• Data access code becomes complex
• What if shard sizes become too large?
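
A minimal sketch of hash-based shard selection, assuming string row keys and a fixed shard count (both are assumptions; the slide prescribes neither):

    // Pick a shard by hashing the row key; a minimal sketch, not production code.
    public class ShardRouter {
        private final int numShards;

        public ShardRouter(int numShards) {
            this.numShards = numShards;
        }

        public int shardFor(String key) {
            // Mask the sign bit so negative hash codes do not yield negative shards.
            return (key.hashCode() & 0x7fffffff) % numShards;
        }
    }

Note the trap this encodes: changing numShards remaps almost every key, which is exactly the pain the resharding slide alludes to.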

Resharding

(diagram slide)

Schema Changes

• What about schema changes or migrations?
• MySQL is not your friend here
• Only gets harder with more data

HBase to the Rescue

• Clustered, commodity(-ish) hardware
• Mostly schema-less
• Dynamic distribution
• Spreads writes out over the cluster

HBase

• Distributed database modeled on Bigtable
  ◦ “Bigtable: A Distributed Storage System for Structured Data” by Chang et al.
• Runs on top of Hadoop Core
• Layers on HDFS for storage
• Native connections to MapReduce
• Distributed, High Availability, High Performance, Strong Consistency

HBase

• Column-oriented store
  ◦ Wide tables cost only the data stored
  ◦ NULLs in a row are 'free'
  ◦ Good compression: columns hold data of similar type
  ◦ Column names are arbitrary
• Rows stored in sorted order
• Can random read and write
• Goal of billions of rows × millions of cells
  ◦ Petabytes of data across thousands of servers


Tables

• A table is split into roughly equal-sized “regions”
• Each region is a contiguous range of keys, from [start, end)
• Regions split as they grow, dynamically adjusting to your data set

Tables (cont.)

• Tables are sorted by row
• Table schema defines column families
  ◦ Families consist of any number of columns
  ◦ Columns consist of any number of versions
  ◦ Everything except the table name is byte[]
• (Table, Row, Family:Column, Timestamp) → Value

Tables (cont.)

• As a data structure:
  SortedMap( RowKey, List( SortedMap( Column, List( Value, Timestamp ) ) ) )
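
Expressed in Java collections, that logical model might be sketched like this (illustrative only; these are not actual HBase classes, and HBase really keeps everything as byte[]):

    import java.util.List;
    import java.util.TreeMap;

    // Logical model only: row key -> column -> list of (value, timestamp),
    // with rows and columns kept in sorted order, like HBase does.
    public class LogicalTable {
        static class Cell {
            final byte[] value;
            final long timestamp;

            Cell(byte[] value, long timestamp) {
                this.value = value;
                this.timestamp = timestamp;
            }
        }

        final TreeMap<String, TreeMap<String, List<Cell>>> rows =
                new TreeMap<String, TreeMap<String, List<Cell>>>();
    }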

Server Architecture

• Similar to HDFS:
  ◦ Master ≈ Namenode
  ◦ Regionserver ≈ Datanode
  ◦ Often run these alongside each other!
• Difference: HBase stores its state in HDFS
• HDFS provides robust data storage across machines, insulating against failure
• Master and Regionserver are fairly stateless and machine independent

Region Assignment

• Each region from every table is assigned to a Regionserver
• Master duties:
  ◦ Responsible for assignment and handling regionserver problems (if any!)
  ◦ When machines fail, move regions
  ◦ When regions split, move regions to balance
  ◦ Could move regions to respond to load
  ◦ Can run multiple backup masters

Master

• The master does NOT:
  ◦ Handle any write requests (not a DB master!)
  ◦ Handle location-finding requests
  ◦ Get involved in the read/write path
• It generally does very little most of the time

Distributed Coordination

• Zookeeper is used to manage master election and server availability
• Set up as a cluster, it provides distributed coordination primitives
• An excellent tool for building cluster management systems

HBase Storage Architecture

(diagram slide)

HBase Public Timeline

• November 2006: Google releases the Bigtable paper
• February 2007: Initial HBase prototype created as Hadoop contrib
• October 2007: First "useable" HBase (with Hadoop 0.15.0)
• December 2007: First HBase User Group
• January 2008: Hadoop becomes TLP, HBase becomes a subproject
• October 2008: HBase 0.18.1 released
• January 2009: HBase 0.19.0 released
• September 2009: HBase 0.20.0 released

HBase WorldLingo Timeline

(diagram slide)

HBase - Example

• Store web crawl data
  ◦ Table crawl with family content
  ◦ Row is the URL, with columns:
    - content:data stores the raw crawled data
    - content:language stores the HTTP language header
    - content:type stores the HTTP content-type header
  ◦ If processing raw data for hyperlinks and images, add families links and images:
    - links: column for each hyperlink
    - images: column for each image
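
A minimal sketch of writing one crawled page with the Java client (written against the 0.20-era API; the URL and cell values are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CrawlWriter {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "crawl");
            // The row key is the URL; all keys and values are byte[].
            Put put = new Put(Bytes.toBytes("http://example.com/"));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("data"),
                    Bytes.toBytes("<html>...</html>"));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("language"),
                    Bytes.toBytes("en"));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("type"),
                    Bytes.toBytes("text/html"));
            table.put(put);
            table.flushCommits();
        }
    }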

HBase - Clients

• Native Java client/API
  ◦ get(Get get)
  ◦ put(Put put)
• Non-Java clients
  ◦ Thrift server (Ruby, C++, Erlang, etc.)
  ◦ REST server (Stargate)
• TableInputFormat/TableOutputFormat for MapReduce
• HBase shell (JRuby)
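
Reading a row back is symmetric; a sketch against the same 0.20-era API, continuing the crawl example:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CrawlReader {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "crawl");
            Get get = new Get(Bytes.toBytes("http://example.com/"));
            Result result = table.get(get);
            // Fetch a single cell: family "content", qualifier "type".
            byte[] type = result.getValue(Bytes.toBytes("content"),
                    Bytes.toBytes("type"));
            System.out.println(Bytes.toString(type));
        }
    }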

Scaling HBase

• Add more machines to scale
  ◦ Automatic rebalancing
• Base model (Bigtable) scales past 1000TB
• No inherent reason why HBase couldn't

What to store in HBase

• Maybe not your raw log data...
• ...but the results of processing it with Hadoop!
• By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website

!HBase

• A “NoSQL” database!
  ◦ No joins
  ◦ No sophisticated query engine
  ◦ No transactions (sort of)
  ◦ No column typing
  ◦ No SQL, no ODBC/JDBC, etc. (but there is HBql now!)
• Not a replacement for your RDBMS...
• Matching impedance!

Why HBase?

• Datasets are reaching petabytes
• Traditional databases are expensive to scale and difficult to distribute
• Commodity hardware is cheap and powerful (but HBase can make use of powerful machines too!)
• Need for both random access and batch processing (plain Hadoop offers only the latter)

Numbers

• Single reads are 1-10ms, depending on disk seeks and caching
• Scans can return hundreds of rows in dozens of milliseconds
• Serial read speeds

Multilingual Archive (cont.)

• Planned for a while, implemented as a customer project
• Scale:
  ◦ 500 million documents
  ◦ Random access
  ◦ “100%” uptime
• Technologies?
  ◦ Database
  ◦ Zip archives on the file system, or Hadoop

Lucene Search Server

• 43 fields indexed
• 166GB in size
• Automated merging/warm-up/swap
• Looking into a scalable solution:
  ◦ Katta
  ◦ Hyper Estraier
  ◦ DLucene
  ◦ …
• Sorting?

Multilingual Archive (cont.)

• 5 tables
• Up to 5 column families
• XML schemas
• Automated table schema updates
• Standard options tweaked over time
  ◦ Garbage collection!
• MemCached(b) layer

Layers

(diagram: request flow through layers)
• Firewall
• Network: Director 1 … Director n
• LWS: Apache 1 … Apache n
• Web App: Tomcat 1 … Tomcat n
• Cache: MemCached 1 … MemCached n
• Data: HBase

Map/Reduce

• Backup/Restore
• Index building
• Cache filling
• Mapping
• Updates
• Translation
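
As an illustration of the TableInputFormat integration mentioned under clients, a job like the ones above could be wired up as follows (a sketch of a trivial row-counting job against the 0.20-era API; the table name and job specifics are assumptions):

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class CrawlRowCounter {
        // Emits one count per row streamed out of the HBase table.
        static class CountMapper extends TableMapper<Text, IntWritable> {
            protected void map(ImmutableBytesWritable row, Result value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text("rows"), new IntWritable(1));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new HBaseConfiguration(), "crawl-row-counter");
            job.setJarByClass(CrawlRowCounter.class);
            // Scan the whole table; a real job would restrict families and columns.
            TableMapReduceUtil.initTableMapperJob("crawl", new Scan(),
                    CountMapper.class, Text.class, IntWritable.class, job);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(NullOutputFormat.class);
            job.waitForCompletion(true);
        }
    }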

HBase - Problems

• Early versions (before HBase 0.19.0!)
  ◦ Data loss
  ◦ Migration nightmares
  ◦ Slow performance
• Current version
  ◦ Read the HBase wiki!!!
• Single point of failure (the name node only!)

HBase - Notes

• RTFM: HBase wiki, IRC channel
• Personal experience (mine); example settings follow below:
  ◦ Max. file handles (32k+)
  ◦ Hadoop xceiver limits (NIO?)
  ◦ Redundant meta data (on the name node)
  ◦ RAM (4GB+)
  ◦ Deployment strategy
  ◦ Garbage collection (use CMS, G1?)
  ◦ Maybe don't mix batch and interactive?
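
For illustration, the file-handle, xceiver, and GC items above typically translated into settings like the following; the exact values are examples from common guidance of the time, not prescriptions:

    # /etc/security/limits.conf -- raise max open file handles (the "32k+" note)
    hadoop  -  nofile  32768

    <!-- hdfs-site.xml -- raise the datanode xceiver limit
         (the property name uses the historical spelling) -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2047</value>
    </property>

    # hbase-env.sh -- use the CMS collector to keep GC pauses short
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"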

Graphing

• Use the supplied Ganglia context or the JMX bridge to enable Nagios and Cacti
• JMXToolkit: Swiss army knife for JMX-enabled servers: http://github.com/larsgeorge/jmxtoolkit

HBase - Roadmap

• HBase 0.20.x “Performance”
  ◦ New Key Format – KeyValue
  ◦ New File Format – HFile
  ◦ New Block Cache – Concurrent LRU
  ◦ New Query and Result API
  ◦ New Scanners
  ◦ Zookeeper Integration – No SPOF in HBase
  ◦ New REST Interface
  ◦ Contrib
    - Transactional Tables
    - Secondary Indexes
    - Stargate

HBase - Roadmap (cont.)

• HBase 0.21.x “Advanced Concepts”
  ◦ Master Rewrite – More Zookeeper
  ◦ New RPC Protocol (Avro)
  ◦ Multi-DC Replication
  ◦ Intra-Row Scanning
  ◦ Further optimizations on algorithms and data structures
  ◦ Discretionary Access Control
  ◦ Coprocessors

Questions?

• Email: lars@worldlingo.com, larsgeorge@apache.org, lars@larsgeorge.com
• Blog: www.larsgeorge.com
• Twitter: larsgeorge