Presentation My Life with HBase


HBase is an Open Source implementation of Google's BigTable architecture. Its goal is the hosting of very large tables - billions of rows, millions of columns - atop clusters of "commodity" hardware. This talk reports on findings along the way of setting up HBase clusters of various size and use.
Published on: 2010-02-12T13:30:24.000Z
Channel: FOSDEM NoSQL (all)
Tags: fosdem hbase nosql
Speakers:

Lars George


Lars George is the CTO of WorldLingo and uses HBase to host their Multilingual Archive.


Slides:

“My Life with HBase”


“My Life with HBase” Lars George, CTO of WorldLingo Apache Hadoop HBase Committer www.worldlingo.com www.larsgeorge.com

WorldLingo


WorldLingo Co-founded 1999  Machine Translation Services  Professional Human Translations  Offices in US and UK  Microsoft Office Provider since 2001  Web based services  Customer Projects  Multilingual Archive 

Multilingual Archive


Multilingual Archive SOAP API  Simple calls  ◦ ◦ ◦ ◦ ◦ ◦ putDocument() getDocument() search() command() putTransformation() getTransformation()

Multilingual Archive (cont.)


Multilingual Archive (cont.) Planned already, implemented as customer project  Scale:  ◦ 500million documents ◦ Random Access ◦ “100%” Uptime  Technologies? ◦ Database ◦ Zip-Archives on file system, or Hadoop

RDBMS Woes


RDBMS Woes         Scaling MySQL hard, Oracle expensive (and hard) Machine cost goes up faster speed Turn off all relational features to scale Turn off secondary indexes too Tables can be a problem at sizes as low as 500GB Hard to read data quickly at these sizes Write speed degrades with table size Future growth uncertain

MySQL Limitations


MySQL Limitations Master becomes a problem  What if your write speed is greater than a single machine  All slaves must have same write capacities as master (can‘t check out on slaves)  Single point of failure, no easy failover  Can (sort of) solve this with sharding 

Sharding


Sharding

Sharding Problems


Sharding Problems Requires either a hashing function or mapping table to determine shard  Data access code becomes complex  What if shard sizes become too large? 

Resharding


Resharding

Schema Changes


Schema Changes What about schema changes or migrations?  MySQL not your friend here  Only gets harder with more data 

HBase to the Rescue


HBase to the Rescue Clustered, commodity(-ish) hardware  Mostly schema-less  Dynamic distribution  Spreads writes out over the cluster 

HBase


HBase  Distributed database modeled on Bigtable ◦ Bigtable: A Distributed Storage System for Structured Data by Chang et al. Runs on top of Hadoop Core  Layers on HDFS for storage  Native connections to MapReduce  Distributed, High Availability, High Performance, Strong Consistency 

HBase


HBase  Column-oriented store ◦ ◦ ◦ ◦ Wide table costs only the data stored NULLs in row are 'free' Good compression: columns of similar type Column name is arbitrary Rows stored in sorted order  Can random read and write  Goal of billions of rows X millions of cells  ◦ Petabytes of data across thousands of servers

untitled



untitled



Tables


Tables Table is split into roughly equal sized „regions“  Each region is a contiguous range of keys, from [start, to end)  Regions split as they grow, thus dynamically adjusting to your data set 

Tables (cont.)


Tables (cont.) Tables are sorted by Row  Table schema defines column families  ◦ Families consist of any number of columns ◦ Columns consist of any number of versions ◦ Everything except table name is byte[] (Table, Row, Family:Column, Timestamp)  Value

Tables (cont.)


Tables (cont.)  As a data structure SortedMap( RowKey, List( SortedMap( Column, List( Value, Timestamp ) ) ) )

Server Architecture


Server Architecture  Similar to HDFS ◦ Master ≈ Namenode ◦ Regionserver ≈ Datanode Often run these alongsaide each other!  Difference: HBase stores state in HDFS  HDFS provides robust data storage across machines, insulating against failure  Master and Regionserver fairly stateless and machine independent 

Region Assignment


Region Assignment Each region from every table is assigned to a Regionserver  Master Duties:  ◦ Reponsible for assignment and handling regionserver problems (if any!) ◦ When machines fail, move regions ◦ When regions split, move regions to balance ◦ Could move regions to respond to load ◦ Can run multiple backup masters

Master


Master  The master does NOT ◦ ◦ ◦ ◦ Handle any write requests (not a DB master!) Handle location finding requests Not involved in the read/write path Generally does very little most of the time

Distributed Coordination


Distributed Coordination Zookeeper is used to manage master election and server availability  Set up as a cluster, provides distributed coordination primitives  An excellent tool for building cluster management systems 

HBase Storage Architecture


HBase Storage Architecture

HBase Public Timeline


HBase Public Timeline         November 2006 ◦ Google releases paper on Bigtable February 2007 October 2007 ◦ Initial HBase prototype created as Hadoop contrib ◦ First "useable" HBase (0.15.0 Hadoop) December 2007 ◦ First HBase User Group January 2008 ◦ Hadoop becomes TLP, HBase becomes subproject October 2008 ◦ HBase 0.18.1 released January 2009 ◦ HBase 0.19.0 released ◦ HBase 0.20.0 released September 2009

HBase WorldLingo Timeline


HBase WorldLingo Timeline

HBase - Example


HBase - Example  Store web crawl data ◦ Table crawl with family content ◦ Row is URL with columns  content:data stores raw crawled data  content:language stores http language header  content:type stores http content-type header ◦ If processing raw data for hyperlinks and images, add families links and images  links: column for each hyperlink  links: column for each image

HBase - Clients


HBase - Clients  Native Java client/API ◦ get(Get get) ◦ put(Put put)  Non-Java clients ◦ Thrift server (Ruby, C++, Erlang, etc.) ◦ REST server (Stargate) TableInput/TableOutputFormat for MapReduce  HBase shell (jruby) 

Scaling HBase


Scaling HBase   Add more machines to scale ◦ Automatic rebalancing Base model (BigTable) scales past 1000TB  No inherent reason why Hbase couldn‘t

What to store in HBase


What to store in HBase Maybe not your raw log data...  ... but the results of processing it with Hadoop!  By storing the refined version in HBase, can keep up with huge data demands and serve to your website 

!HBase


!HBase  “NoSQL” Database! ◦ ◦ ◦ ◦ ◦ No joins No sophisticated query engine No transactions (sort of) No column typing No SQL, no ODBC/JDBC, etc. (but there is HBql now!) Not a replacement for your RDBMS...  Matching Impedance! 

Why HBase?


Why HBase? Datasets are reaching Petabytes  Traditional databases are expensive to scale and difficult to distribute  Commodity hardware is cheap and powerful (but HBase can make use of powerful machines too!)  Need for random access and batch processing (which Hadoop does not offer) 

Numbers


Numbers Single reads are 1-10ms depending on disk seeks and caching  Scans can return hundreds of rows in dozens of ms  Serial read speeds 

Multilingual Archive (cont.)


Multilingual Archive (cont.) Planned already, implemented as customer project  Scale:  ◦ 500million documents ◦ Random Access ◦ “100%” Uptime  Technologies? ◦ Database ◦ Zip-Archives on file system, or Hadoop

Lucene Search Server


Lucene Search Server 43 fields indexed  166GB size  Automated merging/warm-up/swap  Looking into scalable solution  ◦ ◦ ◦ ◦  Katta Hyper Estraier DLucene … Sorting?

Multilingual Archive (cont.)


Multilingual Archive (cont.) 5 Tables  Up to 5 column families  XML Schemas  Automated table schema updates  Standard options tweaked over time  ◦ Garbage Collection!  MemCached(b) layer

Layers


Layers Firewall Network LWS Web App Cache Data Director 1 Director n Apache 1 Apache n … Tomcat 1 Tomcat n Tomcat 1 Tomcat n MemCach ed 1 HBase MemCach ed n

Map/Reduce


Map/Reduce Backup/Restore  Index building  Cache filling  Mapping  Updates  Translation 

HBase - Problems


HBase - Problems  Early versions (before HBase 0.19.0!) ◦ Data loss ◦ Migration nightmares ◦ Slow performance   Current version ◦ Read HBase Wiki!!! Single point of failure (name node only!)

HBase - Notes


HBase - Notes    RTF M HBase Wiki, IRC Channel Personal Experience: (ine) ◦ ◦ ◦ ◦ ◦ ◦ ◦ Max. file handles (32k+) Hadoop xceiver limits (NIO?) Redundant meta data (on name node) RAM (4GB+) Deployment strategy Garbage collection (use CMS, G1?) Maybe not mix batch and interactive?

Graphing


Graphing Use supplied Ganglia context or JMX bridge to enable Nagios and Cacti  JMXToolkit: swiss army knife for JMX enabled servers: http://github.com/ larsgeorge/jmxtoolkit 

HBase - Roadmap


HBase - Roadmap  HBase 0.20.x “Performance” ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ New Key Format – KeyValue New File Format – Hfile New Block Cache – Concurrent LRU New Query and Result API New Scanners Zookeeper Integration – No SPOF in HBase New REST Interface Contrib  Transactional Tables  Secondary Indexes  Stargate

HBase - Roadmap (cont.)


HBase - Roadmap (cont.)  HBase 0.21.x “Advanced Concepts” ◦ ◦ ◦ ◦ ◦ Master Rewrite – More Zookeeper New RPC Protocol (Avro) Multi-DC Replication Intra Row Scanning Further optimizations on algorithms and data structures ◦ Discretionary Access Control ◦ Coprocessors

Questions?


Questions? lars@worldlingo.com larsgeorge@apache.org lars@larsgeorge.com  Blog: www.larsgeorge.com  Twitter: larsgeorge  Email: