Presentation My Life with HBase
HBase is an Open Source implementation of Google's BigTable architecture. Its goal is the hosting of very large tables - billions of rows, millions of columns - atop clusters of "commodity" hardware. This talk reports on findings along the way of setting up HBase clusters of various size and use.
Published on: 2010-02-12T13:30:24.000Z
Channel: FOSDEM NoSQL (all)
Tags: fosdem hbase nosql
Speakers:
Lars George
Lars George is the CTO of WorldLingo and uses HBase to host their Multilingual Archive.
Slides:
“My Life with HBase”
“My Life with HBase”
Lars George, CTO of WorldLingo Apache Hadoop HBase Committer www.worldlingo.com www.larsgeorge.com
WorldLingo
WorldLingo
Co-founded 1999 Machine Translation Services Professional Human Translations Offices in US and UK Microsoft Office Provider since 2001 Web based services Customer Projects Multilingual Archive
Multilingual Archive
Multilingual Archive
SOAP API Simple calls
◦ ◦ ◦ ◦ ◦ ◦
putDocument() getDocument() search() command() putTransformation() getTransformation()
Multilingual Archive (cont.)
Multilingual Archive (cont.)
Planned already, implemented as customer project Scale:
◦ 500million documents ◦ Random Access ◦ “100%” Uptime
Technologies?
◦ Database ◦ Zip-Archives on file system, or Hadoop
RDBMS Woes
RDBMS Woes
Scaling MySQL hard, Oracle expensive (and hard) Machine cost goes up faster speed Turn off all relational features to scale Turn off secondary indexes too Tables can be a problem at sizes as low as 500GB Hard to read data quickly at these sizes Write speed degrades with table size Future growth uncertain
MySQL Limitations
MySQL Limitations
Master becomes a problem What if your write speed is greater than a single machine All slaves must have same write capacities as master (can‘t check out on slaves) Single point of failure, no easy failover Can (sort of) solve this with sharding
Sharding
Sharding
Sharding Problems
Sharding Problems
Requires either a hashing function or mapping table to determine shard Data access code becomes complex What if shard sizes become too large?
Resharding
Resharding
Schema Changes
Schema Changes
What about schema changes or migrations? MySQL not your friend here Only gets harder with more data
HBase to the Rescue
HBase to the Rescue
Clustered, commodity(-ish) hardware Mostly schema-less Dynamic distribution Spreads writes out over the cluster
HBase
HBase
Distributed database modeled on Bigtable
◦ Bigtable: A Distributed Storage System for Structured Data by Chang et al.
Runs on top of Hadoop Core Layers on HDFS for storage Native connections to MapReduce Distributed, High Availability, High Performance, Strong Consistency
HBase
HBase
Column-oriented store
◦ ◦ ◦ ◦ Wide table costs only the data stored NULLs in row are 'free' Good compression: columns of similar type Column name is arbitrary
Rows stored in sorted order Can random read and write Goal of billions of rows X millions of cells
◦ Petabytes of data across thousands of servers
untitled
untitled
Tables
Tables
Table is split into roughly equal sized „regions“ Each region is a contiguous range of keys, from [start, to end) Regions split as they grow, thus dynamically adjusting to your data set
Tables (cont.)
Tables (cont.)
Tables are sorted by Row Table schema defines column families
◦ Families consist of any number of columns ◦ Columns consist of any number of versions ◦ Everything except table name is byte[]
(Table, Row, Family:Column, Timestamp) Value
Tables (cont.)
Tables (cont.)
As a data structure
SortedMap( RowKey, List( SortedMap( Column, List( Value, Timestamp ) ) ) )
Server Architecture
Server Architecture
Similar to HDFS
◦ Master ≈ Namenode ◦ Regionserver ≈ Datanode
Often run these alongsaide each other! Difference: HBase stores state in HDFS HDFS provides robust data storage across machines, insulating against failure Master and Regionserver fairly stateless and machine independent
Region Assignment
Region Assignment
Each region from every table is assigned to a Regionserver Master Duties:
◦ Reponsible for assignment and handling regionserver problems (if any!) ◦ When machines fail, move regions ◦ When regions split, move regions to balance ◦ Could move regions to respond to load ◦ Can run multiple backup masters
Master
Master
The master does NOT
◦ ◦ ◦ ◦ Handle any write requests (not a DB master!) Handle location finding requests Not involved in the read/write path Generally does very little most of the time
Distributed Coordination
Distributed Coordination
Zookeeper is used to manage master election and server availability Set up as a cluster, provides distributed coordination primitives An excellent tool for building cluster management systems
HBase Storage Architecture
HBase Storage Architecture
HBase Public Timeline
HBase Public Timeline
November 2006
◦ Google releases paper on Bigtable
February 2007 October 2007
◦ Initial HBase prototype created as Hadoop contrib ◦ First "useable" HBase (0.15.0 Hadoop)
December 2007
◦ First HBase User Group
January 2008
◦ Hadoop becomes TLP, HBase becomes subproject
October 2008
◦ HBase 0.18.1 released
January 2009
◦ HBase 0.19.0 released ◦ HBase 0.20.0 released
September 2009
HBase WorldLingo Timeline
HBase WorldLingo Timeline
HBase - Example
HBase - Example
Store web crawl data
◦ Table crawl with family content ◦ Row is URL with columns
content:data stores raw crawled data content:language stores http language header content:type stores http content-type header
◦ If processing raw data for hyperlinks and images, add families links and images
links: column for each hyperlink links: column for each image
HBase - Clients
HBase - Clients
Native Java client/API
◦ get(Get get) ◦ put(Put put)
Non-Java clients
◦ Thrift server (Ruby, C++, Erlang, etc.) ◦ REST server (Stargate)
TableInput/TableOutputFormat for MapReduce HBase shell (jruby)
Scaling HBase
Scaling HBase
Add more machines to scale
◦ Automatic rebalancing
Base model (BigTable) scales past 1000TB No inherent reason why Hbase couldn‘t
What to store in HBase
What to store in HBase
Maybe not your raw log data... ... but the results of processing it with Hadoop! By storing the refined version in HBase, can keep up with huge data demands and serve to your website
!HBase
!HBase
“NoSQL” Database!
◦ ◦ ◦ ◦ ◦ No joins No sophisticated query engine No transactions (sort of) No column typing No SQL, no ODBC/JDBC, etc. (but there is HBql now!)
Not a replacement for your RDBMS... Matching Impedance!
Why HBase?
Why HBase?
Datasets are reaching Petabytes Traditional databases are expensive to scale and difficult to distribute Commodity hardware is cheap and powerful (but HBase can make use of powerful machines too!) Need for random access and batch processing (which Hadoop does not offer)
Numbers
Numbers
Single reads are 1-10ms depending on disk seeks and caching Scans can return hundreds of rows in dozens of ms Serial read speeds
Multilingual Archive (cont.)
Multilingual Archive (cont.)
Planned already, implemented as customer project Scale:
◦ 500million documents ◦ Random Access ◦ “100%” Uptime
Technologies?
◦ Database ◦ Zip-Archives on file system, or Hadoop
Lucene Search Server
Lucene Search Server
43 fields indexed 166GB size Automated merging/warm-up/swap Looking into scalable solution
◦ ◦ ◦ ◦
Katta Hyper Estraier DLucene …
Sorting?
Multilingual Archive (cont.)
Multilingual Archive (cont.)
5 Tables Up to 5 column families XML Schemas Automated table schema updates Standard options tweaked over time
◦ Garbage Collection!
MemCached(b) layer
Layers
Layers
Firewall
Network
LWS
Web
App
Cache
Data
Director 1
Director n
Apache 1
Apache n
…
Tomcat 1
Tomcat n
Tomcat 1
Tomcat n
MemCach ed 1 HBase
MemCach ed n
Map/Reduce
Map/Reduce
Backup/Restore Index building Cache filling Mapping Updates Translation
HBase - Problems
HBase - Problems
Early versions (before HBase 0.19.0!)
◦ Data loss ◦ Migration nightmares ◦ Slow performance
Current version
◦ Read HBase Wiki!!!
Single point of failure (name node only!)
HBase - Notes
HBase - Notes
RTF M HBase Wiki, IRC Channel Personal Experience:
(ine)
◦ ◦ ◦ ◦ ◦ ◦ ◦
Max. file handles (32k+) Hadoop xceiver limits (NIO?) Redundant meta data (on name node) RAM (4GB+) Deployment strategy Garbage collection (use CMS, G1?) Maybe not mix batch and interactive?
Graphing
Graphing
Use supplied Ganglia context or JMX bridge to enable Nagios and Cacti JMXToolkit: swiss army knife for JMX enabled servers: http://github.com/ larsgeorge/jmxtoolkit
HBase - Roadmap
HBase - Roadmap
HBase 0.20.x “Performance”
◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
New Key Format – KeyValue New File Format – Hfile New Block Cache – Concurrent LRU New Query and Result API New Scanners Zookeeper Integration – No SPOF in HBase New REST Interface Contrib
Transactional Tables Secondary Indexes Stargate
HBase - Roadmap (cont.)
HBase - Roadmap (cont.)
HBase 0.21.x “Advanced Concepts”
◦ ◦ ◦ ◦ ◦ Master Rewrite – More Zookeeper New RPC Protocol (Avro) Multi-DC Replication Intra Row Scanning Further optimizations on algorithms and data structures ◦ Discretionary Access Control ◦ Coprocessors
Questions?
Questions?
lars@worldlingo.com larsgeorge@apache.org lars@larsgeorge.com Blog: www.larsgeorge.com Twitter: larsgeorge
Email: