Presentation Going Reactive: Scalable, Highly Concurrent & Fault-Tolerant Systems

The skills of building Scalable, Highly Concurrent, Event-Driven and Resilient Systems are becoming increasingly important in our new world of Cloud Computing, multi-core processors, Big Data and Real-Time Web. Unfortunately, many people are still doing it wrong: using the wrong tools, techniques, habits and ideas. In this talk we will look at what it means to 'Go Reactive' and discuss some of the most common (and some less common but superior) practices: what works, what doesn't, and why.

Slides

Going Reactive: Event-Driven, Scalable & Resilient Systems
Jonas Bonér, CTO Typesafe
Twitter: @jboner

I will never use distributed objects again

Lessons Learned through... Agony and Pain, lots of Pain

New tools for a new era
• The demands and expectations for applications have changed dramatically in recent years
• We need to build systems that:
  • react to events - Event-Driven
  • react to load - Scalable
  • react to failure - Resilient
  • react through a rich and engaging UI - Interactive

Reactive Applications

The four traits of Reactive
• Interactive
• Scalable
• Resilient
• Event-Driven

So how can we get there?
• It’s All Trade-offs
• Go Event-Driven
• Go Resilient
• Go Scalable
• Go Interactive

Performance vs Scalability

Latency vs Throughput

Availability vs Consistency

Go Event-Driven

Shared mutable state
Together with threads...
...leads to code that is totally INDETERMINISTIC
...and the root of all EVIL

Please, avoid it at all cost

Use IMMUTABLE state, stupid

The problem with locks
• Locks do not compose
• Locks break encapsulation
• Taking too few locks
• Taking too many locks
• Taking the wrong locks
• Taking locks in the wrong order
• Error recovery is hard

1. Never block
• ...unless you really have to
• Blocking kills scalability (& performance)
• Never sit on resources you don’t use
• Use non-blocking IO
• Use lock-free concurrency
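As a rough illustration of the "lock-free concurrency" bullet, here is a minimal sketch of a counter updated with a compare-and-set retry loop instead of a lock. The names `LockFreeCounter`, `add` and `value` are made up for this example; only `java.util.concurrent.atomic.AtomicLong` comes from the JDK.

```scala
import java.util.concurrent.atomic.AtomicLong

object LockFreeCounter {
  // Shared state lives in an AtomicLong; no monitor is ever taken.
  private val hits = new AtomicLong(0L)

  // Hypothetical helper: add `n` using an explicit compare-and-set retry loop.
  @annotation.tailrec
  def add(n: Long): Long = {
    val current = hits.get()
    val updated = current + n
    if (hits.compareAndSet(current, updated)) updated
    else add(n) // another thread won the race; retry with the fresh value
  }

  def value: Long = hits.get()
}
```

A thread that loses the race simply retries, so no thread is ever parked while holding a resource.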

2. Go Async
Design for reactive event-driven systems
• Use asynchronous event/message passing
• Think in workflows: how the events flow in the system
• Gives you
  1. lower latency
  2. better throughput
  3. a more loosely coupled architecture, easier to extend, evolve & maintain

Needs to be Event-Driven all the way down

Traditional vs Non-blocking

Traditional: Client --blocking--> Server --blocking--> Service

    def getTweets = Action {
      Ok(WS.get("http://twitter.com/"))
    }

Non-blocking: Client --non-blocking--> Server --non-blocking--> Service

    def getTweets = Action {
      Async {
        Ok(WS.get("http://twitter.com/"))
      }
    }
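The same contrast can be sketched with nothing but the Scala standard library. This is not the Play API from the slide: `fetchTweets`, `handleBlocking` and `handleAsync` are hypothetical names, and the "service" is just a Future that returns a fixed string.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object NonBlockingSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for a remote call such as the slide's WS.get(...).
  def fetchTweets(): Future[String] = Future { "tweets" }

  // Traditional: the calling thread is parked until the service answers.
  def handleBlocking(): String =
    "200 " + Await.result(fetchTweets(), 2.seconds)

  // Non-blocking: return immediately with a Future of the response.
  def handleAsync(): Future[String] =
    fetchTweets().map(body => "200 " + body)
}
```

In the second handler the server thread is free to pick up other requests while the service call is in flight.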

You deserve better (and more fun) tools
• Actors
• Agents
• Futures
• FRP/RX

Actors
• Share NOTHING
• Isolated lightweight event-based processes
• Each actor has a mailbox (message queue)
• Communicates through asynchronous & non-blocking message passing
• Location transparent (distributable)
• Examples: Akka & Erlang
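To make the mailbox idea concrete, here is a toy sketch in plain Scala. It is not the Akka or Erlang API: `ToyActor`, `Counter`, `Increment` and `Get` are invented for this example, and a single-threaded executor stands in for the mailbox so that messages are handled one at a time, in order.

```scala
import java.util.concurrent.Executors
import scala.concurrent.Promise

object ToyActor {
  sealed trait Msg
  case object Increment extends Msg
  final case class Get(replyTo: Promise[Int]) extends Msg

  final class Counter {
    // The executor's queue plays the role of the mailbox.
    private val mailbox = Executors.newSingleThreadExecutor()
    // Private state, only ever touched by the mailbox thread.
    private var count = 0

    // Asynchronous, non-blocking send: enqueue the message and return.
    def !(msg: Msg): Unit = mailbox.execute(() => receive(msg))

    private def receive(msg: Msg): Unit = msg match {
      case Increment    => count += 1
      case Get(replyTo) => replyTo.success(count); ()
    }

    def stop(): Unit = mailbox.shutdown()
  }
}
```

The only way to observe the state is to send a message and receive a reply, which is what "share nothing" amounts to in practice.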

Agents
• Reactive memory cells
• Send an update function to the Agent, which
  1. adds it to an (ordered) queue, to be
  2. applied to the Agent async & non-blocking
• Reads are “free”, just dereferences the Ref
• Composes nicely
• Examples: Clojure & Akka
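A toy version of the Agent idea, assuming nothing beyond the standard library. `ToyAgent`, `send`, `get` and `close` are illustrative names, not the Clojure or Akka Agent API: update functions are queued on a single-threaded executor and applied in order, while reads just dereference an `AtomicReference`.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{ExecutionContext, Future}

final class ToyAgent[A](initial: A) {
  private val ref = new AtomicReference[A](initial)
  // One thread, so queued updates are applied one at a time, in send order.
  private val queue = Executors.newSingleThreadExecutor()
  private val ec = ExecutionContext.fromExecutorService(queue)

  // Enqueue an update function; the Future completes with the new value.
  def send(f: A => A): Future[A] = Future {
    val updated = f(ref.get())
    ref.set(updated)
    updated
  }(ec)

  // Reads never wait for pending updates.
  def get: A = ref.get()

  def close(): Unit = queue.shutdown()
}
```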

Futures
• Allows you to spawn concurrent computations and work with the not yet computed results
• Write-once, Read-many
• Freely sharable
• Allows non-blocking composition
• Monadic (composes in for-comprehensions)
• Built-in model for managing failure
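A small sketch of the composition and failure-handling bullets using `scala.concurrent.Future`. The functions `price`, `quantity`, `total` and `totalOrZero` and their values are hypothetical, chosen only to show the shape of the code.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureComposition {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical lookups; each starts running as soon as it is created.
  def price(item: String): Future[Int] = Future {
    if (item.isEmpty) throw new IllegalArgumentException("no item")
    else item.length * 10
  }
  def quantity(item: String): Future[Int] = Future { 2 }

  // Monadic composition: no thread waits while the parts are computed.
  def total(item: String): Future[Int] = {
    val p = price(item)    // both futures are started before composing,
    val q = quantity(item) // so they can run concurrently
    for (a <- p; b <- q) yield a * b
  }

  // Failure is part of the model: a failed Future becomes a fallback value.
  def totalOrZero(item: String): Future[Int] =
    total(item).recover { case _: IllegalArgumentException => 0 }
}
```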

Functional Reactive Programming (FRP)
• Extend Futures with the concept of a stream
• Functional variation of the observer pattern
• A signal attached to a stream of events
• The signal is reevaluated for each event
• Model events on a linear timeline - deterministic
• Composes nicely
• Examples: Rx, Reactive.js, RxJava, Scala.Rx, Knockout.js
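The "signal attached to a stream of events" idea can be sketched in a few lines. This is a single-threaded toy, not the API of Rx or Scala.Rx; `ToyFrp`, `EventStream`, `Signal`, `emit`, `fold` and `now` are names made up here.

```scala
object ToyFrp {
  // A push-based event stream: subscribers are called for each event.
  final class EventStream[A] {
    private var subscribers = List.empty[A => Unit]

    def subscribe(f: A => Unit): Unit = { subscribers = f :: subscribers }
    def emit(a: A): Unit = subscribers.foreach(_(a))

    // Derived stream: every event is transformed and pushed downstream.
    def map[B](f: A => B): EventStream[B] = {
      val out = new EventStream[B]
      subscribe(a => out.emit(f(a)))
      out
    }

    // A signal whose value is re-evaluated for each incoming event.
    def fold[S](zero: S)(f: (S, A) => S): Signal[S] = {
      val signal = new Signal(zero)
      subscribe(a => signal.set(f(signal.now, a)))
      signal
    }
  }

  final class Signal[S](initial: S) {
    private var current = initial
    def now: S = current
    private[ToyFrp] def set(s: S): Unit = { current = s }
  }
}
```

For example, `clicks.map(_ * 2).fold(0)(_ + _)` yields a signal holding the running sum of the doubled events.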

Work with layers in complexity
1. Start with a Deterministic, Declarative & Immutable core
   • Logic or Functional Programming
   • Futures, FRP or Dataflow
2. Add Indeterminism selectively - only where needed
   • Actor or Agent-based Programming
3. Add Shared Mutability selectively - only where needed
   • Protected by Transactions (STM)
4. Finally - only if really needed
   • Add Monitors (Locks) and explicit Threads

Go Resilient

Failure Recovery in Java/C/C# etc.
• You are given a SINGLE thread of control
• If this thread blows up you are screwed
• So you need to do all explicit error handling WITHIN this single thread
• To make things worse - errors do not propagate between threads so there is NO WAY OF EVEN FINDING OUT that something has failed
• This leads to DEFENSIVE programming with:
  • Error handling TANGLED with business logic
  • SCATTERED all over the code base

We can do better!!!

The Right Way
• Isolate the failure
• Compartmentalize
• Manage failure locally
• Avoid cascading failures

Use Bulkheads

...together with supervision
1. Use Isolated lightweight processes (compartments)
2. Supervise these processes
   1. Each process has a supervising parent process
   2. Errors are reified and sent as (async) events to the supervisor
   3. Supervisor manages the failure - can kill, restart, suspend/resume
• Same semantics local as remote
• Full decoupling between business logic & error handling
• Built into the Actor model
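A very reduced sketch of the supervision idea, using standard-library Futures rather than a real actor runtime. `ToySupervisor` and `supervise` are hypothetical names, and "restart" here simply means running the work again. The point is that the failure reaches the supervisor as a value, and the work itself contains no error handling.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ToySupervisor {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Run `work` as an isolated task. A failure arrives here as a failed
  // Future; the supervisor restarts the work while retries remain and
  // otherwise lets the failure propagate to its own caller.
  def supervise[A](retries: Int)(work: () => A): Future[A] =
    Future(work()).recoverWith {
      case _: Exception if retries > 0 => supervise(retries - 1)(work)
    }
}
```

A real supervisor, as in Akka or Erlang, can also stop, suspend or resume a child; this sketch only shows the restart decision.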

Go Scalable

Performance vs Scalability

How do I know if I have a performance problem?
If your system is slow for a single user

How do I know if I have a scalability problem?
If your system is fast for a single user but slow under heavy load

Fallacy 1: Transparent Distributed Computing
• Distributed Shared Mutable State
  • EVIL^N (where N is number of nodes)
• Distributed Objects
  • “Sucks like an inverted hurricane” - Martin Fowler
• Distributed Transactions
• Good reading:
  • A Note On Distributed Computing - Waldo et al.
  • Six Misconceptions about Reliable Distributed Computing - Werner Vogels

Fallacy 2: RPC
• Emulating synchronous method dispatch across the network is a BAD THING
• Ignores:
  • Latency
  • Partial failures
  • General scalability and distributed computing concerns
• Good reading:
  • Convenience over Correctness - Steve Vinoski

Instead
Embrace the Network
Use Asynchronous Message Passing and be done with it

Guaranteed Delivery
Delivery Semantics
• No guarantees
• At most once
• At least once
• Once and only once

It’s all lies.
The network is inherently unreliable and there is no such thing as 100% guaranteed delivery

Guaranteed Delivery
The question is what to guarantee
1. The message is sent out on the network?
2. The message is received by the receiver host’s NIC?
3. The message is put on the receiver’s queue?
4. The message is applied to the receiver?
5. The message is starting to be processed by the receiver?
6. The message has completed processing by the receiver?

Ok, then what to do?
1. Start with 0 guarantees (0 additional cost)
2. Add the guarantees you need - one by one

Different USE-CASES, Different GUARANTEES, Different COSTS

For each additional guarantee you add you will either:
• decrease performance, throughput or scalability
• increase latency

Just Use ACKing and be done with it
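One way to read "use ACKing" is as at-least-once delivery: resend until acknowledged, and let the receiver ignore duplicates. The sketch below assumes that reading. `AckSketch`, `Message`, `Receiver`, `sendWithAck` and the `channel` function are all hypothetical, and the transport is simulated by a plain function that returns false when the message or its ACK is lost.

```scala
object AckSketch {
  final case class Message(id: Long, payload: String)

  // Receiver that ACKs every delivery and ignores duplicates by id,
  // so a redelivered message is processed only once.
  final class Receiver {
    private var seen = Set.empty[Long]
    private var log = Vector.empty[String]

    def deliver(m: Message): Boolean = {
      if (!seen(m.id)) { seen += m.id; log = log :+ m.payload }
      true // the ACK
    }

    def processed: Vector[String] = log
  }

  // At-least-once: resend until an ACK arrives or attempts run out.
  // `channel` is an assumed transport returning false on a lost message/ACK.
  def sendWithAck(m: Message, maxAttempts: Int)(channel: Message => Boolean): Boolean =
    (1 to maxAttempts).exists(_ => channel(m))
}
```

The sender pays for the guarantee with extra round trips and possible resends, which is the cost trade-off the preceding slides describe.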

Use Batching
http://www.aosabook.org/en/zeromq.html

In a JVM-based application it is more like:
1. Application
2. NIO
3. JVM
4. User/Kernel space boundary
5. TCP
6. IP
7. Ethernet layer
8. NIC
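The reason batching helps is that every send pays for a full trip down the stack listed above, so grouping messages amortizes that fixed cost. A minimal sketch, assuming a made-up `CountingTransport` whose `write` call stands for one such trip:

```scala
object BatchingSketch {
  // Hypothetical transport: each call to `write` stands for one trip
  // through the stack (application, NIO, JVM, kernel, TCP, ...).
  final class CountingTransport {
    var writes = 0
    var bytes = 0
    def write(batch: Seq[String]): Unit = {
      writes += 1
      bytes += batch.map(_.length).sum
    }
  }

  // One trip per message.
  def sendOneByOne(msgs: Seq[String], t: CountingTransport): Unit =
    msgs.foreach(m => t.write(Seq(m)))

  // One trip per group of up to `batchSize` messages.
  def sendBatched(msgs: Seq[String], batchSize: Int, t: CountingTransport): Unit =
    msgs.grouped(batchSize).foreach(batch => t.write(batch))
}
```

The same payload goes out either way; batching trades a little latency for the first message in each batch against far fewer trips overall.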

Latency vs Throughput

You should strive for maximal throughput with acceptable latency

Go Reactive
• Interactive
• Scalable
• Resilient
• Event-Driven

Thank You
Contact me
Email: jonas@typesafe.com
Web: typesafe.com
Twitter: @jboner