PLANET: Predictive Latency-Aware NEtworked Transactions

A new programming model for commit processing in unpredictable database environments.

 

PLANET Overview

New SLO-aware transaction programming model

Recent database trends such as multi-tenancy and cloud deployment have increased both the latency and the variance of transaction response times. The situation is worse when databases are geo-replicated across several data centers. For example, the graph below shows the high latencies and variance of RPCs between several pairs of regions on Amazon EC2. Developers building applications for such unpredictable environments currently have only two options: wait (possibly a long time) for the transaction outcome, or time out early and remain uncertain of the outcome. Neither option is ideal, so we introduce PLANET (Predictive Latency-Aware NEtworked Transactions). PLANET is a new transaction programming model that helps developers by exposing details of the transaction state, supporting callbacks, implementing a commit likelihood model, and enabling optimizations based on the commit likelihoods.

Round-trip response times of RPCs between various data centers on Amazon EC2.

PLANET is a service-level-objective (SLO) aware transaction programming model that exposes more of the stages a transaction passes through. It gives the developer more information and flexibility for handling unpredictable network latencies, and it exposes transaction commit likelihoods to further improve resource utilization and latency.

PLANET is a general transaction programming model that is applicable to many types of systems and environments. We implemented it on MDCC (Multi-Data Center Consistency), our transactional, geo-replicated database.

For more details, read our PLANET paper, which will appear in SIGMOD 2014.

 

PLANET Transaction Programming Model

Making progress with commit processing

 

The PLANET transaction programming model helps the developer handle long, high-variance network latencies by making the service level objective (SLO) explicit in the form of a timeout, exposing more stages of the transaction, and computing transaction commit likelihoods. Requiring an SLO timeout forces the developer to consider the acceptable response time for each transaction. Because the programming model exposes the different stages of the transaction, the developer can make better-informed decisions and react intelligently to sudden, unpredictable latency spikes between data centers. Below is an example of the programming model in the Scala programming language.

val t = new Tx(300) ({ // 300 ms timeout
  // Transaction operations, with get(), put(), or SQL queries
}).onFailure(txInfo => {
  // Show error message
}).onAccept(txInfo => {
  // Show pending status page
}).onComplete(90%)(txInfo => {
  if (txInfo.state == COMMITTED ||
      txInfo.state == SPEC_COMMITTED) {
    // Show success page
  } else {
    // Show order not successful page
  }
}).finallyCallback(txInfo => {
  if (!txInfo.timedOut) {
    // Update via AJAX
  }
}).finallyCallbackRemote(txInfo => {
  // Email user the completed status
})
val status = t.Execute() 

The programming model guarantees that execution returns to the application within the specified timeout, which lets the developer build applications with predictable response times. When execution returns, the transaction will be in one of three stages: onFailure, onAccept, or onComplete. The transaction runs the code block for the latest stage reached within the specified timeout.

onFailure
Sometimes failures are unavoidable, so if nothing is known about the transaction, this code block will be run.
onAccept
This code block is run when the database is still executing the transaction, so the final status of the transaction is still unknown. The system guarantees to finish the transaction at some point in the future.
onComplete
This code block is run after the transaction fully completes and the final status is known.

The onAccept and onComplete blocks do not both need to be defined. If only onAccept is defined, the transaction does not need to complete before control returns to the application. This can reduce latency because the commit status is not required, and it can achieve response times and semantics similar to those of eventually consistent systems. If only onComplete is defined, the transaction waits until the commit status is known, which is useful when the commit status is required.
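
The stage rules above can be summarized in a small, self-contained sketch. It is not the PLANET implementation: the names StageSelection and stageToRun and the timing parameters are hypothetical, and it assumes that when the latest stage reached has no block defined, the nearest earlier defined block runs, falling back to onFailure.

```scala
// Hypothetical model of PLANET stage selection; not the actual PLANET API.
object StageSelection {
  sealed trait Stage
  case object OnFailure extends Stage
  case object OnAccept extends Stage
  case object OnComplete extends Stage

  /** Picks the stage block that runs when control returns to the application.
    *
    * @param timeoutMs   SLO timeout given to the transaction
    * @param acceptedAt  time in ms at which the system accepted the transaction, if it did
    * @param completedAt time in ms at which the final outcome became known, if it did
    * @param defined     stage blocks the developer supplied
    */
  def stageToRun(timeoutMs: Long,
                 acceptedAt: Option[Long],
                 completedAt: Option[Long],
                 defined: Set[Stage]): Stage = {
    val completed = completedAt.exists(_ <= timeoutMs)
    val accepted = acceptedAt.exists(_ <= timeoutMs)
    // Assumption: prefer the latest stage reached that also has a block defined.
    if (completed && defined(OnComplete)) OnComplete
    else if (accepted && defined(OnAccept)) OnAccept
    else OnFailure
  }
}
```

With a 300 ms timeout, a transaction accepted at 40 ms whose outcome arrives at 900 ms would run the onAccept block under this model, while one that finishes at 250 ms would run onComplete.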

In addition to the three stages, the programming model lets the developer define two callbacks: finallyCallback and finallyCallbackRemote. These callbacks are executed asynchronously after the transaction completes and the final commit status is known.

finallyCallback
This callback is executed at most once, on the current application server, after the transaction completes. If the application server fails before the transaction completes, the callback cannot run.
finallyCallbackRemote
This is like finallyCallback, but the closure is transferred to a remote machine, so it is executed at least once.
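
Because finallyCallbackRemote runs at least once, its body may run more than once, so side effects such as sending an email are safest when they are idempotent. The sketch below shows one way to do that. Everything in it is an assumption for illustration: IdempotentNotifier, notifyOnce, and the txId string are hypothetical names, and the in-memory set stands in for the durable store a real deployment would need.

```scala
import scala.collection.mutable

// Hypothetical helper; not part of PLANET. Deduplicates notifications by transaction id.
object IdempotentNotifier {
  private val alreadyNotified = mutable.Set.empty[String]
  private val outbox = mutable.ListBuffer.empty[String]

  /** Records the message only the first time a given txId is seen.
    * Returns true if the message was recorded, false if it was a repeat.
    */
  def notifyOnce(txId: String, message: String): Boolean = synchronized {
    if (alreadyNotified.add(txId)) { // add returns false if txId was already present
      outbox += message
      true
    } else false
  }

  /** Messages recorded so far, in order. */
  def sent: List[String] = synchronized { outbox.toList }
}
```

Calling notifyOnce from inside a remote callback would let a repeated execution of the closure pass through without sending a second email.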

Speculative Commits

The onComplete stage accepts an optional parameter for the transaction commit likelihood, which lets the application advance without waiting for the transaction to fully complete. Advancing before the true, final outcome of the transaction is known is called a speculative commit. The developer enables speculative commits by passing the optional parameter to onComplete(P). If the developer defines onComplete(90%), the onComplete block runs as soon as the continually computed commit likelihood of the transaction reaches at least 90%. In our implementation on MDCC, PLANET uses simple latency and row access rate statistics to compute the commit likelihood of transactions. Speculative commits can be supported on any system that provides a way to calculate the commit likelihood of transactions.

Since speculative commits allow the application to move on without waiting for the transaction to fully complete, the prediction may be wrong sometimes. However, discovering the final outcome is simple. The finallyCallbacks will always have the true outcome of the transaction, so the application can always determine the final status.
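
The threshold rule for onComplete(P) can be illustrated with a short sketch. It is a simplified model rather than PLANET code: SpeculativeCommit, Estimate, and firesAt are hypothetical names, and the likelihood updates are supplied as plain data instead of being computed by the database.

```scala
// Hypothetical model of the onComplete(P) trigger; not the actual PLANET API.
object SpeculativeCommit {
  /** One update of the continually computed commit likelihood. */
  final case class Estimate(atMs: Long, likelihood: Double)

  /** Time at which onComplete(threshold) would run, if the likelihood
    * reaches the threshold before the timeout expires.
    */
  def firesAt(estimates: Seq[Estimate], threshold: Double, timeoutMs: Long): Option[Long] =
    estimates
      .sortBy(_.atMs)
      .find(e => e.likelihood >= threshold && e.atMs <= timeoutMs)
      .map(_.atMs)
}
```

A threshold of 1.0 in this model corresponds to waiting for the final outcome, while a lower threshold lets the block run on an earlier estimate.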

Commit Likelihood Model

PLANET uses commit likelihood computations both to enable optimizations and to give developers access to the predictions. The commit likelihood calculation depends on the underlying system, but simple models and statistics can be effective. With MDCC, PLANET computes an initial commit likelihood from local statistics and continually updates it as more becomes known about the progress of the transaction. In addition to enabling speculative commits, PLANET can use the commit likelihoods for admission control, so that system resources are spent on transactions that are more likely to succeed.
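
To make the idea concrete, here is a toy likelihood estimate with an admission check. It is not the model used by PLANET on MDCC. It assumes that writes to each row arrive as independent Poisson processes and that a transaction commits only if no other write touches its rows during the commit window. The names CommitLikelihood, estimate, and admit are hypothetical.

```scala
// Toy commit likelihood model under stated assumptions; not PLANET's actual model.
object CommitLikelihood {
  /** Probability that no conflicting write arrives during the commit window.
    *
    * @param rowWriteRatesPerSec observed write rate of each row the transaction updates
    * @param commitLatencyMs     expected time needed to commit, in milliseconds
    */
  def estimate(rowWriteRatesPerSec: Seq[Double], commitLatencyMs: Double): Double = {
    val windowSec = commitLatencyMs / 1000.0
    // For independent Poisson arrivals, P(no arrival) = exp(-totalRate * window).
    math.exp(-rowWriteRatesPerSec.sum * windowSec)
  }

  /** Admission control: run the transaction only if it is likely enough to commit. */
  def admit(likelihood: Double, minLikelihood: Double): Boolean =
    likelihood >= minLikelihood
}
```

Under these assumptions, higher row write rates or longer commit latencies lower the estimate, so an admission threshold tends to reject transactions on hot rows during latency spikes.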

Interactive Example

The interactive example below demonstrates how the different stages work with different scenarios with transactions and timeouts. The left side shows the different transaction stage blocks, and the right side shows the timeline of a transaction. The arrows from the timeline to the stage blocks show which stage blocks are executed, and when they are executed. You can click on the stage blocks in order to enable or disable some of them. You can also drag the timeout to visualize how the timeout affects the execution.

PLANET Transaction Programming Model Use Cases

Below are some possible use cases of the PLANET transaction programming model.

Amazon.com Web Shop

Purchasing items from a web shop. If the transaction completes within 300 ms, the user is immediately informed of the commit. Otherwise, the user sees a "Thank you" message and, when the transaction finally completes, receives a notification and an email.

val t = new Tx(300) ({
  var order = new Order(cust.key, date)
  orders.put(order)
  var product1 = products.get("Product1")
  var orderline1 = new OrderLine(product1.id, 2)
  orderlines.put(orderline1)
  product1.stock -= 2
  products.put(product1)
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Show page: Thanks for your order!
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Show page: Success confirmation
  } else {
    // Show page: Order not successful
  }
}).finallyCallback(txInfo => {
  if (!txInfo.timedOut) {
    // Update page via AJAX
  }
}).finallyCallbackRemote(txInfo => {
  // Email user the status
})

Twitter

Posting a tweet. Conflicts are impossible because tweets are append-only, so waiting for the onAccept stage is enough. Waiting only for onAccept greatly reduces the response time of the transaction.

val t = new Tx(200) ({
  tweets.put(user.id, tweetText)
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Show page: Accepted tweet
})

Ebay Auction System

Submitting a bid for an auction.

// submit an auction bid
val t = new Tx(300) ({
  var bid = new Bid(prod_id, user_id, price)
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Show page: Bid was placed, please wait for final results.
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Show page: Winning bid so far
  } else {
    // Show page: Bid not high enough
  }
}).finallyCallback(txInfo => {
  if (!txInfo.timedOut) {
    // Update page via AJAX
  }
}).finallyCallbackRemote(txInfo => {
  // Email user the results of bid
})

Reserving Tickets to an Event

Purchasing a ticket for a general admission event. This is similar to the web shop example.

// purchase a ticket
val t = new Tx(300) ({
  var ticket = new Ticket(event.id, user.id)
  event.tickets_remaining -= 1
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Show page: Order was placed, will be processed shortly
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Show page: Order placed successfully
  } else {
    // Show page: Sold out
  }
}).finallyCallback(txInfo => {
  if (!txInfo.timedOut) {
    // Update page via AJAX
  }
}).finallyCallbackRemote(txInfo => {
  // Email user the ticket confirmation
})

Bank Transactions

Withdrawing money from an ATM. The onAccept stage does not make sense in this situation, so the transaction only waits for the onComplete.

// ATM withdraw money
val t = new Tx(30000) ({
  var account = Accounts.get(123456)
  account.balance -= 100
}).onFailure(txInfo => {
  // Error message
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Give out money
  } else {
    // Not enough balance
  }
}).finallyCallbackRemote(txInfo => {
  if (txInfo.state == COMMITTED && txInfo.timedOut) {
    // Inform bank personnel of failure
  }
})

Booking Flights

Reserving seats on a flight. If the requested seats are not available, the transaction can be retried with new seat numbers.

// Book seats
val t = new Tx(500) ({
  flight.reserve(seatNum1, passenger1);
  flight.reserve(seatNum2, passenger2);
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Show page: Order was submitted, will be processed shortly
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Show page: Ticket/seat confirmation
  } else {
    // Show page: Seats not available
  }
}).finallyCallbackRemote(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Email user the ticket confirmation
  } else if (txInfo.timedOut) {
    // Choose different seats and retry transaction
  }
})

Booking Hotels

Reserving hotel rooms.

// book room
val t = new Tx(500) ({
  Rooms.reserve(numBeds, userEmail)
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Show page: Order was submitted, will be processed shortly
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Show page: Room confirmation
  } else {
    // Show page: Room not available
  }
}).finallyCallbackRemote(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Email user the room confirmation
  } else if (txInfo.timedOut) {
    // Choose larger room type and retry transaction
  }
})

Google Docs

There are multiple writers, but typed text can be displayed to the user before the commit status is known.

// When typing
val t = new Tx(50) ({ 
  doc.update(typingDiff)
}).onFailure(txInfo => {
  // Error contacting server
}).onAccept(txInfo => {
  // Display typed updates
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Display typed updates
  } else {
    // Display correct/new updates
  }
}).finallyCallbackRemote(txInfo => {
  // Display correct/new updates
})

Email

Sending an email can be done asynchronously, so the onAccept stage is good enough for most cases. The user can be notified of the commit asynchronously.

// send email
val t = new Tx(100) ({
  var newEmail = new Email(from, subject, body)
  newEmail.send(to)
}).onFailure(txInfo => {
  // Show page: Error message
}).onAccept(txInfo => {
  // Display "Sending email..."
}).onComplete(txInfo => {
  if (txInfo.state == COMMITTED) {
    // Show page: Email sent
  } else {
    // Show page: Email could not be sent
  }
}).finallyCallback(txInfo => {
  // Display if the email was sent successfully through AJAX
})
 

Additional Information

Or questions?

 

PLANET was developed in the AMPLab at UC Berkeley by Gene Pang, Tim Kraska, Mike Franklin, and Alan Fekete. Read our PLANET paper, which will appear in SIGMOD 2014, if you are interested in more details.

PLANET was implemented on our MDCC system. Please visit the MDCC page for more information.

If you have any comments or questions, feel free to email us at: gpang@cs.berkeley.edu, kraska@cs.berkeley.edu.