Stateful or Stateless or Serverless?

Processing data can take many forms and shapes, but we can broadly distinguish 2 types: stateful and stateless. So what about that other -less buzzword everyone keeps mentioning: serverless, or sometimes even computerless? What is that all about, and how do these concepts relate? And what is databaseless?

Let’s dive into each one of these concepts and see how they are related.

Stateless processing

When we talk about stateless processing, we mean the processing of data that needs nothing besides the input.
In other words, the processing happens entirely on the input data.
Some examples:

    • converting a text-document into a pdf
    • calculating the total price of a list of items
    • converting data from one format into another (encoding, decoding, encrypting, decrypting, …)
    • analyzing a picture to detect what is on it

These examples have in common that the outcome of the processing is defined only by the input: no matter how many times you convert the same text file into a pdf, the outcome will always be the same. The same holds for calculating the total price of a list of items. As long as it is the same list of the same items, the outcome will predictably be the same.

We call these types of calculations side-effect-free. Calculations whose outcome depends only on their input, and that have no side effects, are called pure functions.

Pure functions have a number of interesting characteristics. For example, they are very easy to scale horizontally: converting 50 files to pdf can easily be done in parallel. After all, the conversion of 1 file cannot interfere with the result of another conversion.

Another characteristic of stateless processing is that it is idempotent. This means a calculation can simply be retried if something goes wrong, without risking the consistency of the state (because there is no state).
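To make this concrete, here is a minimal sketch of the 'total price' example as a pure function. The class name, the method name and the prices (in cents) are made up for illustration.

```java
import java.util.List;

final class PriceCalculator {

    // Pure function: the result depends only on the argument,
    // and nothing outside the method is read or modified.
    static long totalCents(List<Long> itemPricesCents) {
        long total = 0;
        for (long price : itemPricesCents) {
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> items = List.of(1999L, 500L, 250L); // made-up prices
        long first = totalCents(items);
        long retry = totalCents(items); // retrying is harmless: there is no state to corrupt
        System.out.println(first + " " + retry);
    }
}
```

Calling the function a second time, for example as a retry after a failure, gives the same total and changes nothing else.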

Stateful processing

Stateful processing, in contrast to stateless, is processing that needs external data in addition to the input. What we mean by external is anything that is not contained within the data that is being processed.
That is still a bit vague, so here are a few examples to make things clear:

    • a user adds an item to the shopping cart of a web-shop. The ‘data’ here is the item that is added, the ‘external state’ is the shopping cart. There are already some items in the shopping cart, and the user wants to add 1 additional item. The details of the user are also ‘external state’. Those are for example details about the shipping of the order and payment preferences.
      If the application that holds this state becomes unavailable during the shopping, the user loses all the items that were already put in the shopping-cart. The processing (adding the item to the cart) is therefore an example of ‘stateful processing’: the content of the shopping cart is relevant to correctly execute the ‘add item’ process.
    • counting the number of clicks on an item of a website: this means adding 1 to whatever the previous number was. So, in this case, the external state is ‘the previous number of clicks’.
    • keeping track of which episodes you have seen of a show on a video streaming platform
    • a visitor gives a rating to a restaurant they recently visited

What these examples have in common is that the processing involves manipulating some preexistent state, based on some input. This is called a side-effect. The result is defined by the combination of the preexisting state and the input.

There are a few consequences to having side-effects. For example, those actions are not idempotent:

    • Adding the same item to the shopping cart again will result in having it twice in the cart.
    • Blindly processing a restaurant visitor’s rating twice will lead to inaccurate statistics.
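The shopping cart case can be sketched as follows. The ShoppingCart class is hypothetical, and the addOnce variant (where the caller supplies a unique request id) is just one possible way to make a retry safe, not something the examples above prescribe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ShoppingCart {
    private final List<String> items = new ArrayList<>();
    private final Set<String> seenRequestIds = new HashSet<>();

    // Naive version: every call changes the state, so a retry adds the item again.
    void add(String item) {
        items.add(item);
    }

    // Hypothetical variant: a request id that was seen before is ignored.
    void addOnce(String requestId, String item) {
        if (seenRequestIds.add(requestId)) {
            items.add(item);
        }
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        ShoppingCart naive = new ShoppingCart();
        naive.add("book");
        naive.add("book"); // accidental retry: the book is now in the cart twice
        System.out.println(naive.size());

        ShoppingCart deduplicated = new ShoppingCart();
        deduplicated.addOnce("request-1", "book");
        deduplicated.addOnce("request-1", "book"); // same request retried: ignored
        System.out.println(deduplicated.size());
    }
}
```

Note that the safe variant needs even more state (the set of request ids), which is exactly why retries are harder for stateful processing than for stateless processing.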

Another consequence is that stateful processing comes with the notion of consistency. The state that is being manipulated must be protected against corruption: when 2 users are rating the same restaurant simultaneously, the application should behave correctly.
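As an illustration of what 'behaving correctly' can mean inside a single process, the sketch below lets several visitors rate the same restaurant at the same time. The RestaurantRating class and the simulate helper are made up for this example; synchronized is only one of several ways to protect the state, and it does not help once the state lives in more than one process.

```java
final class RestaurantRating {
    private long sum = 0;
    private long count = 0;

    // synchronized: only one thread at a time may update sum and count together.
    synchronized void rate(int stars) {
        sum += stars;
        count++;
    }

    synchronized long count() {
        return count;
    }

    synchronized double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    // Simulates a number of visitors who each submit 4-star ratings concurrently.
    static RestaurantRating simulate(int visitors, int ratingsPerVisitor) {
        RestaurantRating rating = new RestaurantRating();
        Thread[] threads = new Thread[visitors];
        for (int v = 0; v < visitors; v++) {
            threads[v] = new Thread(() -> {
                for (int i = 0; i < ratingsPerVisitor; i++) {
                    rating.rate(4);
                }
            });
            threads[v].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rating;
    }

    public static void main(String[] args) {
        RestaurantRating rating = simulate(2, 1000);
        System.out.println(rating.count() + " ratings, average " + rating.average());
    }
}
```

Without the synchronized keyword, two threads could read the same old count and both write back the same new value, silently losing a rating.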

Most applications typically use a mix of stateful and stateless processing.

Serverless processing

Now that we have cleared up the differences between stateful and stateless, what is serverless?

Wikipedia has a definition for it, but what does it actually mean from a practical point of view?

Serverless is all about the deployment of an application. Even though the name suggests it, it obviously does not mean there is no computer (or server) involved.

It is just that, from a developer’s perspective, you don’t know on which server, or on how many of them, your application runs. Frameworks that leverage this allow developers to not worry about how exactly and where the application gets deployed, or how many instances of it there are. For example, Spring Cloud Function provides a framework for developing functions that are automatically instantiated when needed.

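import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import reactor.core.publisher.Flux;

@SpringBootApplication
public class Application {

  // The function bean: maps every incoming String to its upper-case version
  @Bean
  public Function<Flux<String>, Flux<String>> uppercase() {
    return flux -> flux.map(value -> value.toUpperCase());
  }

  public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
  }
}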

In this example (which is a complete application), the developer only needs to provide a function that accepts a stream (Flux) of Strings and returns a stream of Strings. All the processing (converting the input text to upper case) happens within the context of that function. So yes, it is a pure function.

Being a pure function, this application can theoretically scale out indefinitely, and if there is no data to be processed, it can scale down to no instances at all. Not only in theory, in practice too: Spring Cloud Function takes care of all the specifics to make this work on AWS Lambda, Apache OpenWhisk, Azure Functions, and Project Riff.

This is just 1 example, but other serverless frameworks conceptually do the same.

As you can see, there are definitely some perks to serverless computing. Stateless calculations are clearly a very good match for it. Stateless calculations can blindly be scaled horizontally in a serverless environment: if there is more demand for some calculation than 1 instance can handle in a timely fashion, multiple instances can be spun up, and the load can be evenly distributed across them. There is no additional logic or complexity (other than basic load balancing, which is taken care of by the cloud provider) to consider. The more hardware resources you have, the more instances you can run.
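The effect can be imitated on a single machine. In the sketch below, worker threads stand in for serverless instances and a thread pool stands in for the load balancer; both are simplifying assumptions, and all names are made up. The point is only that the result is the same whether 1 or 4 'instances' do the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class StatelessScaling {

    // The stateless calculation: same idea as the uppercase function shown earlier.
    static String process(String input) {
        return input.toUpperCase(Locale.ROOT);
    }

    // Spreads the inputs over a pool of worker threads and collects the results in input order.
    static List<String> processAll(List<String> inputs, int instances) {
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(pool.submit(() -> process(input)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("alpha", "beta", "gamma");
        System.out.println(processAll(inputs, 1));
        System.out.println(processAll(inputs, 4));
    }
}
```

No coordination between the workers is needed, because no worker reads or writes anything that another worker depends on.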

Indeed, this is a very powerful and convenient solution.

Unless… you have to deal with state.

Stateful applications at scale

If you have an application which needs to be elastic, you might be tempted to look at serverless computing. Why wouldn’t you? It makes scaling an application easy.

But what if your application needs stateful processing? Can you still go for serverless? Can you still easily increase the number of instances of your application? Things become a bit more complex.

What if you have to include external state into a serverless application? There are 2 ways to approach it.

    1. Connect to the external state (database) when the application starts. When the data starts pouring in, the processing can make use of the connection that is already set up.
    2. Connect to the database when data arrives in the application.

Both approaches have pros and cons, but in both cases, when you spin up multiple instances of the application, you end up with every instance holding its own connection to the same database.

This might look fine at first, but there are a few sharp edges to it.

    1. Setting up the connection, whether at startup or when processing a message, puts a brake on the throughput of data.
    2. Every instance has its own connection to the database. At some point, the maximum number of connections of the database will be reached. Measures must be taken to prevent the database from being overwhelmed by too many connections. This puts an upper limit on the elasticity.
    3. What happens if 2 instances of the application are competing for the same piece of data? To prevent inconsistency, there must be some sort of transactional measures. Locking data to guarantee consistency reduces the ability to scale.
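The second point, the connection limit, can be illustrated with a toy model. The sketch below does not talk to a real database: a Semaphore plays the role of the database's connection limit, and the numbers (150 instances, 100 connections) are arbitrary assumptions.

```java
import java.util.concurrent.Semaphore;

final class ConnectionLimit {
    private final Semaphore permits;

    ConnectionLimit(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Returns true if an instance obtained a connection, false if the limit is reached.
    boolean tryConnect() {
        return permits.tryAcquire();
    }

    // Gives a connection back so that another instance can use it.
    void disconnect() {
        permits.release();
    }

    // Counts how many of the given application instances manage to connect.
    static int connected(int instances, int maxConnections) {
        ConnectionLimit database = new ConnectionLimit(maxConnections);
        int connected = 0;
        for (int i = 0; i < instances; i++) {
            if (database.tryConnect()) {
                connected++;
            }
        }
        return connected;
    }

    public static void main(String[] args) {
        // Scaling out to 150 instances does not help if the database accepts only 100 connections.
        System.out.println(connected(150, 100));
    }
}
```

However many instances the serverless platform starts, the number that can actually do work is capped by the database.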

It boils down to the database ultimately being the limiting factor in the elasticity of the application.

Serverless computation only solves half the problem of creating an elastic application: it makes it easy to scale the ‘computational’ part of an application. It does not solve the problems you’ll encounter when trying to scale the state of such an application. Those problems you still need to solve yourself.

For some applications, having a single database is just fine. Those are typically not the applications that would benefit greatly from a serverless approach anyway.

But for an increasing number of applications, it is important to consider the limits of an architecture, because one day that limit will be reached and then it becomes a massive problem.

Of course there are ways of sharding and distributing the data across multiple (database) systems, but although possible and not uncommon, those are complex (and consequently expensive) to set up and maintain. And they will always have an upper limit when it comes to elasticity (but that limit could be quite high).

Databaseless applications

What if you don’t like compromises? Your application needs to be elastic and stateful, and it needs to be as straightforward to develop and scale as serverless frameworks make it.

Remember the code example of Spring Cloud Function where you had to develop a function that converts one string into another? Well, Akka Serverless allows the same way of working, but with state.

With Akka Serverless, you convert the current state of the application into the new state of the application, based on an event. The code looks like this:

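public ApplicationState eventHandler(ApplicationState currentState, Event event) {
  // Calculate the new state based on the current state and the event.
  // apply(...) is a placeholder for your own domain logic, not a framework method.
  ApplicationState newState = currentState.apply(event);
  return newState;
}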

Looking at it like this, we can conclude that Akka Serverless provides elastic database storage in the same way serverless frameworks provide elastic servers. This makes it a very good match for stateful processing.

The current state of the application is ‘injected’ into the function, together with the event. The framework takes care of all the plumbing:

    • setting up the database, including the schemas, and connecting to it
    • serializing/deserializing from and to the database when reading/writing to it
    • making sure the event and the current state are correctly ‘wired’ into the function
    • storing the state when the function has completed successfully

All of this in a way that is truly elastic.
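To give a feeling for what that plumbing does, here is a toy, in-memory imitation of the load, apply and store steps listed above. It is not the Akka Serverless API: the ToyStatefulRuntime class, its method names and the click-counter example are all made up, and a HashMap stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class ToyStatefulRuntime<S, E> {
    private final Map<String, S> store = new HashMap<>(); // stands in for the database
    private final S initialState;
    private final BiFunction<S, E, S> handler;

    ToyStatefulRuntime(S initialState, BiFunction<S, E, S> handler) {
        this.initialState = initialState;
        this.handler = handler;
    }

    // Loads the current state, passes it to the handler together with the event,
    // and stores the state the handler returns.
    S handle(String entityId, E event) {
        S current = store.getOrDefault(entityId, initialState);
        S next = handler.apply(current, event);
        store.put(entityId, next);
        return next;
    }

    public static void main(String[] args) {
        // State: the number of clicks so far. Event: the number of new clicks.
        // The handler itself is a pure function from (state, event) to new state.
        ToyStatefulRuntime<Integer, Integer> clicks =
            new ToyStatefulRuntime<>(0, (count, added) -> count + added);
        clicks.handle("item-42", 1);
        clicks.handle("item-42", 1);
        System.out.println(clicks.handle("item-42", 1));
    }
}
```

The developer only writes the handler; where the state comes from and where it goes is the runtime's concern. A real framework additionally has to make this durable, distributed and elastic, which this toy does not attempt.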

There is a bit more to do than just this small code snippet, but this at least illustrates what is meant by ‘stateful serverless’. A more in-depth explanation of the framework can be found in a previous blog post.