Raw performance numbers - Spring Boot 2 Webflux vs Spring Boot 1

Summary

Spring Boot 2 with Spring Webflux based application outperforms a Spring Boot 1 based application by a huge margin for IO heavy workloads. The following is a summarized result of a load test - Response time for a IO heavy transaction with varying concurrent users:

Image may be NSFW.
Clik here to view.

When the number of concurrent users remains low (say less than 1000) both Spring Boot 1 and Spring Boot 2 handle the load well and the 95 percentile response time remains milliseconds above a expected value of 300 ms.

At higher concurrency levels, the Async Non-Blocking IO and reactive support in Spring Boot 2 starts showing its colors - the 95th percentile time even with a very heavy load of 5000 users remains at around 312ms! Spring Boot 1 records a lot of failures and high response times at these concurrency levels.

Details

My set-up for the performance test is the following:

Image may be NSFW.
Clik here to view.

The sample applications expose an endpoint(/passthrough/message) which in-turn calls a downstream service. The request message to the endpoint looks something like this:

{
"id": "1",
"payload": "sample payload",
"delay": 3000
}

The downstream service would delay based on the "delay" attribute in the message (in milliseconds).

Spring Boot 1 Application

I have used Spring Boot 1.5.8.RELEASE for the Boot 1 version of the application. The endpoint is a simple Spring MVC controller which in turn uses Spring's RestTemplate to make the downstream call. Everything is synchronous and blocking and I have used the default embedded Tomcat container as the runtime. This is the raw code for the downstream call:

public MessageAck handlePassthrough(Message message) {
    ResponseEntity<MessageAck> responseEntity = this.restTemplate.postForEntity(targetHost 
                                                            + "/messages", message, MessageAck.class);
    return responseEntity.getBody();
}

Spring Boot 2 Application

Spring Boot 2 version of the application exposes a Spring Webflux based endpoint and uses WebClient, the new non-blocking, reactive alternate to RestTemplate to make the downstream call - I have also used Kotlin for the implementation, which has no bearing on the performance. The runtime server is Netty:

import org.springframework.http.HttpHeaders
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.BodyInserters.fromObject
import org.springframework.web.reactive.function.client.ClientResponse
import org.springframework.web.reactive.function.client.WebClient
import org.springframework.web.reactive.function.client.bodyToMono
import org.springframework.web.reactive.function.server.ServerRequest
import org.springframework.web.reactive.function.server.ServerResponse
import org.springframework.web.reactive.function.server.bodyToMono
import reactor.core.publisher.Mono

class PassThroughHandler(private val webClient: WebClient) {

    fun handle(serverRequest: ServerRequest): Mono<ServerResponse> {
        val messageMono = serverRequest.bodyToMono<Message>()

        return messageMono.flatMap { message ->
            passThrough(message)
                    .flatMap { messageAck ->
                        ServerResponse.ok().body(fromObject(messageAck))
                    }
        }
    }

    fun passThrough(message: Message): Mono<MessageAck> {
        return webClient.post()
                .uri("/messages")
                .header(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .header(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
                .body(fromObject<Message>(message))
                .exchange()
                .flatMap { response: ClientResponse ->
                    response.bodyToMono<MessageAck>()
                }
    }
}

Details of the Perfomance Test

The test is simple, for different sets of concurrent users (300, 1000, 1500, 3000, 5000), I send a message with the delay attribute set to 300 ms, each user repeats the scenario 30 times with a delay between 1 to 2 seconds between requests. I am using the excellent Gatling tool to generate this load.