
Spring-Boot 2.1.x and overriding bean definition

I was recently migrating an application from Spring Boot 1.5.X to Spring Boot 2.X and saw an issue with overriding Spring Bean definitions. One of the configurations was along these lines in Kotlin:


@Configuration
class DynamoConfig {

@Bean
fun dynamoDbAsyncClient(dynamoProperties: DynamoProperties): DynamoDbAsyncClient {
...
}

@Bean
fun dynamoDbSyncClient(dynamoProperties: DynamoProperties): DynamoDbClient {
...
}
}


Now, for a test I wanted to override these 2 bean definitions and did something along these lines:

@SpringBootTest
class DynamoConfigTest {

@Test
fun saveHotel() {
val hotelRepo = DynamoHotelRepo(localDynamoExtension.asyncClient!!)
val hotel = Hotel(id = "1", name = "test hotel", address = "test address", state = "OR", zip = "zip")
val resp = hotelRepo.saveHotel(hotel)

StepVerifier.create(resp)
.expectNext(hotel)
.expectComplete()
.verify()
}


@TestConfiguration
class SpringConfig {
@Bean
fun dynamoDbAsyncClient(dynamoProperties: DynamoProperties): DynamoDbAsyncClient {
...
}

@Bean
fun dynamoDbSyncClient(dynamoProperties: DynamoProperties): DynamoDbClient {
...
}
}
}

This type of overriding works with Spring Boot 1.5.X but fails with Spring Boot 2.1.X with an error:

Invalid bean definition with name 'dynamoDbAsyncClient' defined in sample.dyn.repo.DynamoConfigTest$SpringConfig:.. 
There is already .. defined in class path resource [sample/dyn/config/DynamoConfig.class]] bound



I feel this behavior is right - not allowing beans to be overridden this way is the correct default for an application. However, I do want the ability to override the beans for tests, and thanks to a Stack Overflow answer and the Spring Boot 2.1.X release notes, the fix is to allow overrides using the property "spring.main.allow-bean-definition-overriding=true". With this change, the test looks like this:

@SpringBootTest(properties = ["spring.main.allow-bean-definition-overriding=true"])
class DynamoConfigTest {

@Test
fun saveHotel() {
val hotelRepo = DynamoHotelRepo(localDynamoExtension.asyncClient!!)
val hotel = Hotel(id = "1", name = "test hotel", address = "test address", state = "OR", zip = "zip")
val resp = hotelRepo.saveHotel(hotel)

StepVerifier.create(resp)
.expectNext(hotel)
.expectComplete()
.verify()
}


@TestConfiguration
class SpringConfig {
@Bean
fun dynamoDbAsyncClient(dynamoProperties: DynamoProperties): DynamoDbAsyncClient {
...
}

@Bean
fun dynamoDbSyncClient(dynamoProperties: DynamoProperties): DynamoDbClient {
...
}
}
}
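
As an aside, if the override should apply more broadly than a single test class, the same property could instead be placed in a test-scoped configuration file - a minimal sketch, assuming a src/test/resources/application.properties picked up by the tests:

spring.main.allow-bean-definition-overriding=true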

Water Pouring Problem with Kotlin and Vavr

The first time I saw the Water Pouring Problem being programmatically solved was in the excellent Functional Programming lectures by Martin Odersky on Coursera. The solution demonstrates the power of lazy evaluation in Streams with Scala.

Solving Water Pouring Problem using Kotlin

I wanted to explore how I could rewrite the solution described by Martin Odersky using Kotlin, and I realized two things - first, the immutable data structures that Kotlin offers are simply wrappers over the Java Collections library and are not truly immutable; second, a solution using the Java Streams feature would be difficult. Vavr, however, offers a good alternative to both - a first-class immutable collections library and a Streams library - so I took a crack at replicating the solution with Kotlin and Vavr.

A Cup looks like this, represented as a Kotlin data class:

import io.vavr.collection.List


data class Cup(val level: Int, val capacity: Int) {
override fun toString(): String {
return "Cup($level/$capacity)"
}
}

Since the water pouring problem represents the "state" of a set of cups, this can simply be represented as a "typealias" the following way:
typealias State = List<Cup>


There are 3 different types of moves that can be performed with the water in the cups - Empty a cup, Fill a cup, or Pour from one cup to another, represented again as Kotlin data classes:

import kotlin.math.min

interface Move {
fun change(state: State): State
}

data class Empty(val glass: Int) : Move {
override fun change(state: State): State {
val cup = state[glass]
return state.update(glass, cup.copy(level = 0))
}

override fun toString(): String {
return "Empty($glass)"
}
}

data class Fill(val glass: Int) : Move {
override fun change(state: State): State {
val cup = state[glass]
return state.update(glass, cup.copy(level = cup.capacity))
}

override fun toString(): String {
return "Fill($glass)"
}
}

data class Pour(val from: Int, val to: Int) : Move {
override fun change(state: State): State {
val cupFrom = state[from]
val cupTo = state[to]
val amount = min(cupFrom.level, cupTo.capacity - cupTo.level)

return state
.update(from, cupFrom.copy(level = cupFrom.level - amount))
.update(to, cupTo.copy(level = cupTo.level + amount))
}

override fun toString(): String {
return "Pour($from,$to)"
}
}

The implementation makes use of the "update" method of Vavr's List data structure to create a new list with just the relevant elements updated.


A "Path" represents a history of moves leading to the current state:

data class Path(val initialState: State, val endState: State, val history: List<Move>) {
fun extend(move: Move) = Path(initialState, move.change(endState), history.prepend(move))

override fun toString(): String {
return history.reverse().mkString("") + " ---> " + endState
}
}

I am using the "prepend" method of the list to add elements to the beginning of history. Prepending to a list is an O(1) operation whereas appending is O(n), hence the choice.

Given a "state", a set of possible moves to change the "state" are the following -

1. Empty the glasses -

(0 until count).map { Empty(it) }

2. Fill the glasses -

(0 until count).map { Fill(it) }

3. Pour from one glass to another -

(0 until count).flatMap { from ->
(0 until initialState.length()).filter { to -> from != to }.map { to ->
Pour(from, to)
}
}
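
Putting these together, the combined list of all possible moves (the "moves" referenced in the code further down) can be assembled along these lines - a sketch using Vavr's List, assuming the count of cups is derived from the initial state; the actual sample in the repo may organize this differently:

val count = initialState.length()

val moves: List<Move> =
    List.ofAll<Move>((0 until count).map { Empty(it) })
        .appendAll((0 until count).map { Fill(it) })
        .appendAll((0 until count).flatMap { from ->
            (0 until count).filter { to -> from != to }.map { to -> Pour(from, to) }
        })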

Now, all these moves are used for advancing from one state to another. Consider, say, 2 cups with capacities of 4 and 9 litres, initially holding 0 litres of water, represented as "List(Cup(0/4), Cup(0/9))". Applying all possible moves produces the next set of states of the cups, and advancing each of those states in turn produces yet another set of states (in a somewhat simplified form).

As each state advances to a next set of states based on all possible moves, it can be seen that there will be an explosion of possible paths. This is where the laziness offered by the Stream data structure of Vavr comes in - the values in a Stream are only computed on request.
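
To make the laziness concrete, here is a small standalone illustration (not part of the solution itself) - the supplier passed to "Stream.cons" is invoked only when the tail of the stream is actually requested:

import io.vavr.collection.Stream

fun naturalsFrom(n: Int): Stream<Int> =
    Stream.cons(n) { naturalsFrom(n + 1) } // the tail is computed lazily, on demand

// only the first 5 values are ever computed, despite the stream being conceptually infinite
val firstFive = naturalsFrom(1).take(5).toList() // List(1, 2, 3, 4, 5)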


Given a set of paths, new paths are created using Stream the following way:

fun from(paths: Set<Path>, explored: Set<State>): Stream<Set<Path>> {
if (paths.isEmpty) {
return Stream.empty()
} else {
val more = paths.flatMap { path ->
moves.map { move ->
val next: Path = path.extend(move)
next
}.filter { !explored.contains(it.endState) }
}
return Stream.cons(paths) { from(more, explored.addAll(more.map { it.endState })) }
}
}


So, now given a stream of potential paths from the initial state to a new state, a solution to a "target" state becomes:

val pathSets = from(hashSet(initialPath), hashSet())

fun solution(target: State): Stream<Path> {
return pathSets.flatMap { it }.filter { path -> path.endState == target }
}



That covers the solution. A test with this code looks like this - there are two cups of 4 litre and 9 litre capacity, initially holding 0 litres of water, and the final target state is to get the second cup filled with 6 litres of water:

val initialState = list(Cup(0, 4), Cup(0, 9))
val pouring = Pouring(initialState)

pouring.solution(list(Cup(0, 4), Cup(6, 9)))
.take(1).forEach { path ->
println(path)
}

when run, this spits out the following solution:

Fill(1) Pour(1,0) Empty(0) Pour(1,0) Empty(0) Pour(1,0) Fill(1) Pour(1,0) Empty(0) ---> List(Cup(0/4), Cup(6/9))

Graphically represented, the solution looks like this:



It may be easier to simply follow a working version of the sample which is available in my github repo - https://github.com/bijukunjummen/algos/blob/master/src/test/kotlin/pour/Pouring.kt


Conclusion

Although Kotlin lacks first-class support for native immutable data structures, I feel that the combination of Vavr with Kotlin makes for a solution that is as elegant as the Scala one.

Functional Hystrix using Spring Cloud HystrixCommands

Spring's WebClient provides a non-blocking client for making service to service calls. Hystrix, though now in maintenance mode, has been used for protecting service to service calls by preventing cascading failures and providing circuit breakers for calls to slow or faulty upstream services.

In this post, I will be exploring how Spring Cloud provides a newer functional approach to wrapping a remote call with Hystrix.


Consider a simple service that returns a list of entities, say a list of cities, modeled using the excellent Wiremock tool:

WIREMOCK_SERVER.stubFor(WireMock.get(WireMock.urlMatching("/cities"))
.withHeader("Accept", WireMock.equalTo("application/json"))
.willReturn(WireMock.aResponse()
.withStatus(HttpStatus.OK.value())
.withFixedDelay(5000)
.withHeader("Content-Type", "application/json")
.withBody(citiesJson))) // citiesJson - the json shown below, read from a test resource

When called with a uri of the type "/cities" this Wiremock endpoint responds with a json of the following type:

[
{
"country": "USA",
"id": 1,
"name": "Portland",
"pop": 1600000
},
{
"country": "USA",
"id": 2,
"name": "Seattle",
"pop": 3200000
},
{
"country": "USA",
"id": 3,
"name": "SFO",
"pop": 6400000
}
]

after a delay of 5 seconds.

Traditional approach


There are many approaches to using Hystrix; I have traditionally preferred one where an explicit Hystrix command protects the remote call, along these lines:

import com.netflix.hystrix.HystrixCommandGroupKey
import com.netflix.hystrix.HystrixCommandKey
import com.netflix.hystrix.HystrixCommandProperties
import com.netflix.hystrix.HystrixObservableCommand
import org.bk.samples.model.City
import org.slf4j.Logger
import org.slf4j.LoggerFactory
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.client.WebClient
import org.springframework.web.reactive.function.client.bodyToFlux
import org.springframework.web.util.UriComponentsBuilder
import reactor.core.publisher.Flux
import rx.Observable
import rx.RxReactiveStreams
import rx.schedulers.Schedulers
import java.net.URI


class CitiesHystrixCommand(
private val webClientBuilder: WebClient.Builder,
private val citiesBaseUrl: String
) : HystrixObservableCommand<City>(
HystrixObservableCommand.Setter
.withGroupKey(HystrixCommandGroupKey.Factory.asKey("cities-service"))
.andCommandKey(HystrixCommandKey.Factory.asKey("cities-service"))
.andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
.withExecutionTimeoutInMilliseconds(4000))) {
override fun construct(): Observable<City> {
val buildUri: URI = UriComponentsBuilder
.fromUriString(citiesBaseUrl)
.path("/cities")
.build()
.encode()
.toUri()

val webClient: WebClient = this.webClientBuilder.build()

val result: Flux<City> = webClient.get()
.uri(buildUri)
.accept(MediaType.APPLICATION_JSON)
.exchange()
.flatMapMany { clientResponse ->
clientResponse.bodyToFlux<City>()
}

return RxReactiveStreams.toObservable(result)
}

override fun resumeWithFallback(): Observable<City> {
LOGGER.error("Falling back on cities call", executionException)
return Observable.empty()
}

companion object {
private val LOGGER: Logger = LoggerFactory.getLogger(CitiesHystrixCommand::class.java)
}
}

This code can now be used to make a remote call the following way:

import org.springframework.http.MediaType
import org.springframework.web.reactive.function.client.WebClient


class CitiesHystrixCommandBasedClient(
private val webClientBuilder: WebClient.Builder,
private val citiesBaseUrl: String
) {
fun getCities(): Flux<City> {
val citiesObservable: Observable<City> = CitiesHystrixCommand(webClientBuilder, citiesBaseUrl)
.observe()
.subscribeOn(Schedulers.io())

return Flux
.from(RxReactiveStreams
.toPublisher(citiesObservable))
}
}


Two things to note here:
1. WebClient returns a Project Reactor "Flux" type representing the list of cities; however, Hystrix is Rx-Java 1 based, so the Flux is transformed to an Rx-Java Observable using the "RxReactiveStreams.toObservable()" call provided by the RxJavaReactiveStreams library here.

2. I still want the Project Reactor "Flux" type to be used in the rest of the application, so another adapter converts the Rx-Java Observable back to a Flux - "Flux.from(RxReactiveStreams.toPublisher(citiesObservable))" - once the call wrapped in Hystrix returns.
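
In isolation, the two adapters look like this - a trivial sketch, assuming the rx-java-reactive-streams dependency is on the classpath:

import reactor.core.publisher.Flux
import rx.Observable
import rx.RxReactiveStreams

val flux: Flux<Int> = Flux.just(1, 2, 3)
val observable: Observable<Int> = RxReactiveStreams.toObservable(flux) // Reactor -> Rx-Java 1
val backToFlux: Flux<Int> = Flux.from(RxReactiveStreams.toPublisher(observable)) // Rx-Java 1 -> Reactor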


If I were to try this client against the Wiremock stub with its 5 second delay, the Hystrix timeout kicks in and the call falls back instead of waiting on the slow response.


Functional approach


There is a lot of boilerplate in the previous approach, which is avoided by the new functional style of using HystrixCommands - a utility class that comes with Spring Cloud and provides a functional way of wrapping the remote call with Hystrix.

The entirety of the call using HystrixCommands looks like this:

import com.netflix.hystrix.HystrixCommandProperties
import org.bk.samples.model.City
import org.slf4j.Logger
import org.slf4j.LoggerFactory
import org.springframework.cloud.netflix.hystrix.HystrixCommands
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.client.WebClient
import org.springframework.web.reactive.function.client.bodyToFlux
import org.springframework.web.util.UriComponentsBuilder
import reactor.core.publisher.Flux
import rx.schedulers.Schedulers
import java.net.URI

class CitiesFunctionalHystrixClient(
private val webClientBuilder: WebClient.Builder,
private val citiesBaseUrl: String
) {
fun getCities(): Flux<City> {
return HystrixCommands
.from(callCitiesService())
.commandName("cities-service")
.groupName("cities-service")
.commandProperties(
HystrixCommandProperties.Setter()
.withExecutionTimeoutInMilliseconds(1000)
)
.toObservable { obs ->
obs.observe()
.subscribeOn(Schedulers.io())
}
.fallback { t: Throwable ->
LOGGER.error(t.message, t)
Flux.empty()
}
.toFlux()
}

fun callCitiesService(): Flux<City> {
val buildUri: URI = UriComponentsBuilder
.fromUriString(citiesBaseUrl)
.path("/cities")
.build()
.encode()
.toUri()

val webClient: WebClient = this.webClientBuilder.build()

return webClient.get()
.uri(buildUri)
.accept(MediaType.APPLICATION_JSON)
.exchange()
.flatMapMany { clientResponse ->
clientResponse.bodyToFlux<City>()
}
}

companion object {
private val LOGGER: Logger = LoggerFactory.getLogger(CitiesFunctionalHystrixClient::class.java)
}
}


A lot of boiler-plate is avoided with this approach -
1. an explicit command is not required anymore
2. the call and the fallback are coded in a fluent manner
3. Any overrides can be explicitly specified - in this specific instance the timeout of 1 second.

Conclusion

I like the conciseness which HystrixCommands brings to the usage of Hystrix with WebClient. I have the entire sample available in my github repo - https://github.com/bijukunjummen/webclient-hystrix-sample; all the dependencies required to get the samples working are part of this repo. If you are interested in sticking with Rx-Java 1, then the approach described here may help you avoid boilerplate with vanilla Hystrix.


Callback hell and Reactive patterns

One of the ways that I have come to better understand the usefulness of a Reactive Streams based approach is by seeing how it simplifies non-blocking IO calls.

This post will be a quick walkthrough of the kind of code involved in making a synchronous remote call. It will then show how layering in non-blocking IO, though highly efficient in its use of resources (especially threads), introduces complications referred to as callback hell, and how a reactive streams based approach simplifies the programming model.


Target Service


Since I will be writing the client side of the call, my target service, representing the details of a City, has two endpoints. One returns a list of city ids when called with a uri of the type "/cityids"; a sample result looks like this:

[
1,
2,
3,
4,
5,
6,
7
]

and an endpoint returning the details of a city given its id, for example when called using an id of 1 - "/cities/1":

{
"country": "USA",
"id": 1,
"name": "Portland",
"pop": 1600000
}

The client's responsibility is to get the list of city ids and then, for each city id, get the details of the city and put them together into a list of cities.

Synchronous call


I am using Spring Framework's RestTemplate to make the remote call. A Kotlin function to get the list of cityids looks like this:

private fun getCityIds(): List<String> {
val cityIdsEntity: ResponseEntity<List<String>> = restTemplate
.exchange("http://localhost:$localServerPort/cityids",
HttpMethod.GET,
null,
object : ParameterizedTypeReference<List<String>>() {})
return cityIdsEntity.body!!
}

and to get the details of a city:

private fun getCityForId(id: String): City {
return restTemplate.getForObject("http://localhost:$localServerPort/cities/$id", City::class.java)!!
}

Given these two functions it is easy to compose them such that a list of cities is returned:

val cityIds: List<String> = getCityIds()
val cities: List<City> = cityIds
.stream()
.map<City> { cityId -> getCityForId(cityId) }
.collect(Collectors.toList())

cities.forEach { city -> LOGGER.info(city.toString()) }

The code is very easy to understand; however, there are 8 blocking calls involved -
1. One to get the list of 7 city ids
2. Seven more to get the details of each of these cities

Each of these calls blocks the calling thread while it waits for the remote response.

Using Non-Blocking IO with callback

I will be using a library called AsyncHttpClient to make a non-blocking IO call.

AsyncHttpClient returns a ListenableFuture type when a remote call is made.

val responseListenableFuture: ListenableFuture<Response> = asyncHttpClient
.prepareGet("http://localhost:$localServerPort/cityids")
.execute()

A callback can be attached to a Listenable future to act on the response when available.

responseListenableFuture.addListener(Runnable {
val response: Response = responseListenableFuture.get()
val responseBody: String = response.responseBody
val cityIds: List<Long> = objectMapper.readValue<List<Long>>(responseBody,
object : TypeReference<List<Long>>() {})
....
}, executor)

Given the list of city ids I want to get the details of each city, so from within the callback I need to make more remote calls and attach a callback to each of those calls to get the details of the city, along these lines:

val responseListenableFuture: ListenableFuture<Response> = asyncHttpClient
.prepareGet("http://localhost:$localServerPort/cityids")
.execute()

responseListenableFuture.addListener(Runnable {
val response: Response = responseListenableFuture.get()
val responseBody: String = response.responseBody
val cityIds: List<Long> = objectMapper.readValue<List<Long>>(responseBody,
object : TypeReference<List<Long>>() {})

cityIds.stream().map { cityId ->
val cityListenableFuture = asyncHttpClient
.prepareGet("http://localhost:$localServerPort/cities/$cityId")
.execute()

cityListenableFuture.addListener(Runnable {
val cityDescResp = cityListenableFuture.get()
val cityDesc = cityDescResp.responseBody
val city = objectMapper.readValue(cityDesc, City::class.java)
LOGGER.info("Got city: $city")
}, executor)
}.collect(Collectors.toList())
}, executor)

This is a gnarly piece of code - there is a set of callbacks within a callback which is very difficult to reason about and make sense of, hence the term callback hell.


Using Non-Blocking IO with Java CompletableFuture


This code can be improved a little by returning Java's CompletableFuture as the return type instead of the ListenableFuture. CompletableFuture provides operators that allow the result type to be transformed and returned.

As an example, consider the function to get the list of city ids:

private fun getCityIds(): CompletableFuture<List<Long>> {
return asyncHttpClient
.prepareGet("http://localhost:$localServerPort/cityids")
.execute()
.toCompletableFuture()
.thenApply { response ->
val s = response.responseBody
val l: List<Long> = objectMapper.readValue(s, object : TypeReference<List<Long>>() {})
l
}
}

Here I am using the "thenApply" operator to transform a "CompletableFuture<Response>" to a "CompletableFuture<List<Long>>".

And similarly, to get the details of a city:

private fun getCityDetail(cityId: Long): CompletableFuture<City> {
return asyncHttpClient.prepareGet("http://localhost:$localServerPort/cities/$cityId")
.execute()
.toCompletableFuture()
.thenApply { response ->
val s = response.responseBody
LOGGER.info("Got {}", s)
val city = objectMapper.readValue(s, City::class.java)
city
}
}

This is an improvement over the callback based approach; however, CompletableFuture lacks a sufficiently rich set of operators, for example in this specific instance where all the city details need to be put together:

val cityIdsFuture: CompletableFuture<List<Long>> = getCityIds()
val citiesCompletableFuture: CompletableFuture<List<City>> =
cityIdsFuture
.thenCompose { l ->
val citiesCompletable: List<CompletableFuture<City>> =
l.stream()
.map { cityId ->
getCityDetail(cityId)
}.collect(toList())

val citiesCompletableFutureOfList: CompletableFuture<List<City>> =
CompletableFuture.allOf(*citiesCompletable.toTypedArray())
.thenApply { _: Void? ->
citiesCompletable
.stream()
.map { it.join() }
.collect(toList())
}
citiesCompletableFutureOfList
}

I have used an operator called "CompletableFuture.allOf" which returns a "Void" type and has to be coerced to return the desired type of "CompletableFuture<List<City>>".


Using Project Reactor

Project Reactor is an implementation of the Reactive Streams specification. It has two specialized types to return a stream of 0/1 item and a stream of 0/n items - the former is a Mono, the latter a Flux.
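
A trivial sketch of the two types:

val zeroOrOne: Mono<String> = Mono.just("a")   // at most one item
val zeroOrMany: Flux<Int> = Flux.just(1, 2, 3) // any number of items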

Project Reactor provides a very rich set of operators that allow the stream of data to be transformed in a variety of ways. Consider first the function to return a list of City ids:

private fun getCityIds(): Flux<Long> {
return webClient.get()
.uri("/cityids")
.exchange()
.flatMapMany { response ->
LOGGER.info("Received cities..")
response.bodyToFlux<Long>()
}
}

I am using Spring's excellent WebClient library to make the remote call and get a Project Reactor "Mono<ClientResponse>" type of response, which can be transformed to a "Flux<Long>" type using the "flatMapMany" operator.

Along the same lines to get the detail of the city, given a city id:

private fun getCityDetail(cityId: Long?): Mono<City> {
return webClient.get()
.uri("/cities/{id}", cityId!!)
.exchange()
.flatMap { response ->
val city: Mono<City> = response.bodyToMono()
LOGGER.info("Received city..")
city
}
}

Here a Project Reactor "Mono<ClientResponse>" type is being transformed to a "Mono<City>" type using the "flatMap" operator.

And the code to get the city ids and then the cities from them:

val cityIdsFlux: Flux<Long> = getCityIds()
val citiesFlux: Flux<City> = cityIdsFlux
.flatMap { this.getCityDetail(it) }

return citiesFlux


This is very expressive - contrast the mess of the callback based approach with the simplicity of the reactive streams based approach.


Conclusion

In my mind, this is one of the biggest reasons to use a Reactive Streams based approach, and in particular Project Reactor, for scenarios that involve crossing asynchronous boundaries, as in this instance of making remote calls. It cleans up the mess of callbacks and callback hell and provides a natural approach to modifying/transforming types using a rich set of operators.


My repository with a working version of all the samples that I have used here is available at https://github.com/bijukunjummen/reactive-cities-demo/tree/master/src/test/kotlin/samples/geo/kotlin

Chicken and egg - resolving Spring properties ahead of a test

Consider a service class responsible for making a remote call and retrieving a detail:


...
public class CitiesService {
private final WebClient.Builder webClientBuilder;
private final String baseUrl;

public CitiesService(
WebClient.Builder webClientBuilder,
@Value("${cityservice.url}") String baseUrl) {
this.webClientBuilder = webClientBuilder;
this.baseUrl = baseUrl;
}


public Flux<City> getCities() {
return this.webClientBuilder.build()
.get()
....

This is a Spring Bean and resolves the url to call through a property called "cityservice.url".

To test this class, an approach that I have been using with WebClient is to start a mock server using the excellent Wiremock and test the class against it. A Wiremock stub looks like this:

    private static final WireMockServer WIREMOCK_SERVER =
new WireMockServer(wireMockConfig().dynamicPort());


.....

WIREMOCK_SERVER.stubFor(get(urlEqualTo("/cities"))
.withHeader("Accept", equalTo("application/json"))
.willReturn(aResponse()
.withStatus(200)
.withHeader("Content-Type", "application/json")
.withBody(resultJson)));

The Wiremock server is being started up at a random port and it is set to respond to an endpoint called "/cities". Here is where the chicken and egg problem comes up:

1. The CitiesService class requires a property called "cityservice.url" to be set before starting the test.
2. Wiremock is started at a random port and the url that it is responding to is "http://localhost:randomport" and is available only once the test is kicked off.


There are three potential solutions that I can think of to break this circular dependency:

Approach 1: To use a hardcoded port

This approach depends on starting up Wiremock on a fixed port instead of a dynamic port; this way the property can be set when starting the test, something like this:

@ExtendWith(SpringExtension.class)
@SpringBootTest(classes = CitiesServiceHardcodedPortTest.SpringConfig.class,
properties = "cityservice.url=http://localhost:9876")
public class CitiesServiceHardcodedPortTest {
private static final WireMockServer WIREMOCK_SERVER =
new WireMockServer(wireMockConfig().port(9876));


Here Wiremock is being started at port 9876 and the property at startup is being set to "http://localhost:9876/".

This solves the problem; however, it is not CI server friendly - ports can collide at runtime, which makes for a flaky test.


Approach 2: Not use Spring for test

A better approach is to not use the property, along these lines:

public class CitiesServiceDirectTest {
private static final WireMockServer WIREMOCK_SERVER =
new WireMockServer(wireMockConfig().dynamicPort());

private CitiesService citiesService;

@BeforeEach
public void beforeEachTest() {
final WebClient.Builder webClientBuilder = WebClient.builder();

this.citiesService = new CitiesService(webClientBuilder, WIREMOCK_SERVER.baseUrl());
}


Here the service is being created by explicitly setting the baseUrl in the constructor, thus avoiding the need to set a property ahead of the test.


Approach 3: Application Context Initializer


ApplicationContextInitializer is used for programmatically initializing a Spring Application Context, and it can be used with a test to inject the property before the actual test is executed, along these lines:

@ExtendWith(SpringExtension.class)
@SpringBootTest(classes = CitiesServiceSpringTest.SpringConfig.class)
@ContextConfiguration(initializers = {CitiesServiceSpringTest.PropertiesInitializer.class})
public class CitiesServiceSpringTest {
private static final WireMockServer WIREMOCK_SERVER =
new WireMockServer(wireMockConfig().dynamicPort());

@Autowired
private CitiesService citiesService;

@Test
public void testGetCitiesCleanFlow() throws Exception {
...
}



static class PropertiesInitializer implements ApplicationContextInitializer<ConfigurableApplicationContext> {

@Override
public void initialize(ConfigurableApplicationContext applicationContext) {
TestPropertyValues.of(
"cityservice.url=" + "http://localhost:" + WIREMOCK_SERVER.port()
).applyTo(applicationContext.getEnvironment());
}
}

}

Wiremock is started up first, then the Spring context is initialized using the initializer, which injects the "cityservice.url" property using Wiremock's dynamic port; this way the property is available for wiring into CitiesService.


Conclusion

I personally prefer Approach 2; however, if it is useful to have Spring's wiring and the dependent beans created ahead of the test and the class makes use of them, then I prefer Approach 3. An ApplicationContextInitializer provides a good way to break the chicken and egg problem with properties like these, which need to be available ahead of Spring's context getting engaged.

All the code samples are available here:

Approach 1: https://github.com/bijukunjummen/reactive-cities-demo/blob/master/src/test/java/samples/geo/service/CitiesServiceHardcodedPortTest.java
Approach 2: https://github.com/bijukunjummen/reactive-cities-demo/blob/master/src/test/java/samples/geo/service/CitiesServiceDirectTest.java
Approach 3: https://github.com/bijukunjummen/reactive-cities-demo/blob/master/src/test/java/samples/geo/service/CitiesServiceSpringTest.java

Unit test for Spring's WebClient

WebClient, to quote its Java documentation, is Spring Framework's "Non-blocking, reactive client to perform HTTP requests, exposing a fluent, reactive API over underlying HTTP client libraries such as Reactor Netty".

In my current project I have been using WebClient extensively in making service to service calls and have found it to be an awesome API - I love its use of a fluent interface.

Consider a remote service which returns a list of "Cities". A code using WebClient looks like this:

...
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.client.WebClient
import org.springframework.web.reactive.function.client.bodyToFlux
import org.springframework.web.util.UriComponentsBuilder
import reactor.core.publisher.Flux
import java.net.URI

class CitiesClient(
private val webClientBuilder: WebClient.Builder,
private val citiesBaseUrl: String
) {

fun getCities(): Flux<City> {
val buildUri: URI = UriComponentsBuilder
.fromUriString(citiesBaseUrl)
.path("/cities")
.build()
.encode()
.toUri()

val webClient: WebClient = this.webClientBuilder.build()

return webClient.get()
.uri(buildUri)
.accept(MediaType.APPLICATION_JSON)
.exchange()
.flatMapMany { clientResponse ->
clientResponse.bodyToFlux<City>()
}
}
}

It is difficult to test a client making use of WebClient though. In this post, I will go over the challenges in testing a client using WebClient and a clean solution.

Challenges in mocking WebClient

An effective unit test of the "CitiesClient" class would require mocking of WebClient and every method call in the fluent interface chain along these lines:

val mockWebClientBuilder: WebClient.Builder = mock()
val mockWebClient: WebClient = mock()
whenever(mockWebClientBuilder.build()).thenReturn(mockWebClient)

val mockRequestSpec: WebClient.RequestBodyUriSpec = mock()
whenever(mockWebClient.get()).thenReturn(mockRequestSpec)
val mockRequestBodySpec: WebClient.RequestBodySpec = mock()

whenever(mockRequestSpec.uri(any<URI>())).thenReturn(mockRequestBodySpec)

whenever(mockRequestBodySpec.accept(any())).thenReturn(mockRequestBodySpec)

val citiesJson: String = this.javaClass.getResource("/sample-cities.json").readText()

val clientResponse: ClientResponse = ClientResponse
.create(HttpStatus.OK)
.header("Content-Type","application/json")
.body(citiesJson).build()

whenever(mockRequestBodySpec.exchange()).thenReturn(Mono.just(clientResponse))

val citiesClient = CitiesClient(mockWebClientBuilder, "http://somebaseurl")

val cities: Flux<City> = citiesClient.getCities()

This makes for an extremely brittle test, as any change in the order of calls would require new mocks to be recorded.

Testing using real endpoints


An approach that works well is to bring up a real server that behaves like the target of the client. Two mock servers that work really well are MockWebServer from the OkHttp library and WireMock. An example with Wiremock looks like this:

import com.github.tomakehurst.wiremock.WireMockServer
import com.github.tomakehurst.wiremock.client.WireMock
import com.github.tomakehurst.wiremock.core.WireMockConfiguration
import org.bk.samples.model.City
import org.junit.jupiter.api.AfterAll
import org.junit.jupiter.api.BeforeAll
import org.junit.jupiter.api.Test
import org.springframework.http.HttpStatus
import org.springframework.web.reactive.function.client.WebClient
import reactor.core.publisher.Flux
import reactor.test.StepVerifier

class WiremockWebClientTest {

@Test
fun testARemoteCall() {
val citiesJson = this.javaClass.getResource("/sample-cities.json").readText()
WIREMOCK_SERVER.stubFor(WireMock.get(WireMock.urlMatching("/cities"))
.withHeader("Accept", WireMock.equalTo("application/json"))
.willReturn(WireMock.aResponse()
.withStatus(HttpStatus.OK.value())
.withHeader("Content-Type", "application/json")
.withBody(citiesJson)))

val citiesClient = CitiesClient(WebClient.builder(), "http://localhost:${WIREMOCK_SERVER.port()}")

val cities: Flux<City> = citiesClient.getCities()

StepVerifier
.create(cities)
.expectNext(City(1L, "Portland", "USA", 1_600_000L))
.expectNext(City(2L, "Seattle", "USA", 3_200_000L))
.expectNext(City(3L, "SFO", "USA", 6_400_000L))
.expectComplete()
.verify()
}

companion object {
private val WIREMOCK_SERVER = WireMockServer(WireMockConfiguration.wireMockConfig().dynamicPort())

@BeforeAll
@JvmStatic
fun beforeAll() {
WIREMOCK_SERVER.start()
}

@AfterAll
@JvmStatic
fun afterAll() {
WIREMOCK_SERVER.stop()
}
}
}

Here a server is brought up at a random port, injected with a behavior, and the client is then tested against this server and its responses validated. This approach works, and there is no muddling with the internals of WebClient in mocking the behavior, but technically this is an integration test and it will be slower to execute than a pure unit test.


Unit testing by short-circuiting the remote call

An approach that I have been using recently is to short-circuit the remote call using an ExchangeFunction. An ExchangeFunction represents the actual mechanics of making the remote call and can be replaced with one that responds with what the test expects, the following way:

import org.junit.jupiter.api.Test
import org.springframework.http.HttpStatus
import org.springframework.web.reactive.function.client.ClientResponse
import org.springframework.web.reactive.function.client.ExchangeFunction
import org.springframework.web.reactive.function.client.WebClient
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.test.StepVerifier

class CitiesWebClientTest {

@Test
fun testCleanResponse() {
val citiesJson: String = this.javaClass.getResource("/sample-cities.json").readText()

val clientResponse: ClientResponse = ClientResponse
.create(HttpStatus.OK)
.header("Content-Type","application/json")
.body(citiesJson).build()
val shortCircuitingExchangeFunction = ExchangeFunction {
Mono.just(clientResponse)
}

val webClientBuilder: WebClient.Builder = WebClient.builder().exchangeFunction(shortCircuitingExchangeFunction)
val citiesClient = CitiesClient(webClientBuilder, "http://somebaseurl")

val cities: Flux<City> = citiesClient.getCities()

StepVerifier
.create(cities)
.expectNext(City(1L, "Portland", "USA", 1_600_000L))
.expectNext(City(2L, "Seattle", "USA", 3_200_000L))
.expectNext(City(3L, "SFO", "USA", 6_400_000L))
.expectComplete()
.verify()
}
}

The WebClient is injected with an ExchangeFunction which simply returns a response with the expected behavior of the remote server. This short-circuits the entire remote call and allows the client to be tested comprehensively. The approach depends on a little knowledge of the internals of WebClient, but it is a decent compromise as it runs far faster than a test using WireMock.

This approach is not original though - I have based this test on some of the tests used for testing WebClient itself, for e.g. the one here.


Conclusion

I personally prefer the last approach; it has enabled me to write fairly comprehensive unit tests for a client making use of WebClient for remote calls. My project with fully working samples is here.

Hash a Json

I recently wrote a simple library to predictably hash a json.

The utility is built on top of the excellent Jackson Json parsing library.


Problem

I needed a hash generated out of fairly large json content to later determine whether the content has changed at all. Treating the json as a string is not an option, as formatting changes or shuffling of keys can skew the results.

Solution

The utility is simple - it traverses the Jackson JsonNode representation of the json:

1. For every object node, it sorts the keys, then traverses the elements and calculates an aggregated hash from all the children
2. For every array node, it traverses the elements and aggregates the hash
3. For every terminal node, it takes the key and value and generates the SHA-256 hash from it


This way the hash is generated for the entire tree.
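
A minimal Kotlin sketch of this traversal, just to make the idea concrete (this is not the library's actual implementation, and it assumes Apache Commons Codec is available for the sha256Hex function):

import com.fasterxml.jackson.databind.JsonNode
import org.apache.commons.codec.digest.DigestUtils.sha256Hex

fun hashOf(node: JsonNode): String = when {
    // object node: sort the keys so that key order does not affect the hash,
    // then aggregate the hash of each key and its child
    node.isObject -> sha256Hex(
        node.fieldNames().asSequence().sorted()
            .joinToString("") { field -> sha256Hex(field) + hashOf(node.get(field)) })
    // array node: aggregate the hash of each element in order
    node.isArray -> sha256Hex(node.joinToString("") { element -> hashOf(element) })
    // terminal node: hash its text representation
    else -> sha256Hex(node.asText())
}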

Consider a Jackson Json Node, created the following way in code:

ObjectNode jsonNode = JsonNodeFactory
.instance
.objectNode()
.put("key1", "value1");

jsonNode.set("key2", JsonNodeFactory.instance.objectNode()
.put("child-key2", "child-value2")
.put("child-key1", "child-value1")
.put("child-key3", 123.23f));

jsonNode.set("key3", JsonNodeFactory.instance.arrayNode()
.add("arr-value1")
.add("arr-value2"));

String calculatedHash = sha256Hex(
sha256Hex("key1") + sha256Hex("value1")
+ sha256Hex("key2") + sha256Hex(
sha256Hex("child-key1") + sha256Hex("child-value1")
+ sha256Hex("child-key2") + sha256Hex("child-value2")
+ sha256Hex("child-key3") + sha256Hex("123.23"))
+ sha256Hex("key3") + sha256Hex(
sha256Hex("arr-value1")
+ sha256Hex("arr-value2"))
);

Here the json has 3 keys - "key1", "key2", "key3". "key1" holds a primitive text field, "key2" is an object node, and "key3" is an array of strings. The calculatedHash shows how the aggregated hash is calculated for the entire tree; the utility follows the same process to aggregate a hash.

If you are interested in giving this a whirl, the library is available in bintray - https://bintray.com/bijukunjummen/repo/json-hash and hosted on github here - https://github.com/bijukunjummen/json-hash

Project reactor - de-structuring a Tuple

Tuples are simple data structures that hold a fixed set of items, each of a different data type.

Project Reactor provides a Tuple data structure that can hold up to 8 elements, each of a different type.

Tuples are very useful; however, one of the issues in using a Tuple is that it is difficult to make out what it holds without de-structuring it at every place it gets used.

Problem

Consider a simple operator, which zips together a String and an integer this way:

Mono<Tuple2<String, Integer>> tup = Mono.zip(Mono.just("a"), Mono.just(2));

The output is a Tuple holding 2 elements.

Now, the problem that I have is this. I prefer the tuple to be dereferenced into its component elements before doing anything with it.

Say, with the previous tuple, I want to generate as many of the strings (the first element of the tuple) as the count (the second element of the tuple), expressed the following way:

Mono.zip(Mono.just("a"), Mono.just(2))
.flatMapMany(tup -> {
return Flux.range(1, tup.getT2()).map(i -> tup.getT1() + i);
})

The problem here is that when referring to "tup.getT1()" and "tup.getT2()", it is not entirely clear what the tuple holds. Though more verbose, I would prefer doing something like this:

Mono.zip(Mono.just("a"), Mono.just(2))
.flatMapMany(tup -> {
String s = tup.getT1();
int count = tup.getT2();

return Flux.range(1, count).map(i -> s + i);
});

Solution

The approach of de-structuring into explicit variables with meaningful names works, but it is a little verbose. A far better approach is provided by a utility set of functions that Project Reactor comes with, called TupleUtils.

I feel it is best explained using a sample; with TupleUtils the previous de-structuring looks like this:

Mono.zip(Mono.just("a"), Mono.just(2))
.flatMapMany(TupleUtils.function((s, count) ->
Flux.range(1, count).map(i -> s + i)))

This looks far more concise than explicit de-structuring. There is a bit of cleverness that needs some getting used to though:

The signature of flatMapMany is -

public final <R> Flux<R> flatMapMany(Function<? super T,? extends Publisher<? extends R>> mapper)

TupleUtils provides another indirection which returns the Function required above, through another function:

public static <T1, T2, R> Function<Tuple2<T1, T2>, R> function(BiFunction<T1, T2, R> function) {
return tuple -> function.apply(tuple.getT1(), tuple.getT2());
}

If you are using Kotlin, there is a simpler approach possible. This is based on the concept of "Destructuring Declarations". Project Reactor provides a set of Kotlin helper utilities via an additional Gradle dependency:

implementation("io.projectreactor.kotlin:reactor-kotlin-extensions")

With this dependency in place, an equivalent Kotlin code looks like this:

Mono.zip(Mono.just("a"), Mono.just(2))
.flatMapMany { (s: String, count: Int) ->
Flux.range(1, count).map { i: Int -> s + i }
}

See how a tuple has been de-structured directly into variables. This looks like this in isolation:

val (s: String, count: Int) = tup

Conclusion

It is important to de-structure a tuple into meaningful variables to improve the readability of the code, and the useful functions provided by TupleUtils as well as the Kotlin extensions help keep the code concise yet readable.

The Java samples are here and Kotlin samples here

Spring WebClient and Java date-time fields

WebClient is Spring Framework's reactive client for making service to service calls.

WebClient has become a go-to utility for me; however, I recently ran into an unexpected issue with the way it handles Java 8 date-time fields that tripped me up, and this post goes into the details.


Happy Path

First, the happy path. When using WebClient, Spring Boot advises injecting a "WebClient.Builder" into a class instead of the "WebClient" itself, and a WebClient.Builder is already auto-configured and available for injection.


Consider a fictitious "City" domain and a client to create a "City". "City" has a simple structure; note that the creationDate is a Java 8 "Instant" type:

import java.time.Instant

data class City(
val id: Long,
val name: String,
val country: String,
val pop: Long,
val creationDate: Instant = Instant.now()
)

The client to create an instance of this type looks like this:

class CitiesClient(
private val webClientBuilder: WebClient.Builder,
private val citiesBaseUrl: String
) {
fun createCity(city: City): Mono<City> {
val uri: URI = UriComponentsBuilder
.fromUriString(citiesBaseUrl)
.path("/cities")
.build()
.encode()
.toUri()

val webClient: WebClient = this.webClientBuilder.build()

return webClient.post()
.uri(uri)
.contentType(MediaType.APPLICATION_JSON)
.accept(MediaType.APPLICATION_JSON)
.bodyValue(city)
.exchange()
.flatMap { clientResponse ->
clientResponse.bodyToMono(City::class.java)
}
}
}

See how the intent is expressed in a fluent way. The uri and the headers are first being set, the request body is then put in place and the response is unmarshalled back to a "City" response type.


All well and good. Now, what does a test look like?

I am using the excellent Wiremock to bring up a dummy remote service and using this CitiesClient to send the request, along these lines:

@SpringBootTest 
@AutoConfigureJson
class WebClientConfigurationTest {

@Autowired
private lateinit var webClientBuilder: WebClient.Builder

@Autowired
private lateinit var objectMapper: ObjectMapper

@Test
fun testAPost() {
val dateAsString = "1985-02-01T10:10:10Z"

val city = City(
id = 1L, name = "some city",
country = "some country",
pop = 1000L,
creationDate = Instant.parse(dateAsString)
)
WIREMOCK_SERVER.stubFor(
post(urlMatching("/cities"))
.withHeader("Accept", equalTo("application/json"))
.withHeader("Content-Type", equalTo("application/json"))
.willReturn(
aResponse()
.withHeader("Content-Type", "application/json")
.withStatus(HttpStatus.CREATED.value())
.withBody(objectMapper.writeValueAsString(city))
)
)

val citiesClient = CitiesClient(webClientBuilder, "http://localhost:${WIREMOCK_SERVER.port()}")

val citiesMono: Mono<City> = citiesClient.createCity(city)

StepVerifier
.create(citiesMono)
.expectNext(city)
.expectComplete()
.verify()


//Ensure that date field is in ISO-8601 format..
WIREMOCK_SERVER.verify(
postRequestedFor(urlPathMatching("/cities"))
.withRequestBody(matchingJsonPath("$.creationDate", equalTo(dateAsString)))
)
}

companion object {
private val WIREMOCK_SERVER =
WireMockServer(WireMockConfiguration.wireMockConfig().dynamicPort().notifier(ConsoleNotifier(true)))

@BeforeAll
@JvmStatic
fun beforeAll() {
WIREMOCK_SERVER.start()
}

@AfterAll
@JvmStatic
fun afterAll() {
WIREMOCK_SERVER.stop()
}
}
}

In the verification at the end of the test, I want to make sure that the remote service receives the date in ISO-8601 format as "1985-02-01T10:10:10Z". In this instance everything works cleanly and the test passes.

Not so happy path

Consider now a case where I have customized the WebClient.Builder in some form. An example: say I am using a registry service and want to look up a remote service via this registry before making the call - the WebClient.Builder then has to be customized, for instance by adding a "@LoadBalanced" annotation on it - some details here.

So say, I have customized WebClient.Builder this way:

@Configuration
class WebClientConfiguration {

@Bean
fun webClientBuilder(): WebClient.Builder {
return WebClient.builder().filter { req, next ->
LOGGER.error("Custom filter invoked..")
next.exchange(req)
}
}

companion object {
val LOGGER = loggerFor<WebClientConfiguration>()
}
}

It looks straightforward; however, the previous test now fails. Specifically, the date format of the creationDate over the wire is not ISO-8601 anymore - the raw request looks like this:

{
"id": 1,
"name": "some city",
"country": "some country",
"pop": 1000,
"creationDate": 476100610.000000000
}

vs for a working request:

{
"id": 1,
"name": "some city",
"country": "some country",
"pop": 1000,
"creationDate": "1985-02-01T10:10:10Z"
}

See how the date format is different.

Problem

The underlying reason for this issue is simple: Spring Boot adds a bunch of configuration to the WebClient.Builder that is lost when I create the bean explicitly myself. Specifically, in this instance, there is a Jackson ObjectMapper created under the covers which by default writes dates as timestamps - some details here.
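
To see this Jackson behavior in isolation, here is a small sketch (assuming the jackson-datatype-jsr310 module is on the classpath, which Spring Boot brings in):

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.SerializationFeature
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule
import java.time.Instant

val instant = Instant.parse("1985-02-01T10:10:10Z")

// a plain ObjectMapper with the JavaTimeModule writes Instants as numeric timestamps by default
val defaultMapper = ObjectMapper().registerModule(JavaTimeModule())
println(defaultMapper.writeValueAsString(instant)) // 476100610.000000000

// disabling WRITE_DATES_AS_TIMESTAMPS, which Spring Boot's auto-configured mapper does, yields ISO-8601
val isoMapper = ObjectMapper()
    .registerModule(JavaTimeModule())
    .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
println(isoMapper.writeValueAsString(instant)) // "1985-02-01T10:10:10Z"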

Solution

Okay, so how do we get back the customizations that Spring Boot makes? I have essentially replicated the behavior of a Spring Boot auto-configuration called "WebClientAutoConfiguration", and it looks like this:

@Configuration
class WebClientConfiguration {

@Bean
fun webClientBuilder(customizerProvider: ObjectProvider<WebClientCustomizer>): WebClient.Builder {
val webClientBuilder: WebClient.Builder = WebClient
.builder()
.filter { req, next ->
LOGGER.error("Custom filter invoked..")
next.exchange(req)
}

customizerProvider.orderedStream()
.forEach { customizer -> customizer.customize(webClientBuilder) }

return webClientBuilder;
}

companion object {
val LOGGER = loggerFor<WebClientConfiguration>()
}
}

There is likely a better approach than just replicating this behavior, but this approach works for me.

The posted content now looks like this:

{
"id": 1,
"name": "some city",
"country": "some country",
"pop": 1000,
"creationDate": "1985-02-01T10:10:10Z"
}

with the date back in the right format.

Conclusion

Spring Boot's auto-configuration for WebClient provides an opinionated set of defaults. If for any reason the WebClient and its builder need to be configured explicitly, then be wary of the customizations that Spring Boot adds and replicate them for the customized bean. In my case, the Jackson customization for Java 8 dates was missing in my custom "WebClient.Builder" and had to be explicitly accounted for.

A sample test and a customization are available here.

Project Reactor expand method

One of my colleagues at work recently introduced me to the expand operator of the Project Reactor types and in this post I want to cover a few ways in which I have used it.


Unrolling a Paginated Result

Consider a Spring Data based repository on a model called City:

import org.springframework.data.jpa.repository.JpaRepository;
import samples.geo.domain.City;

public interface CityRepo extends JpaRepository<City, Long> {
}

This repository provides a way to retrieve a paginated result, along the following lines:

cityRepo.findAll(PageRequest.of(0, 5))

Now, if I were to unroll multiple pages into a result, the way to do it would be the following kind of a loop:

var pageable: Pageable = PageRequest.of(0, 5)
do {
var page: Page<City> = cityRepo.findAll(pageable)
page.content.forEach { city -> LOGGER.info("City $city") }
pageable = page.nextPageable()
} while (page.hasNext())


An equivalent unroll of a paginated result can be done using the Reactor expand operator the following way:

val result: Flux<City> =
Mono
.fromSupplier { cityRepo.findAll(PageRequest.of(0, 5)) }
.expand { page ->
if (page.hasNext())
Mono.fromSupplier { cityRepo.findAll(page.nextPageable()) }
else
Mono.empty()
}
.flatMap { page -> Flux.fromIterable(page.content) }

result.subscribe(
{ page -> LOGGER.info("City ${page}") },
{ t -> t.printStackTrace() }
)


Here the first page of results expands to the second page, the second page to the third page and so on until there are no pages to retrieve.

Traversing a Tree


Consider a node in a tree structure represented by the following model:

data class Node(
val id: String,
val childRefs: List<String>,
)

A sample tree - a root Node-1 with children Node-1-1 and Node-1-2, each of which has two children of its own - can be traversed using a call which looks like this:

val rootMono: Mono<Node> = nodeService.getNode("1")
val expanded: Flux<Node> = rootMono.expand { node ->
Flux.fromIterable(node.childRefs)
.flatMap { nodeRef -> nodeService.getNode(nodeRef) }
}
expanded.subscribe { node -> println(node) }

This is a breadth-first expansion; the output looks like this:

Node-1
Node-1-1
Node-1-2
Node-1-1-1
Node-1-1-2
Node-1-2-1
Node-1-2-2

An "expandDeep" variation would traverse the tree depth-first.
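
A minimal sketch of the depth-first variant, using the same "nodeService" as above:

val expandedDeep: Flux<Node> = rootMono.expandDeep { node ->
    Flux.fromIterable(node.childRefs)
        .flatMap { nodeRef -> nodeService.getNode(nodeRef) }
}
// depth-first order: Node-1, Node-1-1, Node-1-1-1, Node-1-1-2, Node-1-2, Node-1-2-1, Node-1-2-2
expandedDeep.subscribe { node -> println(node) }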

Processing SQS Messages using Spring Boot and Project Reactor

I recently worked on a project where I had to efficiently process a large number of messages streaming in through an AWS SQS Queue. In this post (and potentially one more), I will go over the approach that I took to process the messages using the excellent Project Reactor



The following is the kind of set-up that I am aiming for:



Setting up a local AWS Environment

Before I jump into the code, let me get some preliminaries out of the way. First, how do you get a local version of SNS and SQS? One of the easiest ways is to use localstack. I use a docker-compose version of it, described here.

The second utility that I will be using is the AWS CLI. This website has details on how to install it locally.

Once both of these utilities are in place, a quick test should validate the setup:

# Create a queue
aws --endpoint http://localhost:4576 sqs create-queue --queue-name test-queue

# Send a sample message
aws --endpoint http://localhost:4576 sqs send-message --queue-url http://localhost:4576/queue/test-queue --message-body "Hello world"

# Receive the message
aws --endpoint http://localhost:4576 sqs receive-message --queue-url http://localhost:4576/queue/test-queue


Basics of Project Reactor


Project Reactor implements the Reactive Streams specification and provides a way of handling streams of data across asynchronous boundaries that respects backpressure. A lot of words here but in essence think of it this way:
1. SQS Produces data
2. The application is going to consume and process it as a stream of data
3. The application should consume data at a pace that is sustainable - too much data should not be pumped in. This is formally referred to as "Backpressure"



AWS SDK 2

The library that I will be using to consume AWS SQS data is the AWS SDK 2. The library uses non-blocking IO under the covers.

The library offers both a sync version of making calls as well as an async version. Consider the synchronous way to fetch records from an SQS queue:

import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest
import software.amazon.awssdk.services.sqs.SqsClient

val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()

Here "software.amazon.awssdk.services.sqs.SqsClient" is being used for querying sqs and retrieving a batch of results synchronously. An async result, on the other hand, looks like this:



val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: CompletableFuture<List<Message>> = sqsAsyncClient
.receiveMessage(receiveMessageRequest)
.thenApply { result -> result.messages() }


The output is now a "CompletableFuture<List<Message>>".
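
As an aside, such a CompletableFuture can be bridged into a Reactor type using "Mono.fromFuture" - a small sketch, not used in the pipeline below, which sticks with the synchronous client:

val messagesMono: Mono<List<Message>> = Mono.fromFuture {
    sqsAsyncClient
        .receiveMessage(receiveMessageRequest)
        .thenApply { result -> result.messages() }
}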

Infinite loop and no backpressure

My first attempt at creating a stream (Flux) of messages is fairly simple - an infinite loop that polls AWS SQS and creates a Flux from it using the "Flux.create" operator, this way:

fun listen(): Flux<Pair<String, () -> Unit>> {
return Flux.create { sink: FluxSink<List<Message>> ->
while (running) {
try {
val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()
LOGGER.info("Received: $messages")
sink.next(messages)
} catch (e: InterruptedException) {
LOGGER.error(e.message, e)
} catch (e: Exception) {
LOGGER.error(e.message, e)
}
}
}
.flatMapIterable(Function.identity())
.doOnError { t: Throwable -> LOGGER.error(t.message, t) }
.retry()
.map { snsMessage: Message ->
val snsMessageBody: String = snsMessage.body()
val snsNotification: SnsNotification = readSnsNotification(snsMessageBody)
snsNotification.message to { deleteQueueMessage(snsMessage.receiptHandle(), queueUrl) }
}
}

The way this works is that there is an infinite loop that checks for new messages using long-polling. Messages may not be available at every poll, in which case an empty list is added to the stream.

This list of at most 5 messages is then mapped to a stream of individual messages using the "flatMapIterable" operator, which is further mapped by extracting the message from the SNS wrapper (as the message gets forwarded from SNS to SQS, SNS adds a wrapper to it); the message, along with a way to delete it (the delete handle) once it has been successfully processed, is returned as a Pair.


This approach works perfectly fine... but imagine a case where a huge number of messages come in; since the loop is not really aware of the downstream throughput, it will keep pumping data into the stream. The default behavior is for the intermediate operators to buffer this incoming data based on how the final consumer is consuming it. Since this buffer is unbounded, it is possible for the system to reach an unsustainable state.


Backpressure aware stream

The fix is to use a different operator to generate the stream of data - Flux.generate.
Using this operator the code looks like this:

fun listen(): Flux<Pair<String, () -> Unit>> {
return Flux.generate { sink: SynchronousSink<List<Message>> ->
val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()
LOGGER.info("Received: $messages")
sink.next(messages)
}
.flatMapIterable(Function.identity())
.doOnError { t: Throwable -> LOGGER.error(t.message, t) }
.retry()
.map { snsMessage: Message ->
val snsMessageBody: String = snsMessage.body()
val snsNotification: SnsNotification = readSnsNotification(snsMessageBody)
snsNotification.message to { deleteQueueMessage(snsMessage.receiptHandle(), queueUrl) }
}
}

The way this works is that the block passed to the "Flux.generate" operator is repeatedly called - similar to the while loop; in each invocation one item is expected to be added to the stream. In this instance, the item added to the stream happens to be a list, which like before is broken down into individual messages.

How does backpressure work in this scenario?
So again consider the case where the downstream consumer is processing at a slower rate than the generating end. In this case, Flux itself would slow down the rate at which the generate callback is called, thus being considerate of the throughput of the downstream system.

Conclusion

This sets up a good pipeline for processing messages from SQS; there are a few more nuances to this, such as processing messages in parallel later in the stream, which I will cover in a future post.

The codebase of this example is available in my github repository here - https://github.com/bijukunjummen/boot-with-sns-sqs. The code has a complete pipeline which includes processing the message and deleting it once processed.

Processing SQS Messages using Spring Boot and Project Reactor - Part 2

This is a follow-up to my blog post about processing SQS messages efficiently using Spring Boot and Project Reactor.

There are a few gaps in the approach that I have listed in the first part.

1. Handling failures in SQS Client calls
2. The approach would process only 1 message from SQS at a time, how can it be parallelized
3. It does not handle errors, any error in the pipeline would break the entire process and stop reading newer messages from the queue.


Recap

Just to recap, the previous post demonstrates creating a pipeline to process messages from an AWS SQS Queue using the excellent Project Reactor


The end result of that exercise was a pipeline which looks like this:




Given this pipeline, let me now go over how to bridge the gaps:

Handling SQS Client Failures


This is the function that generates the stream of messages read from SQS.

Flux.generate { sink: SynchronousSink<List<Message>> ->
val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()
sink.next(messages)
}
.flatMapIterable(Function.identity())

Now consider a case where the "sqsClient" above has a connectivity issue; the behavior with Flux is that in case of an error the stream is terminated. This, of course, will not do for a service whose job is to process messages as long as the service is running.

The fix is to simply retry the processing flow in case of errors.

Flux.generate { sink: SynchronousSink<List<Message>> ->
val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()
sink.next(messages)
}
.flatMapIterable(Function.identity())
.retry()

This would result in Flux re-establishing the stream of messages in case of any errors up to this point.

Processing Messages in Parallel

Project Reactor provides a few ways of parallelizing a processing pipeline. My first attempt at processing in parallel was to add a "subscribeOn" method to the processing chain.

Flux.generate { sink: SynchronousSink<List<Message>> ->
val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()
sink.next(messages)
}
.flatMapIterable(Function.identity())
.retry()
.subscribeOn(Schedulers.newElastic("sub"))

However, this is not quite how "subscribeOn" works. An output when I send a few messages to this pipeline is the following:

2020-04-07 20:52:53.241  INFO 1137 --- [          sub-3] sample.msg.MessageListenerRunner         : Processed Message hello
2020-04-07 20:52:53.434 INFO 1137 --- [ sub-3] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-07 20:52:53.493 INFO 1137 --- [ sub-3] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-07 20:52:53.538 INFO 1137 --- [ sub-3] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-07 20:52:53.609 INFO 1137 --- [ sub-3] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-07 20:52:53.700 INFO 1137 --- [ sub-3] sample.msg.MessageListenerRunner : Processed Message hello

The "sub-3" above is the name of the thread processing the message, and it looks like all the messages are getting processed on the "sub-3" thread and on no other threads!

subscribeOn simply changes the execution context by borrowing "a thread" from this scheduler pool and does not use all the threads in the pool itself.

So how can the processing be parallelized? This StackOverflow answer provides a very good approach that I am using here, essentially to use a flatMap operator and add the "subscribeOn" operator inside the "flatMap" operator.

This operator eagerly subscribes to its inner publishers and then flattens the result; the trick is that the inner subscribers can be provided their own schedulers, and each subscription will end up using a thread from the scheduler pool. The number of these concurrent subscriptions can be controlled using a "concurrency" parameter passed to the flatMap operator.


Flux.generate { sink: SynchronousSink<List<Message>> ->
val receiveMessageRequest: ReceiveMessageRequest = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(5)
.waitTimeSeconds(10)
.build()

val messages: List<Message> = sqsClient.receiveMessage(receiveMessageRequest).messages()
sink.next(messages)
}
.flatMapIterable(Function.identity())
.retry()
.flatMap({ (message: String, deleteHandle: () -> Unit) ->
task(message)
.then(Mono.fromSupplier { Try.of { deleteHandle() } })
.then()
.subscribeOn(taskScheduler)
}, concurrency)

and an output when processing multiple messages looks like this -

2020-04-08 21:03:24.582  INFO 17541 --- [  taskHandler-4] sample.msg.MessageListenerRunner         : Processed Message hello
2020-04-08 21:03:24.815 INFO 17541 --- [ taskHandler-4] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-08 21:03:24.816 INFO 17541 --- [ taskHandler-5] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-08 21:03:24.816 INFO 17541 --- [ taskHandler-6] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-08 21:03:24.816 INFO 17541 --- [ taskHandler-7] sample.msg.MessageListenerRunner : Processed Message hello
2020-04-08 21:03:24.817 INFO 17541 --- [ taskHandler-8] sample.msg.MessageListenerRunner : Processed Message hello
See how there is more than one thread name (taskHandler-*) in the logs now!


Handling downstream errors

One of my previous fixes, with the "retry" operator, was about handling upstream errors with sqsClient connectivity. However, it is possible that as messages are being processed in the pipeline, any of the steps could throw an error, and then the entire pipeline would fail. So it is important to guard EVERY step against failure. A neat way that I have been ensuring that errors don't propagate out is to use the excellent vavr library and its "Try" type. The Try type holds two outcomes - a successful one (Success) or an Exception (Failure). This allows the rest of the pipeline to act on the outcome of the previous step in a measured way:

.flatMap({ (message: String, deleteHandle: () -> Unit) ->
task(message)
.then(Mono.fromSupplier { Try.of { deleteHandle() } })
.doOnNext { t ->
t.onFailure { e -> LOGGER.error(e.message, e) }
}
.then()
.subscribeOn(taskScheduler)
}, concurrency)

The above snippet demonstrates an approach where I know that "deleteHandle", which is responsible for deleting a message, can throw an exception; Try captures this, logs any error, and this way the exception does not short-circuit the flow of messages.



Conclusion

My initial thinking was that just because I have taken a reactive approach to processing messages I would get a huge boost in my SQS message processing pipeline; however, my learning has been that, just like everything else, it requires careful understanding and tuning for a Project Reactor based stream to process messages efficiently. I am sure there are a few more lessons for me to learn and I will be documenting those as I do.

This entire sample is available in my github repository here - https://github.com/bijukunjummen/boot-with-sns-sqs

AWS DynamoDB version field using AWS SDK for Java 2

It is useful to have a version attribute on any entity saved to an AWS DynamoDB database which is simply a numeric indication of the number of times the entity has been modified. When the entity is first created it can be set to 1 and then incremented on every update. 

The benefit is immediate - an indicator of the number of times an entity has been modified which can be used for auditing the entity. Also, an additional use is for optimistic locking where an entity is allowed to be updated only if the holder updating it has the right version of the entity.


This post will go into details of how to introduce such a field with the DynamoDB related libraries of AWS SDK 2

Model

Consider a model called Hotel which is being persisted into a dynamo database. In Kotlin, it can be represented using the following data class:

data class Hotel(
val id: String = UUID.randomUUID().toString(),
val name: String,
val address: String? = null,
val state: String? = null,
val zip: String? = null,
val version: Long = 1L
)
A version field has been introduced in this model with an initial value of 1. The aim will be to save this field as-is and then let dynamo atomically manage the increment of this field at the point of saving this entity.

As the fields in this model get changed, I would like the version to be updated along these lines:


 


Local version of DynamoDB

It is useful to have DynamoDB running on the local machine, this way not having to create real DynamoDB tables in AWS.

There are multiple ways of doing this. One is to use a docker version of DynamoDB Local, which can be started up the following way to listen on port 4569:


docker run -p 4569:8000 amazon/dynamodb-local:1.13
My personal preference is to use localstack and the instructions at the site have different ways to start it up. I normally use docker-compose to bring it up. One of the reasons to use localstack over DynamoDB Local is that localstack provides a comprehensive set of AWS services for local testing and not just DynamoDB.


Quick Demo

I have the entire code available in my github repo here - https://github.com/bijukunjummen/boot-with-dynamodb

Once the application is brought up using the local version of dynamoDB, an entity can be created using the following httpie request:



http :9080/hotels id=4 name=name address=address zip=zip state=OR
With a response, where the version field is set to 1:
{
"address": "address",
"id": "4",
"name": "name",
"state": "OR",
"version": 1,
"zip": "zip"
}
Then if the name is updated for the entity:
http PUT :9080/hotels/4 name=name1 address=address zip=zip state=OR version=1
the version field gets updated to 2 and so on:
{
"address": "address",
"id": "4",
"name": "name1",
"state": "OR",
"version": 2,
"zip": "zip"
}
Also note that if during an update a wrong version number is provided, the call would fail as there is optimistic locking in place using this version field.

Implementing the version field

Implementing the version field depends on the powerful UpdateItem API provided by DynamoDB. One of the features of the UpdateItem API is that it takes in an "UpdateExpression", which is a DSL describing how different DynamoDB attributes should be updated.

The raw request to AWS DynamoDB looks like this:

 
{
"TableName": "hotels",
"Key": {
"id": {
"S": "1"
}
},
"UpdateExpression": "\nSET #name=:name,\n #state=:state,\naddress=:address,\nzip=:zip\nADD version :inc\n ",
"ExpressionAttributeNames": {
"#state": "state",
"#name": "name"
},
"ExpressionAttributeValues": {
":name": {
"S": "testhotel"
},
":address": {
"S": "testaddress"
},
":state": {
"S": "OR"
},
":zip": {
"S": "zip"
},
":inc": {
"N": "1"
}
}
}
From the article's perspective, specifically focus on "ADD version :inc", which is an expression that tells AWS DynamoDB to increment the value of version by the ":inc" value, which is provided separately using "ExpressionAttributeValues" as "1". Dealing with the raw API in its json form is daunting; that is where the Software Development Kit (SDK) that AWS provides comes in. AWS SDK for Java 2 is a rewrite of the AWS SDKs with a focus on using the latest Java features and non-blocking IO over the wire. Using AWS SDK for Java 2, an "UpdateItem" call looks like this (using Kotlin code):
val updateItemRequest = UpdateItemRequest.builder()
.tableName(TABLE_NAME)
.key(
mapOf(
ID to AttributeValue.builder().s(hotel.id).build()
)
)
.updateExpression(
"""
SET #name=:name,
#state=:state,
address=:address,
zip=:zip
ADD version :inc
"""
)
.conditionExpression("version = :version")
.expressionAttributeValues(
mapOf(
":${NAME}" to AttributeValue.builder().s(hotel.name).build(),
":${ZIP}" to AttributeValue.builder().s(hotel.zip).build(),
":${STATE}" to AttributeValue.builder().s(hotel.state).build(),
":${ADDRESS}" to AttributeValue.builder().s(hotel.address).build(),
":${VERSION}" to AttributeValue.builder().n(hotel.version.toString()).build(),
":inc" to AttributeValue.builder().n("1").build()
)
)
.expressionAttributeNames(
mapOf(
"#name" to "name",
"#state" to "state"
)
)
.build()

val updateItem: CompletableFuture<UpdateItemResponse> = dynamoClient.updateItem(updateItemRequest)

return Mono.fromCompletionStage(updateItem)
.flatMap {
getHotel(hotel.id)
}
The highlighted line has the "Update Expression" with all the existing fields set to a new value and the version attribute incremented by 1. Another thing to note about this call is the "conditionExpression", which is essentially a way to tell DynamoDB to update the attributes only if a condition matches - in this specific instance, if the existing value of version matches what was provided. This provides a neat way to support optimistic locking on the record.
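When that condition does not match - for instance, when a caller sends a stale version - the call fails with a ConditionalCheckFailedException. A sketch of surfacing this more meaningfully in the reactive flow above (the domain exception and message here are illustrative):

return Mono.fromCompletionStage(updateItem)
    .flatMap {
        getHotel(hotel.id)
    }
    .onErrorMap(ConditionalCheckFailedException::class.java) { e ->
        // the stored version no longer matches the version the caller sent
        IllegalStateException("Hotel ${hotel.id} was updated concurrently, retry with the latest version", e)
    }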

Conclusion

A lot of details here - the easiest way to get a feel for it is by trying out the code which is available in my github repository here - https://github.com/bijukunjummen/boot-with-dynamodb. The readme has good details on how to run it in a local environment. 

AWS DynamoDB provides a neat way to manage a version field on entities, ensuring that it is atomically updated and providing a way for it to be used for optimistic locking.

Expressing a conditional expression using Json - Java Implementation

I had a need recently to express a conditional expression in a form that a front-end Javascript application and a backend Java application could both create and read. Expressing the conditional expression as a Json felt logical and, after a quick search, the JsonLogic library appeared to fit exactly what I was looking for.

JsonLogic follows a prefix notation for its expressions, along these lines:

{"operator" : ["values" ... ]}
So for eg, given a JSON input data that looks like this:
{
"a": "hello",
"b": 1,
"c": [
"elem1",
"elem2"
]
}
For equality, an expression using JsonLogic is the following:
{"==" : [ { "var" : "a" }, "hello" ] }
Here the data is being looked up using the "var" expression and the equality is checked using the "==" operator. 

Though it is a good fit, I decided to go with an alternate way of expressing the conditional expression but heavily inspired by JsonLogic. So, in my implementation, equality with the sample JSON looks like this:
{
"equals": [
"/a", "hello"
]
}
Fairly similar; the location of the data is expressed as a JSON Pointer and the operators are textual ("equals" vs "=="). The full set of supported features is also much smaller than JsonLogic's, as that sufficed my needs for the project. So now I have a small Java-based library that supports these simplified conditional expressions, and this post will go into the details of the operators and the usage of the library.
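The JSON Pointer lookup itself is something Jackson supports out of the box, for instance:

val data = ObjectMapper().readTree("""{"a": "hello", "b": 1, "c": ["elem1", "elem2"]}""")
// JsonNode.at() resolves a JSON Pointer against the parsed document
data.at("/a").asText()   // "hello"
data.at("/c/0").asText() // "elem1"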



Sample Expressions

Once more to touch on the sample conditional expressions, everything takes the form of:
{"operator" : ["values" ... ]}
A check for equality looks like this:
{
"equals": [
"/a", "hello"
]
}
Not operator:
{
"not": [
{
"equals": [
"/a",
"hello"
]
}
]
}
And/Or operator:
{
"and": [
{
"equal": [
"/a", "hello"
]
},
{
"equal": [
"/b", 1
]
}
]
}
There are a few operators that work on collections, for eg, to check if "c" in the sample JSON has elements "elem1", "elem2":
{
"contains": [
"/c", ["elem1", "elem2"]
]
}
or to check if the collection has any of the elements "elem1", "elem2":
{
"containsAnyOf": [
"/c", ["elem1", "elem2"]
]
}

Details of the Library

The Java-based library is built on top of the excellent Jackson JSON parser library and uses it to parse the expression, which, once parsed, is interpreted by the library. A Gradle-based project can pull in the dependency the following way (published currently to JCenter):
implementation 'com.github.bijukunjummen:json-conditional-expression:0.4.0'
and use the library along these lines, using a sample Kotlin code:
val jsonExpressionEvaluator: JsonExpressionEvaluator = JsonExpressionEvaluator(ObjectMapper())
jsonExpressionEvaluator.matches(expression, json) //returns true
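Putting the sample data and the equality expression from earlier together, usage could look along these lines (assuming here that both the expression and the document are passed in as raw JSON strings):

val json = """{"a": "hello", "b": 1, "c": ["elem1", "elem2"]}"""
val expression = """{"equals": ["/a", "hello"]}"""

val jsonExpressionEvaluator = JsonExpressionEvaluator(ObjectMapper())
jsonExpressionEvaluator.matches(expression, json) // true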

Conclusion

The conditional expression and the corresponding Java-based interpreter are fairly simple and have sufficed my needs with the kind of operators that I needed for my project; however, I will be more than happy to extend it and support more extensive operators if there is enough interest in using the library.


Reference

1. JsonLogic which provided the inspiration for using a prefix notation to represent a conditional expression

Backpressure in Project Reactor

Project Reactor implements the Reactive Streams specification, which is a standard for asynchronously processing a stream of data while respecting the processing capabilities of a consumer. 

At a very broad level, there are two entities involved, a Producer that produces the stream of data and a Consumer that consumes data. If the rate at which a Consumer consumes data is less than the rate at which a Producer produces data (referred to as a Fast Producer/Slow Consumer), then signals from the consumer can constrain the rate of production; this is referred to as Backpressure, and in this post I will be demonstrating a few backpressure examples using Project Reactor.

Before I go ahead, I have to acknowledge that these examples are loosely based on what I learned from the "Reactive Programming with RxJava" book.

Producer

Flux in Project Reactor represents an asynchronous stream of 0..N items, where N can potentially be infinite. 

Consider a simple example, generating a sequence of numbers. There are built-in ways in Flux to do this, but for the example, I will be using an operator called Flux.generate. Sample code looks like this:
fun produce(targetRate: Int, upto: Long): Flux<Long> {
val delayBetweenEmits: Long = 1000L / targetRate

return Flux.generate(
{ 1L },
{ state: Long, sink: SynchronousSink<Long> ->
sleep(delayBetweenEmits)
val nextState: Long = state + 1
if (state > upto) {
sink.complete()
nextState
} else {
LOGGER.info("Emitted {}", state)
sink.next(state)
nextState
}
}
)
}

Here "targetRate" is the rate per second at which the Producer is expected to produce a sequence of numbers and "upto" represents the range for which the sequence is to be generated. "Thread.sleep" is used for introducing the delay between emissions.


Consumer

A consumer for this stream of data just consumes the sequence of numbers and to simulate processing while consuming the data, delays are again introduced just before reading the information, along these lines:
val delayBetweenConsumes: Long = 1000L / consumerRate
producer.produce(producerRate, count)
.subscribe { value: Long ->
sleep(delayBetweenConsumes)
logger.info("Consumed {}", value)
}

Just like with the rate at the Producer side, there is a rate of consumption on the Consumer side which drives the delay before consuming the data.


Scenario 1: Fast Producer, Slow Consumer without Threading

Now that I have a stream of data for which I can control the rate of production and rate of consumption, the first test that I ran was with the producer and the consumer chained together. 

The Producer produces at the rate of 100 requests a second and the Consumer consumes it at 3 per second. 

If there were no backpressure mechanisms in place you would expect that the Producer would merrily go along and produce all the records at its own pace of 100 per second and the Consumer would slowly catch up at the rate of 3 per second. This is NOT what happens though. 

The reason is not that intuitive, I feel, and it is not really backpressure coming into play either. The Producer is constrained to the Consumer's rate of 3 requests per second merely because the entire flow from the Producer to the Consumer is synchronous by default, and since the production and the consumption are happening on the same thread, the behavior is automatically constrained to what the Consumer is comfortable consuming. 

Here is a graph which simply plots the rate of production and consumption over time and captures clearly the exact same rate of Production and Consumption throughout:


This behavior is borne out from the logs also, which show that the consumer and producer remain in sync:
2020-07-26 17:51:58.712  INFO 1 --- [pool-1-thread-1] sample.meter.Producer                    : Emitted 84
2020-07-26 17:51:59.048 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 84
2020-07-26 17:51:59.059 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 85
2020-07-26 17:51:59.393 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 85
2020-07-26 17:51:59.404 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 86
2020-07-26 17:51:59.740 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 86
2020-07-26 17:51:59.751 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 87
2020-07-26 17:52:00.084 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 87
2020-07-26 17:52:00.095 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 88
2020-07-26 17:52:00.430 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 88
2020-07-26 17:52:00.441 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 89
2020-07-26 17:52:00.777 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 89
2020-07-26 17:52:00.788 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 90
2020-07-26 17:52:01.087 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 90
2020-07-26 17:52:01.097 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 91
2020-07-26 17:52:01.432 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 91
2020-07-26 17:52:01.442 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 92
2020-07-26 17:52:01.777 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 92
2020-07-26 17:52:01.788 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 93
2020-07-26 17:52:02.123 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 93
2020-07-26 17:52:02.133 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 94
2020-07-26 17:52:02.467 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 94
2020-07-26 17:52:02.478 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 95
2020-07-26 17:52:02.813 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 95
2020-07-26 17:52:02.824 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 96
2020-07-26 17:52:03.157 INFO 1 --- [pool-1-thread-1] sample.meter.Consumer : Consumed 96
2020-07-26 17:52:03.168 INFO 1 --- [pool-1-thread-1] sample.meter.Producer : Emitted 97


Scenario 2: Fast Producer, Slow Consumer with Threading

The second scenario that I considered was with the Producer and the Consumer running independently on different threads. 

Project Reactor makes this possible through two operators: subscribeOn(), which changes the thread on which, in my case, the Producer produces the sequence, and publishOn(), which shifts the consumption to a different thread. 

With these in place, the code looks like this:
producer.produce(producerRate, count)
.subscribeOn(subscribeOnScheduler)
.publishOn(publishOnScheduler)
.subscribe { value: Long ->
sleep(delayBetweenConsumes)
logger.info("Consumed {}", value)
}
The results were a little surprising; this is what I saw in the logs:
...
2020-07-26 18:42:41.774 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 252
2020-07-26 18:42:41.786 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 253
2020-07-26 18:42:41.797 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 254
2020-07-26 18:42:41.809 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 255
2020-07-26 18:42:41.819 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 256
2020-07-26 18:42:42.019 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 9
2020-07-26 18:42:42.354 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 10
2020-07-26 18:42:42.689 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 11
2020-07-26 18:42:43.024 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 12
2020-07-26 18:42:43.358 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 13
2020-07-26 18:42:43.691 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 14
2020-07-26 18:42:44.027 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 15
2020-07-26 18:42:44.363 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 16
.....
2020-07-26 18:43:43.724 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 299
2020-07-26 18:43:43.735 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 300
2020-07-26 18:43:43.913 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 194
2020-07-26 18:43:44.248 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 195
2020-07-26 18:43:44.581 INFO 1 --- [ publish-2] sample.meter.Consumer : Consumed 196
...
A sequence of numbers up to 256 was produced immediately and then the Producer waited for the Consumer to catch up; once the Consumer caught up, the remaining emissions happened. This is how the graph for this looks: 



Clearly, backpressure is acting on this stream of data. The surprising aspect for me was that the backpressure appeared to trigger at a large value of 256 records from upstream.

Analyzing this a little, the reason, I realized, is that an intermediate operation is buffering the requests. The intermediate operation in this instance happens to be the "publishOn()" operator that I am using; a variant of "publishOn()" which additionally takes in a prefetch parameter fixes the size of the buffer. 

In my case, setting it to 10 felt reasonable; the code looks like this now:
producer.produce(producerRate, count)
.subscribeOn(subscribeOnScheduler)
.publishOn(publishOnScheduler, 10)
.subscribe { value: Long ->
sleep(delayBetweenConsumes)
logger.info("Consumed {}", value)
}
and the graph with the Producer and Consumer remains closely in sync:

Producer In Green, Consumer in Red


Scenario 3: Fast Producer, Multi-threaded Consumer

If you look closely at the name of the threads in logs from the first two scenarios then you would notice that the names of the thread at the point of production and at the point of consumption are always the same. The operators "publishOn()" and "subscribeOn()" don't parallelize the operation, they only switch the execution context of the operations. To really parallelize the operations, two approaches can be taken: 

  1. Using the parallel operator (a sketch of this follows the list)
  2. Using flatMap flavors with their own "subscribeOn" operators 
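
The first option, using the parallel operator, would look broadly like this (a sketch - the number of rails and the scheduler here are illustrative):

producer.produce(producerRate, count)
    .subscribeOn(subscribeOnScheduler)
    .parallel(5)                               // split the flux into 5 "rails"
    .runOn(Schedulers.newParallel("parallel")) // each rail runs on a worker from this scheduler
    .subscribe { value: Long ->
        sleep(delayBetweenConsumes)
        logger.info("Consumed {}", value)
    }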

For the 3rd scenario, I went for the second option of using flatMap and it looks something like this:
producer.produce(producerRate, count)
.subscribeOn(subscribeOnScheduler)
.publishOn(publishOnScheduler, 10)
.flatMap({ value: Long ->
Mono.fromSupplier {
sleep(delayBetweenConsumes)
logger.info("Consumed {}", value)
null
}.subscribeOn(flatMapScheduler)
}, concurrency)
.subscribe()
The work of consuming the produced sequence of numbers is being done inside the flatMap operation; the number of concurrent consumers is set to 5 by default. Running this scenario produces the following logs - the consumers are now running 5 at a time on multiple threads:
2020-07-26 23:26:27.212  INFO 1 --- [    subscribe-3] sample.meter.Producer                    : Emitted 1
2020-07-26 23:26:27.321 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 2
2020-07-26 23:26:27.423 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 3
...
2020-07-26 23:26:28.040 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 9
2020-07-26 23:26:28.143 INFO 1 --- [ subscribe-3] sample.meter.Producer : Emitted 10
2020-07-26 23:26:28.222 INFO 1 --- [ flatMap-4] sample.meter.Consumer : Consumed 1
2020-07-26 23:26:28.328 INFO 1 --- [ flatMap-5] sample.meter.Consumer : Consumed 2
2020-07-26 23:26:28.428 INFO 1 --- [ flatMap-6] sample.meter.Consumer : Consumed 3
2020-07-26 23:26:28.527 INFO 1 --- [ flatMap-7] sample.meter.Consumer : Consumed 4
...
The rate of production lines up with the rate of consumption

Producer In Green/Consumer in Red


Conclusion

These are different experiments that I was able to run to simulate backpressure scenarios with Project Reactor and the behavior should be true for most Reactive Streams based libraries. 

Project Reactor has sane defaults in managing the backpressure needs of a Consumer and provides ways to override the defaults. 

In all scenarios that I have run in this post, the Producer throttled the production to a rate that the Consumer was comfortable in consuming. 

 If you are interested in exploring the scenarios further, my codebase along with the grafana/prometheus set up for graphing the output is available in my github repository here - https://github.com/bijukunjummen/backpressure-demo

An experiment with Little's Law

My previous blog post about Project Reactor and Backpressure was about how Project Reactor provides sane defaults for scenarios where a lot more information is produced than a Consumer can consume. It does this by throttling the Producer such that in a steady-state the Producer's rate of production matches the Consumer's rate of consumption. This throttling is referred to as Backpressure.

For a stream of data, ultimately the Producer and the Consumer reach a steady state where the Producer is producing at a rate that the consumer is comfortable consuming.

So now, in this stable system, how many items are in the system at any point in time? I decided to simply collect metrics for this information, by keeping a counter which is incremented as items are added into the system by the Producer and decremented as they are consumed and removed by the Consumer.

Little's Law

This is a well-known problem, however, and I realized through more reading that this need not be measured but instead can be calculated using Little's Law, defined the following way:

L = λW

For my case, this maps to:

L - Number of Items in the system

λ - Consumption or Production Rate

W - Average Time spent in the System

Of course, it does require measuring the Consumption rate and the Average Time spent in the system though! The point is that knowing any 2 of the values, the remaining value can be easily calculated.

Experiment

The exercise that I next performed was to compare the values that I measured in the system against the value calculated by Little's Law. No surprises here, the value measured by the system closely matches Little's law!

Let me consider a few of these scenarios here. To recap, my simple application consists of a Producer that can produce a sequence of numbers at a pre-defined rate. These sequences of numbers are consumed by a set of consumers.





Scenario 1: Unstable system with a high Producer rate

The first scenario that I considered is a case where Little's law is actually not supposed to work effectively; this is an unstable system where the Producer and the Consumer produce and consume at different rates. In my example the Producer produces a large amount of data (256 records at a time, at the rate of 10 per second) and waits for the Consumers to catch up (at the rate of 4 per second) and then produces the next batch of data. You can imagine that a lot of data will be buffered in the system and the L value will be high.

A graph of the calculated (in Yellow) and measured (in Green) L value shows up the following way:



The L is around 150, so 150 records are in the system. Although this is not a stable system, the calculated L value matches the measured L value fairly well. 
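As a rough sanity check with these numbers: the Consumers drain at about 4 records a second, so with roughly 150 records in the system Little's law implies that each record spends about 150 / 4 ≈ 37.5 seconds in the system.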


Scenario 2: Stable System with similar Producer and Consumer rate

Little's law shines for a stable system. Consider the following scenario where the Producer and Consumer rates match up. A graph now looks like this:


 

This time the measured L value lines up perfectly with the calculated value, thus proving the utility of Little's law.

Conclusion

There is not much to conclude here - Little's law is a proven law, and the interest for me was in observing how well it pans out in an experiment. Personally, it has been satisfying to see a law line up against an experiment.

I have this entire set-up in a github repository here if you would like to replicate it.

Permutation - Heap's Algorithm


This is a little bit of experimentation that I did recently to figure out reasonable code to get all possible permutations of a set of characters. 


So say given a set of characters "ABC", my objective is to come up with code which can spit out "ABC", "ACB", "BAC", "BCA", "CBA", "CAB". 


The approach I took is to go with the definition of permutation itself, so with "ABCD" as the set of characters there are 4 slots that need to be filled.



The first slot can be filled by any of A, B, C, D, in 4 ways:


The second slot by any of the remaining 3 characters, so with "A" in the first slot - 


The third slot by the remaining 2 characters, so with "A", "B" in the first two slots:
And finally, the fourth slot by the remaining 1 character, with say "A", "B", "C" in the first 3 slots:



In total, there would be 4 for the first slot * 3 for the 2nd slot * 2 for the 3rd slot * 1 for the 4th slot - 24 permutations altogether. 

I can do this in place, using an algorithm that looks like this:
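A sketch of this idea in Kotlin - fill one slot at a time by swapping each remaining character into the current position and undoing the swap on the way back:

fun permutations(chars: CharArray, slot: Int = 0, result: MutableList<String> = mutableListOf()): List<String> {
    // all slots filled - record the current arrangement
    if (slot == chars.size) {
        result.add(String(chars))
        return result
    }
    for (i in slot until chars.size) {
        // swap the candidate character into the current slot
        chars[slot] = chars[i].also { chars[i] = chars[slot] }
        permutations(chars, slot + 1, result)
        // restore the original order before trying the next candidate
        chars[slot] = chars[i].also { chars[i] = chars[slot] }
    }
    return result
}

Calling permutations("ABC".toCharArray()) produces the six arrangements listed earlier - ABC, ACB, BAC, BCA, CBA, CAB.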






A trace of flow and the swaps is here:
The only trick here is that the code does all of this in place, by swapping the right character into the right slot and restoring the original order at the end of each step. 

This works well for a reasonably sized set of characters - reasonable because for just 10 characters, there would be 3,628,800 permutations. 

An algorithm that works even better, though a complete mystery to me how it actually functions (well explained here if anybody is interested), is Heap's Algorithm. It very efficiently does one swap per permutation, which is still high but better than the approach that I have described before. 
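A sketch of the standard recursive form of the algorithm, again in Kotlin:

fun heapPermutations(chars: CharArray, k: Int = chars.size, visit: (String) -> Unit) {
    if (k == 1) {
        visit(String(chars))
        return
    }
    // first generate all permutations that keep the k-th element untouched
    heapPermutations(chars, k - 1, visit)
    for (i in 0 until k - 1) {
        // exactly one swap before each further block of permutations;
        // which index gets swapped with the last position depends on the parity of k
        val j = if (k % 2 == 0) i else 0
        chars[j] = chars[k - 1].also { chars[k - 1] = chars[j] }
        heapPermutations(chars, k - 1, visit)
    }
}

Calling heapPermutations("ABC".toCharArray()) { println(it) } prints all 6 permutations while performing just n! - 1 swaps.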


In a sample permutation of 8 characters, which generates 40320 permutations, the home-cooked version swaps 80638 times, while Heap's algorithm swaps 40319 times, thus proving its efficacy.

AWS SDK 2 for Java and storing a Json in DynamoDB


AWS DynamoDB is described as a NoSQL key-value and a document database. In my work I mostly use the key-value behavior of the database and rarely use the document database features; however, the document database part is growing on me, and this post highlights some ways of using the document database feature of DynamoDB along with introducing a small utility library built on top of AWS SDK 2.X for Java that simplifies using the document database features of AWS DynamoDB.

The treatment of the document database features will be very high level in this post; I plan a follow-up which goes into more detail later.


DynamoDB as a document database

So what does it mean for AWS DynamoDB to be treated as a document database? Consider a json representation of an entity, say something representing a Hotel:

{
"id": "1",
"name": "test",
"address": "test address",
"state": "OR",
"properties": {
"amenities":{
"rooms": 100,
"gym": 2,
"swimmingPool": true
}
},
"zip": "zip"
}
This json has some top level attributes like "id", a name, an address etc. But it also has a free form "properties" holding some additional "nested" attributes of this hotel. 

A document database can store this document representing the hotel in its entirety OR can treat individual fields, say the "properties" field of the hotel, as a document. 
A naive way to do this would be to simply serialize the content into a json string - say, for the properties field, transform it into a string representation of the json and store that in the database. This works, but there are a few issues with it. 
  1. None of the attributes of a field like properties can be queried for - say, if I wanted to know whether the hotel has a swimming pool, there is no way to get this information out of the stored content. 
  2. The attributes cannot be filtered on - so if I wanted hotels with at least 2 gyms, this is not something that can be filtered down to. 

A document database would allow for the entire document to be saved, and for individual attributes, both top level and nested ones, to be queried/filtered on. 
So for eg, in the example of "hotel" document the top level attributes are "id", "name", "address", "state", "zip" and the nested attributes are "properties.amenities.rooms", "properties.amenities.gym", "properties.amenities.swimmingPool" and so on.

AWS SDK 2 for DynamoDB and Document database support

If you are writing a Java based application to interact with an AWS DynamoDB database, then you would likely have used the new AWS SDK 2 library to make the API calls. However, one issue with the library is that it natively does not support a json based document model. Let me go into a little more detail here.  

From the AWS SDK 2 for AWS DynamoDB's perspective every attribute that is saved is an instance of something called an AttributeValue
A row of data, say for a hotel, is a simple map of "attribute" names to Attribute values, and a sample code looks something like this:
val putItemRequest = PutItemRequest.builder()
.tableName(TABLE_NAME)
.item(
mapOf(
ID to AttributeValue.builder().s(hotel.id).build(),
NAME to AttributeValue.builder().s(hotel.name).build(),
ZIP to AttributeValue.builder().s(hotel.zip).build(),
STATE to AttributeValue.builder().s(hotel.state).build(),
ADDRESS to AttributeValue.builder().s(hotel.address).build(),
PROPERTIES to AttributeValue.builder().s(objectMapper.writeValueAsString(hotel.properties)).build(),
VERSION to AttributeValue.builder().n(hotel.version.toString()).build()
)
)
.build()
dynamoClient.putItem(putItemRequest)
Here a map of each attribute to an AttributeValue is being created with an appropriate "type" of content, "s" indicates a string, "n" a number in the above sample. 

There are other AttributeValue types like "m" representing a map and "l" representing a list. 

The neat thing is that "m" and "l" types can have nested AttributeValues, which maps to a structured json document; however, there is no simple way to convert a json to this kind of AttributeValue and back. 

So, for example, if I were to handle the raw "properties" of a hotel in a way that understands the nested attributes, an approach could be this:
val putItemRequest = PutItemRequest.builder()
.tableName(TABLE_NAME)
.item(
mapOf(
ID to AttributeValue.builder().s(hotel.id).build(),
NAME to AttributeValue.builder().s(hotel.name).build(),
ZIP to AttributeValue.builder().s(hotel.zip).build(),
STATE to AttributeValue.builder().s(hotel.state).build(),
ADDRESS to AttributeValue.builder().s(hotel.address).build(),
PROPERTIES to AttributeValue.builder()
.m(
mapOf(
"amenities" to AttributeValue.builder()
.m(
mapOf(
"rooms" to AttributeValue.builder().n("200").build(),
"gym" to AttributeValue.builder().n("2").build(),
"swimmingPool" to AttributeValue.builder().bool(true).build()
)
)
.build()
)
)
.build(),
VERSION to AttributeValue.builder().n(hotel.version.toString()).build()
)
)
.build()
See how the nested attributes are being expanded out recursively. 
 

Introducing the Json to AttributeValue utility library

This is exactly where the utility library that I have developed comes in. 

Given a json structure as a Jackson JsonNode it converts the Json into an appropriately nested AttributeValue type and when retrieving back from DynamoDB, can convert the resulting nested AttributeValue type back to a json. 

The structure would look exactly like the handcrafted sample shown before. So, using the utility, saving the "properties" would look like this:
val putItemRequest = PutItemRequest.builder()
.tableName(TABLE_NAME)
.item(
mapOf(
ID to AttributeValue.builder().s(hotel.id).build(),
NAME to AttributeValue.builder().s(hotel.name).build(),
ZIP to AttributeValue.builder().s(hotel.zip).build(),
STATE to AttributeValue.builder().s(hotel.state).build(),
ADDRESS to AttributeValue.builder().s(hotel.address).build(),
PROPERTIES to JsonAttributeValueUtil.toAttributeValue(hotel.properties),
VERSION to AttributeValue.builder().n(hotel.version.toString()).build()
)
)
.build()
dynamoClient.putItem(putItemRequest)
and when querying back from DynamoDB, the resulting nested AttributeValue converted back to a json this way(Kotlin code in case you are baffled by the "?let"):
properties = map[PROPERTIES]?.let { attributeValue ->
JsonAttributeValueUtil.fromAttributeValue(
attributeValue
)
} ?: JsonNodeFactory.instance.objectNode()
The neat thing is even the top level attributes can be generated given a json representing the entire Hotel type. So say a json representing a Hotel is provided:
val hotel = """
{
"id": "1",
"name": "test",
"address": "test address",
"state": "OR",
"properties": {
"amenities":{
"rooms": 100,
"gym": 2,
"swimmingPool": true
}
},
"zip": "zip"
}
""".trimIndent()
val attributeValue = JsonAttributeValueUtil.toAttributeValue(hotel, objectMapper)
dynamoDbClient.putItem(
PutItemRequest.builder()
.tableName(DynamoHotelRepo.TABLE_NAME)
.item(attributeValue.m())
.build()
)


Using the Library

The utility library is available here - https://github.com/bijukunjummen/aws-sdk2-dynamo-json-helper and provides details of how to get the binaries in place and use it with code.

 

Conclusion

AWS SDK 2 is an excellent and highly performant client, providing non-blocking support for client calls. I like how it provides a synchronous API and an asynchronous API and remains highly opinionated in consistently providing a low level client API for calling the different AWS services. This utility library provides a nice bridge for AWS SDK 2 to remain low level but still be able to manage json based document persistence and back. All the samples in this post are available in my github repository here - https://github.com/bijukunjummen/dynamodb-document-sample

Generating a stream of Fibonacci numbers

A Java Stream represents a potentially infinite sequence of data. This is a simple post that will go into the mechanics involved in generating a simple stream of Fibonacci numbers.


The simplest way to get this stream of data is to use the generate method of Stream.

As you can imagine, to generate a specific Fibonacci number in this sequence, the previous two numbers are required, which means the state of the previous two numbers needs to be maintained somewhere. The two solutions that I will be describing here both maintain this state; however, they do it differently.



Mutable State

In the first approach, I am just going to maintain state this way:


class FibState {
private long[] prev = new long[2];
private int index = 0;

public long nextFib() {
long result = (index == 0) ? 1
: ((index == 1) ? 1 : prev[0] + prev[1]);
prev[0] = prev[1];
prev[1] = result;
index++;
return result;
}
}


with index keeping track of the index of the current fibonacci number in the sequence and prev capturing the most recent two in the sequence. So the next in the series is generated by mutating the index and changing the array holding the recent values. So, given this state, how do we generate the stream? Using code which looks like this:

   
Stream<Long> streamOfFib() {
FibState fibState = new FibState();
return Stream.generate(() -> fibState.nextFib());
}

This is using a closure to capture the fibState and mutating it repeatedly as the stream of numbers is generated. The approach works well, though the thought of mutating one value probably should induce a level of dread - is it thread-safe (probably not), will it work for parallel streams (likely not) - but it should suffice for cases where the access is strictly sequential. A far better approach is to get a version of the state that is immutable.

Immutable State


class FibState {
private final long[] prev;
private final int index;

public FibState() {
this(new long[]{-1, 1}, 0);
}

public FibState(long[] prev, int index) {
this.prev = prev;
this.index = index;
}

public FibState nextFib() {
int nextIndex = index + 1;
long result = (nextIndex == 1) ? 1 : prev[0] + prev[1];
return new FibState(new long[]{prev[1], result}, nextIndex);
}

public long getValue() {
return prev[1];
}
}
Instead of mutating the state, it returns the next immutable state. Alright, so now that this version of state is available, how can it be used - by using the "iterate" function of Stream, like this:
Stream<Long> streamOfFib() {
return Stream
.iterate(new FibState(), fibState -> fibState.nextFib())
.map(fibState -> fibState.getValue());
}

This function takes two parameters - the initial state and something which can generate the next state. Also, to return numbers from it, I am mapping this "state" type to a number in the "map" operation.
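For instance, streamOfFib().limit(10) yields the first ten numbers in the sequence: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55.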


Conclusion

This is generally instructive on how to generate a stream, using the Fibonacci sequence as an example. The approach will be fairly similar for any stream that you may need to generate. My personal preference is for the immutable state variation, as that delegates most of the mechanism of generating the stream to the excellent "iterate" helper function of Stream. 

Top "n" using a Priority Queue


If you ever need to capture the smallest or largest "n" from a stream of data, the approach more often than not will be to use a simple data-structure called the Priority Queue. 

Priority Queues do one thing very well - once a bunch of data is added, it can return the lowest value (or the highest value) in constant time. 

How is this useful to answer a top or bottom "n" type question? Let's see. 

Consider this hypothetical stream of data:




And you have to answer the smallest 2 at any point as this stream of data comes in. The trick is to use a Priority Queue 

  • with a size limit of 2
  • which returns the largest of these 2 when asked for (sometimes referred to as a Max Priority Queue)

Two considerations as data streams in:
  • if the size of the priority queue is less than 2 then add the value to the priority queue
  • if the size of the priority queue is equal to 2, then compare the value from the stream with the largest in the queue
    • if less then remove the largest and add the new value
    • if more then do nothing

At the bottom is the state of the Priority Queue as each data in the stream is processed:



See how it always holds the bottom 2. 

Similarly for the largest 3, a Priority Queue with a max capacity of 3 which returns the smallest (referred to as a Min Priority Queue) can be used the following way:
  • if the size of the priority queue is less than 3, then add to the priority queue
  • if the size is equal to 3, then compare the value from the stream with the smallest in the queue
    • if more, then remove the smallest and add the value from the stream; otherwise ignore it



Implementation

Here is a simple Kotlin-based implementation that uses the built-in PriorityQueue in the Java standard library.


import java.util.Comparator
import java.util.PriorityQueue

fun findNSmallestAndLargest(nums: List<Int>, n: Int): Pair<List<Int>, List<Int>> {
    val minFirst: Comparator<Int> = Comparator.naturalOrder<Int>()
    val maxFirst: Comparator<Int> = minFirst.reversed()
    // min priority queue (smallest at the head) - tracks the "n" largest values seen so far
    val minPq: PriorityQueue<Int> = PriorityQueue(minFirst)
    // max priority queue (largest at the head) - tracks the "n" smallest values seen so far
    val maxPq: PriorityQueue<Int> = PriorityQueue(maxFirst)
    for (num in nums) {
        checkAndAddIfSmallest(maxPq, n, num)
        checkAndAddIfLargest(minPq, n, num)
    }
    return maxPq.toList() to minPq.toList()
}

private fun checkAndAddIfSmallest(maxPq: PriorityQueue<Int>, n: Int, num: Int) {
    if (maxPq.size < n) {
        maxPq.add(num)
    } else if (num < maxPq.peek()) {
        // the new value is smaller than the largest of the current "n" smallest - replace it
        maxPq.poll()
        maxPq.add(num)
    }
}

private fun checkAndAddIfLargest(minPq: PriorityQueue<Int>, n: Int, num: Int) {
    if (minPq.size < n) {
        minPq.add(num)
    } else if (num > minPq.peek()) {
        // the new value is larger than the smallest of the current "n" largest - replace it
        minPq.poll()
        minPq.add(num)
    }
}

The implementation is very straightforward and follows the outlined algorithm to the letter.
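A quick usage of it, for example:

val (smallest, largest) = findNSmallestAndLargest(listOf(5, 1, 9, 3, 7, 2), 2)
// smallest holds 1 and 2, largest holds 7 and 9 (in heap order, not necessarily sorted)
println(smallest)
println(largest)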
