The project is implemented in Java 22 with Spring Boot, using Spring WebFlux for efficient streaming and ingestion of large volumes of trade quotes. Read below for details.
To build the application you will need Java 22. You can use SDKMAN to install and manage JVM versions:

```shell
sdk list java
sdk install java 22-tem
java -version
```

You can also build and run it in IntelliJ or another IDE.
Now you can build the project and run the tests (Gradle wrapper is included):
```shell
./gradlew check    # build and run the tests
./gradlew bootRun  # start the app
```

Once started, you can issue cURL requests to both endpoints to check that it is up and running:
🏄️ The application can read a huge stream of prices and process/stream them as they are received, without accumulating them in memory, to comply with the high-throughput requirements.
```shell
# Request the stats: since no quotes have been ingested yet, it returns "204 No Content"
curl 'http://localhost:8080/stats?symbol=EURUSD&k=8'

# Then add some data points through the ingestion endpoint:
curl -X POST 'http://localhost:8080/add_batch?symbol=EURUSD' \
  -H 'Content-Type: application/json' \
  -d '[1.10403, 1.10402, 1.10631, 1.10207, 1.10401]'

# Request the stats again:
curl 'http://localhost:8080/stats?symbol=EURUSD&k=8'
```

The expected result from the `/stats` endpoint after the requests above is the following JSON:

```json
{"min":1.10207000000000,"max":1.10631000000000,"last":1.10401000000000,"avg":1.10408800000000,"var":0.00000180473600}
```

💡 Please see Addendum.md for more in-depth application testing using the provided data-point generation script.
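As a quick sanity check of the expected response, the sketch below (a hypothetical standalone helper, not part of the service) recomputes the same figures for the sample batch, assuming `var` is the population variance (squared deviations divided by `n`, not `n-1`), which matches the value above:

```java
// Hypothetical standalone check: recomputes the stats that /stats returns
// for the sample batch. "var" is computed as the population variance.
public class StatsCheck {
    /** Returns {min, max, last, avg, var} for the given quotes. */
    public static double[] stats(double[] quotes) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double q : quotes) {
            min = Math.min(min, q);
            max = Math.max(max, q);
            sum += q;
        }
        double avg = sum / quotes.length;
        double sqDev = 0;
        for (double q : quotes) sqDev += (q - avg) * (q - avg);
        // Population variance: sum of squared deviations divided by n
        return new double[]{min, max, quotes[quotes.length - 1], avg, sqDev / quotes.length};
    }

    public static void main(String[] args) {
        double[] s = stats(new double[]{1.10403, 1.10402, 1.10631, 1.10207, 1.10401});
        System.out.printf("min=%f max=%f last=%f avg=%f var=%.14f%n",
                s[0], s[1], s[2], s[3], s[4]);
    }
}
```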
The source code is well-documented. Here is an overview of the solution.
- Since the server processes data as a stream, it avoids loading the entire list into memory, and processing can start when the first element or chunk arrives.
- The data to calculate the stats is updated in real time for every ingested price from the input stream (the WebFlux stream). This way, statistics retrieval is fast and efficient, and the stats can be requested at any time, even during a batch ingestion (they update live!).
- All data structures keep memory constrained: `k`-buffers with fixed size are allocated, and the stats retrieval has a time complexity under `O(N)`, as requested.
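The live-update idea described above can be sketched in plain Java. This is only an illustration (the class and method names are hypothetical, not the repo's actual implementation; `min`/`max` tracking, which needs an ordered structure, is omitted): `avg` and `var` over the last `capacity` prices can be maintained with `O(1)` work per ingested price by keeping running sums and subtracting the contribution of each evicted element.

```java
import java.util.ArrayDeque;

// Illustrative sketch only: keeps avg/var over the last `capacity` prices
// with O(1) work per price, by maintaining a running sum and sum of squares
// and subtracting the contribution of each evicted (oldest) element.
public class RunningStats {
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private final int capacity;
    private double sum = 0, sumSq = 0;

    public RunningStats(int capacity) { this.capacity = capacity; }

    public void add(double price) {
        window.addLast(price);
        sum += price;
        sumSq += price * price;
        if (window.size() > capacity) {      // evict the oldest price
            double evicted = window.removeFirst();
            sum -= evicted;
            sumSq -= evicted * evicted;
        }
    }

    public double avg() { return sum / window.size(); }

    // Population variance via E[x^2] - (E[x])^2. Subject to the usual
    // floating-point cancellation caveats for long-running windows.
    public double var() {
        double mean = avg();
        return sumSq / window.size() - mean * mean;
    }
}
```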
There were two things in the problem statement that particularly drew my attention:
- No defined limit for the payload of the POST method (no limit on the number of data points)
- The words *near-real-time*, *high-performance* and *high-frequency*.
That inclined me to use Spring WebFlux instead of Spring Web MVC for the data ingestion, as it provides a way to consume and process the data as a stream instead of buffering it all in memory. Also, non-blocking IO allows for a high degree of concurrency, though in this case concurrency is actually limited. It could also potentially benefit from the backpressure feature if the client of the API supported it.
- The data structure implemented to keep the constraints of space and time is a dynamic collection of circular buffers (`CircularSortingArray`), one of them for each `(symbol, k)` tuple. Symbols, contrary to `k`, are unconstrained as per the problem statement.
To compute the stats, this buffer relies on a custom implementation of a
Multiset(which allows keeping track of market quotes sorted in a Red-Black Tree that allows duplicates) for efficiently obtaining and updating themin/maxvalues. -
- This `Multiset` relies on the efficient `Fastutil` collections to eliminate the overhead of Java's boxing/unboxing and to reduce memory allocation and garbage collection, improving performance.
- As per the requirements, the maximum number of elements to process is the last `10^k`; this size is used to dimension the underlying array and constrain the runtime memory.
- As the problem statement guarantees that there won't be concurrent requests for a given symbol, no thread synchronization is required. In a real production system, this should probably be taken into consideration:

  > "No two concurrent add or/and get requests will occur simultaneously within a given symbol."
- The limit of 10 symbols means we can fix the initial map size to that number to avoid map resizes.
- On the topic of using `double` vs. `BigDecimal`: after considering the pros and cons of each option, I decided to use `double`. Rationale: this being a high-frequency, real-time system, performance is critically important. While `double` can suffer from precision loss, it is a primitive type with very efficient memory allocation, so for a simple statistics service `double` should be sufficient. On the other hand, while `BigDecimal` would provide accurate decimal precision, the object allocation could add a penalty in terms of memory and put pressure on the Garbage Collector. Moreover, `BigDecimal` is immutable, which means that every arithmetic operation would add more allocations. There is a `MutableBigDecimal` (Apache Commons Lang library) that would improve arithmetic, but we would still need to deal with an object allocation for each data point from the request stream.
- 💡 After the first implementation, I realized that despite using the `double` primitive type and an array-based buffer to store the quotes (prices), there was still a lot of boxing/unboxing, as I was using a `java.util.TreeSet` to keep track of the minimum and maximum values for the buffer and to find the next minimum or maximum when a buffer exceeds its fixed capacity. After taking that into account, I switched to the `Fastutil` library, which provides efficient primitive-based collections that mitigate the memory overhead caused by Java's autoboxing. See the `CircularSortingArray` class for details.
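To make the buffer-plus-multiset idea above concrete, here is a minimal sketch in plain Java. It is not the repo's `CircularSortingArray` (the names and API here are illustrative), and it uses a `java.util.TreeMap` of value counts as a stand-in for the Fastutil-based sorted multiset, so it still boxes `Double` keys, which is exactly the overhead the real implementation avoids:

```java
import java.util.TreeMap;

// Minimal sketch of a fixed-capacity circular buffer that also keeps an
// ordered multiset of its contents, so min/max stay correct as old prices
// are evicted. TreeMap counts allow duplicate prices (a multiset).
public class SortedRingBuffer {
    private final double[] buffer;                       // fixed-size backing array
    private final TreeMap<Double, Integer> multiset = new TreeMap<>();
    private int head = 0, size = 0;                      // head = next write slot

    public SortedRingBuffer(int capacity) { buffer = new double[capacity]; }

    public void add(double price) {
        if (size == buffer.length) {                     // buffer full: evict oldest
            double evicted = buffer[head];
            multiset.merge(evicted, -1, Integer::sum);
            if (multiset.get(evicted) == 0) multiset.remove(evicted);
        } else {
            size++;
        }
        buffer[head] = price;                            // overwrite oldest slot
        head = (head + 1) % buffer.length;
        multiset.merge(price, 1, Integer::sum);          // O(log n) insert
    }

    public double min() { return multiset.firstKey(); }  // O(log n)
    public double max() { return multiset.lastKey(); }   // O(log n)
}
```

The design point this illustrates: the array alone cannot answer `min`/`max` after an eviction without an `O(n)` scan, while the ordered multiset keeps both queries logarithmic.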
- More tests in general. There are some Spring integration tests that exercise the whole application and different flows, but I would like to have a better testing strategy to catch more edge cases, and in particular unit tests for the critical parts.
- As it was not mentioned in the statement, to reduce the scope of the exercise I didn't add custom API error handlers, so I could focus on the algorithms and data structures. General error handling for the REST API can be implemented via Spring exception handlers. Despite this, the API already returns different status codes, including `200 OK`, `204 No Content` and `400 Bad Request`.