The project is implemented in Java 22 with Spring Boot, using Spring WebFlux for efficient streaming and ingestion of large volumes of trade quotes. Read below for details.
To build the application you will need Java 22. You can use SDKMAN to install and manage JVM versions:

```shell
sdk list java
sdk install java 22-tem
java -version
```

You can also build and run it in IntelliJ or another IDE.
Now you can build the project and run the tests (Gradle wrapper is included):
```shell
./gradlew check    # build and run the tests
./gradlew bootRun  # start the app
```

Once started, you can issue cURL requests to both endpoints to check that it is up and running:
🏄️ The application can read a huge stream of prices and process/stream them as they are received, without accumulating them in memory, to comply with the high-throughput requirements.
```shell
# Request the stats: since no quotes have been ingested yet, it returns "204 No Content"
curl 'http://localhost:8080/stats?symbol=EURUSD&k=8'

# Then add some data points through the ingestion endpoint:
curl -X POST 'http://localhost:8080/add_batch?symbol=EURUSD' \
  -H 'Content-Type: application/json' \
  -d '[1.10403, 1.10402, 1.10631, 1.10207, 1.10401]'

# Request the stats again:
curl 'http://localhost:8080/stats?symbol=EURUSD&k=8'
```

The expected result from the `/stats` endpoint after the requests above is the following JSON:

```json
{"min":1.10207000000000,"max":1.10631000000000,"last":1.10401000000000,"avg":1.10408800000000,"var":0.00000180473600}
```

💡 Please see Addendum.md for more in-depth application testing using the provided data-point generation script.
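As a quick sanity check of the expected response, the sketch below (a hypothetical standalone helper, not part of the service) recomputes the same figures for the sample batch, assuming `var` is the population variance (squared deviations divided by `n`, not `n-1`), which matches the value above:

```java
// Hypothetical standalone check: recomputes the stats that /stats returns
// for the sample batch. "var" is computed as the population variance.
public class StatsCheck {
    /** Returns {min, max, last, avg, var} for the given quotes. */
    public static double[] stats(double[] quotes) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double q : quotes) {
            min = Math.min(min, q);
            max = Math.max(max, q);
            sum += q;
        }
        double avg = sum / quotes.length;
        double sqDev = 0;
        for (double q : quotes) sqDev += (q - avg) * (q - avg);
        // Population variance: sum of squared deviations divided by n
        return new double[]{min, max, quotes[quotes.length - 1], avg, sqDev / quotes.length};
    }

    public static void main(String[] args) {
        double[] s = stats(new double[]{1.10403, 1.10402, 1.10631, 1.10207, 1.10401});
        System.out.printf("min=%f max=%f last=%f avg=%f var=%.14f%n",
                s[0], s[1], s[2], s[3], s[4]);
    }
}
```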
The source code is well-documented. Here is an overview of the solution.
- Since the server processes data as a stream, it avoids loading the entire list into memory, and processing can start when the first element or chunk arrives.
- The data to calculate the stats is updated in real time for every ingested price from the input stream (the WebFlux stream). This way, statistics retrieval is fast and efficient, and the stats can be requested at any time, even during a batch ingestion (they update live!).
- All data structures keep memory constrained: `k`-buffers with fixed size are allocated, and the stats retrieval has a time complexity under `O(N)`, as requested.
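The live-update idea described above can be sketched in plain Java. This is only an illustration (the class and method names are hypothetical, not the repo's actual implementation; `min`/`max` tracking, which needs an ordered structure, is omitted): `avg` and `var` over the last `capacity` prices can be maintained with `O(1)` work per ingested price by keeping running sums and subtracting the contribution of each evicted element.

```java
import java.util.ArrayDeque;

// Illustrative sketch only: keeps avg/var over the last `capacity` prices
// with O(1) work per price, by maintaining a running sum and sum of squares
// and subtracting the contribution of each evicted (oldest) element.
public class RunningStats {
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private final int capacity;
    private double sum = 0, sumSq = 0;

    public RunningStats(int capacity) { this.capacity = capacity; }

    public void add(double price) {
        window.addLast(price);
        sum += price;
        sumSq += price * price;
        if (window.size() > capacity) {      // evict the oldest price
            double evicted = window.removeFirst();
            sum -= evicted;
            sumSq -= evicted * evicted;
        }
    }

    public double avg() { return sum / window.size(); }

    // Population variance via E[x^2] - (E[x])^2. Subject to the usual
    // floating-point cancellation caveats for long-running windows.
    public double var() {
        double mean = avg();
        return sumSq / window.size() - mean * mean;
    }
}
```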
There were two things in the problem statement that particularly drew my attention:
- No defined limit for the payload of the POST method (no limit on the number of data points)
- The words *near-real-time*, *high-performance* and *high-frequency*.
That inclined me to use Spring WebFlux instead of Spring Web MVC for the data ingestion, as it provides a way to consume and process the data as a stream instead of buffering it all in memory. Also, non-blocking IO allows for a high degree of concurrency, though in this case concurrency is actually limited. It could also potentially benefit from the backpressure feature if the client of the API supported it.
- The data structure implemented to keep the constraints of space and time is a dynamic collection of circular buffers (`CircularSortingArray`), one of them for each `(symbol, k)` tuple. Symbols, contrary to `k`, are unconstrained as per the problem statement.
To compute the stats, this buffer relies on a custom implementation of a
Multiset(which allows keeping track of market quotes sorted in a Red-Black Tree that allows duplicates) for efficiently obtaining and updating themin/maxvalues. -
- This `Multiset` relies on the efficient `Fastutil` collections to eliminate the overhead of Java's boxing/unboxing and to reduce memory allocation and garbage collection, improving performance.
- As per the requirements, the maximum number of elements to process is the last `10^k`; this size is used to dimension the underlying array and constrain the runtime memory.
- As the problem statement guarantees that there won't be concurrent requests for a given symbol, no thread synchronization is required. In a real production system, this should probably be taken into consideration:

  > "No two concurrent add or/and get requests will occur simultaneously within a given symbol."
- The limit of 10 symbols means we can fix the initial map size to that number to avoid map resizes.
- On the topic of using `double` vs. `BigDecimal`: after considering the pros and cons of each option, I decided to use `double`. Rationale: this being a high-frequency, real-time system, performance is critically important. While `double` can suffer from precision loss, it is a primitive type with very efficient memory allocation, so for a simple statistics service `double` should be sufficient. On the other hand, while `BigDecimal` would provide accurate decimal precision, the object allocation could add a penalty in terms of memory and put pressure on the Garbage Collector. Moreover, `BigDecimal` is immutable, which means that every arithmetic operation would add more allocations. There is a `MutableBigDecimal` (Apache Commons Lang library) that would improve arithmetic, but we would still need to deal with an object allocation for each data point from the request stream.
- 💡 After the first implementation, I realized that despite using the `double` primitive type and an array-based buffer to store the quotes (prices), there was still a lot of boxing/unboxing, as I was using a `java.util.TreeSet` to keep track of the minimum and maximum values for the buffer and to find the next minimum or maximum when a buffer exceeds its fixed capacity. After taking that into account, I switched to the `Fastutil` library, which provides efficient primitive-based collections that mitigate the memory overhead caused by Java's autoboxing. See the `CircularSortingArray` class for details.
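To make the buffer-plus-multiset idea above concrete, here is a minimal sketch in plain Java. It is not the repo's `CircularSortingArray` (the names and API here are illustrative), and it uses a `java.util.TreeMap` of value counts as a stand-in for the Fastutil-based sorted multiset, so it still boxes `Double` keys, which is exactly the overhead the real implementation avoids:

```java
import java.util.TreeMap;

// Minimal sketch of a fixed-capacity circular buffer that also keeps an
// ordered multiset of its contents, so min/max stay correct as old prices
// are evicted. TreeMap counts allow duplicate prices (a multiset).
public class SortedRingBuffer {
    private final double[] buffer;                       // fixed-size backing array
    private final TreeMap<Double, Integer> multiset = new TreeMap<>();
    private int head = 0, size = 0;                      // head = next write slot

    public SortedRingBuffer(int capacity) { buffer = new double[capacity]; }

    public void add(double price) {
        if (size == buffer.length) {                     // buffer full: evict oldest
            double evicted = buffer[head];
            multiset.merge(evicted, -1, Integer::sum);
            if (multiset.get(evicted) == 0) multiset.remove(evicted);
        } else {
            size++;
        }
        buffer[head] = price;                            // overwrite oldest slot
        head = (head + 1) % buffer.length;
        multiset.merge(price, 1, Integer::sum);          // O(log n) insert
    }

    public double min() { return multiset.firstKey(); }  // O(log n)
    public double max() { return multiset.lastKey(); }   // O(log n)
}
```

The design point this illustrates: the array alone cannot answer `min`/`max` after an eviction without an `O(n)` scan, while the ordered multiset keeps both queries logarithmic.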
- More tests in general. There are some Spring integration tests that exercise the whole application and different flows, but I would like to have a better testing strategy to catch more edge cases, and in particular unit tests for the critical parts.
- As it was not mentioned in the statement, to reduce the scope of the exercise I didn't add custom API error handlers, so I could focus on the algorithms and data structures. General error handling for the REST API can be implemented via Spring exception handlers. Despite this, the API already returns different status codes, including `200 OK`, `204 No Content` and `400 Bad Request`.