Integrate Thespian actors with asyncio library. #1978
base: master
Conversation
…reating thespian actors. (elastic#1975)" This reverts commit 81abfc3.
This is in an early working stage. It requires more testing and better documentation.
Force-pushed from aae6145 to a82d48f
Force-pushed from a82d48f to 30732d2
…ncelation and problems debugging.
Many thanks for this work. A lot of effort was put into this.
I'd like to capture some context for posterity.
This work was triggered by an intention to speed up corpus download and decompression. One of the speed-up ideas is to enable multi-part downloads instead of the existing single-part download. Today, corpus files are downloaded during track processing via DefaultTrackPreparator. The download is concurrent thanks to the Thespian actor structure: the work is dispatched by TrackPreparationActor and executed in a single background thread in TaskExecutionActor (see this diagram). There are as many TrackPreparationActor instances as there are load-driver hosts, and as many TaskExecutionActor instances as there are CPU cores on the load-driver hosts. The tasks to be executed are sent via Thespian DoTask messages; each message contains a WorkerTask object with a callable and its parameters. The list of tasks is populated using the processor.on_prepare_track() generator (see here), and processors are taken from TrackProcessorRegistry. The only TrackProcessor implementation with a meaningful on_prepare_track() method is the aforementioned DefaultTrackPreparator, whose on_prepare_track() generator yields a separate callable for every document corpus defined in the corpora section of the track definition.
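For orientation, here is a minimal sketch of that dispatch path. It is illustrative only: the `WorkerTask` fields, the `download_corpus` helper, and the parameter names are assumptions, not Rally's exact code.

```python
# Hedged sketch of the dispatch described above, not Rally's actual code.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class WorkerTask:
    func: Callable                     # the work run in a TaskExecutionActor thread
    params: dict = field(default_factory=dict)

def download_corpus(corpus: Any, data_root: str) -> None:
    ...  # today: single-part download plus decompression for one corpus

def on_prepare_track(track: Any, data_root: str):
    # One callable per document corpus in the track's `corpora` section; the
    # resulting WorkerTask objects are shipped to TaskExecutionActor instances
    # inside Thespian DoTask messages.
    for corpus in track.corpora:
        yield WorkerTask(func=download_corpus,
                         params={"corpus": corpus, "data_root": data_root})
```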
After reviewing this structure, it dawned on me that the easiest way to implement yet another level of concurrency (multi-part download) would be to add an asyncio event loop in the TaskExecutionActor background thread, similar to what the Worker actor does via AsyncIoAdapter (see this diagram), and to introduce an asynchronous version of TrackProcessor.
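A minimal sketch of that idea, using plain asyncio primitives; this is not the actual AsyncIoAdapter code, and the `download_part`/`download_all_parts` helpers are hypothetical:

```python
import asyncio

async def download_part(url: str, start: int, end: int) -> None:
    ...  # fetch one byte range, e.g. via an HTTP Range request

async def download_all_parts(url: str, part_ranges: list[tuple[int, int]]) -> None:
    # Multi-part download: all byte ranges are fetched concurrently.
    await asyncio.gather(*(download_part(url, s, e) for s, e in part_ranges))

def run_in_background_thread(url: str, part_ranges: list[tuple[int, int]]) -> None:
    # Each TaskExecutionActor background thread would own its event loop,
    # analogous to what the Worker actor does via AsyncIoAdapter.
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(download_all_parts(url, part_ranges))
    finally:
        loop.close()
```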
@fressi-elastic Have you considered the above option? Can you describe briefly how you intend to use the new AsyncActor class in the context of the structure described above? Will the download still be done via DefaultTrackPreparator?
…re receiving it from creator.
…in the same actor system.
This implements a new `AsyncActor` base class intended to be used within an `asyncio` event loop. This actor supports asynchronous requests (which can be sent either from an external system or from another actor) and handles these requests in a non-blocking async method executed inside an event loop owned by the actor itself.
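To make the pattern concrete, here is a simplified sketch of an actor that owns its own event loop; this is not the PR's actual implementation, and `receiveMessage_async` is a hypothetical handler name:

```python
import asyncio
from thespian.actors import Actor

class AsyncActor(Actor):
    """Sketch: an actor owning an asyncio event loop that handles
    requests in a non-blocking coroutine."""

    def __init__(self):
        super().__init__()
        self._loop = asyncio.new_event_loop()

    def receiveMessage(self, message, sender):
        # Run the async handler on the actor-owned loop and reply with
        # either its result or the raised error.
        try:
            result = self._loop.run_until_complete(
                self.receiveMessage_async(message, sender))
            self.send(sender, result)
        except Exception as e:
            self.send(sender, e)

    async def receiveMessage_async(self, message, sender):
        raise NotImplementedError  # subclasses implement the async handler
```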
This introduces some new global functions to help implement a more dynamic, stateless workflow. Depending on the context (inside or outside an actor), these are implemented using an `Actor` or an `ActorSystem`.
- `actors.create(cls, requirements)`: allows creating an actor from anywhere. The actor is created from the current actor or using a pre-initialized `ActorSystem`.
- `actors.send(destination, message)`: a wrapper around `AsyncActor.send` or `ActorSystem.tell`, depending on whether it is called from an actor or not.
- `actors.request(destination, message, timeout)`: a wrapper around `Actor.send` or `ActorSystem.ask` that sends a request returning an awaitable response. It is implemented as an `async` coroutine and is intended to be used within an event loop. The target actor should be a subclass of `AsyncActor`. It returns an `asyncio.Future` for retrieving the result or error of the target actor's `receiveMessage` method. When running inside an actor, the implementation is truly asynchronous, which means actors can communicate with each other using non-blocking asyncio tasks. This should help implement stateless workflows that do not force setting static variables inside actor instances to keep track of the current execution state.
- `actors.shutdown()`: shuts down the current actor system or the current actor. It is intended to be used as an `atexit` callback to cleanly tear down the actor system's or the actors' async tasks.

The workflow of the `actors.request` function follows:
[TBD]
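In the meantime, an illustrative usage sketch of the helpers described above. Everything here is an assumption: `PingActor`, the string payloads, the module name `actors`, and calling `actors.create` without a `requirements` argument.

```python
import asyncio
import actors  # assumed module name for the helpers described above

class PingActor(AsyncActor):  # hypothetical AsyncActor subclass
    async def receiveMessage_async(self, message, sender):
        return "pong" if message == "ping" else None

async def main():
    # Outside an actor, create() is assumed to fall back to a
    # pre-initialized ActorSystem.
    ping = actors.create(PingActor)
    reply = await actors.request(ping, "ping", timeout=10.0)  # awaitable response
    print(reply)                           # expected: "pong"
    actors.send(ping, "fire-and-forget")   # no reply expected
    actors.shutdown()                      # clean up the actor system

asyncio.run(main())
```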