Add server component for batched alignment calls #25

Open
robinp wants to merge 8 commits into robertostling:master from robinp:server

@robinp robinp commented Jul 15, 2025

The commits below add an eflomal-server binary that can be started after activating the virtualenv (or after packaging and installing). By default it looks for a server_config.json file that describes a list of aligners, each determined by its priors file.

A JSON call can then be made to a specified aligner (see the shell script in the devscripts directory for an example), passing one or more sentence pairs. A sentence can be either a string or a list of tokens.

The endpoint also takes optional parameters: for example, scoring can be disabled, model updates can be disabled, and the number of iterations per model can be specified.
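
To make the request shape concrete, here is a minimal Python sketch of how a client might assemble such a call. It is not the server's actual schema: the field names (`aligner`, `pairs`, `source`, `target`, `score`, `update_model`, `iterations`) and the helper `build_align_request` are assumptions for illustration only. The shell script in devscripts is the authoritative example.

```python
import json
from typing import Iterable, Optional, Sequence, Tuple, Union

# A sentence is either a raw string or a list of tokens (as in the description).
Sentence = Union[str, Sequence[str]]


def _as_json(sentence: Sentence):
    """Keep strings as-is; turn any other token sequence into a plain list."""
    return sentence if isinstance(sentence, str) else list(sentence)


def build_align_request(
    aligner: str,
    pairs: Iterable[Tuple[Sentence, Sentence]],
    *,
    score: bool = True,
    update_model: bool = True,
    iterations: Optional[Sequence[int]] = None,
) -> str:
    """Serialize one batched request. All field names here are hypothetical."""
    body = {
        "aligner": aligner,
        "pairs": [
            {"source": _as_json(src), "target": _as_json(tgt)}
            for src, tgt in pairs
        ],
        "score": score,                # hypothetical switch: disable scoring
        "update_model": update_model,  # hypothetical switch: disable model updates
    }
    if iterations is not None:
        # Hypothetical: one iteration count per model.
        body["iterations"] = list(iterations)
    return json.dumps(body)
```

For example, `build_align_request("en-sv", [("the cat", ["katten"])], score=False)` would produce a body with one sentence pair (one side a string, the other a token list) and scoring turned off.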

Server mode preloads and preprocesses the priors, so subsequent API calls are reasonably fast. Each call still runs the eflomal binary, but the exec overhead is not significant compared to the alignment computation itself. NULL priors are not supported, but comments mark where support could be added.

Other notable changes:

  • In the eflomal C binary, allow treating zero (that is, no) sentences as clean for model updates. The previous default of zero, which meant "treat all as clean", is now -1, and zero really means zero.
  • The eflomal C binary's debug printouts now also show which pass is running (forward/reverse).
  • eflomal-align can skip N input lines and limit processing to M input lines, which allows aligning windows of a larger input file.
  • Bump Python to >=3.12, probably for NamedTemporaryFile safety (a new parameter is available from that version).
  • Some clarifying comments.
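
The skip/limit windowing mentioned for eflomal-align can be sketched in Python as follows. This only illustrates the idea, not the code in these commits; the function name `window` and its parameters are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def window(lines: Iterable[str], skip: int = 0,
           limit: Optional[int] = None) -> Iterator[str]:
    """Yield at most `limit` lines after skipping the first `skip` lines.

    `limit=None` means "no limit": everything after the skipped prefix.
    """
    stop = None if limit is None else skip + limit
    return islice(lines, skip, stop)
```

Because `islice` consumes the input lazily, the same helper works on an open file handle without reading the whole file into memory, and applying the same `skip`/`limit` to the source and target files keeps the two sides of the window in step.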

Bugfix (?): calculate_priors in reverse mode didn't reverse the pairs; this is now fixed.

Robin Palotai added 8 commits May 12, 2025 11:05
Not sure, but sounds logical.
Passing trust_sents=False will set n_clean=0, which (after this change)
means no sentences are trusted for statistics, so prior updates don't
happen. Useful for batched sending of sentences of dubious quality.
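
The n_clean convention described above can be summarized in a small Python sketch. The helper names `n_clean_for` and `trusted_count` are hypothetical, and the handling of positive values (capping at the number of sentences) is an assumption rather than something stated in this PR.

```python
def n_clean_for(trust_sents: bool) -> int:
    """Map the trust flag to n_clean: -1 means all sentences, 0 means none."""
    return -1 if trust_sents else 0


def trusted_count(n_clean: int, n_sentences: int) -> int:
    """Number of sentences treated as clean under the new convention.

    -1 -> all sentences (the new default); 0 -> none.
    Capping positive values at n_sentences is an assumption.
    """
    if n_clean == -1:
        return n_sentences
    return min(n_clean, n_sentences)
```

With `trust_sents=False`, a batch of any size yields a trusted count of zero, so no statistics are gathered from it and the priors stay unchanged.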