config: Use a global lockfile separate from the config/secrets files #157
Locking the same file we then re-open for writing doesn't work on Windows because the underlying LockFileEx lock is fd/handle-scoped, not process-scoped, and is a mandatory, OS-enforced lock, as documented¹:

> If the locking process opens the file a second time, it cannot access
> the specified region through this second handle until it unlocks the
> region.

The pattern works on Unix because the fcntl()-based lock isn't mandatory. This difference is noted in fasteners' README², but I didn't realize the full implications of Windows' mandatory lock when introducing the locking in 821c08e. My thinking was that enforcement was process-scoped, not fd/handle-scoped.

Using a global lockfile instead of a per-file sidecar lockfile avoids having to reason about the parent path existence checks, which a) is hard and b) creates several small race conditions (which were present until now). The downside of this simpler implementation is that accesses to different files will be bottlenecked on the same lock (e.g. a writer to secrets will block a reader of config; a reader of config will block a writer of secrets). This is just fine, however, given our usage and the very short holding times of the locks.

Resolves #151, which also discusses a rejected alternative of re-using the same fd that fasteners locks.

¹ https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfileex
² https://pypi.org/project/fasteners/0.17.3/#description
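As a rough illustration of the global lockfile approach described above, here is a minimal, Unix-only sketch. It is not the code from this PR: it uses the standard library's `fcntl.lockf()` in place of fasteners, and the helper names (`global_lock`, `save`, `load`) and the file names are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def global_lock(lock_path: Path, exclusive: bool):
    """Hold a lock on a dedicated lockfile (hypothetical helper, Unix-only)."""
    # The lockfile itself never holds data, so the files being protected are
    # never the files being locked.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until the lock is available (no timeout).
        fcntl.lockf(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def save(lock_path: Path, target: Path, text: str) -> None:
    # Writers take the lock exclusively, whichever file they write.
    with global_lock(lock_path, exclusive=True):
        target.write_text(text)


def load(lock_path: Path, target: Path) -> str:
    # Readers take the lock in shared mode.
    with global_lock(lock_path, exclusive=False):
        return target.read_text()


# Assumed layout for the demo: one lockfile guarding two data files.
workdir = Path(tempfile.mkdtemp())
lock_path = workdir / "lock"
config_path = workdir / "config"
secrets_path = workdir / "secrets"

save(lock_path, config_path, "[core]\n")
save(lock_path, secrets_path, "token\n")
loaded = load(lock_path, config_path)
```

Because the lock lives on its own file, the data files are only ever opened for reading or writing, never locked, so a mandatory lock on the lockfile cannot deny access to them. The trade-off is the one noted above: every reader and writer of either file contends for the same lock.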
victorlin
left a comment
Nice job tracking this down without access to a Windows machine! Tested on Windows 10 with #151 (comment) and this solves the issue.
> The downside of this simpler implementation is that accesses to
> different files will be bottlenecked on the same lock
I'm not sure where these locks take place and for how long, but would it result in scenarios where multiple SLURM jobs are submitted under the same user, happen to run nextstrain update at the same time, causing all but one job to crash?
If the chances of that happening are slim, then this is perfect. If not, then some sort of error handling or retry would be better than [Errno 13] Permission denied.
Ah, so the effect of the locking is that writes are serialized and reads wait until a write is finished, at which point reads can proceed in parallel. Lock acquisition retries automatically, as the default acquisition method is blocking without a timeout.

The errno 13 crash shouldn't happen again; it was the result of the mandatory lock denying access to an operation that didn't hold the lock, not the result of lock acquisition failing or waiting.

Re: SLURM specifically, if the jobs are accessing a shared filesystem that supports […]

Did you have a specific use case in mind with the SLURM question, or was it just a general example?
No, was just trying to think of ways where a global lock could be problematic. But like you've described, we should be good here!
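The reader/writer behavior described in the reply above (shared readers, exclusive writers, blocking acquisition) can be sketched with the standard library alone. This is an assumption-laden illustration, not what fasteners does internally: it is Unix-only and uses `fcntl.flock()` rather than fcntl()-based record locks, because `flock()` locks belong to the open file description, which lets separate descriptors in one process stand in for separate processes. `LOCK_NB` is used only to observe contention without hanging; the default described above is to block until the lock is free.

```python
import fcntl
import os
import tempfile

# Hypothetical lockfile location for the demo.
lock_path = os.path.join(tempfile.mkdtemp(), "lock")


def open_lockfile() -> int:
    # Each call creates a new open file description, standing in for a
    # separate process contending for the same global lock.
    return os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)


reader_a, reader_b, writer = open_lockfile(), open_lockfile(), open_lockfile()

# Two readers can hold the lock in shared mode at the same time.
fcntl.flock(reader_a, fcntl.LOCK_SH)
fcntl.flock(reader_b, fcntl.LOCK_SH)

# A writer cannot take the lock while readers hold it. With LOCK_NB the
# attempt fails immediately instead of waiting.
try:
    fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
    writer_blocked = False
except BlockingIOError:
    writer_blocked = True

# Once the readers release, the writer acquires the lock. Without LOCK_NB
# this call simply waits until the lock becomes available.
fcntl.flock(reader_a, fcntl.LOCK_UN)
fcntl.flock(reader_b, fcntl.LOCK_UN)
fcntl.flock(writer, fcntl.LOCK_EX)

# While the writer holds the lock, a reader has to wait.
try:
    fcntl.flock(reader_a, fcntl.LOCK_SH | fcntl.LOCK_NB)
    reader_blocked = False
except BlockingIOError:
    reader_blocked = True

fcntl.flock(writer, fcntl.LOCK_UN)
for fd in (reader_a, reader_b, writer):
    os.close(fd)
```

In this sketch a waiting process never errors out on its own; it only fails fast because `LOCK_NB` was requested for the demonstration.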