Skip to content

Probable concurrency issue with updateHackageIndex #111

@int-e

Description

@int-e

I saw this transient (hadrian+)stack build failure on IRC, https://gitlab.haskell.org/ghc/ghc/-/jobs/1682718

As far as I can make out, the key events are

[...]
hadrian/build-stack --version
[which invokes `stack build --no-library-profiling` in ghc's `hadrian` subdirectory]
[...]
Error: [S-922]
No cryptographic hash found for Hackage package data-array-byte-0.1.0.1, updating
[...]
Error: [S-7282]
       Stack failed to execute the build plan.
       
       While executing the build plan, Stack encountered the error:
       
       Error: [S-922]
       No cryptographic hash found for Hackage package splitmix-0.1.0.4

Note that the first S-922 error mentions a different package than the final error, and there was no corresponding "..., updating" message for splitmix-0.1.0.4. That looks like a race.

A bit of analysis supports this idea:

The messages come from here

          updated <- updateHackageIndex $ Just $ display exc <> ", updating"
          mpair2 <-
            case updated of
              UpdateOccurred -> withStorage $ loadHackageTarballInfo name ver
              NoUpdateOccurred -> pure Nothing
          case mpair2 of
            Nothing -> throwIO exc
            Just pair2 -> pure pair2

So when the corresponding package information is not found, the hackage index is updated, and if an update actually took place, the operation is retried.

Inside updateHackageIndex we find this code

  gateUpdate inner = do
    pc <- view pantryConfigL
    join $ modifyMVar (pcUpdateRef pc) $ \toUpdate -> pure $
      if toUpdate
        then (False, UpdateOccurred <$ inner)
        else (False, pure NoUpdateOccurred)

which uses an MVar to ensure that the index isn't updated multiple times. However, if two threads find a stale index concurrently, only one of them will be notified that an update occurred; the other one will be stuck with whatever stale information was in the index.

Fixing this looks tricky, because the current updateHackageIndex interface fundamentally cannot reliably detect whether another update has just taken place, even if it is changed to wait for pending updates of the index to complete (that part looks easy to do by moving the inner call inside the modifyMVar, but I don't know how pcUpdateRef is used elsewhere). So all callers need to be investigated, I believe.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions