Skip to content

Conversation

@0rphee
Copy link
Contributor

@0rphee 0rphee commented Apr 16, 2025

Should fix haskell/haskell-language-server#4515 (comment)

I did a quick benchmark to try out the suggestion to use directory-ospath-streaming, with this results:

benchmarking findCabalFiles/streaming
time                 227.6 ms   (194.2 ms .. 259.8 ms)
                     0.990 R²   (0.965 R² .. 1.000 R²)
mean                 277.9 ms   (252.6 ms .. 349.6 ms)
std dev              53.94 ms   (1.425 ms .. 67.77 ms)
variance introduced by outliers: 57% (severely inflated)
benchmarking findCabalFiles/original ospath
time                 370.2 ms   (313.7 ms .. 434.7 ms)
                     0.996 R²   (0.987 R² .. 1.000 R²)
mean                 391.7 ms   (377.1 ms .. 406.2 ms)
std dev              18.44 ms   (9.247 ms .. 22.70 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking findCabalFiles/original filepath
time                 681.7 ms   (622.5 ms .. 749.8 ms)
                     0.999 R²   (0.996 R² .. 1.000 R²)
mean                 689.1 ms   (676.5 ms .. 700.1 ms)
std dev              13.06 ms   (11.07 ms .. 14.55 ms)
variance introduced by outliers: 19% (moderately inflated)

and this code:

`bench/Main.hs`

{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Criterion.Main
import Stan.Cabal (findCabalFiles, findCabalFilesStreaming, findCabalFilesFilePath)
import System.Directory.OsPath (setCurrentDirectory)
import System.OsPath (osp)

main :: IO ()
main = do
  setCurrentDirectory [osp|../haskell-language-server/|] -- here i have the extra 100,000 files + hls
  defaultMain
    [ bgroup
        "findCabalFiles"
        [ bench "streaming" $ nfIO findCabalFilesStreaming,
          bench "original ospath" $ nfIO findCabalFiles,
          bench "original filepath" $ nfIO findCabalFilesFilePath
        ]
    ]

`src/Stan/Cabal.hs`

{- | Recursively find all @.cabal@ files in the current directory and its
subdirectories. It returns maximum 1 @.cabal@ file from each directory.
-}
findCabalFiles :: IO [FilePath]
findCabalFiles = do
    dir <- getCurrentDirectory
    curDirCabal <- findCabalFileDir dir
    dirs <- getSubdirsRecursive dir
    subDirsCabals <- mapM findCabalFileDir dirs
    pure $ catMaybes $ curDirCabal : subDirsCabals

-- | Find a @.cabal@ file in the given directory.
-- TODO: better error handling in stan.
findCabalFileDir :: FilePath -> IO (Maybe FilePath)
findCabalFileDir dir = do
    dirContent <- listDirectory dir
    let cabalFiles = filter isCabal dirContent
    pure $ case cabalFiles of
        []            -> Nothing
        cabalFile : _ -> Just $ dir </> cabalFile
  where
    isCabal :: FilePath -> Bool
    isCabal p = takeExtension p == ".cabal"

getSubdirsRecursive :: FilePath -> IO [FilePath]
getSubdirsRecursive fp = do
    f <- OsPath.encodeFS fp
    res <- getSubdirsRecursiveOs f
    traverse OsPath.decodeFS res

getSubdirsRecursiveOs :: OsPath.OsPath -> IO [OsPath.OsPath]
getSubdirsRecursiveOs fp = do
    all' <- filter nonGenDir <$> OsPath.listDirectory fp
    dirs <- filterM OsPath.doesDirectoryExist (mkRel <$> all')
    case dirs of
        [] -> pure []
        ds -> do
            -- unsafeInterleaveIO is required here for performance reasons
            next <- unsafeInterleaveIO $ foldMapA getSubdirsRecursiveOs ds
            pure $ dirs ++ next
  where
    nonGenDir :: OsPath.OsPath -> Bool
    nonGenDir d =
           d /= [OsPath.osp|dist|]
        && d /= [OsPath.osp|dist-newstyle|]
        && d /= [OsPath.osp|.stack-work|]
        && d /= [OsPath.osp|.git|]

    mkRel :: OsPath.OsPath -> OsPath.OsPath
    mkRel = (fp OsPath.</>)


{- | Recursively find all @.cabal@ files in the current directory and its
subdirectories. It returns maximum 1 @.cabal@ file from each directory.
-}
findCabalFilesFilePath :: IO [FilePath]
findCabalFilesFilePath = do
    dir <- getCurrentDirectory
    curDirCabal <- findCabalFileDirFilePath dir
    dirs <- getSubdirsRecursiveFilePath dir
    subDirsCabals <- mapM findCabalFileDirFilePath dirs
    pure $ catMaybes $ curDirCabal : subDirsCabals

-- | Find a @.cabal@ file in the given directory.
-- TODO: better error handling in stan.
findCabalFileDirFilePath :: FilePath -> IO (Maybe FilePath)
findCabalFileDirFilePath dir = do
    dirContent <- listDirectory dir
    let cabalFiles = filter isCabal dirContent
    pure $ case cabalFiles of
        []            -> Nothing
        cabalFile : _ -> Just $ dir </> cabalFile
  where
    isCabal :: FilePath -> Bool
    isCabal p = takeExtension p == ".cabal"

getSubdirsRecursiveFilePath :: FilePath -> IO [FilePath]
getSubdirsRecursiveFilePath fp = do
    all' <- filter nonGenDir <$> listDirectory fp
    dirs <- filterM doesDirectoryExist (mkRel <$> all')
    case dirs of
        [] -> pure []
        ds -> do
            -- unsafeInterleaveIO is required here for performance reasons
            next <- unsafeInterleaveIO $ foldMapA getSubdirsRecursiveFilePath ds
            pure $ dirs ++ next
  where
    nonGenDir :: FilePath -> Bool
    nonGenDir d =
           d /= "dist"
        && d /= "dist-newstyle"
        && d /= ".stack-work"

    mkRel :: FilePath -> FilePath
    mkRel = (fp </>)

findCabalFilesStreaming :: IO [FilePath]
findCabalFilesStreaming = do
    setRef <- IORef.newIORef S.empty -- stores the directories where we already found 1 cabal file
    root <- OsPath.getCurrentDirectory
    traverse OsPath.decodeFS =<<
        OPS.listContentsRecFold
           Nothing -- Depth limit
            (\_ _ (OPS.Relative _dirRelPath) (OPS.Basename dirBasename)  _symlinkType _consDirToList traverseThisSubdir rest ->
                 if visitCurrSubdirPred dirBasename -- if this condition is satisfied
                 then traverseThisSubdir rest -- True -> then this subdir will be traversed
                 else rest -- False -> else, this subdir will not be traversed
                ) -- how to fold this directory and its children, given its path
            (\_ _ (OPS.Relative path) (OPS.Basename fileBasename) _ft -> do
                  let parentDir = OsPath.takeDirectory path
                  set <- IORef.readIORef setRef
                  if not (S.member parentDir set) && collectPred fileBasename -- if this condition is satisfied
                  then do
                        IORef.writeIORef setRef $  S.insert parentDir set -- we add the parentDir of this file, to prevent adding more than 1 .cabal file
                        pure (Just path) -- True -> then this path will be added to the results
                  else pure Nothing -- False -> else, this path wont be added
                )
            (Identity root) -- (f a), list of roots to search in
  where
    visitCurrSubdirPred :: OsPath.OsPath -> Bool
    visitCurrSubdirPred d =
           d /= [OsPath.osp|dist|]
        && d /= [OsPath.osp|dist-newstyle|]
        && d /= [OsPath.osp|.stack-work|]
        && d /= [OsPath.osp|.git|]

    collectPred :: OsPath.OsPath -> Bool
    collectPred p =
        OsPath.takeExtension p == [OsPath.osp|.cabal|]

And here is a comparison of the memory usage of the streaming implementation (right) vs. the ospath implementation (left) I posted on the other issue.

Screenshot 2025-04-16 at 12 46 40 p m

Bodigrim and others added 4 commits March 13, 2025 22:16
In an unlikely event of several Cabal file in the same folder
Stan currently chooses to return the first of them, but inadverently
forgets to prepend a directory name.

Merging (x : _xs) case with [cabalFile] has a nice side effect
of making the search a bit faster: previous version of the code
would necessarily scan the entire folder looking for the second
Cabal file (because if it is to be found, outcomes will be different),
but now we shall stop after finding the first one.

Related to haskell/haskell-language-server#4515
.git folder can contain a maze of thousands of subfolders,
this is definitely not a place to look for Cabal files.

Related to haskell/haskell-language-server#4515
Use "directory-ospath-streaming: Stream directory entries in constant
memory in vanilla IO", for searching files. Should have a lower memory
usage than both the other implementations (original with
OsPath/FilePath)
@tomjaguarpaw
Copy link
Collaborator

tomjaguarpaw commented Apr 16, 2025

Seems like CI can't find the right version of a dependency. EDIT: It looks like it works only on 9.6 and above.

@tomjaguarpaw
Copy link
Collaborator

Regarding the heap profiles, looks like the streaming version is quicker but uses the same amount of memory. Is that right?

@0rphee
Copy link
Contributor Author

0rphee commented Apr 16, 2025

Seems like CI can't find the right version of a dependency. EDIT: It looks like it works only on 9.6 and above.

The problem seems to be that versions lower 9.6 use filepath<1.4.100.0 (which implements AFPP) because when using ghc (the library), any boot lib cannot be changed (though as I understand, otherwise it can be the case w/some of them, for example filepath. The only information that I could find about this is here: haskell/cabal#10087). However, versions previous to AFPP are not supported by directory-ospath-streaming.

@tomjaguarpaw would it be fine to add CPP for <9.6 with the original FilePath implementation?

Regarding the heap profiles, looks like the streaming version is quicker but uses the same amount of memory. Is that right?

More or less the same, yes, the Detailed view doesn't show much information for the original OsPath version, but from the Heap Profile, the maximum allocation seems to be 10MB smaller with the streaming version.

@tomjaguarpaw
Copy link
Collaborator

I think you can use the same strategy as here to use OsPath on all GHCs: https://github.com/haskell/process/pull/339/files

@0rphee
Copy link
Contributor Author

0rphee commented Apr 17, 2025

I think you can use the same strategy as here to use OsPath on all GHCs: https://github.com/haskell/process/pull/339/files

Maybe I'm missing something, but I don't think that it will be a very good solution (or even possible):

If we implement (48f98ad) for ghc<9.6, we need:

  • OsPath: no problem, type OsPath = OsString (from os-string)
  • osp: OsPath QQ, no problem*, we can use osstr (from os-string, or use encodeUtf). *Assuming all uses are valid filepaths, no validation compared to the original osp.
  • encodeFS: from os-string
  • listDirectory :: OsPath -> [OsPath]: here the problems begin, this function is defined in terms of getDirectoryContents which itself has a more complex (platform specific) definition in directory-1.3.8 (boot lib required to be 1.3.7.1 (ghc 9.4.8) or lower in ghc < 9.6, due to the previously mentioned limitations imposed by the usage of the ghc package).
  • </> :: OsPath -> OsPath -> OsPath another complex definition in (https://github.com/haskell/filepath/blob/master/System/OsPath/Common.hs#L868) though I suppose its possible to roundtrip the changes between OsPath & FilePath with the FilePath version of </>, and encodeFS/decodeFS, but it would kinda beat the point of all of this.

I also did a quick look into supporting filepath<1.4.100 in directory-ospath-streaming:

@Bodigrim
Copy link

Bodigrim commented May 1, 2025

It looks like it works only on 9.6 and above.

Given that HLS is dropping GHC < 9.6 soon (haskell/haskell-language-server#4567), this might be not that bad?..

I also did a quick look into supporting filepath<1.4.100 in directory-ospath-streaming...

Yeah, this is unlikely to be a realistic goal.

@0rphee 0rphee force-pushed the finding-cabal-files branch 3 times, most recently from 9d27258 to 5c689a2 Compare May 14, 2025 18:11
@0rphee 0rphee force-pushed the finding-cabal-files branch from 5c689a2 to dc502bf Compare May 14, 2025 18:20
stan.cabal Outdated
, trial ^>= 0.0.0.0
, trial-optparse-applicative ^>= 0.0.0.0
, trial-tomland ^>= 0.0.0.0
if impl(ghc >= 9.6)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more flexible to introduce an automatic Cabal flag, True by default. When building with GHC < 9.6 (or when available versions of filepath are constrained by other components of build plan) Cabal solver will automatically flip it. Also easier to coerce Stack to build by simply flipping the flag in stack.yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the branch, Is the new version what you had in mind?

@0rphee 0rphee force-pushed the finding-cabal-files branch 2 times, most recently from baf7c23 to 0ef1041 Compare May 15, 2025 15:42
@tomjaguarpaw
Copy link
Collaborator

Please let me know when you've got something you're happy with and I'll take a look.

@0rphee 0rphee force-pushed the finding-cabal-files branch 2 times, most recently from dde851e to defc920 Compare May 19, 2025 03:50
Copy link

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@0rphee 0rphee force-pushed the finding-cabal-files branch from defc920 to dd78450 Compare May 19, 2025 04:00
@0rphee
Copy link
Contributor Author

0rphee commented May 19, 2025

@tomjaguarpaw I think its fine to check now!

@tomjaguarpaw
Copy link
Collaborator

Thanks for this. The implementation looks fine to me, so let's merge.


I'm still confused about the graph in #586 (comment) though. Is this the heap profile for all of HLS? Or just for finding cabal files? (I hope the former because if the latter then it seems bizarre we're using 380 MB to find some cabal files.)

@tomjaguarpaw tomjaguarpaw enabled auto-merge (squash) May 21, 2025 10:11
@tomjaguarpaw tomjaguarpaw disabled auto-merge May 21, 2025 10:11
@tomjaguarpaw tomjaguarpaw merged commit a08e5b9 into kowainik:main May 21, 2025
29 checks passed
@0rphee
Copy link
Contributor Author

0rphee commented May 21, 2025

I'm still confused about the graph in #586 (comment) though. Is this the heap profile for all of HLS? Or just for finding cabal files? (I hope the former because if the latter then it seems bizarre we're using 380 MB to find some cabal files.)

Oh, sorry for the confusion. All the graphs I've posted are from HLS. In the original profiling (haskell/haskell-language-server#4515 (comment)) the leaked FilePath memory seemed to amount to about ~200MB.

@tomjaguarpaw
Copy link
Collaborator

Thanks, makes sense. I thought that's what it had to be from the context.

@tomjaguarpaw
Copy link
Collaborator

I'll need to do a release that incorporates this. Feel free to poke me if it doesn't appear soon.

@Bodigrim
Copy link

@tomjaguarpaw just a gentle ping ;)

@tomjaguarpaw
Copy link
Collaborator

Thanks for the ping, let's track the release at #588

@tomjaguarpaw
Copy link
Collaborator

OK, released as: https://hackage.haskell.org/package/stan-0.2.1.0

0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
- Fixed by: kowainik/stan#586
- Released in stan-0.2.1.0
0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
- Fixed by: kowainik/stan#586
- Released in stan-0.2.1.0
0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
- Fixed by: kowainik/stan#586
- Released in stan-0.2.1.0
0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
- Fixed by: kowainik/stan#586
- Released in stan-0.2.1.0
0rphee added a commit to 0rphee/haskell-language-server that referenced this pull request Jun 8, 2025
- Fixed by: kowainik/stan#586
- Released in stan-0.2.1.0
mergify bot pushed a commit to haskell/haskell-language-server that referenced this pull request Jun 9, 2025
- Fixed by: kowainik/stan#586
- Released in stan-0.2.1.0
@0rphee 0rphee deleted the finding-cabal-files branch June 13, 2025 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Space leak caused by stan plugin

3 participants