Skip to content

Commit 0cffe93

Browse files
committed
check for dir/symlink conflicts on checkout/pull
Our "git lfs checkout" and "git lfs pull" commands, at present, follow any extant symbolic links when they populate the current working tree with files containing the content of Git LFS objects, even if the symbolic links point to locations outside of the working tree. This vulnerability has been assigned the identifier CVE-2025-26625. In previous commits we partially addressed this vulnerability by ensuring that the "git lfs checkout" and "git lfs pull" commands remove any file or symbolic link which already exists at the location where they intend to write the contents of a Git LFS file, and by checking for symbolic links at these locations first in the DecodePointerFromBlob() function of the "lfs" package. However, these changes still allow for the possibility that a symbolic link exists in place of a directory in the path between the root of the working tree and the location where the commands intend to create a file. At present, the "git lfs checkout" and "git lfs pull" commands will not detect such links, and so may be induced to write to a location outside of the working tree. To address this issue, revise the "git lfs checkout" and "git lfs pull" commands so they check each path component from the root of the working tree to a Git LFS file. If any are missing, a directory is created, and if any already exist but are not directories, the commands report an error and do not try to create the Git LFS file or write to it. In our implementation of these checks, we adopt a similar approach to the one used by Git, which also tries to avoid accidentally traversing symbolic links when updating the files in a working tree. For performance and compatibility reasons, though, Git does not try to completely eliminate all TOCTOU (time-of-check/time-of-use) races involving symbolic links. Likewise, we do not aim to prevent every possible race which might allow the Git LFS client to unintentionally write through a symbolic link. Instead, we try to limit the chances of this occurring as far as we reasonably can, while avoiding significant performance penalties. One difference between our approach and that taken by Git is that when the we check whether a directory exists and find something other than a directory, we do not try to remove it. This design choice retains compatibility with the legacy behaviour of the Git LFS client, which simply invoked the MkdirAll() function of the "os" package in the Go standard library. That function returns an error if any of the directories in the given path do not already exist and cannot be created, and the "git lfs checkout" and "git lfs pull" commands would just report that error rather than attempt to resolve it by removing anything. Another difference between the way Git checks for directory path conflicts and the implementation we introduce in this commit is that Git retains the results of its checks in a simple single-entry cache while we repeat our checks for each new Git LFS file we process. We can add caching logic in the future if we find it valuable, but we would require a more complex and thread-safe cache than Git's due to our use of multiple goroutines in the "git lfs pull" command, and initial testing indicates that the performance gains would be relatively limited. When the "git checkout" command runs, the checkout_entry_ca() function performs the necessary changes in the working tree in order to be able to write a copy of a given file at its expected location. This function invokes the create_directories() function to ensure that all of the directories between the root of the working tree and the file are present before the file is created. If the create_directories() function detects a conflict in place of any directory, such as a file or symbolic link, it tries to remove the conflicting entry and then create a new directory in its place. As noted above, though, Git does not re-check every directory entry in a file's path in all cases, and also does not try to avoid TOCTOU races in the checks it does perform. The create_directories() function relies on the has_dirs_only_path() function to report whether a path consists of only directories, and that function ultimately invokes the lstat_cache_matchlen() function to determine whether Git believes this to be the case or not: https://github.com/git/git/blob/v2.50.1/entry.c#L582 https://github.com/git/git/blob/v2.50.1/entry.c#L41-L42 https://github.com/git/git/blob/v2.50.1/symlinks.c#L257 https://github.com/git/git/blob/v2.50.1/symlinks.c#L276-L278 https://github.com/git/git/blob/v2.50.1/symlinks.c#L199-L200 https://github.com/git/git/blob/v2.50.1/symlinks.c#L63-L193 The lstat_cache_matchlen() function accepts a path from the root of the repository as its "name" parameter, and for each component of the path for which the function does not have any cached information, it uses the lstat(2) POSIX system call to test whether that path component exists and if it is a directory or not. The final result is then retained in the function's single-entry cache. The use of a cache with only a single entry is viable for Git because in almost all cases, it processes files in sorted order. Thus it can make use of the cached lstat(2) information about the directory "abc" from the path "abc/bar.txt" when checking the path of "abc/foo.txt", for instance. The use of cache in this function, though, is one of the reasons Git is not immune to TOCTOU races involving symbolic links. If a directory is replaced with a symbolic link after the lstat_cache_matchlen() function has checked the path, the lstat_cache_matchlen() function will assume another file with the same leading path components can be created without re-checking for symbolic links, and Git will traverse the new symbolic link when writing the file, even if it leads to a location outside of the working tree. Git also has to be careful to reset the cache whenever it removes any of the directories in the cached path, as may occur when Git processes files that are not in sorted order and their paths conflict with each other due to case-insensitivity or case-folding on the part of the filesystem. This type of situation was described in commit git/git@684dd4c, which added logic to ensure the cache is cleared under these conditions as part of the remediation for the vulnerability identified as CVE-2021-21300. Further, Git would also need to consistently use the openat(2) family of POSIX system calls in conjunction with their O_NOFOLLOW flags, or their equivalent on Windows, in order to guarantee that a given path consists solely of directories and no symbolic links. As noted in commit git/git@f4aa8c8 in relation to the vulnerability identified as CVE-2024-32004, on Windows this type of implementation would require the use of the relatively expensive NtCreateFile() system call (and its FILE_OPEN_REPARSE_POINT flag): https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html https://pubs.opengroup.org/onlinepubs/9699919799/functions/fstatat.html https://www.man7.org/linux/man-pages/man2/openat.2.html https://www.man7.org/linux/man-pages/man2/stat.2.html https://learn.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile Beginning with version 1.24.0, Go introduced a Root structure type in the "os" package of the standard library, with a set of methods which explicitly enforces file path boundaries, using the openat(2) family of system calls where they are available, and the NtCreateFile() system call on Windows. Go v1.25.0 expanded the set of methods in the Root type, and in particular added a MkdirAll() method which mirrors the regular MkdirAll() function in the "os" package, but checks that none of the components in a path are symbolic links to locations outside a given initial "root" path. The development of the Root type and its API was tracked in golang/go#67002. One minor caveat with the MkdirAll() method of the Root structure type is that it allows symbolic links to exist in a path, so long as they do not resolve to location outside the path that was initially passed to the OpenRoot() function. We would prefer to avoid these types of "local" symbolic links as well when they conflict with a directory we expect to exist, so the Root type's MkdirAll() method would not suffice for our purposes. A more important challenge with the Root structure type is that consistent use of its methods would result in a noticeable increase in the execution time of our commands when processing even moderate numbers of Git LFS files. Each of the type's methods, including Lstat(), Mkdir(), OpenFile(), and Remove(), traverses the directories in its path parameter and checks that none are symbolic links to locations outside the path initially passed to the OpenRoot() function. Each method's cost therefore scales with the number of directories in its path parameter; i.e., given "m" method calls and "n" directories in a path, the number of system calls scales as O(m*n). For this reason, the Go documentation states that: "Root operations on filenames containing many directory components can be much more expensive than the equivalent non-Root operation." https://go.dev/blog/osroot#performance We verified this performance penalty in tests of a modified "git lfs checkout" command which checks for symbolic links in each Git LFS file's path within the repository by calling the methods of the Root structure type. We also tested the implementation from this commit, and we report those results in more detail below. In brief, even without a cache like the one in Git's lstat_cache_matchlen() function, the technique we introduce in this commit adds a modest overhead, while the use of the Root structure's methods significantly increased the command's runtime. Our preferred technique relies on several enhancements we made in previous commits to the "git lfs checkout" and "git lfs pull" commands. These commands retrieve a list of Git LFS pointer files from the ScanLFSFiles() method of the GitScanner structure type in our "lfs" package, and for each file, invoke the Run() method of the singleCheckout structure type in our "commands" package. The Run() method then determines whether or not to write the contents of the object referenced by the pointer into a file in the working tree at the appropriate path. In prior commits we revised the newSingleCheckout() function to verify whether a working tree exists when it initializes a new singleCheckout structure, and if a tree is present, to change the current working directory to the root of the tree. We also adjusted the Run() method so that it returns immediately without taking action if no working tree was found by the newSingleCheckout() function. We now introduce a new DirWalker structure type in our "tools" package, with Walk() and WalkAndCreate() methods which check that each component of a given path is a directory, and return an error if a conflict is found. If a directory is missing, the Walk() method will return an error, while the WalkAndCreate() method will try to create the directory. Both methods are simple wrappers around the internal walk() method, whose "create" parameter indicates whether the method should try to create missing directories or not. To initialize a DirWalker structure we define a NewDirWalkerForFile() function, which requires three parameters. The first is an initial parent path which should be specified as a path relative to the current working directory, and which is stored in the "parentPath" element of the new DirWalker structure. The second parameter is a file path which should be specified as a path relative to the parent path. If the parent path is empty, the file path is understood to be relative to the current working directory. The third parameter must be a structure with a RepositoryPermissions() method which conforms to the repositoryPermissionFetcher interface type from our "tools" package. The NewDirWalkerForFile() function removes the final filename path segment from its second "filePath" parameter in order to populate the new DirWalker structure's "path" element with leading directories in the file's path, if any. If the "filePath" parameter contains a bare filename, because the file resides at the root of the repository, then the "path" element is set to an empty path. Note that we do not use the Dir() function from the "path/filepath" package in the Go standard library to remove the filename from the "filePath" parameter because that function returns a "." path when a path has no leading directory components, and because it replaces the "/" separator with the "\" separator on Windows, which we do not want to do in this context. When the DirWalker structure's walk() method is called, it assumes that the path identified by the structure's "parentPath" element exists within the current working directory, and then checks each of the directories in the "path" element until either an error is returned or all the directories have been checked. If a directory does not exist, the walk() method returns an ErrNotExist error unless the "create" parameter is set to "true", in which case the walk() method will try to create the missing directory. If a conflict is found in the place of a directory, such as a pre-existing file or symbolic link with the same name, then the walk() method returns a custom errNotDir error. Assuming that the newSingleCheckout() function found an extant working tree and was able to change the current working directory to the root of the tree, the singleCheckout structure's Run() method creates a new DirWalker structure and calls its Walk() method to determine which directories in the given Git LFS pointer file's path already exist, without at first trying to create any new directories. Since the current working directory is the root of the work tree, the Run() method passes an empty path to the NewDirWalkerForFile() function as its "parentPath" parameter, and the pointer file's path as the "filePath" parameter. The pointer file paths processed by the Run() method are guaranteed to be those supplied by Git, since they are the paths returned by the ScanLFSFiles() method of the GitScanner structure, which reads the paths from the output of either a "git ls-files" or "git ls-tree" command. As such, we expect these paths to always use forward slash characters as separators, to always be relative paths and not absolute paths, and to never contain empty path components or "." or ".." path components. For safety, the DirWalker structure's walk() method rejects any path which contains any of these path components and returns an error in such a case. If the call to the Walk() method returns an error, the Run() method checks whether the error was due to a missing directory or some other issue. If an ErrNotExist error from the "os" package was returned, this indicates that at least one directory in the current Git LFS pointer file's path does not exist, in which case the Run() method skips calling the DecodePointerFromFile() function from our "lfs" package, since there is no value in trying to read a non-existent file's contents to see if it contains a raw Git LFS pointer. If some other type of error was returned, the Run() method logs the error and returns without proceeding further, and if no error was returned, then all the file's ancestor directories were found, so the Run() method does call the DecodePointerFromFile() function in that case. The Run() method then proceeds to check the results from the DecodePointerFromFile() function, if it was called at all. This logic remains unchanged, but can take advantage of the fact that an ErrNotExist error from the call to the Walk() method implies that no pointer file exists. When this type of error is returned by either the Walk() method or the DecodePointerFromFile() function, the Run() method then calls the DiffIndexWithPaths() function in our "git" package to check if the user has intentionally removed the file from Git's index, in which case no further action should be taken. If an ErrNotExist error was returned by either the Walk() method or the DecodePointerFromFile() function, and the user has not removed the file from Git's index, then the Run() method calls the DirWalker structure's WalkAndCreate() method in order to create any directories in the file's path which are missing. For this call, the internal walk() method of the DirWalker structure continues where the previous invocation left off, based on the values of the internal "parentPath" and "path" elements of the structure. The previous invocation of the walk() method by the Walk() method will have set the structure's "parentPath" element to contain the leading directories in the file's path that were found to exist, and set the "path" element to contain just those directories which need to be created. Note that either of these paths may be empty, since there may be no missing directories, or all the directories in the file's path may be missing, or the file may be located in the top-level directory. To verify that the DirWalker structure's internal walk() method handles all of these potential conditions, along with various types of directory conflicts such as pre-existing files or symbolic links, we add a TestDirWalkerWalk() Go test function and define a large number of valid and invalid test cases for this function. The test function then exercises the walk() method in all the defined test cases, both with an empty parent path and with a non-empty parent path. When the Run() method calls the DirWalker's WalkAndCreate() method, this passes a "true" value to the walk() method for its "create" parameter, so any directories that are missing will be created. This means that when the Run() method then calls the RunToPath() method, and it invokes the SmudgeToFile() method of the GitFilter structure in our "lfs" package, that method no longer needs to try to create any directories. We therefore remove the call to the MkdirAll() function in our "tools" package from the SmudgeToFile() method. However, the MkdirAll() function in our "tools" package is designed to enforce any umask settings defined by Git's "core.sharedRepository" configuration option, which is why the SmudgeToFile() method did not simply invoke the MkdirAll() function from the "os" package. Since we want to retain support for this Git configuration option, we add a Mkdir() function to our "tools" package which mirrors the MkdirAll() function, with the only difference being that it wraps the Mkdir() function from the "os" package rather than the MkdirAll() function. We then call the new function in the walk() method instead of calling the Mkdir() function from the "os" package directly. There is one use case where we still need to use the MkdirAll() function from our "tools" package, though. When the "git lfs checkout" command is run with its --to option, the RunToPath() method of the singleCheckout structure is invoked directly. The file path specified as the parameter of the --to option is converted to an absolute path and passed to the RunToPath() method so that the contents of the Git LFS object identified by the other command-line parameters are written to a file at the given path. Since the Run() method does not execute in this case, the WalkAndCreate() method is not called and therefore will not create any directories that might be missing in the path specified by the --to option, and neither will the SmudgeToFile() method, because it no longer calls the MkdirAll() function from our "tools" package. To ensure that we still support the use of the --to option with an arbitrary file path parameter, we now call the "tools" package's MkdirAll() function in the checkoutConflict() function of the "git lfs checkout" command immediately after we convert the --to option's parameter into an absolute file path. In previous commits we expanded the checks in the "checkout: conflicts" test in our t/t-checkout.sh test script so it will validate the use of the "git lfs checkout" command's --to option in a wide range of conditions, including with file path parameters to locations with ancestor directories that do not exist. As a consequence, we can be confident that the test validates that our changes in this commit do not introduce a regression in our support of the --to option of the "git lfs checkout" command. On the other hand, we do require additional shell tests to thoroughly validate the effectiveness of our revisions to the methods of the singleCheckout structure. Since we expect the "git lfs checkout" and "git lfs pull" commands to now try to detect when symbolic links exist in place of the directories in the paths to Git LFS files in a work tree, even if the targets of those links are themselves directories, we expand the "checkout: skip directory symlink conflicts" and "pull: skip directory symlink conflicts" tests that we added to our t/t-checkout.sh and t/t-pull.sh test scripts in a prior commit. Previously, these two tests verified that the "git lfs checkout" and "git lfs pull" commands would skip attempting to write out the contents of Git LFS objects into files in the work tree if the files' paths conflicted with pre-existing symbolic links, but only when the targets of the links were not directories. The tests now also specifically check the commands' behaviour when the targets of the links are directories, since before our changes in this commit the commands would traverse these links and create or update files and subdirectories within the target directories. Note, though, that we do not check this behaviour under TOCTOU race conditions, because we do not expect the commands to avoid traversing symbolic links in those cases, as described above. We also expand the "checkout: skip case-based symlink conflicts" and "pull: skip case-based symlink conflicts" tests we added in a previous commit. These tests now also check that when when the directories in Git LFS file paths conflict with symbolic links as a result of case-insensitivity on the part of a filesystem, the "git lfs checkout" and "git lfs pull" commands detect the conflicts and report errors instead of trying to populate the Git LFS files with their objects' contents. In both these two tests and the "checkout: skip directory symlink conflicts" and "pull: skip directory symlink conflicts" tests, we make an additional check to confirm that when symbolic links to directories exist in place of regular directories in the paths to Git LFS files, the Git error message "is beyond a symbolic link" does not appear in the output of the "git lfs checkout" and "git lfs pull" commands. This message would indicate that the Git LFS commands attempted to refresh the Git index using the "git update-index" command for a file whose path contains a symbolic link to a directory in place of a regular directory. As the "git lfs checkout" and "git lfs pull" commands should now detect such symbolic links (so long as there is no TOCTOU race), these Git error messages should not appear in the commands' output. Finally, we adjust the "checkout: skip directory file conflicts" and "pull: skip directory file conflicts" tests we added in another prior commit. These tests check that the "git lfs checkout" and "git lfs pull" commands detect when a regular file exists in the place of a directory in a Git LFS file's path. Our changes in this commit do not alter that fundamental behaviour, but they do result in a more consistent error message from the commands when a regular file exists in place of a directory. Previously, when a file conflicted with a directory in a Git LFS file's path, the output of the "git lfs checkout" and "git lfs pull" commands differed between Unix and Windows systems due to a difference in the error returned by the Lstat() function call performed in the DecodePointerFromFile() function. On Unix systems, this error encapsulates an ENOTDIR error number, which the IsNotExist() function of the "os" package does not consider equivalent to an ErrNotExist error. On these systems, the Run() method would therefore report the error immediately after calling the DecodePointerFromFile() function and then return without taking further action. On Windows systems, however, the same circumstances caused the Lstat() function to return an ErrNotExist error, due to the implementation of the Lstat() function in the Go standard library, which maps the Windows ERROR_FILE_NOT_FOUND error number to an ErrNotExist error. As a result, the Run() method would proceed to call the RunToPath() method, which invoked the SmudgeToFile() method. When that method called the OpenFile() function from the "os" package to try to create the Git LFS file, though, an error would occur, and this was then the error whose message would be logged by the by the "git lfs checkout" and "git lfs pull" commands. Now that the DecodePointerFromFile() function is only called by the Run() method if its invocation of the DirWalker structure's Walk() method does not return an error, the "git lfs checkout" and "git lfs pull" commands will report the same error message on both Unix and Windows systems if the Walk() method encounters a regular file in place of a directory. To account for this change, we update our "checkout: skip directory file conflicts" and "pull: skip directory file conflicts" tests so they expect the same error message on all systems. In addition to these changes to our regular Go and shell test suites, we also evaluated the impact of our changes in this commit to the speed of the "git lfs checkout" and "git lfs pull" commands under moderate workloads. Our performance testing focused on the "git lfs checkout" command since we are not concerned with the time required to fetch Git LFS objects from a remote server. For our principal test scenario, we created 10,000 small Git LFS files, with each file containing roughly 10 bytes of data only, so that the time required to write out the Git LFS object data of each file was minimal. Because the cost of checking for symbolic links in the paths to Git LFS files will scale with the number of files and the number of path components, we chose a distribution of our test files with the intent that it would emulate a relatively normal repository and not a pathological use case. For example, if we placed all the Git LFS files at the root of the repository, we would not exercise our new checks for symbolic links at all. For our principal test repository, we therefore distributed the Git LFS files in groups of 100 into 100 subdirectories, with 5 ancestor directories between these each of these subdirectories and the root of the repository. In a completely empty working tree, the runtime of the "git lfs checkout" command is heavily dominated by the cost of repeatedly spawning the "git diff-index" command, which we execute once for each file we find to be missing from the work tree. (Improving this behaviour so that the "git diff-index" command could be invoked with multiple file paths would be a valuable enhancement we might want to explore in the future.) So as to better evaluate the performance impact of our changes in this commit, we usually populated our working tree with raw Git LFS pointer files, as might occur after running "git clone" with the GIT_LFS_SKIP_SMUDGE environment variable set to a value equivalent to "true". This avoids the cost of executing the "git diff-index" command, which can otherwise result in a tenfold increase in the runtime of the "git lfs checkout" command. For the majority of our tests, we utilized a Linux system with 16 cores running at 2.10 GHz and a 5.15 kernel version. We also repeated our tests on macOS and Windows systems, with similar results. The times reported below are from the Linux system tests. In our primary test scenario, with 10,000 small Git LFS files in groups of 100 with 6 levels of subdirectories for each group, the impact of checking of each directory in the files' paths amounted to a 15% increase in the average runtime of the "git lfs checkout" command compared to the 3.7.0 version of the Git LFS client. The v3.7.0 client's average runtime was 3.89s and the average runtime with this commit's changes was 4.46s. We also experimented with the inclusion of a simple lock-free single-entry cache in the walk() function, similar to the cache implemented by Git in its lstat_cache_matchlen() function. This reduced the average runtime of the "git lfs checkout" command in the same scenario described above to 4.23s, an 8% increase over the v3.7.0 client's average runtime. Our test scenario represented the ideal conditions for this simple cache, however. The "git lfs checkout" command processes files sequentially in the order returned by the "git ls-files" command (or the "git ls-tree" command, if the installed version of Git is older than v2.42.0), and so we could avoid the need for any locks around our cache, or use a more complex multiple-entry cache. The "git lfs pull" command, though, invokes the Run() method of the singleCheckout structure from two separate goroutines, one of which receives its list of Git LFS pointer files from the transfer queue as their corresponding objects' data is downloaded. A functional cache implementation would consequently require locks to avoid contention between parallel invocations of the walk() method by separate goroutines, which would somewhat diminish any potential performance gains. A single-entry cache might also prove to be ineffective with the "git lfs pull" command, since some files would be processed immediately if their objects were present in the local Git LFS storage directories, while others would be processed as their objects were downloaded, which might occur in a significantly different order than the sort order of the pointers' file paths. Instead of a single-entry cache, we could use a simple map of unbounded size, or an LRU (Least-Recently Used) cache with a bounded number of elements. However, if we do choose to add a cache in the future, it should not expose us to the type of vulnerability which the Git project reported in CVE-2021-21300. That issue resulted partly from the use of a single-entry cache and an incorrect assumption that files would always be processed in sorted order, but the key difference between Git and Git LFS in this regard is that Git tries to conform the working tree to have the contents it expects, and Git LFS does not. During a "git checkout" command, Git will try to remove directory entries such as files and symbolic links which conflict with the file paths Git intends to create. Thus, when Git encountered files whose paths conflicted on a case-insensitive filesystem, if these files were processed out of the usual sorted order, Git might cache one file path, then remove it from the filesystem but not the cache, and then assume the file path still existed based on the contents of the cache. Git LFS should not be vulnerable to this type of problem because it does not try to remove entries which conflict with the ancestor directories in a Git LFS file's path. Overall, though, the performance of the "git lfs checkout" command with the changes from this commit but without any form of caching appears to be acceptable, so we do not implement a cache in the DirWalker structure's methods at this time. We can always revisit this decision in the future, of course. As well as testing our changes from this commit (both with and without a simple cache), we also tested an experimental version of the "git lfs checkout" command which used the methods of the Root structure type from the "os" package. As described above, these methods are designed to ensure that they never operate on files outside a given initial "root" file path. On our Linux test system, the average runtime of the "git lfs checkout" command, when all filesystem operations were converted to use the methods of the Root type, was 6.57s in our primary test scenario, a 69% increase over the average runtime of the command when using the 3.7.0 version of Git LFS client, and a 47% increase over the average runtime of the command when using the changes from this commit. (Those average runtimes were 3.89s and 4.46s, respectively.) On a GitHub Actions runner with Windows Server 2025, the average runtime of the "git lfs checkout" command when all its filesystem operations used the Root type's methods was 28.83s, a 63% increase over the average runtime of the command when using the 3.7.0 version of the client, and a 39% increase over the average runtime of the command when using the changes from this commit. (Those average runtimes were 17.70s and 20.70s, respectively.) Intriguingly, on a GitHub Actions runner with macOS 15.5 (Sequoia), the average runtime of the "git lfs checkout" command with the changes from this commit was 5.81s, 5% faster than the 6.14s average runtime when using the 3.7.0 version of the Git LFS client. The average runtime of the command when all filesystem operations used the Root type's methods, however, was 11.73s, a 91% increase compared to the runtime of the command with the 3.7.0 version of the client and a 102% increase compared to the runtime of the command with the changes from this commit. | v3.7.0 | DirWalker | os.Root --------+-----------+-----------+----------- Linux | 3.89s | 4.46s | 6.57s macOS | 6.14s | 5.81s | 11.73s Windows | 17.70s | 20.70s | 28.83s As we explained above, these performance impacts are the primary reason why we avoid the use of the Root interface and its methods and prefer to check for symbolic links in a more efficient manner, even if that allows for the possibility that we cannot detect some race conditions.
1 parent e735de5 commit 0cffe93

File tree

8 files changed

+829
-88
lines changed

8 files changed

+829
-88
lines changed

commands/command_checkout.go

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ import (
1010
"github.com/git-lfs/git-lfs/v3/git"
1111
"github.com/git-lfs/git-lfs/v3/lfs"
1212
"github.com/git-lfs/git-lfs/v3/tasklog"
13+
"github.com/git-lfs/git-lfs/v3/tools"
1314
"github.com/git-lfs/git-lfs/v3/tq"
1415
"github.com/git-lfs/git-lfs/v3/tr"
1516
"github.com/spf13/cobra"
@@ -110,6 +111,11 @@ func checkoutConflict(file string, stage git.IndexStage) {
110111
Exit(tr.Tr.Get("Could not convert %q to absolute path: %v", checkoutTo, err))
111112
}
112113

114+
err = tools.MkdirAll(filepath.Dir(checkoutTo), cfg)
115+
if err != nil {
116+
Exit(tr.Tr.Get("Could not create path %q: %v", checkoutTo, err))
117+
}
118+
113119
// will chdir to root of working tree, if one exists
114120
singleCheckout := newSingleCheckout(cfg.Git, "")
115121
if singleCheckout.Skip() {

commands/pull.go

Lines changed: 22 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ import (
1212
"github.com/git-lfs/git-lfs/v3/git"
1313
"github.com/git-lfs/git-lfs/v3/lfs"
1414
"github.com/git-lfs/git-lfs/v3/subprocess"
15+
"github.com/git-lfs/git-lfs/v3/tools"
1516
"github.com/git-lfs/git-lfs/v3/tq"
1617
"github.com/git-lfs/git-lfs/v3/tr"
1718
)
@@ -75,8 +76,20 @@ func (c *singleCheckout) Run(p *lfs.WrappedPointer) {
7576
return
7677
}
7778

78-
// Check the content - either missing or still this pointer (not exist is ok)
79-
filepointer, err := lfs.DecodePointerFromFile(p.Name)
79+
dirWalker := tools.NewDirWalkerForFile("", p.Name, cfg)
80+
err := dirWalker.Walk()
81+
82+
var filepointer *lfs.Pointer
83+
if err != nil {
84+
if !os.IsNotExist(err) {
85+
LoggedError(err, tr.Tr.Get("Checkout error trying to check path for %q: %s", p.Name, err))
86+
return
87+
}
88+
} else {
89+
// Check the content - either missing or still this pointer (not exist is ok)
90+
filepointer, err = lfs.DecodePointerFromFile(p.Name)
91+
}
92+
8093
if err != nil {
8194
if os.IsNotExist(err) {
8295
output, err := git.DiffIndexWithPaths("HEAD", true, []string{p.Name})
@@ -106,6 +119,13 @@ func (c *singleCheckout) Run(p *lfs.WrappedPointer) {
106119
return
107120
}
108121

122+
if err != nil && os.IsNotExist(err) {
123+
if err := dirWalker.WalkAndCreate(); err != nil {
124+
LoggedError(err, tr.Tr.Get("Checkout error trying to create path for %q: %s", p.Name, err))
125+
return
126+
}
127+
}
128+
109129
if err := c.RunToPath(p, p.Name); err != nil {
110130
if errors.IsDownloadDeclinedError(err) {
111131
// acceptable error, data not local (fetch not run or include/exclude)

lfs/gitfilter_smudge.go

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,8 +16,6 @@ import (
1616
)
1717

1818
func (f *GitFilter) SmudgeToFile(filename string, ptr *Pointer, download bool, manifest tq.Manifest, cb tools.CopyCallback) error {
19-
tools.MkdirAll(filepath.Dir(filename), f.cfg)
20-
2119
// When no pointer file exists on disk, we should use the permissions
2220
// defined for the file in Git, since the executable mode may be set.
2321
// However, to conform with our legacy behaviour, we do not do this

t/t-checkout.sh

Lines changed: 101 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -191,13 +191,8 @@ begin_test "checkout: skip directory file conflicts"
191191
echo >&2 "fatal: expected checkout to succeed ..."
192192
exit 1
193193
fi
194-
if [ "$IS_WINDOWS" -eq 1 ]; then
195-
grep 'could not check out "dir1/a\.dat": could not create working directory file' checkout.log
196-
grep 'could not check out "dir2/dir3/dir4/a\.dat": could not create working directory file' checkout.log
197-
else
198-
grep 'Checkout error for "dir1/a\.dat": lstat' checkout.log
199-
grep 'Checkout error for "dir2/dir3/dir4/a\.dat": lstat' checkout.log
200-
fi
194+
grep '"dir1/a\.dat": not a directory' checkout.log
195+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
201196

202197
[ -f "dir1" ]
203198
[ -f "dir2/dir3" ]
@@ -209,13 +204,8 @@ begin_test "checkout: skip directory file conflicts"
209204
echo >&2 "fatal: expected checkout to succeed ..."
210205
exit 1
211206
fi
212-
if [ "$IS_WINDOWS" -eq 1 ]; then
213-
grep 'could not check out "dir1/a\.dat": could not create working directory file' checkout.log
214-
grep 'could not check out "dir2/dir3/dir4/a\.dat": could not create working directory file' checkout.log
215-
else
216-
grep 'Checkout error for "dir1/a\.dat": lstat' checkout.log
217-
grep 'Checkout error for "dir2/dir3/dir4/a\.dat": lstat' checkout.log
218-
fi
207+
grep '"dir1/a\.dat": not a directory' checkout.log
208+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
219209
popd
220210

221211
[ -f "dir1" ]
@@ -224,8 +214,6 @@ begin_test "checkout: skip directory file conflicts"
224214
)
225215
end_test
226216

227-
# Note that the conditions validated by this test are at present limited,
228-
# but will be expanded in the future.
229217
begin_test "checkout: skip directory symlink conflicts"
230218
(
231219
set -e
@@ -247,6 +235,64 @@ begin_test "checkout: skip directory symlink conflicts"
247235
git add .gitattributes dir1 dir2
248236
git commit -m "initial commit"
249237

238+
# test with symlinks to directories
239+
rm -rf dir1 dir2/dir3 ../link*
240+
mkdir ../link1 ../link2
241+
ln -s ../link1 dir1
242+
ln -s ../../link2 dir2/dir3
243+
244+
git lfs checkout 2>&1 | tee checkout.log
245+
if [ "0" -ne "${PIPESTATUS[0]}" ]; then
246+
echo >&2 "fatal: expected checkout to succeed ..."
247+
exit 1
248+
fi
249+
grep '"dir1/a\.dat": not a directory' checkout.log
250+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
251+
[ -z "$(grep "is beyond a symbolic link" checkout.log)" ]
252+
253+
[ -L "dir1" ]
254+
[ -L "dir2/dir3" ]
255+
[ ! -e "../link1/a.dat" ]
256+
[ ! -e "../link2/dir4" ]
257+
assert_clean_index
258+
259+
rm -rf dir1 dir2/dir3
260+
mkdir link1 link2
261+
ln -s link1 dir1
262+
ln -s ../link2 dir2/dir3
263+
264+
git lfs checkout 2>&1 | tee checkout.log
265+
if [ "0" -ne "${PIPESTATUS[0]}" ]; then
266+
echo >&2 "fatal: expected checkout to succeed ..."
267+
exit 1
268+
fi
269+
grep '"dir1/a\.dat": not a directory' checkout.log
270+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
271+
[ -z "$(grep "is beyond a symbolic link" checkout.log)" ]
272+
273+
[ -L "dir1" ]
274+
[ -L "dir2/dir3" ]
275+
[ ! -e "link1/a.dat" ]
276+
[ ! -e "link2/dir4" ]
277+
assert_clean_index
278+
279+
pushd dir2
280+
git lfs checkout 2>&1 | tee checkout.log
281+
if [ "0" -ne "${PIPESTATUS[0]}" ]; then
282+
echo >&2 "fatal: expected checkout to succeed ..."
283+
exit 1
284+
fi
285+
grep '"dir1/a\.dat": not a directory' checkout.log
286+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
287+
[ -z "$(grep "is beyond a symbolic link" checkout.log)" ]
288+
popd
289+
290+
[ -L "dir1" ]
291+
[ -L "dir2/dir3" ]
292+
[ ! -e "link1/a.dat" ]
293+
[ ! -e "link2/dir4" ]
294+
assert_clean_index
295+
250296
# test with symlink to file and dangling symlink
251297
rm -rf dir1 dir2/dir3 ../link*
252298
touch ../link1
@@ -258,20 +304,16 @@ begin_test "checkout: skip directory symlink conflicts"
258304
echo >&2 "fatal: expected checkout to succeed ..."
259305
exit 1
260306
fi
261-
if [ "$IS_WINDOWS" -eq 1 ]; then
262-
grep 'could not check out "dir1/a\.dat": could not create working directory file' checkout.log
263-
else
264-
grep 'Checkout error for "dir1/a\.dat": lstat' checkout.log
265-
fi
266-
grep 'could not check out "dir2/dir3/dir4/a\.dat": could not create working directory file' checkout.log
307+
grep '"dir1/a\.dat": not a directory' checkout.log
308+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
267309

268310
[ -L "dir1" ]
269311
[ -L "dir2/dir3" ]
270312
[ -f "../link1" ]
271313
[ ! -e "../link2" ]
272314
assert_clean_index
273315

274-
rm -rf dir1 dir2/dir3
316+
rm -rf dir1 dir2/dir3 link*
275317
touch link1
276318
ln -s link1 dir1
277319
ln -s ../link2 dir2/dir3
@@ -281,12 +323,8 @@ begin_test "checkout: skip directory symlink conflicts"
281323
echo >&2 "fatal: expected checkout to succeed ..."
282324
exit 1
283325
fi
284-
if [ "$IS_WINDOWS" -eq 1 ]; then
285-
grep 'could not check out "dir1/a\.dat": could not create working directory file' checkout.log
286-
else
287-
grep 'Checkout error for "dir1/a\.dat": lstat' checkout.log
288-
fi
289-
grep 'could not check out "dir2/dir3/dir4/a\.dat": could not create working directory file' checkout.log
326+
grep '"dir1/a\.dat": not a directory' checkout.log
327+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
290328

291329
[ -L "dir1" ]
292330
[ -L "dir2/dir3" ]
@@ -300,12 +338,8 @@ begin_test "checkout: skip directory symlink conflicts"
300338
echo >&2 "fatal: expected checkout to succeed ..."
301339
exit 1
302340
fi
303-
if [ "$IS_WINDOWS" -eq 1 ]; then
304-
grep 'could not check out "dir1/a\.dat": could not create working directory file' checkout.log
305-
else
306-
grep 'Checkout error for "dir1/a\.dat": lstat' checkout.log
307-
fi
308-
grep 'could not check out "dir2/dir3/dir4/a\.dat": could not create working directory file' checkout.log
341+
grep '"dir1/a\.dat": not a directory' checkout.log
342+
grep '"dir2/dir3/dir4/a\.dat": not a directory' checkout.log
309343
popd
310344

311345
[ -L "dir1" ]
@@ -480,20 +514,26 @@ begin_test "checkout: skip case-based symlink conflicts"
480514
mkdir dir1
481515
ln -s ../link1 A.dat
482516
ln -s ../../link2 dir1/a.dat
517+
ln -s ../link3 DIR3
518+
ln -s ../../link4 dir1/dir2
483519

484-
git add A.dat dir1
520+
git add A.dat dir1 DIR3
485521
git commit -m "initial commit"
486522

487-
rm A.dat dir1/a.dat
523+
rm A.dat dir1/* DIR3
488524

489525
echo "*.dat filter=lfs diff=lfs merge=lfs -text" >.gitattributes
490526

491527
contents="a"
492528
contents_oid="$(calc_oid "$contents")"
529+
mkdir dir3 dir1/DIR2
493530
printf "%s" "$contents" >a.dat
494531
printf "%s" "$contents" >dir1/A.dat
532+
printf "%s" "$contents" >dir3/a.dat
533+
printf "%s" "$contents" >dir1/DIR2/a.dat
495534

496-
git -c core.ignoreCase=false add .gitattributes a.dat dir1/A.dat
535+
git -c core.ignoreCase=false add .gitattributes a.dat dir1/A.dat \
536+
dir3/a.dat dir1/DIR2/a.dat
497537
git commit -m "case-conflicting commit"
498538

499539
git push origin main
@@ -512,25 +552,32 @@ begin_test "checkout: skip case-based symlink conflicts"
512552

513553
assert_local_object "$contents_oid" 1
514554

515-
rm -rf *.dat dir1 ../link*
555+
rm -rf *.dat dir1 *3 ../link*
556+
mkdir ../link3 ../link4
516557

517558
git lfs checkout 2>&1 | tee checkout.log
518559
if [ "0" -ne "${PIPESTATUS[0]}" ]; then
519560
echo >&2 "fatal: expected checkout to succeed ..."
520561
exit 1
521562
fi
522-
grep -q 'Checking out LFS objects: 100% (2/2), 2 B' checkout.log
563+
grep -q 'Checking out LFS objects: 100% (4/4), 4 B' checkout.log
523564

524565
[ -f "a.dat" ]
525566
[ "$contents" = "$(cat "a.dat")" ]
526567
[ -f "dir1/A.dat" ]
527568
[ "$contents" = "$(cat "dir1/A.dat")" ]
569+
[ -f "dir3/a.dat" ]
570+
[ "$contents" = "$(cat "dir3/a.dat")" ]
571+
[ -f "dir1/DIR2/a.dat" ]
572+
[ "$contents" = "$(cat "dir1/DIR2/a.dat")" ]
528573
[ ! -e "../link1" ]
529574
[ ! -e "../link2" ]
575+
[ ! -e "../link3/a.dat" ]
576+
[ ! -e "../link4/a.dat" ]
530577
assert_clean_index
531578

532-
rm -rf a.dat dir1/A.dat
533-
git checkout -- A.dat dir1/a.dat
579+
rm -rf a.dat dir1/A.dat dir3 dir1/DIR2
580+
git checkout -- A.dat dir1/a.dat DIR3 dir1/dir2
534581

535582
git lfs checkout 2>&1 | tee checkout.log
536583
if [ "0" -ne "${PIPESTATUS[0]}" ]; then
@@ -539,11 +586,14 @@ begin_test "checkout: skip case-based symlink conflicts"
539586
fi
540587
if [ "$collision" -eq "0" ]; then
541588
# case-sensitive filesystem
542-
grep -q 'Checking out LFS objects: 100% (2/2), 2 B' checkout.log
589+
grep -q 'Checking out LFS objects: 100% (4/4), 4 B' checkout.log
543590
else
544591
# case-insensitive filesystem
545592
grep '"a\.dat": not a regular file' checkout.log
546593
grep '"dir1/A\.dat": not a regular file' checkout.log
594+
grep '"dir3/a\.dat": not a directory' checkout.log
595+
grep '"dir1/DIR2/a\.dat": not a directory' checkout.log
596+
[ -z "$(grep "is beyond a symbolic link" checkout.log)" ]
547597
fi
548598

549599
if [ "$collision" -eq "0" ]; then
@@ -552,13 +602,21 @@ begin_test "checkout: skip case-based symlink conflicts"
552602
[ "$contents" = "$(cat "a.dat")" ]
553603
[ -f "dir1/A.dat" ]
554604
[ "$contents" = "$(cat "dir1/A.dat")" ]
605+
[ -f "dir3/a.dat" ]
606+
[ "$contents" = "$(cat "dir3/a.dat")" ]
607+
[ -f "dir1/DIR2/a.dat" ]
608+
[ "$contents" = "$(cat "dir1/DIR2/a.dat")" ]
555609
else
556610
# case-insensitive filesystem
557611
[ -L "a.dat" ]
558612
[ -L "dir1/A.dat" ]
613+
[ -L "dir3" ]
614+
[ -L "dir1/DIR2" ]
559615
fi
560616
[ ! -e "../link1" ]
561617
[ ! -e "../link2" ]
618+
[ ! -e "../link3/a.dat" ]
619+
[ ! -e "../link4/a.dat" ]
562620
assert_clean_index
563621
)
564622
end_test

0 commit comments

Comments
 (0)