Commit 4f6d26b

derrickstolee authored and gitster committed
list-objects: consume sparse tree walk
When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would mark the local topic branch as interesting and the known remote refs as uninteresting. We want to discover the set of new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such that every commit walk starting at interesting commits ends in a root commit or uninteresting commit. We then need to discover which non-commit objects are reachable from uninteresting commits. This commit walk is not changing during this series.

The mark_edges_uninteresting() method in list-objects.c iterates on the commit list and does the following:

* If the commit is UNINTERESTING, then mark its root tree and every object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every UNINTERESTING parent (and all objects that tree can reach) as UNINTERESTING.

At the very end, we repeat the process on every commit directly given to the revision walk from stdin. This helps ensure we properly cover shallow commits that otherwise were not included in the frontier.

The logic to recursively follow trees is in the mark_tree_uninteresting() method in revision.c. The algorithm avoids duplicate work by not recursing into trees that are already marked UNINTERESTING.

Add a new 'sparse' option to the mark_edges_uninteresting() method that performs this logic in a slightly different way. As we iterate over the commits, we add all of the root trees to an oidset. Then, call mark_trees_uninteresting_sparse() on that oidset. Note that we include interesting trees in this process. The current implementation of mark_trees_uninteresting_sparse() will walk the same trees as the old logic, but this will be replaced in a later change.

Add a '--sparse' flag in 'git pack-objects' to call this new logic. Add a new test script t/t5322-pack-objects-sparse.sh that tests this option. The tests currently demonstrate that the resulting object list is the same as the old algorithm. This includes a case where both algorithms pack an object that is not needed by a remote due to limits on the explored set of trees. When the sparse algorithm is changed in a later commit, we will add a test that demonstrates a change of behavior in some cases.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
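
The non-sparse marking described in the message above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: the `toy_*` structs and functions are hypothetical stand-ins invented for illustration, not git's real `struct commit`, `struct tree`, or `mark_tree_uninteresting()`. It omits blobs, edge hints, and the final pass over commits given on stdin.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNINTERESTING 1u
#define TOY_MAX_CHILDREN 4
#define TOY_MAX_PARENTS 2

/* Hypothetical stand-in for a tree object: a flag word plus subtrees. */
struct toy_tree {
	unsigned flags;
	size_t nr_children;
	struct toy_tree *children[TOY_MAX_CHILDREN];
};

/* Hypothetical stand-in for a commit: a flag word, root tree, parents. */
struct toy_commit {
	unsigned flags;
	struct toy_tree *root;
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/*
 * Mark a tree and everything below it. Recursion stops at trees that
 * are already marked, which is how duplicate work is avoided.
 * Returns the number of trees newly marked.
 */
static size_t toy_mark_tree_uninteresting(struct toy_tree *tree)
{
	size_t i, marked = 1;

	if (!tree || (tree->flags & TOY_UNINTERESTING))
		return 0;
	assert(tree->nr_children <= TOY_MAX_CHILDREN);
	tree->flags |= TOY_UNINTERESTING;
	for (i = 0; i < tree->nr_children; i++)
		marked += toy_mark_tree_uninteresting(tree->children[i]);
	return marked;
}

/*
 * The two bullet points from the commit message: an uninteresting
 * commit marks its own root tree; an interesting commit marks the
 * root tree of each uninteresting parent.
 */
static size_t toy_mark_edges(struct toy_commit **commits, size_t nr)
{
	size_t i, j, marked = 0;

	for (i = 0; i < nr; i++) {
		struct toy_commit *commit = commits[i];

		if (commit->flags & TOY_UNINTERESTING) {
			marked += toy_mark_tree_uninteresting(commit->root);
			continue;
		}
		for (j = 0; j < commit->nr_parents; j++) {
			struct toy_commit *parent = commit->parents[j];

			if (parent->flags & TOY_UNINTERESTING)
				marked += toy_mark_tree_uninteresting(parent->root);
		}
	}
	return marked;
}
```

In this model, a subtree shared between an interesting root tree and an uninteresting one ends up marked, while the interesting root tree itself stays unmarked. That matches the intent of leaving out objects the receiving side already has.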
1 parent f1f5de4 commit 4f6d26b

8 files changed: +192 -17 lines changed

Documentation/git-pack-objects.txt

Lines changed: 10 additions & 1 deletion
@@ -14,7 +14,7 @@ SYNOPSIS
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--stdout [--filter=<filter-spec>] | base-name]
-	[--shallow] [--keep-true-parents] < object-list
+	[--shallow] [--keep-true-parents] [--sparse] < object-list
 
 
 DESCRIPTION
@@ -196,6 +196,15 @@ depth is 4095.
 	Add --no-reuse-object if you want to force a uniform compression
 	level on all data no matter the source.
 
+--sparse::
+	Use the "sparse" algorithm to determine which objects to include in
+	the pack, when combined with the "--revs" option. This algorithm
+	only walks trees that appear in paths that introduce new objects.
+	This can have significant performance benefits when computing
+	a pack to send a small change. However, it is possible that extra
+	objects are added to the pack-file if the included commits contain
+	certain types of direct renames.
+
 --thin::
 	Create a "thin" pack by omitting the common objects between a
 	sender and a receiver in order to reduce network transfer. This

bisect.c

Lines changed: 1 addition & 1 deletion
@@ -656,7 +656,7 @@ static void bisect_common(struct rev_info *revs)
 	if (prepare_revision_walk(revs))
 		die("revision walk setup failed");
 	if (revs->tree_objects)
-		mark_edges_uninteresting(revs, NULL);
+		mark_edges_uninteresting(revs, NULL, 0);
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,

builtin/pack-objects.c

Lines changed: 4 additions & 1 deletion
@@ -84,6 +84,7 @@ static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int sparse;
 static int thin;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge);
+	mark_edges_uninteresting(&revs, show_edge, sparse);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
@@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
 		  N_("unpack unreachable objects newer than <time>"),
 		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
+		OPT_BOOL(0, "sparse", &sparse,
+			 N_("use the sparse reachability algorithm")),
 		OPT_BOOL(0, "thin", &thin,
 			 N_("create thin packs")),
 		OPT_BOOL(0, "shallow", &shallow,

builtin/rev-list.c

Lines changed: 1 addition & 1 deletion
@@ -543,7 +543,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	if (revs.tree_objects)
-		mark_edges_uninteresting(&revs, show_edge);
+		mark_edges_uninteresting(&revs, show_edge, 0);
 
 	if (bisect_list) {
 		int reaches, all;

http-push.c

Lines changed: 1 addition & 1 deletion
@@ -1933,7 +1933,7 @@ int cmd_main(int argc, const char **argv)
 		pushing = 0;
 		if (prepare_revision_walk(&revs))
 			die("revision walk setup failed");
-		mark_edges_uninteresting(&revs, NULL);
+		mark_edges_uninteresting(&revs, NULL, 0);
 		objects_to_send = get_delta(&revs, ref_lock);
 		finish_all_active_slots();
 

list-objects.c

Lines changed: 59 additions & 11 deletions
@@ -222,25 +222,73 @@ static void mark_edge_parents_uninteresting(struct commit *commit,
 	}
 }
 
-void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
+static void add_edge_parents(struct commit *commit,
+			     struct rev_info *revs,
+			     show_edge_fn show_edge,
+			     struct oidset *set)
+{
+	struct commit_list *parents;
+
+	for (parents = commit->parents; parents; parents = parents->next) {
+		struct commit *parent = parents->item;
+		struct tree *tree = get_commit_tree(parent);
+
+		if (!tree)
+			continue;
+
+		oidset_insert(set, &tree->object.oid);
+
+		if (!(parent->object.flags & UNINTERESTING))
+			continue;
+		tree->object.flags |= UNINTERESTING;
+
+		if (revs->edge_hint && !(parent->object.flags & SHOWN)) {
+			parent->object.flags |= SHOWN;
+			show_edge(parent);
+		}
+	}
+}
+
+void mark_edges_uninteresting(struct rev_info *revs,
+			      show_edge_fn show_edge,
+			      int sparse)
 {
 	struct commit_list *list;
 	int i;
 
-	for (list = revs->commits; list; list = list->next) {
-		struct commit *commit = list->item;
+	if (sparse) {
+		struct oidset set;
+		oidset_init(&set, 16);
 
-		if (commit->object.flags & UNINTERESTING) {
-			mark_tree_uninteresting(revs->repo,
-						get_commit_tree(commit));
-			if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
-				commit->object.flags |= SHOWN;
-				show_edge(commit);
+		for (list = revs->commits; list; list = list->next) {
+			struct commit *commit = list->item;
+			struct tree *tree = get_commit_tree(commit);
+
+			if (commit->object.flags & UNINTERESTING)
+				tree->object.flags |= UNINTERESTING;
+
+			oidset_insert(&set, &tree->object.oid);
+			add_edge_parents(commit, revs, show_edge, &set);
+		}
+
+		mark_trees_uninteresting_sparse(revs->repo, &set);
+		oidset_clear(&set);
+	} else {
+		for (list = revs->commits; list; list = list->next) {
+			struct commit *commit = list->item;
+			if (commit->object.flags & UNINTERESTING) {
+				mark_tree_uninteresting(revs->repo,
+							get_commit_tree(commit));
+				if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
+					commit->object.flags |= SHOWN;
+					show_edge(commit);
+				}
+				continue;
 			}
-			continue;
+			mark_edge_parents_uninteresting(commit, revs, show_edge);
 		}
-		mark_edge_parents_uninteresting(commit, revs, show_edge);
 	}
+
 	if (revs->edge_hint_aggressive) {
 		for (i = 0; i < revs->cmdline.nr; i++) {
 			struct object *obj = revs->cmdline.rev[i].item;
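
The collection step of the sparse branch in the hunk above can be mimicked in a standalone toy program. This is a hedged sketch: `toy_set` is a hypothetical fixed-size pointer set standing in for git's `oidset`, and the `toy_*` structs are simplified stand-ins for the real commit and tree objects. It covers only gathering the root trees and flagging the uninteresting ones. The later walk done by `mark_trees_uninteresting_sparse()` is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNINTERESTING 1u
#define TOY_SET_MAX 16
#define TOY_MAX_PARENTS 2

/* Hypothetical stand-in for a root tree; only the flag word matters. */
struct toy_tree {
	unsigned flags;
};

/* Hypothetical stand-in for a commit. */
struct toy_commit {
	unsigned flags;
	struct toy_tree *root;
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* Hypothetical fixed-capacity set of tree pointers (git uses an oidset). */
struct toy_set {
	size_t nr;
	struct toy_tree *items[TOY_SET_MAX];
};

/* Insert a tree unless it is already present. */
static void toy_set_insert(struct toy_set *set, struct toy_tree *tree)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->items[i] == tree)
			return;
	assert(set->nr < TOY_SET_MAX);
	set->items[set->nr++] = tree;
}

/*
 * Gather the root tree of every commit and of every parent into the
 * set, interesting or not, and flag the roots of uninteresting commits.
 */
static void toy_collect_root_trees(struct toy_commit **commits, size_t nr,
				   struct toy_set *set)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		struct toy_commit *commit = commits[i];

		if (commit->flags & TOY_UNINTERESTING)
			commit->root->flags |= TOY_UNINTERESTING;
		toy_set_insert(set, commit->root);

		for (j = 0; j < commit->nr_parents; j++) {
			struct toy_commit *parent = commit->parents[j];

			toy_set_insert(set, parent->root);
			if (parent->flags & TOY_UNINTERESTING)
				parent->root->flags |= TOY_UNINTERESTING;
		}
	}
}
```

The point of the set is that interesting and uninteresting root trees are handed to the next stage together, so a later algorithm can compare them path by path instead of walking every uninteresting tree in full.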

list-objects.h

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,9 @@ typedef void (*show_object_fn)(struct object *, const char *, void *);
 void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
-void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
+void mark_edges_uninteresting(struct rev_info *revs,
+			      show_edge_fn show_edge,
+			      int sparse);
 
 struct oidset;
 struct list_objects_filter_options;

t/t5322-pack-objects-sparse.sh

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+#!/bin/sh
+
+test_description='pack-objects object selection using sparse algorithm'
+. ./test-lib.sh
+
+test_expect_success 'setup repo' '
+	test_commit initial &&
+	for i in $(test_seq 1 3)
+	do
+		mkdir f$i &&
+		for j in $(test_seq 1 3)
+		do
+			mkdir f$i/f$j &&
+			echo $j >f$i/f$j/data.txt
+		done
+	done &&
+	git add . &&
+	git commit -m "Initialized trees" &&
+	for i in $(test_seq 1 3)
+	do
+		git checkout -b topic$i master &&
+		echo change-$i >f$i/f$i/data.txt &&
+		git commit -a -m "Changed f$i/f$i/data.txt"
+	done &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic2
+	^topic3
+	EOF
+	git rev-parse \
+		topic1 \
+		topic1^{tree} \
+		topic1:f1 \
+		topic1:f1/f1 \
+		topic1:f1/f1/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f3 and commit to topic1' '
+	git checkout topic1 &&
+	echo change-3 >f3/f3/data.txt &&
+	git commit -a -m "Changed f3/f3/data.txt" &&
+	git rev-parse \
+		topic1~1 \
+		topic1~1^{tree} \
+		topic1^{tree} \
+		topic1 \
+		topic1:f1 \
+		topic1:f1/f1 \
+		topic1:f1/f1/data.txt | sort >required_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	comm -1 -2 required_objects.txt nonsparse_objects.txt >nonsparse_required_objects.txt &&
+	test_cmp required_objects.txt nonsparse_required_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	comm -1 -2 required_objects.txt sparse_objects.txt >sparse_required_objects.txt &&
+	test_cmp required_objects.txt sparse_required_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f1 into f3' '
+	mkdir f3/f4 &&
+	cp -r f1/f1/* f3/f4 &&
+	git add f3/f4 &&
+	git commit -m "Copied f1/f1 to f3/f4" &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic1~1
+	EOF
+	git rev-parse \
+		topic1 \
+		topic1^{tree} \
+		topic1:f3 | sort >required_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	comm -1 -2 required_objects.txt nonsparse_objects.txt >nonsparse_required_objects.txt &&
+	test_cmp required_objects.txt nonsparse_required_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	comm -1 -2 required_objects.txt sparse_objects.txt >sparse_required_objects.txt &&
+	test_cmp required_objects.txt sparse_required_objects.txt
+'
+
+test_done
