[Syntax] Use BumpPtrAllocator for Syntax node internals #2925

Conversation
f270db9 to f390137
```swift
public subscript<RangeType: RangeExpression<Int>>(
  range: RangeType
) -> SyntaxArenaAllocatedBufferPointer<Element> {
  return SyntaxArenaAllocatedBufferPointer(UnsafeBufferPointer(rebasing: self.buffer[range]))
```
Using `Self` as `SubSequence` was an anti-pattern: it violated the `Collection` requirement that "the subsequence shares indices with the original collection".
f390137 to f1bc852
f9b6c7e to cf0292d
```swift
// If the buffer is already populated, return it.
if let baseAddress = baseAddressRef.pointee {
  return SyntaxDataReferenceBuffer(UnsafeBufferPointer(start: baseAddress, count: childCount))
```
Probably we should not read the memory before the lock, or we should use atomic read/write. But does it really matter?
Yeah I think you either need to guard this with the lock, or use atomics. Another thread could potentially perform the writes out-of-order, so the buffer may not be fully initialized. SE-0282 requires concurrent read/write accesses to be done using atomic operations (a lock also works as it would make them non-concurrent)
Thanks for the reference! Changed to use `_Atomic(const void *)`, and it doesn't seem to affect the performance numbers.
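The pattern being discussed can be sketched in C, since the fix uses C's `_Atomic(const void *)`. This is a hypothetical, illustrative sketch, not the actual swift-syntax code: `LayoutSlot`, `populate_layout`, and `layout_base_address` are made-up names. The key points are that the fast-path read is an acquire load (so a racing reader sees a fully initialized buffer) and the publish under the lock is a release store:

```c
#include <stdatomic.h>
#include <pthread.h>

// Hypothetical slot holding a lazily populated layout buffer.
typedef struct {
  _Atomic(const void *) baseAddress;  // NULL until the layout is populated
  pthread_mutex_t lock;
} LayoutSlot;

// Stand-in for building the layout buffer; the real code would
// bump-allocate the child references here.
static const void *populate_layout(void) {
  static int storage[4];
  return storage;
}

const void *layout_base_address(LayoutSlot *slot) {
  // Fast path: acquire load pairs with the release store below, so the
  // buffer contents are visible once the pointer is observed non-NULL.
  const void *base =
      atomic_load_explicit(&slot->baseAddress, memory_order_acquire);
  if (base != NULL)
    return base;

  pthread_mutex_lock(&slot->lock);
  // Re-check under the lock: another thread may have populated it.
  base = atomic_load_explicit(&slot->baseAddress, memory_order_relaxed);
  if (base == NULL) {
    base = populate_layout();
    atomic_store_explicit(&slot->baseAddress, base, memory_order_release);
  }
  pthread_mutex_unlock(&slot->lock);
  return base;
}
```

Double-checked initialization like this keeps the populated fast path lock-free while still serializing the one-time population, which matches the "lock plus atomics" option described above.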
Nice! Could be worth seeing if this can be done purely using atomics to avoid needing PlatformMutex, e.g. compare-and-exchange a null pointer with 0x1 on the thread that computes the layout, with racing threads spinning until it becomes > 0x1.
Edit: Having thought about this a bit more, I'm not sure it's actually a good idea: we'd still need a mutex to guard the allocator, and a spinlock can potentially lead to deadlocks.
10d9545 to fd35cc9
2132a4f to 2829606
74beef9 to 6a5f0fe
Rebased on main after #2924
6a5f0fe to c53efa4
@swift-ci Please test
ahoppen left a comment
Very nice. 🚀 Looks great to me overall, I left a few comments inline.
```swift
/// Each node consumes at least the size of `SyntaxData`. A non-empty layout
/// node additionally tail-allocates pointer storage for the base address of
/// its layout buffer.
///
/// For layout buffers, each child element consumes a `SyntaxDataReference` in
/// the parent's layout. But for non-collection layout nodes, the layout is
/// usually sparse, so we can't calculate the exact memory size until we see
/// the RawSyntax. That said, `SyntaxData` + 4 pointer sizes looks like a
/// sufficient estimate.
```
I thought about this a bit and think that we should be a little more conservative with the memory that we allocate. My rationale is as follows (if you agree, I'll make this a doc comment).
We can’t know the size required to hold all `SyntaxData` for the entire tree until we have traversed it, because `totalNodes` only counts the non-nil syntax nodes in the tree (which require the size of `SyntaxData` plus the size of `AtomicPointer`, by which the `SyntaxData` is referenced from the parent node). Additionally, all layout children that are nil require the size of `AtomicPointer`, but these nil-slots are not accounted for in `totalNodes`, so we need to estimate the ratio of nil children to non-nil children. The facts that we can base our estimate on are:
- In a valid syntax tree, every layout node with n children has n + 1 `unexpected*` children, which are nil for valid syntax trees, which account for the majority of syntax trees. We can thus expect at least 1 nil child for every non-nil child.
- Some layout nodes have optional children, which further increases the number of nil children.
- Collection nodes don’t have nil children, which offsets the optional children to some degree.

Based on that information, we can choose to over- or underestimate:
- Under-estimating the slab size by a constant-ish factor of 2 or 3 is not a big deal because it means that we just need to allocate another 1 or 2 slabs if the user does a full-tree traversal. If the user doesn’t traverse the entire tree, we save a little memory.
- Over-estimating the slab size if the full tree size is less than 4096 means that we allocate memory which we’ll never use.

We thus assume that every non-nil child has roughly one child that is nil.
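To make the arithmetic of this estimate concrete, here is a small illustrative sketch. The constants and the function name are hypothetical; it assumes `SyntaxData` strides at 32 bytes (as mentioned later in this thread) and that a pointer slot is 8 bytes on a 64-bit platform:

```c
#include <stddef.h>

// Hypothetical sizes for illustration only.
enum {
  kSyntaxDataSize = 32,   // assumed MemoryLayout<SyntaxData>.stride on 64-bit
  kAtomicPointerSize = 8  // assumed size of one pointer slot in a layout
};

// Estimate the slab size for a tree with `totalNodes` non-nil nodes,
// assuming roughly one nil child per non-nil child as argued above.
size_t estimated_slab_size(size_t totalNodes) {
  // Each non-nil node: its SyntaxData plus the pointer slot that
  // references it from the parent's layout.
  size_t perNode = kSyntaxDataSize + kAtomicPointerSize;
  // Each non-nil node is assumed to come with ~1 nil sibling, which
  // still occupies a pointer slot in the parent's layout.
  size_t perNilSlot = kAtomicPointerSize;
  return totalNodes * (perNode + perNilSlot);
}
```

Under these assumptions a single-node tree is estimated at 48 bytes, and a 4096-byte slab covers roughly 85 non-nil nodes before a second slab is needed.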
I want to keep single-node trees minimal. Note that single-node trees are often created when rewriting `TokenSyntax` in a `SyntaxRewriter`. `MemoryLayout<SyntaxData>.stride` (32 bytes) is enough for them, and I don't want to waste extra pointer-sized slots. So I think we should handle the root node specially.
As for the actual estimation, measured with our swift-syntax test suite:
- with my estimation logic, 1,043/998,482 (0.1%) of the trees created with `slabSize < 4096` overflowed the estimated slab size,
- with your estimation logic, 54,601/1,047,632 (5.2%) of the trees created with `slabSize < 4096` overflowed,
- fwiw, `(dataSize + pointerSize * 3) * nodeCount` still overflows in 28,417/1,020,774 trees.

And I think under-estimating the slab size is more problematic than over-estimating, because it wastes most of the second slab.
I didn’t think about the single-node trees. That’s a good point and a worthwhile optimization. Maybe worth a comment.
Your analysis is correct under the assumption that the user visits the entire tree, which might not necessarily be true. And it might be worth measuring how much memory in the slabs is not used with the different estimation logics.
Why is the denominator of your measurements different for the different estimation logics? I would expect them to be the same because you should have created the same number of trees by running the test suite.
Your analysis is correct under the assumption that the user visits the entire tree, which might not necessarily be true.
FWIW my measurements are based on the actual test cases we have. They might be different from typical use-cases, but I just wanted to point that out.
It's true that users don't always visit the entire tree, but I don't think some overallocation is that problematic. (of course we want to avoid allocating memory that we know will never be used)
And it might be worth measuring how much memory in the slabs is not used with the different estimation logics.
That would be interesting 👍
Why is the denominator of your measurements different for the different estimation logics? I would expect them to be the same because you should have created the same number of trees by running the test suite.
The denominator is the number of trees created with `slabSize < 4096`, which differs between the estimation logics.
We will revisit this.
03ed9c5 to 8949f20
hamishknight left a comment
Nice!
```diff
 }
-return SyntaxIdentifier(rootId: Self.rootId(of: root.raw), indexInTree: indexInTree)
+return SyntaxIdentifier(rootId: UInt(rawID: root.raw.id), indexInTree: indexInTree)
```
Not related to this PR, but out of curiosity, is there any reason rootId isn't RawSyntax.ID?
Maybe we thought it should be serializable. But at this point, I don't think there's a reason. Actually I was thinking of making it just `RawSyntax.ID` in a follow-up :)
Now that you mention serializing, that might have been the reason when we still transferred the syntax tree via XPC or JSON from the C++ parser in 2018-ish.
8949f20 to 7a36dbb
@swift-ci Please test
Rework `Syntax` internals. "Red" tree is now bump-pointer allocated and visited children are cached.
* `Syntax` is a pair of the allocator (a strong reference to the `SyntaxDataArena` class) and a pointer to the allocated data (a `SyntaxData` struct).
* All the red-tree data (parent and positions) is bump-pointer-allocated and cached.
* The arena and the tree are 1:1. Each mutation creates a new arena.
* `child(at:)` is now O(1) because the red-tree children are cached in the arena.
* `root` is now O(1) regardless of the depth because the arena holds the reference.
* Simplify `SyntaxChildren`. `SyntaxChildrenIndex` is now just an `Int`.

Remove some hacks as they aren't needed anymore:
* `UnownedSyntax`, because `SyntaxDataReference` replaces it.
* `SyntaxNodeFactory`, because `Syntax.Info` is gone.

Additionally:
* Flatten `AbsoluteSyntaxInfo` to simplify the data and clarify what `advancedBySibling(_:)` and `advancedToFirstChild()` do.
* Remove `RawSyntaxChildren` and `RawNonNilSyntaxChildren` as they were implementation details of `SyntaxChildren`.
* Remove `AbsoluteRawSyntax` and `AbsoluteSyntaxPosition` as nobody was really using them.
* Rename `Syntax.indexInParent` to `layoutIndexInParent` because it was confusing with `SyntaxProtocol.indexInParent: SyntaxChildrenIndex`.
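For readers unfamiliar with the allocation strategy in the PR title, a bump-pointer allocator can be sketched in a few lines of C. This is an illustrative sketch with hypothetical names, not swift-syntax's actual `BumpPtrAllocator`: allocation is just a pointer increment, and everything is freed at once when the arena is destroyed, which matches the 1:1 arena/tree lifetime described above.

```c
#include <stdlib.h>

// Hypothetical single-slab bump-pointer arena.
typedef struct {
  char *buffer;
  size_t capacity;
  size_t used;
} BumpArena;

BumpArena arena_create(size_t capacity) {
  BumpArena a = { malloc(capacity), capacity, 0 };
  return a;
}

// Allocate `size` bytes aligned to `align` (a power of two) by bumping
// the offset; no per-allocation bookkeeping, no individual free.
void *arena_alloc(BumpArena *a, size_t size, size_t align) {
  size_t offset = (a->used + align - 1) & ~(align - 1);
  if (a->buffer == NULL || offset + size > a->capacity)
    return NULL;  // real implementations would chain a new slab here
  a->used = offset + size;
  return a->buffer + offset;
}

// Free the whole tree's worth of allocations in one call.
void arena_destroy(BumpArena *a) { free(a->buffer); }
```

Because every `SyntaxData` node for a tree lives in one such arena, tearing down the tree is a single deallocation instead of a reference-counted walk.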
7a36dbb to 8fff0de
@swift-ci Please test
@swift-ci Please test Windows
Benchmarked configurations:
- Baseline (main)
- Recreate `SyntaxDataArena` for each iteration.
- Reuse fully populated `SyntaxDataArena` for all iterations.
- Reuse fully populated `SyntaxDataArena`, combined with #2924.