
Conversation

@macmv commented Oct 5, 2025

Closes #7450.

This PR adds semantic tokens from language servers to the editor.

With semantic tokens on the left, without on the right:
[screenshot]

I find this a very useful feature for the following reasons:

  • Highlighting functions from the standard library differently from user-defined functions makes code a lot easier to scan at a glance. My eyes can just skip over .map() and not need to worry that it's a user-defined function.
  • Making parameters/local variables a different color is quite handy when I'm trying to parse some dense code.
  • Differentiating function calls/properties (specifically in Rust) makes code a lot easier to read at a glance.

Overall, while it is a quality-of-life feature, I find code a lot easier to understand at a glance with semantic tokens.

There are still some things left to finish:

  • Remove the semantic token range request. I intended to use it for multibuffers, but I think it's simpler (and performs well enough) to just grab all the semantic tokens for each buffer. Semantic token ranges also cannot be diffed, so we'd need to fetch all the semantic tokens if the user modified the multibuffer at all.
  • Improve perf. For large files (10,000+ lines), I'm starting to notice a bit of delay when moving the cursor around. This was much worse in my first draft, where I made an anchor for every token, but I still think it can be improved. Right now, the editor feels a lot clunkier, just from that slight extra delay from these tokens.
  • Remote editors. I added the protobufs, but I never actually handled the requests on the server side. I'm not sure if the server should respond with its locally cached tokens or send a fresh request to the LSP server.

Edit: Perf got fixed by clamping the buffer ranges in create_highlight_endpoints correctly. I don't notice any change now.

Aside from that, it works nicely and highlights all the tokens I could think of.

In terms of conflicting with tree-sitter, these highlights are applied on top of tree-sitter highlights. Additionally, the LSP spec documents how this should be handled: the client's semantic tokens capabilities include the following option:

	/**
	 * Whether the client uses semantic tokens to augment existing
	 * syntax tokens. If set to `true` client side created syntax
	 * tokens and semantic tokens are both used for colorization. If
	 * set to `false` the client only uses the returned semantic tokens
	 * for colorization.
	 *
	 * If the value is `undefined` then the client behavior is not
	 * specified.
	 *
	 * @since 3.17.0
	 */
	augmentsSyntaxTokens?: boolean;

I set this to true, so language servers won't send us semantic tokens for obvious things like strings or numbers. In particular, here's how rust-analyzer handles that flag: https://github.com/rust-lang/rust-analyzer/blob/4ae99f0150c94f4bdf7192b4447f512ece3546fd/crates/rust-analyzer/src/lsp/to_proto.rs#L734-L752
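
For reference, here is a minimal sketch of what declaring that capability might look like with the lsp-types crate (the struct shapes here are an assumption about lsp-types, not code from this PR):

use lsp_types::{
    SemanticTokensClientCapabilities, SemanticTokensClientCapabilitiesRequests,
    SemanticTokensFullOptions,
};

// Advertise delta-capable full-document requests, and opt in to augmenting
// the client's own (tree-sitter) highlights rather than replacing them.
let semantic_tokens = SemanticTokensClientCapabilities {
    dynamic_registration: Some(true),
    requests: SemanticTokensClientCapabilitiesRequests {
        range: None,
        full: Some(SemanticTokensFullOptions::Delta { delta: Some(true) }),
    },
    // With this set, servers like rust-analyzer skip "obvious" tokens
    // (strings, numbers) and leave those to client-side highlighting.
    augments_syntax_tokens: Some(true),
    ..Default::default()
};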

This is best reviewed commit-by-commit. Each commit builds on the last, and they're somewhat organized so that the project should compile between each commit.

Release Notes:

  • Added Semantic Tokens.

@cla-bot commented Oct 5, 2025

We require contributors to sign our Contributor License Agreement, and we don't have @macmv on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

@macmv (Author) commented Oct 5, 2025

@cla-bot check

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Oct 5, 2025
@cla-bot commented Oct 5, 2025

The cla-bot has been summoned, and re-checked this pull request!

@macmv macmv marked this pull request as ready for review October 5, 2025 05:42
@finico commented Oct 8, 2025

@macmv First of all, thank you for the work done! I would really like to see this feature in Zed.
Is this supposed to use lazy initialization?
After opening a file and navigating around, the highlighting is only the basic tree-sitter one, and the semantic highlighting is only rendered after a change is made (I tried it in several languages, so it does not depend on the LSP server).

@macmv (Author) commented Oct 8, 2025

Right now I only wired up the semantic tokens to the Edited event. I've been tinkering with fetching semantic tokens when an editor is created as well, but that isn't always consistent while a language server is still starting, so it requires a bit more tinkering to get right.

@SomeoneToIgnore (Contributor) left a comment

Sorry for the delayed review — I was working on inlay hints to provide an example of how such range-related LSP things are planned to be cached and used in the system.

I was not too meticulous about the protocol part, as I hope @Veykril will check the PR too at some point.

Overall, this seems like a step forward compared to #27556, as we have delta support and the editor-rendering part is somewhat more elaborate here.

I think it's perfectly fine to ignore the proto part for now (albeit there are collab/src/editor_tests.rs and local SSH to test with locally and make progress on that).

To summarize my comments, I think these steps need to be done:

  • consolidate all LSP data caching and related code near LspStore instead of the editor, so it can be shared — and consider range requests
  • iron out the editor-side store: when things get updated, and the range-request story (invalidation is somewhat of a pain for inlay hints)
  • add refresh support and a multibuffer test

I think we can look at the inlay hints, chunks, and related logic added recently: it's not fully ironed out as generic storage for semantic tokens, but the LspStore part seems good enough — the editor part seems OK to start from.

Besides that, one huge gap here is the user-facing part of the feature:

  • no settings (in the default.json file we have a bunch of scattered entries; we need to pick something semantically near the lsp and highlight entries, add a way to turn things on/off, switch between tree-sitter and semantic highlights, etc.)
  • no documentation (we document settings mainly)
  • it seems worthwhile to add another entry to the editor controls menu [screenshot] for feature visibility.

  • last but not least, it would be superb to be able to do things like
"editor.semanticTokenColorCustomizations": {
    "enabled": true,
    "rules": {
      "unresolvedReference": {
        "foreground": "#c93f3f",
        "fontStyle": "bold"
      }
    }
  },

as VSCode allows.
This snippet turns rust-analyzer's highlights into a quite good and useful diagnostic.


I plan to work further on inlay hints (there's an invalidation bug somewhere) and rainbow brackets, so no active work or thinking on this PR from my side yet.
Happy to discuss things further.

}
}
}
highlight_endpoints.sort();
@SomeoneToIgnore (Contributor):

With

impl Ord for HighlightEndpoint {
    fn cmp(&self, other: &Self) -> cmp::Ordering {
        self.offset
            .cmp(&other.offset)
            .then_with(|| self.style.is_some().cmp(&other.style.is_some()))
    }
}

this part of the code becomes more dangerous with each extra semantic layer added.

The main question to me is: how do we unite tree-sitter and semantic highlights so that people can customize them?

It is not unrealistic to assume there are people who want zero tree-sitter involvement in such highlights: augmentsSyntaxTokens was mentioned, but no code is added at the configuration level yet.

Would it make sense to use a similar approach

// Use LSP tasks over Zed language extension ones.
// If no LSP tasks are returned due to error/timeout or regular execution,
// Zed language extension tasks will be used instead.
//
// Other Zed tasks will still be shown:
// * Zed task from either of the task config file
// * Zed task from history (e.g. one-off task was spawned before)
//
// Default: true
"prefer_lsp": true

to highlights too?

@macmv (Author):

This is a good point. I figured doing what "the lsp spec suggests" would be a good start, but configurability would be best.

How does this sound: add two flags, one to enable tree-sitter highlights and one to enable semantic token highlights. Then we enable augmentsSyntaxTokens only when both are enabled, and layer semantic token highlights on top of tree-sitter highlights. This behavior can all be documented in default.json.

Additionally, enabling semantic token highlights would be a per-language-server setting, and toggling tree-sitter highlights would be a per-language setting (both with global defaults).

I realize users could theoretically want tree-sitter highlights on top of semantic tokens, but at that point, I feel like they should disable semantic tokens for a specific language server instead. I hesitate to add more configuration than necessary, just because all the combinations of settings become difficult to test.

@SomeoneToIgnore (Contributor):

add two flags, one to enable tree-sitter highlights and one to enable semantic token highlights

This sounds good.

To agree more with your last paragraph, I'd try a single setting with some enum values — the main idea here is to have a "fallback" by default, so we start with semantic highlights always on, but if those are not supported by the language's server, we fall back to tree-sitter.

I do not insist on this much if it starts to get hard to implement, but in the end it seems reasonable to have something like this (the same as Zed does with prettier and the tasks mentioned above).

This behavior can all be documented in default.json.

Great that it will be done; just to mention that this file is only half of the docs: there's also configuring-zed.md and other docs/ entries that get published to the website and might be more visible for certain kinds of people, hence we should update both.

enabling semantic token highlights would be a per-language-server setting

With that I'd rather disagree: we could just as well make it language-dependent and spare users from remembering server names.
That also plays towards the unified setting, if it works out.

}
let token_range = excerpt.map_range_from_buffer(token_range);

if token_range.end <= range.start || token_range.start >= range.end {
@SomeoneToIgnore (Contributor):

Range is exclusive at its end but does include its start — seeing both ends considered the same way here is a bit odd; is it supposed to be so?

@macmv (Author):

Yeah, good point — I think < and > are correct here.
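
For reference, the conventional overlap test for half-open ranges treats merely touching ranges as non-overlapping (a standalone sketch, not the PR's code):

use std::ops::Range;

/// Half-open ranges overlap iff each one starts before the other ends;
/// the negation is the skip condition `a.end <= b.start || a.start >= b.end`.
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

fn main() {
    assert!(overlaps(&(0..5), &(4..8))); // both contain offset 4
    assert!(!overlaps(&(0..5), &(5..8))); // merely touch at 5: no overlap
}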

}

#[derive(Debug, Default)]
pub struct SemanticTokenView {
@SomeoneToIgnore (Contributor):

In this repository, View is a rather common name for entities that are displayed on-screen: there's ToolbarItemView, ModalView, StatusItemView, etc.

Here, we deal with one such "view", the editor element, where small parts of it get colored differently, so the name seems a bit off.

Maybe, BufferSemanticTokens or SemanticTokens?

@macmv (Author):

I ended up needing BufferSemanticTokens in lsp store, so how does PositionedSemanticTokens sound?

let buffer_range = excerpt
    .buffer()
    .range_to_version(excerpt.map_range_to_buffer(range.clone()), &tokens.version);
for token in tokens.tokens_in_range(buffer_range) {
@SomeoneToIgnore (Contributor):

tokens_in_range

I know that certain highlights (e.g. word highlight on hover) are applied after a debounce.
Based on my experience fixing various obscure bugs like #40738, it's not unreasonable to assume that editing and highlighting might interleave in such a manner that there is a Global buffer-version mismatch between what's highlighted and what's stored in the data.

Here it seems relatively harmless, as it would not panic thanks to the anchor/snapshot manipulations, but it is still potentially buggy?
If so, some extra version check could help — ideally, an accessor like

/// Gets the most recent LSP data for the given buffer: if the data is absent or out of date,
/// new [`BufferLspData`] will be created to replace the previous state.
pub fn latest_lsp_data(&mut self, buffer: &Entity<Buffer>, cx: &mut App) -> &mut BufferLspData {
    let (buffer_id, buffer_version) =
        buffer.read_with(cx, |buffer, _| (buffer.remote_id(), buffer.version()));
    let lsp_data = self
        .lsp_data
        .entry(buffer_id)
        .or_insert_with(|| BufferLspData::new(buffer, cx));
    if buffer_version.changed_since(&lsp_data.buffer_version) {
        *lsp_data = BufferLspData::new(buffer, cx);
    }
    lsp_data
}

could help?

    dynamic_registration: Some(true),
}),
semantic_tokens: Some(SemanticTokensClientCapabilities {
    dynamic_registration: Some(true),
@SomeoneToIgnore (Contributor):

Just to double check: I do not see register_server_capabilities adjusted, but it should be, right?

@macmv (Author):

Oh good point, I missed that.

@macmv (Author):

I don't entirely understand register_server_capabilities for semantic tokens. Will the server send a registration for textDocument/semanticTokens/? or textDocument/semanticTokens/full?

I can't really find how this works in the spec, and I'm not sure how to go about testing this.

@SomeoneToIgnore (Contributor):

Dynamic registrations are somewhat convoluted indeed; I can only propose checking out #41929

I expected that to be very similar to

"textDocument/documentSymbol" => {
let options = parse_register_capabilities(reg)?;
server.update_capabilities(|capabilities| {
capabilities.document_symbol_provider = Some(options);
});
notify_server_capabilities_updated(&server, cx);
}

Worst case — we can use Some(false) in the dynamic registration capability part.

// We apply semantic tokens over tree-sitter highlighting, so
// let the server know that we are augmenting existing tokens.
augments_syntax_tokens: Some(true),
}),
@SomeoneToIgnore (Contributor):

We should declare support for workspace/semanticTokens/refresh too and implement it, as we already do for inlays and other cases.

#[derive(Default, Debug, Clone)]
pub struct SemanticTokens {
    /// Each value is:
    /// data[5*i] - deltaLine: token line number, relative to the start of the previous token
@SomeoneToIgnore (Contributor):

Feels worth a few unit tests around this logic (maybe on a level above, but somewhere we do have to test deltas and other cases, e.g. the one described in the spec?)
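
For reference, such a unit test could exercise the decoding directly against the worked example in the LSP 3.17 spec. A self-contained sketch (with a hypothetical Token type, not the PR's):

/// A decoded token position (line/start in UTF-16 code units by default).
#[derive(Debug, PartialEq)]
struct Token {
    line: u32,
    start: u32,
    length: u32,
    token_type: u32,
    modifiers: u32,
}

/// Decode the flat `data` array of a semanticTokens/full response: each token
/// is five integers, with line and start delta-encoded against the previous
/// token (start offsets restart from 0 whenever the line advances).
fn decode(data: &[u32]) -> Vec<Token> {
    let mut tokens = Vec::with_capacity(data.len() / 5);
    let (mut line, mut start) = (0u32, 0u32);
    for t in data.chunks_exact(5) {
        line += t[0];
        if t[0] > 0 {
            start = 0;
        }
        start += t[1];
        tokens.push(Token {
            line,
            start,
            length: t[2],
            token_type: t[3],
            modifiers: t[4],
        });
    }
    tokens
}

#[test]
fn decodes_the_spec_example() {
    // The LSP 3.17 spec encodes tokens at (2,5), (2,10), (5,2) as this array.
    let data = [2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0];
    let tokens = decode(&data);
    assert_eq!((tokens[0].line, tokens[0].start), (2, 5));
    assert_eq!((tokens[1].line, tokens[1].start), (2, 10));
    assert_eq!((tokens[2].line, tokens[2].start), (5, 2));
}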

@SomeoneToIgnore (Contributor):

Thank you for adding tests, both on the LSP part and the "unit" tests on the logic from this module.

Yet I have failed to find any "unit" tests on applying delta responses on top of a previous result, and the spec seems to describe such an example?

That looked somewhat complex to me, and if I'm not confusing things, it would be great to add another test.
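
The delta application itself is small once indices are handled carefully: edits index into the flat u32 array, not into token counts. A standalone sketch of such a test, with a hypothetical TokensEdit type mirroring the spec's SemanticTokensEdit:

/// One splice against the previously cached flat data array.
struct TokensEdit {
    start: usize,
    delete_count: usize,
    data: Vec<u32>,
}

/// Apply a semanticTokens/full/delta response to the cached data; edits are
/// applied back-to-front so that earlier indices stay valid.
fn apply_delta(cached: &mut Vec<u32>, mut edits: Vec<TokensEdit>) {
    edits.sort_by(|a, b| b.start.cmp(&a.start));
    for edit in edits {
        cached.splice(edit.start..edit.start + edit.delete_count, edit.data);
    }
}

#[test]
fn applies_the_spec_delta_example() {
    // Spec example: inserting a line above shifts only the first deltaLine (2 -> 3).
    let mut cached = vec![2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0];
    let edits = vec![TokensEdit { start: 0, delete_count: 1, data: vec![3] }];
    apply_delta(&mut cached, edits);
    assert_eq!(cached, vec![3, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0]);
}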


#[derive(Debug, Default)]
struct SemanticTokensData {
    result_id: Option<String>,
@SomeoneToIgnore (Contributor):

It's a bit odd to see this optional: we do know which server we sent the request to and got the response from.

@macmv (Author):

Yup, agreed. This was just so I could #[derive(Default)]. Now that LspData exists, it should be easy to get rid of this.

@macmv (Author):

Oh, sorry, I got this mixed up with server_id. The result_id does need to be optional: it's an optional field in the spec. If a server doesn't support deltas, it won't send back result ids. But I have a similar server_id field in the SemanticTokens struct, which I will remove the optional from.

})
} else {
    let lsp_request_task =
        self.request_lsp(buffer, LanguageServerToQuery::FirstCapable, request, cx);
@SomeoneToIgnore (Contributor):

So good to see full and delta support, thank you.

This is somewhat debatable though: in the world of frontend, a single *.ts file may have various language servers associated with it (tailwind-language-server, typescript-language-server, eslint, etc.), and it's reasonable to expect that any of them may serve additional tokens too.

I would repeat what current main does for inlay hints.

@macmv (Author) commented Oct 26, 2025

Thank you so much for the detailed review! It'll take me a few days to rebase/update everything, and this looks like a bunch of interesting stuff to read up on.

I'm aware of the missing user-facing configs, and I think I'll try to tackle that while I'm updating these bits. This was mostly meant as a POC, as I was still just interested to see if this was possible. Being able to customize language-server-specific tokens looks super useful.

I'd also definitely like to omit the proto part for now. This PR is getting quite long. Should I keep what I have already? Or just cut out the new protobufs entirely?

And finally, do you prefer rebases or merges?

@SomeoneToIgnore (Contributor) left a comment

It took me eons to get here, no rush (albeit now I'll try to keep up with the pace and review further changes in a timely manner).

Or just cut out the new protobufs entirely?

I'm not sure how possible this is, given that we'll have to use the Lsp* infra?
We have to use LspQuery for properly made local requests, and that drags in some proto types one way or another.
I think we could literally copy-paste document colors/inlays here: the latter is a bit more complex as it involves some range handling, and the former is more or less the same, modulo the LSP data being different.

I do not mind removing the layer/ignoring it until the very end if you find a way — in the end, we'll have to get a collab/src/tests/editor_tests.rs entry for remote functionality testing, which I can help with.

do you prefer rebases or merges

I think merges are more realistic given that this branch will be here for a while, but no preferences really — I prefer to concentrate on the feature itself and its semantics more.


// [2]: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#semanticTokenTypes
let choices: &[&str] = match self.token_type(token_type)? {
    // Types
    "namespace" => &["namespace", "module", "type"],
@SomeoneToIgnore (Contributor):

I guess I lack the context here — there's definitely some configuration that has to happen (as I've posted that VSCode token settings snippet for rust-analyzer), so I'm not opposed to the idea overall.


fn semantic_tokens(
    &self,
    buffer_handle: Entity<Buffer>,
@SomeoneToIgnore (Contributor):

Deltas only work on the entire buffer. So, if the server supports deltas, range support is basically never useful (as ranges cannot be diffed).

Did not know that, thank you: that complicates things even more, technically, as multi buffers come into play.

It's quite sad if we do some project search that hits many large files — then, when we scroll their excerpts into view, we will have to load the entire file to show a small (<20 lines, usually) excerpt per hit.

Also, the "what if some slow server starts to choke on large files" concern is still here.


Given such a strong delta separation though, I think it's fine to move on now and concentrate on full files + deltas; this is a big change alone.

This also simplifies all caching/metadata quite a lot, given that we have no ranges to deal with.

But then, let's remove the range: Some(true), from our LSP capabilities

cc @Veykril as this is quite a pivoting point then, and you know the spec shenanigans here better

@SomeoneToIgnore (Contributor) commented

Ok, given #39539 (comment) I think we're getting quite close:

  • let's ignore all range-related shenanigans that inlays do, same as all range-related semantic tokens queries
  • and, instead, concentrate on what document highlights do

The LspStore part and caching are also done for those, and the update strategies seem similar to what document colors do.

@macmv (Author) commented Oct 31, 2025

Alrighty, that sounds good. I'll polish this up and update this soon.

@macmv (Author) commented Nov 1, 2025

I've posted a new version. This debounces updates in LspStore and aggregates semantic tokens from multiple servers correctly. This largely copies the behavior of document colors. Notably, when requesting from multiple language servers, semantic tokens are only shown once all servers reply. For each server, it decides whether to send a full or a delta request accordingly.

I added some more tests:

  • One test has a multibuffer that shows the same buffer twice. This test shows that the deduplication in lsp store is definitely needed, and it works as expected.
  • One test has a buffer with two language servers attached, and this tests aggregating tokens from both servers. The data structures are in place to make this deterministic (there's an IndexMap<LanguageServerId, BufferSemanticTokens>; see the sketch after this list), but right now the server that takes priority is the one that responds slowest (I think?).
  • One test has a multibuffer with two different languages and language servers attached to each. This is mostly just combining everything and seeing what happens.
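
A rough sketch of that aggregation shape, with hypothetical stand-ins for the PR's types; sorting by server id after each insertion is one way to make the priority deterministic instead of last-responder-wins:

use indexmap::IndexMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct LanguageServerId(u64);

#[derive(Debug, Default)]
struct BufferSemanticTokens; // decoded tokens, result_id, version, ...

#[derive(Default)]
struct AggregatedTokens {
    by_server: IndexMap<LanguageServerId, BufferSemanticTokens>,
}

impl AggregatedTokens {
    /// Record one server's response, then sort by server id so iteration
    /// order (and thus which server "wins" on overlaps) does not depend on
    /// response timing.
    fn insert(&mut self, server: LanguageServerId, tokens: BufferSemanticTokens) {
        self.by_server.insert(server, tokens);
        self.by_server.sort_keys();
    }
}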

I still need the update_semantic_tokens task in the editor so that I can aggregate all the buffer changes for multibuffers (as LspStore is only concerned with individual buffers). This task is not debounced at all, as it relies on the lsp store debouncing.

Additionally, latest_lsp_data tripped me up a bit, as it clears the previous lsp data every time the buffer changes. I changed this to keep semantic tokens around when updating the BufferLspData, but I'm not sure if that makes sense.

And next, I'm going to work on configs. I haven't started that yet.

@reflectronic (Member) commented

Just leaving my two cents: I hope that we are not locking ourselves out of implementing textDocument/semanticTokens/range support in the future. In particular, I know that the C# language server does not support the delta method at this time (dotnet/roslyn#70536). My understanding is that delta performs poorly with large files, because it still requires computing semantic analysis for the entire file on each edit. The range request allows the computations to be limited to the visible section of the document (e.g. if semantic analysis is implemented in a lazy manner, like it is in the C# compiler).

@SomeoneToIgnore (Contributor) left a comment

Superb, the new tests and most of the LspStore-related changes look solid; we're definitely on the right track.

One small concern is the text.rs-related changes; it feels like we can inline all of this into the highlights-related code?

Additionally, latest_lsp_data tripped me up a bit, as it clears the previous lsp data every time the buffer changes. I changed this to keep semantic tokens around when updating the BufferLspData, but I'm not sure if that makes sense.

This seems somewhat odd; I've commented on the potentially hacky place to discuss it more.
Just to note, there's current_lsp_data to look up what's stored inside without invalidating, so that should help with whatever issues remain.

And next, I'm going to work on configs. I haven't started that yet.

One small note here (given the C# language server comment above): maybe we should disable semantic highlights by default until we support range requests.
If we add the docs and that editor menu item I've mentioned before, that should be discoverable enough for the time being.

};

let buffer_id = buffer.read(cx).remote_id();
// `semantic_tokens` gets debounced, so we can call this frequently.
@SomeoneToIgnore (Contributor):

I have descended into lsp_store.rs and did not notice any new .timer( call; can you point me to the debounce?

@macmv (Author):

Oh, yeah, I didn't add the .timer() call. I'll put it in the LSP store.
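
For illustration, the debounce shape in question replaces any pending task on each call, so only the last edit in a burst actually issues a request. A standalone sketch using tokio (Zed's LspStore would use a GPUI Task and executor timers instead; names here are assumptions):

use std::time::Duration;

struct Debouncer {
    pending: Option<tokio::task::JoinHandle<()>>,
    delay: Duration,
}

impl Debouncer {
    /// Schedule `work` after `delay`, cancelling any previously pending run.
    fn call(&mut self, work: impl FnOnce() + Send + 'static) {
        if let Some(task) = self.pending.take() {
            task.abort(); // the earlier edit's request never fires
        }
        let delay = self.delay;
        self.pending = Some(tokio::spawn(async move {
            tokio::time::sleep(delay).await;
            work();
        }));
    }
}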


false
}

pub fn offset_to_version(&self, offset: usize, version: &clock::Global) -> usize {
@SomeoneToIgnore (Contributor):

This one is not really used anywhere but in tests?

range_from_version is used only once in the highlights-related places; can we have a function there instead of adding any public API here?

Those .unwrap() calls look somewhat concerning too — let's use Option and the like instead.

@macmv (Author):

Yeah sure thing, I'll move it. I think it started out needing private APIs from BufferSnapshot, but it doesn't now. I'll move it up to where highlights are built.

    .entry(buffer_id)
    .or_insert_with(|| BufferLspData::new(buffer, cx));
if buffer_version.changed_since(&lsp_data.buffer_version) {
    // Semantic tokens survive buffer version changes.
@SomeoneToIgnore (Contributor):

That seems odd and somewhat hacky: let's discuss things more.

Presumably, all this is needed to avoid flickering?
If so, can we do things similarly to inlays/document colors, where we invalidate data & tasks in LspStore as soon as a version mismatch is found, yet keep the old data in the Editor for rendering and elsewhere, and only update it after the next fetch task gets back the results?

Or is it so that we're able to get proper deltas for changed documents?
Then it's not hacky at all, just not obvious — let's mention that specifically instead.

@macmv (Author):

Yes, this is for deltas to work at all. It uses the result_id of the semantic tokens to send the delta request. Even if I only use current_lsp_data, other calls to latest_lsp_data (like in document colors) remove the previous result_id, and we don't end up sending delta requests.

I'll specifically call that out in a comment.
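
For illustration, the bookkeeping being discussed looks roughly like this (hypothetical names, not the PR's types): the cached result_id has to survive buffer edits, otherwise every edit degrades the next request from a delta to a full fetch.

struct SemanticTokensCache {
    /// `result_id` from the last full or delta response; the server diffs
    /// against this when we send semanticTokens/full/delta.
    result_id: Option<String>,
    /// The flat token data that the next delta will be applied to.
    data: Vec<u32>,
}

enum TokensRequest {
    Full,
    Delta { previous_result_id: String },
}

impl SemanticTokensCache {
    fn next_request(&self) -> TokensRequest {
        match &self.result_id {
            Some(id) => TokensRequest::Delta { previous_result_id: id.clone() },
            // First request, or the server does not support deltas.
            None => TokensRequest::Full,
        }
    }
}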

@macmv (Author) commented Nov 5, 2025

not locking ourselves out of implementing textDocument/semanticTokens/range support in the future

I don't think this locks us out of supporting range requests, but for servers that don't support deltas, I feel like this requires an entirely different approach. I'd imagine we would want an editor to query for specific ranges, as the scroll position of the editor changes the tokens it sees. That also means that as the editor scrolls, it would need to request additional ranges and then merge the tokens between multiple queries for rendering. And then, when the editor is modified, all the tokens would just get discarded. I think there is very little overlap between the range logic and the delta logic that I've implemented.

Oh, and I think minimap rendering would need to be changed. If the minimap is open, requesting "only the visible range" ends up being a large portion of the file anyways.

I'm happy to disable semantic tokens until it's more complete. I don't want to bring language servers grinding to a halt with a new update. Alternatively, we could disable semantic tokens by default on servers where this would be a problem, and leave it enabled for other servers by default.

@macmv (Author) commented Nov 5, 2025

I added some configuration. I couldn't find a toggle for tree-sitter highlights, so I just added a semantic_tokens: bool to the language configuration. Then, within an editor, semantic token requests will only be made for languages that have the flag enabled. It defaults to false.

Additionally, in the editor controls, I added a toggle for semantic tokens within that editor:
[screenshot]

If that toggle is disabled, no semantic tokens will be rendered. If it is enabled, then semantic tokens will be rendered for any languages that have the semantic_tokens setting enabled.

I added a similar toggle to the debug syntax tree:
[screenshot]

This one looks a little bare, being the only setting, but I figured a toggle button would look odd.

Finally, I wired up refreshes to all the appropriate events, so now semantic tokens show up right when you load the editor.

@SomeoneToIgnore (Contributor) commented

I did not dive into the code too much since the last review, but in general this looks very close to a good first iteration of the feature, nice!

To double check: is token configuration like #42030 (and the one I've mentioned earlier, related to ra) something that will be a part of the configuration-related changes?

If not, what's the hard part there, given that we seem to have all the hardcoded values listed?

@macmv (Author) commented Nov 5, 2025

I wasn't planning on adding that, but then it turned out to be easier than I thought. I just got it working with hardcoded colors (i.e., you can specify an RGB color in the config). It works quite nicely with the example you posted:

[screenshot]

The above is with the following config:

  "lsp_semantic_token_customizations": {
    "enabled": true,
    "rules": [
      {
        "token_type": "unresolvedReference",
        "foreground_color": "#c93f3f"
      }
    ]
  },

Each rule has the following properties:

  • token_type: The LSP semantic token type to customize.
  • foreground_color: The foreground color to use for the token type.
  • background_color: The background color to use for the token type.
  • underline: A color to underline the type with.
  • font_style: One of "normal", "bold", "italic", or "bold_italic".

I think I'll add a token_modifiers item to each rule, so you can match on those as well. But this doesn't hook into the current highlighting theme. I suppose I could add a theme key into those rules, which would be used when colors aren't specified? I'm not quite sure what this should look like.
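
A minimal sketch of how such a rule might resolve, with hypothetical types standing in for the PR's settings structs (Zed's real HighlightStyle lives in gpui and differs in detail):

// Mirrors the rule shape described above (all names are assumptions).
struct SemanticTokenRule {
    token_type: String,
    foreground_color: Option<String>, // e.g. "#c93f3f"
    background_color: Option<String>,
    underline: Option<String>,
    font_style: Option<FontStyle>,
}

#[derive(Clone, Copy)]
enum FontStyle {
    Normal,
    Bold,
    Italic,
    BoldItalic,
}

/// The PR matches token names as complete strings (unlike VSCode's pattern
/// matching), so rule lookup is a plain equality scan in rule order.
fn rule_for<'a>(
    rules: &'a [SemanticTokenRule],
    token_type: &str,
) -> Option<&'a SemanticTokenRule> {
    rules.iter().find(|rule| rule.token_type == token_type)
}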

@SomeoneToIgnore (Contributor) commented

Wow, that's quite the first step for the tokens feature, I'd say.

I wonder if lsp_semantic_token_customizations could be semantic_tokens under a "global_lsp_settings" key?
Otherwise, the proposal looks good, same as the custom theme key; as an alternative (not insisting), we could use null as a special "nothing, fall back to theme" value?

I would also double-check what VSCode does: we have the vscode_import.rs module, where it should hopefully not be too hard to do the same for the tokens' settings.
We do not have to support the whole set of those (I would be curious to know the discrepancies, though), just what we have already.

The best part would be to document this in both the JSON and Markdown docs — I can help with that if needed, same as with the RPC part.
If any tests suddenly start to fail, I can help there too: I think it should be straightforward, except for maybe needing to write

cx.executor().advance_clock(Duration::from_secs(LSP_QUERY_TIMEOUT));
cx.run_until_parked();

in places to ensure the tokens are queried for.

I'll wait a week or so, and also for @Veykril's review, as we're getting close and it would be great to see if we've missed anything and to confirm the direction.

@macmv (Author) commented Nov 5, 2025

Thanks! I'm pretty happy with it.

Oh, yes, that's a better name.

I don't quite understand the null fallback for theme. I was suggesting adding a theme key to each rule, which would be something like ["symbol", "type.parameter"]. Each of those keys would get passed to SyntaxTheme::get_opt, and the first matching HighlightStyle is what gets used. Maybe that should be called style_key or something?

Locally, I've got VSCode imports wired up for editor.semanticTokenColorCustomizations. That doesn't cover all the builtin tokens though, as I think we'd like some reasonable defaults. I think the rule matching in VSCode is done with a regex, whereas I just opted to match token names as complete strings. Aside from that, the rules I've defined support more features than VSCode's. For example, these rules allow changing the underline and strikethrough colors.

To make this easy to use, can we just make setting the semantic_tokens field extend the rules from default.json? I worry that we need some sort of default mapping from semantic tokens to highlight themes, with some ways for users to change it. But I don't want users to need to copy that entire mapping over to modify it.

Oh, that's good to know. I think the only thing that broke is that the editor now fetches semantic tokens when it is first created, so I had to change the tests slightly.

Yup, I'll add it to the markdown docs. I just forgot to add those.

@SomeoneToIgnore (Contributor) commented

Sorry, I have no good knowledge of themes.
Overall, whatever allows minimal changes for both users and developers seems great, so the approach seems rightly aligned.
