
@eliasnaur
Contributor

This change removes the allocation for Go idioms that temporarily convert
a byte slice to a string, for example `string(slice) == "some string"`.
In particular, `bytes.Equal` no longer allocates two strings.
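
As a rough illustration of the idiom in question, the sketch below compares a byte slice against a constant through a temporary string conversion and measures heap allocations with `testing.AllocsPerRun`. The function name `isGet` and the sample data are hypothetical, and the allocation count it prints depends on the compiler and optimization level used, so no particular number is claimed here.

```go
package main

import (
	"fmt"
	"testing"
)

// isGet reports whether b holds the bytes "GET". The string(b) conversion
// is only needed for the comparison and never escapes the function.
// (isGet is a hypothetical helper, not code from this pull request.)
func isGet(b []byte) bool {
	return string(b) == "GET"
}

func main() {
	buf := []byte("GET")
	// AllocsPerRun reports the average number of heap allocations per call.
	allocs := testing.AllocsPerRun(1000, func() {
		if !isGet(buf) {
			panic("unexpected mismatch")
		}
	})
	fmt.Println("match:", isGet(buf), "allocs per run:", allocs)
}
```

Running the same program with different compilers or optimization levels is one way to see whether the temporary string is actually allocated.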
@eliasnaur eliasnaur marked this pull request as draft October 10, 2025 12:05
Member

@aykevl aykevl left a comment


It's hard for me to confirm that this is indeed correct; I find it hard to understand what's happening here. Perhaps some refactoring would help? Or can you give a slightly higher-level overview of what this optimization does exactly (and why it is correct)?

For example, can you help me understand how the following code skips this optimization?

slice := []byte("foo")
str := string(slice)
if true { // new basic block
    slice[0] = 'A'
    return str == "foo"
}
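
For reference, the snippet above can be wrapped into a runnable program to show the language semantics that any such optimization has to preserve. This is only a sketch: the function name `mutateThenCompare` is hypothetical, and the `if true` block is dropped because it does not affect the result in plain Go.

```go
package main

import "fmt"

// mutateThenCompare converts a byte slice to a string, mutates the slice,
// and then compares the string. Go strings are immutable, so the
// conversion must behave like a copy of the bytes.
func mutateThenCompare() bool {
	slice := []byte("foo")
	str := string(slice) // semantically a copy of the bytes
	slice[0] = 'A'       // must not be visible through str
	return str == "foo"
}

func main() {
	fmt.Println(mutateThenCompare()) // prints "true"
}
```

An optimization that let `str` alias the slice's backing array here would make the comparison see "Aoo" and return false, which is why this case has to be excluded.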


It would be really nice to have some tests that are easier to read than LLVM IR. Could you add some tests like transform/testdata/allocs2.go? (This will probably require some refactoring to not duplicate code). This will also show that the optimization works for real Go code, instead of handcrafted IR (that might go out of date).

Comment on lines +194 to +197
stringEqual := mod.NamedFunction("runtime.stringEqual")
if stringEqual.IsNil() {
    return
}
Member

@aykevl aykevl Oct 10, 2025


This could be generalized to any function:

  • That has a function attribute like memory(argmem: read), memory(read), etc (in other words, doesn't modify memory). This should be automatically deduced for runtime.stringEqual but if not we can add it to compiler/symbol.go.
  • Where the string pointer parameter has the parameter attribute nocapture (meaning the pointer parameter is not kept across the function call).

For details, see: https://llvm.org/docs/LangRef.html#fnattrs and https://llvm.org/docs/LangRef.html#paramattrs.
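
The two conditions in the list above can be modeled as a small predicate. The sketch below is a toy model in plain Go, not code against the LLVM Go bindings: the `callee` type, its fields, and `canBorrowBytes` are all hypothetical, and the attribute values in `main` are made up for illustration rather than read from a real module.

```go
package main

import "fmt"

// callee is a hypothetical, simplified description of a called function.
// It is not part of any LLVM binding; it only models the two attributes
// mentioned in the list above.
type callee struct {
	name          string
	readsOnly     bool   // models memory(read) or memory(argmem: read)
	noCaptureArgs []bool // models nocapture, one entry per parameter
}

// canBorrowBytes reports whether parameter argIndex of c satisfies both
// conditions: the callee does not modify memory, and the pointer
// parameter is not kept across the call.
func canBorrowBytes(c callee, argIndex int) bool {
	if !c.readsOnly {
		return false // callee may write through the pointer
	}
	if argIndex < 0 || argIndex >= len(c.noCaptureArgs) {
		return false // unknown parameter: be conservative
	}
	return c.noCaptureArgs[argIndex]
}

func main() {
	// Assumed attribute values, for illustration only.
	eq := callee{name: "runtime.stringEqual", readsOnly: true, noCaptureArgs: []bool{true, true}}
	keep := callee{name: "example.keep", readsOnly: true, noCaptureArgs: []bool{false}}
	fmt.Println(canBorrowBytes(eq, 0), canBorrowBytes(keep, 0)) // prints "true false"
}
```

The model covers only the callee-side conditions; it says nothing about whether the slice is modified between the conversion and the call.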

inst = llvm.NextInstruction(inst)
if inst.IsNil() {
    // There are uses beyond this basic block.
    continue uses

I find this to be hard to read. Can you refactor this code to avoid the continue uses? Perhaps by moving some of it to a separate function?
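
A generic sketch of the kind of refactoring suggested here: the inner loop that would trigger a labeled `continue` is moved into a helper that returns a bool. The code is unrelated to the actual pass; the names `keepLabeled`, `keepHelper`, and `allWithinLimit` and the integer data are hypothetical stand-ins.

```go
package main

import "fmt"

// allWithinLimit reports whether every value is at most limit. It stands
// in for a check such as "every use stays inside the basic block".
func allWithinLimit(values []int, limit int) bool {
	for _, v := range values {
		if v > limit {
			return false // takes the place of `continue outer` in the caller
		}
	}
	return true
}

// keepLabeled filters groups using a labeled continue.
func keepLabeled(groups [][]int, limit int) [][]int {
	var kept [][]int
outer:
	for _, g := range groups {
		for _, v := range g {
			if v > limit {
				continue outer
			}
		}
		kept = append(kept, g)
	}
	return kept
}

// keepHelper does the same with the inner loop moved into a function.
func keepHelper(groups [][]int, limit int) [][]int {
	var kept [][]int
	for _, g := range groups {
		if allWithinLimit(g, limit) {
			kept = append(kept, g)
		}
	}
	return kept
}

func main() {
	groups := [][]int{{1, 2}, {3, 9}, {4}}
	fmt.Println(keepLabeled(groups, 5), keepHelper(groups, 5))
}
```

Both functions return the same result; the helper version gives the rejection condition a name and keeps the outer loop flat.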

@eliasnaur
Contributor Author

Thanks for the review, but I think this PR is redundant; -opt 2 already seems to optimize safe string conversions.

I confused myself through an unfortunate combination of forgetting -opt 2 in desktop test runs and having this guy heap-allocated on my board because MaxStackSize was too low.

@eliasnaur eliasnaur closed this Oct 10, 2025
@aykevl
Member

aykevl commented Oct 10, 2025

It does optimize stuff in at least some cases:

 308256  308024   -232  -0.08%   15924   15924      0   0.00% tinygo build -size short -o ./build/test.hex -target=nano-rp2040 -stack-size 8kb ./examples/net/websocket/dial/
 338104  337912   -192  -0.06%   21740   21740      0   0.00% tinygo build -size short -o ./build/test.hex -target=wioterminal -stack-size 8kb ./examples/net/mqttclient/paho/
 347024  346856   -168  -0.05%   16764   16764      0   0.00% tinygo build -size short -o ./build/test.hex -target=pyportal -stack-size 8kb ./examples/net/http-get/
 388388  388228   -160  -0.04%   18772   18772      0   0.00% tinygo build -size short -o ./build/test.hex -target=matrixportal-m4 -stack-size 8kb ./examples/net/webstatic/
 337948  337788   -160  -0.05%   21816   21816      0   0.00% tinygo build -size short -o ./build/test.hex -target=wioterminal -stack-size 8kb ./examples/net/webserver/
 166268  166180    -88  -0.05%    9992    9992      0   0.00% tinygo build -size short -o ./build/test.hex -target=arduino-mkrwifi1010 -stack-size 8kb ./examples/net/tlsclient/
 174580  174516    -64  -0.04%   13840   13840      0   0.00% tinygo build -size short -o ./build/test.hex -target=elecrow-rp2040 -stack-size 8kb ./examples/net/tlsclient/
 122152  122104    -48  -0.04%    8068    8068      0   0.00% tinygo build -size short -o ./build/test.hex -target=nucleo-wl55jc ./examples/lora/lorawan/atcmd/
  67000   66960    -40  -0.06%    9020    9020      0   0.00% tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/flash/console/qspi
  70144   70128    -16  -0.02%    6980    6980      0   0.00% tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/flash/console/spi
  68256   68248     -8  -0.01%    6504    6504      0   0.00% tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/uart/main.go
 263868  263860     -8  -0.00%   46772   46772      0   0.00% tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/slideshow
 103536  103528     -8  -0.01%   10000   10000      0   0.00% tinygo build -size short -o ./build/test.hex -target=metro-m4-airlift -stack-size 8kb ./examples/net/socket/
  21760   21756     -4  -0.02%    3556    3556      0   0.00% tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/time/main.go
  18472   18472      0   0.00%    6236    6236      0   0.00% tinygo build -size short -o ./build/test.hex -target=feather-rp2040 ./examples/adafruit4650
  61580   61580      0   0.00%    6180    6180      0   0.00% tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adt7410/main.go
   8748    8748      0   0.00%    4748    4748      0   0.00% tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adxl345/main.go
  13556   13556      0   0.00%    6796    6796      0   0.00% tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/amg88xx
   8912    8912      0   0.00%    4748    4748      0   0.00% tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/main.go
  11868   11868      0   0.00%    6580    6580      0   0.00% tinygo build -size short -o ./build/test.hex -target=nano-33-ble ./examples/apds9960/proximity/main.go
   9172    9172      0   0.00%    4752    4752      0   0.00% tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/itsybitsy-m0/main.go
   7488    7488      0   0.00%    2320    2320      0   0.00% tinygo build -size short -o ./build/test.hex -target=microbit ./examples/at24cx/main.go
   7996    7996      0   0.00%    4740    4740      0   0.00% tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bh1750/main.go

I can't say whether these results are correct, but they suggest there is something here that could be optimized.

@eliasnaur
Contributor Author

eliasnaur commented Oct 10, 2025

All those builds run with the default -opt z, not -opt 2, right? It seems to me that it's better to add the LLVM optimization that elides allocations to -opt z.

@aykevl
Member

aykevl commented Oct 11, 2025

> All those builds run with the default -opt z, not -opt 2, right? It seems to me that it's better to add the LLVM optimization that elides allocations to -opt z.

Not sure what you mean by that?
The main difference of -opt=z compared to -opt=2 is how likely the compiler is to inline stuff. We don't intentionally skip any optimizations at -opt=z apart from that.

@eliasnaur
Contributor Author

eliasnaur commented Oct 11, 2025

I mean that, according to my experiments, this optimization has no additional effect when I build with -opt 2. You say the main difference between the optimization levels is inlining. So it seems to me that, say, adding //go:inline to runtime.stringFromBytes should have the same effect as this PR?

@aykevl
Member

aykevl commented Oct 11, 2025

Yes, that could be it. Can't say for sure. But it's often the case that more inlining enables more compiler optimizations, since most compiler optimizations work on a per-function basis and not across functions. (In fact, the main goal of inlining is to enable more of these optimizations; the reduced call overhead is just an added benefit.)
