Skip to content

pkg/discovery: systemd discoverer with less allocs#1426

Merged
javierhonduco merged 2 commits intoparca-dev:mainfrom
marselester:systemd-perf
Mar 14, 2023
Merged

pkg/discovery: systemd discoverer with less allocs#1426
javierhonduco merged 2 commits intoparca-dev:mainfrom
marselester:systemd-perf

Conversation

@marselester
Copy link
Contributor

@marselester marselester commented Mar 8, 2023

Hi! I would like to share some of the results I got with https://github.com/marselester/systemd. There are still things I need to finish though...

The current version of discovery.SystemdDiscoverer makes use of 1.1 MB/op and 32,332 allocs/op to get a list of systemd services along with their PIDs. That accounts for 40% of CPU cycles in the agent, excluding the GC time, see https://pprof.me/e711789. The proposed change gets 13.96 KB/op and 30 allocs/op.

BenchmarkSystemd-2    	       1	1001314916 ns/op	 1163552 B/op	   32332 allocs/op
BenchmarkSystemd2-2   	       1	1003780290 ns/op	   14304 B/op	      30 allocs/op

I ran the benchmarks as follows:

  • temporarily removed pkg/discovery/kubernetes.go or else tests won't run github.com/parca-dev/parca-agent/pkg/cgroup: build constraints exclude all Go files in /vagrant/parca-agent/pkg/cgroup
  • placed the benchmarks into pkg/discovery/systemd_test.go
  • sudo go test -run=^$ -bench=. -benchmem .
Details
// The benchmarks are not pretty, but at least they give a rough idea.

func BenchmarkSystemd(b *testing.B) {
	ctx := context.Background()
	conf := NewSystemdConfig()
	d, err := conf.NewDiscoverer(DiscovererOptions{})
	if err != nil {
		b.Fatal(err)
	}

	up := make(chan<- []*Group)
	ctx, cancel := context.WithTimeout(ctx, time.Second)
	b.Cleanup(cancel)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err = d.Run(ctx, up); err != nil && err != context.DeadlineExceeded {
			b.Fatal(err)
		}
	}
}

func BenchmarkSystemd2(b *testing.B) {
	ctx := context.Background()
	conf := NewSystemd2Config()
	d, err := conf.NewDiscoverer(DiscovererOptions{})
	if err != nil {
		b.Fatal(err)
	}

	up := make(chan<- []*Group)
	ctx, cancel := context.WithTimeout(ctx, time.Second)
	b.Cleanup(cancel)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err = d.Run(ctx, up); err != nil && err != context.DeadlineExceeded {
			b.Fatal(err)
		}
	}
}

I've added fmt.Printf to see whether the current implementation and a new one show the same services/PIDs.
I also tried to stop and start a cron service to make sure that was picked up in the updates.
The output matches.

It would be cool if you guys could have a look and give it a go in a safe environment.

$ make && sudo ./dist/parca-agent 2>&1 | grep --invert-match 'libbpf'
systemd-timesyncd.service 459
[email protected] 1468
systemd-networkd.service 607
unattended-upgrades.service 748
systemd-logind.service 675
dbus.service 663
systemd-journald.service 364
systemd-udevd.service 405
irqbalance.service 668
snapd.service 673
udisks2.service 676
cron.service 36684
polkit.service 32180
systemd-resolved.service 609
packagekit.service 15442
multipathd.service 402
networkd-dispatcher.service 669
[email protected] 686
[email protected] 683
ModemManager.service 32184
ssh.service 711
rsyslog.service 671

@marselester marselester force-pushed the systemd-perf branch 2 times, most recently from a3f9b52 to 9c256cd Compare March 10, 2023 21:47
@marselester marselester marked this pull request as ready for review March 10, 2023 22:14
@marselester marselester requested a review from a team as a code owner March 10, 2023 22:14
@marselester
Copy link
Contributor Author

I've finished working on the systemd package and updated the PR, please review.

The current version makes use of 1163552 B/op, 32332 allocs/op
to get a list of systemd services along with their PIDs.
That accounts for 40% of CPU cycles, excluding the GC time.
The proposed change gets 14304 B/op, 30 allocs/op.
Copy link
Contributor

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ❤️ 💚 💙 💛 💜

I need more time to read the code in https://github.com/marselester/systemd

However, it seems like we have the parity and the benchmark numbers look great 🚀

@javierhonduco
Copy link
Contributor

Looks amazing! Thanks so much for the very thorough work here.

There are still things I need to finish though...

I assume this comment is not up to date, right? :D

Happy to land this change and test it in our machines and in our Demo environment 😄

@brancz
Copy link
Member

brancz commented Mar 14, 2023

Since the scope is dramatically reduced, would it make sense to move the systemd package into the parca-agent repo? Happy to discuss that as a follow up though ...

@marselester
Copy link
Contributor Author

I assume this comment is not up to date, right? :D

@javierhonduco that's right :)

@marselester
Copy link
Contributor Author

@brancz sure, that can be done. It would be good to keep README.md and bench-old.txt within a package to retain a context. Another option is to fork the repository.

@javierhonduco
Copy link
Contributor

Let's merge this to include it in the release we are cutting today and we can fork/move the code later on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants