Conversation
Leeaandrob left a comment
PR #191 Deep Review — @ukpratik
Hey @ukpratik, thanks for identifying a real concurrency bug in the logger. The original code had an unprotected read of logger.file, which is a data race when DisableFileLogging() is called concurrently. Your fix moves in the right direction, but it introduces a new race condition in place of the original.
Roadmap Context
This fix indirectly relates to Roadmap Item #346 ("Prevent memory bloat", priority: MEDIUM) — goroutine contention on a broken logger can cause goroutine pileup and memory bloat under high concurrency.
No conflict with any In Progress roadmap items.
Summary of Findings
CRITICAL (Must Fix)
- C1: Use-after-close race condition (TOCTOU) — DisableFileLogging() can close the file between the pointer snapshot and the WriteString call (see inline comment on line 122)
HIGH (Should Fix)
- H1: No concurrent test — the fix claims to solve a concurrency bug but adds zero tests exercising concurrent paths
- H2: The writeMu adds complexity when extending the mu.RLock() scope would be simpler
MEDIUM (Pre-existing, not introduced by this PR)
- M1: currentLevel data race at logMessage() line 101 — it reads currentLevel without holding mu.RLock(), but SetLevel() writes it under mu.Lock(). This is technically a data race per Go's memory model. Since you're improving the concurrency model here, consider fixing it too by acquiring mu.RLock() at the top of logMessage() and holding it through the file write — this kills both the TOCTOU race (C1) and the currentLevel race in one shot.
POSITIVE
- Correct problem identification — the original logger.file access was indeed unprotected
- Error handling added for WriteString failure (previously silent)
- Small, focused change — only 1 file modified
Verdict: REQUEST CHANGES
What needs to change before merge:
- Fix the use-after-close race by holding mu.RLock() through the entire write operation (see C1 inline comment for the exact fix)
- Add a concurrent test with go test -race that validates multiple goroutines can log simultaneously while EnableFileLogging/DisableFileLogging is called
- Consider simplifying the two-mutex design
Estimated effort: ~1-2 hours. The fix is straightforward once the race is understood. Happy to re-review.
pkg/logger/logger.go
Outdated
```go
if logger.file != nil {
	mu.RLock()
	fileptr := logger.file
	mu.RUnlock()
```
[C1 — CRITICAL: Use-After-Close Race (TOCTOU)] @ukpratik — This is the core problem with the fix. By calling mu.RUnlock() here, before the actual WriteString call (line 128), you create a Time-of-Check-Time-of-Use race:
Race Timeline:
1. Goroutine A (logging): mu.RLock() -> fileptr = logger.file (non-nil) -> mu.RUnlock()
2. Goroutine B (disable): mu.Lock() -> logger.file.Close() -> logger.file = nil -> mu.Unlock()
3. Goroutine A (logging): writeMu.Lock() -> fileptr.WriteString(...) -> WRITES TO CLOSED FD
On POSIX systems, the file descriptor number can be recycled after Close(), and with a raw fd a late write could land in a completely different file. Go's *os.File guards against that specific hazard internally (a write on a closed *os.File returns os.ErrClosed rather than touching a recycled fd), so the failure mode here is data loss: the error check on line 130 catches os.ErrClosed, but the log entry is silently dropped. A use-after-close that loses log lines is still a bug worth fixing properly.
The correct fix — hold mu.RLock() through the entire write:
```go
mu.RLock()
if logger.file != nil {
	jsonData, err := json.Marshal(entry)
	if err == nil {
		logger.writeMu.Lock()
		_, err = logger.file.WriteString(string(jsonData) + "\n")
		logger.writeMu.Unlock()
		if err != nil {
			log.Println("Failed to write to file:", err.Error())
		}
	}
}
mu.RUnlock()
```
This way, DisableFileLogging() (which takes mu.Lock()) blocks until all in-flight writes complete. Since mu.RLock() allows multiple concurrent readers, log writes don't block each other — only Enable/Disable/SetLevel (rare operations) would wait.
Ref: This pattern follows Go's log.Logger.Output() from the standard library.
@Leeaandrob that's a nice observation, thanks for pointing it out.
I have fixed it now.
pkg/logger/logger.go
Outdated
```go
type Logger struct {
	file    *os.File
	writeMu sync.Mutex
```
[H2 — HIGH: Consider Simplifying] @ukpratik — With the correct fix (holding mu.RLock() through the write), the question is: do we still need writeMu?
With mu.RLock() held during write, multiple goroutines can enter the write section simultaneously (RLock allows concurrent readers). Their WriteString calls could theoretically interleave.
In practice, Go's os.File.Write maps to a single write(2) syscall. On POSIX, individual write() calls to regular files are atomic for reasonable sizes. For typical JSON log lines (< 4KB), interleaving won't happen.
However, keeping writeMu as a belt-and-suspenders measure is reasonable for portability (Windows doesn't guarantee atomic writes). If you keep it, please document why:
```go
type Logger struct {
	file    *os.File
	writeMu sync.Mutex // serializes writes to prevent interleaving on non-POSIX systems
}
```
@Leeaandrob Thanks for this.
I have now added a comment explaining writeMu. I think we should keep this mutex: writeMu and mu serve different purposes, and file writes are not always atomic, which might introduce issues in the future.
pkg/logger/logger.go
Outdated
```diff
  if err == nil {
-     logger.file.WriteString(string(jsonData) + "\n")
+     logger.writeMu.Lock()
+     _, err = fileptr.WriteString(string(jsonData) + "\n")
```
[Positive] @ukpratik — Good addition: capturing the return value of WriteString and logging the error. The original code completely ignored write failures, which could silently lose log data. This is a genuine improvement regardless of the concurrency issues.
pkg/logger/logger.go
Outdated
```diff
  jsonData, err := json.Marshal(entry)
  if err == nil {
-     logger.file.WriteString(string(jsonData) + "\n")
+     logger.writeMu.Lock()
```
[H1 — HIGH: No Concurrent Test] @ukpratik — This PR fixes a concurrency bug but adds zero tests. The existing test suite (logger_test.go) has no concurrent tests at all — no goroutines, no race detector validation, no file logging tests.
A minimal test that would catch the race:
```go
func TestConcurrentFileLogging(t *testing.T) {
	dir := t.TempDir()
	logFile := filepath.Join(dir, "test.log")
	if err := EnableFileLogging(logFile); err != nil {
		t.Fatal(err)
	}
	defer DisableFileLogging()

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			InfoCF("test", fmt.Sprintf("msg-%d", n), map[string]interface{}{"n": n})
		}(i)
	}
	wg.Wait()

	data, err := os.ReadFile(logFile)
	if err != nil {
		t.Fatal(err)
	}
	lines := strings.Split(strings.TrimSpace(string(data)), "\n")
	if len(lines) != 100 {
		t.Errorf("expected 100 log lines, got %d", len(lines))
	}
}
```
Run with go test -race ./pkg/logger/... to catch data races under the detector.
@Leeaandrob I have added this test. While trying to add -race to the Makefile test target, I found a few other tests were breaking, so I created a new make test-race target for the logger only for now; later we can extend it to all tests. Please check that part and let me know what you think.
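A sketch of what that split might look like in the Makefile (target names taken from the comment above; the broader test command is an assumption about the repo's layout):

```makefile
# Run the full suite without the race detector (some tests currently break under -race)
test:
	go test ./...

# Logger-only race-detector run, to be extended to all packages later
test-race:
	go test -race ./pkg/logger/...
```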
Please re-submit using the PR template @ukpratik
Closing this; will raise another PR with all the fixes, using the PR template.
Added per-logger writeMu mutex to ensure safe concurrent file writes