
fix: minor fix in logger file writing#191

Closed
ukpratik wants to merge 0 commits into sipeed:main from ukpratik:main

Conversation

@ukpratik

Added per-logger writeMu mutex to ensure safe concurrent file writes

@ukpratik ukpratik changed the title minor fix in logger file writing fix: minor fix in logger file writing Feb 15, 2026
Collaborator

@Leeaandrob Leeaandrob left a comment


PR #191 Deep Review — @ukpratik

Hey @ukpratik, thanks for identifying a real concurrency bug in the logger. The original code had an unprotected read of logger.file which is a data race when DisableFileLogging() is called concurrently. Your fix moves in the right direction, but introduces a new race condition that's actually more dangerous than the original.


Roadmap Context

This fix indirectly relates to Roadmap Item #346 ("Prevent memory bloat", priority: MEDIUM) — goroutine contention on a broken logger can cause goroutine pileup and memory bloat under high concurrency.

No conflict with any In Progress roadmap items.


Summary of Findings

CRITICAL (Must Fix)

  • C1: Use-after-close race condition (TOCTOU) — DisableFileLogging() can close the file between the pointer snapshot and the WriteString call (see inline comment on line 122)

HIGH (Should Fix)

  • H1: No concurrent test — the fix claims to solve a concurrency bug but adds zero tests exercising concurrent paths
  • H2: The writeMu adds complexity when extending mu.RLock() scope would be simpler

MEDIUM (Pre-existing, not introduced by this PR)

  • M1: currentLevel data race at logMessage() line 101 — reads currentLevel without holding mu.RLock(), but SetLevel() writes it under mu.Lock(). This is technically a data race per Go's memory model. Since you're improving the concurrency model here, consider fixing it too by acquiring mu.RLock() at the top of logMessage() and holding it through the file write — this kills both the TOCTOU race (C1) and the currentLevel race in one shot.

POSITIVE

  1. Correct problem identification — the original logger.file access was indeed unprotected
  2. Error handling added for WriteString failure (previously silent)
  3. Small, focused change — only 1 file modified

Verdict: REQUEST CHANGES

What needs to change before merge:

  • Fix the use-after-close race by holding mu.RLock() through the entire write operation (see C1 inline comment for the exact fix)
  • Add a concurrent test with go test -race that validates multiple goroutines can log simultaneously while EnableFileLogging/DisableFileLogging is called
  • Consider simplifying the two-mutex design

Estimated effort: ~1-2 hours. The fix is straightforward once the race is understood. Happy to re-review.

if logger.file != nil {
    mu.RLock()
    fileptr := logger.file
    mu.RUnlock()
Collaborator


[C1 — CRITICAL: Use-After-Close Race (TOCTOU)] @ukpratik — This is the core problem with the fix. By calling mu.RUnlock() here, before the actual WriteString call (line 128), you create a Time-of-Check-Time-of-Use race:

Race Timeline:

1. Goroutine A (logging):  mu.RLock() -> fileptr = logger.file (non-nil) -> mu.RUnlock()
2. Goroutine B (disable):  mu.Lock() -> logger.file.Close() -> logger.file = nil -> mu.Unlock()
3. Goroutine A (logging):  writeMu.Lock() -> fileptr.WriteString(...) -> WRITES TO CLOSED FD

On POSIX systems, a closed file descriptor number can be recycled by the next open(2). Go's *os.File mitigates the worst case: internal reference counting means a WriteString racing with Close either completes before the descriptor is actually released or fails with os.ErrClosed, rather than hitting a recycled descriptor. But the race is still real: the write fails and the log entry is silently lost.

The error check on line 130 would surface os.ErrClosed, but by then the log line has already been dropped.

The correct fix — hold mu.RLock() through the entire write:

mu.RLock()
if logger.file != nil {
    jsonData, err := json.Marshal(entry)
    if err == nil {
        logger.writeMu.Lock()
        _, err = logger.file.WriteString(string(jsonData) + "\n")
        logger.writeMu.Unlock()
        if err != nil {
            log.Println("Failed to write to file:", err.Error())
        }
    }
}
mu.RUnlock()

This way, DisableFileLogging() (which takes mu.Lock()) blocks until all in-flight writes complete. Since mu.RLock() allows multiple concurrent readers, log writes don't block each other — only Enable/Disable/SetLevel (rare operations) would wait.

Ref: This pattern follows Go's log.Logger.Output() from the standard library.

Author


@Leeaandrob that's a nice observation, thanks for pointing it out.
I have fixed it now.

type Logger struct {
    file    *os.File
    writeMu sync.Mutex
Collaborator


[H2 — HIGH: Consider Simplifying] @ukpratik — With the correct fix (holding mu.RLock() through the write), the question is: do we still need writeMu?

With mu.RLock() held during write, multiple goroutines can enter the write section simultaneously (RLock allows concurrent readers). Their WriteString calls could theoretically interleave.

In practice, Go's os.File.Write maps to a single write(2) syscall. On POSIX, individual write() calls to regular files are atomic for reasonable sizes. For typical JSON log lines (< 4KB), interleaving won't happen.

However, keeping writeMu as a belt-and-suspenders measure is reasonable for portability (Windows doesn't guarantee atomic writes). If you keep it, please document why:

type Logger struct {
    file    *os.File
    writeMu sync.Mutex // serializes writes to prevent interleaving on non-POSIX systems
}

Author


@Leeaandrob Thanks for this.
I have now added a comment explaining writeMu, and I think we should keep the mutex: writeMu and mu serve different purposes, and a file write is not always atomic, which could introduce issues in the future.

  if err == nil {
-     logger.file.WriteString(string(jsonData) + "\n")
+     logger.writeMu.Lock()
+     _, err = fileptr.WriteString(string(jsonData) + "\n")
Collaborator


[Positive] @ukpratik — Good addition: capturing the return value of WriteString and logging the error. The original code completely ignored write failures, which could silently lose log data. This is a genuine improvement regardless of the concurrency issues.

Author


thanks

  jsonData, err := json.Marshal(entry)
  if err == nil {
-     logger.file.WriteString(string(jsonData) + "\n")
+     logger.writeMu.Lock()
Collaborator


[H1 — HIGH: No Concurrent Test] @ukpratik — This PR fixes a concurrency bug but adds zero tests. The existing test suite (logger_test.go) has no concurrent tests at all — no goroutines, no race detector validation, no file logging tests.

A minimal test that would catch the race:

func TestConcurrentFileLogging(t *testing.T) {
    dir := t.TempDir()
    logFile := filepath.Join(dir, "test.log")
    if err := EnableFileLogging(logFile); err != nil {
        t.Fatal(err)
    }
    defer DisableFileLogging()

    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            InfoCF("test", fmt.Sprintf("msg-%d", n), map[string]interface{}{"n": n})
        }(i)
    }
    wg.Wait()

    data, err := os.ReadFile(logFile)
    if err != nil {
        t.Fatal(err)
    }
    lines := strings.Split(strings.TrimSpace(string(data)), "\n")
    if len(lines) != 100 {
        t.Errorf("expected 100 log lines, got %d", len(lines))
    }
}

Run with go test -race ./pkg/logger/... to catch data races under the detector.

Author


@Leeaandrob I have added this test. While adding -race to the Makefile test target, I found a few other tests were breaking, so for now I created a new make test-race target covering the logger only; later we can extend it to all tests. Please check that part and let me know what you think.

@Leeaandrob
Collaborator

Leeaandrob commented Feb 17, 2026

Please re-submit using the PR template @ukpratik

@ukpratik
Author

ukpratik commented Mar 1, 2026

Closing this; I will raise another PR with all the fixes, using the PR template.
