Skip to content

fix(openai_compat): clarify HTML response parse errors#1075

Merged
afjcjsbx merged 6 commits intosipeed:mainfrom
qs3c:fix/1068-html-response-error
Mar 7, 2026
Merged

fix(openai_compat): clarify HTML response parse errors#1075
afjcjsbx merged 6 commits intosipeed:mainfrom
qs3c:fix/1068-html-response-error

Conversation

@qs3c
Copy link
Contributor

@qs3c qs3c commented Mar 4, 2026

Description

Improve the error message when an OpenAI-compatible endpoint returns HTML instead of JSON with 200 OK.

Previously this surfaced only as a JSON unmarshal error such as invalid character ''<'', which made api_base or proxy misconfiguration harder to diagnose.

This change detects HTML-like responses and returns a more actionable error message with a short response preview. A regression test is included.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactoring

Related Issue

Closes #1068

Testing

go test ./pkg/providers/openai_compat -count=1

@afjcjsbx
Copy link
Collaborator

afjcjsbx commented Mar 4, 2026

I think this is an improved version of this PR: #1073, LGTM but there is a need to fix the lint please

@sipeed-bot sipeed-bot bot added type: bug Something isn't working domain: provider duplicate This issue or pull request already exists labels Mar 4, 2026
@qs3c
Copy link
Contributor Author

qs3c commented Mar 4, 2026

Fixed in a305c0a. The lint failure was the predeclared warning for max in responsePreview, which is now renamed to maxLen.

@qs3c
Copy link
Contributor Author

qs3c commented Mar 5, 2026

@afjcjsbx Could you help review this pr, please ~🙏

}

trimmed := strings.ToLower(strings.TrimSpace(string(body)))
return strings.HasPrefix(trimmed, "<!doctype html") ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the body contains a very large HTML page or a huge string, converting the whole byte array to string, doing TrimSpace and then ToLower on the whole body allocates a lot of memory unnecessarily, since you only care about the very first few characters. You can optimize by reading only the first N bytes ignoring the initial whitespace.

Copy link
Contributor Author

@qs3c qs3c Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9216cd1. looksLikeHTML no longer converts the whole body; it now inspects only a trimmed prefix (leadingTrimmedPrefix, 128 bytes) before lowercase/prefix checks. I also switched responsePreview to bytes.TrimSpace plus bounded string conversion to avoid large allocations.

return nil, fmt.Errorf("failed to read response: %w", err)
}

if resp.StatusCode != http.StatusOK {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if the server returns a 502 Bad Gateway (typical of nginx proxies) with an HTML page, the code stops there.

This means that if a proxy goes wrong by returning a 502 HTML, your new wrapResponseParseError wrapper is not invoked, because it only kicks in if the status code is 200 OK (which is rarely the case if the endpoint is broken and returns HTML).

How do you want to handle this? If the goal is to intercept HTML and show your error message even when there is an HTTP error (such as 404 Not Found, 502 Bad Gateway, 503 Service Unavailable, often accompanied by HTML), you may want to invoke your HTML check even within the resp.StatusCode != http.StatusOK block.

Copy link
Contributor Author

@qs3c qs3c Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed in 9216cd1. For non-200 responses we now run the same HTML diagnostic path via wrapHTTPResponseError(...), so 404/502/503 HTML pages also return the actionable api_base/proxy hint instead of only raw body output. Added TestProviderChat_HTMLErrorResponseReturnsHelpfulError (502 case).

}

return out, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we could simplify the code in this way:

    contentType := resp.Header.Get("Content-Type")

	// check if there is an HTTP error (caused by proxy or gateway) or if the response is HTML 
	if resp.StatusCode != http.StatusOK || strings.Contains(strings.ToLower(contentType), "text/html") {
		body, _ := io.ReadAll(resp.Body)
		return nil, wrapHTTPResponseError(resp.StatusCode, body, contentType, p.apiBase)
	}

	// directly pass the stream (resp.Body) to the JSON parser without loading everything into memory
	out, err := parseResponse(resp.Body)
	if err != nil {
		// Note: if it fails here, we do not have the full body in memory for HTML inspection, 
		// but having already checked the Content-Type above, the error is genuinely related to JSON parsing.
		return nil, fmt.Errorf("failed to parse JSON response: %w", err)
	}

	return out, nil
}

WDYT??

@qs3c
Copy link
Contributor Author

qs3c commented Mar 7, 2026

Thanks a lot for the super helpful suggestions, Professor Hubert!

Updated in 6eaa49f.

I've already tweaked this part of the code for better performance.

For non-200 status codes, I'm using io.LimitReader to read only a small prefix into memory instead of the whole body. For 200 responses, I switched to streaming JSON decoding instead of reading the entire body at once. I also use reader.Peek to inspect a small prefix and check whether the response looks like HTML.

I also tightened the tests to cover:

  • non-200 JSON errors not being reported as HTML
  • helpful HTML diagnostics for both success and error responses
  • a regression case ensuring successful responses are decoded in a streaming way

@qs3c
Copy link
Contributor Author

qs3c commented Mar 7, 2026

Fixed the lint failure in 53cba73. @afjcjsbx

@afjcjsbx afjcjsbx merged commit 440d665 into sipeed:main Mar 7, 2026
4 checks passed
dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 8, 2026
fix(openai_compat): clarify HTML response parse errors
fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026
fix(openai_compat): clarify HTML response parse errors
dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 14, 2026
fix(openai_compat): clarify HTML response parse errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: provider duplicate This issue or pull request already exists type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Unclear error message when API returns HTML instead of JSON

2 participants