Description
What is the feature?
It should be possible to exclude results based on the entire HTTP response, including headers, using a text search or a regex. A similar idea was suggested in issue #182, but no good use case was shown at that time.
What is the use case?
Some applications use their main page as a 404 page, and some of them rely on obfuscated JS to decide what gets rendered. In such cases, it can be hard to determine which pages actually exist.
However, the backend server may still know whether it is serving a real page or a fallback, and its response headers may differ as a result.
The server might send additional headers, omit some headers, change their values, or send them in a different order. This can be used to track how the backend treats each URL, even if the response body is the same.
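To illustrate the kinds of header differences listed above, here is a minimal Python sketch, not tied to this tool's code. The function name `diff_headers` and the sample headers (`X-Cache`, `Vary`, and their values) are made up for the example; it also assumes each header name appears only once per response.

```python
def diff_headers(baseline, candidate):
    """Compare two ordered lists of (name, value) header pairs.

    Simplification: repeated headers (e.g. several Set-Cookie lines)
    are collapsed to their last value.
    """
    base = {name.lower(): value for name, value in baseline}
    cand = {name.lower(): value for name, value in candidate}
    common = base.keys() & cand.keys()

    # Relative order of the headers that both responses share.
    base_order = [n.lower() for n, _ in baseline if n.lower() in common]
    cand_order = [n.lower() for n, _ in candidate if n.lower() in common]

    return {
        "added": sorted(cand.keys() - base.keys()),
        "missing": sorted(base.keys() - cand.keys()),
        "changed": sorted(n for n in common if base[n] != cand[n]),
        "reordered": base_order != cand_order,
    }


# Hypothetical headers for a known-real page and for a probed URL.
known_real = [("Content-Type", "text/html"), ("Etag", 'W/"a"'), ("X-Cache", "HIT")]
probe = [("Etag", 'W/"a;b'), ("Content-Type", "text/html"), ("Vary", "Accept")]

result = diff_headers(known_real, probe)
```

Any non-empty field in `result` would hint that the backend handled the probed URL differently from the known-real page.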
In one case I encountered, the backend used Etag headers for caching across different URLs, but the value was malformed for 404 pages:
- Real page: `Etag: W/"123-12345abcdef"`
- 404 page: `Etag: W/"123-12345abcdef;67890fedcba`

(two tags combined with `;` and missing the last `"`)
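As a rough sketch of what such a filter could look like (written in Python for illustration, not this tool's actual implementation), the regex below is matched against the whole raw response, so it can see headers as well as the body. The function name `should_exclude`, the pattern, and the `200 OK` status lines are assumptions for this example.

```python
import re


def should_exclude(raw_response: str, pattern: str) -> bool:
    """Return True if the pattern matches anywhere in the raw response
    (status line, headers, and body)."""
    flags = re.IGNORECASE | re.MULTILINE
    return re.search(pattern, raw_response, flags) is not None


# Assumed pattern: a weak ETag whose value contains ';' and never
# reaches a closing quote before the end of the header line.
MALFORMED_ETAG = r'^etag:\s*W/"[^"\r\n]*;[^"\r\n]*\r?$'

# Status codes are assumed; the bodies are identical on purpose.
real_page = (
    'HTTP/1.1 200 OK\r\n'
    'Etag: W/"123-12345abcdef"\r\n'
    '\r\n'
    '<html>app</html>'
)
fallback_page = (
    'HTTP/1.1 200 OK\r\n'
    'Etag: W/"123-12345abcdef;67890fedcba\r\n'
    '\r\n'
    '<html>app</html>'
)

exclude_real = should_exclude(real_page, MALFORMED_ETAG)
exclude_fallback = should_exclude(fallback_page, MALFORMED_ETAG)
```

With these inputs only the fallback response matches, so it would be dropped from the results even though both bodies are identical.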
This can be used not only to determine whether pages exist on such websites, but also to detect other differences on the backend, such as authorisation requirements or pages that expose additional functionality when requested with other HTTP methods or headers.