Description
Describe the bug
When the argument to the standard C isspace or isalpha functions from <ctype.h> is neither (a) a value representable by the type unsigned char, nor (b) the value of the macro EOF, the behaviour is undefined:
"In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined."
C11, Sec. 7.4 Character handling <ctype.h>, p. 200, clause 1
The effect of this undefined behaviour may be wrong answers from dereferencing negative array offsets into other random heap data, crashes from dereferencing negative array offsets into unmapped oblivion, or demons flying out of your nose.
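To make the domain rule concrete, here is a minimal sketch of a predicate that checks whether an int is a valid <ctype.h> argument. The name in_ctype_domain is hypothetical and is not part of jq or of the C library; it only restates the C11 wording quoted above.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if v is a valid argument for the
 * <ctype.h> functions, i.e. it equals EOF or is representable as an
 * unsigned char; returns 0 otherwise. */
static int in_ctype_domain(int v) {
  return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

On a platform where plain char is signed, a char holding the byte 0xC3 (the first byte of UTF-8 "é") typically promotes to the int -61, for which this predicate returns 0. The same byte read through an unsigned char is 195, which is in the domain.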
The function f_strptime may pass an arbitrary value of type char, which is controlled by user data input, into the <ctype.h> isspace function, leading to data-controlled undefined behaviour:
Line 1477 in e2ffb53
if (end == NULL || (*end != '\0' && !isspace(*end))) {
The function main may pass an arbitrary value of type char, which is controlled by user command-line input, into the <ctype.h> isalpha function, leading to command-line-controlled undefined behaviour:
Line 129 in e2ffb53
return text[0] == '-' && (text[1] == '-' || isalpha(text[1]));
To Reproduce
Data-controlled undefined behaviour:
echo '"20240711é"' | jq -r 'strptime("%Y%m%d")'
Command-line-controlled undefined behaviour:
jq -é
Expected behavior
jq should explicitly cast values of type char to unsigned char. This adjusts the values so that they meet the intended domain of the <ctype.h> functions (which are designed for use with functions like fgetc that return EOF-or-unsigned-char values via type int).
The other callers of <ctype.h> functions I found, in src/util.c, already cast values of type char to unsigned char, or work on the contents of unsigned char * arrays.
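The cast proposed in this section could look roughly like the sketch below. The two helper names (trailing_input_ok, looks_like_option) are hypothetical and only mirror the expressions quoted from lines 1477 and 129; the actual patch would more likely add the cast in place.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper mirroring the check at line 1477: the parse is
 * accepted only if `end` is non-NULL and points at the terminator or
 * at whitespace. The cast keeps the argument in the <ctype.h> domain. */
static int trailing_input_ok(const char *end) {
  return end != NULL && (*end == '\0' || isspace((unsigned char)*end));
}

/* Hypothetical helper mirroring the expression at line 129. */
static int looks_like_option(const char *text) {
  return text[0] == '-' &&
         (text[1] == '-' || isalpha((unsigned char)text[1]));
}
```

With the cast, a byte such as 0xC3 reaches isspace or isalpha as 195 rather than as a negative value, so the call is well defined regardless of whether plain char is signed.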
Environment (please complete the following information):
any
Additional context
The <ctype.h> API is confusing and almost everyone gets this wrong when trying to use it to process char * strings.
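An alternative to casting at every call site is to walk the string through an unsigned char pointer, so each element is already in the valid domain. The sketch below is illustrative only; count_leading_space is a hypothetical name, not a jq function.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts leading whitespace bytes in s.
 * Reading through `const unsigned char *` means every value passed
 * to isspace is in the range 0..UCHAR_MAX. The loop stops at the
 * terminating '\0', for which isspace returns 0. */
static size_t count_leading_space(const char *s) {
  const unsigned char *p = (const unsigned char *)s;
  size_t n = 0;
  while (isspace(p[n])) {
    n++;
  }
  return n;
}
```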