Undefined behaviour in use of <ctype.h> functions #3151

@riastradh


Describe the bug
When the argument to the standard C isspace or isalpha functions from <ctype.h> is neither (a) a value representable by the type unsigned char, nor (b) the value of the macro EOF, the behaviour is undefined:

In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

C11, Sec. 7.4 Character handling <ctype.h>, p. 200, clause 1

The effect of this undefined behaviour may be wrong answers from dereferencing negative array offsets into other random heap data, crashes from dereferencing negative array offsets into unmapped oblivion, or demons flying out of your nose.
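To make the out-of-domain case concrete, here is a minimal sketch of which int a `<ctype.h>` function receives with and without the cast. The helper names are hypothetical and not part of jq. The byte 0xC3 is the first byte of "é" encoded as UTF-8, which assumes UTF-8 input.

```c
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: the int that a bare isspace(c) call receives.
 * On platforms where char is signed, bytes >= 0x80 become negative. */
int ctype_arg_unsafe(char c)
{
    return c;
}

/* Hypothetical helper: the int that isspace((unsigned char)c) receives.
 * The result is always in the range 0..UCHAR_MAX. */
int ctype_arg_safe(char c)
{
    return (unsigned char)c;
}

/* Domain check following C11 7.4 clause 1: the value must be
 * representable as unsigned char or equal EOF. */
int ctype_arg_in_domain(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

With a signed 8-bit char, the unsafe conversion of the byte 0xC3 yields -61, which fails the domain check, while the cast yields 195, which passes.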

The function f_strptime may pass an arbitrary value of type char, which is controlled by user data input, into the <ctype.h> isspace function, leading to data-controlled undefined behaviour:

jq/src/builtin.c, line 1477 in e2ffb53:

    if (end == NULL || (*end != '\0' && !isspace(*end))) {
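One possible shape of the fix, sketched as a stand-alone function so that it compiles on its own. The name `has_trailing_garbage` is hypothetical, and `end` stands for the pointer examined in the condition above; only the cast is new.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-alone version of the check in f_strptime.
 * Returns nonzero when the parse should be rejected: either `end` is
 * NULL, or the first unparsed byte is neither NUL nor whitespace. */
int has_trailing_garbage(const char *end)
{
    /* The cast keeps the argument in the 0..UCHAR_MAX domain of isspace. */
    return end == NULL || (*end != '\0' && !isspace((unsigned char)*end));
}
```

In the default "C" locale, a leftover non-ASCII byte is then reported as trailing garbage instead of triggering undefined behaviour.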

The function main may pass an arbitrary value of type char, which is controlled by user command-line input, into the <ctype.h> isalpha function, leading to command-line-controlled undefined behaviour:

jq/src/main.c, line 129 in e2ffb53:

    return text[0] == '-' && (text[1] == '-' || isalpha(text[1]));
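The same cast applies here. Below is a minimal sketch with the expression wrapped in a function under the hypothetical name `looks_like_option`; the enclosing function in jq may be named and declared differently.

```c
#include <ctype.h>

/* Hypothetical wrapper around the expression quoted above.
 * Returns nonzero when `text` starts with "--" or with '-' followed
 * by an alphabetic character. */
int looks_like_option(const char *text)
{
    return text[0] == '-' &&
           (text[1] == '-' || isalpha((unsigned char)text[1]));
}
```

In the default "C" locale, an argument such as `-é` is then simply not treated as an option, because its second byte is not alphabetic there.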

To Reproduce

Data-controlled undefined behaviour:

echo '"20240711é"' | jq -r 'strptime("%Y%m%d")'

Command-line-controlled undefined behaviour:

jq -é

Expected behavior
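jq should explicitly cast values of type char to unsigned char before passing them to <ctype.h> functions. The cast maps every char value, including the negative ones that occur on platforms where char is signed, into the range 0 to UCHAR_MAX, which is the domain these functions accept. (They are designed for use with functions like fgetc, which return either EOF or an unsigned char value as an int.)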
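The contrast between the two input styles can be sketched as follows. Both function names are hypothetical illustrations, not jq code: the first consumes `fgetc` results, which are already in the accepted domain, and the second walks a `char *` string, where each element needs the cast.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace bytes read from a stream. fgetc returns either EOF
 * or an unsigned char value converted to int, so the result can be
 * passed to isspace without a cast. */
long count_spaces_stream(FILE *fp)
{
    long n = 0;
    int ch;   /* int rather than char, so EOF stays distinguishable */

    while ((ch = fgetc(fp)) != EOF)
        if (isspace(ch))
            n++;
    return n;
}

/* Count whitespace bytes in a NUL-terminated string. Elements have
 * type char, so each one is cast to unsigned char first. */
long count_spaces_string(const char *s)
{
    long n = 0;

    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s))
            n++;
    return n;
}
```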

The other callers of <ctype.h> functions I found, in src/util.c, already cast values of type char to unsigned char, or work on the contents of unsigned char * arrays.

Environment
any

Additional context
The <ctype.h> API is confusing and almost everyone gets this wrong when trying to use it to process char * strings.

Labels: UB (Fixes for undefined behaviours)