feat(sm4): fm4 funcs for query and inserting #33731

stephenkgu · 2025-11-27T02:56:49Z

Description

Please briefly describe the code changes in this pull request.

Jira: https://jira.taosdata.com:18080/browse/TD-

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

… into fix/TS-7235-sha2

gemini-code-assist · 2025-11-27T02:58:09Z

Summary of Changes

Hello @stephenkgu, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces several new cryptographic functions and libraries to the system, including SM4 and AES encryption, SHA-1 and SHA-2 hashing, and Base64 encoding/decoding. These additions enhance the system's data security capabilities and provide developers with more tools for data manipulation and protection.

Highlights

SM4 Encryption Functions: Introduces taosSm4Encrypt and taosSm4Decrypt functions for SM4 encryption and decryption, enhancing data security.
AES Encryption Library: Adds a new libaes library, providing AES encryption capabilities with ECB, CBC, and CTR modes.
SHA-1 and SHA-2 Hashing: Implements SHA-1 and SHA-2 hashing functions for data integrity and security purposes.
Base64 Encoding/Decoding: Adds tbase64_encode and tbase64_decode functions for Base64 encoding and decoding, facilitating data transmission and storage.
Function Management Integration: Integrates the new encryption and hashing functions into the function management system, making them available for use in queries and data manipulation.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant amount of new functionality, including SM4, AES, SHA, base64, and masking functions. The scope is much larger than what the title "feat(sm4): fm4 funcs for query and inserting" suggests. While the new features are valuable, there are several critical issues related to security and correctness, especially in the cryptography and encoding functions. For instance, some functions modify input key buffers in-place, which can lead to memory corruption. The new base64 decoding function does not handle invalid characters correctly. There are also several maintainability and performance issues, such as very long functions that should be refactored, and inefficient memory reallocation patterns. I've provided detailed comments on these issues.

gemini-code-assist · 2025-11-27T03:02:00Z

source/libs/crypt/src/crypt.c

+int32_t taosSm4Encrypt(uint8_t *key, int32_t keylen, uint8_t *pBuf, int32_t len) {
+  int32_t    count = 0;
+  SCryptOpts opts;
+
+  pkcs7_padding_pad_buffer(key, keylen, SM4_BLOCKLEN);
+  pkcs7_padding_pad_buffer(pBuf, len, SM4_BLOCKLEN);
+
+  opts.len = tsm4_encrypt_len(len);
+  opts.source = pBuf;
+  opts.unitLen = 16;
+  opts.result = pBuf;
+
+  memset(opts.key, 0, ENCRYPT_KEY_LEN + 1);
+  memcpy(opts.key, key, ENCRYPT_KEY_LEN);
+
+  count = CBC_Encrypt(&opts);
+
+  return count;
+}


The function taosSm4Encrypt modifies the input key buffer in-place by calling pkcs7_padding_pad_buffer. This is a dangerous practice as it can lead to a buffer overflow if the caller does not provide a buffer that is large enough to hold the padded key. It's also an unexpected side effect. To make the function safer, it should create a local copy of the key, pad that copy, and then use it for encryption. This same issue is also present in taosSm4Decrypt.

gemini-code-assist · 2025-11-27T03:02:00Z

source/util/src/tbase64.c

+static char tbase64_decoding_table[256] = {0};
+
+static void tbase64_build_decoding_table() {
+  for (int i = 0; i < 64; i++) {
+    tbase64_decoding_table[(unsigned char)tbase64_encoding_table[i]] = i;
+  }
+}


The tbase64_decoding_table is initialized with 0 for all characters. This is incorrect because characters not in the base64 alphabet will be treated as 'A' (value 0), leading to silent incorrect decoding instead of an error. The table should be initialized with an error marker (e.g., -1), and the decoding logic must check for this marker.

Suggested change

static char tbase64_decoding_table[256] = {0};

static void tbase64_build_decoding_table() {

for (int i = 0; i < 64; i++) {

tbase64_decoding_table[(unsigned char)tbase64_encoding_table[i]] = i;

}

}

static char tbase64_decoding_table[256];

static void tbase64_build_decoding_table() {

memset(tbase64_decoding_table, -1, sizeof(tbase64_decoding_table));

for (int i = 0; i < 64; i++) {

tbase64_decoding_table[(unsigned char)tbase64_encoding_table[i]] = i;

}

}

gemini-code-assist · 2025-11-27T03:02:00Z

source/util/src/tbase64.c

+int32_t tbase64_decode(uint8_t *out, const uint8_t *input, size_t in_len, VarDataLenT out_len) {
+  (void)taosThreadOnce(&tbase64_decoding_table_building, tbase64_build_decoding_table);
+
+  if (in_len % 4 != 0) {
+    return TSDB_CODE_INVALID_DATA_FMT;
+  }
+
+  if (input[in_len - 1] == '=') out_len--;
+  if (input[in_len - 2] == '=') out_len--;
+
+  for (int i = 0, j = 0; i < in_len;) {
+    uint32_t sextet_a = input[i] == '=' ? 0 & i++ : tbase64_decoding_table[input[i++]];
+    uint32_t sextet_b = input[i] == '=' ? 0 & i++ : tbase64_decoding_table[input[i++]];
+    uint32_t sextet_c = input[i] == '=' ? 0 & i++ : tbase64_decoding_table[input[i++]];
+    uint32_t sextet_d = input[i] == '=' ? 0 & i++ : tbase64_decoding_table[input[i++]];
+
+    uint32_t triple = (sextet_a << 3 * 6) + (sextet_b << 2 * 6) + (sextet_c << 1 * 6) + (sextet_d << 0 * 6);
+
+    if (j < out_len) out[j++] = (triple >> 2 * 8) & 0xFF;
+    if (j < out_len) out[j++] = (triple >> 1 * 8) & 0xFF;
+    if (j < out_len) out[j++] = (triple >> 0 * 8) & 0xFF;
+  }
+
+  return TSDB_CODE_SUCCESS;
+}


The tbase64_decode function does not validate the decoded values. After fixing tbase64_build_decoding_table to use an error marker like -1, this function should check for it and return an error for invalid base64 characters. Additionally, the expression 0 & i++ is confusing and error-prone. A clearer implementation is recommended.

Example of validation:

int sextet_a = (input[i] == '=') ? 0 : tbase64_decoding_table[input[i]]; i++; // ... similar for sextet_b, sextet_c, sextet_d if (sextet_a < 0 || sextet_b < 0 || sextet_c < 0 || sextet_d < 0) { return TSDB_CODE_INVALID_DATA_FMT; }

gemini-code-assist · 2025-11-27T03:02:01Z

source/util/src/tmd5.c

+int32_t taosAesEncrypt(uint8_t *key, int32_t keylen, uint8_t *pBuf, int32_t len, const uint8_t *iv) {
+  uint32_t       encryptlen = taes_encrypt_len(len);
+  int32_t        blocks = encryptlen / AES_BLOCKLEN;
+  struct AES_ctx ctx;
+
+  pkcs7_padding_pad_buffer(key, keylen, AES_BLOCKLEN);
+  pkcs7_padding_pad_buffer(pBuf, len, AES_BLOCKLEN);
+
+  if (iv) {
+    AES_init_ctx_iv(&ctx, key, iv);
+
+    // for (int32_t block = 0; block < blocks; ++blocks) {
+    AES_CBC_encrypt_buffer(&ctx, pBuf, encryptlen);
+    //}
+  } else {
+    AES_init_ctx(&ctx, key);
+
+    for (int32_t block = 0; block < blocks; ++block) {
+      AES_ECB_encrypt(&ctx, pBuf + block * AES_BLOCKLEN);
+    }
+  }
+
+  return encryptlen;
+}


The function taosAesEncrypt modifies the input key buffer in-place by calling pkcs7_padding_pad_buffer. This is a dangerous practice as it can lead to a buffer overflow if the caller does not provide a buffer that is large enough to hold the padded key. It's also an unexpected side effect. To make the function safer, it should create a local copy of the key, pad that copy, and then use it for encryption. This same issue is also present in taosAesDecrypt.

gemini-code-assist · 2025-11-27T03:02:01Z

contrib/libaes/inc/taes.h

+void AES_ECB_encrypt(const struct AES_ctx* ctx, uint8_t* buf);
+void AES_ECB_decrypt(const struct AES_ctx* ctx, uint8_t* buf);
+
+#endif  // #if defined(ECB) && (ECB == !)


There is a typo in this comment. It should be (ECB == 1) instead of (ECB == !). While this is a minor issue in a comment, fixing it improves code clarity.

#endif // #if defined(ECB) && (ECB == 1)

gemini-code-assist · 2025-11-27T03:02:01Z

include/util/tmd5.h

+int32_t  taosAesEncrypt(uint8_t *key, int32_t keylen, uint8_t *pBuf, int32_t len, const uint8_t *iv);
+int32_t  taosAesDecrypt(uint8_t *key, int32_t keylen, uint8_t *pBuf, int32_t len, const uint8_t *iv);
+uint32_t taes_encrypt_len(int32_t len);


The AES function declarations taosAesEncrypt, taosAesDecrypt, and taes_encrypt_len are placed in tmd5.h. This is misleading and harms maintainability, as this header is for MD5 functions. These declarations should be moved to a more appropriate header file, such as taes.h or a new header for encryption utilities. The corresponding implementations in source/util/src/tmd5.c should also be moved.

gemini-code-assist · 2025-11-27T03:02:01Z

include/util/tutil.h

+static FORCE_INLINE int32_t taosCreateSHA2Hash(char *pBuf, int32_t len, uint32_t digestSize) {
+  uint8_t result[2 * SHA512_DIGEST_SIZE + 1] = {0};
+
+  switch (digestSize / 8) {
+    case SHA224_DIGEST_SIZE:
+      sha224((const uint8_t *)pBuf, len, result);
+      return sprintf(pBuf,
+                     "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%"
+                     "02x%02x%02x%02x",
+                     result[0], result[1], result[2], result[3], result[4], result[5], result[6], result[7], result[8],
+                     result[9], result[10], result[11], result[12], result[13], result[14], result[15], result[16],
+                     result[17], result[18], result[19], result[20], result[21], result[22], result[23], result[24],
+                     result[25], result[26], result[27]);
+    case SHA256_DIGEST_SIZE:
+      sha256((const uint8_t *)pBuf, len, result);
+      return sprintf(pBuf,
+                     "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%"
+                     "02x%02x%02x%02x%02x%02x%02x%02x",
+                     result[0], result[1], result[2], result[3], result[4], result[5], result[6], result[7], result[8],
+                     result[9], result[10], result[11], result[12], result[13], result[14], result[15], result[16],
+                     result[17], result[18], result[19], result[20], result[21], result[22], result[23], result[24],
+                     result[25], result[26], result[27], result[28], result[29], result[30], result[31]);
+    case SHA384_DIGEST_SIZE:
+      sha384((const uint8_t *)pBuf, len, result);
+      return sprintf(pBuf,
+                     "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%"
+                     "02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
+                     result[0], result[1], result[2], result[3], result[4], result[5], result[6], result[7], result[8],
+                     result[9], result[10], result[11], result[12], result[13], result[14], result[15], result[16],
+                     result[17], result[18], result[19], result[20], result[21], result[22], result[23], result[24],
+                     result[25], result[26], result[27], result[28], result[29], result[30], result[31], result[32],
+                     result[33], result[34], result[35], result[36], result[37], result[38], result[39], result[40],
+                     result[41], result[42], result[43], result[44], result[45], result[46], result[47]);
+    case SHA512_DIGEST_SIZE:
+      sha512((const uint8_t *)pBuf, len, result);
+      return sprintf(pBuf,
+                     "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%"
+                     "02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%"
+                     "02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
+                     result[0], result[1], result[2], result[3], result[4], result[5], result[6], result[7], result[8],
+                     result[9], result[10], result[11], result[12], result[13], result[14], result[15], result[16],
+                     result[17], result[18], result[19], result[20], result[21], result[22], result[23], result[24],
+                     result[25], result[26], result[27], result[28], result[29], result[30], result[31], result[32],
+                     result[33], result[34], result[35], result[36], result[37], result[38], result[39], result[40],
+                     result[41], result[42], result[43], result[44], result[45], result[46], result[47], result[48],
+                     result[49], result[50], result[51], result[52], result[53], result[54], result[55], result[56],
+                     result[57], result[58], result[59], result[60], result[61], result[62], result[63]);
+    default:
+      break;
+  }
+
+  return 0;
+}


The sprintf calls with very long format strings to convert a hash digest to a hex string are hard to read and error-prone. This can be refactored into a loop for better readability and maintainability. This applies to taosCreateMD5Hash, taosCreateSHA1Hash, and all cases in taosCreateSHA2Hash.

Example refactoring for a generic case:

char* p = pBuf; for (int i = 0; i < digest_len_in_bytes; ++i) { p += sprintf(p, "%02x", result[i]); } return p - pBuf;

gemini-code-assist · 2025-11-27T03:02:01Z

source/libs/parser/src/parInsertSql.c

+static int32_t parseBinary(SInsertParseContext* pCxt, const char** ppSql, SToken* pToken, SColVal* pVal,
+                           SSchema* pSchema) {
+  int32_t   bytes = pSchema->bytes;
+  uint8_t** pData = &pVal->value.pData;
+  uint32_t* nData = &pVal->value.nData;
+
+  if (TK_NK_ID == pToken->type) {
+    char*   input = NULL;
+    int32_t inputBytes = 0;
+    char    tmpTokenBuf[TSDB_MAX_BYTES_PER_ROW];
+
+    if (0 == strncasecmp(pToken->z, "from_base64(", 12)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NULL == pToken->type) {
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP == pToken->type) {
+          pVal->flag = CV_FLAG_NULL;
+
+          return TSDB_CODE_SUCCESS;
+        }
+      } else if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      }
+
+      inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+      input = tmpTokenBuf;
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+      }
+
+      int32_t outputBytes = tbase64_decode_len(inputBytes);
+      if (outputBytes + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      *pData = taosMemoryMalloc(outputBytes);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      if (TSDB_CODE_SUCCESS != tbase64_decode(*pData, (const uint8_t*)input, inputBytes, outputBytes)) {
+        taosMemoryFree(*pData);
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_INVALID_DATA_FMT);
+      }
+      *nData = outputBytes;
+    } else if (0 == strncasecmp(pToken->z, "to_base64(", 10)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NULL == pToken->type) {
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP == pToken->type) {
+          pVal->flag = CV_FLAG_NULL;
+
+          return TSDB_CODE_SUCCESS;
+        }
+      } else if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      }
+
+      inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+      input = tmpTokenBuf;
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+      }
+
+      int32_t outputBytes = tbase64_encode_len(inputBytes);
+      if (outputBytes + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      *pData = taosMemoryMalloc(outputBytes);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      tbase64_encode(*pData, input, inputBytes, outputBytes);
+      *nData = outputBytes;
+    } else if (0 == strncasecmp(pToken->z, "md5(", 4)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NULL == pToken->type) {
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP == pToken->type) {
+          pVal->flag = CV_FLAG_NULL;
+
+          return TSDB_CODE_SUCCESS;
+        }
+      } else if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      }
+
+      inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+      input = tmpTokenBuf;
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+      }
+
+      if (MD5_OUTPUT_LEN + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      int32_t bufLen = TMAX(MD5_OUTPUT_LEN + VARSTR_HEADER_SIZE + 1, inputBytes);
+      *pData = taosMemoryMalloc(bufLen);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      (void)memcpy(*pData, input, inputBytes);
+      int32_t len = taosCreateMD5Hash(*pData, inputBytes);
+      *nData = len;
+    } else if (0 == strncasecmp(pToken->z, "sha2(", 5)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NULL == pToken->type) {
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP == pToken->type) {
+          pVal->flag = CV_FLAG_NULL;
+
+          return TSDB_CODE_SUCCESS;
+        }
+      } else if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      }
+
+      inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+      input = tmpTokenBuf;
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_COMMA != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ", expected", pToken->z);
+      }
+
+      int64_t digestLen = 224;
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_INTEGER != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "integer [224, 256, 384, 512] expected", pToken->z);
+      } else {
+        if (TSDB_CODE_SUCCESS != toInteger(pToken->z, pToken->n, 10, &digestLen)) {
+          return buildSyntaxErrMsg(&pCxt->msg, "invalid integer format", pToken->z);
+        }
+
+        if (224 != digestLen && 256 != digestLen && 384 != digestLen && 512 != digestLen) {
+          return buildSyntaxErrMsg(&pCxt->msg, "sha2 digest length must be one of 224, 256, 384, 512", pToken->z);
+        }
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+      }
+
+      if (SHA2_OUTPUT_LEN + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      int32_t bufLen = TMAX(SHA2_OUTPUT_LEN + VARSTR_HEADER_SIZE + 1, inputBytes);
+      *pData = taosMemoryMalloc(bufLen);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      (void)memcpy(*pData, input, inputBytes);
+      int32_t len = taosCreateSHA2Hash(*pData, inputBytes, digestLen);
+      *nData = len;
+    } else if (0 == strncasecmp(pToken->z, "sha(", 4) || 0 == strncasecmp(pToken->z, "sha1(", 5)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NULL == pToken->type) {
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP == pToken->type) {
+          pVal->flag = CV_FLAG_NULL;
+
+          return TSDB_CODE_SUCCESS;
+        }
+      } else if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      }
+
+      inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+      input = tmpTokenBuf;
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+      }
+
+      if (SHA1_OUTPUT_LEN + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      int32_t bufLen = TMAX(SHA1_OUTPUT_LEN + VARSTR_HEADER_SIZE + 1, inputBytes);
+      *pData = taosMemoryMalloc(bufLen);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      (void)memcpy(*pData, input, inputBytes);
+      int32_t len = taosCreateSHA1Hash(*pData, inputBytes);
+      *nData = len;
+    } else if (0 == strncasecmp(pToken->z, "aes_encrypt(", 12)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      } else {
+        inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+        input = tmpTokenBuf;
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_COMMA != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ", expected", pToken->z);
+      }
+
+      char*   key = NULL;
+      int32_t keyBytes = 0;
+      char    tmpKeyBuf[TSDB_MAX_BYTES_PER_ROW];
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "key expected", pToken->z);
+      } else {
+        keyBytes = trimString(pToken->z, pToken->n, tmpKeyBuf, TSDB_MAX_BYTES_PER_ROW);
+        key = tmpKeyBuf;
+      }
+
+      char*   iv = NULL;
+      int32_t ivBytes = 0;
+      char    tmpIvBuf[TSDB_MAX_BYTES_PER_ROW];
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type && TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") or iv expected", pToken->z);
+      } else if (TK_NK_STRING == pToken->type) {
+        ivBytes = trimString(pToken->z, pToken->n, tmpIvBuf, TSDB_MAX_BYTES_PER_ROW);
+        iv = tmpIvBuf;
+
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP != pToken->type) {
+          return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+        }
+      }
+
+      int32_t outputBytes = taes_encrypt_len(inputBytes);
+      if (outputBytes + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      int32_t bufLen = outputBytes + VARSTR_HEADER_SIZE + 1;
+      *pData = taosMemoryMalloc(bufLen);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      (void)memcpy(*pData, input, inputBytes);
+
+      int32_t keypaddedlen = taes_encrypt_len(keyBytes);
+      char*   pKeyPaddingBuf = taosMemoryMalloc(keypaddedlen);
+      if (!pKeyPaddingBuf) {
+        return terrno;
+      }
+      (void)memcpy(pKeyPaddingBuf, key, keyBytes);
+
+      int32_t len = taosAesEncrypt(pKeyPaddingBuf, keyBytes, *pData, inputBytes, iv);
+      *nData = len;
+
+      taosMemoryFree(pKeyPaddingBuf);
+    } else if (0 == strncasecmp(pToken->z, "aes_decrypt(", 12)) {
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_LP != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "( expected", pToken->z);
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "string expected", pToken->z);
+      } else {
+        inputBytes = trimString(pToken->z, pToken->n, tmpTokenBuf, TSDB_MAX_BYTES_PER_ROW);
+        input = tmpTokenBuf;
+      }
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_COMMA != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ", expected", pToken->z);
+      }
+
+      char*   key = NULL;
+      int32_t keyBytes = 0;
+      char    tmpKeyBuf[TSDB_MAX_BYTES_PER_ROW];
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, "key expected", pToken->z);
+      } else {
+        keyBytes = trimString(pToken->z, pToken->n, tmpKeyBuf, TSDB_MAX_BYTES_PER_ROW);
+        key = tmpKeyBuf;
+      }
+
+      char*   iv = NULL;
+      int32_t ivBytes = 0;
+      char    tmpIvBuf[TSDB_MAX_BYTES_PER_ROW];
+
+      NEXT_VALID_TOKEN(*ppSql, *pToken);
+      if (TK_NK_RP != pToken->type && TK_NK_STRING != pToken->type) {
+        return buildSyntaxErrMsg(&pCxt->msg, ") or iv expected", pToken->z);
+      } else if (TK_NK_STRING == pToken->type) {
+        ivBytes = trimString(pToken->z, pToken->n, tmpIvBuf, TSDB_MAX_BYTES_PER_ROW);
+        iv = tmpIvBuf;
+
+        NEXT_VALID_TOKEN(*ppSql, *pToken);
+        if (TK_NK_RP != pToken->type) {
+          return buildSyntaxErrMsg(&pCxt->msg, ") expected", pToken->z);
+        }
+      }
+
+      int32_t outputBytes = taes_encrypt_len(inputBytes);
+      if (outputBytes + VARSTR_HEADER_SIZE > bytes) {
+        return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+      }
+
+      int32_t bufLen = outputBytes + VARSTR_HEADER_SIZE + 1;
+      *pData = taosMemoryMalloc(bufLen);
+      if (NULL == *pData) {
+        return terrno;
+      }
+
+      (void)memcpy(*pData, input, inputBytes);
+
+      int32_t keypaddedlen = taes_encrypt_len(keyBytes);
+      char*   pKeyPaddingBuf = taosMemoryMalloc(keypaddedlen);
+      if (!pKeyPaddingBuf) {
+        return terrno;
+      }
+      (void)memcpy(pKeyPaddingBuf, key, keyBytes);
+
+      int32_t len = taosAesDecrypt(pKeyPaddingBuf, keyBytes, *pData, inputBytes, iv);
+      *nData = len;
+
+      taosMemoryFree(pKeyPaddingBuf);
+    } else {
+      return buildSyntaxErrMsg(&pCxt->msg, "invalid identifier", pToken->z);
+    }
+  } else {
+    if (pToken->n + VARSTR_HEADER_SIZE > bytes) {
+      return generateSyntaxErrMsg(&pCxt->msg, TSDB_CODE_PAR_VALUE_TOO_LONG, pSchema->name);
+    }
+
+    *pData = taosMemoryMalloc(pToken->n);
+    if (NULL == *pData) {
+      return terrno;
+    }
+    memcpy(*pData, pToken->z, pToken->n);
+    *nData = pToken->n;
+  }
+
+  pVal->flag = CV_FLAG_VALUE;
+  return TSDB_CODE_SUCCESS;
+}


The new function parseBinary is very long (almost 400 lines) and handles parsing for many different functions (from_base64, md5, sha, aes_encrypt, etc.). This makes the function difficult to read, maintain, and test. It would be better to refactor this into smaller, single-purpose functions for each function call it parses. For example, parseBinaryFromBase64, parseBinaryMd5, etc.

gemini-code-assist · 2025-11-27T03:02:01Z

source/libs/scalar/src/sclfunc.c

+int32_t shaFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput) {
+  SColumnInfoData *pInputData = pInput->columnData;
+  SColumnInfoData *pOutputData = pOutput->columnData;
+  int32_t          bufLen = TMAX(SHA1_OUTPUT_LEN + VARSTR_HEADER_SIZE + 1, pInputData->info.bytes);
+  char            *pOutputBuf = taosMemoryMalloc(bufLen);
+  if (!pOutputBuf) {
+    qError("sha function alloc memory failed");
+    return terrno;
+  }
+  for (int32_t i = 0; i < pInput->numOfRows; ++i) {
+    if (colDataIsNull_s(pInputData, i)) {
+      colDataSetNULL(pOutputData, i);
+      continue;
+    }
+    char *input = colDataGetData(pInput[0].columnData, i);
+    if (bufLen < varDataLen(input) + VARSTR_HEADER_SIZE) {
+      bufLen = varDataLen(input) + VARSTR_HEADER_SIZE;
+      pOutputBuf = taosMemoryRealloc(pOutputBuf, bufLen);
+      if (!pOutputBuf) {
+        qError("sha function alloc memory failed");
+        return terrno;
+      }
+    }
+    char *output = pOutputBuf;
+    (void)memcpy(varDataVal(output), varDataVal(input), varDataLen(input));
+    int32_t len = taosCreateSHA1Hash(varDataVal(output), varDataLen(input));
+    varDataSetLen(output, len);
+    int32_t code = colDataSetVal(pOutputData, i, output, false);
+    if (TSDB_CODE_SUCCESS != code) {
+      taosMemoryFree(pOutputBuf);
+      SCL_ERR_RET(code);
+    }
+  }
+  pOutput->numOfRows = pInput->numOfRows;
+  taosMemoryFree(pOutputBuf);
+  return TSDB_CODE_SUCCESS;
+}


The function shaFunction uses taosMemoryRealloc inside a loop, which can be inefficient. The maximum required buffer size can be determined before the loop, so a single allocation would be sufficient. This would improve performance by avoiding repeated reallocations. This pattern is also present in md5Function, sha2Function, aesFunction, sm4Function and their decryption counterparts.

gemini-code-assist · 2025-11-27T03:02:01Z

source/util/src/tmd5.c

+    // for (int32_t block = 0; block < blocks; ++blocks) {
+    AES_CBC_encrypt_buffer(&ctx, pBuf, encryptlen);
+    //}


The commented-out loop for (int32_t block = 0; block < blocks; ++blocks) contains a bug (++blocks instead of ++block). Since the loop is not used and AES_CBC_encrypt_buffer handles the entire buffer, this commented code is confusing and should be removed to improve clarity. The same applies to taosAesDecrypt.

Suggested change

// for (int32_t block = 0; block < blocks; ++blocks) {

AES_CBC_encrypt_buffer(&ctx, pBuf, encryptlen);

//}

AES_CBC_encrypt_buffer(&ctx, pBuf, encryptlen);

stephenkgu and others added 30 commits October 27, 2025 16:47

feat(encoding): new security funcs

8b3608b

Merge branch '3.0' into fix/TS-7235

36b1751

fix null with {from, to}_base64

c154911

feat(md5): support md5 in inserting

25876aa

fix failed decoding cases

096c37a

new sha/sha1 digesting funcs

d6c20c2

fix CI compiling issue

1174581

feat(sha2): new sha funcs

21c59f5

Merge branch '3.0' into fix/TS-7235-sha2

ff235a7

fix string format printing

9befeea

Merge branch 'fix/TS-7235-sha2' of https://github.com/taosdata/TDengine…

964deba

… into fix/TS-7235-sha2

fix func name in error logs

3ec33b2

fix char type conversions

34b81f8

fix format strings

ab5561a

fix format string

6c8e3d5

fix

0292444

fix

c62f648

fix

3be1841

fix Mark & Mask

3620dfb

fix clang-query

fc79257

Merge branch '3.0' into fix/TS-7235-sha2

7967e6a

feat(mask): mask funcs

7c1a035

base64: new func type for from_base64

265c6c7

feat(mask): new data masking funcs: full, partial, none

126011a

Merge branch '3.0' into fix/TS-7235-mask-all

1b0968b

mask_partial: fix middle part copying

6ae3e91

Merge branch '3.0' into fix/TS-7235-mask-all

47e57ac

feat(aes): aes encrypt/decrypt funcs for querying

c29665c

writing with aes encryp/decrypt cbc & ecb

d910b18

feat(sm4): fm4 funcs for query and inserting

087e576

stephenkgu requested review from a team, dapan1121, guanshengliang and hzcheng as code owners November 27, 2025 02:56

taosdata-bot bot added the team engine label Nov 27, 2025

gemini-code-assist bot reviewed Nov 27, 2025

View reviewed changes

stephenkgu added 2 commits November 27, 2025 16:47

sm4 funcs for inserting

b62b9de

Merge branch '3.0' into fix/TD-7235-sm4

c1bc8d4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(sm4): fm4 funcs for query and inserting #33731

feat(sm4): fm4 funcs for query and inserting #33731

stephenkgu commented Nov 27, 2025

Uh oh!

gemini-code-assist bot commented Nov 27, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

gemini-code-assist bot Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(sm4): fm4 funcs for query and inserting #33731

Are you sure you want to change the base?

feat(sm4): fm4 funcs for query and inserting #33731

Conversation

stephenkgu commented Nov 27, 2025

Description

Checklist

Uh oh!

gemini-code-assist bot commented Nov 27, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants