@seisman (Member) commented Mar 23, 2024

Description of proposed changes

To pass a list of strings to a ctypes function, we currently use code like the following, which is hard to read:

strings_pointer = (ctp.c_char_p * len(strings))()
strings_pointer[:] = np.char.encode(strings)

It would be better to have a strings_to_ctypes_array function that hides these technical details.

The two lines above can be shortened to a single line:

strings_pointer = (ctp.c_char_p * len(strings))(*np.char.encode(strings))

np.char.encode simply calls str.encode element-wise, so the same line can be written without NumPy:

strings_pointer = (ctp.c_char_p * len(strings))(*[s.encode() for s in strings])
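As a quick sanity check, the sketch below builds the array with both the two-step form and the one-line form and compares the results. It uses only the standard library (the two-step form is filled with a list comprehension instead of np.char.encode), and the variable names two_step and one_line are illustrative, not taken from the PR.

```python
import ctypes as ctp

strings = ["ABC", "DEFGHI", "ABC123", "ABC1234566"]

# Two-step form: allocate the array, then fill it by slice assignment.
two_step = (ctp.c_char_p * len(strings))()
two_step[:] = [s.encode() for s in strings]

# One-line form: pass the encoded strings as positional arguments.
one_line = (ctp.c_char_p * len(strings))(*[s.encode() for s in strings])

# Indexing or iterating a c_char_p array returns the stored bytes objects.
print(list(two_step) == list(one_line))
print(one_line[1])
```

Both forms hold the same encoded bytes, so the choice between them is a matter of readability and speed.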

For the four-string list below, the list comprehension version is about 4.5 times faster than the np.char.encode version:

>>> import numpy as np
>>> import ctypes as ctp
>>> strings = ["ABC", "DEFGHI", "ABC123", "ABC1234566"]
>>> %timeit (ctp.c_char_p * 4)(*[s.encode() for s in strings])
1.37 µs ± 3.79 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

>>> %timeit (ctp.c_char_p * 4)(*np.char.encode(strings))
6.17 µs ± 22.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
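The %timeit magic only works in IPython. A rough way to repeat the comparison in a plain Python script is sketched below using the standard timeit module. The helper names and the repeat counts are assumptions chosen for illustration, the NumPy variant is skipped if NumPy is not installed, and the absolute numbers will vary by machine.

```python
import ctypes as ctp
import timeit

strings = ["ABC", "DEFGHI", "ABC123", "ABC1234566"]


def with_list_comprehension():
    # Encode each string with str.encode and unpack into the array type.
    return (ctp.c_char_p * len(strings))(*[s.encode() for s in strings])


candidates = {"list comprehension": with_list_comprehension}

try:
    import numpy as np
except ImportError:
    np = None  # NumPy not available: only time the list comprehension.
else:
    def with_numpy():
        # Encode all strings at once with np.char.encode.
        return (ctp.c_char_p * len(strings))(*np.char.encode(strings))

    candidates["np.char.encode"] = with_numpy

number = 20_000  # calls per timing run (arbitrary choice)
for name, func in candidates.items():
    # Take the best of 5 runs and report the time per call in microseconds.
    best = min(timeit.repeat(func, number=number, repeat=5)) / number
    print(f"{name}: {best * 1e6:.2f} µs per call")
```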

This PR adds the strings_to_ctypes_array function to do the conversion. It contains a single line of code and, to avoid extra overhead, does not check whether strings is an empty list.
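A minimal sketch of what such a function could look like, based on the one-liner above. The type hints, docstring, and example inputs are assumptions; the merged implementation may differ in detail.

```python
import ctypes as ctp
from collections.abc import Sequence


def strings_to_ctypes_array(strings: Sequence[str]) -> ctp.Array:
    """Convert a sequence of strings to a ctypes array of c_char_p (sketch)."""
    # str.encode() uses UTF-8 by default; each element becomes a char pointer.
    return (ctp.c_char_p * len(strings))(*[s.encode() for s in strings])


# Hypothetical usage: slicing a c_char_p array gives back a list of bytes.
arr = strings_to_ctypes_array(["first", "second", "third"])
print(arr[:])

# No emptiness check is made; ctypes accepts a zero-length array type,
# so an empty input simply yields an array of length 0.
empty = strings_to_ctypes_array([])
print(len(empty))
```

Skipping the emptiness check keeps the helper to a single expression, in line with the stated goal of avoiding extra overhead.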

PR #3136 does similar work for converting a sequence of numbers to a ctypes array. The two functions share similar code, but I prefer not to combine them into a single function, to avoid too many if-else clauses in a low-level function. It would be best to review the two PRs back-to-back.

@seisman force-pushed the strings_to_ctypes_array branch from 32c97d5 to 2fe50a9 (March 23, 2024 08:42)
@seisman added the "maintenance" label (Mar 23, 2024)
@seisman added this to the 0.12.0 milestone (Mar 23, 2024)
@seisman added the "needs review" label (Mar 23, 2024)
@seisman added and removed the "run/benchmark" label (Mar 23, 2024)
@seisman changed the title from "Add a function strings_to_ctypes_array to convert a sequence of strings into a ctypes array" to "Add strings_to_ctypes_array to convert a sequence of strings into a ctypes array" (Mar 23, 2024)
@michaelgrund added the "final review call" label and removed the "needs review" label (Mar 25, 2024)
@seisman removed the "final review call" label (Mar 26, 2024)
@seisman merged commit 62eb5d6 into main (Mar 26, 2024)
@seisman deleted the strings_to_ctypes_array branch (March 26, 2024 01:28)
