

@needs (Contributor) commented Oct 17, 2025

Extend the input of run and runStreamed to accept either a string or structured input. A structured input is an array of text parts and/or local image paths: the image paths are passed to the CLI through the --image argument, and the text parts are combined with double newlines into the prompt. For instance:

const turn = await thread.run([
  { type: "text", text: "Describe these screenshots" },
  { type: "local_image", path: "./ui.png" },
  { type: "local_image", path: "./diagram.jpg" },
  { type: "text", text: "Thanks!" },
]);

This ends up launching the CLI with:

codex exec --image ./ui.png --image ./diagram.jpg "Describe these screenshots\n\nThanks!"
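The mapping from structured input to CLI arguments can be sketched as follows. This is a minimal sketch, not the SDK's actual code: `toExecArgs` is a hypothetical helper, it covers only the image flags and the prompt (any other flags the SDK passes are omitted), and it assumes the prompt is the final positional argument, as in the command above.

```typescript
type UserInput =
  | { type: "text"; text: string }
  | { type: "local_image"; path: string };

type Input = string | UserInput[];

// Hypothetical helper: builds the argument list for `codex exec`
// from either a plain string or an array of structured parts.
function toExecArgs(input: Input): string[] {
  const args: string[] = ["exec"];
  if (typeof input === "string") {
    // A plain string is used as the prompt unchanged.
    args.push(input);
    return args;
  }
  const texts: string[] = [];
  for (const part of input) {
    if (part.type === "text") {
      texts.push(part.text);
    } else {
      // Each local image becomes its own --image flag, in input order.
      args.push("--image", part.path);
    }
  }
  // Text parts are joined with two newlines into a single prompt, placed last.
  args.push(texts.join("\n\n"));
  return args;
}

console.log(
  toExecArgs([
    { type: "text", text: "Describe these screenshots" },
    { type: "local_image", path: "./ui.png" },
    { type: "local_image", path: "./diagram.jpg" },
    { type: "text", text: "Thanks!" },
  ]),
);
```

Because images and text are collected in a single pass, the relative order of images is preserved, while the position of text parts relative to images is not (all text ends up in one prompt).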

The complete input type for both functions is now:

export type UserInput =
  | {
      type: "text";
      text: string;
    }
  | {
      type: "local_image";
      path: string;
    };

export type Input = string | UserInput[];
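Because Input is a union, code that accepts it can narrow on `typeof input === "string"` and then switch on each part's `type` discriminant. The sketch below is illustrative only: `toUserInputs` and `describePart` are hypothetical helpers, not part of the SDK. It treats a plain string as a single text part and uses an exhaustive `switch` with a `never` check, so adding a new part type later is flagged at compile time.

```typescript
type UserInput =
  | { type: "text"; text: string }
  | { type: "local_image"; path: string };

type Input = string | UserInput[];

// Hypothetical helper: views any Input as an array of parts.
function toUserInputs(input: Input): UserInput[] {
  return typeof input === "string" ? [{ type: "text", text: input }] : input;
}

// Hypothetical helper: a short human-readable label for each part.
function describePart(part: UserInput): string {
  switch (part.type) {
    case "text":
      return `text(${part.text.length} chars)`;
    case "local_image":
      return `image(${part.path})`;
    default: {
      // If a new variant is added to UserInput, this assignment stops compiling.
      const unreachable: never = part;
      return unreachable;
    }
  }
}

console.log(toUserInputs("Hello").map(describePart));
console.log(
  toUserInputs([
    { type: "text", text: "Describe this" },
    { type: "local_image", path: "./ui.png" },
  ]).map(describePart),
);
```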

This brings the Codex SDK closer to feature parity with the CLI. Addresses #5280.

github-actions bot commented Oct 17, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@needs (Contributor, Author) commented Oct 17, 2025

I have read the CLA Document and I hereby sign the CLA

@needs force-pushed the feat/support-images-sdk branch from b57a499 to 2f82c67 on October 17, 2025 13:08
github-actions bot added a commit that referenced this pull request Oct 17, 2025
@pakrym-oai (Collaborator) commented Oct 17, 2025

Hi! Thank you for the contribution.

If you don't mind, let's implement this by changing the input: string parameter of run and runStreamed to the string | UserInput[] type,

where UserInput is

type UserInput = {
   type: "text",
   text: string,
} | {
   type: "local_image",
   path: string
}

We combine all text parts into the prompt and turn each image into an --image parameter.

@needs (Contributor, Author) commented Oct 18, 2025

Hi! Thanks for your feedback.

The Input type has been extended as suggested. Text parts are joined with two newlines. This is handled by the new normalizeInput function, which returns the combined prompt and a list of image paths.
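A function with that contract can be sketched as below. This is a sketch under assumptions, not the code merged in this PR: the result field names `prompt` and `images` (and the `NormalizedInput` interface) are assumed for illustration and may differ from the SDK's.

```typescript
type UserInput =
  | { type: "text"; text: string }
  | { type: "local_image"; path: string };

type Input = string | UserInput[];

// Assumed result shape: the SDK's actual field names may differ.
interface NormalizedInput {
  prompt: string;
  images: string[];
}

// Sketch of normalizeInput: splits the input into the prompt text and
// the image paths that are later forwarded as --image arguments.
function normalizeInput(input: Input): NormalizedInput {
  if (typeof input === "string") {
    return { prompt: input, images: [] };
  }
  const promptParts: string[] = [];
  const images: string[] = [];
  for (const item of input) {
    if (item.type === "text") {
      promptParts.push(item.text);
    } else {
      images.push(item.path);
    }
  }
  // Text parts are joined with two newlines (a blank line between parts).
  return { prompt: promptParts.join("\n\n"), images };
}

const normalized = normalizeInput([
  { type: "text", text: "Describe these screenshots" },
  { type: "local_image", path: "./ui.png" },
  { type: "text", text: "Thanks!" },
]);
console.log(JSON.stringify(normalized.prompt)); // prints "Describe these screenshots\n\nThanks!"
console.log(normalized.images);
```

Keeping normalization a pure function means the joining behavior can be unit-tested without spawning the CLI.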

Added a second test that makes sure text parts are combined as expected.

Available for any follow-ups!

@pakrym-oai (Collaborator)

Thank you!

@pakrym-oai pakrym-oai merged commit 3282e86 into openai:main Oct 20, 2025
20 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Oct 20, 2025
