Questions About RT2 Model Output: Tuple Structure and Tensor Meaning

Subject: Clarification on RT2 model output structure and usage

Hi [Support Team/RT2 Developers],

I’m integrating the RT2 model into a ROS2-based robotic arm control system. The model generates control commands from visual input (images) and natural-language instructions, but I’m having trouble understanding the structure and purpose of its output.

Here’s what I observe:
1. The model output is a `tuple` with two elements.
2. The first element is a tensor with shape `[1, 1023, 20000]`.
3. I don’t understand what this tensor means or how to use it: it is far too large to use directly as joint control commands for a robotic arm.
4. I have not yet examined the second element of the `tuple`.
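For anyone trying to reproduce these observations, here is a minimal sketch of one way to summarize each element of the output. `describe_output` and `FakeTensor` are hypothetical names used only for illustration, and `FakeTensor` is a stand-in so the snippet runs without the model. The second element is shown as `None` purely as a placeholder, since its real type is unknown.

```python
def describe_output(output):
    """Return one summary line per element of a model output tuple."""
    lines = []
    for index, element in enumerate(output):
        # Tensors and arrays expose .shape and .dtype; other objects may not.
        shape = getattr(element, "shape", None)
        dtype = getattr(element, "dtype", None)
        shape_text = list(shape) if shape is not None else "n/a"
        lines.append(
            f"[{index}] type={type(element).__name__} "
            f"shape={shape_text} dtype={dtype}"
        )
    return lines


class FakeTensor:
    """Stand-in for a real tensor so the sketch runs without the model."""

    def __init__(self, shape, dtype="float32"):
        self.shape = shape
        self.dtype = dtype


# Hypothetical output mirroring the shape reported above.
# The second element is a placeholder, not the model's real value.
output = (FakeTensor((1, 1023, 20000)), None)
for line in describe_output(output):
    print(line)
```

With the real model, passing the returned tuple to `describe_output` should show whether the second element is a tensor, a cache, or something else.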

Could you please clarify the following:
1. What is the meaning of each element in the model’s output `tuple`?
2. Is the first tensor (shape `[1, 1023, 20000]`) designed for use in robotic control, or does it require additional decoding or processing? If so, what is the recommended approach?
3. If the second element of the `tuple` is relevant to robotic control, could you provide guidance or examples on how to interpret and use it?
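To make question 2 concrete, below is a rough sketch of the kind of decoding I am guessing might be needed. It rests on several assumptions that may be wrong: that the first tensor holds logits of shape `[batch, seq_len, vocab_size]`, that actions are encoded as tokens with 256 bins per dimension (as I understand the RT-2 paper to describe), and that the last few positions of the sequence hold the action tokens. `NUM_BINS`, `ACTION_DIMS`, `FIRST_ACTION_TOKEN`, and both function names are hypothetical and do not come from the RT2 code.

```python
import numpy as np

# All constants are assumptions for illustration, not values from the RT2 code.
NUM_BINS = 256          # bins per action dimension (assumed from the RT-2 paper)
ACTION_DIMS = 7         # e.g. 6-DoF end-effector delta plus gripper (assumed)
FIRST_ACTION_TOKEN = 0  # assumed vocabulary ID that encodes bin 0


def logits_to_token_ids(logits):
    """Greedy decoding: pick the highest-scoring vocabulary entry per position."""
    # logits: [batch, seq_len, vocab_size] -> token IDs: [batch, seq_len]
    return np.argmax(logits, axis=-1)


def token_ids_to_actions(token_ids, low=-1.0, high=1.0):
    """Map the last ACTION_DIMS token IDs of each sequence into [low, high]."""
    bins = token_ids[:, -ACTION_DIMS:] - FIRST_ACTION_TOKEN
    # Clamp IDs that fall outside the assumed action-token range.
    bins = np.clip(bins, 0, NUM_BINS - 1)
    return low + (high - low) * bins / (NUM_BINS - 1)


# Small random logits stand in for the real [1, 1023, 20000] tensor.
rng = np.random.default_rng(0)
dummy_logits = rng.normal(size=(1, 16, 300))
token_ids = logits_to_token_ids(dummy_logits)
actions = token_ids_to_actions(token_ids)
print(token_ids.shape, actions.shape)
```

If the intended pipeline differs (for example, sampling instead of argmax, a different token-to-bin mapping, or a different action layout), a pointer to the correct procedure would be very helpful.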

Any examples, documentation, or best practices related to using RT2 for robotic arm control would be greatly appreciated.

Thank you for your assistance!

Best regards
