Suppose we have two kinds of devices. A program can then be composed of:
- Multi CPUDevice.
- Multi GPUDevice.
- Mixed Device, where some operators may be placed on a different kind of device.
Currently, we support only the first two and leave the third (Mixed Device) for future work.
For the parallel computing design, please refer to #6394.
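
The three program kinds above can be illustrated with a minimal sketch that classifies a program by the device kind of each of its operators. This is not the framework's API: `DeviceKind`, `ProgramKind`, `classify_program`, and `is_supported` are hypothetical names introduced only for illustration, and the sketch assumes each operator is tagged with exactly one device kind.

```python
from enum import Enum


class DeviceKind(Enum):
    # Hypothetical tags for the two kinds of device discussed above.
    CPU = "cpu"
    GPU = "gpu"


class ProgramKind(Enum):
    # Hypothetical labels for the three program compositions.
    MULTI_CPU = "multi_cpu"
    MULTI_GPU = "multi_gpu"
    MIXED = "mixed"


def classify_program(op_devices):
    """Classify a program from the device kind of each operator.

    `op_devices` is an iterable of DeviceKind, one entry per operator
    (an assumption of this sketch, not a documented data structure).
    """
    kinds = set(op_devices)
    if not kinds:
        raise ValueError("program has no operators")
    if kinds == {DeviceKind.CPU}:
        return ProgramKind.MULTI_CPU
    if kinds == {DeviceKind.GPU}:
        return ProgramKind.MULTI_GPU
    # Operators are spread over both kinds of device.
    return ProgramKind.MIXED


def is_supported(program_kind):
    """Only the first two kinds are supported; mixed is future work."""
    return program_kind is not ProgramKind.MIXED
```

For example, a program whose operators are all tagged `DeviceKind.GPU` would be classified as `ProgramKind.MULTI_GPU` and reported as supported, while one that mixes both tags would be classified as `ProgramKind.MIXED` and rejected.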