
[WIP] Initiate animation component#400

Closed
ctao456 wants to merge 120 commits into opea-project:main from ctao456:ctao/animation

Conversation

@ctao456
Contributor

@ctao456 ctao456 commented Aug 2, 2024

Description

Add a new component called "Animation"; this first PR includes the Wav2Lip+GFPGAN model.

Issues

opea-project/docs#59

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

Wav2Lip-GFPGAN
opea-project/GenAIExamples#523

Tests

Once the microservice starts, the user can use the script below to validate that it is running.

export ip_address=$(hostname -I | awk '{print $1}')
curl http://${ip_address}:7860/v1/animation -X POST -H "Content-Type: application/json" -d @comps/animation/assets/audio/sample_question.json

or

cd comps/animation
python3 test_animation_server.py

The expected output is the message:

"Video generated successfully, check $OUTFILE for the result."

Please find "outputs/result.mp4" in the current directory as a reference generated video.
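For reference, the same validation call can also be made from Python using only the standard library (a minimal sketch; the endpoint, port, and payload file are taken from the curl command above, and `build_animation_request` is a hypothetical helper name):

```python
import urllib.request


def build_animation_request(ip_address: str, payload: bytes) -> urllib.request.Request:
    """Build the POST request used to validate the running animation microservice."""
    return urllib.request.Request(
        f"http://{ip_address}:7860/v1/animation",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (assumes the microservice is up and the sample payload exists):
#   with open("comps/animation/assets/audio/sample_question.json", "rb") as f:
#       req = build_animation_request("127.0.0.1", f.read())
#   print(urllib.request.urlopen(req).read().decode())
```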

ctao456 and others added 30 commits June 23, 2024 16:43
Signed-off-by: Chun Tao <[email protected]>
Signed-off-by: ZhangJianyu <[email protected]>
Collaborator

@louie-tsai louie-tsai left a comment


Looks good. I still need more time to look into animation.py.

@ctao456 ctao456 closed this Aug 21, 2024
    full_frames = [cv2.imread(args.face)]
    fps = args.fps
else:
    video_stream = cv2.VideoCapture(args.face)
Collaborator


Do you know what CV2 runs on? Does it leverage Gaudi?

Could we leverage Gaudi media pipeline?
https://docs.habana.ai/en/latest/Media_Pipeline/Media_Pipeline.html?highlight=media

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CV2 runs on CPU by default. If we want GPU support, we need to build CV2 from source. I don't see CV2 support on Gaudi in the documentation.

Media pipeline is possible. Added as a TO-DO.

if args.resize_factor > 1:
    frame = cv2.resize(frame, (frame.shape[1] // args.resize_factor, frame.shape[0] // args.resize_factor))
if args.rotate:
    frame = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
Collaborator


It would be good to have an estimate of how long the CV2 tasks take.

# one single video frame corresponds to 80/25*0.01 = 0.032 seconds (or 32 milliseconds) of audio
# 30 fps video will match closer to audio, than 25 fps
mel_chunks = []
mel_idx_multiplier = 80.0 / fps
Collaborator


What does 80 mean here? It would be good to give it a named variable or a definition instead of a magic number.

Contributor Author

@ctao456 ctao456 Oct 9, 2024


80 is the number of mel chunks corresponding to 1 second of video during training.
Paper here: https://arxiv.org/pdf/2008.10010
So 80/fps is how many mel chunks are selected to associate with one image frame.
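To illustrate the 80/fps relationship discussed above, the chunking can be sketched with synthetic data (this mirrors the shape of the Wav2Lip reference inference loop, with names adapted; `MEL_STEP_SIZE = 16` follows the Wav2Lip reference code and is an assumption about this PR):

```python
import numpy as np

MEL_FRAMES_PER_SEC = 80  # mel spectrogram frames per second of audio in Wav2Lip training
MEL_STEP_SIZE = 16       # width of the mel window fed to the model per video frame


def chunk_mel(mel: np.ndarray, fps: float) -> list:
    """Slice a (n_mels, T) spectrogram into one mel window per video frame."""
    mel_idx_multiplier = MEL_FRAMES_PER_SEC / fps  # mel frames advanced per video frame
    chunks, i = [], 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + MEL_STEP_SIZE > mel.shape[1]:
            # Clamp the final window to the end of the spectrogram.
            chunks.append(mel[:, -MEL_STEP_SIZE:])
            break
        chunks.append(mel[:, start_idx : start_idx + MEL_STEP_SIZE])
        i += 1
    return chunks


# One second of audio at 25 fps: each video frame advances 80/25 = 3.2 mel frames.
mel = np.random.rand(80, 80)  # (n_mels, mel frames for 1 second of audio)
chunks = chunk_mel(mel, fps=25)
```

A higher fps shrinks the multiplier, so consecutive windows overlap more and track the audio more closely, which matches the "30 fps matches audio more closely than 25 fps" comment in the code.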


frame_h, frame_w = full_frames[0].shape[:-1]
if args.inference_mode == "wav2clip_only":
    out = cv2.VideoWriter("temp/result.avi", cv2.VideoWriter_fourcc(*"DIVX"), fps, (frame_w, frame_h))
Collaborator


It would also be good to understand the time taken by cv2 here.

@ctao456 ctao456 mentioned this pull request Oct 9, 2024
4 tasks
