Merged
Changes from 9 commits
241 changes: 241 additions & 0 deletions .github/workflows/cicd-main-speech.yml
@@ -199,6 +199,247 @@ jobs:
script: L2_TTS_InferEvaluate_Magpietts_MoE_ZeroShot
- runner: self-hosted-azure
script: L2_TTS_InferEvaluatelongform_Magpietts_MoE_ZeroShot
# Model support functional tests
- runner: self-hosted-azure
Collaborator:
@pzelasko regarding scheduling this: we already run nightly tests on the main branch. It looks like one of the CI tests needs to be fixed.
https://github.com/NVIDIA-NeMo/NeMo/actions/runs/22421911989
https://github.com/NVIDIA-NeMo/NeMo/blob/main/.github/workflows/cicd-main.yml#L17

In any case, to have these run nightly only, it may be better to move the tests into a new group (called e2e-nightly, or whatever name you prefer) and apply a condition like this:
https://github.com/NVIDIA-NeMo/NeMo/blob/main/.github/workflows/cicd-main.yml#L246

The condition could be something like:

if: ${{ github.event_name == 'schedule' }}
needs: unit-tests

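A minimal sketch of what such a nightly-only group could look like (the job name and the single matrix entry below are illustrative, not part of this PR):

```yaml
e2e-nightly:
  # Run this group only on the scheduled (nightly) trigger, never on pull requests.
  if: ${{ github.event_name == 'schedule' }}
  needs: unit-tests
  runs-on: ${{ matrix.runner }}
  strategy:
    matrix:
      include:
        - runner: self-hosted-azure
          script: L2_Model_Support_nvidia__stt_en_fastconformer_ctc_large
          timeout: 15
```

With this shape, pull-request runs skip the group entirely, while the scheduled nightly run fans the matrix out across the self-hosted runners after unit tests pass.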
Collaborator (Author):
@chtruong814 Added. Can you check whether it looks OK? Thanks!

script: L2_Model_Support_nvidia__stt_de_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_es_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_it_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_ua_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_pl_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_hr_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_be_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_fr_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_ru_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_nl_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_fa_fastconformer_hybrid_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_ka_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_kk_ru_fastconformer_hybrid_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_uz_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_ar_fastconformer_hybrid_large_pc_v1_0
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_hy_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_hybrid_medium_streaming_80ms_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_hybrid_medium_streaming_80ms
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_pt_fastconformer_hybrid_large_pc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_es_fastconformer_hybrid_large_pc_nc
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_ar_fastconformer_hybrid_large_pcd_v1_0
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_hybrid_large_streaming_multi
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_ctc_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_transducer_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_ctc_xlarge
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_transducer_xlarge
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_transducer_xxlarge
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_ctc_xxlarge
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__stt_en_fastconformer_tdt_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_stt_en_fastconformer_hybrid_large_streaming_1040ms
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_stt_multilingual_fastconformer_hybrid_large_pc_blend_eu
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_rnnt_1_1b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_ctc_1_1b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_rnnt_0_6b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_ctc_0_6b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_tdt_1_1b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_tdt_ctc_1_1b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_tdt_ctc_0_6b_ja
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_tdt_ctc_110m
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_tdt_0_6b_v2
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_rnnt_110m_da_dk
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_tdt_0_6b_v3
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_ctc_0_6b_Vietnamese
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__canary_1b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__canary_1b_flash
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__canary_180m_flash
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__canary_1b_v2
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__parakeet_realtime_eou_120m_v1
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__multitalker_parakeet_streaming_0_6b_v1
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__nemotron_speech_streaming_en_0_6b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__canary_qwen_2_5b
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__diar_sortformer_4spk_v1
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__diar_streaming_sortformer_4spk_v2
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__diar_streaming_sortformer_4spk_v2_1
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_titanet_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__speakerverification_en_titanet_large
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__ssl_en_nest_large_v1_0
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__ssl_en_nest_xlarge_v1_0
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_vad_multilingual_marblenet
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_vad_multilingual_frame_marblenet
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__Frame_VAD_Multilingual_MarbleNet_v2_0
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__se_den_sb_16k_small
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__se_der_sb_16k_small
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__sr_ssl_flowmatching_16k_430m
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_mel_codec_44khz_medium
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_mel_codec_22khz_fullband_medium
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__low_frame_rate_speech_codec_22khz
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__audio_codec_22khz
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__audio_codec_44khz
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__mel_codec_22khz
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__mel_codec_44khz
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__nemo_nano_codec_22khz_1_78kbps_12_5fps
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__nemo_nano_codec_22khz_1_89kbps_21_5fps
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__nemo_nano_codec_22khz_0_6kbps_12_5fps
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__tts_en_fastpitch
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__tts_hifigan
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_nvidia__magpie_tts_multilingual_357m
timeout: 15
- runner: self-hosted-azure
script: L2_Model_Support_tts_en_e2e_fastspeech2hifigan
timeout: 15
needs: [unit-tests]
runs-on: ${{ matrix.runner }}
name: ${{ matrix.is-optional && 'PLEASEFIXME_' || '' }}${{ matrix.script }}
22 changes: 7 additions & 15 deletions nemo/core/utils/cuda_python_utils.py
@@ -13,7 +13,6 @@
 # limitations under the License.

 import contextlib
-import inspect

 import numpy as np
 import torch
@@ -171,21 +170,16 @@ def with_conditional_node(while_loop_kernel, while_loop_args, while_loop_conditi
     # Use driver API here because of bug in cuda-python runtime API: https://github.com/NVIDIA/cuda-python/issues/55
     # TODO: Change call to this after fix goes in (and we bump minimum cuda-python version to 12.4.0):
     # node, = cu_call(cudart.cudaGraphAddNode(graph, dependencies, len(dependencies), driver_params))
-    # depending on cuda-python version, number of parameters vary
-    num_cuda_graph_add_node_params = len(inspect.signature(cuda.cuGraphAddNode).parameters)
-    if num_cuda_graph_add_node_params == 5:
+    # CUDA 13 (cuda-python >= 13.0.0) adds an edgeData parameter to cuGraphAddNode and
+    # cudaStreamUpdateCaptureDependencies; CUDA 12 does not accept it.
+    _cuda13 = Version(cuda_python_version) >= Version("13.0.0")
+    if _cuda13:
         (node,) = cu_call(cuda.cuGraphAddNode(graph, dependencies, None, len(dependencies), driver_params))
-    elif num_cuda_graph_add_node_params == 4:
+    else:
         (node,) = cu_call(cuda.cuGraphAddNode(graph, dependencies, len(dependencies), driver_params))
-    else:
-        raise NeMoCUDAPythonException("Unexpected number of parameters for `cuGraphAddNode`")
     body_graph = driver_params.conditional.phGraph_out[0]

-    # depending on cuda-python version, number of parameters vary
-    num_cuda_stream_update_capture_dependencies_params = len(
-        inspect.signature(cudart.cudaStreamUpdateCaptureDependencies).parameters
-    )
-    if num_cuda_stream_update_capture_dependencies_params == 5:
+    if _cuda13:
         cu_call(
             cudart.cudaStreamUpdateCaptureDependencies(
                 torch.cuda.current_stream(device=device).cuda_stream,
@@ -195,7 +189,7 @@ def with_conditional_node(while_loop_kernel, while_loop_args, while_loop_conditi
                 cudart.cudaStreamUpdateCaptureDependenciesFlags.cudaStreamSetCaptureDependencies,
             )
         )
-    elif num_cuda_stream_update_capture_dependencies_params == 4:
+    else:
         cu_call(
             cudart.cudaStreamUpdateCaptureDependencies(
                 torch.cuda.current_stream(device=device).cuda_stream,
@@ -204,8 +198,6 @@ def with_conditional_node(while_loop_kernel, while_loop_args, while_loop_conditi
                 cudart.cudaStreamUpdateCaptureDependenciesFlags.cudaStreamSetCaptureDependencies,
             )
         )
-    else:
-        raise NeMoCUDAPythonException("Unexpected number of parameters for `cudaStreamUpdateCaptureDependencies`")
     body_stream = torch.cuda.Stream(device)
     previous_stream = torch.cuda.current_stream(device=device)
     cu_call(
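The change in this file swaps runtime signature introspection (counting `cuGraphAddNode` parameters with `inspect`) for an explicit version gate on the cuda-python release. The dispatch pattern can be sketched in isolation; the helper names below are illustrative, not the actual NeMo code:

```python
def parse_version(v: str) -> tuple:
    """Parse a plain 'major.minor.patch' string into a comparable tuple of ints.

    Note: this naive parser assumes purely numeric components; the real code
    uses a proper Version class that also handles pre-release suffixes.
    """
    return tuple(int(part) for part in v.split(".")[:3])


def add_node_args(cuda_python_version: str, graph, dependencies, driver_params) -> tuple:
    """Return the positional argument tuple for a cuGraphAddNode-style call.

    CUDA 13 inserts an edgeData parameter (passed as None here) between
    `dependencies` and the dependency count; CUDA 12 does not accept it.
    """
    if parse_version(cuda_python_version) >= (13, 0, 0):
        return (graph, dependencies, None, len(dependencies), driver_params)
    return (graph, dependencies, len(dependencies), driver_params)


print(len(add_node_args("13.0.1", "graph", ["dep"], "params")))  # 5 args on CUDA 13
print(len(add_node_args("12.4.0", "graph", ["dep"], "params")))  # 4 args on CUDA 12
```

Gating on the declared package version rather than on `inspect.signature` makes the intent explicit and avoids silently matching an unrelated signature change in a future release.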