
remove language_modeling#14192

Merged
dimapihtar merged 72 commits into main from dpykhtar/remove_language_modelling
Oct 14, 2025

Conversation

@dimapihtar
Collaborator

@dimapihtar dimapihtar commented Jul 10, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Removes nlp/language_modelling.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: dimapihtar <[email protected]>
@github-actions github-actions bot added the NLP label Jul 10, 2025
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
@github-actions github-actions bot added the TTS label Jul 15, 2025
@dimapihtar dimapihtar marked this pull request as ready for review July 15, 2025 14:11

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module nemo.collections.nlp.modules.common.retro_inference_strategies begins an import cycle.
dimapihtar and others added 6 commits July 15, 2025 07:42
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Comment on lines +29 to +32
from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset import (
    _get_header_conversation_type_mask_role,
    get_prompt_template_example,
)

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'get_prompt_template_example' is not used.
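
For context, a notice like this can be approximated with Python's standard `ast` module. The sketch below is not CodeQL's implementation; the function name `unused_imports` and the sample module `pkg.mod` with its helpers are hypothetical, and the check only looks at top-level name usage in a single file.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in `source`.

    Hypothetical helper for illustration; it ignores `__all__`,
    string-based references, and re-exports.
    """
    tree = ast.parse(source)

    # Collect every name bound by an import statement.
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as x` binds `x`.
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno

    # Collect every bare name that appears anywhere in the module.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    return sorted(name for name in imported if name not in used)


# Sample input mirroring the shape of the flagged import (names are made up).
sample = (
    "from pkg.mod import (\n"
    "    helper_a,\n"
    "    helper_b,\n"
    ")\n"
    "\n"
    "result = helper_a()\n"
)
print(unused_imports(sample))
```

Because the source is only parsed, never executed, the sample module does not need to exist.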

Copilot Autofix

AI 8 months ago

To fix the problem:

  1. Remove the unused import get_prompt_template_example from the nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset module.
  2. Ensure that the removal does not affect the functionality of the code, as no references to get_prompt_template_example exist in the file.

Detailed steps:

  • Locate the import statement starting on line 29.
  • Remove the specific get_prompt_template_example from the import list while keeping any other imports intact (_get_header_conversation_type_mask_role).

Suggested changeset 1
nemo/collections/nlp/modules/common/text_generation_server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/nlp/modules/common/text_generation_server.py b/nemo/collections/nlp/modules/common/text_generation_server.py
--- a/nemo/collections/nlp/modules/common/text_generation_server.py
+++ b/nemo/collections/nlp/modules/common/text_generation_server.py
@@ -28,7 +28,6 @@
 try:
     from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset import (
         _get_header_conversation_type_mask_role,
-        get_prompt_template_example,
     )
 
     HAVE_NLP = True
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
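
The `try:` / `HAVE_NLP = True` lines in the patch context above are an import guard for an optional dependency, the same thing the pre-check list asks reviewers to verify. The `except` branch is not shown in the diff, so the sketch below assumes a typical shape; `try_import`, `require`, and the missing package name are hypothetical and not part of NeMo.

```python
import importlib


def try_import(module_name: str):
    """Return (module, True) if the module imports, else (None, False)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it lands here too.
        return None, False


def require(flag: bool, feature: str) -> None:
    """Fail at call time, not import time, when an optional dependency is absent."""
    if not flag:
        raise RuntimeError(f"{feature} is unavailable: optional dependency not installed")


# `json` stands in for an installed optional package.
json_mod, HAVE_JSON = try_import("json")

# Hypothetical package name that is assumed not to be installed.
missing_mod, HAVE_MISSING = try_import("hypothetical_optional_pkg_not_installed")

require(HAVE_JSON, "JSON support")  # passes silently
```

Deferring the failure to `require` keeps the module importable when the optional package is missing, which is the point of the guard.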
Signed-off-by: dimapihtar <[email protected]>
dimapihtar and others added 2 commits July 15, 2025 16:52
Signed-off-by: dimapihtar <[email protected]>
Collaborator

@chtruong814 chtruong814 left a comment


@dimapihtar this is all nemo1 code we're removing?

@chtruong814
Collaborator

@dimapihtar I think that last test is failing because the path for helpers.cpp file was renamed.

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
@dimapihtar
Collaborator Author

> @dimapihtar this is all nemo1 code we're removing?

No, nemo.nlp.modules is still left; it will be removed in a separate follow-up PR. Removing everything in a single PR is just too complicated.

@dimapihtar
Collaborator Author

> @dimapihtar I think that last test is failing because the path for helpers.cpp file was renamed.

It was failing because I forgot to move the Makefile along with helpers.cpp.

@github-actions
Contributor

[🤖]: Hi @dimapihtar 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc


Labels

CI, common, Multi Modal, NLP, r2.5.0 (cherry-pick label for the 2.5.0 release), TTS

4 participants