@trvachov
Collaborator

@trvachov trvachov commented Apr 11, 2025

Description

For the Geneformer documentation:

  1. Capitalization standardization:

    • Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace", "ReLU", "BERT MLM"
    • Corrected spelling of "Crohn's disease" (previously "Chron's disease")
    • Fixed "children" (previously "chidlren")
  2. Formatting improvements:

    • Properly formatted model version bullet points with nesting
    • Added proper headings for property categories
    • Fixed displayed values (e.g., ".5M" → "0.5M")
    • Standardized formatting of data collection/labeling methods sections
  3. Image captions:

    • Replaced low-quality image captions with descriptive, properly formatted titles
    • Made chart descriptions more professional and consistent
  4. Grammatical improvements:

    • Fixed article usage and punctuation
    • Improved sentence structure and clarity
    • Fixed section headings capitalization and consistency
  5. Fixed broken notes:

    • Corrected !! note to !!! note for proper rendering
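The `!! note` to `!!! note` correction in item 5 can also be checked mechanically. What follows is a minimal sketch, not the method used in this PR: it assumes the docs use admonitions that open with exactly three exclamation marks, and the helper name `fix_admonitions` is hypothetical.

```python
import re

# Assumption: a broken admonition is a line that starts (after optional
# indentation) with exactly two "!" followed by the admonition type.
# The (?!!) lookahead skips markers that already have three or more "!".
_BROKEN_ADMONITION = re.compile(
    r"^(?P<indent>[ \t]*)!!(?!!)[ \t]*(?P<kind>\w+)",
    re.MULTILINE,
)


def fix_admonitions(markdown_text: str) -> str:
    """Rewrite two-bang admonition openers to the three-bang form."""
    return _BROKEN_ADMONITION.sub(r"\g<indent>!!! \g<kind>", markdown_text)
```

Lines that already start with `!!!` are left untouched, so the helper can be run repeatedly on the same file.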

For the ESM-2 pretraining documentation:

  1. Grammar and clarity improvements:

    • Fixed article usage ("a ESM-2" → "an ESM-2")
    • Fixed formatting of numeric values (e.g., "1." → "1.0")
    • Fixed typos ("depreciation" → "deprecation")
    • Fixed "trainiing" → "training"
  2. Consistency in terminology:

    • Standardized "BioNeMo" capitalization
    • Ensured consistent treatment of "ESM-2" references
  3. Structure and formatting:

    • Improved spacing and paragraph breaks
    • Fixed section formatting and readability

For the training-models documentation:

  1. Capitalization and consistency:

    • Standardized capitalization of model sizes (8M, 650M, 3B)
    • Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
    • Changed "WandB" to "Weights and Biases" consistently
  2. Formatting improvements:

    • Changed code blocks consistently to include language tags
    • Added proper spacing and improved paragraph formatting
    • Fixed punctuation in lists and note sections
  3. Grammar and clarity:

    • Added missing commas after introductory phrases
    • Fixed formatting of lists for better readability
    • Made bulleted explanations more consistent
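The language-tag change under "Formatting improvements" lends itself to a quick check. The sketch below is illustrative only and was not part of this PR: the function name `untagged_fences` is hypothetical, and it assumes plain triple-backtick fences that are never nested (tilde fences and longer backtick runs are ignored).

```python
FENCE = "`" * 3  # a plain triple-backtick fence marker


def untagged_fences(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of opening fences that lack a language tag."""
    missing = []
    inside_block = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            # This fence closes the block that is currently open.
            inside_block = False
        else:
            # This fence opens a new block; a bare marker has no tag.
            inside_block = True
            if stripped == FENCE:
                missing.append(number)
    return missing
```

An empty result means every opening fence in the text carries a tag.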

Type of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Refactor
  • [x] Documentation update
  • [ ] Other (please describe):

CI Pipeline Configuration

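Configure CI behavior by applying the relevant labels:

  • SKIP_CI - Skip all continuous integration tests
  • INCLUDE_NOTEBOOKS_TESTS - Execute notebook validation tests in pytest
  • INCLUDE_SLOW_TESTS - Execute tests labelled as slow in pytest for extensive testing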

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.
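The two authorization rules above reduce to a small decision. The following sketch restates them in code under stated assumptions: the `PullRequest` type, its fields, and both function names are hypothetical, and how copy-pr-bot actually decides whether a user or a change is trusted is not described here, so both are taken as given inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical model of a pull request; the trust flags are assumed inputs.
    number: int
    author_trusted: bool
    changes_trusted: bool


def ci_branch(pr: PullRequest) -> str:
    """Name of the pull-request/ prefixed branch the code is copied to."""
    return f"pull-request/{pr.number}"


def needs_ok_to_test(pr: PullRequest) -> bool:
    """True when an NVIDIA org member must comment `/ok to test` to trigger CI."""
    return not (pr.author_trusted and pr.changes_trusted)
```

For example, `needs_ok_to_test(PullRequest(123, True, False))` is true because the changes are untrusted, and per the rule above the comment would be needed again for each new commit.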

Usage

TODO: Add code snippet

Pre-submit Checklist

  • [ ] I have tested these changes locally
  • [ ] I have updated the documentation accordingly
  • [ ] I have added/updated tests as needed
  • [ ] All existing tests pass successfully

@copy-pr-bot

copy-pr-bot bot commented Apr 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Collaborator

@jwilber jwilber left a comment

Added some small changes and approved:

  • models/geneformer.md: Changed "20M of 26M cells" to "20M of the 26M cells".
  • pretrain.md: Changed "Pott's model" to "Potts model" (no apostrophe). Added a comma after the word "tier" in "we are working on a free tier so a credit card...". Removed the redundant "training" after "pretraining" in "To load pretraining training and validation data with mapped UniRef90 sequences to UniRef50 clusters".
  • initialization-guide.md: Split one sentence into two: "The port number for a Jupyter Lab server, default port is 8888" -> "The port number for a Jupyter Lab server. The default port is 8888".
  • training-models.md: Fixed two wording issues: "context specific" -> "context-specific", "usecase" -> "use case".

@trvachov trvachov enabled auto-merge April 18, 2025 17:30
@kushshah1
Collaborator

kushshah1 commented Apr 21, 2025

@trvachov thanks for sharing this for my review. It looks like the text-related issues in the bug were fixed by the LLM, but issues with charts (outlined in point 2 of the bug report) are still remaining. What is the plan to fix these?

(Also just a note that it looks like the "!!! note" rendering issue is still there on the GitHub preview of the file - see screenshot - but don't know if this will be fine on the docs?)

Screenshot 2025-04-21 at 4 02 34 PM

@trvachov trvachov force-pushed the trvachov/docs-fix branch from 4a1b8eb to f86b91a Compare April 25, 2025 15:35
@trvachov
Collaborator Author

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Apr 25, 2025

/ok to test

@trvachov, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@trvachov trvachov force-pushed the trvachov/docs-fix branch from f86b91a to bfd1cee Compare April 25, 2025 15:54
@trvachov
Collaborator Author

/ok to test bfd1cee

@trvachov trvachov added this pull request to the merge queue Apr 25, 2025
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.39%. Comparing base (192e537) to head (bfd1cee).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #826      +/-   ##
==========================================
- Coverage   84.40%   84.39%   -0.02%     
==========================================
  Files         138      138              
  Lines        8685     8685              
==========================================
- Hits         7331     7330       -1     
- Misses       1354     1355       +1     

see 1 file with indirect coverage changes

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Apr 25, 2025
Signed-off-by: Timur Rvachov <[email protected]>
@trvachov trvachov force-pushed the trvachov/docs-fix branch from bfd1cee to 38cd661 Compare April 25, 2025 17:45
@trvachov trvachov enabled auto-merge April 25, 2025 17:45
@trvachov
Collaborator Author

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Apr 25, 2025

/ok to test

@trvachov, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@pstjohn
Collaborator

pstjohn commented Apr 25, 2025

/ok to test 38cd661

@trvachov trvachov added this pull request to the merge queue Apr 25, 2025
@trvachov trvachov removed this pull request from the merge queue due to a manual request Apr 25, 2025
trvachov and others added 2 commits April 25, 2025 16:27
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
@trvachov trvachov enabled auto-merge April 25, 2025 20:30
@trvachov
Collaborator Author

/ok to test 8e79b71

@trvachov trvachov added this pull request to the merge queue Apr 25, 2025
Merged via the queue into main with commit effc955 Apr 25, 2025
10 checks passed
@trvachov trvachov deleted the trvachov/docs-fix branch April 25, 2025 21:56
cspades pushed a commit that referenced this pull request May 4, 2025
### Description

### For the Geneformer documentation:

1. **Capitalization standardization**:
   - Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace", "ReLU", "BERT MLM"
   - Corrected spelling of "Crohn's disease" (previously "Chron's disease")
   - Fixed "children" (previously "chidlren")

2. **Formatting improvements**:
   - Properly formatted model version bullet points with nesting
   - Added proper headings for property categories
   - Fixed displayed values (e.g., ".5M" → "0.5M")
   - Standardized formatting of data collection/labeling methods sections

3. **Image captions**:
   - Replaced low-quality image captions with descriptive, properly formatted titles
   - Made chart descriptions more professional and consistent

4. **Grammatical improvements**:
   - Fixed article usage and punctuation
   - Improved sentence structure and clarity
   - Fixed section headings capitalization and consistency

5. **Fixed broken notes**:
   - Corrected `!! note` to `!!! note` for proper rendering

### For the ESM-2 pretraining documentation:

1. **Grammar and clarity improvements**:
   - Fixed article usage ("a ESM-2" → "an ESM-2")
   - Fixed formatting of numeric values (e.g., "1." → "1.0")
   - Fixed typos ("depreciation" → "deprecation")
   - Fixed "trainiing" → "training"

2. **Consistency in terminology**:
   - Standardized "BioNeMo" capitalization
   - Ensured consistent treatment of "ESM-2" references

3. **Structure and formatting**:
   - Improved spacing and paragraph breaks
   - Fixed section formatting and readability

### For the training-models documentation:

1. **Capitalization and consistency**:
   - Standardized capitalization of model sizes (8M, 650M, 3B)
   - Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
   - Changed "WandB" to "Weights and Biases" consistently

2. **Formatting improvements**:
   - Changed code blocks consistently to include language tags
   - Added proper spacing and improved paragraph formatting
   - Fixed punctuation in lists and note sections

3. **Grammar and clarity**:
   - Added missing commas after introductory phrases
   - Fixed formatting of lists for better readability
   - Made bulleted explanations more consistent

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [x]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:

- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the notebooks validation tests are skipped unless explicitly enabled.

#### Authorizing CI Runs

We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.

* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.

### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [ ] I have tested these changes locally
 - [ ] I have updated the documentation accordingly
 - [ ] I have added/updated tests as needed
 - [ ] All existing tests pass successfully

---------

Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
farhadrgh pushed a commit that referenced this pull request May 5, 2025
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
camirr-nv pushed a commit that referenced this pull request Jun 26, 2025
Signed-off-by: Ubuntu <[email protected]>