Commit b6d8016

[https://nvbugs/5501557][fix] Fix nemotron build error

* Root cause: the Nemotron-nas model includes some no-op attention layers.

Signed-off-by: Wanli Jiang <[email protected]>
1 parent 8197c50 commit b6d8016

File tree

1 file changed: +2 -2 lines changed


cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp

Lines changed: 2 additions & 2 deletions
@@ -272,8 +272,8 @@ TrtGptModelInflightBatching::TrtGptModelInflightBatching(std::shared_ptr<nvinfer
         auto [numKvHeadsPerLayerBegin, numKvHeadsPerLayerEnd] = modelConfig.getNumKvHeadsPerLayerLocalRange(
             worldConfig.getPipelineParallelism(), worldConfig.getPipelineParallelRank(), isCrossAttention);
         auto numKvHeadsPerLayer = std::vector<SizeType32>(numKvHeadsPerLayerBegin, numKvHeadsPerLayerEnd);
-        auto windowSizeLayers
-            = BaseKVCacheManager::groupLayersByWindowSize(maxAttentionWindowVec, modelConfig.getNbLayers());
+        auto const numLayers = static_cast<SizeType32>(numKvHeadsPerLayer.size());
+        auto const windowSizeLayers = KVCacheManager::groupLayersByWindowSize(maxAttentionWindowVec, numLayers);
         std::map<SizeType32, SizeType32> cacheSizeBytesPerTokenPerWindow;
         for (auto const& [windowSize, managedLayers] : windowSizeLayers)
         {
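
For context, below is a minimal standalone sketch (not the TensorRT-LLM implementation) of how a groupLayersByWindowSize-style helper could bucket layers by attention-window size. The cyclic window assignment, the example layer counts, and the main driver are assumptions for illustration only; the point the fix relies on is that numLayers must match the KV-head list, since Nemotron-nas's no-op attention layers contribute no KV heads and counting all model layers (as modelConfig.getNbLayers() did) would overcount.

    // Sketch only: a plausible bucketing of layers by attention-window size.
    // The cyclic assignment policy below is an assumption, not TensorRT-LLM's code.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <vector>

    using SizeType32 = std::int32_t;

    std::map<SizeType32, std::vector<SizeType32>> groupLayersByWindowSize(
        std::vector<SizeType32> const& maxAttentionWindowVec, SizeType32 numLayers)
    {
        std::map<SizeType32, std::vector<SizeType32>> windowSizeLayers;
        for (SizeType32 layerIdx = 0; layerIdx < numLayers; ++layerIdx)
        {
            // Hypothetical policy: cycle the window vector across the layers.
            auto const windowSize = maxAttentionWindowVec[layerIdx % maxAttentionWindowVec.size()];
            windowSizeLayers[windowSize].push_back(layerIdx);
        }
        return windowSizeLayers;
    }

    int main()
    {
        // Suppose a Nemotron-nas-like stack has 6 layers, but only 4 are real
        // attention layers with KV heads (hypothetical numbers). Grouping must
        // run over 4 layers, not 6, so no window is assigned to a no-op layer.
        std::vector<SizeType32> numKvHeadsPerLayer{8, 8, 8, 8}; // one entry per real attention layer
        std::vector<SizeType32> maxAttentionWindowVec{4096, 1024};

        auto const numLayers = static_cast<SizeType32>(numKvHeadsPerLayer.size());
        for (auto const& [windowSize, layers] : groupLayersByWindowSize(maxAttentionWindowVec, numLayers))
        {
            std::cout << "window " << windowSize << " -> " << layers.size() << " layers\n";
        }
        return 0;
    }

Compiled as C++17, this prints "window 1024 -> 2 layers" and "window 4096 -> 2 layers" under the cyclic assumption above; had numLayers been 6, two extra layer indices with no KV heads would have been bucketed as well.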
