refactor mixtral moe block. #1635

Merged

regisss merged 2 commits into huggingface:v1.15-release from lkk12014402:update_mixtral_moe_block on Dec 20, 2024

Conversation

@lkk12014402 (Contributor)
What does this PR do?

Fixes the MoE block forward regression for training introduced by #1511.

@lkk12014402 lkk12014402 requested a review from regisss as a code owner December 19, 2024 07:01
@lkk12014402 (Contributor Author)

This PR fixes the segmentation fault caused by the DynamicMoE path introduced in #1511 when training the Mixtral model.
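For readers unfamiliar with what the block computes: the refactor only changes how tokens are dispatched to experts, not the numerical contract. Below is a framework-agnostic sketch (NumPy, illustrative names only — not the actual optimum-habana implementation) of Mixtral-style top-2 routing written as a static loop over experts, the shape-stable pattern that avoids dynamic expert dispatch:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_block_forward(hidden, gate_w, experts, top_k=2):
    """Dense loop-over-experts MoE forward (no dynamic dispatch).

    hidden:  (tokens, dim) input activations
    gate_w:  (dim, num_experts) router weights
    experts: list of callables, each (tokens, dim) -> (tokens, dim)
    """
    logits = hidden @ gate_w                       # (tokens, num_experts)
    probs = softmax(logits)
    # top-k expert indices per token
    topk_idx = np.argsort(probs, axis=-1)[:, -top_k:]
    # renormalize the selected routing weights so they sum to 1 per token
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    topk_w = topk_w / topk_w.sum(axis=-1, keepdims=True)

    out = np.zeros_like(hidden)
    # static loop: every expert runs with a fixed-shape weighting,
    # so tensor shapes never depend on the routing decision
    for e_id, expert in enumerate(experts):
        # per-token weight for this expert (0 if not in the token's top-k)
        w = (topk_w * (topk_idx == e_id)).sum(axis=-1, keepdims=True)
        out += w * expert(hidden)
    return out

# sanity check: with identical (identity) experts, the renormalized
# routing weights sum to 1 per token, so the block reduces to identity
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
router = rng.standard_normal((8, 3))
out = moe_block_forward(tokens, router, [lambda x: x] * 3)
```

The trade-off shown here is the one relevant to the PR: a static loop does redundant compute but keeps all shapes fixed, which is friendlier to graph-compiled backends than data-dependent dispatch.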

@lkk12014402 (Contributor Author)

Training:

DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python ../gaudi_spawn.py --world_size 4 --use_deepspeed sft.py \
    --model_name_or_path mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --dataset_name "philschmid/dolly-15k-oai-style" \
    --subset 'data/' \
    --streaming False \
    --deepspeed ../language-modeling/llama2_ds_zero3_config.json \
    --output_dir="./model_mixtral" \
    --do_train \
    --max_steps=500 \
    --logging_steps=10 \
    --save_steps=100 \
    --per_device_train_batch_size=2 \
    --per_device_eval_batch_size=1 \
    --gradient_accumulation_steps=2 \
    --learning_rate=1e-4 \
    --lr_scheduler_type="cosine" \
    --warmup_steps=100 \
    --weight_decay=0.05 \
    --optim="paged_adamw_32bit" \
    --lora_target_modules "q_proj" "v_proj" \
    --bf16 \
    --remove_unused_columns=False \
    --max_seq_length 512 \
    --run_name="sft_mixtral" \
    --report_to=none \
    --use_habana \
    --use_lazy_mode

[screenshot: training output]

@lkk12014402 (Contributor Author)

Inference:

QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py --model_name_or_path "mistralai/Mixtral-8x7B-Instruct-v0.1" --use_hpu_graphs --use_kv_cache --limit_hpu_graphs --bucket_size 128 --max_new_tokens 128 --batch_size 1 --bf16

QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json python run_generation.py --model_name_or_path "mistralai/Mixtral-8x7B-Instruct-v0.1" --use_hpu_graphs --use_kv_cache --limit_hpu_graphs --bucket_size 128 --max_new_tokens 128 --batch_size 1 --bf16

[screenshot: quantized inference output]

python run_generation.py --model_name_or_path "mistralai/Mixtral-8x7B-Instruct-v0.1" --use_hpu_graphs --use_kv_cache --limit_hpu_graphs --bucket_size 128 --max_new_tokens 512 --batch_size 4 --bf16

[screenshot: bf16 inference output]

@libinta added the run-test label (Run CI for PRs from external contributors) on Dec 19, 2024
@regisss merged commit c8abbca into huggingface:v1.15-release on Dec 20, 2024
12010486 added a commit to 12010486/optimum-habana that referenced this pull request Dec 20, 2024
regisss pushed a commit that referenced this pull request Dec 23, 2024
zzhang37 pushed a commit to zzhang37/optimum-habana that referenced this pull request Jan 7, 2025
huijuanzh pushed a commit to huijuanzh/optimum-habana that referenced this pull request Jan 7, 2025
Liangyx2 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 20, 2025
xinyu-intel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 4, 2025
* Add flag to run inference with partial dataset (huggingface#1420)

* Add peft generation example (huggingface#1427)

* Upgrade to SynapseAI 1.18.0 (huggingface#1418)

* Simplify HQT config files (huggingface#1219)

* unify_measurements.py script support to unify PCQ 70B 8x (huggingface#1322)

* Add misc. training args (huggingface#1346)

* Add quantization config for low bs case (huggingface#1377)

* Remove HQT from OHF (huggingface#1257)

Co-authored-by: Adam Stachowicz <[email protected]>
Co-authored-by: Adam Stachowicz <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>

* Load INC GPTQ checkpoint & rename params (huggingface#1364)

Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>

* Enable FusedSDPA fp8 in Llama FT (huggingface#1388)

Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>

* Valid sequence length for sdpa (huggingface#1183)

Co-authored-by: Harish <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: regisss <[email protected]>

* Multiple fixes (dynamo graph break, qwen-moe, multicard) (huggingface#1410)

* datasets downgrade version to 2.21.0 (huggingface#1413)

* Update ci sentence_transformer.sh (huggingface#1424)

* Fix load INC load weights compile error due to Transformer 4.45 upgrade.  (huggingface#1421)

* Update language-modeling README.md, add trust_remote_code for flan-t5-xl (huggingface#1422)

* Update unify_measurements.py support info (huggingface#1425)

* GPT2 torch.compile fix (huggingface#1434)

* Added missing allocate_kv_cache() call in CausalLM class (huggingface#1431)

* Fix merge error and update text-to-speech readme (huggingface#1436)

* Fix OOM error for code llama (huggingface#1437)

* Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 (huggingface#1439)

* Fix scoped linear all-reduce for starcoder model (huggingface#1432)

* Fixed recursion error in SentenceTransformer (huggingface#1428)

* Fix Llama 3.1 generation (huggingface#1444)

* Update text-gen README.md to add auto-gptq fork install steps (huggingface#1442)

* Added gemma specific fp8 quantization file (huggingface#1445)

* Remove cache folder from image data folder (huggingface#1446)

Co-authored-by: regisss <[email protected]>

* Bump dev version

* Enable DeepSpeed for image-to-text example (huggingface#1455)

* Fix bug when loading 4bit checkpoint quantized in INC (huggingface#1447)

* Fixes 'Tokenizer does not have padding token' introduced by  huggingface#1444 for Llama3.1 (huggingface#1457)

* Fix facebook/hf-seamless-m4t-medium crash (huggingface#1433)

Signed-off-by: Wang, Yi A <[email protected]>

* Fix bias update in scoped all reduce (huggingface#1456)

* Added skip for unsuported tests for mistral/mixtral (huggingface#1462)

* Update sentence transformer to v3.2.1 (huggingface#1470)

* Optimized inference of Cohere model on HPU (huggingface#1329)

Signed-off-by: Ye, Xinyu <[email protected]>

* Idefics2 (huggingface#1270)

Signed-off-by: Wang, Yi A <[email protected]>

* Remove deprecated Mixed precision flags (huggingface#1471)

Change-Id: I1c2e2460dc2072ba7b311f239441b304694918c8

* Optimized inference of XGLM model on HPU (huggingface#1323)

Signed-off-by: Ye, Xinyu <[email protected]>

* Add mllama support (huggingface#1419)

Signed-off-by: Wang, Yi A <[email protected]>

* Enable flash attention for gemma (huggingface#1454)

* Readme: replace tabs with spaces (huggingface#1485)

* Move fast tests to Gaudi2 (huggingface#1498)

* Support loading 4 bit Qwen2 (huggingface#1476)

Signed-off-by: Mengni Wang <[email protected]>

* Add textual inversion XL for Gaudi (huggingface#868)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>

* Remove torch req from LM example (huggingface#1491)

* Remove keep_input_mutations (huggingface#1492)

* Fix trust_remote_code (huggingface#1493)

* Upgrade ViT README with torch.compile (huggingface#1494)

* Tests for text gen output text (huggingface#1411)

* Corrected Throughput measure for GaudiDDPMPipeline (huggingface#1460)

* Fix text generation test

* Add G3 in T5-L README (huggingface#1523)

* Fix tuple object error (huggingface#1354)

* Add warmup time and compile time log for the eval/prediction.  (huggingface#1489)

* Fix style

* Enable `paligemma` model for image-to-text example (huggingface#1407)

Signed-off-by: Liu, Kaixuan <[email protected]>
Co-authored-by: regisss <[email protected]>

* Add support for MLPERF optimized pipeline from example (huggingface#1465)

Co-authored-by: sushil dubey <[email protected]>

* Enable Gemma2 Inference on Gaudi (huggingface#1504)

Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: Mengni Wang <[email protected]>
Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: billishyahao <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>
Co-authored-by: Seunghyuk Park (shepark) <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Sun Choi <[email protected]>
Co-authored-by: xinhe <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Soila Kavulya <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>
Co-authored-by: ZhengHongming888 <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: Vivek Goel <[email protected]>
Co-authored-by: Akihiro Takahashi <[email protected]>
Co-authored-by: Miroslav Goncharenko <[email protected]>
Co-authored-by: Wang, Mengni <[email protected]>
Co-authored-by: Daniel Socek <[email protected]>
Co-authored-by: Adam Stachowicz <[email protected]>
Co-authored-by: Vidya Galli <[email protected]>
Co-authored-by: deepak-gowda-narayana <[email protected]>

* Add check_neural_compressor_min_version for 4 bit behavior (huggingface#1500)

Signed-off-by: Xin <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* Fixed Gemma FP8 flash_attention lower throughput issue (huggingface#1510)

* Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer (huggingface#1515)

Co-authored-by: Marcin Łapiński <[email protected]>

* Removed workaround for NaN bug causing graph break. (huggingface#1516)

Co-authored-by: Marcin Łapiński <[email protected]>

* Disable default sdpa in Albert (#22) (huggingface#1517)

Co-authored-by: Urszula Golowicz <[email protected]>

* Implement fused sdpa for wav2vec2 (#18) (huggingface#1520)

* Memory optimization for gpt_bitcode (#4) (huggingface#1513)

Co-authored-by: Urszula Golowicz <[email protected]>

* text_generation: improve parameters check (huggingface#1527)

* transformers: fixed some typos (huggingface#1528)

* Update DeepSpeed CI baselines

* Update FSDP CI baseline

* Optimum-Habana docs re-org (huggingface#1488)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Greg Serochi <[email protected]>
Co-authored-by: Kiangpeng Lau <[email protected]>
Co-authored-by: Seethong Vang <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Anastasia Uvarova <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Chen Levkovich <[email protected]>
Co-authored-by: Libin Tang <[email protected]>

* Makes the with_stack of the profiler changeable (huggingface#1497)

* FLUX with diffusers 0.31.0 (huggingface#1450)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Baochen Yang <[email protected]>
Co-authored-by: Huijuan Zhou <[email protected]>
Co-authored-by: Sergey Plotnikov <[email protected]>
Co-authored-by: Deepak Narayana <[email protected]>
Co-authored-by: regisss <[email protected]>

* Fix some CI baselines

* Add split runners to CI (2 devices per runner for fast tests)

* Fix fast CI to work with split runners (huggingface#1534)

* Fix dtype issue with valid sequence length in torch.compile bs=1 (huggingface#1532)

* Support beam search with reuse_cache and bucket_internal (huggingface#1472)

* Add mixtral trl sft (huggingface#1349)

* Enable tiiuae/falcon-11B-vlm in image_to_text example (huggingface#1490)

Signed-off-by: Wang, Yi A <[email protected]>

* Add Llama 3.1 ft to CI (huggingface#1529)

* Migrate OH CLIP (roberta-clip) training to torch.compile (huggingface#1507)

* test_text_generation: fix non-Gaudi2 case (huggingface#1530)

* text-generation: improve output printing (huggingface#1486)

* Text-generation, model set-up: torch.compile for attributes instead of models' types (huggingface#1452)

* FLUX Fine-Tuning for Gaudi (huggingface#1482)

Signed-off-by: Daniel Socek <[email protected]>

* Enable fusedsdpa kernel for vision part of mllama (huggingface#1531)

Signed-off-by: Wang, Yi A <[email protected]>

* Minicpm enabling (huggingface#1342)

Signed-off-by: Daniel Huang <[email protected]>

* Fix bridgetower example (#312) (huggingface#1481)

* Migrate OH Wave2Vec-AC training to torch.compile - README update (huggingface#1537)

Co-authored-by: Chaojun Zhang <[email protected]>

* Flux Image-To-Image pipeline (huggingface#1524)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>

* Enable Falcon-mamba (huggingface#1480)

Signed-off-by: yuanwu <[email protected]>
Co-authored-by: regisss <[email protected]>

* Enable dynamic compile for mpi(training) (huggingface#1509)

* Migrate OH T5-large training to torch.compile (huggingface#1506)

* Add support for Baichuan2 (huggingface#1479)

Signed-off-by: Haihao Xiang <[email protected]>
Co-authored-by: Jianqian Zhou <[email protected]>
Co-authored-by: Wei Lin <[email protected]>

* trainer: fixed spelling (huggingface#1538)

* Create CI Eager/Lazy for Language Modeling (huggingface#1448)

* Fixes for llava-next test failures in 1.19 (huggingface#1535)

Co-authored-by: regisss <[email protected]>

* Enable DeepSeek-V2 (huggingface#1475)

Signed-off-by: Matrix YAO <[email protected]>

* Refactor Qwen2 Family (huggingface#1541)

* Add support for optimized SDXL pipeline (huggingface#1519)

* Make style

* Add the checkout parameters of falcon-mamba pytest (huggingface#1540)

Signed-off-by: yuanwu <[email protected]>
Co-authored-by: regisss <[email protected]>

* Avoid negative values in eval metrics (huggingface#1533)

* Fixes in unify_measurements (huggingface#1496)

Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Eran Geva <[email protected]>

* Fix lm_eval script for starcoder and gemma (huggingface#1463)

* Add option to use bf16 in PT sdp (#5) (huggingface#1514)

Co-authored-by: Urszula Golowicz <[email protected]>

* Fix tests.test_peft_inference failure (huggingface#1543)

Signed-off-by: Wang, Yi A <[email protected]>

* [wav2vec2] Remove tensor.item and dynamic slicing operations in the loop that cause graph break (huggingface#1508)

* Update lm_eval version (huggingface#1473)

Co-authored-by: regisss <[email protected]>

* Fix bad import in Baichuan code (huggingface#1547)

* Restore performance in generate (huggingface#1546)

Signed-off-by: Urszula Golowicz <[email protected]>
Co-authored-by: Marcin Łapiński <[email protected]>
Co-authored-by: Adam Stachowicz <[email protected]>

* Enable pyTorch-IMage-Models (TIMM) with HPUs (huggingface#1459)

Co-authored-by: regisss <[email protected]>

* Add HF login for 8x Gaudi2 CI

* Adding support for Context Parallelism using Deepseed's DistributedAttention (huggingface#1501)

Co-authored-by: regisss <[email protected]>

* Fix Llama CI

* Fix Llama CI

* Add DynamicMoE support for Mixtral (huggingface#1511)

Co-authored-by: Adam Stachowicz <[email protected]>

* Fix for llava models not generating text with test failures in 1.19 (huggingface#1548)

* Refactor KV cache, Rope  , reduce common code  (huggingface#1148)

Co-authored-by: regisss <[email protected]>

* Adjust Qwen2-7B test case (huggingface#1551)

* [run_lm_eval.py] Fixed too many print dump json info (huggingface#1553)

Signed-off-by: Focus Luo <[email protected]>

* Fix for single_card llama7b and falcon40b CI errors (huggingface#1549)

* Implemented fusedSDPA for stable diffusion (#36) (huggingface#1545)

Co-authored-by: Yixiu Chen <[email protected]>
Co-authored-by: Libin Tang <[email protected]>

* Apply --sdp_on_bf16 to image-to-text examples (huggingface#1557)

* Fix accuracy regression in Gemma (huggingface#1556)

* Fix FusedSDPA wrapper from TransformerEngine (huggingface#1562)

* Run albert-xxlarge-v1 CI as torch.compile mode (huggingface#1563)

* Update README commands for the models to use --sdp_on_bf16 (huggingface#1566)

* Minicpm patch (huggingface#1567)

Signed-off-by: Daniel Huang <[email protected]>

* Updated gemma_2b_it CI (huggingface#1561)

Co-authored-by: regisss <[email protected]>

* Fixed Adalora Test for OH 1.15 (huggingface#1564)

* Fixed LORACP Test for OH 1.15 (huggingface#1568)

* Add requirements.txt

* Update the baseline for 1.18 to reflect performance in 1.19 (huggingface#1571)

* Fix prefix llama ci failure (huggingface#1570)

Signed-off-by: Wang, Yi A <[email protected]>

* fusedsdpa for stable diffusion xl (huggingface#1565)

Co-authored-by: regisss <[email protected]>

* Add sdp_on_bf16 to tests,text-gen (huggingface#1559)

* Fix mllama test (huggingface#1569)

Signed-off-by: Wang, Yi A <[email protected]>

* Fix lazy_mode assignment (huggingface#1558)

Co-authored-by: Yaser Afshar <[email protected]>

* Fix diffusers import (huggingface#1574)

* Update README commands for more models to use --sdp_on_bf16 (huggingface#1575)

Co-authored-by: Libin Tang <[email protected]>

* Generation utils update (minor) (huggingface#1468)

* style: removed tabs (huggingface#1577)

* Add chatglm (huggingface#1478)

Co-authored-by: Wei Lin <[email protected]>
Co-authored-by: Jianqian Zhou <[email protected]>
Co-authored-by: Leo Zhao <[email protected]>

* Enable num_return_sequences in beam search (huggingface#1536)

* gpt_bigcode: added internal bucketing fix (huggingface#1526)

* Update the Gaudi trainer with transformers 4.45.2 (huggingface#1398)

* Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578)

* Revert PR huggingface#1473 (huggingface#1582)

* Remove deprecated env variables

* Add sdp_on_bf16 argument to CI for run_image2text_lora_finetune and a… (huggingface#1585)

* Remove unnecessary neural compressor fix for 1.19 release (huggingface#1584)

* Make style

* Fixed spelling (huggingface#1576)

* Update docs for baichuan2 training (huggingface#1586)

* Adjust bert and roberta targets (huggingface#1588)

* Update text-gen readme for autogptq (huggingface#1589)

* Update README to Include Information on Performance Degradation and Mitigation Options (huggingface#1555)

* Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591)

* Readme update for llama-405B (huggingface#1587)

Co-authored-by: Mohit Sinha <[email protected]>
Co-authored-by: Seunghyuk Park (shepark) <[email protected]>
Co-authored-by: regisss <[email protected]>

* Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590)

* Add sdp_on_bf16 option to diffusers and image/audio classicifation tests (huggingface#1592)

* Update transformers tests generation util v4.45.2 (huggingface#1441)

Co-authored-by: Gustavo <gustavo.malkomes>
Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: regisss <[email protected]>

* Update README.md (huggingface#1595)

* Limit position embeddings in inference (huggingface#1598)

Co-authored-by: Adam Stachowicz <[email protected]>

* Verify model output is provided when check_output is enabled (huggingface#1597)

* Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 (huggingface#1596)

Signed-off-by: Wang, Yi A <[email protected]>

* Revert common KVCache not to check token_idx (huggingface#1594)

* Update language-modeling README file (huggingface#1599)

Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: regisss <[email protected]>

* Update readme for audio-classification example (huggingface#1602)

* SDPA flag update - static code analysis (huggingface#1601)

* Remove unwanted merged changes in SD pipeline

* Revert LlamaKVCache due to memory increase (huggingface#1605)

* Check rope_scaling attr (huggingface#1609)

* skip certain tests for G1 with empty param list (huggingface#1613)

* Revert "Update transformers tests generation util v4.45.2 (huggingface#1441)" (huggingface#1614)

This reverts commit 2ba520a.

* audio classification readme update (huggingface#1604)

* fix readme cmds for clip-roberta (huggingface#1603)

* fix readme cmds for clip-roberta

* comments and cleanup

* Fix run_generation test commands for TRL out usage example (huggingface#1624)

Fix run_generation example

* Add arbitrary scales (#15) (huggingface#1625)

Co-authored-by: Linoy Buchnik <[email protected]>

* Modify Qwen2 TRL command to avoid OOM.  (huggingface#1630)

Add --use_flash_attention to avoid OOM for Qwen2

* Replace the UNET custom attention processors (huggingface#1608)

Co-authored-by: Iman Gohari <[email protected]>

* Falcon Model Support (huggingface#1612)

Co-authored-by: leopck <[email protected]>
Co-authored-by: regisss <[email protected]>

* Update sdp_on_bf16 option for ST example (huggingface#1615)

* Update save lora weights for diffusers with text_encoder_2 layers (huggingface#1626)

* Fix `save_lora_weights` in `pipeline_utils.py` (huggingface#1643)

* Refactor mixtral moe block. (huggingface#1635)

* speech-recognition: downgrade datasets version (huggingface#1646)

* add sdp_on_bf16 to controlnet (huggingface#1631)

* add sdp_on_bf16 to controlnet

* Update pipeline_controlnet.py

pass sdp_on_bf16 to controlnet_pipeline

* Update text_to_image_generation.py

* Update text_to_image_generation.py

* Quick fix for quantization/custom op list loading (huggingface#1657)

Signed-off-by: Daniel Socek <[email protected]>

* Update multi-node test dockerfile (huggingface#1662)

* Fixes on OH 1.15 pre release (huggingface#1661)

Co-authored-by: regisss <[email protected]>

* Fix distributed issue for ST Trainer (huggingface#1649)

* Fix distributed issue for timm (huggingface#1653)

Co-authored-by: regisss <[email protected]>

* Added missing parameter for llama function call (huggingface#1663)

Co-authored-by: Libin Tang <[email protected]>

* Add reuse_cache for llama3-405b measurement (huggingface#1664)

* Update EFA dockerfile to SynapseAI 1.19.0 (huggingface#1665)

Co-authored-by: Libin Tang <[email protected]>

* Fix bug for GaudiMixtralAttentionLongSequence forward (huggingface#1650)

Signed-off-by: kaixuanliu <[email protected]>

* Update to SynapseAI v1.19

* Release: v1.15.0

* Fix style

* save_model - incorrect conflict resolution

* Fix style

---------

Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: Mengni Wang <[email protected]>
Signed-off-by: Daniel Socek <[email protected]>
Signed-off-by: Liu, Kaixuan <[email protected]>
Signed-off-by: Xin <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: Daniel Huang <[email protected]>
Signed-off-by: yuanwu <[email protected]>
Signed-off-by: Haihao Xiang <[email protected]>
Signed-off-by: Matrix YAO <[email protected]>
Signed-off-by: Urszula Golowicz <[email protected]>
Signed-off-by: Focus Luo <[email protected]>
Signed-off-by: kaixuanliu <[email protected]>
Co-authored-by: Pramod Kumar <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Roi Tiefenbrunn <[email protected]>
Co-authored-by: Yan Tomsinsky <[email protected]>
Co-authored-by: Konrad Drozd <[email protected]>
Co-authored-by: Uri Livne <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>
Co-authored-by: Danny Semiat <[email protected]>
Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Piotr Bielak <[email protected]>
Co-authored-by: Sayantan Sarkar <[email protected]>
Co-authored-by: Harish <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: ZhengHongming888 <[email protected]>
Co-authored-by: Jimin Ha <[email protected]>
Co-authored-by: Seunghyuk Park (shepark) <[email protected]>
Co-authored-by: Dmitry <[email protected]>
Co-authored-by: Soila Kavulya <[email protected]>
Co-authored-by: Sun Choi <[email protected]>
Co-authored-by: xinhe <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: Vivek Goel <[email protected]>
Co-authored-by: Akihiro Takahashi <[email protected]>
Co-authored-by: Miroslav Goncharenko <[email protected]>
Co-authored-by: Wang, Mengni <[email protected]>
Co-authored-by: Daniel Socek <[email protected]>
Co-authored-by: Vidya Galli <[email protected]>
Co-authored-by: deepak-gowda-narayana <[email protected]>
Co-authored-by: Supreet Singh <[email protected]>
Co-authored-by: kaixuanliu <[email protected]>
Co-authored-by: ANSHUMAN TRIPATHY <[email protected]>
Co-authored-by: sushil dubey <[email protected]>
Co-authored-by: Luca Calabria <[email protected]>
Co-authored-by: billishyahao <[email protected]>
Co-authored-by: xinhe3 <[email protected]>
Co-authored-by: KP (Edwin) Lau <[email protected]>
Co-authored-by: Marcin Łapiński <[email protected]>
Co-authored-by: Urszula Golowicz <[email protected]>
Co-authored-by: Greg Serochi <[email protected]>
Co-authored-by: Seethong Vang <[email protected]>
Co-authored-by: Anastasia Uvarova <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Chen Levkovich <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: ranzhejiang <[email protected]>
Co-authored-by: Baochen Yang <[email protected]>
Co-authored-by: Huijuan Zhou <[email protected]>
Co-authored-by: Sergey Plotnikov <[email protected]>
Co-authored-by: Deepak Narayana <[email protected]>
Co-authored-by: Witold Szczurek <[email protected]>
Co-authored-by: Wei Lin <[email protected]>
Co-authored-by: lkk <[email protected]>
Co-authored-by: Chaojun Zhang <[email protected]>
Co-authored-by: Daniel Huang <[email protected]>
Co-authored-by: Yuan Wu <[email protected]>
Co-authored-by: Xiang, Haihao <[email protected]>
Co-authored-by: Jianqian Zhou <[email protected]>
Co-authored-by: Wei Lin <[email protected]>
Co-authored-by: Thanaji Rao Thakkalapelli <[email protected]>
Co-authored-by: Yao Matrix <[email protected]>
Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Eran Geva <[email protected]>
Co-authored-by: Alexey Belyakov <[email protected]>
Co-authored-by: Bhargav <[email protected]>
Co-authored-by: Krzysztof Wiśniewski <[email protected]>
Co-authored-by: Abhilash Majumder <[email protected]>
Co-authored-by: FocusLuo <[email protected]>
Co-authored-by: Yixiu Chen <[email protected]>
Co-authored-by: Nariman Piroozan <[email protected]>
Co-authored-by: Edward Mascarenhas <[email protected]>
Co-authored-by: Shiv Kaul <[email protected]>
Co-authored-by: bmengke <[email protected]>
Co-authored-by: Leo Zhao <[email protected]>
Co-authored-by: Mohit Sinha <[email protected]>
Co-authored-by: Harshvardhan Chauhan <[email protected]>
Co-authored-by: Gustavo Malkomes <[email protected]>
Co-authored-by: Linoy Buchnik <[email protected]>
Co-authored-by: Alexey Fadeev <[email protected]>
Co-authored-by: leopck <[email protected]>