Xin Yan (thomas-yanxin)

thomas-yanxin synced commits to develop at thomas-yanxin/PaddleMIX from mirror

23 hours ago

thomas-yanxin synced commits to update-tokenizers-version at thomas-yanxin/transformers from mirror

1 day ago

thomas-yanxin synced commits to mi300-ci at thomas-yanxin/transformers from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/transformers from mirror

  • df96438484 Fix missing `prev_ci_results` (#30313) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  • ce8e64fbe2 Dev version
  • 5728b5ad00 FIX: Fixes unexpected behaviour for Llava / Llama & AWQ Fused modules + revert #30070 at the same time (#30317)
    * Update awq.py
    * style
    * revert felix PR
    * fix
    * add felix comments
  • 005b957fb8 Add DBRX Model (#29921)
    * wip
    * fix __init__.py
    * add docs
    * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * address comments 1
    * work on make fixup
    * pass configs down
    * add sdpa attention
    * remove DbrxBlock
    * add to configuration_auto
    * docstring now passes formatting test
    * fix style
    * update READMEs
    * add dbrx to modeling_auto
    * make fix-copies generated this
    * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
    * config docstring passes formatting test
    * rename moe_loss_weight to router_aux_loss_coef
    * add to flash-attn documentation
    * fix model-path in tests
    * Explicitly make `"silu"` the default `ffn_act_fn` Co-authored-by: Wing Lian <wing.lian@gmail.com>
    * default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
    * fix _flash_attn_uses_top_left_mask and is_causal
    * fix tests path
    * don't use token type IDs
    * follow Llama and remove token_type_ids from test
    * init ConfigTester differently so tests pass
    * remove multiple choice test
    * remove question + answer test
    * remove sequence classification test
    * remove token classification test
    * copy Llama tests and remove token_type_ids from test inputs
    * do not test pruning or headmasking; style code
    * add _tied_weights_keys parameter to pass test
    * add type hints
    * fix type check
    * update config tester
    * remove masked_lm test
    * remove encoder tests
    * initialize DbrxModelTester with correct params
    * style
    * torch_dtype does not rely on torch
    * run make fixup, fix-copies
    * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
    * add copyright info
    * fix imports and DbrxRotaryEmbedding
    * update DbrxModel docstring
    * use copies
    * change model path in docstring
    * use config in DbrxFFN
    * fix flashattention2, sdpaattention
    * input config to DbrxAttention, DbrxNormAttentionNorm
    * more fixes
    * fix
    * fix again!
    * add informative comment
    * fix ruff?
    * remove print statement + style
    * change doc-test
    * fix doc-test
    * fix docstring
    * delete commented out text
    * make defaults match dbrx-instruct
    * replace `router_aux_loss_coef` with `moe_loss_weight`
    * is_decoder=True
    * remove is_decoder from configtester
    * implement sdpa properly
    * make is_decoder pass tests
    * start on the GenerationTesterMixin tests
    * add dbrx to sdpa documentation
    * skip weight typing test
    * style
    * initialize smaller model Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    * Add DBRX to toctree
    * skip test_new_cache_format
    * make config defaults smaller again
    * add pad_token_id
    * remove pad_token_id from config
    * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
    * Update src/transformers/models/dbrx/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * fix typo
    * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * update docs, fix configuration_auto.py
    * address pr comments
    * remove is_decoder flag
    * slice
    * fix requires grad
    * remove grad
    * disconnect differently
    * remove grad
    * enable grads
    * patch
    * detach expert
    * nissan al ghaib
    * Update modeling_dbrx.py
    * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    * replace "Gemma" with "Dbrx"
    * remove # type: ignore
    * don't hardcode vocab_size
    * remove ToDo
    * Re-add removed idefics2 line
    * Update test to use tiny-random!
    * Remove TODO
    * Remove one more case of loading the entire dbrx-instruct in the tests
    * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * address some comments
    * small model
    * add dbrx to tokenization_auto
    * More docstrings with add_start_docstrings
    * Dbrx for now
    * add PipelineTesterMixin
    * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * remove flash-attn2 import error
    * fix docstring Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * add usage example
    * put on one line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * fix ffn_act_fn Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * change "dbrx" to "DBRX" for display purposes.
    * fix __init__.py?
    * fix __init__.py
    * fix README
    * return the aux_loss
    * remove extra spaces
    * fix configuration_auto.py
    * fix format in tokenization_auto
    * remove new line
    * add more usage examples
    ---------
    Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
    Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
    Co-authored-by: Wing Lian <wing.lian@gmail.com>
    Co-authored-by: Eitan Turok <eitanturok@gmail.com>
    Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    Co-authored-by: Matt <rocketknight1@gmail.com>
    Co-authored-by: Your Name <you@example.com>
    Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  • 63c5e27efb Do not drop mask with SDPA for more cases (#30311) * overlooked * style * cleaner
  • Compare 26 commits »

1 day ago

thomas-yanxin synced commits to master at thomas-yanxin/lightning from mirror

  • c235f20e71 Remove the requirement for FSDPStrategy subclasses to only support GPU (#19781)

1 day ago

thomas-yanxin synced commits to dev at thomas-yanxin/BMTrain from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/lit-parrot from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/peft from mirror

  • 144b7345c2 ENH Support safetensor in multitask_prompt_tuning (#1662) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
  • bdb856786e MNT Remove dreambooth git submodule (#1660) Leftover that was not removed in BOFT PR.
  • ed865e2812 FIX Bug with handling of active adapters (#1659)
    There was a bug for some models like IA3, LoHa, etc., where calling set_adapter would not correctly update the active_adapter. This is now fixed. Note that this is not about the active_adapter attribute on PeftModel or layers, which are handled separately.
    This PR also ensures that LoraModel, IA3Model, etc. consistently use self.active_adapters, not self.active_adapter; the latter should be treated more like a private attribute (though it is kept for backwards compatibility). (A sketch of the fixed behavior follows this list.)
  • Compare 3 commits »
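
The set_adapter fix above is easiest to see in code. Below is a minimal sketch of the intended behavior, assuming a recent peft release; the base model and adapter names ("adapter_a", "adapter_b") are illustrative and not taken from the commit:

```python
# Sketch: after the fix, set_adapter on a PEFT model should be reflected
# in the tuner's active_adapters list. Names here are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"), adapter_name="adapter_a")
model.add_adapter("adapter_b", LoraConfig(task_type="CAUSAL_LM"))

model.set_adapter("adapter_b")
# The inner tuner (here a LoraModel) is expected to report the switch:
print(model.base_model.active_adapters)  # expected: ['adapter_b']
```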

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/bitsandbytes from mirror

  • ffd7d0db6a (docs) integrations: fix omission in bf16 related warning (#1183)
    * (docs) integrations: fix omission in bf16 related warning
    * (docs) integrations: further clarifications to prior fix
    * (docs) integrations: fix punctuation Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    * (docs) integrations: fix omitted code formatting
    ---------
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • 6cecb65a56 Update pandas requirement from ~=2.2.1 to ~=2.2.2 in the major group (#1182)
    Updates the requirements on [pandas](https://github.com/pandas-dev/pandas) to permit the latest version.
    Updates `pandas` to 2.2.2
    - [Release notes](https://github.com/pandas-dev/pandas/releases)
    - [Commits](https://github.com/pandas-dev/pandas/compare/v2.2.1...v2.2.2)
    ---
    updated-dependencies:
    - dependency-name: pandas
      dependency-type: direct:development
      dependency-group: major
    ...
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Compare 2 commits »

1 day ago

thomas-yanxin synced commits to docs-bf16-warning-fix at thomas-yanxin/bitsandbytes from mirror

  • 1ea5f203bd (docs) integrations: fix omitted code formatting
  • 7eb44a93d4 (docs) integrations: fix punctuation Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • Compare 2 commits »

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/DB-GPT from mirror

1 day ago

thomas-yanxin synced commits to master at thomas-yanxin/Langchain-Chatchat from mirror

1 day ago

thomas-yanxin synced commits to release/2.7 at thomas-yanxin/PaddleNLP from mirror

  • 904c1fb812 add checkpoint_done to last model (#8223)

1 day ago

thomas-yanxin synced commits to refactor-training-loop at thomas-yanxin/PaddleNLP from mirror

1 day ago

thomas-yanxin synced commits to develop at thomas-yanxin/PaddleNLP from mirror

  • 3bb4bb751e add a100 test ground truth (#8249)
    * add a100 test ground truth
    * add requirements
    * cache is_a100 result
    * update
    * update
    * update sp allclose
    * fix check_result
    * add ground truth for llm_gpt_dygraph
  • 909ff315d5 Add p2p_comm_overlap for Llama-2-70b benchmark. (#8276)
  • beb433a9ae [LLM] add memory stats to logger of trainer (#8269)
  • Compare 3 commits »

1 day ago

thomas-yanxin synced commits to release/2.0 at thomas-yanxin/swift from mirror

  • d1376a6ed2 bump version
  • 5347814d4e Fix loss scale (#720) (cherry picked from commit 87d24cba18125c2ee0677121fc21bdfe51b4acdc)
  • 5cbaf3d889 Merge commit '3fecc8cfa2d0181589d711aff3da5b6904c291ac' into release/2.0
    * commit '3fecc8cfa2d0181589d711aff3da5b6904c291ac':
      support Codeqwen-7b-chat model (#718)
      Fix bugs (#714)
      Fix many bugs (#716)
      fix (#711)
      [doc] Update index.md (#709)
      support Llava-v1.6-34b model (#708)
      Support mPLUG-Owl2 (#706)
      fix minicpm-v-v2 bug (#703)
      fix readme (#704)
      Drop data by gradient_accumulation_steps (#626)
      Fix stream 0415 (#702)
      feat(model): support minicpm-v-2 (#699)
      bump version
    # Conflicts:
    #   docs/source/Multi-Modal/minicpm-v-2最佳实践.md
    #   swift/llm/utils/template.py
    #   swift/version.py
  • 3fecc8cfa2 support Codeqwen-7b-chat model (#718)
  • fde8927024 Fix bugs (#714)
  • Compare 16 commits »

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/swift from mirror

1 day ago

thomas-yanxin synced commits to nightly at thomas-yanxin/unsloth from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/data-juicer from mirror

  • 1647e3df64 Reformat API doc and generate docs automatically (#299)
  • 1d94a67f86 set availability checking from warning to error (#297)
  • 33f72b19ec fix Bug: KeyError: 'text', corresponding to issue #296 (#300)
    * fix Bug: KeyError: 'text'
      File data_juicer/config/config.py (lines 418-429) did not consider the situation where the arg text_key was initialized to 'text', resulting in text_key not being updated properly and always keeping the value 'text'.
    * Fix Bug: text_key does not update correctly
    * Update config.py: normalize format
    (A sketch of this pitfall follows this list.)
  • Compare 3 commits »
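
The KeyError: 'text' fix above boils down to a default-value pitfall: a fallback that hardcodes the literal default instead of honoring the user-configured value. Below is a hypothetical sketch of the pattern, not data-juicer's actual code (where the logic lives in config.py); function and parameter names are made up for illustration:

```python
# Hypothetical sketch of the pitfall described in the commit; not
# data-juicer's actual code. A per-op text_key should fall back to the
# globally configured key, not to the literal default 'text'.

def resolve_text_key_buggy(op_args: dict, global_text_key: str) -> str:
    # Bug pattern: the fallback ignores the configured global key, so a
    # dataset whose text lives under e.g. 'content' later hits KeyError: 'text'.
    return op_args.get("text_key") or "text"

def resolve_text_key_fixed(op_args: dict, global_text_key: str) -> str:
    # Fix: honor the configured global key when no per-op override is given.
    return op_args.get("text_key") or global_text_key

print(resolve_text_key_buggy({}, "content"))  # -> 'text' (wrong key)
print(resolve_text_key_fixed({}, "content"))  # -> 'content'
```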

1 day ago

thomas-yanxin synced commits to gh-pages at thomas-yanxin/data-juicer from mirror

  • 317799b7f0 deploy: 1647e3df64b70753f913742b870cc82183443a32

1 day ago