Xin Yan (thomas-yanxin)

thomas-yanxin synced commits to develop at thomas-yanxin/PaddleMIX from mirror

23 hours ago

thomas-yanxin synced commits to update-tokenizers-version at thomas-yanxin/transformers from mirror

1 day ago

thomas-yanxin synced commits to mi300-ci at thomas-yanxin/transformers from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/transformers from mirror

  • df96438484 Fix missing `prev_ci_results` (#30313) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  • ce8e64fbe2 Dev version
  • 5728b5ad00 FIX: Fixes unexpected behaviour for Llava / Llama & AWQ Fused modules + revert #30070 at the same time (#30317)
    * Update awq.py
    * style
    * revert felix PR
    * fix
    * add felix comments
  • 005b957fb8 Add DBRX Model (#29921)
    * wip
    * fix __init__.py
    * add docs
    * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * address comments 1
    * work on make fixup
    * pass configs down
    * add sdpa attention
    * remove DbrxBlock
    * add to configuration_auto
    * docstring now passes formatting test
    * fix style
    * update READMEs
    * add dbrx to modeling_auto
    * make fix-copies generated this
    * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
    * config docstring passes formatting test
    * rename moe_loss_weight to router_aux_loss_coef
    * add to flash-attn documentation
    * fix model-path in tests
    * Explicitly make `"silu"` the default `ffn_act_fn` Co-authored-by: Wing Lian <wing.lian@gmail.com>
    * default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
    * fix _flash_attn_uses_top_left_mask and is_causal
    * fix tests path
    * don't use token type IDs
    * follow Llama and remove token_type_ids from test
    * init ConfigTester differently so tests pass
    * remove multiple choice test
    * remove question + answer test
    * remove sequence classification test
    * remove token classification test
    * copy Llama tests and remove token_type_ids from test inputs
    * do not test pruning or headmasking; style code
    * add _tied_weights_keys parameter to pass test
    * add type hints
    * fix type check
    * update config tester
    * remove masked_lm test
    * remove encoder tests
    * initialize DbrxModelTester with correct params
    * style
    * torch_dtype does not rely on torch
    * run make fixup, fix-copies
    * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
    * add copyright info
    * fix imports and DbrxRotaryEmbedding
    * update DbrxModel docstring
    * use copies
    * change model path in docstring
    * use config in DbrxFFN
    * fix flashattention2, sdpaattention
    * input config to DbrxAttention, DbrxNormAttentionNorm
    * more fixes
    * fix
    * fix again!
    * add informative comment
    * fix ruff?
    * remove print statement + style
    * change doc-test
    * fix doc-test
    * fix docstring
    * delete commented out text
    * make defaults match dbrx-instruct
    * replace `router_aux_loss_coef` with `moe_loss_weight`
    * is_decoder=True
    * remove is_decoder from configtester
    * implement sdpa properly
    * make is_decoder pass tests
    * start on the GenerationTesterMixin tests
    * add dbrx to sdpa documentation
    * skip weight typing test
    * style
    * initialize smaller model Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    * Add DBRX to toctree
    * skip test_new_cache_format
    * make config defaults smaller again
    * add pad_token_id
    * remove pad_token_id from config
    * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
    * Update src/transformers/models/dbrx/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * fix typo
    * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    * update docs, fix configuration_auto.py
    * address pr comments
    * remove is_decoder flag
    * slice
    * fix requires grad
    * remove grad
    * disconnect differently
    * remove grad
    * enable grads
    * patch
    * detach expert
    * nissan al ghaib
    * Update modeling_dbrx.py
    * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    * replace "Gemma" with "Dbrx"
    * remove # type: ignore
    * don't hardcode vocab_size
    * remove ToDo
    * Re-add removed idefics2 line
    * Update test to use tiny-random!
    * Remove TODO
    * Remove one more case of loading the entire dbrx-instruct in the tests
    * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * address some comments
    * small model
    * add dbrx to tokenization_auto
    * More docstrings with add_start_docstrings
    * Dbrx for now
    * add PipelineTesterMixin
    * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * remove flash-attn2 import error
    * fix docstring Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * add usage example
    * put on one line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * fix ffn_act_fn Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    * change "dbrx" to "DBRX" for display purposes.
    * fix __init__.py?
    * fix __init__.py
    * fix README
    * return the aux_loss
    * remove extra spaces
    * fix configuration_auto.py
    * fix format in tokenization_auto
    * remove new line
    * add more usage examples
    ---------
    Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
    Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
    Co-authored-by: Wing Lian <wing.lian@gmail.com>
    Co-authored-by: Eitan Turok <eitanturok@gmail.com>
    Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    Co-authored-by: Matt <rocketknight1@gmail.com>
    Co-authored-by: Your Name <you@example.com>
    Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  • 63c5e27efb Do not drop mask with SDPA for more cases (#30311) * overlooked * style * cleaner
  • Compare 26 commits »

1 day ago

thomas-yanxin synced commits to master at thomas-yanxin/lightning from mirror

  • c235f20e71 Remove the requirement for FSDPStrategy subclasses to only support GPU (#19781)

1 day ago

thomas-yanxin synced commits to dev at thomas-yanxin/BMTrain from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/lit-parrot from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/peft from mirror

  • 144b7345c2 ENH Support safetensor in multitask_prompt_tuning (#1662) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
  • bdb856786e MNT Remove dreambooth git submodule (#1660) Leftover that was not removed in BOFT PR.
  • ed865e2812 FIX Bug with handling of active adapters (#1659)
    There was a bug for some models like IA3, LoHa, etc., where calling set_adapter would not correctly update the active_adapter. This is now fixed. Note that this is not about the active_adapter attribute on PeftModel or layers, which are handled separately.
    This PR also ensures that LoraModel, IA3Model, etc. consistently use self.active_adapters, not self.active_adapter; the latter should be treated more like a private attribute (though it is kept for backwards compatibility). (A sketch of the fixed behavior follows this list.)
  • Compare 3 commits »
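
The set_adapter fix above is easiest to see in code. Below is a minimal sketch of the intended behavior, assuming a recent peft release; the base model and adapter names ("adapter_a", "adapter_b") are illustrative and not taken from the commit:

```python
# Sketch: after the fix, set_adapter on a PEFT model should be reflected
# in the tuner's active_adapters list. Names here are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"), adapter_name="adapter_a")
model.add_adapter("adapter_b", LoraConfig(task_type="CAUSAL_LM"))

model.set_adapter("adapter_b")
# The inner tuner (here a LoraModel) is expected to report the switch:
print(model.base_model.active_adapters)  # expected: ['adapter_b']
```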

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/bitsandbytes from mirror

  • ffd7d0db6a (docs) integrations: fix omission in bf16 related warning (#1183)
    * (docs) integrations: fix omission in bf16 related warning
    * (docs) integrations: further clarifications to prior fix
    * (docs) integrations: fix punctuation Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    * (docs) integrations: fix omitted code formatting
    ---------
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • 6cecb65a56 Update pandas requirement from ~=2.2.1 to ~=2.2.2 in the major group (#1182)
    Updates the requirements on [pandas](https://github.com/pandas-dev/pandas) to permit the latest version.
    Updates `pandas` to 2.2.2
    - [Release notes](https://github.com/pandas-dev/pandas/releases)
    - [Commits](https://github.com/pandas-dev/pandas/compare/v2.2.1...v2.2.2)
    ---
    updated-dependencies:
    - dependency-name: pandas
      dependency-type: direct:development
      dependency-group: major
    ...
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Compare 2 commits »

1 day ago

thomas-yanxin synced commits to docs-bf16-warning-fix at thomas-yanxin/bitsandbytes from mirror

  • 1ea5f203bd (docs) integrations: fix omitted code formatting
  • 7eb44a93d4 (docs) integrations: fix punctuation Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • Compare 2 commits »

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/DB-GPT from mirror

1 day ago

thomas-yanxin synced commits to master at thomas-yanxin/Langchain-Chatchat from mirror

1 day ago

thomas-yanxin synced commits to release/2.7 at thomas-yanxin/PaddleNLP from mirror

  • 904c1fb812 add checkpoint_done to last model (#8223)

1 day ago

thomas-yanxin synced commits to refactor-training-loop at thomas-yanxin/PaddleNLP from mirror

1 day ago

thomas-yanxin synced commits to develop at thomas-yanxin/PaddleNLP from mirror

  • 3bb4bb751e add a100 test ground truth (#8249)
    * add a100 test ground truth
    * add requirements
    * cache is_a100 result
    * update
    * update
    * update sp allclose
    * fix check_result
    * add ground truth for llm_gpt_dygraph
  • 909ff315d5 Add p2p_comm_overlap for Llama-2-70b benchmark. (#8276)
  • beb433a9ae [LLM] add memory stats to logger of trainer (#8269)
  • Compare 3 commits »

1 day ago

thomas-yanxin synced commits to release/2.0 at thomas-yanxin/swift from mirror

  • d1376a6ed2 bump version
  • 5347814d4e Fix loss scale (#720) (cherry picked from commit 87d24cba18125c2ee0677121fc21bdfe51b4acdc)
  • 5cbaf3d889 Merge commit '3fecc8cfa2d0181589d711aff3da5b6904c291ac' into release/2.0
    * commit '3fecc8cfa2d0181589d711aff3da5b6904c291ac':
      support Codeqwen-7b-chat model (#718)
      Fix bugs (#714)
      Fix many bugs (#716)
      fix (#711)
      [doc] Update index.md (#709)
      support Llava-v1.6-34b model (#708)
      Support mPLUG-Owl2 (#706)
      fix minicpm-v-v2 bug (#703)
      fix readme (#704)
      Drop data by gradient_accumulation_steps (#626)
      Fix stream 0415 (#702)
      feat(model): support minicpm-v-2 (#699)
      bump version
    # Conflicts:
    #   docs/source/Multi-Modal/minicpm-v-2最佳实践.md
    #   swift/llm/utils/template.py
    #   swift/version.py
  • 3fecc8cfa2 support Codeqwen-7b-chat model (#718)
  • fde8927024 Fix bugs (#714)
  • Compare 16 commits »

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/swift from mirror

1 day ago

thomas-yanxin synced commits to nightly at thomas-yanxin/unsloth from mirror

1 day ago

thomas-yanxin synced commits to main at thomas-yanxin/data-juicer from mirror

  • 1647e3df64 Reformat API doc and generate docs automatically (#299)
  • 1d94a67f86 set availability checking from warning to error (#297)
  • 33f72b19ec fix Bug: KeyError: 'text', corresponding to issue #296 (#300)
    * fix Bug: KeyError: 'text'
      File data_juicer/config/config.py (lines 418-429) did not consider the situation where the arg text_key was initialized to 'text', resulting in text_key not being updated properly and always keeping the value 'text'.
    * Fix Bug: text_key does not update correctly
    * Update config.py: normalize format
    (A sketch of this pitfall follows this list.)
  • Compare 3 commits »
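
The KeyError: 'text' fix above boils down to a default-value pitfall: a fallback that hardcodes the literal default instead of honoring the user-configured value. Below is a hypothetical sketch of the pattern, not data-juicer's actual code (where the logic lives in config.py); function and parameter names are made up for illustration:

```python
# Hypothetical sketch of the pitfall described in the commit; not
# data-juicer's actual code. A per-op text_key should fall back to the
# globally configured key, not to the literal default 'text'.

def resolve_text_key_buggy(op_args: dict, global_text_key: str) -> str:
    # Bug pattern: the fallback ignores the configured global key, so a
    # dataset whose text lives under e.g. 'content' later hits KeyError: 'text'.
    return op_args.get("text_key") or "text"

def resolve_text_key_fixed(op_args: dict, global_text_key: str) -> str:
    # Fix: honor the configured global key when no per-op override is given.
    return op_args.get("text_key") or global_text_key

print(resolve_text_key_buggy({}, "content"))  # -> 'text' (wrong key)
print(resolve_text_key_fixed({}, "content"))  # -> 'content'
```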

1 day ago

thomas-yanxin synced commits to gh-pages at thomas-yanxin/data-juicer from mirror

  • 317799b7f0 deploy: 1647e3df64b70753f913742b870cc82183443a32

1 day ago