Training

Build your own large language model! (Transformers + DeepSpeed + torch.compile + flash_attn2)

From pretraining a Mixtral 250M to Instruction Tuning (a rough sketch of this stack follows the links below)

https://prod-files-secure.s3.us-west-2.amazonaws.com/9db80869-7992-4b3a-8752-893d590f311d/aae53fd7-d660-44af-a92d-cf52ec2ced2e/LLMを作る_拡張する.pdf

https://prod-files-secure.s3.us-west-2.amazonaws.com/9db80869-7992-4b3a-8752-893d590f311d/072ca1f9-d609-4435-978e-d934c1380ae6/LLM-Whitepaper-Japanese-Final.pdf

https://nlp-colloquium-jp.github.io/schedule/2023-09-20_takuya-akiba/
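As a quick orientation to the stack named in the first link above, here is a minimal, hedged sketch of a Mixtral-250M-scale pretraining script using Hugging Face Transformers with DeepSpeed (via Trainer), torch.compile, and FlashAttention-2. The model sizes, tokenizer choice, placeholder dataset, and ds_config.json path are illustrative assumptions, not values taken from the linked materials, and a recent transformers version is assumed; instruction tuning would follow the same pattern with an instruction-formatted dataset.

```python
# pretrain_sketch.py -- launch with e.g.: deepspeed pretrain_sketch.py
# Illustrative sketch only; hyperparameters and paths are placeholders.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    MixtralConfig,
    Trainer,
    TrainingArguments,
)

# Tokenizer choice is illustrative; any tokenizer with a matching vocab works.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token

# Roughly 250M-parameter Mixtral-style config (sizes are guesses, not the
# linked slide deck's actual hyperparameters).
config = MixtralConfig(
    vocab_size=len(tokenizer),
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=12,
    num_attention_heads=12,
    num_key_value_heads=4,
    num_local_experts=8,
    num_experts_per_tok=2,
    max_position_embeddings=2048,
)

# FlashAttention-2 requires the flash-attn package and an Ampere-or-newer GPU.
model = AutoModelForCausalLM.from_config(
    config,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Tiny placeholder corpus; replace with a real pretraining dataset.
raw = Dataset.from_dict({"text": ["Hello world."] * 512})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="mixtral-250m-pretrain",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    bf16=True,
    deepspeed="ds_config.json",  # DeepSpeed ZeRO config; path is a placeholder
    torch_compile=True,          # wraps the model with torch.compile
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The script is meant to be launched with the DeepSpeed (or torchrun/accelerate) launcher so the distributed backend picks up the ZeRO config.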

OSS

Seeking easy-to-understand resources on the OSS planned to be used in the standard codebase


Models

BitNet - Qiita

GitHub - kyegomez/BitNet: Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
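The two BitNet links above cover an explainer and a PyTorch implementation. For a quick feel of the core idea, below is a minimal, simplified BitLinear-style layer: weights are binarized to ±1 around their mean with a per-tensor scale, and a straight-through estimator keeps gradients flowing to the latent full-precision weights. This is an illustrative sketch, not the kyegomez/BitNet code; the paper's activation quantization and normalization details are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Simplified BitNet-style linear layer with 1-bit (+/-1) weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor scale so the binarized weights keep a similar magnitude.
        beta = w.abs().mean()
        # Binarize around the mean (activation quantization from the paper is
        # omitted in this sketch).
        w_bin = torch.sign(w - w.mean()) * beta
        # Straight-through estimator: the forward pass uses binary weights,
        # the backward pass sends gradients to the full-precision weights.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)


# Drop-in usage: replace nn.Linear projections inside a Transformer block.
layer = BitLinear(768, 3072)
y = layer(torch.randn(2, 16, 768))
```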

Mixture of Experts: How an Ensemble of AI Models Act as One | Deepgram

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

GitHub - laekov/fastmoe: A fast MoE impl for PyTorch
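The MoE links above (the Deepgram explainer, makeMoE, and fastmoe) all revolve around the same mechanism: a router picks a few experts per token and mixes their outputs. Below is a minimal top-k-routed sparse MoE feed-forward layer in the spirit of makeMoE; it is a sketch under simplifying assumptions, with no capacity limits, no load-balancing auxiliary loss, and none of the optimized dispatch that fastmoe provides, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Sparse Mixture-of-Experts FFN with top-k routing (simplified sketch)."""

    def __init__(self, dim: int, hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> flatten tokens for routing.
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        logits = self.router(tokens)                       # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # normalize over chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens that routed to expert e in any of their top-k slots.
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot_idx, None] * expert(tokens[token_idx])
        return out.reshape(b, s, d)


# Usage: replaces the dense FFN inside a Transformer block.
moe = SparseMoE(dim=768, hidden=2048)
y = moe(torch.randn(2, 16, 768))
```

In a Mixtral-style model, such a layer stands in for the dense feed-forward block and is normally trained together with an auxiliary load-balancing loss on the router outputs.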