Latest commit: a68a1e7ed0 by Kevin Gibbons, "metal : log more info on error (#6987)", 1 day ago
Name | Last commit | Last updated
.devops | build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964) | 2 days ago
.github | build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964) | 2 days ago
ci | ggml : add Flash Attention (#5021) | 1 day ago
cmake | cmake : MSVC instruction detection (fixed up #809) (#3923) | 5 months ago
common | ggml : add Flash Attention (#5021) | 1 day ago
docs | eval-callback: Example how to use eval callback for debugging (#6576) | 2 weeks ago
examples | ggml : add Flash Attention (#5021) | 1 day ago
ggml-cuda | ggml : add Flash Attention (#5021) | 1 day ago
gguf-py | llama : fix BPE pre-tokenization (#6920) | 2 days ago
grammars | JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555) | 2 weeks ago
kompute @ 4565194ed7 | Nomic Vulkan backend (#4456) | 3 months ago
kompute-shaders | Nomic Vulkan backend (#4456) | 3 months ago
media | README: add graphic for matrix multiplication (#6881) | 1 week ago
models | llama : fix BPE pre-tokenization (#6920) | 2 days ago
pocs | ggml : add mmla kernels for quantized GEMM (#4966) | 2 months ago
prompts | llama : add Qwen support (#4281) | 5 months ago
requirements | llama : fix BPE pre-tokenization (#6920) | 2 days ago
scripts | llama : fix BPE pre-tokenization (#6920) | 2 days ago
spm-headers | swift : package no longer use ggml dependency (#5465) | 2 months ago
tests | ggml : add Flash Attention (#5021) | 1 day ago
.clang-tidy | cuda : refactor into multiple files (#6269) | 1 month ago
.dockerignore | docker : ignore Git files (#3314) | 7 months ago
.ecrc | Nomic Vulkan backend (#4456) | 3 months ago
.editorconfig | llama.swiftui : add bench functionality (#4483) | 4 months ago
.flake8 | Add support for BERT embedding models (#5423) | 2 months ago
.gitignore | Improve usability of --model-url & related flags (#6930) | 2 days ago
.gitmodules | Nomic Vulkan backend (#4456) | 3 months ago
.pre-commit-config.yaml | hooks : setting up flake8 and pre-commit hooks (#1681) | 10 months ago
AUTHORS | license : update copyright notice + add AUTHORS (#6405) | 3 weeks ago
CMakeLists.txt | cmake : restore LLAMA_LLAMAFILE_DEFAULT | 6 days ago
LICENSE | license : update copyright notice + add AUTHORS (#6405) | 3 weeks ago
Makefile | llama : fix BPE pre-tokenization (#6920) | 2 days ago
Package.swift | ggml : add llamafile sgemm (#6414) | 2 weeks ago
README-sycl.md | build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964) | 2 days ago
README.md | build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964) | 2 days ago
SECURITY.md | chore: Fix markdown warnings (#6625) | 2 weeks ago
build.zig | `build`: generate hex dump of server assets during build (#6661) | 1 week ago
codecov.yml | cov : disable comment in PRs (#2989) | 8 months ago
convert-hf-to-gguf-update.py | convert : use utf8 encoding (#7000) | 1 day ago
convert-hf-to-gguf.py | convert : use utf8 encoding (#7000) | 1 day ago
convert-llama-ggml-to-gguf.py | llama : fix BPE pre-tokenization (#6920) | 2 days ago
convert-lora-to-ggml.py | add safetensors support to convert-lora-to-ggml.py (#5062) | 3 months ago
convert-persimmon-to-gguf.py | llama : fix BPE pre-tokenization (#6920) | 2 days ago
convert.py | llama : support Llama 3 HF conversion (#6745) | 1 week ago
flake.lock | flake.lock: Update | 3 days ago
flake.nix | nix: .#windows: proper cross-compilation set-up | 1 month ago
ggml-alloc.c | ggml : fix calloc argument ordering. (#6820) | 1 week ago
ggml-alloc.h | llama : add pipeline parallelism support (#6017) | 1 month ago
ggml-backend-impl.h | backend : offload large batches to GPU (#6083) | 1 month ago
ggml-backend.c | Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) | 5 days ago
ggml-backend.h | backend : fix typo in scheduler documentation (ggml/781) | 3 weeks ago
ggml-common.h | [SYCL] Disable iqx on windows as WA (#6435) | 4 weeks ago
ggml-cuda.cu | ggml : add Flash Attention (#5021) | 1 day ago
ggml-cuda.h | backend : offload large batches to GPU (#6083) | 1 month ago
ggml-impl.h | ggml : fix __MSC_VER -> _MSC_VER (#6977) | 2 days ago
ggml-kompute.cpp | ggml : add Flash Attention (#5021) | 1 day ago
ggml-kompute.h | Nomic Vulkan backend (#4456) | 3 months ago
ggml-metal.h | metal : add debug capture backend function (ggml/694) | 3 months ago
ggml-metal.m | metal : log more info on error (#6987) | 1 day ago
ggml-metal.metal | ggml : add Flash Attention (#5021) | 1 day ago
ggml-mpi.c | ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | 9 months ago
ggml-mpi.h | mpi : add support for distributed inference via MPI (#2099) | 9 months ago
ggml-opencl.cpp | llama : greatly reduce output buffer memory usage (#6122) | 1 month ago
ggml-opencl.h | Add OpenCL add kernel (#5151) | 3 months ago
ggml-quants.c | add basic tensor data validation function (#6884) | 5 days ago
ggml-quants.h | llama : add Command R Plus support (#6491) | 3 weeks ago
ggml-sycl.cpp | ggml : add Flash Attention (#5021) | 1 day ago
ggml-sycl.h | [SYCL] offload op (#6217) | 1 month ago
ggml-vulkan-shaders.hpp | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 1 month ago
ggml-vulkan.cpp | ggml : add Flash Attention (#5021) | 1 day ago
ggml-vulkan.h | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 1 month ago
ggml.c | ggml : add Flash Attention (#5021) | 1 day ago
ggml.h | ggml : add Flash Attention (#5021) | 1 day ago
ggml_vk_generate_shaders.py | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 1 month ago
llama.cpp | ggml : add Flash Attention (#5021) | 1 day ago
llama.h | ggml : add Flash Attention (#5021) | 1 day ago
mypy.ini | convert : partially revert PR #4818 (#5041) | 3 months ago
requirements.txt | llama : fix BPE pre-tokenization (#6920) | 2 days ago
sgemm.cpp | llamafile : use 64-bit integers in sgemm (#6928) | 5 days ago
sgemm.h | llamafile : use 64-bit integers in sgemm (#6928) | 5 days ago
unicode-data.cpp | llama : fix BPE pre-tokenization (#6920) | 2 days ago
unicode-data.h | llama : fix BPE pre-tokenization (#6920) | 2 days ago
unicode.cpp | llama : fix BPE pre-tokenization (#6920) | 2 days ago
unicode.h | llama : fix BPE pre-tokenization (#6920) | 2 days ago
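Several commit messages in the listing above (#6964) quote the simplified out-of-tree CMake build flow. A minimal sketch of that flow, assuming a local checkout of the repository with CMake and a C/C++ toolchain installed:

```
# From the repository root: configure into ./build, then compile.
# These are the exact commands quoted in the build(cmake) commit messages;
# backend-specific flags (CUDA, Metal, SYCL, ...) would be passed at configure
# time and are omitted here.
cmake -B build
cmake --build build
```

The `-B build` form keeps all generated files in a separate `build/` directory, so the source tree stays clean and the directory can simply be deleted to start over.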