C tensor computation library for ML inference and training. Use when working with ggml graphs, GGUF model files, backend scheduling, quantization, or implementing low-level ML ops in C/C++.
ggml is a minimalistic C tensor computation library powering llama.cpp and many other ML inference engines. It provides tensor creation and arithmetic, compute-graph construction and execution, multi-backend scheduling, quantization, and the GGUF model file format.

- Version: v0.9.7
- Language: C (C++ optional)
- License: MIT
- Repo: https://github.com/ggml-org/ggml
```c
#include "ggml.h"
#include "ggml-cpu.h"
#include "ggml-backend.h"

#include <string.h>

int main(void) {
    // ggml allocates all tensors and graph metadata from one pre-sized arena.
    struct ggml_init_params params = {
        .mem_size   = 64 * 1024 * 1024, // 64 MB scratch buffer
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);

    // Fill the inputs: tensor data is uninitialized after creation.
    const float av[4] = {1, 2, 3, 4};
    const float bv[4] = {10, 20, 30, 40};
    memcpy(a->data, av, sizeof(av));
    memcpy(b->data, bv, sizeof(bv));

    // Building the graph only records the op; nothing is computed yet.
    struct ggml_tensor * c = ggml_add(ctx, a, b);
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // Run the graph on the CPU backend.
    ggml_backend_t backend = ggml_backend_cpu_init();
    ggml_backend_graph_compute(backend, gf);
    // c->data now holds {11, 22, 33, 44}.

    ggml_backend_free(backend);
    ggml_free(ctx);
    return 0;
}
```
Call ggml_backend_load_all() at startup to discover available hardware backends.

| Domain | File | Description |
|---|---|---|
| Context, tensors & graphs | api-core.md | Init, create tensors, graph ops, scalar access, constants |
| Arithmetic & matrix ops | api-arithmetic.md | add/mul/matmul, reductions, loss functions, quantize |
| Activations, norms & shapes | api-activations.md | relu/gelu/silu, RMS norm, reshape/permute/concat, custom ops |
| Attention, convolution & RoPE | api-attention.md | Flash Attention, RoPE variants, 1D/2D/3D conv, pooling, padding |
| Backend, memory & scheduler | api-backend.md | Backends, buffer types, scheduler, gallocr, CPU threadpool, F16 conversions |
| GGUF file format | api-gguf.md | Read/write GGUF v3: KV metadata, tensor layout, serialization |
| Optimization & training | api-optimization.md | Datasets, AdamW/SGD optimizer, epoch loop, ggml_opt_fit |
| Working examples | workflows.md | Quick start, GGUF loading, multi-backend, attention, training, quantize |
See references/workflows.md for complete examples.
Quick reference:
- Size mem_size generously; ggml_init fails silently if it is too small.
- ne[0] is the innermost (fastest-varying) dimension; for a [rows × cols] matrix use ne0 = cols, ne1 = rows.
- Building a graph does not execute it; call ggml_backend_graph_compute() to run.
- Call ggml_backend_load_all() at startup; use ggml_backend_init_best() to pick the best available device.
- ggml_mul_mat supports mixed precision (e.g. Q4_0 weights × F32 activations) natively.
- ggml_add_inplace overwrites tensor a and avoids an allocation; only safe when a is not used elsewhere in the graph.
- Control CPU parallelism with ggml_backend_cpu_set_n_threads() or a custom threadpool.