RWKV-v4 Web Demo

See the GitHub repo for more details about this demo. The inference speed numbers below were measured on my laptop in Chrome - treat them as relative numbers at best, since performance obviously varies by device. Note that opening the browser console/DevTools currently slows down inference, even after you close it.


Choose a model:
| Name | Params | File size | Inference speed | Notes |
|------|--------|-----------|-----------------|-------|
| rwkv-4-pile-169m.onnx | 169m | 679 MB | ~32 tokens/sec | - |
| rwkv-4-pile-169m-uint8.onnx | 169m | 171 MB | ~12 tokens/sec | uint8 quantized - smaller but slower |
| rwkv-4-pile-169m-webgl.onnx | 169m | 680 MB | ~16 tokens/sec | webgl-compatible |
| rwkv-4-pile-430m.onnx | 430m | 1.73 GB | ? | "RuntimeError: Aborted()" - too big to init |
| rwkv-4-pile-430m.with_runtime_opt.ort | 430m | 1.73 GB | ~12 tokens/sec | ORT format |
| rwkv-4-pile-430m-uint8.onnx | 430m | 434 MB | ~4 tokens/sec | uint8 quantized - smaller but slower |
| rwkv-4-pile-430m-webgl.onnx | 430m | 1.73 GB | ~10 tokens/sec | webgl-compatible |
You can also load a custom model from a Hugging Face model URL. The URL must be a direct link to the model file: it must contain /resolve/ (not /blob/) and must end in .onnx or .ort.
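The direct-link rule above can be checked (and a /blob/ page URL fixed up) with a small helper. This is a hypothetical sketch, not code from the demo; the function name is mine, and only the /blob/-vs-/resolve/ and .onnx/.ort rules come from the text.

```javascript
// Hypothetical helper: turn a Hugging Face file URL into the direct
// /resolve/ form the demo accepts, and reject URLs that cannot work.
function toDirectModelUrl(url) {
  // A /blob/ URL points at the HTML file viewer; /resolve/ is the
  // direct download for the same path, so a simple rewrite suffices.
  const direct = url.replace("/blob/", "/resolve/");
  if (!direct.includes("/resolve/")) {
    throw new Error("not a direct Hugging Face file link: " + url);
  }
  if (!direct.endsWith(".onnx") && !direct.endsWith(".ort")) {
    throw new Error("model URL must end in .onnx or .ort");
  }
  return direct;
}
```

For example, `https://huggingface.co/user/repo/blob/main/model.onnx` would be rewritten to `https://huggingface.co/user/repo/resolve/main/model.onnx`.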
Num Layers:
Embed Dim:
Backend:
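The Backend selector above corresponds to the execution provider passed to onnxruntime-web when the session is created. As a rough sketch (the function and option shape are my assumptions, not the demo's actual code; `'wasm'` and `'webgl'` are real onnxruntime-web provider names):

```javascript
// Hypothetical sketch: map the demo's Backend choice to the
// session options shape that onnxruntime-web's
// InferenceSession.create(modelUrl, options) expects.
function sessionOptionsFor(backend) {
  if (backend !== "wasm" && backend !== "webgl") {
    throw new Error("unsupported backend: " + backend);
  }
  return { executionProviders: [backend] };
}
```

Note that, per the table above, the webgl backend only works with the models exported in a webgl-compatible form.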