RWKV-v4 Web Demo

See the GitHub repo for more details about this demo. The inference speeds below were measured on my laptop in Chrome, so treat them as relative numbers at most; performance varies by device. Note that opening the browser console/DevTools currently slows down inference, and the slowdown persists even after you close it.

Choose a model:

Name	Params	File size	Inference speed	Notes
rwkv-4-pile-169m.onnx	169M	679 MB	~32 tokens/sec	-
rwkv-4-pile-169m-uint8.onnx	169M	171 MB	~12 tokens/sec	uint8 quantized - smaller but slower
rwkv-4-pile-169m-webgl.onnx	169M	680 MB	~16 tokens/sec	WebGL-compatible
rwkv-4-pile-430m.onnx	430M	1.73 GB	?	"RuntimeError: Aborted()" - too big to init
rwkv-4-pile-430m.with_runtime_opt.ort	430M	1.73 GB	~12 tokens/sec	ORT format
rwkv-4-pile-430m-uint8.onnx	430M	434 MB	~4 tokens/sec	uint8 quantized - smaller but slower
rwkv-4-pile-430m-webgl.onnx	430M	1.73 GB	~10 tokens/sec	WebGL-compatible
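
The file sizes above are roughly what you would expect from the parameter counts if the unquantized files store 4 bytes per weight and the uint8 files store about 1 byte per weight. The page does not state the weight format, so treat that as an assumption. A quick back-of-the-envelope sketch (estimateSizeMB is a hypothetical helper, not part of the demo):

```typescript
// Rough weight size in MB (10^6 bytes). Ignores graph metadata and any
// tensors that stay in float after quantization, so real files come out
// slightly larger than this estimate.
function estimateSizeMB(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e6;
}

// Assumed storage widths: 4 bytes for unquantized weights, 1 byte for uint8.
const small = estimateSizeMB(169e6, 4);      // 676  (table lists 679 MB)
const smallUint8 = estimateSizeMB(169e6, 1); // 169  (table lists 171 MB)
const large = estimateSizeMB(430e6, 4);      // 1720 (table lists 1.73 GB)
const largeUint8 = estimateSizeMB(430e6, 1); // 430  (table lists 434 MB)
```

So the uint8 variants cut the download to roughly a quarter, at the cost of the lower tokens/sec shown in the table.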

You can also load a custom model using a Hugging Face model URL. The URL must be a direct link to the model file, i.e. it must contain /resolve/ rather than /blob/, and it must end in .onnx or .ort.
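
As a minimal sketch of those URL rules: the helper names below are hypothetical and this is not necessarily how the demo itself checks the URL. The example URLs in the comments use a made-up user and repo.

```typescript
// Hypothetical helper: true if the URL looks like a direct link to a model
// file, i.e. it contains /resolve/, does not contain /blob/, and ends in
// .onnx or .ort (ignoring any ?query or #fragment).
function isDirectModelUrl(url: string): boolean {
  const path = url.replace(/[?#].*$/, ""); // drop query string and fragment
  const hasResolve = path.indexOf("/resolve/") !== -1;
  const hasBlob = path.indexOf("/blob/") !== -1;
  const hasModelExt = /\.(onnx|ort)$/.test(path);
  return hasResolve && !hasBlob && hasModelExt;
}

// Hypothetical helper: turn a /blob/ page link into the matching /resolve/
// link. Only the first occurrence is replaced.
function toDirectModelUrl(url: string): string {
  return url.replace("/blob/", "/resolve/");
}

// Usage with a made-up repo:
//   https://huggingface.co/example-user/example-repo/blob/main/model.onnx
// is a page link and fails the check. After toDirectModelUrl it becomes
//   https://huggingface.co/example-user/example-repo/resolve/main/model.onnx
// which passes.
```

If you copied the link from the file page on Hugging Face, it will usually be the /blob/ form, so swapping in /resolve/ is the most common fix.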

Num Layers:

Embed Dim:

Backend: