See the GitHub repo for more details about this demo. The inference speeds below were measured on my laptop in Chrome, so treat them as relative figures at most, since performance varies by device. Note that opening the browser console/DevTools currently slows down inference, and the slowdown persists even after you close it.
Name | Params | File size | Inference speed | Notes
rwkv-4-pile-169m.onnx | 169m | 679 MB | ~32 tokens/sec | -
rwkv-4-pile-169m-uint8.onnx | 169m | 171 MB | ~12 tokens/sec | uint8 quantized - smaller but slower
rwkv-4-pile-169m-webgl.onnx | 169m | 680 MB | ~16 tokens/sec | webgl-compatible
rwkv-4-pile-430m.onnx | 430m | 1.73 GB | ? | "RuntimeError: Aborted()" - too big to init
rwkv-4-pile-430m.with_runtime_opt.ort | 430m | 1.73 GB | ~12 tokens/sec | ORT format
rwkv-4-pile-430m-uint8.onnx | 430m | 434 MB | ~4 tokens/sec | uint8 quantized - smaller but slower
rwkv-4-pile-430m-webgl.onnx | 430m | 1.73 GB | ~10 tokens/sec | webgl-compatible
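
The file sizes in the table above follow roughly from the parameter counts. The sketch below is only a back-of-the-envelope check, not anything taken from the demo's code. It assumes the unquantized files store float32 weights (4 bytes per parameter), that the uint8 files store about 1 byte per parameter, that the nominal counts (169M, 430M) are close to the real ones, and that sizes are in decimal megabytes. The helper name `estimated_size_mb` is hypothetical.

```python
# Assumption: float32 weights take 4 bytes per parameter, uint8 weights take 1.
BYTES_PER_PARAM = {"float32": 4, "uint8": 1}


def estimated_size_mb(params_millions: float, dtype: str) -> float:
    """Rough weight-only size in decimal megabytes (1 MB = 10**6 bytes)."""
    total_bytes = params_millions * 1e6 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e6


# (file name, nominal params in millions, assumed dtype, size listed in the table in MB)
rows = [
    ("rwkv-4-pile-169m.onnx", 169, "float32", 679),
    ("rwkv-4-pile-169m-uint8.onnx", 169, "uint8", 171),
    ("rwkv-4-pile-430m.onnx", 430, "float32", 1730),
    ("rwkv-4-pile-430m-uint8.onnx", 430, "uint8", 434),
]

for name, params, dtype, listed_mb in rows:
    estimate = estimated_size_mb(params, dtype)
    print(f"{name}: estimated {estimate:.0f} MB, listed {listed_mb} MB")
```

Under these assumptions the estimates land within a couple of percent of the listed sizes. The small remainder is plausibly graph metadata and any tensors left unquantized, though that is a guess rather than something measured here.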