killver t1_ixi5dns wrote

I actually only tried dynamic quantization by using onnxruntime.quantization.quantize_dynamic - is there anything better?

fxmarty OP t1_ixi7sge wrote

Not that I know of (at least in the ONNX ecosystem). I would recommend tuning the available arguments: https://github.com/microsoft/onnxruntime/blob/9168e2573836099b841ab41121a6e91f48f45768/onnxruntime/python/tools/quantization/quantize.py#L414
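As a rough illustration of what tuning those arguments could look like, here is a minimal sketch. The helper names (`build_quantize_kwargs`, `quantize`), the chosen defaults, and the file and node names in the usage note are assumptions for illustration only. The keyword arguments themselves (`per_channel`, `reduce_range`, `weight_type`, `op_types_to_quantize`, `nodes_to_exclude`) are taken from the `quantize_dynamic` signature linked above. Which values help depends on the model and hardware, so treat this as a starting point to experiment with, not a recommended configuration.

```python
from pathlib import Path


def build_quantize_kwargs(per_channel=True, reduce_range=False,
                          op_types_to_quantize=None, nodes_to_exclude=None):
    """Collect the tunable quantize_dynamic arguments in one place.

    Hypothetical helper: the defaults here are illustrative assumptions.
    """
    return {
        # One scale per output channel instead of one per tensor.
        "per_channel": per_channel,
        # Use 7-bit weights; can help on CPUs without VNNI support.
        "reduce_range": reduce_range,
        # Restrict quantization to these operator types.
        "op_types_to_quantize": op_types_to_quantize or ["MatMul"],
        # Node names to leave in floating point.
        "nodes_to_exclude": nodes_to_exclude or [],
    }


def quantize(model_input, model_output, **overrides):
    """Run dynamic quantization with the arguments built above."""
    # Imported lazily so the helper above works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=Path(model_input),
        model_output=Path(model_output),
        weight_type=QuantType.QInt8,
        **build_quantize_kwargs(**overrides),
    )
```

A call might look like `quantize("model.onnx", "model_quantized.onnx", nodes_to_exclude=["classifier/MatMul"])`, where both paths and the node name are placeholders. Comparing accuracy and latency across a few such settings is usually the only way to know which combination is worth keeping.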

If you are dealing with a canonical model, feel free to file an issue as well!

killver t1_ixiah49 wrote

Thanks a lot for all these replies. I have one more question, if you do not mind: sometimes I use Hugging Face models as a backbone in my model definitions. How would I go about applying the transformer-specific quantization to the backbone only? Usually these tools are called on the full model, but if my full model is already in ONNX format, that gets complicated.
