
sobagood t1_iu6ucfo wrote

If you intend to run on CPUs or other Intel hardware, OpenVINO is a great choice. Intel optimised it for their own hardware, and on that hardware it is indeed faster than the alternatives.
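Getting started is only a few lines. A minimal sketch of CPU inference with the Python API (the model path is a placeholder, and I'm assuming a static input shape):

```python
# Minimal OpenVINO CPU inference sketch (2022.x Python API).
# "model.xml" is a placeholder for an IR file you've already exported.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")          # IR pair: model.xml + model.bin
compiled = core.compile_model(model, "CPU")   # target the CPU plugin

# Dummy input matching the model's first input (assumed static shape here).
input_tensor = np.random.rand(*compiled.inputs[0].shape).astype(np.float32)
result = compiled([input_tensor])[compiled.output(0)]
print(result.shape)
```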

9

whata_wonderful_day t1_iu81vzp wrote

I tried OpenVINO ~1.5 years back and it didn't match ONNXRuntime on transformers. For CNNs it's the fastest, though. I also found OpenVINO pretty buggy and not user-friendly; I had to fix their internal transformer conversion script myself.
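For reference, the ONNXRuntime side of that comparison is just this. The model path and input names are assumptions for a BERT-style export:

```python
# Rough sketch of CPU inference with ONNXRuntime on a transformer,
# assuming a BERT-style model exported to "model.onnx" (hypothetical path).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Input names depend on how the model was exported; these are typical
# for a BERT-style graph but are assumptions here.
batch, seq_len = 1, 128
feeds = {
    "input_ids": np.random.randint(0, 30522, (batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
}
outputs = sess.run(None, feeds)  # None = return all model outputs
print([o.shape for o in outputs])
```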

4

big_dog_2k OP t1_iu6yc7b wrote

Thanks! Does it work with non-Intel chipsets, and how easy have you found it to use?

1

sobagood t1_iu6zuhk wrote

If you mean NVIDIA GPUs, there is a CUDA plugin to run it on them, but I have never tried it. It has several other plugins as well, so you could check those out. It also provides its own deployment server, and NVIDIA Triton supports the OpenVINO runtime too, though without GPU support, for an obvious reason. Similar to ONNX, the workflow transforms the graph into their intermediate representation with the 'Model Optimizer', which is where things can go wrong. If you can successfully create that representation, there should be no new bottleneck. See the sketch below.
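Roughly, the flow looks like this. The IR is produced offline with the Model Optimizer CLI; paths and device names below are illustrative:

```python
# Sketch of the conversion + plugin flow. The IR is produced offline
# with Model Optimizer first, e.g.:
#   mo --input_model model.onnx --output_dir ir/
from openvino.runtime import Core

core = Core()
print(core.available_devices)            # e.g. ['CPU'] or ['CPU', 'GPU']

model = core.read_model("ir/model.xml")  # load the intermediate representation
compiled = core.compile_model(model, "CPU")  # swap "CPU" for another plugin
```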

1

big_dog_2k OP t1_iu7paw3 wrote

Thanks. I might need to take a closer look. I was also thinking of AMD and ARM-based CPUs. I was surprised at how good CPU-based inference can be for some models these days.

1

sobagood t1_iu801e4 wrote

I don't think they support AMD, since the two are rivals.

1