unofficialmerve OP t1_iz1nurr wrote on December 5, 2022 at 8:28 PM

Reply to comment by link0007 in [P] Save your sklearn models securely using skops by unofficialmerve

I'm not sure, but I've been told many times that ONNX support for sklearn is sub-par. I haven't looked into that one yet. I can ask the maintainers if you're interested.

unofficialmerve OP t1_iz1jpn9 wrote on December 5, 2022 at 8:02 PM

Reply to comment by -bb_ in [P] Save your sklearn models securely using skops by unofficialmerve

no worries, I hope you'll find it useful! 🙂

unofficialmerve OP t1_iz1iybe wrote on December 5, 2022 at 7:57 PM

Reply to comment by ReginaldIII in [P] Save your sklearn models securely using skops by unofficialmerve

h5 and SavedModel of TF are the safest options, yet AFAIK you can still inject code through Lambda layers or subclassed models (that's why Keras developed a new format too!). What SavedModel does is reconstruct the architecture and load the weights into it, and this architecture part is essentially code (loading the weights is never the problem for any framework anyway, it's the code part!). So again, you shouldn't deserialize a model file you don't trust (the safest code is no code). If you can see the architecture and confirm that it doesn't have any custom layers, you should be fine; this is also essentially what we do with skops (we audit the model). Or you can reconstruct the architecture yourself and load the weights into it, but that's a little tricky: you might have custom objects or, e.g., preprocessing layers for Keras.

>The architecture of subclassed models and layers are defined in the methods __init__ and call. They are considered Python bytecode, which cannot be serialized into a JSON-compatible config -- you could try serializing the bytecode (e.g. via pickle), but it's completely unsafe and means your model cannot be loaded on a different system. (in model subclassing guide)
>
>WARNING: tf.keras.layers.Lambda layers have (de)serialization limitations! (in lambda layers guide)
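
To make the "audit before you load" idea concrete for plain pickle files, here is a minimal sketch using only the standard library. It is not how skops or Keras work internally; it just follows the "restricting globals" pattern from the Python pickle docs. The `ALLOWED` set, `AllowlistUnpickler`, `safe_loads` and `Evil` are made-up names for illustration, and an allowlist like this only narrows the attack surface, it doesn't make pickle safe in general.

```python
import io
import pickle

# Assumption for this sketch: only these (module, name) pairs are trusted.
ALLOWED = {("builtins", "set"), ("builtins", "frozenset")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle bytes, refusing any global that is not on the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Evil:
    """Hypothetical object that asks the loader to call eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Plain containers and numbers need no globals beyond the allowlist.
print(safe_loads(pickle.dumps({"weights": [0.1, 0.2], "classes": {0, 1}})))

try:
    safe_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    print(exc)  # blocked global: builtins.eval
```

The point is the same as with custom layers: data goes through, anything that wants to bring its own code gets stopped and reported.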

Hugging Face also introduced a new format called safetensors, if you're interested: https://github.com/huggingface/safetensors (the README has a detailed explanation and comparison).
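
A rough sketch of the "reconstruct it yourself and load weights into it" route, which is also the spirit of weights-only formats: the file holds numbers, and the architecture comes from code you wrote. This is NOT the safetensors API; it uses stdlib JSON, and `TinyLinearModel`, `dump_weights` and `load_weights` are hypothetical names invented for the example.

```python
import json


class TinyLinearModel:
    """Hypothetical model: the architecture lives in code we control."""

    def __init__(self, n_features: int):
        self.coef = [0.0] * n_features
        self.intercept = 0.0

    def predict(self, row):
        return sum(c * x for c, x in zip(self.coef, row)) + self.intercept


def dump_weights(model) -> str:
    # Only numbers are written out: no class names, no callables.
    return json.dumps({"coef": model.coef, "intercept": model.intercept})


def load_weights(text: str, n_features: int) -> TinyLinearModel:
    data = json.loads(text)
    if len(data["coef"]) != n_features:
        raise ValueError("weight file does not match the expected architecture")
    model = TinyLinearModel(n_features)  # architecture comes from our own code
    model.coef = [float(c) for c in data["coef"]]
    model.intercept = float(data["intercept"])
    return model


trained = TinyLinearModel(2)
trained.coef, trained.intercept = [2.0, -1.0], 0.5

restored = load_weights(dump_weights(trained), n_features=2)
print(restored.predict([3.0, 4.0]))  # 2.5
```

Nothing in the file can run on load, because the loader only ever reads floats. The tricky part mentioned above shows up as soon as the model has custom objects that plain numbers can't describe.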

unofficialmerve OP t1_iz19n6x wrote on December 5, 2022 at 6:57 PM

Reply to comment by -bb_ in [P] Save your sklearn models securely using skops by unofficialmerve

joblib uses pickle's way of serialization under the hood. AFAIK the difference between pickle and joblib is that joblib performs better with numpy objects. You shouldn't deserialize any joblib file on your local machine unless you trust where it came from.

See one of the sklearn core developers' neat answer on the difference between them: https://stackoverflow.com/a/12617603

unofficialmerve OP t1_iz12non wrote on December 5, 2022 at 6:12 PM

Reply to comment by acamara in [P] Save your sklearn models securely using skops by unofficialmerve

this is a good explanation 🤌🏼💜

unofficialmerve OP t1_iz12b00 wrote on December 5, 2022 at 6:10 PM

Reply to comment by link0007 in [P] Save your sklearn models securely using skops by unofficialmerve

I think it's because this was raised only recently and people really didn't know! Since Yannic Kilcher raised it a couple of months ago, François Chollet has announced a format specific to Keras models, and the PyTorch folks are cooking something for this too. Hopefully this will be solved 🙂

unofficialmerve OP t1_iz10ekx wrote on December 5, 2022 at 5:58 PM

Reply to comment by Massive_bull_worm in [P] Save your sklearn models securely using skops by unofficialmerve

It can execute arbitrary code, as others said. Other ML frameworks (TF/Keras, PyTorch) are also researching alternative solutions to this at the moment. You should never deserialize a pickle on your local machine unless you made it yourself. pickle is made for Python in general, not specifically for machine learning. The skops format is used to serialize sklearn models/pipelines while avoiding pickle.
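
For anyone wondering what "execute arbitrary code" looks like in practice, here is a minimal, harmless sketch using only the standard library. `Payload` is a made-up class; `eval` on a harmless expression stands in for whatever an attacker would really call (e.g. `os.system` with a shell command).

```python
import pickle


class Payload:
    """Hypothetical object whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # pickle stores "call this callable with these args" and the
        # loader obeys. A real attack would swap in something harmful.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading calls eval("6 * 7"); we never get a Payload instance back.
result = pickle.loads(blob)
print(result)  # 42
```

Note that the code runs inside `pickle.loads` itself, before you ever get to look at the object, which is why "just load it and check" isn't an option for files you didn't create.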

unofficialmerve t1_isvspif wrote on October 19, 2022 at 1:39 AM

Reply to [D] How frustrating are the ML interviews these days!!! TOP 3% interview joke by Mogady

you dodged a bullet!