Massive_bull_worm t1_iz03ol6 wrote on December 5, 2022 at 2:11 PM

Why would I care about security when using pickle?

RoadsideCookie t1_iz06pow wrote on December 5, 2022 at 2:35 PM

Because pickle is so easy to attack. It's a format that can deserialize into pretty much any Python code, and Python can do pretty much anything, so if you unpickle a compromised payload, it's game over.

MustachedLobster t1_iz09bzl wrote on December 5, 2022 at 2:55 PM

Because some people make processed data/pretrained models available online as pickle files.

It'd be nice to be able to open them without having to worry about bad actors nuking my home directory.

acamara t1_iz0ziae wrote on December 5, 2022 at 5:52 PM

Pickle objects can be (almost) anything, including arbitrary code. Now, imagine a bad actor claiming to be publishing a SOTA Random Forests model. However, embedded in their .pkl file is a statement like import shutil; shutil.rmtree('./').

Pickle will happily execute this code. Nothing checks whether the pickle file is safe.

P.S. of course the syntax is not that simple, but I hope you get it (and I’m on mobile, yada yada…)
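
For anyone curious what the real mechanism looks like: a harmless sketch, assuming the usual __reduce__ route. The class name Payload is made up for illustration, and eval("6 * 7") stands in for whatever an attacker would actually run.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a 'model' object shipped in a .pkl file."""

    def __reduce__(self):
        # Tells pickle how to rebuild the object: call eval("6 * 7").
        # An attacker would return something like (shutil.rmtree, ("./",)).
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes calls eval(...) right away; no Payload object comes back.
result = pickle.loads(blob)
print(result)
```

The point is that pickle.loads ran the expression and handed back its value instead of a Payload instance. The file decides which callable runs, not the code that loads it.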

unofficialmerve OP t1_iz12non wrote on December 5, 2022 at 6:12 PM

this is a good explanation 🤌🏼💜

acamara t1_iz136an wrote on December 5, 2022 at 6:16 PM

Thanks Merve! (Btw, love your HF notebooks. 😀)

WERE_CAT t1_iz0l6h7 wrote on December 5, 2022 at 4:19 PM

I have found it to be more about dependencies than security. The dependencies pickle creates are rather strict. I have found that upgrading the Python version rendered a significant part of my work unusable.

unofficialmerve OP t1_iz10ekx wrote on December 5, 2022 at 5:58 PM

It can execute arbitrary code, as others said. Other ML frameworks (TF/Keras, PyTorch) are also researching alternative solutions to this at the moment. You should never deserialize a pickle on your local machine unless you made it yourself. Pickle is made for Python in general, not specifically for machine learning. This format is used to serialize sklearn models/pipelines while avoiding pickle.
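
If you really do have to open a pickle you didn't make, one partial mitigation is the allow-list pattern from the Python pickle docs: subclass pickle.Unpickler and override find_class. A rough sketch follows. The names SAFE_BUILTINS, RestrictedUnpickler, restricted_loads and Payload are just for illustration, and this is not the format mentioned above. It only narrows what a pickle may import; it is not a sandbox, and a real sklearn model would need a much longer allow-list.

```python
import builtins
import io
import pickle

# Illustrative allow-list: the only globals a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Hypothetical malicious object: asks the loader to call eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


safe_blob = pickle.dumps([1, 2, range(3)])
evil_blob = pickle.dumps(Payload())

loaded = restricted_loads(safe_blob)  # plain data still loads

try:
    restricted_loads(evil_blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # eval is not on the allow-list, so loading is refused
```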