Submitted by unofficialmerve t3_zd3n8s in MachineLearning
Massive_bull_worm t1_iz03ol6 wrote
Why would I care about security when using pickle?
RoadsideCookie t1_iz06pow wrote
Because pickle is so easy to attack. It's a format that can be deserialized into pretty much any Python code, and Python can do pretty much anything, so if you unpickle a compromised payload, it's fair game.
MustachedLobster t1_iz09bzl wrote
Because some people make processed data/pretrained models available online as pickle files.
It'd be nice to be able to open them without having to worry about bad actors nuking my home directory.
acamara t1_iz0ziae wrote
Pickle objects can be (almost) anything. Including arbitrary code.
Now, imagine a bad actor claiming to be publishing a SOTA Random Forests model. However, embedded in their .pkl file is a statement like import shutil; shutil.rmtree('./').
Pickle will happily execute this code. Nothing checks whether the pickle file is safe.
P.S. of course the syntax is not that simple, but I hope you get it (and I’m on mobile, yada yada…)
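For the curious, the real mechanism looks roughly like the sketch below. An object can define __reduce__ to tell pickle "call this function with these arguments when loading me", and pickle.loads makes that call without asking. The Payload class is made up for illustration, and I've swapped the destructive call for a harmless eval of some arithmetic.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle runs a call when it is loaded."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        # An attacker would put something destructive here instead.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading does not give back a Payload; it returns whatever the call returned.
result = pickle.loads(data)
print(type(result).__name__, result)
```

The point is that the side effect happens inside pickle.loads itself, before you ever touch the "model" you thought you were loading.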
unofficialmerve OP t1_iz12non wrote
this is a good explanation 🤌🏼💜
acamara t1_iz136an wrote
Thanks Merve! (Btw, love your HF notebooks. 😀)
WERE_CAT t1_iz0l6h7 wrote
I have found it to be more about dependencies than security. The dependencies created by pickle are rather strict. I have found that upgrading the Python version rendered a significant part of my work unusable.
unofficialmerve OP t1_iz10ekx wrote
It can execute arbitrary code, as others said. Other ML frameworks (TF/Keras, PyTorch) are also researching alternative solutions to this at the moment. You should never deserialize a pickle on your local machine unless it was made by you. Pickle is made for Python in general, not specifically for machine learning. This format is used to serialize sklearn models/pipelines while avoiding pickle.
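If you ever do have to load a pickle you didn't create, the Python docs describe one mitigation: subclass pickle.Unpickler and override find_class so only an allow-list of globals can be resolved. Here is a rough sketch of that idea. SAFE_BUILTINS and restricted_loads are illustrative names, and this tiny allow-list is far too narrow for a real sklearn model, so treat it as a demo of the mechanism and not a complete defense.

```python
import builtins
import io
import pickle

# Illustrative allow-list: the only globals this loader will resolve.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (eval, os.system, shutil.rmtree, ...) is refused.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed globals can be loaded."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data built from allowed types still round-trips.
print(restricted_loads(pickle.dumps([1, 2, range(3)])))
```

A payload that tries to reference eval or anything in os or shutil fails with pickle.UnpicklingError before the call is made.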