Massive_bull_worm t1_iz03ol6 wrote

Why would I care about security when using pickle?

RoadsideCookie t1_iz06pow wrote

Because pickle is so easy to attack. A pickle isn't just data: it's a set of instructions for rebuilding objects, and those instructions can call pretty much any importable Python function. Python can do pretty much anything, so if you unpickle a compromised payload, it's fair game.
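
A minimal, harmless sketch of the mechanism: pickle lets an object define __reduce__, which returns a callable plus the arguments to call it with at load time, and pickle doesn't restrict which callable that is. The class name Payload and the expression "6 * 7" are illustrative. Note that whoever loads the bytes doesn't even need the Payload class, because the pickle only references builtins.eval.

```python
import pickle


class Payload:
    # pickle calls __reduce__ to learn how to rebuild the object later.
    # It returns (callable, args); pickle does not restrict the callable.
    def __reduce__(self):
        # Harmless stand-in: a real attacker could pass something like
        # "__import__('os').system('...')" instead of "6 * 7".
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())  # the bytes only say "call builtins.eval('6 * 7')"
result = pickle.loads(blob)     # eval runs here, during loading

print(result)  # → 42
```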

MustachedLobster t1_iz09bzl wrote

Because some people make processed data/pretrained models available online as pickle files.

It'd be nice to be able to open them without having to worry about bad actors nuking my home directory.
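
If you do have to open someone else's pickle, the Python docs describe "restricting globals": subclass pickle.Unpickler and override find_class so only an allow-list of classes/functions can be loaded. A rough sketch, assuming a tiny allow-list; SAFE_GLOBALS, RestrictedUnpickler, restricted_loads and Evil are hypothetical names for illustration.

```python
import io
import pickle

# Hypothetical allow-list: only these (module, name) globals may be loaded.
SAFE_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("builtins", "complex"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # The unpickler calls this for every class or function the stream
        # references; anything off the allow-list is rejected before use.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Evil:
    # Harmless stand-in for a malicious payload: asks pickle to call print().
    def __reduce__(self):
        return (print, ("this should never run",))


# Plain containers of builtin types need no globals, so they load fine.
ok = restricted_loads(pickle.dumps({"weights": [0.1, 0.2], "labels": {"a", "b"}}))

# A payload that references builtins.print is refused before anything runs.
try:
    restricted_loads(pickle.dumps(Evil()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```

This narrows what a file can do, but it's not a guarantee: real model pickles reference numpy/sklearn globals, and every entry you add to the allow-list is something a payload can use too.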

acamara t1_iz0ziae wrote

Pickle objects can be (almost) anything, including arbitrary code. Now, imagine a bad actor claiming to publish a SOTA Random Forests model. However, embedded in their .pkl file is a statement like import shutil; shutil.rmtree('./').

Pickle will happily execute this code. Nothing checks whether the pickle file is safe.

P.S. Of course the syntax is not that simple, but I hope you get it (and I’m on mobile, yada yada…)
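
On the "nothing checks" point: the standard-library pickletools module can walk a pickle's opcodes without executing them, so you can at least see whether a file asks to look up and call things. A heuristic sketch; suspicious_opcodes, SUSPICIOUS and Payload are hypothetical names, and the payload calls print() instead of rmtree.

```python
import pickle
import pickletools

# Opcodes that make the unpickler look up or call objects by name.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ", "NEWOBJ_EX"}


def suspicious_opcodes(data):
    """Return (position, opcode name, argument) for each suspicious opcode.

    pickletools.genops only parses the byte stream; nothing is executed.
    """
    return [
        (pos, op.name, arg)
        for op, arg, pos in pickletools.genops(data)
        if op.name in SUSPICIOUS
    ]


class Payload:
    # Harmless stand-in for the rmtree example: calls print() on load.
    def __reduce__(self):
        return (print, ("hello from inside the pickle",))


plain = pickle.dumps({"weights": [0.1, 0.2]})
evil = pickle.dumps(Payload())  # dumping is safe; only loading would run print

print(suspicious_opcodes(plain))  # → []
print([name for _, name, _ in suspicious_opcodes(evil)])
```

Treat this as a peek, not a verdict: legitimate sklearn or numpy pickles contain plenty of STACK_GLOBAL and REDUCE opcodes too. pickletools.dis(evil) prints the full listing, including the module and attribute names being looked up.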

WERE_CAT t1_iz0l6h7 wrote

For me it has been more about dependencies than security. The dependencies pickle creates are rather strict: upgrading my Python version has rendered a significant part of my work unusable.
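
One common mechanism behind this kind of breakage: a pickle stores classes by module and name, not their code, so loading only works if the same names still exist in whatever versions are installed. A sketch that simulates a library renaming a class; legacy_lib, Model and LinearModel are made up for illustration, and the throwaway module stands in for a real package.

```python
import pickle
import sys
import types

# Build a throwaway module standing in for an installed library.
legacy_lib = types.ModuleType("legacy_lib")
exec(
    "class Model:\n"
    "    def __init__(self, weights):\n"
    "        self.weights = weights\n",
    legacy_lib.__dict__,
)
sys.modules["legacy_lib"] = legacy_lib

# The pickle holds a reference to "legacy_lib.Model" plus the instance __dict__.
blob = pickle.dumps(legacy_lib.Model([0.5, 1.5]))
restored = pickle.loads(blob)
print(restored.weights)  # → [0.5, 1.5]

# Simulate an upgrade in which the library renamed the class.
legacy_lib.LinearModel = legacy_lib.Model
del legacy_lib.Model

try:
    pickle.loads(blob)
    still_loads = True
except (AttributeError, ImportError) as exc:
    # AttributeError for a missing name; ImportError if the module is gone.
    still_loads = False
    print("load failed:", exc)
```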

unofficialmerve OP t1_iz10ekx wrote

It can execute arbitrary code, as others said. Other ML frameworks (TF/Keras, PyTorch) are also researching alternative solutions to this at the moment. You should never deserialize a pickle on your local machine unless you made it yourself. Pickle is made for Python in general, not specifically for machine learning. This format is used to serialize sklearn models/pipelines while avoiding pickle.
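
For the "unless you made it yourself" case, the Python docs suggest signing data with hmac if you need to be sure a pickle hasn't been tampered with. A minimal sketch, assuming a secret key only you hold; signed_dumps, verified_loads and SECRET_KEY are hypothetical names.

```python
import hashlib
import hmac
import pickle
import secrets

# Hypothetical key; in practice store it securely, outside the repository.
SECRET_KEY = secrets.token_bytes(32)


def signed_dumps(obj, key=SECRET_KEY):
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data  # 32-byte SHA-256 tag followed by the pickle bytes


def verified_loads(blob, key=SECRET_KEY):
    tag, data = blob[:32], blob[32:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(data)  # only reached if the tag checks out


blob = signed_dumps({"n_estimators": 100})
print(verified_loads(blob))  # → {'n_estimators': 100}

# Flip the last byte: the tag no longer matches, so nothing is unpickled.
tampered = blob[:-1] + b"\x00"
try:
    verified_loads(tampered)
    rejected = False
except ValueError:
    rejected = True
```

This only shows that a file came from someone holding the key; it does nothing to make a third party's pickle safe.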