Submitted by Ok-Teacher-22 t3_znndnb in MachineLearning

https://arxiv.org/pdf/2112.13314.pdf

Deep Learning (DL) frameworks are now widely used, simplifying the creation of complex models and their integration into various applications, even for non-DL experts. However, like any other software, they are prone to bugs. This paper deals with the subcategory of bugs known as silent bugs: they lead to wrong behavior, but they do not cause system crashes or hangs, nor do they show an error message to the user. Such bugs are even more dangerous in DL applications and frameworks due to the “black-box” and stochastic nature of these systems (the end user cannot understand how the model makes its decisions). This paper presents the first empirical study of silent bugs in Keras and TensorFlow, and of their impact on users’ programs. We extracted closed issues related to Keras from the TensorFlow GitHub repository. Out of the 1,168 issues that we gathered, 77 were reproducible silent bugs affecting users’ programs. We categorized the bugs based on their effects on users’ programs and the components where the issues occurred, using information from the issue reports. We then derived a threat level for each issue, based on the impact it had on users’ programs. To assess the relevance of the identified categories and the impact scale, we conducted an online survey with 103 DL developers. The participants generally agreed with the significant impact of silent bugs in DL libraries and acknowledged our findings (i.e., the categories of silent bugs and the proposed impact scale). Finally, leveraging our analysis, we provide a set of guidelines to help safeguard against such bugs in DL frameworks.

99

Comments

Mars_rocket t1_j0ila6s wrote

What if the bugs are what’s making the model work?

71

peepeeECKSDEE t1_j0imv5y wrote

If Bethesda made TensorFlow

50

j-bot1 t1_j0mmk4s wrote

Model performance only improves when NPCs clip through walls while T-posing.

2

Acceptable-Cress-374 t1_j0kg05m wrote

There's a very interesting anecdote on this subject that I remember reading a while ago. A researcher was working with FPGAs and genetic algorithms. They were evolving a circuit for a simple low-pass function, and after running the algorithm for a couple of generations they had a working solution. The problem was that, when they looked at the resulting network, a number of nodes were looped back to themselves but not in any way connected to the input. When the researcher tried to remove those nodes, the circuit stopped working. It turned out the solution only worked on that specific hardware unit: they figured the looped nodes were somehow affecting the overall circuit by setting some bits in the underlying hardware. Replication was impossible on other hardware units, because they didn't have that specific hardware "bug". (Recalling this from memory, I might have missed something, but this was the gist of the story.)

12

Marha01 t1_j0l7s3u wrote

3

Acceptable-Cress-374 t1_j0l9exg wrote

Haha, thanks! It wasn't exactly how I remembered it, but close enough.

> A field-programmable gate array, or FPGA for short, is a special type of circuit board with an array of logic cells, each of which can act as any type of logic gate, connected by flexible interlinks which can connect cells. Both of these functions are controlled by software, so merely by loading a special program into the board, it can be altered on the fly to perform the functions of any one of a vast variety of hardware devices.

> Dr. Adrian Thompson has exploited this device, in conjunction with the principles of evolution, to produce a prototype voice-recognition circuit that can distinguish between and respond to spoken commands using only 37 logic gates - a task that would have been considered impossible for any human engineer. He generated random bit strings of 0s and 1s and used them as configurations for the FPGA, selecting the fittest individuals from each generation, reproducing and randomly mutating them, swapping sections of their code and passing them on to another round of selection. His goal was to evolve a device that could at first discriminate between tones of different frequencies (1 and 10 kilohertz), then distinguish between the spoken words "go" and "stop".

> This aim was achieved within 3000 generations, but the success was even greater than had been anticipated. The evolved system uses far fewer cells than anything a human engineer could have designed, and it does not even need the most critical component of human-built systems - a clock. How does it work? Thompson has no idea, though he has traced the input signal through a complex arrangement of feedback loops within the evolved circuit. In fact, out of the 37 logic gates the final product uses, five of them are not even connected to the rest of the circuit in any way - yet if their power supply is removed, the circuit stops working. It seems that evolution has exploited some subtle electromagnetic effect of these cells to come up with its solution, yet the exact workings of the complex and intricate evolved structure remain a mystery.

5

DisWastingMyTime t1_j0ksuw5 wrote

I have a repeatable setup in which fixing a bug in how I encoded/decoded images actually reduced my model's performance. It has a lot to do with JPEG artifacts acting as augmentation.
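
Roughly the kind of thing that was going on (a minimal sketch; `jpeg_roundtrip` is a hypothetical stand-in for my actual encode/decode path):

```python
import tensorflow as tf

def jpeg_roundtrip(image, quality=75):
    # Lossy JPEG encode/decode cycle. The compression artifacts it
    # introduces can act as unintended data augmentation during training.
    # image: uint8 tensor of shape (H, W, 3).
    encoded = tf.io.encode_jpeg(image, quality=quality)
    return tf.io.decode_jpeg(encoded)

# "Fixing" a pipeline so images no longer pass through this round-trip
# removes the artifacts the model had implicitly learned from, which can
# show up as a performance drop.
```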

3

monkeyofscience t1_j0j38hc wrote

This is great! Would be nice to have one for PyTorch as well.

12

puppet_pals t1_j0jafew wrote

Awesome work! I work on KerasCV and guarding against silent bugs is my #1 priority in API design. I'll read through this paper, thanks a lot for the hard work in gathering all of these in one place!

8

Ok-Teacher-22 OP t1_j0jbr18 wrote

How about fixing complex datatypes, then? Right now, if you pass a complex dtype to a metric, Keras quietly chops off the imaginary part.
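
Minimal sketch of what I mean (assuming a stock `tf.keras.metrics.Mean`; exact output may vary by TF version):

```python
import tensorflow as tf

# All of the signal here is in the imaginary part.
values = tf.constant([0 + 3j, 0 - 4j], dtype=tf.complex64)

m = tf.keras.metrics.Mean()  # compute_dtype defaults to float32
m.update_state(values)       # complex64 is silently cast to float32
print(m.result().numpy())    # 0.0 -- the imaginary parts were discarded
```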

−1

puppet_pals t1_j0kjvqn wrote

Is there a bug report for this? Definitely file one if there is not.

I encountered a similar silent failure years ago, where the gradient of some operation on a complex number was 0. https://lukewood.xyz/blog/complex-deep-learning

2

Ok-Teacher-22 OP t1_j0mwqk7 wrote

There have been many filed. Just go search and stop being lazy.

−4

puppet_pals t1_j0p5ai3 wrote

> stop being lazy

Please keep in mind that people on Reddit are usually browsing in their free time and might be on mobile.

---

I dug into this for you...

The issue is that complex numbers are cast to the default data type of each individual metric, which is usually a float type. This is consistent with the behavior of all Keras components: each component has a `compute_dtype` attribute, to which all inputs and outputs are cast. This is what allows mixed-precision computation.

Complex numbers are a weird case. They get cast to the metric's native dtype, which is float by default, causing the imaginary components to be dropped. For most dtypes there's a logical translation from one to another, e.g. 1 -> 1.0, 2 -> 2.0, etc. There is no such translation from complex -> float.
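
Concretely, a small sketch of the cast in isolation:

```python
import tensorflow as tf

# int -> float has an obvious translation:
print(tf.cast(tf.constant(1), tf.float32))  # 1.0

# complex -> float does not; TensorFlow keeps the real part and
# silently discards the imaginary part:
x = tf.constant(1 + 2j, dtype=tf.complex64)
print(tf.cast(x, tf.float32))               # 1.0, no warning
```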

In my opinion, TensorFlow should raise an error when you cast complex -> float, but currently it does not. I have a strong feeling that we can't change this due to backwards compatibility, but I would have to dig deeper to verify.

In short, this is not really a Keras bug, but rather a weird interaction between Keras's mixed-precision support and TensorFlow's casting rules.

I hope this helps. Maybe we can make a push to raise an error when casting from complex to real numbers, forcing users to call another function explicitly (e.g. tf.real()). I don't know what the "right" solution is here, but that is the history of why this issue exists.

2

waffles2go2 t1_j0imtk6 wrote

Yup, excellent and much-needed work!

7

kc3w t1_j0k4ywz wrote

I think using Python instead of strongly typed languages does not really help it either.

4

drcopus t1_j0kno5d wrote

I think you mean statically typed. Python is strongly typed.
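
For example, a quick illustration of the distinction:

```python
# Strong typing: Python refuses to silently coerce unrelated types.
try:
    "1" + 1
except TypeError as e:
    print(e)  # can only concatenate str (not "int") to str

# Dynamic typing: the check only happens at runtime, so a buggy code
# path can hide until it actually executes -- there is no compile-time
# safety net as in a statically typed language.
def buggy():
    return "1" + 1  # no error until buggy() is called
```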

3

I_will_delete_myself t1_j0m6je9 wrote

>I think using Python instead of strongly typed languages does not really help it either.

The thing is that most of the framework is implemented in C++ (63.2% according to GitHub). There are a ton of memory bugs caused by how difficult manual memory management is.

That's why Google has been experimenting with C++ alternatives to secure the new OSes they want to ship, and that work may eventually trickle down to TensorFlow and fix some of these bugs.

2

DisWastingMyTime t1_j0ktrl2 wrote

Is there a compiled list of these bugs somewhere?

3

DigThatData t1_j0kjvw6 wrote

A relevant reference I think you should include in your discussion: a summary of some especially pernicious silent bugs in scikit-learn. These were deliberate design choices made by the library authors, and their impact as bugs was a consequence of opaque documentation or deceptive/non-obvious naming choices, in some cases even in spite of user complaints about the undesirable behavior. - https://www.reddit.com/r/statistics/comments/8de54s/is_r_better_than_python_at_anything_i_started/dxmnaef/?context=3

EDIT: also this - https://www.reddit.com/r/MachineLearning/comments/aryjif/d_alternatives_to_scikitlearn/egrctzk/?context=3

Like you say, "do not blindly trust the framework".

--- full disclosure: I wrote that under my old account. If you choose to add that comment as a reference, please attribute it to David Marx

2

TensorDudee t1_j0kv5wt wrote

Does TensorFlow 1.15 also have some of these bugs? I don't like tf.keras, but I'm a great fan of TF 1.15. I think that version of TensorFlow was pretty solid.

2

Ok-Teacher-22 OP t1_j0nn4ix wrote

I used TensorFlow 1.15 for a LONG time because of all the bugs in the 2.x series. It was my version of Windows 2000, which worked great and which I used until 2012!

1

Parzival_007 t1_j0jpi3a wrote

More work like this is actually needed.

0