Submitted by shahaff32 t3_y22rk0 in MachineLearning

Our NeurIPS 2022 paper "Wavelet Feature Maps Compression for Image-to-Image CNNs" is now available.

In this paper, we propose a novel approach to compress CNNs using a modified wavelet compression technique.

Abstract:

>Convolutional Neural Networks (CNNs) are known for requiring extensive computational resources, and quantization is among the best and most common methods for compressing them. While aggressive quantization (i.e., less than 4-bits) performs well for classification, it may cause severe performance degradation in image-to-image tasks such as semantic segmentation and depth estimation. In this paper, we propose Wavelet Compressed Convolution (WCC) -- a novel approach for high-resolution activation maps compression integrated with point-wise convolutions, which are the main computational cost of modern architectures. To this end, we use an efficient and hardware-friendly Haar-wavelet transform, known for its effectiveness in image compression, and define the convolution on the compressed activation map. We experiment with various tasks that benefit from high-resolution input. By combining WCC with light quantization, we achieve compression rates equivalent to 1-4bit activation quantization with relatively small and much more graceful degradation in performance.
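A note on why a wavelet transform pairs naturally with point-wise convolutions: a 1x1 convolution mixes channels at each position, while the wavelet transform mixes positions within each channel, so the two linear operations commute. The toy 1D sketch below only checks that property. It is not the WCC implementation, the helper names (`haar_1d`, `pointwise_conv`) are made up for illustration, and it leaves out the actual compression step (dropping and quantizing coefficients).

```python
import math


def haar_1d(x):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx + detail


def pointwise_conv(channels, weights):
    """1x1 convolution: weights[o][c] mixes input channel c into output o."""
    n_pos = len(channels[0])
    return [
        [sum(w[c] * channels[c][p] for c in range(len(channels)))
         for p in range(n_pos)]
        for w in weights
    ]


# Toy activation map: 2 input channels, 4 positions (hypothetical values).
x = [[1.0, 4.0, -2.0, 0.5],
     [3.0, -1.0, 2.0, 2.0]]
# Toy 1x1 kernel: 3 output channels, 2 input channels.
w = [[0.5, -1.0],
     [2.0, 0.25],
     [1.0, 1.0]]

# Path 1: convolve in the spatial domain, then transform.
conv_then_wt = [haar_1d(ch) for ch in pointwise_conv(x, w)]
# Path 2: transform first, then convolve on the wavelet coefficients.
wt_then_conv = pointwise_conv([haar_1d(ch) for ch in x], w)

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(conv_then_wt, wt_then_conv)
    for a, b in zip(row_a, row_b)
)
print(max_diff)  # agrees up to floating-point rounding
```

Because both paths agree, the convolution can in principle be applied to the (smaller) set of retained coefficients instead of the full-resolution map.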

Cityscapes semantic segmentation with different compressions.

KITTI depth prediction with different compressions.

152

Comments

londons_explorer t1_is0rn7l wrote

This is the kind of research that makes companies with hardware accelerators (Google, NVIDIA, Tesla, etc.) suddenly have to redesign and re-buy their very expensive hardware accelerators...

16

shahaff32 OP t1_is0ths5 wrote

This is aimed mostly at edge devices, where an accelerator is not available (e.g. mobile phones), or where you want to design a cheaper chip for a product that requires running such networks (e.g. autonomous vehicles).

This work was, in fact, partially supported by the AVATAR consortium, aimed at smart vehicles. https://avatar.org.il/

23

londons_explorer t1_is11x8p wrote

Sure this work was aimed at that, but these same techniques can be used to make a datacenter-scale inference machine into an even more powerful one.

And presumably if a way can be found to do backpropagation in 'wavelet domain', then training could be done like this too.

9

shahaff32 OP t1_is13c2c wrote

We are in fact doing the backpropagation in the wavelet domain :)

The gradient simply goes through the inverse wavelet transform

See WCC/util/wavelet.py in our GitHub repo; lines 52-83 define the forward/backward of WT and IWT.
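For readers who want the intuition without opening the repo: assuming an orthonormal Haar transform W, the backward pass of c = Wx is W^T applied to the upstream gradient, and W^T equals the inverse transform. The sketch below checks this numerically against finite differences. It is an independent illustration, not the code from wavelet.py, and the function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length list."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * S for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * S for i in range(half)]
    return approx + detail


def haar_inverse(c):
    """Inverse of haar_forward (also its transpose, since W is orthogonal)."""
    half = len(c) // 2
    x = []
    for a, d in zip(c[:half], c[half:]):
        x.append((a + d) * S)
        x.append((a - d) * S)
    return x


def loss(x, g):
    """Linear test loss L(x) = <g, W x>, so dL/dc = g."""
    return sum(gi * ci for gi, ci in zip(g, haar_forward(x)))


x = [1.0, 4.0, -2.0, 0.5]
g = [0.3, -1.2, 2.0, 0.7]  # upstream gradient w.r.t. the coefficients

# Analytic backward pass: push the upstream gradient through the inverse.
analytic = haar_inverse(g)

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for i in range(len(x)):
    xp = list(x)
    xm = list(x)
    xp[i] += eps
    xm[i] -= eps
    numeric.append((loss(xp, g) - loss(xm, g)) / (2 * eps))

max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_err)  # small; limited only by floating-point rounding
```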

18

hughperman t1_is1qyob wrote

So, since wavelets here are just filter banks, equivalent to fixed (non-learned) convolution + downsampling blocks, could you learn an improved set of wavelet filters to improve this result?
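To make the filter-bank view concrete, here is a minimal sketch of one level of the 2D Haar transform written as four fixed 2x2 kernels applied with stride 2. The kernel values are the standard orthonormal Haar ones. The subband names (LL/LH/HL/HH) follow one common convention and vary between libraries, and `conv2d_stride2` is a hypothetical helper. Learning the filters would amount to making these kernels trainable instead of fixed.

```python
# Four fixed 2x2 Haar kernels (orthonormal scaling: each entry is +/- 1/2).
HAAR_KERNELS = {
    "LL": [[0.5, 0.5], [0.5, 0.5]],      # local average
    "LH": [[0.5, 0.5], [-0.5, -0.5]],    # difference between rows
    "HL": [[0.5, -0.5], [0.5, -0.5]],    # difference between columns
    "HH": [[0.5, -0.5], [-0.5, 0.5]],    # diagonal difference
}


def conv2d_stride2(img, kernel):
    """Valid 2x2 correlation with stride 2 (convolution + downsampling)."""
    h, w = len(img), len(img[0])
    return [
        [sum(kernel[di][dj] * img[i + di][j + dj]
             for di in range(2) for dj in range(2))
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


img = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0],
       [13.0, 14.0, 15.0, 16.0]]

subbands = {name: conv2d_stride2(img, k) for name, k in HAAR_KERNELS.items()}

# Orthonormality check: the transform preserves the sum of squares.
energy_in = sum(v * v for row in img for v in row)
energy_out = sum(v * v for band in subbands.values() for row in band for v in row)

print(subbands["LL"])  # [[7.0, 11.0], [23.0, 27.0]]
print(energy_in, energy_out)  # 1496.0 1496.0
```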

14

shahaff32 OP t1_is1tuuf wrote

That is indeed possible, though at a computational cost. The Haar wavelet can be implemented very efficiently because of its simplicity.

Please see Appendix F, where we briefly discuss other wavelets and their added computational costs.

13

Ecclestoned t1_is2mf31 wrote

Nice work, will definitely check it out. You're lucky that you didn't get dinged by reviewers for not citing recent works. Some examples:

GACT: Activation Compressed Training for Generic Network Architectures

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

[AC-GC: Lossy Activation Compression with Guaranteed Convergence](https://proceedings.neurips.cc/paper/2021/hash/e655c7716a4b3ea67f48c6322fc42ed6-Abstract.html)

7

SearchAtlantis t1_is3pnyq wrote

Hey, my favorite wavelet! It's what I use to explain wavelets before getting into more complex things like Daubechies or others.

The compression and, depending on the task, dimensionality reduction you can get with wavelets is pretty impressive.

3

shahaff32 OP t1_is4bbd7 wrote

Haar wavelet is also very efficient, as it can be implemented using additions and subtractions (and maybe a few bit manipulations) :)
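As a rough illustration of the "additions and subtractions" point, here is a sketch of an unnormalized integer Haar step. The forward pass uses only one add and one subtract per pair, and the inverse uses an add or subtract followed by a one-bit shift. The unnormalized scaling is an assumption made for this example and may differ from the scaling used in the paper; the function names are hypothetical.

```python
def haar_int_forward(x):
    """Unnormalized Haar step on integers: pairwise sums, then differences."""
    sums = [x[i] + x[i + 1] for i in range(0, len(x), 2)]
    diffs = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return sums + diffs


def haar_int_inverse(c):
    """Exact inverse: s + d = 2a and s - d = 2b, so a one-bit shift recovers a, b."""
    half = len(c) // 2
    out = []
    for s, d in zip(c[:half], c[half:]):
        out.append((s + d) >> 1)
        out.append((s - d) >> 1)
    return out


x = [12, -7, 3, 3, 0, 255, -128, 64]
coeffs = haar_int_forward(x)
restored = haar_int_inverse(coeffs)

print(coeffs)    # [5, 6, 255, -64, 19, 0, -255, -192]
print(restored)  # [12, -7, 3, 3, 0, 255, -128, 64]
```

Note that the sums and differences need one extra bit of range compared to the inputs, which matters when the values are already quantized to a few bits.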

You can also see Appendix F where we tested several others :)

1

danny_fel t1_is4bv5g wrote

This sounds great! I'd like to try your method on a small NVIDIA Jetson setup. Do I still need to convert the "minimized" model to TFLite? Or should it be good as it is?

2

shahaff32 OP t1_is4jcv4 wrote

Thanks :)

In its current state, the implementation uses only standard PyTorch operations, so it is not as optimized as it could be, and the overhead of the wavelet transforms can outweigh the speedup of the convolution.

We are currently working on a CUDA implementation to overcome that :) See Appendix H for more details.

1