- Model Compression
- Network Pruning
- Knowledge Distillation
- Parameter Quantization
- Architecture Design
- Dynamic Network
1. Network Pruning
As long as the model is a neural network, pruning can be applied.
Networks can be pruned
- Networks are typically over-parameterized (there are significant redundant weights or neurons)
- Importance of a weight
- Importance of a neuron: e.g., the number of times its output was nonzero on a given dataset…
- After pruning, the accuracy will drop (hopefully not too much)
- Fine-tune on the training data to recover accuracy
- Don’t prune too much at once, or the network won’t recover
Iterate between pruning and fine-tuning until the target size/accuracy is reached (see the sketch below).
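A minimal PyTorch sketch of this iterative prune-then-fine-tune loop, using `torch.nn.utils.prune`. The two-layer model, the 20% per-round pruning ratio, and the `fine_tune` placeholder are illustrative assumptions, not part of the original notes.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small over-parameterized model standing in for the network to compress.
model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))

def fine_tune(model, epochs=1):
    """Placeholder: run a few epochs of ordinary training on the original data."""
    pass

# Prune a little, fine-tune to recover, and repeat.
for _ in range(5):
    for module in model:
        if isinstance(module, nn.Linear):
            # Zero out the 20% of weights with the smallest magnitude (L1 criterion).
            prune.l1_unstructured(module, name="weight", amount=0.2)
    fine_tune(model)  # recover accuracy before pruning further

# Make the masks permanent: pruned weights stay at 0, the masks are removed.
for module in model:
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```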
- How about simply training a smaller network?
- It is widely known that smaller networks are more difficult to learn successfully.
- A larger network is easier to optimize?
Two papers take completely opposite positions on whether a small model can be trained directly: *The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks* and *Rethinking the Value of Network Pruning*.
Network Pruning - Practical Issues
Weight pruning
- *The network architecture becomes irregular: hard to implement, hard to speed up (GPUs cannot exploit irregular sparsity)…* (To avoid this, pruned weights are usually not actually removed but set to 0, which keeps the network structure intact.)
- See *Learning Structured Sparsity in Deep Neural Networks* for details; even throwing away 95% of the weights is not a problem.
Neuron pruning
- The network architecture stays regular: easy to implement, easy to speed up (GPU-friendly)… (the contrast with weight pruning is sketched after this list)
- A three-stage pipeline to reduce the storage requirement of neural nets
- Showed a 35x reduction in the size of AlexNet, from 240MB to 6.9MB, with no loss of accuracy
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
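A short sketch contrasting the two pruning styles on plain `nn.Linear` layers. The 0.05 magnitude threshold, the per-neuron importance score, and the choice to keep 64 neurons are illustrative assumptions, not prescribed by the notes.

```python
import torch.nn as nn

fc1 = nn.Linear(256, 128)
fc2 = nn.Linear(128, 10)

# Weight pruning: zero out individual entries; the matrix shape is unchanged,
# so the sparsity pattern is irregular and hard for GPUs to exploit.
mask = fc1.weight.abs() > 0.05           # keep only "important" weights
fc1.weight.data *= mask                  # pruned weights are set to 0

# Neuron pruning: drop whole neurons, i.e. rows of fc1 and the matching
# columns of fc2. The result is a smaller but still dense (regular) network.
importance = fc1.weight.abs().sum(dim=1)   # per-neuron score (illustrative)
keep = importance.topk(64).indices         # keep the 64 highest-scoring neurons

fc1_small = nn.Linear(256, 64)
fc2_small = nn.Linear(64, 10)
fc1_small.weight.data = fc1.weight.data[keep].clone()
fc1_small.bias.data = fc1.bias.data[keep].clone()
fc2_small.weight.data = fc2.weight.data[:, keep].clone()
fc2_small.bias.data = fc2.bias.data.clone()
```

Weight pruning only changes values (a mask over the same shapes), while neuron pruning yields genuinely smaller dense matrices that standard GPU kernels accelerate directly.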
2. Knowledge Distillation
The discussion here is limited to classification problems.
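Since the notes leave this section brief, here is a minimal sketch of the standard soft-target distillation loss for classification (Hinton et al.): the student matches the teacher's temperature-softened class distribution in addition to the usual hard-label loss. The temperature `T=4.0` and mixing weight `alpha=0.5` are illustrative hyperparameters.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL divergence between softened student and teacher distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                            # rescale gradients for the temperature
    # Ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```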
3. Parameter Quantization
- 1. Using fewer bits to represent a value
- 2. Weight clustering
- K-means clustering (see the sketch below)
- 3. Represent frequent clusters with fewer bits and rare clusters with more bits
- e.g. Huffman encoding
BinaryConnect can sometimes even perform better, because binarizing the weights essentially acts as a form of regularization.
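A minimal sketch of weight clustering with k-means (step 2 above), using scikit-learn. The 16-cluster codebook (i.e., 4-bit indices) and the random weight matrix are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(weight, n_clusters=16):
    """Replace each weight by its cluster centroid. With 16 clusters, each
    weight can be stored as a 4-bit index plus a small codebook of centroids;
    frequent clusters could further be given shorter Huffman codes."""
    flat = weight.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    codebook = km.cluster_centers_.reshape(-1)   # the shared centroid values
    indices = km.labels_                         # per-weight cluster index
    quantized = codebook[indices].reshape(weight.shape)
    return quantized, codebook, indices

w = np.random.randn(512, 256).astype(np.float32)
w_q, codebook, idx = cluster_weights(w)
```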
4. Architecture Design
- For a fully-connected layer with an M $\times$ N weight matrix, it can be decomposed into two FC layers through a narrow bottleneck of width K, reducing the parameter count from M $\times$ N to K $\times$ (M + N). This is essentially low-rank matrix factorization.
- For CNNs:
Depthwise Separable Convolution
*Parameter comparison: the parameter count can be reduced by roughly a factor of kernel_size $\times$ kernel_size, thanks to parameter reuse* (see the sketch below).
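A sketch of that parameter comparison in PyTorch, assuming 128 input channels, 256 output channels, and a 3×3 kernel (all illustrative choices):

```python
import torch.nn as nn

# Standard convolution: in_ch * out_ch * k * k parameters (plus bias).
standard = nn.Conv2d(128, 256, kernel_size=3, padding=1)

# Depthwise separable convolution: a per-channel (depthwise) k x k conv
# followed by a 1 x 1 (pointwise) conv that mixes channels.
depthwise = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=128)
pointwise = nn.Conv2d(128, 256, kernel_size=1)

def n_params(*modules):
    return sum(p.numel() for m in modules for p in m.parameters())

print(n_params(standard))              # 128*256*3*3 + 256 = 295,168
print(n_params(depthwise, pointwise))  # (128*3*3 + 128) + (128*256 + 256) = 34,304
# Ratio ≈ 1/k^2 + 1/out_ch, i.e. close to a k*k reduction when out_ch is large.
```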
Learn more …
- SqueezeNet
- SqueezeDet: Fully Convolutional Network for fast object detection
- MobileNet
- ShuffleNet
- Xception
- SEP-Net: Transforming k × k convolution into binary patterns for reducing model size
5. Dynamic Network
Can a network adjust the amount of computation it needs?
Possible solutions:
- 1. Train multiple classifiers
- 2. Classifiers at intermediate layers (early exits; see the sketch below)
See *Multi-Scale Dense Networks for Resource Efficient Image Classification*.
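A minimal early-exit sketch of the idea behind solution 2, loosely in the spirit of MSDNet but not its actual architecture; the layer widths and the 0.9 confidence threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Auxiliary classifiers at intermediate layers let the model stop early
    when a prediction is already confident, spending less computation on
    easy inputs."""

    def __init__(self, n_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.exit1 = nn.Linear(256, n_classes)   # classifier at an intermediate layer
        self.exit2 = nn.Linear(256, n_classes)   # final classifier
        self.threshold = threshold

    def forward(self, x):
        h = self.block1(x)
        logits1 = self.exit1(h)
        if not self.training:
            # Exit early if the whole batch is already confident enough.
            conf = torch.softmax(logits1, dim=1).max(dim=1).values
            if bool((conf > self.threshold).all()):
                return logits1                   # skip block2 entirely
        h = self.block2(h)
        return self.exit2(h)
```

At training time both exits would typically be trained jointly (e.g., by summing their losses); at inference time easy inputs leave through the first exit and skip the remaining computation.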
- Use the weights of the trained-and-pruned network to initialize a smaller network
- With weight pruning, pruned weights are set to 0 and only the non-zero weights need to be stored; knowledge distillation can then transfer the pruned network's knowledge into a smaller network
- A private tutor plus a professor (analogy)
- The all-purpose NiN (Network in Network)
Further Studies
- Can we find winning tickets early on in training? (You et al., 2020)
- Do winning tickets generalize across datasets and optimizers? (Morcos et al., 2019)
- Can this hypothesis hold in other domains like text processing/NLP? (Yu et al., 2019)
Reading
- Robert T. Lange, Lottery Ticket Hypothesis: A Survey, 2020
- Cheng et al., A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017
- Song Han, Lecture 10 - Knowledge Distillation | MIT 6.S965
