To do: update this list in Excel and set up filtering.
| # | Authors (first, corresponding) | Title | Venue/Year | Notes |
|---|---|---|---|---|
| 1 | Du, Zidong, et al. | ShiDianNao: Shifting Vision Processing Closer to the Sensor | ACM, 2015 | Proceedings of the 42nd Annual International Symposium on Computer Architecture |
| 2 | Gong, Yunchao, et al. | Compressing Deep Convolutional Networks Using Vector Quantization | arXiv preprint arXiv:1412.6115 (2014) | |
| 3 | Du, Zidong, et al. | Neuromorphic Accelerators: A Comparison Between Neuroscience and Machine-Learning Approaches | ACM, 2015 | Proceedings of the 48th International Symposium on Microarchitecture |
| 4 | Song Han, William J. Dally | EIE: Efficient Inference Engine on Compressed Deep Neural Network | ISCA 2016 | |
| 5 | Yu-Hsin Chen, Joel Emer, Vivienne Sze | Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks | ISCA 2016 | MIT, NVIDIA |
| 6 | Y. LeCun, Y. Bengio, G. Hinton | Deep Learning | Nature, 2015 | |
| 7 | Lixue Xia | MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System | DATE 2016 | |
| 8 | Ishwar Bhati, Bruce Jacob | Flexible Auto-Refresh: Enabling Scalable and Energy-Efficient DRAM Refresh Reductions | ISCA 2016 | DRAM-related |
| 9 | | | IMPACT 2016 | DRAM 3D architecture, heat dissipation |
| 10 | Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal | In-Datacenter Performance Analysis of a Tensor Processing Unit | ISCA 2017 | TPU, deep learning accelerator |
| 11 | Yufei Ma | End-to-End Scalable FPGA Accelerator for Deep Residual Networks | ISCA 2017 | CNN FPGA accelerator |
| 12 | Kaiming He | Identity Mappings in Deep Residual Networks | 2016 | |
| 13 | | Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation | | R-CNN |
| 14 | | Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition | | SPPNet |
| 15 | Shaoqing Ren, Kaiming He, Ross Girshick | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | | Improves object detection/localization performance |
| 16 | Jan Hosang | What Makes for Effective Detection Proposals? | 2015 | On region proposals |
| 17 | | Object Detection Networks on Convolutional Feature Maps | | |
| 18 | | Rethinking the Inception Architecture for Computer Vision | 2015 | GoogLeNet Inception-v3: convolution factorization + label smoothing + auxiliary classifier + BN |
| 19 | | Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | 2016 | Inception + ResNet |
| 20 | C.-Y. Lee, S. Xie | Deeply-Supervised Nets | | Auxiliary classifier (GoogLeNet) |
| 21 | Liwei Wang, Chen-Yu Lee | Training Deeper Convolutional Networks with Deep Supervision | | Auxiliary classifier (GoogLeNet) |
| 22 | Sergey Ioffe, Christian Szegedy | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | 2015 | Batch normalization to improve training performance (vanishing/exploding gradients) |
| 23 | Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang | Efficient Processing of Deep Neural Networks: A Tutorial and Survey | 2017 | |
ISCA 2016 Neural Network Session Paper Skim Review
| # | Title | First Author |
|---|---|---|
| 1 | Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing | Jorge Albericio |
| 2 | ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars | Ali Shafiee |
| 3 | PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory | Ping Chi |
| 4 | EIE: Efficient Inference Engine on Compressed Deep Neural Network | Song Han |
| 5 | RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision | Robert LiKamWa |
| 6 | Minerva: Enabling Low-Power, High-Accuracy Deep Neural Network Accelerators | Brandon Reagen |
| 7 | Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks | Yu-Hsin Chen |
| 8 | Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory | Duckhwan Kim |
| 9 | An Instruction Set Architecture for Neural Networks | Shaoli Liu |
1. (7) Eyeriss
Abstract
In CNN computation, energy consumption is dominated by data movement, so minimizing data movement is critical. The paper proposes Row Stationary (RS), a dataflow that minimizes data movement by reusing local data. Unlike previously proposed dataflows, RS is applicable to CNNs of different shapes, and it reduces all types of data movement by fully exploiting each processing engine's (PE's) local storage, direct inter-PE communication, and spatial parallelism. An analysis framework for measuring energy efficiency is also presented. The authors appear to run experiments on AlexNet while varying several configuration parameters, and the design is also demonstrated on a fabricated chip that supports energy analysis.
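A rough sketch of the reuse idea behind Row Stationary (my illustration, not Eyeriss's actual design): each PE holds one filter row stationary in its local storage and streams an input row past it, so every weight is fetched from the global buffer once and reused across all output positions. The 1D simplification and function name are assumptions for illustration.

```python
import numpy as np

def pe_row_conv(filter_row, input_row):
    """One PE computes a 1D convolution with the filter row held stationary.

    The filter row is loaded into local storage once and reused at every
    sliding-window position, mimicking the weight reuse of Row Stationary.
    """
    local_weights = filter_row.copy()          # fetched from the global buffer once
    out_len = len(input_row) - len(local_weights) + 1
    psums = np.zeros(out_len)
    for x in range(out_len):                   # slide the input past stationary weights
        psums[x] = np.dot(local_weights, input_row[x:x + len(local_weights)])
    return psums

# A 2D convolution row is a sum of 1D row convolutions; in the real array the
# partial sums would be accumulated across vertically adjacent PEs.
filt = np.array([1.0, 0.0, -1.0])
row = np.arange(8, dtype=float)
print(pe_row_conv(filt, row))
```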
2. (6) Minerva
This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared with existing fixed-point accelerators, heterogeneous data-type optimization is shown to reduce power by 1.5x; inline predication and pruning of small activation values reduce power by a further 2x; and active hardware fault detection coupled with domain-aware error mitigation eliminates an additional 2.7x by lowering SRAM voltages. Experiments on five datasets show an average power reduction of 8.1x. Minerva enables highly accurate, ultra-low-power DNN accelerators. (Need to verify exactly what Minerva does.)
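A minimal software sketch of what pruning small activation values could look like (a hypothetical helper, not Minerva's hardware predication logic): activations below a magnitude threshold are zeroed so the corresponding MACs and SRAM accesses can be skipped, trading a small accuracy loss for power.

```python
import numpy as np

def prune_small_activations(activations, threshold):
    """Zero activations below a magnitude threshold.

    A zeroed activation lets an accelerator skip the corresponding MAC and
    operand fetch; the threshold trades a small accuracy loss for power.
    """
    mask = np.abs(activations) >= threshold
    return activations * mask, 1.0 - mask.mean()   # pruned tensor, skip ratio

acts = np.maximum(np.random.randn(1000), 0.0)      # ReLU-like activations
pruned, skip_ratio = prune_small_activations(acts, threshold=0.1)
print(f"{skip_ratio:.1%} of MACs could be skipped")
```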
3. (3) PRIME
Processing-in-memory (PIM) is a promising solution to the "memory wall" facing future computer systems. Among the many PIM proposals, ReRAM is the strongest candidate for main memory, and its crossbar array structure is well suited to computing matrix-vector multiplications, which makes it a good fit for NN applications. This paper presents PRIME, a ReRAM-based PIM architecture, providing both its microarchitecture and circuit design, along with a software platform that makes NN implementation on the ReRAM-based PIM easy. Compared with a state-of-the-art NPU, PRIME achieves 2360x better performance and 895x lower energy consumption.
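To make the crossbar MVM concrete: with weights stored as cell conductances G[i, j] and inputs applied as row voltages V[i], Ohm's law gives each cell's current as V[i] * G[i, j], and Kirchhoff's current law sums these along every column wire, so all dot products complete in a single analog read. A toy numerical sketch under idealized assumptions (no wire resistance, device noise, or ADC quantization):

```python
import numpy as np

def crossbar_mvm(conductances, voltages):
    """Idealized ReRAM crossbar: I_j = sum_i V_i * G[i, j] (Kirchhoff's current law).

    Weights live in the cells as conductances; analog summation along each
    column yields one dot product, so the whole MVM takes one read cycle.
    """
    return voltages @ conductances                 # column currents

G = np.random.uniform(0.0, 1.0, size=(4, 3))       # 4 rows x 3 columns of cells
V = np.array([0.2, 0.5, 0.1, 0.9])                 # inputs as row voltages
print(crossbar_mvm(G, V))                          # three column currents
```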
4. (2) ISAAC
Seems to be from the team that built DaDianNao. ISAAC performs dot products in the analog domain using memristor crossbar arrays. The paper's contributions: (1) a pipelined architecture built around the crossbars, with eDRAM used as intermediate pipeline buffers; (2) a data-encoding scheme for the analog domain that reduces the analog-to-digital conversion overhead; (3) a definition of the many digital components needed to build an analog CNN accelerator. Compared with DaDianNao, ISAAC improves throughput by 14.8x, energy by 5.5x, and computational density by 7.5x.
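A simplified sketch in the spirit of ISAAC's encoding idea (the paper's exact scheme differs): inputs are streamed one bit per cycle, each cycle performs a crossbar MVM on 0/1 row inputs, and a digital shift-and-add reassembles the full-precision result. Because each conversion spans a much smaller dynamic range, the ADC overhead drops.

```python
import numpy as np

def bit_serial_mvm(weights, inputs, n_bits=8):
    """Bit-serial crossbar MVM: feed inputs one bit per cycle, shift-and-add.

    Each cycle the crossbar sees only 0/1 row inputs, so the column current
    (and hence the ADC range) stays small; the digital side reassembles the
    full-precision dot product by weighting cycle k's result by 2**k.
    """
    acc = np.zeros(weights.shape[1])
    for k in range(n_bits):                        # cycle k handles input bit k
        bit_vec = (inputs >> k) & 1                # 0/1 voltages on the rows
        acc += (bit_vec @ weights) * (1 << k)      # shift-and-add digitally
    return acc

W = np.random.randint(0, 4, size=(4, 3))           # small integer conductance levels
x = np.random.randint(0, 256, size=4)              # 8-bit inputs
assert np.allclose(bit_serial_mvm(W, x), x @ W)    # matches the direct MVM
print(bit_serial_mvm(W, x))
```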
5. (8) INsight
For DNNs, the von Neumann architecture stores the weight parameters in external memory and time-shares the processing elements. This paper presents a neuromorphic computing system consisting of a non-conventional compiler, a neuromorphic architecture, and its microarchitecture. The compiler factorizes a trained, feedforward network into a sparsely connected network, compresses the weights linearly, and generates a time-delay neural network that reduces the number of connections. Implemented on an FPGA, the system reaches 97.64% on MNIST classification.
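The paper's factorization algorithm isn't detailed here, so the sketch below substitutes simple stand-ins for the two compiler steps named above: magnitude pruning for "factorizes into a sparsely connected network" and uniform quantization for "compresses the weights linearly". Both helpers are hypothetical illustrations, not the paper's method.

```python
import numpy as np

def sparsify_and_compress(W, keep_ratio=0.2, n_levels=16):
    """Substitute sketch: magnitude pruning + uniform (linear) quantization.

    Keeps only the largest-magnitude weights (a sparsely connected layer),
    then maps the survivors onto a small set of evenly spaced levels.
    """
    # Sparsify: keep the top `keep_ratio` fraction of weights by magnitude.
    k = int(W.size * keep_ratio)
    threshold = np.sort(np.abs(W), axis=None)[-k]
    sparse = np.where(np.abs(W) >= threshold, W, 0.0)
    # Compress linearly: snap surviving weights to n_levels uniform steps.
    scale = np.abs(sparse).max() / (n_levels // 2)
    return np.round(sparse / scale) * scale

W = np.random.randn(64, 64)
Wc = sparsify_and_compress(W)
print(f"nonzero: {np.count_nonzero(Wc) / Wc.size:.1%}, "
      f"distinct levels: {np.unique(Wc).size}")
```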