Study/Paper list


To do: move this list into Excel and keep it updated with filters.

Format: #. Author (first, corresponding). Title. Venue/Year. Remarks.

1. Du, Zidong, et al. ShiDianNao: Shifting Vision Processing Closer to the Sensor. ISCA 2015 (Proceedings of the 42nd Annual International Symposium on Computer Architecture), ACM.
2. Gong, Yunchao, et al. Compressing Deep Convolutional Networks Using Vector Quantization. arXiv preprint arXiv:1412.6115 (2014).
3. Du, Zidong, et al. Neuromorphic Accelerators: A Comparison Between Neuroscience and Machine-Learning Approaches. MICRO 2015 (Proceedings of the 48th International Symposium on Microarchitecture), ACM.
4. Song Han, William J. Dally. EIE: Efficient Inference Engine on Compressed Deep Neural Network. ISCA 2016.
5. Yu-Hsin Chen, Joel Emer, Vivienne Sze. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. ISCA 2016. Remarks: MIT, NVIDIA; Slides URL.
6. Y. LeCun, Y. Bengio, G. Hinton. Deep Learning. Nature.
7. Lixue Xia. MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System. DATE 2016.
8. Ishwar Bhati, Bruce Jacob. Flexible Auto-Refresh: Enabling Scalable and Energy-Efficient DRAM Refresh Reductions. ISCA 2016. Remarks: DRAM.
9. IMPACT 2016. Remarks: DRAM 3D architecture, heat dissipation.
10. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal. In-Datacenter Performance Analysis of a Tensor Processing Unit. ISCA 2017. Remarks: TPU, deep-learning accelerator.
11. Yufei Ma. End-to-End Scalable FPGA Accelerator for Deep Residual Networks. ISCA 2017. Remarks: CNN FPGA accelerator.
12. Kaiming He. Identity Mappings in Deep Residual Networks. 2016.
13. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Remarks: R-CNN.
14. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Remarks: SPPNet.
15. Shaoqing Ren, Kaiming He, Ross Girshick. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Remarks: improves object detection/localization performance.
16. Jan Hosang. What Makes for Effective Detection Proposals? 2015. Remarks: region proposals.
17. Object Detection Networks on Convolutional Feature Maps.
18. Rethinking the Inception Architecture for Computer Vision. 2015. Remarks: GoogLeNet Inception-v3; convolution factorization + label smoothing + auxiliary classifier + BN.
19. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. 2016. Remarks: Inception + ResNet.
20. C.-Y. Lee, S. Xie. Deeply-Supervised Nets. Remarks: auxiliary classifier (cf. GoogLeNet).
21. Liwei Wang, Chen-Yu Lee. Training Deeper Convolutional Networks with Deep Supervision. Remarks: auxiliary classifier (cf. GoogLeNet).
22. Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015. Remarks: batch normalization to improve training (vanishing/exploding gradients).
23. Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. 2017.





ISCA 2016 Neural Network Session Paper Skim Review

1. Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing (Jorge Albericio)
2. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars (Ali Shafiee)
3. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory (Ping Chi)
4. EIE: Efficient Inference Engine on Compressed Deep Neural Network (Song Han)
5. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision (Robert LiKamWa)
6. Minerva: Enabling Low-Power, High-Accuracy Deep Neural Network Accelerators (Brandon Reagen)
7. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (Yu-Hsin Chen)
8. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory (Duckhwan Kim)
9. An Instruction Set Architecture for Neural Networks (Shaoli Liu)


1. (7) Eyeriss

Abstract

In CNN processing, energy consumption from data movement is dominant, so minimizing data movement is critical. The paper proposes Row Stationary (RS), a dataflow that minimizes data movement by reusing local data. Unlike previously proposed dataflows, it can be adapted to CNNs of different shapes, and it reduces every type of data movement by fully exploiting the processing engines' (PE) local storage, direct inter-PE communication, and spatial parallelism. An analytical framework is presented for measuring energy efficiency. The evaluation appears to sweep various AlexNet configurations, and the dataflow is also demonstrated on a fabricated chip that supports energy analysis.
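To make the reuse pattern concrete, here is a minimal Python sketch (my illustration, not the paper's hardware) of the 1-D convolution primitive that RS maps onto each PE: one filter row stays resident ("stationary") in a PE's local registers and is reused at every sliding-window position of an input row, while a column of PEs accumulates the partial sums for one output row.

```python
import numpy as np

def pe_row_conv(filter_row: np.ndarray, input_row: np.ndarray) -> np.ndarray:
    """One PE: 1-D convolution of a single filter row with a single input row."""
    R, W = len(filter_row), len(input_row)
    out = np.zeros(W - R + 1)
    for x in range(W - R + 1):            # slide across the input row
        # filter_row comes from local PE registers every step (reuse);
        # input_row values shift through the PE one element at a time
        out[x] = np.dot(filter_row, input_row[x:x + R])
    return out

def conv2d_rs(filt: np.ndarray, image: np.ndarray) -> np.ndarray:
    """R PEs in a column each hold one filter row; their 1-D results are
    accumulated across the column into one output row."""
    R, H = filt.shape[0], image.shape[0]
    out_rows = []
    for y in range(H - R + 1):            # one PE column per output row
        partials = [pe_row_conv(filt[r], image[y + r]) for r in range(R)]
        out_rows.append(sum(partials))    # direct inter-PE accumulation
    return np.array(out_rows)

img = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d_rs(np.ones((3, 3)), img))    # same result as a direct 3x3 conv
```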

2. (6) Minerva

This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels for optimizing DNN hardware accelerators. Compared with existing fixed-point accelerators, heterogeneous data-type optimization reduces power by 1.5x; inline predication and pruning of small activation values reduces power by a further 2.0x; and active hardware fault detection coupled with domain-aware error mitigation eliminates an additional 2.7x by lowering SRAM voltages. Experiments on five datasets show an average power reduction of 8.1x. Minerva enables highly accurate, ultra-low-power DNN accelerators. (Need to confirm exactly what Minerva does.)
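As a picture of the second optimization, here is a minimal sketch of predicating out small activations, assuming a plain dense layer and a hypothetical threshold theta chosen offline so accuracy is preserved; real hardware would gate the skipped MACs rather than multiply by zero.

```python
import numpy as np

def pruned_dense(x, W, b, theta=0.05):
    """Dense layer that skips MACs whose input activation is below theta."""
    mask = np.abs(x) >= theta            # predicate on the incoming activation
    x_kept = np.where(mask, x, 0.0)      # in hardware: skip these MACs outright
    skipped = 1.0 - mask.mean()          # fraction of operations avoided
    y = np.maximum(x_kept @ W + b, 0.0)  # ReLU keeps many activations small
    return y, skipped

rng = np.random.default_rng(1)
x = np.maximum(rng.standard_normal(256), 0.0)  # ReLU-like input, many near zero
W, b = 0.1 * rng.standard_normal((256, 64)), np.zeros(64)
y, skipped = pruned_dense(x, W, b, theta=0.1)
print(f"{skipped:.0%} of MACs skipped")
```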

3. (3) PRIME

Processing-in-memory (PIM) is a promising solution to the "memory wall" challenge facing future computer systems. Among the many PIM proposals, ReRAM is the most promising candidate for main memory, and its crossbar array structure is well suited to computing matrix-vector multiplications, which makes it a good fit for NN applications. This paper proposes PRIME, a ReRAM-based PIM architecture, and provides both its microarchitecture and circuit design, together with a software platform that eases NN implementation on the ReRAM-based PIM. Compared with a state-of-the-art NPU, PRIME achieves 2360x better performance and 895x lower energy consumption.
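A tiny numerical model (illustrative values only) of why the crossbar fits matrix-vector multiplication: with weights programmed as cell conductances G and activations applied as row voltages v, Kirchhoff's current law makes each column current the weighted sum i_j = sum_k G[k, j] * v[k], so a single analog read performs the whole product.

```python
import numpy as np

G = np.array([[1.0, 0.2],      # cell conductances: the programmed weights
              [0.5, 0.8],
              [0.1, 0.9]])
v = np.array([0.3, 1.0, 0.6])  # input activations encoded as row voltages

i = G.T @ v                    # column currents = weighted sums (analog MVM)
print(i)                       # sensed by ADCs, then rescaled to real weights
```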

4. (2) ISAAC

Appears to be from the team behind DaDianNao. Memristor crossbar arrays are used to compute dot products in the analog domain. The paper (1) builds a pipelined architecture out of crossbars, using eDRAM as buffers between pipeline stages; (2) devises a scheme for encoding data in the analog domain that reduces the analog-to-digital conversion overhead; and (3) defines the many digital components required to build an analog CNN accelerator. Compared to DaDianNao, it gains 14.8x in throughput, 5.5x in energy, and 7.5x in computational density.
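A minimal sketch of the kind of bit-serial input encoding such designs use to keep converter overhead down (an illustrative model, not the paper's exact scheme; the real design also slices weights across cells): inputs are streamed into the crossbar one bit per cycle, so only a 1-bit DAC is needed, and the per-bit analog results are combined by digital shift-and-add.

```python
import numpy as np

def bit_serial_mvm(G: np.ndarray, x: np.ndarray, nbits: int = 8) -> np.ndarray:
    """Compute G.T @ x by streaming the bits of x from LSB to MSB."""
    acc = np.zeros(G.shape[1])
    for b in range(nbits):
        bits = (x >> b) & 1        # 1-bit "voltages" applied to the rows
        partial = G.T @ bits       # analog column currents, sampled by the ADC
        acc += partial * (1 << b)  # digital shift-and-add across cycles
    return acc

G = np.array([[3, 1], [2, 5], [4, 0]])
x = np.array([17, 6, 200])
assert np.array_equal(bit_serial_mvm(G, x), G.T @ x)  # matches the exact MVM
```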

5. (8) INsight

For DNNs, the von Neumann architecture stores the weight parameters in external memory and time-shares the processing elements. This paper proposes a neuromorphic computing system consisting of a non-conventional compiler, a neuromorphic architecture, and its microarchitecture. The compiler factorizes a trained feedforward network into a sparsely connected network, compresses the weights linearly, and generates a time-delay neural network, reducing the number of connections. Implemented on an FPGA, it achieves 97.64% accuracy on MNIST classification.
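The post does not detail how the compiler factorizes the network, so purely as an illustration of the connection-count savings, here is a sketch using truncated SVD (a stand-in technique, not necessarily what INsight's compiler actually does): one m x n weight matrix becomes two thin factors, cutting connections whenever r is much smaller than min(m, n).

```python
import numpy as np

def factorize(W: np.ndarray, r: int):
    """Approximate W (m x n) as A (m x r) @ B (r x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # fold the singular values into the left factor
    B = Vt[:r]
    return A, B

rng = np.random.default_rng(0)
# trained weight matrices are far more compressible than random noise;
# emulate that with an approximately low-rank matrix
W = rng.standard_normal((256, 32)) @ rng.standard_normal((32, 256))
W += 0.01 * rng.standard_normal((256, 256))
A, B = factorize(W, r=32)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative error {err:.4f}; 256*256 -> (256+256)*32 connections (4x fewer)")
```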