Study/Paper list


To do: move this list into Excel and keep it updated with filters.

Format: #. Author (first, corresponding). Title. Venue/Year. Remarks.

1. Du, Zidong, et al. ShiDianNao: Shifting Vision Processing Closer to the Sensor. ISCA 2015 (Proceedings of the 42nd Annual International Symposium on Computer Architecture), ACM.
2. Gong, Yunchao, et al. Compressing Deep Convolutional Networks Using Vector Quantization. arXiv preprint arXiv:1412.6115 (2014).
3. Du, Zidong, et al. Neuromorphic Accelerators: A Comparison Between Neuroscience and Machine-Learning Approaches. MICRO 2015 (Proceedings of the 48th International Symposium on Microarchitecture), ACM.
4. Song Han, William J. Dally. EIE: Efficient Inference Engine on Compressed Deep Neural Network. ISCA 2016.
5. Yu-Hsin Chen, Joel Emer, Vivienne Sze. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. ISCA 2016. Remarks: MIT, NVIDIA; Slides URL.
6. Y. LeCun, Y. Bengio, G. Hinton. Deep Learning. Nature.
7. Lixue Xia. MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System. DATE 2016.
8. Ishwar Bhati, Bruce Jacob. Flexible Auto-Refresh: Enabling Scalable and Energy-Efficient DRAM Refresh Reductions. ISCA 2016. Remarks: DRAM.
9. IMPACT 2016. Remarks: DRAM 3D architecture, heat dissipation.
10. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal. In-Datacenter Performance Analysis of a Tensor Processing Unit. ISCA 2017. Remarks: TPU, deep-learning accelerator.
11. Yufei Ma. End-to-End Scalable FPGA Accelerator for Deep Residual Networks. ISCA 2017. Remarks: CNN FPGA accelerator.
12. Kaiming He. Identity Mappings in Deep Residual Networks. 2016.
13. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Remarks: R-CNN.
14. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Remarks: SPPNet.
15. Shaoqing Ren, Kaiming He, Ross Girshick. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Remarks: improves object detection/localization performance.
16. Jan Hosang. What Makes for Effective Detection Proposals? 2015. Remarks: region proposals.
17. Object Detection Networks on Convolutional Feature Maps.
18. Rethinking the Inception Architecture for Computer Vision. 2015. Remarks: GoogLeNet Inception-v3; convolution factorization + label smoothing + auxiliary classifier + BN.
19. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. 2016. Remarks: Inception + ResNet.
20. C.-Y. Lee, S. Xie. Deeply-Supervised Nets. Remarks: auxiliary classifier (cf. GoogLeNet).
21. Liwei Wang, Chen-Yu Lee. Training Deeper Convolutional Networks with Deep Supervision. Remarks: auxiliary classifier (cf. GoogLeNet).
22. Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015. Remarks: batch normalization to improve training (vanishing/exploding gradients).
23. Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. 2017.





ISCA 2016 Neural Network Session Paper Skim Review

1. Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing (Jorge Albericio)
2. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars (Ali Shafiee)
3. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory (Ping Chi)
4. EIE: Efficient Inference Engine on Compressed Deep Neural Network (Song Han)
5. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision (Robert LiKamWa)
6. Minerva: Enabling Low-Power, High-Accuracy Deep Neural Network Accelerators (Brandon Reagen)
7. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (Yu-Hsin Chen)
8. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory (Duckhwan Kim)
9. An Instruction Set Architecture for Neural Networks (Shaoli Liu)


1. (7) Eyeriss

Abstract

In CNN processing, energy consumption from data movement is dominant, so minimizing data movement is critical. The paper proposes Row Stationary (RS), a dataflow that minimizes data movement by reusing local data. Unlike previously proposed dataflows, it can be adapted to CNNs of different shapes, and it reduces every type of data movement by fully exploiting the processing engines' (PE) local storage, direct inter-PE communication, and spatial parallelism. An analytical framework is presented for measuring energy efficiency. The evaluation appears to sweep various AlexNet configurations, and the dataflow is also demonstrated on a fabricated chip that supports energy analysis.
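To make the reuse pattern concrete, here is a minimal Python sketch (my illustration, not the paper's hardware) of the 1-D convolution primitive that RS maps onto each PE: one filter row stays resident ("stationary") in a PE's local registers and is reused at every sliding-window position of an input row, while a column of PEs accumulates the partial sums for one output row.

```python
import numpy as np

def pe_row_conv(filter_row: np.ndarray, input_row: np.ndarray) -> np.ndarray:
    """One PE: 1-D convolution of a single filter row with a single input row."""
    R, W = len(filter_row), len(input_row)
    out = np.zeros(W - R + 1)
    for x in range(W - R + 1):            # slide across the input row
        # filter_row comes from local PE registers every step (reuse);
        # input_row values shift through the PE one element at a time
        out[x] = np.dot(filter_row, input_row[x:x + R])
    return out

def conv2d_rs(filt: np.ndarray, image: np.ndarray) -> np.ndarray:
    """R PEs in a column each hold one filter row; their 1-D results are
    accumulated across the column into one output row."""
    R, H = filt.shape[0], image.shape[0]
    out_rows = []
    for y in range(H - R + 1):            # one PE column per output row
        partials = [pe_row_conv(filt[r], image[y + r]) for r in range(R)]
        out_rows.append(sum(partials))    # direct inter-PE accumulation
    return np.array(out_rows)

img = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d_rs(np.ones((3, 3)), img))    # same result as a direct 3x3 conv
```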

2. (6) Minerva

This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels for optimizing DNN hardware accelerators. Compared with existing fixed-point accelerators, heterogeneous data-type optimization reduces power by 1.5x; inline predication and pruning of small activation values reduces power by a further 2.0x; and active hardware fault detection coupled with domain-aware error mitigation eliminates an additional 2.7x by lowering SRAM voltages. Experiments on five datasets show an average power reduction of 8.1x. Minerva enables highly accurate, ultra-low-power DNN accelerators. (Need to confirm exactly what Minerva does.)
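As a picture of the second optimization, here is a minimal sketch of predicating out small activations, assuming a plain dense layer and a hypothetical threshold theta chosen offline so accuracy is preserved; real hardware would gate the skipped MACs rather than multiply by zero.

```python
import numpy as np

def pruned_dense(x, W, b, theta=0.05):
    """Dense layer that skips MACs whose input activation is below theta."""
    mask = np.abs(x) >= theta            # predicate on the incoming activation
    x_kept = np.where(mask, x, 0.0)      # in hardware: skip these MACs outright
    skipped = 1.0 - mask.mean()          # fraction of operations avoided
    y = np.maximum(x_kept @ W + b, 0.0)  # ReLU keeps many activations small
    return y, skipped

rng = np.random.default_rng(1)
x = np.maximum(rng.standard_normal(256), 0.0)  # ReLU-like input, many near zero
W, b = 0.1 * rng.standard_normal((256, 64)), np.zeros(64)
y, skipped = pruned_dense(x, W, b, theta=0.1)
print(f"{skipped:.0%} of MACs skipped")
```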

3. (3) PRIME

Processing-in-memory (PIM) is a promising solution to the "memory wall" challenge facing future computer systems. Among the many PIM proposals, ReRAM is the most promising candidate for main memory, and its crossbar array structure is well suited to computing matrix-vector multiplications, which makes it a good fit for NN applications. This paper proposes PRIME, a ReRAM-based PIM architecture, and provides both its microarchitecture and circuit design, together with a software platform that eases NN implementation on the ReRAM-based PIM. Compared with a state-of-the-art NPU, PRIME achieves 2360x better performance and 895x lower energy consumption.
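A tiny numerical model (illustrative values only) of why the crossbar fits matrix-vector multiplication: with weights programmed as cell conductances G and activations applied as row voltages v, Kirchhoff's current law makes each column current the weighted sum i_j = sum_k G[k, j] * v[k], so a single analog read performs the whole product.

```python
import numpy as np

G = np.array([[1.0, 0.2],      # cell conductances: the programmed weights
              [0.5, 0.8],
              [0.1, 0.9]])
v = np.array([0.3, 1.0, 0.6])  # input activations encoded as row voltages

i = G.T @ v                    # column currents = weighted sums (analog MVM)
print(i)                       # sensed by ADCs, then rescaled to real weights
```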

4. (2) ISAAC

Appears to be from the team behind DaDianNao. Memristor crossbar arrays are used to compute dot products in the analog domain. The paper (1) builds a pipelined architecture out of crossbars, using eDRAM as buffers between pipeline stages; (2) devises a scheme for encoding data in the analog domain that reduces the analog-to-digital conversion overhead; and (3) defines the many digital components required to build an analog CNN accelerator. Compared to DaDianNao, it gains 14.8x in throughput, 5.5x in energy, and 7.5x in computational density.
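A minimal sketch of the kind of bit-serial input encoding such designs use to keep converter overhead down (an illustrative model, not the paper's exact scheme; the real design also slices weights across cells): inputs are streamed into the crossbar one bit per cycle, so only a 1-bit DAC is needed, and the per-bit analog results are combined by digital shift-and-add.

```python
import numpy as np

def bit_serial_mvm(G: np.ndarray, x: np.ndarray, nbits: int = 8) -> np.ndarray:
    """Compute G.T @ x by streaming the bits of x from LSB to MSB."""
    acc = np.zeros(G.shape[1])
    for b in range(nbits):
        bits = (x >> b) & 1        # 1-bit "voltages" applied to the rows
        partial = G.T @ bits       # analog column currents, sampled by the ADC
        acc += partial * (1 << b)  # digital shift-and-add across cycles
    return acc

G = np.array([[3, 1], [2, 5], [4, 0]])
x = np.array([17, 6, 200])
assert np.array_equal(bit_serial_mvm(G, x), G.T @ x)  # matches the exact MVM
```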

5. (8) INsight

For DNNs, the von Neumann architecture stores the weight parameters in external memory and time-shares the processing elements. This paper proposes a neuromorphic computing system consisting of a non-conventional compiler, a neuromorphic architecture, and its microarchitecture. The compiler factorizes a trained feedforward network into a sparsely connected network, compresses the weights linearly, and generates a time-delay neural network, reducing the number of connections. Implemented on an FPGA, it achieves 97.64% accuracy on MNIST classification.
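The post does not detail how the compiler factorizes the network, so purely as an illustration of the connection-count savings, here is a sketch using truncated SVD (a stand-in technique, not necessarily what INsight's compiler actually does): one m x n weight matrix becomes two thin factors, cutting connections whenever r is much smaller than min(m, n).

```python
import numpy as np

def factorize(W: np.ndarray, r: int):
    """Approximate W (m x n) as A (m x r) @ B (r x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # fold the singular values into the left factor
    B = Vt[:r]
    return A, B

rng = np.random.default_rng(0)
# trained weight matrices are far more compressible than random noise;
# emulate that with an approximately low-rank matrix
W = rng.standard_normal((256, 32)) @ rng.standard_normal((32, 256))
W += 0.01 * rng.standard_normal((256, 256))
A, B = factorize(W, r=32)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative error {err:.4f}; 256*256 -> (256+256)*32 connections (4x fewer)")
```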