[4]
Venugopalan S, Rohrbach M, Donahue J, Mooney R, Darrell T, Saenko K. Sequence to sequence - video to text. In Proc. the IEEE Int. Conf. Computer Vision, Dec. 2015, pp.4534-4542.
[5]
Abdel-Hamid O, Mohamed A R, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
[6]
Eriguchi A, Hashimoto K, Tsuruoka Y. Tree-to-sequence attentional neural machine translation. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, Aug. 2016, pp.823-833.
[7]
Farabet C, Poulet C, Han J Y, LeCun Y. CNP: An FPGA-based processor for convolutional networks. In Proc. the Int. Conf. Field Programmable Logic and Applications, Aug. 2009, pp.32-37.
[8]
Zhang C, Li P, Sun G Y, Guan Y J, Xiao B J, Cong J. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proc. the ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2015, pp.161-170.
[9]
Chen T S, Du Z D, Sun N H, Wang J, Wu C Y, Chen Y J, Temam O. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proc. the 19th Int. Conf. Architectural Support for Programming Languages and Operating Systems, March 2014, pp.269-284.
[10]
Farabet C, Martini B, Corda B, Akselrod P, Culurciello E, LeCun Y. NeuFlow: A runtime reconfigurable dataflow processor for vision. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition Workshops, June 2011, pp.109-116.
[11]
Han S, Liu X Y, Mao H Z, Pu J, Pedram A, Horowitz M A, Dally W J. EIE: Efficient inference engine on compressed deep neural network. In Proc. the 43rd Int. Symp. Computer Architecture, June 2016, pp.243-254.
[12]
Bienia C, Kumar S, Singh J P, Li K. The PARSEC benchmark suite: Characterization and architectural implications. In Proc. Int. Conf. Parallel Architectures and Compilation Techniques, Oct. 2008, pp.72-81.
[13]
Alwani M, Chen H, Ferdman M, Milder P. Fused-layer CNN accelerators. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, Oct. 2016.
[14]
Judd P, Albericio J, Hetherington T, Aamodt T M, Moshovos A. Stripes: Bit-serial deep neural network computing. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, Oct. 2016.
[15]
Rhu M, Gimelshein N, Clemons J, Zulfiqar A, Keckler S W. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, Oct. 2016.
[16]
Zhang S J, Du Z D, Zhang L, Lan H Y, Liu S L, Li L, Guo Q, Chen T S, Chen Y J. Cambricon-X: An accelerator for sparse neural networks. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, Oct. 2016.
[17]
Ji Y, Zhang Y H, Li S C, Chi P, Jiang C H, Qu P, Xie Y, Chen W G. NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, Oct. 2016.
[18]
Kim D, Kung J, Chai S, Yalamanchili S, Mukhopadhyay S. Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.380-392.
[19]
LiKamWa R, Hou Y H, Gao Y, Polansky M, Zhong L. RedEye: Analog ConvNet image sensor architecture for continuous mobile vision. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.255-266.
[20]
Albericio J, Judd P, Hetherington T, Aamodt T, Jerger N E, Moshovos A. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016.
[21]
Chi P, Li S C, Xu C, Zhang T, Zhao J S, Liu Y P, Wang Y, Xie Y. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.27-39.
[22]
Shafiee A, Nag A, Muralimanohar N, Balasubramonian R, Strachan J P, Hu M, Williams R S, Srikumar V. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.14-26.
[23]
Liu S L, Du Z D, Tao J H, Han D, Luo T, Xie Y, Chen Y J, Chen T S. Cambricon: An instruction set architecture for neural networks. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.393-405.
[24]
Chen Y H, Emer J, Sze V. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.367-379.
[25]
Reagen B, Whatmough P, Adolf R, Rama S, Lee H, Lee S K, Hernández-Lobato J M, Wei G Y, Brooks D. Minerva: Enabling low-power, highly-accurate deep neural network accelerators. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.267-278.
[26]
Song L H, Qian X H, Li H, Chen Y R. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In Proc. IEEE Int. Symp. High Performance Computer Architecture, Feb. 2017, pp.541-552.
[27]
Lu W Y, Yan G H, Li J J, Gong S J, Han Y H, Li X W. FlexFlow: A flexible dataflow accelerator architecture for convolutional neural networks. In Proc. IEEE Int. Symp. High Performance Computer Architecture, Feb. 2017, pp.553-564.
[28]
Song M C, Hu Y, Chen H X, Li T. Towards pervasive and user satisfactory CNN across GPU microarchitectures. In Proc. IEEE Int. Symp. High Performance Computer Architecture, Feb. 2017.
[31]
Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proc. the 28th Int. Conf. Neural Information Processing Systems, Dec. 2015, pp.91-99.
[32]
Parkhi O M, Vedaldi A, Zisserman A. Deep face recognition. In Proc. the British Machine Vision Conf., Sept. 2015, pp.41.1-41.12.
[34]
Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In Proc. IEEE Int. Conf. Computer Vision, Dec. 2015, pp.1520-1528.
[35]
Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, May 2013, pp.6645-6649.
[37]
Chen Y J, Luo T, Liu S L, Zhang S J, He L Q, Wang J, Li L, Chen T S, Xu Z W, Sun N H, Temam O. DaDianNao: A machine-learning supercomputer. In Proc. the 47th Annual IEEE/ACM Int. Symp. Microarchitecture, Dec. 2014, pp.609-622.
[38]
Du Z D, Fasthuber R, Chen T S, Ienne P, Li L, Feng X B, Chen Y J, Temam O. ShiDianNao: Shifting vision processing closer to the sensor. In Proc. the 42nd Annual Int. Symp. Computer Architecture, June 2015, pp.92-104.
[39]
Chen T S, Chen Y J, Duranton M, Guo Q, Hashmi A, Lipasti M, Nere A, Qiu S, Sebag M, Temam O. BenchNN: On the broad potential application scope of hardware neural network accelerators. In Proc. IEEE Int. Symp. Workload Characterization, Nov. 2012, pp.36-45.
[40]
Adolf R, Rama S, Reagen B, Wei G Y, Brooks D. Fathom: Reference workloads for modern deep learning methods. In Proc. IEEE Int. Symp. Workload Characterization, Sept. 2016.
[45]
Karpathy A, Li F F. Deep visual-semantic alignments for generating image descriptions. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2015, pp.3128-3137.
[46]
He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proc. IEEE Int. Conf. Computer Vision, Dec. 2015, pp.1026-1034.
[47]
Taigman Y, Yang M, Ranzato M, Wolf L. DeepFace: Closing the gap to human-level performance in face verification. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2014, pp.1701-1708.
[48]
Le Q V. Building high-level features using large scale unsupervised learning. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, May 2013, pp.8595-8598.
[50]
Phansalkar A, Joshi A, John L K. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. In Proc. the 34th Annual Int. Symp. Computer Architecture, June 2007, pp.412-423.
[51]
McCalpin J D. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, Dec. 1995, pp.19-25.
[53]
Graves A, Jaitly N. Towards end-to-end speech recognition with recurrent neural networks. In Proc. the 31st Int. Conf. Machine Learning, June 2014, pp.1764-1772.
[58]
Chen D L, Dolan W B. Collecting highly parallel data for paraphrase evaluation. In Proc. the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, June 2011, pp.190-200.
[59]
Mucci P J, Browne S, Deane C, Ho G. PAPI: A portable interface to hardware performance counters. In Proc. Department of Defense HPCMP Users Group Conf., June 1999, pp.7-10.
[60]
Ding C, Zhong Y T. Predicting whole-program locality through reuse distance analysis. In Proc. the ACM SIGPLAN Conf. Programming Language Design and Implementation, June 2003, pp.245-257.
[61]
Pawlowski J T. Hybrid memory cube: Breakthrough DRAM performance with a fundamentally re-architected DRAM subsystem. In Proc. the 23rd Hot Chips Symp., Aug. 2011.
[64]
Denkowski M, Lavie A. Meteor universal: Language specific translation evaluation for any target language. In Proc. the 9th Workshop on Statistical Machine Translation, June 2014, pp.376-380.