Tags - Title Authors Affiliations
Inference; SIMD High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs
paper note
Glenn Henry; Parviz Palangpour Centaur Technology
Inference; dataflow Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads
paper note
Dennis Abts; Jonathan Ross Groq Inc.
Spiking; dataflow; Sparsity SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks
paper note
Surya Narayanan; Karl Taht University of Utah
Inference; benchmarking MLPerf Inference Benchmark
paper note
Vijay Janapa Reddi; Lingjie Xu; et al.
GPU; Compression Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs
paper note
Esha Choukse; Michael Sullivan University of Texas at Austin; NVIDIA
Inference; runtime A Multi-Neural Network Acceleration Architecture
paper note
Eunjin Baek; Dongup Kwon; Jangwoo Kim Seoul National University
Inference; Dynamic fixed-point DRQ: Dynamic Region-Based Quantization for Deep Neural Network Acceleration
paper note
Zhuoran Song; Naifeng Jing; Xiaoyao Liang Shanghai Jiao Tong University
Training; LSTM; GPU Echo: Compiler-Based GPU Memory Footprint Reduction for LSTM RNN Training
paper note
Bojian Zheng; Nandita Vijaykumar University of Toronto
Inference DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation
paper note
Udit Gupta; Samuel Hsia; Vikram Saraph Harvard University; Facebook Inc


Tags - Title Authors Affiliations
Inference, Dataflow 3D-based Video Recognition Acceleration by Leveraging Temporal Locality
paper note
Huixiang Chen; Tao Li University of Florida
Inference; Quantum A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology
paper note
Ruizhe Cai; Ao Ren; Nobuyuki Yoshikawa; Yanzhi Wang Northeastern University
Training; Reinforcement Learning; Distributed training Accelerating Distributed Reinforcement Learning with In-Switch Computing
paper note
Youjie Li; Jian Huang UIUC
Training; Sparsity Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks
paper note
Jiaqi Zhang; Tao Li University of Florida
Inference; Sparsity; Bit-serial Laconic Deep Learning Inference Acceleration
paper note
Sayeh Sharify; Andreas Moshovos University of Toronto
Inference; Memory; bandwidth-saving; large-scale networks; compression MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks
paper note
Hanhwi Jang; Jangwoo Kim POSTECH; Seoul National University
Inference; ReRAM; Sparsity Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks
paper note
Tzu-Hsien Yang National Taiwan University; Academia Sinica; Macronix International
Inference; Redundant computing TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network
paper note
Chunhua Deng; Bo Yuan Rutgers University
Training; CNN; floating point FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision
paper note
Mohsen Imani; Tajana Rosing UC San Diego
Training; Programming model Cambricon-F: Machine Learning Computers with Fractal von Neumann Architecture
paper note
Yongwei Zhao; Yunji Chen ICT; Cambricon


Tags - Title Authors Affiliations
Training; CNN; RNN A Configurable Cloud-Scale DNN Processor for Real-Time AI
paper note
Jeremy Fowers; Doug Burger Microsoft
Inference; ReRAM PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms
paper note
Prakalp Srivastava; Mingu Kang University of Illinois at Urbana-Champaign; IBM
Inference; Dataflow Computation Reuse in DNNs by Exploiting Input Similarity
paper slides note
Marc Riera; Antonio González Universitat Politècnica de Catalunya
Spiking Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations
paper note slides
Dayeol Lee; Jangwoo Kim Seoul National University; University of California
Space-time computing Space-Time Algebra: A Model for Neocortical Computation
paper slides note
James E. Smith University of Wisconsin-Madison
Inference; Cross-module optimization RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
paper note
Fengbin Tu; Shaojun Wei Tsinghua University
Inference;Datapath: bit-serial Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
paper note
Charles Eckert; Reetuparna Das University of Michigan; Intel Corporation
Inference;Cross-module optimization EVA2: Exploiting Temporal Redundancy in Live Computer Vision
paper note slides
Mark Buckler; Adrian Sampson Cornell University
Inference;CNN; Cross-module optimization; Power optimization Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision
paper slides note
Yuhao Zhu; Paul Whatmough University of Rochester; ARM Research
Inference;GAN; Sparsity; MIMD; SIMD GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks
paper note
Amir Yazdanbakhsh; Hadi Esmaeilzadeh Georgia Institute of Technology; UC San Diego; Qualcomm Technologies
Inference; CNN; Approximate SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks
paper note
Vahideh Akhlaghi; Hadi Esmaeilzadeh Georgia Institute of Technology; UC San Diego; Qualcomm
Inference; CNN; Sparsity UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition
paper note
Kartik Hegde; Christopher W. Fletcher University of Illinois at Urbana-Champaign; NVIDIA
Inference; Non-uniform Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation
paper note
Eunhyeok Park; Sungjoo Yoo Seoul National University
Inference; Dataflow: Dynamic Prediction Based Execution on Deep Neural Networks
paper note
Mingcong Song; Tao Li University of Florida
Inference; Datapath: bit-serial Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network
paper note
Hardik Sharma; Hadi Esmaeilzadeh Georgia Institute of Technology; University of California
Training; memory: bandwidth-saving Gist: Efficient Data Encoding for Deep Neural Network Training
paper note
Animesh Jain; Gennady Pekhimenko Microsoft Research; University of Toronto; University of Michigan
Inference; Cross-module optimization The Dark Side of DNN Pruning
paper note
Reza Yazdani; Antonio González Universitat Politècnica de Catalunya


Tags - Title Authors Affiliations
Inference In-Datacenter Performance Analysis of a Tensor Processing Unit
paper note
Norman P. Jouppi Google
Inference; Dataflow Maximizing CNN Accelerator Efficiency Through Resource Partitioning
paper note
Yongming Shen Stony Brook University
Training SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks
paper note
Swagath Venkataramani; Anand Raghunathan Purdue University; Parallel Computing Lab; Intel Corporation
Inference; Algorithm-architecture-codesign Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
paper note
Jiecao Yu; Scott Mahlke University of Michigan; ARM
Inference; Sparsity SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
paper note
Angshuman Parashar; William J. Dally NVIDIA; MIT; UC-Berkeley; Stanford University
Training; Low-bit Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent
paper note
Christopher De Sa; Kunle Olukotun Stanford University


Tags - Title Authors Affiliations
Inference;Sparsity Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing
paper note
Jorge Albericio; Tayler Hetherington University of Toronto; University of British Columbia
Inference; Analog ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
paper note
Ali Shafiee; Vivek Srikumar University of Utah; Hewlett Packard Labs
Inference; PIM PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory
paper note
Ping Chi; Yuan Xie University of California
Inference; Sparsity EIE: Efficient Inference Engine on Compressed Deep Neural Network
paper note
Song Han; William J. Dally Stanford University; NVIDIA
Inference; Analog RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision
paper note
Robert LiKamWa; Lin Zhong Rice University
Inference; Architecture-Physical-Co-design Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
paper note
Brandon Reagen; David Brooks Harvard University
Inference; Dataflow Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
paper note
Yu-Hsin Chen; Vivienne Sze MIT; NVIDIA
Inference; 3D integration Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory
paper note
Duckhwan Kim; Saibal Mukhopadhyay Georgia Institute of Technology
Inference Cambricon: An Instruction Set Architecture for Neural Networks
paper note
Shaoli Liu; Tianshi Chen CAS; Cambricon Ltd.


Tags - Title Authors Affiliations
Inference; Cross-module optimization ShiDianNao: Shifting Vision Processing Closer to the Sensor
paper note
Zidong Du ICT



Tags - Title Authors Affiliations
Inference; Security Shredder: Learning Noise Distributions to Protect Inference Privacy
paper note
Fatemehsadat Mireshghallah; Mohammadkazem Taram; UCSD
Algorithm-Architecture co-design; Security DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks
paper note
Xingbin Wang; Rui Hou; Boyan Zhao; CAS; USC
programming model; Algorithm-Architecture co-design Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
paper note
Xuan Yang; Mark Horowitz; Stanford; THU
Algorithm-Architecture co-design; security DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints
paper note codes
Xing Hu; Yuan Xie; UCSB
Training; distributed computing Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training
paper note
Qinyi Luo; Jiaao He; Youwei Zhuo; Xuehai Qian USC
compression PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
Wei Niu; Xiaolong Ma; Sheng Lin; College of William and Mary; Northeastern; USC
Power optimization; compute-memory trade-off Capuchin: Tensor-based GPU Memory Management for Deep Learning
paper note
Xuan Peng; Xuanhua Shi; Hulin Dai; HUST; MSRA; USC
Compute-memory trade-off NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Bongjoon Hyun; Youngeun Kwon; Yujeong Choi; KAIST
Algorithm-Architecture co-design FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
paper note codes
Size Zheng; Yun Liang; Shuo Wang; PKU


Tags - Title Authors Affiliations
Inference, ReRAM PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference
paper note
Aayush Ankit; Dejan S Milojičić; Purdue; UIUC; HP
Reinforcement Learning FA3C: FPGA-Accelerated Deep Reinforcement Learning
paper note
Hyungmin Cho; Pyeongseok Oh; Jiyoung Park; Hongik University; SNU
Inference, ReRAM FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
paper note
Yu Ji; Yuan Xie; THU; UCSB
Inference, Bit-serial Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks
paper note
Alberto Delmas Lascorz; Andreas Ioannis Moshovos; Toronto; NVIDIA
Inference, Dataflow TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators
paper note codes
Mingyu Gao; Xuan Yang; Jing Pu; Stanford
Inference, CNN, Systolic, Sparsity Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
paper codes note
Hsiangtsung Kung; Bradley McDanel; Saiqian Zhang Harvard
Training, CNN, Distributed computing Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization
paper note
Tian Jin; Seokin Hong IBM; Kyungpook National University
Training, Distributed computing HOP: Heterogeneity-Aware Decentralized Training
paper note
Qinyi Luo; Jinkun Lin; Youwei Zhuo; Xuehai Qian USC; THU
Training, Compiler Astra: Exploiting Predictability to Optimize Deep Learning
paper note
Muthian Sivathanu; Tapan Chugh; Sanjay S Singapuram; Lidong Zhou Microsoft
Training, Quantization, Compression ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers
paper note
Ao Ren; Tianyun Zhang; Shaokai Ye; Northeastern; Syracuse; SUNY; Buffalo; USC
Security DeepSigns: An End-to-End Watermarking Framework for Protecting the Ownership of Deep Neural Networks
paper note
Bita Darvish Rouhani; Huili Chen; Farinaz Koushanfar UCSD


Tags - Title Authors Affiliations
Compiler Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler
paper slides note
Yu Ji; Youhui Zhang; Wenguang Chen; Yuan Xie Tsinghua; UCSB
Inference, Dataflow, NoC MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects
paper note slides
Hyoukjun Kwon; Ananda Samajdar; Tushar Krishna Georgia Tech
Bayesian VIBNN: Hardware Acceleration of Bayesian Neural Networks
paper note
Ruizhe Cai; Ao Ren; Ning Liu; Syracuse University; USC


Tags - Title Authors Affiliations
Dataflow, 3D Integration Tetris: Scalable and Efficient Neural Network Acceleration with 3D Memory
paper note
Mingyu Gao; Jing Pu; Xuan Yang Stanford University
CNN; Algorithm-Architecture co-design SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
paper note
Ao Ren; Zhe Li; Caiwen Ding Syracuse University; USC; The City College of New York


Tags - Title Authors Affiliations
Inference DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
paper note
Tianshi Chen; Zidong Du; Ninghui Sun CAS; Inria



Tags - Title Authors Affiliations
PIM/CIM; systolic Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration
paper note
Akshay Krishna Ramanathan The Pennsylvania State University; Intel
PIM; cache; reconfigurable FReaC Cache: Folded-Logic Reconfigurable Computing in the Last Level Cache
paper note
Ashutosh Dhar University of Illinois at Urbana-Champaign; IBM Research
Bayesian; sparsity Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks
paper note
Qiyu Wan ECOMS Lab; University of Houston
low-bit Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks
paper note
Gil Shomron; Uri Weiser Faculty of Electrical Engineering; Technion — Israel Institute of Technology
compiler ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
paper note
Sheng-Chun Kao; Geonhwa Jeong; Tushar Krishna Georgia Institute of Technology
algorithm-architecture co-design; cross-module optimization VR-DANN: Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration
paper note
Zhuoran Song; Feiyang Wu; Xueyuan Liu Shanghai Jiao Tong University; Biren Research
PIM/CIM Newton: A DRAM-Maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning
paper note
Mingxuan He Purdue University
Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks
paper note
Soroush Ghodrati; Byung Hoon Ahn; Joon Kyung Kim Bigstream Inc.; Kansas University; University of Illinois Urbana-Champaign; NVIDIA Research; Google Inc.
training; sparsity Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training
paper note
Dingqing Yang; Amin Ghasemazar; Xiaowei Ren The University of British Columbia; Microsoft Corporation
GPU; tensor core; compiler; bandwidth saving Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores
paper note
Hyeonjin Kim; Sungwoo Ahn; Yunho Oh Yonsei University; EcoCloud
algorithm-architecture co-design; compute-memory tradeoff DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture
paper note
Liu Liu UC Santa Barbara
inference; compression TFE: Energy-Efficient Transferred Filter-Based Engine to Compress and Accelerate Convolutional Neural Networks
paper note
Huiyu Mo; Leibo Liu; Wenjing Hu Tsinghua University; Intel
training; sparsity TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training
paper note
Mostafa Mahmoud; Isak Edo; Ali Hadi Zadeh University of Toronto; Cerebras Systems; Vector Institute
training; inference; sparsity; CPU SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs
paper note
Zhangxiaowen Gong; Houxiang Ji University of Illinois at Urbana-Champaign; Intel
NLP; sparsity; bandwidth saving GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
paper note
Ali Hadi Zadeh; Isak Edo; Omar Mohamed Awad University of Toronto
training; cross-module optimization TrainBox: An Extreme-Scale Neural Network Training Server Architecture by Systematically Balancing Operations
paper note
Pyeongsu Park; Heetaek Jeong; Jangwoo Kim Seoul National University


Tags - Title Authors Affiliations
compute-memory trade-off; Dataflow Wire-Aware Architecture and Dataflow for CNN Accelerators
paper note
Sumanth Gudaparthi; Surya Narayanan; Rajeev Balasubramonian; Edouard Giacomin; Hari Kambalasubramanyam; Pierre-Emmanuel Gaillardon Utah
security; compute-memory trade-off ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning
paper note
Shang-Tse Chen; Cory Cornelius; Jason Martin; Duen Horng Chau Georgia Tech; Intel
Inference; NoC; Cross-Module optimization Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture
paper note slides
Yakun Sophia Shao; Jason Clemons; Rangharajan Venkatesan; et al. NVIDIA
compression; ISA; Cross-Module optimization ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions
paper note
Berkin Akin; Zeshan A. Chishti; Alaa R. Alameldeen Google; Intel
Algorithm-Architecture co-design Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating
paper note
Weizhe Hua; Yuan Zhou; Christopher De Sa; Cornell
Sparsity SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks
paper note
Ashish Gondimalla; Noah Chesnut; Purdue
Power-optimization; Approximate EDEN: Enabling Approximate DRAM for DNN Inference using Error-Resilient Neural Networks
paper note
Skanda Koppula; Lois Orosa; A. Giray Yağlıkçı; ETHZ
inference; CNN eCNN: a Block-Based and Highly-Parallel CNN Accelerator for Edge Inference
paper note
Chao-Tsung Huang; Yu-Chun Ding; Huan-Ching Wang; et al. NTHU
Architecture-Physical co-design TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
paper note
Youngeun Kwon; Yunjae Lee; Minsoo Rhu KAIST
Architecture-Physical co-design; dataflow Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach
paper note
Hyoukjun Kwon; Prasanth Chatarasi; Michael Pellauer; Georgia Tech; NVIDIA
sparsity; inference MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation
paper note
Lillian Pentecost; Marco Donato; Brandon Reagen; Harvard; Facebook
RNN; Special operation Neuron-Level Fuzzy Memoization in RNNs
paper note
Franyell Silfa; Gem Dot; Jose-Maria Arnau; UPC
inference; Algorithm-Architecture co-design Manna: An Accelerator for Memory-Augmented Neural Networks
paper note
Jacob R. Stevens; Ashish Ranjan; Dipankar Das; Purdue; Intel
PIM eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing
paper note
Elaheh Sadredini; Reza Rahimi; Vaibhav Verma; Virginia
Sparsity ExTensor: An Accelerator for Sparse Tensor Algebra
paper note
Kartik Hegde; Hadi Asghari-Moghaddam; Michael Pellauer UIUC; NVIDIA
Sparsity; Algorithm-Architecture co-design Efficient SpMV Operation for Large and Highly Sparse Matrices Using Scalable Multi-Way Merge Parallelization
paper note
Fazle Sadi; Joe Sweeney; Tze Meng Low; CMU
sparsity; Algorithm-Architecture co-design; compression Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs
paper note
Maohua Zhu; Tao Zhang; Yuan Xie UCSB; Alibaba
special operation; inference ASV: Accelerated Stereo Vision System
paper note codes1 codes2
Yu Feng; Paul Whatmough; Yuhao Zhu Rochester
Algorithm-Architecture co-design; special operation Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach
paper note
Mingyu Yan; Xing Hu; Shuangchen Li; UCSB; ICT


Tags - Title Authors Affiliations
Sparsity Cambricon-s: Addressing Irregularity in Sparse Neural Networks: A Cooperative Software/Hardware Approach
paper note
Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Chengsi Liu; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Yunji Chen USTC; CAS
Inference; CNN; spatial correlation Diffy: a Deja vu-Free Differential Deep Neural Network Accelerator
paper note
Mostafa Mahmoud; Kevin Siu; Andreas Moshovos University of Toronto
Distributed computing Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
paper note
Youngeun Kwon; Minsoo Rhu KAIST
RNN Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs
paper note
Xingyao Zhang; Chenhao Xie; Jing Wang; University of Houston; Capital Normal University
Training, distributed computing, compression A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks
paper note
Youjie Li; Jongse Park; Mohammad Alian; UIUC; THU; SJTU; Intel; UCSD
Inference, sparsity, compression PermDNN: Efficient Compressed Deep Neural Network Architecture with Permuted Diagonal Matrices
paper note
Chunhua Deng; Siyu Liao; Yi Xie; City University of New York; University of Minnesota; USC
Reinforcement Learning, algorithm-architecture co-design GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
paper note
Ananda Samajdar; Parth Mannan; Kartikay Garg; Tushar Krishna Georgia Tech
Training, PIM Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach
paper note
Jiawen Liu; Hengyu Zhao; UCM; UCSD; UCSC
GAN, PIM LerGAN: A Zero-free, Low Data Movement and PIM-based GAN Architecture
paper note
Haiyu Mao; Mingcong Song; Tao Li; THU; University of Florida
Training, special operation, dataflow Multi-dimensional Parallel Training of Winograd Layer on Memory-centric Architecture
paper note
Byungchul Hong; Yeonju Ro; John Kim KAIST
PIM/CIM SCOPE: A Stochastic Computing Engine for DRAM-based In-situ Accelerator
paper note
Shuangchen Li; Alvin Oliver Glova; Xing Hu; UCSB; Samsung
Inference, algorithm-architecture co-design Morph: Flexible Acceleration for 3D CNN-based Video Understanding
paper note
Kartik Hegde; Rohit Agrawal; Yulun Yao; Christopher W Fletcher UIUC


Tags - Title Authors Affiliations
Bit-serial Bit-Pragmatic Deep Neural Network Computing
paper note
Jorge Albericio; Alberto Delmás; Patrick Judd; NVIDIA; University of Toronto
CNN, Special computing CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
paper note
Caiwen Ding; Siyu Liao; Yanzhi Wang; Syracuse University; City University of New York; USC; California State University; Northeastern University
PIM DRISA: A DRAM-based Reconfigurable In-Situ Accelerator
paper note
Shuangchen Li; Dimin Niu; UCSB; Samsung
Distributed computing Scale-Out Acceleration for Machine Learning
paper note
Jongse Park; Hardik Sharma; Divya Mahajan; Georgia Tech; UCSD
DNN, Sparsity, Bandwidth saving DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission
paper note
Parker Hill; Animesh Jain; Mason Hill; Univ. of Michigan; Univ. of Nevada


Tags - Title Authors Affiliations
DNN, compiler, Dataflow From High-Level Deep Neural Models to FPGAs
paper note
Hardik Sharma; Jongse Park; Divya Mahajan; Georgia Institute of Technology; Intel
DNN, Runtime, training vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
paper note
Minsoo Rhu; Natalia Gimelshein; Jason Clemons; NVIDIA
Bit-serial Stripes: Bit-Serial Deep Neural Network Computing
paper note
Patrick Judd; Jorge Albericio; Tayler Hetherington; University of Toronto; University of British Columbia
Sparsity Cambricon-X: An Accelerator for Sparse Neural Networks
paper note
Shijin Zhang; Zidong Du; Lei Zhang; Chinese Academy of Sciences
Neuromorphic, Spiking, programming model NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints
paper note
Yu Ji; Youhui Zhang; Shuangchen Li; Tsinghua University; UCSB
Cross Module optimization Fused-Layer CNN Accelerators
paper note
Manoj Alwani; Han Chen; Michael Ferdman; Peter Milder Stony Brook University
power optimization, cross module optimization A Patch Memory System For Image Processing and Computer Vision
paper note
Jason Clemons; Chih-Chi Cheng; Iuri Frosio; Daniel Johnson; Stephen W. Keckler NVIDIA; Qualcomm
power optimization An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition
paper note
Reza Yazdani; Albert Segura; Jose-Maria Arnau; Antonio Gonzalez Universitat Politecnica de Catalunya


Tags - Title Authors Affiliations
Inference, CNN DaDianNao: A Machine-Learning Supercomputer
paper note
Yunji Chen; Tao Luo; Shaoli Liu; CAS; Inria; Inner Mongolia University



Tags - Title Authors Affiliations
ReRAM Deep Learning Acceleration with Neuron-to-Memory Transformation
Paper note
Mohsen Imani; Mohammad Samragh Razlighi; Yeseong Kim; UCSD
graph network HyGCN: A GCN Accelerator with Hybrid Architecture
Paper note
Mingyu Yan; Lei Deng; Xing Hu; ICT; UCSB
training; sparsity SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
Paper note Slides
Eric Qin; Ananda Samajdar; Hyoukjun Kwon; Georgia Tech
Programming model; DNN PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible NPUs
Paper note
Yujeong Choi; Minsoo Rhu KAIST
sparsity; compute-memory trade-off ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
Paper note
Bahar Asgari; Ramyad Hadidi; Tushar Krishna; Georgia Tech
sparsity;Algorithm-Architecture co-design SpArch: Efficient Architecture for Sparse Matrix Multiplication
Paper note Project
Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally MIT; NVIDIA
Algorithm-Architecture co-design; Approximation A3: Accelerating Attention Mechanisms in Neural Networks with Approximation
Paper note
Tae Jun Ham; Sung Jun Jung; Seonghak Kim; SNU
training; Architecture-Physical co-design AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays
Paper note
Linghao Song; Fan Chen; Youwei Zhuo; Duke; USC
Special operation, architecture-physical co-design PIXEL: Photonic Neural Network Accelerator
Paper note
Kyle Shiflett; Dylan Wright; Avinash Karanth; Ahmed Louri Ohio; George Washington
Capsule; PIM Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design
Paper note
Xingyao Zhang; Shuaiwen Leon Song; Chenhao Xie; Houston
Bandwidth saving Communication Lower Bound in Convolution Accelerators
Paper note
Xiaoming Chen; Yinhe Han; Yu Wang ICT; THU
Training, Distributed computing; algorithm-architecture co-design EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform
Paper note
Jianbo Dong; Zheng Cao; Tao Zhang; Alibaba
NoC Experiences with ML-Driven Design: A NoC Case Study
Paper note
Jieming Yin; Subhash Sethumurugan; Yasuko Eckert; AMD
sparsity Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations
Paper note
Nitish Srivastava; Hanchen Jin; Shaden Smith; Cornell; Intel
algorithm-architecture co-design A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms
Paper note
Jian Weng; Sihao Liu; Zhengrong Wang; UCLA
Reinforcement Learning; NoC; algorithm-architecture co-design A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study
Paper note
Ting-Ru Lin; Drew Penney; Massoud Pedram; Lizhong Chen USC; OSU
power optimization Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices
Paper note
Jawad Haj-Yahya; Yanos Sazeides; Mohammed Alser; ETHZ; Cyprus; CMU


Tags - Title Authors Affiliations
training; compute-memory trade-off HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
paper note
Linghao Song; Jiachen Mao; Yiran Chen; Duke; USC
RNN; algorithm-architecture co-design E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
paper note
Zhe Li; Caiwen Ding; Siyue Wang Syracuse University; Northeastern University; Florida International University; USC; University at Buffalo
CNN, Bit-serial, Sparsity Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks
paper note
Xiaowei Wang; Jiecao Yu; Charles Augustine; Michigan; Intel
cross-Module optimization Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators
paper note
Arash Azizimazreah; Lizhong Chen OSU
PIM/CIM, low-bit, binary NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks
paper note
Hyeonuk Kim; Jaehyeong Sim; Yeongjae Choi; Lee-Sup Kim KAIST
Accuracy-Latency trade-off Kelp: QoS for Accelerators in Machine Learning Platforms
paper note
Haishan Zhu; David Lo; Liqun Cheng Microsoft; Google; UT Austin
inference Machine Learning at Facebook: Understanding Inference at the Edge
paper note
Carole-Jean Wu; David Brooks; Kevin Chen; Facebook
Architecture-Physical co-design The Accelerator Wall: Limits of Chip Specialization
paper note codes
Adi Fuchs; David Wentzlaff Princeton


Tags - Title Authors Affiliations
special operation; approximate Making Memristive Neural Network Accelerators Reliable
paper note
Ben Feinberg; Shibo Wang; Engin Ipek University of Rochester
Algorithm-Architecture co-design; GAN Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning
Mingcong Song; Jiaqi Zhang; Huixiang Chen; Tao Li University of Florida
compression; sparsity Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
paper note
Minsoo Rhu; Mike O'Connor; Niladrish Chatterjee; POSTECH; NVIDIA; UT-Austin
Architecture-Physical co-design; inference In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems
paper note
Mingcong Song; Kan Zhong; Tao Li; et al. University of Florida; Chongqing University; Capital Normal University
Special operation; ReRAM GraphR: Accelerating Graph Processing Using ReRAM
paper note
Linghao Song; Youwei Zhuo; Xuehai Qian Duke; USC
PIM; Special operation; dataflow GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition
paper note
Mingxing Zhang; Youwei Zhuo; Chao Wang; THU; USC; Stanford
Power optimization; PIM PM3: Power Modeling and Power Management for Processing-in-Memory
paper note
Chao Zhang; Tong Meng; Guangyu Sun PKU


Tags - Title Authors Affiliations
Inference, CNN, Dataflow FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks
paper note
Wenyan Lu; Guihai Yan; Jiajun Li; Chinese Academy of Sciences
Inference, ReRAM PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
paper note
Linghao Song; Xuehai Qian; Hai Li; Yiran Chen University of Pittsburgh; University of Southern California
Training Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures
paper note
Mingcong Song; Yang Hu; Huixiang Chen; Tao Li University of Florida


Tags - Title Authors Affiliations
Programming model, training TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning
paper note
Divya Mahajan; Jongse Park; Emmanuel Amaro Georgia Institute of Technology
ReRAM; Boltzmann Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning
paper note
Mahdi Nazm Bojnordi; Engin Ipek University of Rochester


