place-and-route physical design of the actual fabricated die that incorporates the CSC-AP, and Figure 5(b) shows the packaged die next to a U.S. quarter coin for scale.

Evaluation

Figure 5(c) shows the evaluation setup used to measure the characteristics of the chip over a range of voltages and frequencies. The setup includes a dual-tracking voltage supply that powers the chip with core (0.62 to 1 V) and I/O (1.8 V) voltages, and a VC707 evaluation board that sends test signals to the chip through a high-pin-count field-programmable gate array mezzanine card cable and monitors the responses sent back from the chip. An oscilloscope and a multimeter are also used for verification and precise measurement of the signals.

The peak performance and efficiency of a well-engineered accelerator with $\#\text{MAC}$ MAC units running at clock rate $\text{freq}$ are calculated as

$$\text{Performance}_{\text{true}} = 2 \times \#\text{MAC} \times \text{freq}, \qquad \text{Efficiency}_{\text{true}} = \frac{\text{Performance}_{\text{true}}}{\text{Power}}. \tag{12}$$

[Figure 3 labels: PE array (PE 1 ... PE 16); fast router based on the CSC architecture, with bidirectional switches (R); partitioned IFMAP memory (Mem 1 ... Mem 16); partitioned weights memory; partitioned OFMAP memory; PE datapath labels: 8 b, 8 b, accumulator 32 b, 8 b, shift and truncate, quantizer; memory labels: 1,024 entries, 512 entries, 512 entries.]

FIGURE 3: An accelerator hardware design to implement cyclic dilated matrix-vector multiplication configurable with the number of PEs and dataflow bit width, highlighting the array of PEs and IFMAP memory units interlinked with a high-bandwidth router. IFMAP: input feature map; Mem: memory; OFMAP: output feature map; PE: processing engine; R: router.

IEEE SOLID-STATE CIRCUITS MAGAZINE FALL 2021 71
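As a rough illustration of Equation (12), the sketch below computes peak performance as twice the number of MAC units times the clock rate (counting each multiply-accumulate as two operations) and efficiency as that performance divided by power. The function names and the numeric inputs (16 MAC units, 100 MHz, 10 mW) are hypothetical placeholders chosen for the arithmetic, not measurements of the chip described in the article.

```python
def peak_performance(num_macs: int, freq_hz: float) -> float:
    """Peak throughput in operations per second: 2 * #MAC * freq.

    The factor of 2 counts each multiply-accumulate as two operations
    (one multiply and one add).
    """
    if num_macs < 0 or freq_hz < 0:
        raise ValueError("num_macs and freq_hz must be non-negative")
    return 2 * num_macs * freq_hz


def efficiency(performance_ops: float, power_w: float) -> float:
    """Energy efficiency in operations per second per watt."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return performance_ops / power_w


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not taken from the article).
    num_macs = 16      # assumed number of MAC units
    freq_hz = 100e6    # assumed clock rate: 100 MHz
    power_w = 0.01     # assumed power draw: 10 mW

    perf = peak_performance(num_macs, freq_hz)
    eff = efficiency(perf, power_w)
    print(f"Peak performance: {perf / 1e9:.2f} GOPS")
    print(f"Efficiency: {eff / 1e12:.2f} TOPS/W")
```

With these assumed inputs the peak performance works out to 3.2 GOPS and the efficiency to 0.32 TOPS/W. Substituting the measured MAC count, clock rate, and power at a given operating point gives the corresponding figures for a real design.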