John Hennessy and David Patterson, 2017 ACM Turing Award Lecture. Among many interesting points:
- Moore’s Law is running out of steam: for the past five years, only a 3.5% Y/Y increase in SPECint scores
- but a 65K speedup is possible for Python programs (e.g., matrix mult) using techniques/technologies available today
- pointing at the need for domain-specific hardware approaches (back to the 1960s-1970s again!)
- integrated hw/sw design will lead to a new “golden age” for research


Python folks already know that, of course, and pretty much always call out to C libraries for the inner loops when they have heavy lifting to do. GPUs, TPUs, and other hardware that accelerates deep learning inner loops are already widely used, and at least a few people are looking for more such inner loops to turn into hardware.
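To make the "inner loop" point concrete, here is a rough sketch using only the standard library. The function names (`dot_python`, `dot_builtin`, `matmul`) and the matrix size are made up for illustration. In practice most people would reach for NumPy, which hands matrix multiplication to a compiled BLAS library; this version only moves the innermost loop out of interpreted bytecode and into C-implemented built-ins, so expect a modest gain, nowhere near the figure quoted from the lecture.

```python
import operator
import timeit


def dot_python(a, b):
    """Inner loop run by the interpreter, one indexing step at a time."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_builtin(a, b):
    """Same arithmetic, but iteration is driven by C-implemented built-ins."""
    return sum(map(operator.mul, a, b))


def matmul(A, B, dot):
    """Naive matrix multiply on lists of lists, using the given dot routine."""
    cols = list(zip(*B))  # columns of B, so each entry is a row-by-column dot
    return [[dot(row, col) for col in cols] for row in A]


if __name__ == "__main__":
    n = 60  # arbitrary size, small enough to run in a moment
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    for name, dot in [("python loop", dot_python), ("built-ins", dot_builtin)]:
        elapsed = timeit.timeit(lambda: matmul(A, B, dot), number=5)
        print(f"{name}: {elapsed:.3f}s")
```

Timings will vary by machine and Python version, so none are shown here. The structure is the part that carries over: keep the outer orchestration in Python and push the hot loop into compiled code.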


