A new computing approach called "simultaneous and heterogeneous multithreading" (SHMT) could roughly double the processing speed of devices like phones and laptops without replacing any components. The method lets a device's different processing units, such as the CPU, GPU, and AI accelerators, work on the same region of code simultaneously; in tests it delivered nearly twice the performance while consuming 51% less energy. The approach could also lower hardware costs and reduce carbon emissions and water usage in data centers, though further research is needed to determine practical implementations and specific use cases. A conceptual sketch of the core idea follows.
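The sketch below is a loose illustration of the SHMT concept, not the actual framework: one workload is partitioned so that several processing units compute on it at the same time. The worker functions, the three-way split, and the use of Python threads are all illustrative assumptions; real SHMT operates at the hardware/runtime level.

```python
# Conceptual sketch only: splits one code region across stand-ins for
# heterogeneous processing units that run concurrently. The "devices"
# here are hypothetical placeholder functions, not real hardware paths.
from concurrent.futures import ThreadPoolExecutor

def cpu_worker(chunk):
    # Stand-in for the CPU's share of the work.
    return [x * x for x in chunk]

def gpu_worker(chunk):
    # Stand-in for the GPU's share of the same computation.
    return [x * x for x in chunk]

def accel_worker(chunk):
    # Stand-in for an AI accelerator's share.
    return [x * x for x in chunk]

def shmt_style_run(data):
    # Partition one workload so all units compute on it simultaneously.
    third = len(data) // 3
    chunks = [data[:third], data[third:2 * third], data[2 * third:]]
    workers = [cpu_worker, gpu_worker, accel_worker]
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = pool.map(lambda wc: wc[0](wc[1]), zip(workers, chunks))
    # Reassemble the partial results into one output.
    out = []
    for part in results:
        out.extend(part)
    return out

print(shmt_style_run(list(range(12))))
```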
Researchers from MIT and NVIDIA have developed two techniques to accelerate the processing of sparse tensors, a type of data structure used in high-performance computing tasks. The first, called HighLight, lets hardware accelerators efficiently locate nonzero values across a much wider variety of sparsity patterns, improving energy efficiency. The second pairs two methods, Tailors and Swiftiles: Swiftiles quickly estimates a suitable tile size, and Tailors "overbooks" the on-chip buffer, sizing tiles larger than the buffer could hold densely on the expectation that most values are zero and never need storage, which reduces off-chip memory traffic and improves processing speed. Both approaches enhance the performance and energy efficiency of hardware accelerators used to run massive AI models; a sketch of the overbooking idea follows.
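The following sketch illustrates the overbooking intuition behind Tailors and Swiftiles under stated assumptions; it is not the hardware design. The buffer capacity, the density estimate, the data generation, and the spill-to-off-chip handling are all hypothetical values chosen for demonstration.

```python
# Conceptual sketch only: tiles are sized as if most values are zero
# ("overbooking"), so each tile covers far more raw elements than the
# on-chip buffer could hold densely; rare overflows spill off-chip.
import random

BUFFER_CAPACITY = 64   # on-chip buffer slots (assumed value)
EST_DENSITY = 0.25     # estimated fraction of nonzeros (Swiftiles-style guess)

def overbooked_tile_size(capacity, density):
    # Overbook: if only ~density of the values are nonzero, a tile can
    # span capacity / density raw elements and still usually fit on-chip.
    return int(capacity / density)

def process_tile(tile, capacity):
    # Keep only nonzeros on-chip; on overflow, spill the excess off-chip.
    nonzeros = [v for v in tile if v != 0]
    return nonzeros[:capacity], nonzeros[capacity:]

# Synthetic sparse data: roughly one in four values is nonzero.
data = [random.choice([0, 0, 0, random.randint(1, 9)]) for _ in range(1024)]
tile = overbooked_tile_size(BUFFER_CAPACITY, EST_DENSITY)  # 256 elements per tile
for start in range(0, len(data), tile):
    kept, spilled = process_tile(data[start:start + tile], BUFFER_CAPACITY)
    print(f"tile@{start}: {len(kept)} nonzeros on-chip, {len(spilled)} spilled")
```

In this toy model the expected nonzero count per tile exactly fills the buffer, so most tiles fit and only occasional tiles spill, mirroring how modest overflows are tolerated in exchange for much better average buffer utilization.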