DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: This paper describes a novel framework, called InTensLi (“intensely”), for producing fast single-node implementations of dense tensor-times-matrix multiply (Ttm) of arbitrary dimension. Whereas conventional implementations of Ttm rely on explicitly converting the input tensor operand into a matrix, in order to be able to use any available and fast general matrix-matrix multiply (Gemm) implementation, our framework’s strategy is to carry out the Ttm in-place, avoiding this copy. As the resulting implementations expose tuning parameters, this paper also describes a heuristic empirical model for selecting an optimal configuration based on the Ttm’s inputs. When compared to widely used Ttm implementations that are available in the Tensor Toolbox and Cyclops Tensor Framework (Ctf), InTensLi’s in-place and input-adaptive Ttm implementations achieve 4X and 13X speedups, showing Gemm-like performance on a variety of input sizes.
SUMMARY:An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply
