Comparison of the Effectiveness of Auto-Tuning Algorithms in Configuring GPU Kernels
GPU kernels are widely used today because of the high degree of parallelism they expose. However, the efficiency of a GPU kernel's execution depends on the configuration used, because the configuration determines how the hardware resources are utilized. The complexity of modern programs and hardware systems makes manual configuration infeasible, so auto-tuning systems are used to find efficient configurations automatically. Several solutions have been proposed, each with different application areas and outcomes. In this work we compare six auto-tuning approaches for configuring GPU kernels: Simulated Annealing, Basin Hopping, Differential Evolution, Particle Swarm, Genetic Algorithm and Bayesian Optimization. The workload used in our comparison is CLBlast's matrix-multiplication kernel, which makes it easy to vary the workload being tuned. The tests repeatedly executed the GPU kernel to analyze how the tuning algorithms explored the configuration space. Our experiments show that auto-tuning results depend on how many kernel executions are allowed for exploring the configuration space. In particular, Particle Swarm performed better with more executions, while Simulated Annealing, Basin Hopping and Differential Evolution produced the best results with fewer kernel executions.
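To make the idea of a limited budget of kernel executions concrete, the following is a minimal Python sketch of one of the compared strategies, simulated annealing, searching a discrete configuration space under a fixed number of evaluations. It is an illustration under stated assumptions, not the tuner used in this work: the parameter names (tile_m, tile_n, vector_width), the helper names and the cost model synthetic_runtime are all hypothetical, and a real tuner would compile and time the kernel for each configuration instead. Each distinct configuration evaluated counts as one kernel execution.

```python
import math
import random

# Hypothetical tuning space loosely inspired by GEMM tiling parameters;
# names and values are illustrative, not CLBlast's actual search space.
SPACE = {
    "tile_m": [16, 32, 64, 128],
    "tile_n": [16, 32, 64, 128],
    "vector_width": [1, 2, 4, 8],
}


def synthetic_runtime(cfg):
    """Hypothetical stand-in for timing one kernel execution (lower is better)."""
    return (abs(cfg["tile_m"] - 64) / 16
            + abs(cfg["tile_n"] - 32) / 16
            + abs(cfg["vector_width"] - 4)
            + 1.0)


def neighbor(cfg, rng):
    """Move one randomly chosen parameter to an adjacent value in its list."""
    new = dict(cfg)
    key = rng.choice(list(SPACE))
    values = SPACE[key]
    i = values.index(cfg[key])
    j = min(max(i + rng.choice([-1, 1]), 0), len(values) - 1)
    new[key] = values[j]
    return new


def anneal(cost, budget, seed=0, t0=2.0):
    """Simulated annealing limited to `budget` distinct cost evaluations."""
    rng = random.Random(seed)
    cache = {}  # configuration -> measured cost; repeated configs are free

    def evaluate(cfg):
        key = tuple(sorted(cfg.items()))
        if key not in cache:
            cache[key] = cost(cfg)  # counts as one "kernel execution"
        return cache[key]

    current = {k: rng.choice(v) for k, v in SPACE.items()}
    cur_cost = evaluate(current)
    best, best_cost = current, cur_cost
    steps = 0
    # The step cap stops the loop if the budget exceeds the reachable space.
    while len(cache) < budget and steps < 100 * budget:
        steps += 1
        # Temperature falls linearly as the evaluation budget is consumed.
        temp = t0 * (1 - len(cache) / budget) + 1e-9
        cand = neighbor(current, rng)
        cand_cost = evaluate(cand)
        # Accept improvements always; accept worse moves with Metropolis probability.
        if cand_cost < cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost
    return best, best_cost, len(cache)
```

Calling, for example, anneal(synthetic_runtime, budget=10) and anneal(synthetic_runtime, budget=40) shows how the best configuration found can change with the number of allowed executions. Tying the cooling schedule to the budget makes the search greedier as evaluations run out; this is one of several reasonable design choices, not a prescription.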
