Automating Test Class Refactoring For Setup Reuse: A Clustering-Based Approach
Code duplication is a pervasive problem in software maintenance, and automated test suites are particularly susceptible to it. As software systems grow in complexity, well-structured test code becomes essential for sustainable development. Excessive duplication among test cases makes new tests harder to add and, left unmanaged, degrades the overall quality of the test suite. Reducing this duplication makes tests easier to update as requirements evolve and improves the maintainability of the testing infrastructure. This paper presents a method that automatically reorganizes test classes to maximize code reuse. We use similarity metrics to quantify the resemblance between test cases and hierarchical clustering to group tests with shared setups. Each group is then refactored using the implicit setup strategy, in which the shared fixture is moved into a common setup method. We also present a supporting tool that implements this pipeline for Java projects. In an experiment with 16 participants, the tool restructured test classes effectively, reducing statement duplication by 30.2% and eliminating the general fixture test smell by isolating common configurations into cohesive units.
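The pipeline summarized above (similarity metric, hierarchical clustering, shared setup extraction) can be illustrated with a minimal Java sketch. It is not the tool's implementation: the abstract does not name the similarity metric or the linkage criterion, so Jaccard similarity over normalized setup statements, average linkage, and a fixed merge threshold are assumptions made here for illustration. The class name `SetupClustering` and its methods are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each test is represented by the set of its setup statements.
class SetupClustering {

    // Assumed metric: Jaccard similarity |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Assumed linkage: mean pairwise similarity between two clusters of test indices.
    static double linkage(List<Integer> c1, List<Integer> c2, List<Set<String>> tests) {
        double sum = 0;
        for (int i : c1)
            for (int j : c2)
                sum += jaccard(tests.get(i), tests.get(j));
        return sum / (c1.size() * c2.size());
    }

    // Agglomerative clustering: repeatedly merge the most similar pair of
    // clusters until the best similarity falls below the threshold.
    static List<List<Integer>> cluster(List<Set<String>> tests, double threshold) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < tests.size(); i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > 1) {
            int bi = -1, bj = -1;
            double best = -1;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double s = linkage(clusters.get(i), clusters.get(j), tests);
                    if (s > best) { best = s; bi = i; bj = j; }
                }
            }
            if (best < threshold) break;
            // bj > bi, so removing bj does not shift the cluster at bi.
            clusters.get(bi).addAll(clusters.remove(bj));
        }
        return clusters;
    }

    // Statements shared by every test in a cluster: candidates for the
    // common setup method of the refactored test class.
    static Set<String> commonSetup(List<Integer> cluster, List<Set<String>> tests) {
        Set<String> common = new HashSet<>(tests.get(cluster.get(0)));
        for (int i : cluster) common.retainAll(tests.get(i));
        return common;
    }

    public static void main(String[] args) {
        List<Set<String>> tests = List.of(
            Set.of("cart = new Cart()", "user = new User()"),
            Set.of("cart = new Cart()", "user = new User()", "cart.add(item)"),
            Set.of("db = new Database()"),
            Set.of("db = new Database()", "db.connect()"));
        for (List<Integer> c : cluster(tests, 0.4)) {
            System.out.println(c + " shares " + commonSetup(c, tests));
        }
    }
}
```

With the four illustrative tests in `main` and a threshold of 0.4, tests 0 and 1 end up in one group and tests 2 and 3 in another. Each group's shared statements are the ones that would move into the setup method of a separate test class, so no test inherits fixture code it does not use.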
