GenAI as a Testing Ally: Increasing Confidence in Production Codebases
Can Large Language Models (LLMs) effectively generate unit tests for industrial software? This paper presents an empirical study that applies LLM-based test generation to the production codebases of a SaaS platform, achieving code coverage of up to 100%. Unlike prior studies that rely on academic benchmarks, our evaluation targets diverse industrial codebases written in Go, JavaScript/TypeScript, and Python. We systematically compare three prompt engineering strategies, together with a complementary reinforcement approach, across four LLMs, and find that all models achieve similar effectiveness when guided by well-designed prompts. This work provides the first multi-language, multi-model empirical evaluation of LLM-based testing on industrial production codebases, along with practical guidelines for prompt design and model selection.
