Comparative Energy Evaluation for Continuous Audio Processing: Neuromorphic Xylo Audio 3 vs. Conventional Smartphone
This paper presents a comparative experimental evaluation of energy consumption for continuous audio processing on two platforms with distinct computational paradigms: the neuromorphic Xylo Audio 3 hardware and a Google Pixel 9 smartphone. To keep the comparison methodologically sound, we used the same reference neural topology on both platforms, namely a Linear–LIF–Linear–LIF pipeline with multichannel temporal input derived from spike-inspired encoding. The protocol covers 107 audio tracks (approximately 4.85 hours of cumulative processing) and reports both aggregate and normalized metrics: total energy, average and peak power, total inference time, energy per inference, and energy per computational activity. Results show a consistent energy advantage for the neuromorphic platform across all core indicators: lower total energy, lower average power, and lower energy cost per unit of work, without meaningful degradation in inference time. In aggregate terms, energy metrics improve by up to two orders of magnitude relative to the general-purpose mobile platform. The sparsity analysis further indicates that neural dynamics and system-level efficiency are consistent with each other: event-driven execution reduces non-informative computational activity during prolonged workloads. Beyond quantitative reporting, the paper discusses practical implications for designing always-on edge-audio applications, emphasizing energy autonomy, power predictability, and the feasibility of continuous operation. From an applied perspective, the findings support dedicated neuromorphic architectures as a technically robust option for ultra-low-power audio inference workloads.
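For illustration only, the reference topology and the reported metrics can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the function names (`lif_layer`, `linear_lif_linear_lif`, `energy_metrics`), the leak factor `beta`, the firing `threshold`, the reset-by-subtraction rule, the rectangle-rule energy integration, and the use of spike count as the measure of "computational activity" are all hypothetical choices made here for clarity.

```python
import numpy as np


def lif_layer(currents, beta=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire layer (assumed formulation).

    currents: array of shape (T, N) with input current per time step.
    Returns a (T, N) array of binary spikes (0.0 or 1.0).
    """
    n_steps, n_neurons = currents.shape
    v = np.zeros(n_neurons)                    # membrane potentials
    spikes = np.zeros((n_steps, n_neurons))
    for t in range(n_steps):
        v = beta * v + currents[t]             # leak, then integrate
        fired = v >= threshold
        spikes[t] = fired
        v = np.where(fired, v - threshold, v)  # assumed reset by subtraction
    return spikes


def linear_lif_linear_lif(x, w1, w2, beta=0.9, threshold=1.0):
    """Linear -> LIF -> Linear -> LIF forward pass over a (T, C) input."""
    s1 = lif_layer(x @ w1, beta, threshold)
    s2 = lif_layer(s1 @ w2, beta, threshold)
    return s1, s2


def energy_metrics(power_w, dt_s, n_inferences, n_spikes):
    """Aggregate and normalized metrics from a uniformly sampled power trace.

    power_w: power samples in watts; dt_s: sampling interval in seconds.
    Energy is approximated with the rectangle rule (an assumption).
    """
    energy_j = float(np.sum(power_w) * dt_s)
    return {
        "total_energy_j": energy_j,
        "avg_power_w": float(np.mean(power_w)),
        "peak_power_w": float(np.max(power_w)),
        "energy_per_inference_j": energy_j / n_inferences,
        # Spike count stands in for "computational activity" here.
        "energy_per_spike_j": energy_j / max(n_spikes, 1),
    }
```

Under these assumptions, layer sparsity can be read off as `1.0 - spikes.mean()`, and the total spike count `s1.sum() + s2.sum()` can be passed as `n_spikes` to relate energy to activity.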
