Comparative Evaluation of Structural and Appearance-Based Gait Representations Under Viewpoint and Appearance Variations
Gait recognition remains challenging under viewpoint changes and appearance-related covariates such as clothing variation and carried objects. While silhouette-based representations perform strongly in controlled conditions, their robustness across mismatched scenarios is limited. This work evaluates three gait representations on a multi-view benchmark: the Gait Energy Image and two pose-derived motion templates encoding normalized joint trajectories. Sixteen training–testing configurations were defined to assess cross-view performance and generalization under clothing and carrying variations. All experiments used an identical convolutional neural network to isolate representation effects. Results show that the silhouette-based representation achieves the highest accuracy in matched conditions, especially at lateral viewpoints, whereas pose-derived templates exhibit smoother degradation under appearance shifts and moderate view changes. Diversified training improves robustness for all methods but does not remove intrinsic representation differences. The findings highlight a trade-off between discriminative strength and covariate resilience, supporting the complementary role of structural motion representations in realistic gait recognition systems.
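
For readers unfamiliar with the silhouette-based baseline, the Gait Energy Image is conventionally defined as the pixel-wise mean of aligned binary silhouettes over a gait sequence. The following is a minimal sketch of that standard definition, not the preprocessing pipeline used in this work: it assumes the silhouettes are already segmented, size-normalized, and centered, and the function name `gait_energy_image` is illustrative.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Average a stack of aligned silhouettes of shape (T, H, W) into one GEI.

    Assumption: frames are already segmented, size-normalized, and centered.
    """
    frames = np.asarray(silhouettes, dtype=np.float32)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of shape (T, H, W)")
    # Binarize so that any nonzero value (e.g. 1 or 255) counts as foreground.
    frames = (frames > 0).astype(np.float32)
    # Pixel-wise mean over time: values near 1 mark static body regions,
    # intermediate values mark regions that move during the sequence.
    return frames.mean(axis=0)


if __name__ == "__main__":
    # Toy sequence: 4 frames of 2x2 pixels (hypothetical data for illustration).
    toy = np.zeros((4, 2, 2), dtype=np.uint8)
    toy[:, 0, 0] = 255   # foreground in every frame
    toy[:2, 1, 1] = 255  # foreground in half of the frames
    print(gait_energy_image(toy))
```

In the toy sequence, a pixel that is foreground in every frame averages to 1.0 and a pixel that is foreground in half of the frames averages to 0.5. This shows how the template folds body shape and motion into a single image.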
