A Structural Analysis of Intrusion Detection System Datasets and Their Practical Implications
Intrusion detection systems (IDS) rely heavily on the quality and representativeness of the datasets used to train and evaluate them. However, many publicly available IDS datasets suffer from extreme class imbalance, redundant records, and inconsistent feature representation. This paper presents an exploratory, comparative analysis of multiple IDS datasets, focusing on data quality, feature characterization, class distribution, and inherent limitations that may affect machine-learning-based detection approaches. Our findings highlight critical issues that should be addressed before these datasets are used in real-world IDS studies.
