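Dataset
Dataset URL: CAR-HACKING DATASET
It contains three types of attack: DoS attack, fuzzy attack, and spoofing attack (RPM/gear).
Each record contains: Timestamp, CAN ID, DLC, DATA[0], DATA[1], DATA[2], DATA[3], DATA[4], DATA[5], DATA[6], DATA[7], Flag
- Timestamp: the recorded time
- CAN ID: identifier of the CAN message in HEX (e.g. 043f)
- DLC: number of data bytes, from 0 to 8
- DATA[0~7]: data values (bytes)
- Flag: T or R, where T denotes an injected message and R a normal message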
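As a rough illustration of the record layout described above, the following sketch parses one comma-separated record into typed fields. The sample line and the names `CanRecord` and `parse_record` are made up for this example and are not taken from the dataset files. It also assumes that the flag directly follows the last data byte.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanRecord:
    timestamp: float
    can_id: int        # CAN ID converted from hex
    dlc: int           # number of data bytes (0 to 8)
    data: List[int]    # payload bytes converted from hex
    flag: str          # 'T' = injected message, 'R' = normal message


def parse_record(line: str) -> CanRecord:
    """Parse one record of the form Timestamp,CAN ID,DLC,DATA[...],Flag."""
    parts = [p.strip() for p in line.split(',')]
    dlc = int(parts[2])
    return CanRecord(
        timestamp=float(parts[0]),
        can_id=int(parts[1], 16),
        dlc=dlc,
        data=[int(b, 16) for b in parts[3:3 + dlc]],
        flag=parts[3 + dlc],
    )


# Hypothetical sample line, for illustration only
sample = "1478198376.389427,043f,8,00,40,60,ff,5a,6e,08,00,R"
record = parse_record(sample)
print(record.can_id, record.data, record.flag)
```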
The dataset is shown in the figure below.
Dataset processing
Merging the dataset
Merge the four files described above into one.
import os
import pandas as pd

filepath = 'can_data/'
outpath = 'can_data/data_merge.csv'
columns = ['Timestamp', 'ID', 'DLC'] + ['DATA[' + str(i) + ']' for i in range(8)] + ['Label']

frames = []
for file in os.listdir(filepath):
    # Skip the merged output so that re-running does not read it back in
    if file == os.path.basename(outpath):
        continue
    feature = pd.read_csv(filepath + file, names=columns)
    # Keep only frames with a full 8-byte payload
    feature = feature[feature['DLC'] == 8].copy()
    # Convert hexadecimal strings to integers
    feature['ID'] = feature['ID'].map(lambda x: int(x, 16))
    for i in range(8):
        col = 'DATA[' + str(i) + ']'
        feature[col] = feature[col].map(lambda x: int(x, 16))
    # R = normal message; T = injected message, labelled with the file name prefix
    feature['Label'] = feature['Label'].replace({'R': 'Norm', 'T': file.split('_')[0]})
    frames.append(feature)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
features = pd.concat(frames, ignore_index=True)
features.to_csv(outpath, index=False)
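The labelling rule used in the merge step can be isolated as a small function, sketched below. The file names are assumptions for illustration, since the text does not list them. The only thing taken from the merge code is that the attack name is the part of the file name before the first underscore.

```python
def label_for(flag: str, filename: str) -> str:
    """Map a T/R flag to a class label, following the merge step."""
    if flag == 'R':
        return 'Norm'                   # normal message
    if flag == 'T':
        return filename.split('_')[0]   # injected message: attack name from the file name
    # The merge code leaves other values unchanged; this sketch rejects them instead
    raise ValueError('unexpected flag: ' + repr(flag))


# Hypothetical file names, for illustration only
for name in ['DoS_dataset.csv', 'Fuzzy_dataset.csv']:
    print(name, '->', label_for('T', name), '/', label_for('R', name))
```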
Removing Timestamp
According to the paper:
The features of this dataset include timestamp, CAN ID, data length code (DLC), and the 8-bit data field of CAN packets (DATA[0]-DATA[7]). Since the feature “timestamp” has a strong correlation with cyber-attack simulation periods and can lead to biased models and results, this feature was removed from the feature space.
The Timestamp feature is therefore removed, using the following code:
df = df.drop(['Timestamp'], axis=1)
Program modifications
Not using SMOTE
According to the paper:
As the minority classes have large numbers of samples (at least 491,847 samples), SMOTE is not required for balancing the CAN-intrusion-dataset.
The dataset size is shown in the figure below.
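The per-class sample counts behind a decision like this can be checked with a short script. The sketch below counts the values of a `Label` column in a CSV file, which is the column name the merge step writes. The in-memory sample data is made up so that the example runs on its own, and `class_counts` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def class_counts(csv_file) -> Counter:
    """Count rows per value of the 'Label' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row['Label'] for row in reader)


# Made-up miniature stand-in for can_data/data_merge.csv
sample = io.StringIO(
    "ID,DLC,Label\n"
    "1087,8,Norm\n"
    "0,8,DoS\n"
    "1087,8,Norm\n"
)
counts = class_counts(sample)
print(counts.most_common())
# For the real file:
#   with open('can_data/data_merge.csv', newline='') as f:
#       print(class_counts(f))
```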
The following code is therefore commented out.
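Changing the number of features
Changing the number of features for FCBF
After Timestamp is removed, the dataset has only 8 features in total, so the number of features selected by FCBF is changed as follows.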
from FCBF_module import FCBF, FCBFK, FCBFiP, get_i
fcbf = FCBFK(k = 8)
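Rather than hard-coding the count, the number of features can be derived from the column list, so that it stays in sync with whatever columns remain after preprocessing. This is a minimal sketch: the helper name `count_features` and the example column list are illustrative assumptions, not part of the original program.

```python
def count_features(columns, label_col='Label', dropped=('Timestamp',)):
    """Return the number of feature columns, excluding the label and dropped columns."""
    return sum(1 for c in columns if c != label_col and c not in dropped)


# Illustrative column list: 8 data-byte columns plus Timestamp and Label
columns = ['Timestamp'] + ['DATA[' + str(i) + ']' for i in range(8)] + ['Label']
k = count_features(columns)
print(k)
# With a pandas DataFrame, count_features(df.columns) works the same way,
# and k could then be passed on, e.g. FCBFK(k=k)
```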
Changing the number of features for BO-TPE
Since there are only 8 features in total, max_features is changed as follows:
"max_features":hp.quniform('max_features', 1, 8, 1),
Changing the number of features for RandomForestClassifier
rf_hpo = RandomForestClassifier(n_estimators = 71, min_samples_leaf = 1, max_depth = 46,
min_samples_split = 9, max_features = 8, criterion = 'entropy')
Changing the number of features for DecisionTreeClassifier
dt_hpo = DecisionTreeClassifier(min_samples_leaf = 2, max_depth = 47, min_samples_split = 3,
max_features = 8, criterion = 'gini')
Changing the number of features for ExtraTreesClassifier
et_hpo = ExtraTreesClassifier(n_estimators = 53, min_samples_leaf = 1, max_depth = 31,
min_samples_split = 5, max_features = 8, criterion = 'entropy')
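One detail worth keeping in mind when moving from the BO-TPE search space to fixed classifier arguments like those above: values sampled by `hp.quniform` are floats (for example `8.0`), while scikit-learn treats a float `max_features` as a fraction of the features rather than a count. A common pattern is to cast such values to `int` inside the objective function. The sketch below shows only that casting step; `to_int_params` is a hypothetical helper name and the sampled dictionary is made up.

```python
INT_PARAMS = ('max_features',)


def to_int_params(params, int_keys=INT_PARAMS):
    """Return a copy of params with the listed keys cast to int."""
    return {k: int(v) if k in int_keys else v for k, v in params.items()}


# Made-up example of what one draw from the search space might look like
sampled = {'max_features': 8.0, 'criterion': 'entropy'}
clean = to_int_params(sampled)
print(clean)
```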
References
- Song H M, Woo J, Kim H K. In-vehicle network intrusion detection using deep convolutional neural network[J]. Vehicular Communications, 2020, 21: 100198.
- Seo E, Song H M, Kim H K. GIDS: GAN based intrusion detection system for in-vehicle network[C]//2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018: 1-6.
- Yang L, Moubayed A, Shami A. MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles[J]. IEEE Internet of Things Journal, 2021, 9(1): 616-632.