MTH-IDS Implementation Summary for the CAN Dataset

Dataset

Dataset website: CAR-HACKING DATASET

The dataset contains three types of attack: DoS attack, fuzzy attack, and spoofing attack (RPM/gear).

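Each record contains the following fields: Timestamp, CAN ID, DLC, DATA[0], DATA[1], DATA[2], DATA[3], DATA[4], DATA[5], DATA[6], DATA[7], Flag

  1. Timestamp: the time at which the message was recorded
  2. CAN ID: identifier of the CAN message in HEX (e.g. 043f)
  3. DLC: number of data bytes, from 0 to 8
  4. DATA[0~7]: data values (bytes)
  5. Flag: T or R, where T marks an injected message and R a normal message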
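A minimal parsing sketch makes the record layout concrete. It makes one assumption beyond what is listed above: a raw CSV line holds exactly DLC data bytes followed directly by the flag, so frames with DLC < 8 have fewer columns. `parse_can_row` and the sample lines are hypothetical.

```python
def parse_can_row(line):
    """Parse one raw CSV line: Timestamp, CAN ID, DLC, DLC data bytes, Flag."""
    fields = line.strip().split(',')
    dlc = int(fields[2])
    return {
        'timestamp': float(fields[0]),
        'can_id': int(fields[1], 16),                      # hex identifier, e.g. '0316' -> 790
        'dlc': dlc,
        # assumption: exactly `dlc` hex data bytes follow the DLC field
        'data': [int(b, 16) for b in fields[3:3 + dlc]],
        # assumption: the flag comes right after the last data byte
        'injected': fields[3 + dlc] == 'T',                # T = injected, R = normal
    }


row = parse_can_row('1478198376.389427,0316,8,05,21,68,09,21,21,00,6f,R')
print(row['can_id'], row['data'], row['injected'])  # 790 [5, 33, 104, 9, 33, 33, 0, 111] False

short = parse_can_row('1478198376.4,02b0,5,ff,7f,00,05,49,T')
print(short['dlc'], short['data'], short['injected'])  # 5 [255, 127, 0, 5, 73] True
```

Under this assumption, a shorter frame puts its flag in an earlier column. It therefore does not line up with a fixed 12-column header. That is presumably one reason the merge step keeps only rows with DLC = 8.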

The dataset is shown in the figure below.

[Figure: dataset]

Dataset Processing

Merging the Datasets

Merge the four attack files (DoS, fuzzy, RPM spoofing and gear spoofing) into a single file.

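import os
import pandas as pd

filepath = 'can_data/'
outpath = 'can_data/data_merge.csv'
columns = ['Timestamp', 'ID', 'DLC', 'DATA[0]', 'DATA[1]', 'DATA[2]', 'DATA[3]',
           'DATA[4]', 'DATA[5]', 'DATA[6]', 'DATA[7]', 'Label']

frames = []
for file in sorted(os.listdir(filepath)):
    # Read only the raw CSV files and skip the merged output from a previous run
    if not file.endswith('.csv') or file == os.path.basename(outpath):
        continue
    # Read every column as text so that hex bytes such as '10' are not parsed as decimal numbers
    feature = pd.read_csv(os.path.join(filepath, file), names=columns, dtype=str)
    # Keep only frames with 8 data bytes; shorter frames do not line up with the 12 column names
    feature = feature[feature['DLC'] == '8'].copy()
    feature['DLC'] = feature['DLC'].astype(int)
    # Convert the hex CAN ID and the 8 data bytes to integers
    for col in ['ID'] + ['DATA[' + str(i) + ']' for i in range(8)]:
        feature[col] = feature[col].apply(lambda x: int(x, 16))
    # R = normal message; T = injected message, labelled with the attack name from the file name
    feature.loc[feature['Label'] == 'R', 'Label'] = 'Norm'
    feature.loc[feature['Label'] == 'T', 'Label'] = file.split('_')[0]
    frames.append(feature)

# DataFrame.append was removed in pandas 2.0, so concatenate all files once at the end
features = pd.concat(frames, ignore_index=True)
features.to_csv(outpath, index=False)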

Removing Timestamp

According to the paper:

The features of this dataset include timestamp, CAN ID, data length code (DLC), and the 8-bit data field of CAN packets (DATA[0]-DATA[7]). Since the feature “timestamp” has a strong correlation with cyber-attack simulation periods and can lead to biased models and results, this feature was removed from the feature space.


The Timestamp feature is therefore removed with the following code:

df = pd.read_csv('can_data/data_merge.csv')  # merged file produced in the previous step
df = df.drop(['Timestamp'], axis=1)
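A toy example can illustrate the bias described in the quote. The records are made up: in one hypothetical capture, attack messages are injected only between t = 10 and t = 20. Suppose a "model" merely learns that time window from a random (interleaved) split of the same capture. It scores perfectly on the held-out half. It fails once the same attack happens at another time, because it never looked at the CAN ID or the data bytes. The function names and data are illustrative assumptions, not part of MTH-IDS.

```python
def fit_time_window(train):
    """'Learn' the attack period as the [min, max] timestamp of the attack records."""
    attack_times = [t for t, is_attack in train if is_attack]
    return min(attack_times), max(attack_times)


def accuracy(records, window):
    """Accuracy of predicting 'attack' iff the timestamp falls inside the window."""
    start, end = window
    hits = sum((start <= t <= end) == is_attack for t, is_attack in records)
    return hits / len(records)


# Hypothetical capture of (timestamp, is_attack) pairs: attacks only for 10 <= t <= 20
capture = [(t, 10 <= t <= 20) for t in range(30)]
train, test = capture[0::2], capture[1::2]   # interleaved split of a single capture
window = fit_time_window(train)
print(accuracy(test, window))                # 1.0, without looking at any CAN field

# The same attack replayed later: the timestamp rule no longer matches
later = [(t, 40 <= t <= 50) for t in range(30, 60)]
print(accuracy(later, window))               # 19/30: every attack record is missed
```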

Code Modifications

Not Using SMOTE

According to the paper:

As the minority classes have large numbers of samples (at least 491,847 samples), SMOTE is not required for balancing the CAN-intrusion-dataset.


The class labels and sizes of the dataset are shown in the figure below.

[Figure: CLASS LABEL AND SIZE OF THE CAN-INTRUSION-DATASET]

The following code is therefore commented out:

[Figure: SMOTE code]
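The paper's argument amounts to a simple check: oversampling is only worth doing when the smallest class is small. A minimal sketch of that check follows. `needs_oversampling` and its `min_samples` threshold are hypothetical: the paper states the minority classes have at least 491,847 samples but gives no formal threshold.

```python
from collections import Counter


def needs_oversampling(labels, min_samples):
    """Return (decision, counts); oversample only if some class has fewer than min_samples rows."""
    counts = Counter(labels)
    return min(counts.values()) < min_samples, counts


# Toy labels; on the real data this would be df['Label'].tolist()
labels = ['Norm'] * 6 + ['DoS'] * 3 + ['RPM'] * 2
decision, counts = needs_oversampling(labels, min_samples=2)
print(decision, counts.most_common())  # False [('Norm', 6), ('DoS', 3), ('RPM', 2)]
```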

Changing the Number of Features

Changing the Number of Features for FCBF

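Once Timestamp is removed, the feature space is small: CAN ID, DLC and DATA[0]-DATA[7]. After the filtering in the merge step, DLC is constant (always 8). The number of features selected by FCBF is therefore reduced to 8, as follows: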

from FCBF_module import FCBF, FCBFK, FCBFiP, get_i
fcbf = FCBFK(k = 8)
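FCBF (Fast Correlation-Based Filter) ranks features by their symmetric uncertainty (SU) with the class label. It then removes every feature that is at least as correlated with an already selected feature as with the class. The pure-Python sketch below follows that standard procedure for discrete features. Two parts of it are assumptions. Cutting the result to the first k features is a guess at what `FCBFK(k = 8)` does, and the actual `FCBF_module` implementation may differ. The name `fcbf_top_k` is hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * (H(x) + H(y) - H(x, y)) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx + hy - entropy(x, y)) / (hx + hy)


def fcbf_top_k(columns, y, k, delta=0.0):
    """Return the indices of at most k features chosen by FCBF (columns: list of feature columns)."""
    su_class = [symmetric_uncertainty(col, y) for col in columns]
    # Relevant features (SU with the class > delta), most relevant first; ties keep column order
    order = sorted((i for i in range(len(columns)) if su_class[i] > delta),
                   key=lambda i: -su_class[i])
    selected = []
    while order:
        p = order.pop(0)
        selected.append(p)
        # Drop every remaining feature q that p makes redundant: SU(p, q) >= SU(q, class)
        order = [q for q in order
                 if symmetric_uncertainty(columns[p], columns[q]) < su_class[q]]
    return selected[:k]


y = [0, 0, 1, 1, 2, 2, 3, 3]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],   # high bit of the class
    [0, 0, 1, 1, 0, 0, 1, 1],   # low bit of the class, independent of feature 0
    [0, 0, 0, 0, 1, 1, 1, 1],   # redundant copy of feature 0
    [8, 8, 8, 8, 8, 8, 8, 8],   # constant, like DLC after filtering
]
print(fcbf_top_k(columns, y, k=8))  # [0, 1]
```

In the toy data, feature 2 is dropped as a redundant copy of feature 0. The constant feature 3 is never selected. The two complementary features are kept.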

Changing the Number of Features for BO-TPE

The classifiers receive only the 8 features kept by FCBF, so the upper bound of max_features in the search space is set to 8:

"max_features":hp.quniform('max_features', 1, 8, 1),

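`hp.quniform` returns floats. hyperopt documents it as `round(uniform(low, high) / q) * q`, so a sampled value looks like 3.0. It has to be cast to `int` before it reaches scikit-learn, which treats a float `max_features` as a fraction of the feature count. The sketch below imitates that sampling rule in plain Python to show the cast. `sample_quniform` and `to_tree_params` are hypothetical helpers, not hyperopt functions.

```python
import random


def sample_quniform(low, high, q, rng):
    """Imitate hyperopt's documented quniform rule; like hyperopt, return a float such as 3.0."""
    return float(round(rng.uniform(low, high) / q) * q)


def to_tree_params(sampled):
    """Cast integer-valued hyperparameters that were sampled as floats back to int."""
    int_keys = ('n_estimators', 'max_depth', 'min_samples_split',
                'min_samples_leaf', 'max_features')
    return {k: int(v) if k in int_keys else v for k, v in sampled.items()}


rng = random.Random(0)
sampled = {'max_features': sample_quniform(1, 8, 1, rng), 'criterion': 'entropy'}
params = to_tree_params(sampled)
print(sampled['max_features'], params['max_features'])
```

Inside a real hyperopt objective, the same idea is a plain `int(params['max_features'])` before building the classifier.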
Changing the Number of Features for RandomForestClassifier

rf_hpo = RandomForestClassifier(n_estimators = 71, min_samples_leaf = 1, max_depth = 46, 
                                min_samples_split = 9,  max_features = 8, criterion = 'entropy')

Changing the Number of Features for DecisionTreeClassifier

dt_hpo = DecisionTreeClassifier(min_samples_leaf = 2, max_depth = 47, min_samples_split = 3, 
                                max_features = 8, criterion = 'gini')

Changing the Number of Features for ExtraTreesClassifier

et_hpo = ExtraTreesClassifier(n_estimators = 53, min_samples_leaf = 1, max_depth = 31, 
                              min_samples_split = 5, max_features = 8, criterion = 'entropy')
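With 8 input features, `max_features = 8` means every split considers all features. For the decision tree this matches the default. The random forest loses per-split feature subsampling, so its trees differ mainly through bootstrap sampling. For ExtraTreesClassifier, the randomness comes from the randomly drawn split thresholds. The helper below is a hypothetical pure-Python mirror of the `max_features` options scikit-learn documents (int, float fraction, 'sqrt', 'log2', None). It only makes that arithmetic visible.

```python
from math import log2, sqrt


def resolve_max_features(max_features, n_features):
    """Number of features examined per split, following scikit-learn's documented rules."""
    if max_features is None:
        return n_features
    if max_features == 'sqrt':
        return max(1, int(sqrt(n_features)))
    if max_features == 'log2':
        return max(1, int(log2(n_features)))
    if isinstance(max_features, float):
        # A float is a fraction of the features, not a count
        return max(1, int(max_features * n_features))
    if isinstance(max_features, int):
        if not 0 < max_features <= n_features:
            raise ValueError('max_features must be in (0, n_features]')
        return max_features
    raise ValueError(f'unsupported max_features: {max_features!r}')


for option in (8, 'sqrt', 'log2', 0.5, None):
    print(option, resolve_max_features(option, 8))
```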

References

  1. Song H M, Woo J, Kim H K. In-vehicle network intrusion detection using deep convolutional neural network[J]. Vehicular Communications, 2020, 21: 100198.
  2. Seo E, Song H M, Kim H K. GIDS: GAN based intrusion detection system for in-vehicle network[C]//2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018: 1-6.
  3. Yang L, Moubayed A, Shami A. MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles[J]. IEEE Internet of Things Journal, 2021, 9(1): 616-632.
