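When is it necessary to run the data through fit_transform for uniform preprocessing?
When I did not apply fit_transform, the accuracies were:
Decision tree weak classifier accuracy: 0.7867
Decision tree classifier accuracy: 0.7734
AdaBoost classifier accuracy: 0.8161
When I did apply fit_transform to the data, the accuracies were:
Decision tree weak classifier accuracy: 0.7867
Decision tree classifier accuracy: 0.7745
AdaBoost classifier accuracy: 0.8138
Here is the first case: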
train_data['Embarked'] = train_data['Embarked'].map({'S':0, 'C':1, 'Q':2})
test_data['Embarked'] = test_data['Embarked'].map({'S':0, 'C':1, 'Q':2})
train_data['Sex'] = train_data['Sex'].map({'male':0, 'female':1})
test_data['Sex'] = test_data['Sex'].map({'male':0, 'female':1})
# Assign the result back; chained fillna(..., inplace=True) is unreliable in recent pandas
train_data['Age'] = train_data['Age'].fillna(train_data['Age'].mean())
test_data['Age'] = test_data['Age'].fillna(test_data['Age'].mean())
train_data['Fare'] = train_data['Fare'].fillna(train_data['Fare'].mean())
test_data['Fare'] = test_data['Fare'].fillna(test_data['Fare'].mean())
features = ['Pclass', 'Sex','Age','SibSp', 'Parch', 'Fare', 'Embarked']
train_features = train_data[features]
train_labels = train_data['Survived']
test_features = test_data[features]
#train_features = dvec.fit_transform(train_features.to_dict(orient='records'))
#test_features = dvec.transform(test_features.to_dict(orient='records'))
Here is the second case:
#train_data['Embarked'] = train_data['Embarked'].map({'S':0, 'C':1, 'Q':2})
#test_data['Embarked'] = test_data['Embarked'].map({'S':0, 'C':1, 'Q':2})
#train_data['Sex'] = train_data['Sex'].map({'male':0, 'female':1})
#test_data['Sex'] = test_data['Sex'].map({'male':0, 'female':1})
# Assign the result back; chained fillna(..., inplace=True) is unreliable in recent pandas
train_data['Age'] = train_data['Age'].fillna(train_data['Age'].mean())
test_data['Age'] = test_data['Age'].fillna(test_data['Age'].mean())
train_data['Fare'] = train_data['Fare'].fillna(train_data['Fare'].mean())
test_data['Fare'] = test_data['Fare'].fillna(test_data['Fare'].mean())
features = ['Pclass', 'Sex','Age','SibSp', 'Parch', 'Fare', 'Embarked']
train_features = train_data[features]
train_labels = train_data['Survived']
test_features = test_data[features]
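# dvec is not defined in this snippet; presumably a sklearn DictVectorizer, e.g. DictVectorizer(sparse=False)
# orient='records' is the spelling pandas accepts; the 'record' abbreviation is rejected by recent versions
train_features = dvec.fit_transform(train_features.to_dict(orient='records'))
test_features = dvec.transform(test_features.to_dict(orient='records'))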
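For anyone comparing the two cases above, here is a rough sketch of what each encoding produces. It assumes dvec is a scikit-learn DictVectorizer created with sparse=False (the post does not show how it was built), and the three-row toy DataFrame is made up purely for illustration. It is not the original Titanic data.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy rows standing in for a few of the Titanic columns.
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female'],
    'Embarked': ['S', 'C', 'Q'],
    'Age': [22.0, 38.0, 26.0],
})

# Case 1: manual integer mapping keeps exactly one column per feature.
mapped = toy.copy()
mapped['Sex'] = mapped['Sex'].map({'male': 0, 'female': 1})
mapped['Embarked'] = mapped['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})

# Case 2: DictVectorizer one-hot encodes string values
# and passes numeric values through unchanged.
dvec = DictVectorizer(sparse=False)  # assumed construction, not shown in the post
encoded = dvec.fit_transform(toy.to_dict(orient='records'))

print(list(mapped.columns))   # one column per original feature
print(dvec.feature_names_)    # one column per (feature, category) pair
print(encoded)
```

So the first case feeds the trees a single integer column per categorical feature, while the second feeds them one 0/1 column per category. The information is the same in both and only the representation differs, which may be part of why the accuracies in the two cases are so close.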