scikit-learn - Does test data for machine learning need to have column names?
Suppose I have the training data below:
age:12 height:150 weight:100 gender:m
age:15 height:145 weight:80 gender:f
age:17 height:147 weight:110 gender:f
age:11 height:144 weight:130 gender:m
After I train a model on this data, if I need to pass one test observation for prediction, do I need to send the data with column names, as below?
age:13 height:142 weight:90
In some cases I have seen people sending test data as an array without column names. I am not sure how the algorithms work with that.
Note: I am using Python with scikit-learn, and my training data is a dataframe. I am not sure whether the test data should be in dataframe format too.
Are you predicting gender?
If so, yes. The input records need the columns age, height, and weight.
Otherwise, you are predicting on a record that is missing the gender value. That may raise an error such as a KeyError if your model does not allow missing fields/columns.
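To make this concrete, here is a minimal sketch using the data from the question. The choice of DecisionTreeClassifier and the names train, feature_cols, new_obs and clf_arr are assumptions for illustration only; any scikit-learn classifier would follow the same pattern. It shows a one-row dataframe with the same feature columns being passed to predict, and the array variant where only column order matters.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training data from the question.
train = pd.DataFrame({
    "age": [12, 15, 17, 11],
    "height": [150, 145, 147, 144],
    "weight": [100, 80, 110, 130],
    "gender": ["m", "f", "f", "m"],
})

# Features are every column except the target (gender).
feature_cols = ["age", "height", "weight"]

# Illustrative model choice, not prescribed by the answer.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train[feature_cols], train["gender"])

# One test observation: same feature columns, no gender column.
new_obs = pd.DataFrame([{"age": 13, "height": 142, "weight": 90}])
pred = clf.predict(new_obs[feature_cols])

# Arrays carry no column names, so only the column order matters.
# Here the model is both fitted and queried with plain arrays.
clf_arr = DecisionTreeClassifier(random_state=0)
clf_arr.fit(train[feature_cols].to_numpy(), train["gender"])
pred_arr = clf_arr.predict([[13, 142, 90]])

print(pred[0], pred_arr[0])
```

Selecting new_obs[feature_cols] explicitly keeps the test columns in the same order as the training columns, which is what an array-based call relies on implicitly.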
"I am not sure whether the test data should be in dataframe format"
In short: yes.
Usually you do this:
    from sklearn.model_selection import train_test_split

    # x is the input data; its format depends on how the model (pre)processes data:
    # a numeric matrix, a list of dicts, a list of strings, etc.
    x_train, x_test, y_train, y_test = train_test_split(x, y)

    # Fit and validate. clf is any scikit-learn estimator.
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
So the train and test data are in the same format, or at least in a compatible format (e.g. a pandas dataframe is compatible with a list of dicts).
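As a rough illustration of what "compatible format" can mean in practice, the sketch below converts a list of dicts into a dataframe and aligns its columns with the training frame. The names train_records and test_records are hypothetical, and the data is a subset of the question's example.

```python
import pandas as pd

# Training features as a list of dicts, converted to a dataframe.
train_records = [
    {"age": 12, "height": 150, "weight": 100},
    {"age": 15, "height": 145, "weight": 80},
]
x_train = pd.DataFrame(train_records)

# A test record whose keys arrive in a different order.
test_records = [{"weight": 90, "age": 13, "height": 142}]

# Reorder the test columns to match the training columns by name.
x_test = pd.DataFrame(test_records)[list(x_train.columns)]

print(list(x_test.columns))
print(x_test.to_numpy().tolist())
```

Because the columns are matched by name before prediction, the order of keys in each dict does not matter; once converted to a bare array, only position is left.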