scikit-learn - Does test data for machine learning need to have column names?


Suppose I have the training data below:

    age:12   height:150   weight:100   gender:m
    age:15   height:145   weight:80    gender:f
    age:17   height:147   weight:110   gender:f
    age:11   height:144   weight:130   gender:m

After training the model on this data, if I need to pass one test observation for prediction, do I need to send the data with column names, as below?

    age:13   height:142   weight:90

In some cases I have seen people sending test data as an array without column names. I am not sure how the algorithms work.

Note: I am using Python scikit-learn, and my training data is a DataFrame. I am not sure whether the test data should also be in DataFrame format.

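Are you predicting gender?

If so, then yes: the input records need the same feature columns as in training: age, height and weight.

Otherwise, you would be predicting on a record that is missing the gender value, and you would get a KeyError if the model does not allow missing fields/columns.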
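A minimal sketch of the "if so, yes" case, using the question's four training rows and assuming gender is the target. The choice of `DecisionTreeClassifier` is arbitrary and only for illustration; any scikit-learn classifier would do. The single test observation is passed as a one-row DataFrame with the same feature columns, and a record missing one of those columns fails already at the pandas column selection with a KeyError.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training data from the question; "gender" is the target to predict.
train = pd.DataFrame({
    "age": [12, 15, 17, 11],
    "height": [150, 145, 147, 144],
    "weight": [100, 80, 110, 130],
    "gender": ["m", "f", "f", "m"],
})
features = ["age", "height", "weight"]

# The classifier is an arbitrary choice for illustration.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train[features], train["gender"])

# A single test observation, as a one-row DataFrame with the same feature columns.
test = pd.DataFrame([{"age": 13, "height": 142, "weight": 90}])
prediction = clf.predict(test[features])
print(prediction)

# A record without one of the feature columns cannot even be selected:
# pandas raises KeyError before the model is called.
incomplete = pd.DataFrame([{"age": 13, "height": 142}])
try:
    clf.predict(incomplete[features])
    missing_column_error = False
except KeyError:
    missing_column_error = True
print("KeyError on missing column:", missing_column_error)
```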

Regarding "I am not sure whether the test data should be in DataFrame format":

In short: yes.

Usually you do something like this:

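    # x is the input data; its format depends on how the model (pre)processes data:
    # a numeric matrix, a list of dicts, a list of strings, etc.
    # clf is any scikit-learn estimator (or pipeline) created beforehand.
    from sklearn.model_selection import train_test_split

    x_train, x_test, y_train, y_test = train_test_split(x, y)

    # fit and validate
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)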

So the train and test data are in the same format, or at least in a compatible format (e.g. a pandas DataFrame is compatible with a list of dicts).
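As a sketch of the "compatible format" idea, the pipeline below (an illustrative choice: `DictVectorizer` followed by an arbitrary `DecisionTreeClassifier`) is trained on a list of dicts. `DictVectorizer` learns a key-to-column mapping during `fit`, so test records are matched by key rather than position, and a DataFrame can be fed to the same model after converting it with `to_dict(orient="records")`.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Training records as a list of dicts (one dict per observation).
x_train = [
    {"age": 12, "height": 150, "weight": 100},
    {"age": 15, "height": 145, "weight": 80},
    {"age": 17, "height": 147, "weight": 110},
    {"age": 11, "height": 144, "weight": 130},
]
y_train = ["m", "f", "f", "m"]

# DictVectorizer learns a key -> column mapping during fit, so the pipeline
# accepts dicts and matches features by key rather than by position.
model = make_pipeline(DictVectorizer(sparse=False), DecisionTreeClassifier(random_state=0))
model.fit(x_train, y_train)

# A test record whose keys are in a different order still lines up correctly.
y_pred = model.predict([{"weight": 90, "height": 142, "age": 13}])

# A DataFrame converts to the same format with to_dict(orient="records").
test_df = pd.DataFrame([{"age": 13, "height": 142, "weight": 90}])
y_pred_df = model.predict(test_df.to_dict(orient="records"))
print(y_pred, y_pred_df)
```

The trade-off: a list of dicts is convenient for single observations coming from JSON or web requests, while a DataFrame is usually more convenient for batch data; either works as long as training and prediction use the same preprocessing.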

