python - Read csv with tab delimiter produces errors


I have a CSV file that uses the '\t' (tab) delimiter. It contains 5 columns. I tried this:

import numpy as np
# b = np.loadtxt(r'train_set.csv', dtype=str, delimiter=' ')
my_data = np.genfromtxt('train_set.csv', delimiter='\t')
print(my_data)

but I am getting the following errors:

Traceback (most recent call last):
  File "./wordcloud.py", line 7, in <module>
    my_data = np.genfromtxt('train_set.csv', delimiter='\t')
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1667, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #14 (got 4 columns instead of 5)
    Line #21 (got 4 columns instead of 5)
    Line #135 (got 4 columns instead of 5)

Any ideas, please? I do not know Python (yet :))!


The dataset (which I am examining right now) looks like this:

[screenshot of the dataset]


Edit:

If I do:

my_data = np.genfromtxt('train_set.csv', delimiter='    ') 

then I get no errors, but the output is:

[ nan  nan  nan ...,  nan  nan  nan] 

The answer gives these warnings:

...
    Line #26310 (got 4 columns instead of 5)
    Line #26383 (got 4 columns instead of 5)
    Line #26448 (got 4 columns instead of 5)
    Line #26489 (got 4 columns instead of 5)
    Line #26589 (got 4 columns instead of 5)
    Line #26593 (got 4 columns instead of 5)
    Line #26888 (got 4 columns instead of 5)
    Line #27002 (got 4 columns instead of 5)
    Line #27065 (got 4 columns instead of 5)
    Line #27234 (got 3 columns instead of 5)
    Line #27327 (got 4 columns instead of 5)
    Line #27418 (got 4 columns instead of 5)
    Line #27594 (got 4 columns instead of 5)
    Line #27827 (got 4 columns instead of 5)
    Line #27944 (got 4 columns instead of 5)
    Line #28074 (got 4 columns instead of 5)
    Line #28102 (got 4 columns instead of 5)
    Line #28147 (got 4 columns instead of 5)
    Line #28224 (got 4 columns instead of 5)
    Line #28264 (got 4 columns instead of 5)
    Line #28344 (got 4 columns instead of 5)
    Line #28484 (got 4 columns instead of 5)
  warnings.warn(errmsg, ConversionWarning)

and the output contains strange characters, like:

costing at least \xc2\xa3429

instead of costing at least £429.
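A note on those characters: the bytes 0xC2 0xA3 are the UTF-8 encoding of £, and Python 2 shows non-ASCII bytes as escapes whenever it prints the repr of a byte string (which is what happens when a whole array is printed). So the data is probably intact rather than corrupted. A minimal sketch, assuming the file is UTF-8 encoded (the sample string is taken from the output above):

```python
# -*- coding: utf-8 -*-
# The raw bytes as they appear in the printed array.
raw = b"costing at least \xc2\xa3429"

# Decoding as UTF-8 (an assumption about the file's encoding)
# turns the two bytes 0xC2 0xA3 back into the single character £.
text = raw.decode("utf-8")

print(text)
```

Decoding each string field this way before displaying it should show the pound sign as expected.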

Can you check lines #14, 21, and 135 of the CSV file? These lines do not contain 5 columns, as the error states (all of them contain 4 columns).

If the 5th column is supposed to be blank, insert a \t character at the end.
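With tens of thousands of lines, checking and padding by hand gets tedious, so the two steps above could be scripted. The following is only a sketch: pad_short_rows is a hypothetical helper (not part of NumPy), and it assumes that the missing fields are always the trailing ones and that no field contains a tab itself.

```python
def pad_short_rows(lines, expected_cols=5, delimiter="\t"):
    """Return (fixed_lines, short_line_numbers).

    Hypothetical helper: rows with fewer than `expected_cols` fields get
    trailing delimiters appended, on the assumption that the blank
    fields are the last ones. Line numbers are 1-based.
    """
    fixed, short = [], []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        # N delimiters separate N + 1 fields.
        missing = expected_cols - (row.count(delimiter) + 1)
        if missing > 0:
            short.append(number)
            row += delimiter * missing
        fixed.append(row)
    return fixed, short


# Made-up sample: the second row has only 4 fields.
sample = ["a\tb\tc\td\te", "1\t2\t3\t4", "5\t6\t7\t8\t9"]
fixed, short = pad_short_rows(sample)
print(short)           # 1-based numbers of the rows that were too short
print(repr(fixed[1]))  # the padded row, now ending in a tab
```

To use it on the real file, pass the open file object as lines and write the returned rows to a new file, each followed by a newline.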

Looking at your data, you want:

my_data = np.genfromtxt('train_set.csv', delimiter='\t',
                        invalid_raise=False, skip_header=1,
                        dtype=None)

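invalid_raise: set to False, it skips the invalid lines (#14, 21, and 135) with a warning instead of raising an error. Please recheck them. (In LibreOffice: use 'Save As'.)

skip_header: the name explains itself; it is the number of lines to skip at the start of the file.

dtype: should be None, so that the data type of each column is determined by the contents of that column.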
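To see how the three options behave together, here is a small sketch on in-memory data. The column names and values are made up (the real file's contents are not shown here), and the encoding argument is an addition that assumes a recent NumPy on Python 3.

```python
import io

import numpy as np

# Made-up tab-separated data: a header plus three rows,
# the second of which has only 4 columns.
text = (
    "RowNum\tId\tTitle\tContent\tCategory\n"
    "1\t10\tA\tfoo\tNews\n"
    "2\t11\tB\tbar\n"
    "3\t12\tC\tbaz\tSport\n"
)

my_data = np.genfromtxt(
    io.StringIO(text),
    delimiter="\t",
    invalid_raise=False,  # warn about the short row and skip it
    skip_header=1,        # ignore the header line
    dtype=None,           # infer a type for each column
    encoding="utf-8",     # assumption: needs a NumPy version with this argument
)

# Mixed column types give a structured array with fields f0..f4.
print(my_data.dtype.names)
print(len(my_data))  # number of rows that were kept
```

Because the short row is dropped rather than repaired, it is worth comparing the number of rows kept with the number of lines in the file before relying on the result.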

