python - Read csv with tab delimiter produces errors
I have a CSV file that uses a tab ('\t') delimiter. It contains 5 columns. I tried this:
import numpy as np
#b=np.loadtxt(r'train_set.csv',dtype=str,delimiter=' ')
my_data = np.genfromtxt('train_set.csv', delimiter='\t')
print(my_data)
but I am getting the following errors:
Traceback (most recent call last):
  File "./wordcloud.py", line 7, in <module>
    my_data = np.genfromtxt('train_set.csv', delimiter='\t')
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1667, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #14 (got 4 columns instead of 5)
    Line #21 (got 4 columns instead of 5)
    Line #135 (got 4 columns instead of 5)
Any ideas, please? I do not know Python (yet :))!
The dataset (which I am examining right now) looks like this:
Edit:
If I do:
my_data = np.genfromtxt('train_set.csv', delimiter=' ')
then I get no errors, but the output is:
[ nan nan nan ..., nan nan nan]
The answer gives me these warnings:
...
    Line #26310 (got 4 columns instead of 5)
    Line #26383 (got 4 columns instead of 5)
    Line #26448 (got 4 columns instead of 5)
    Line #26489 (got 4 columns instead of 5)
    Line #26589 (got 4 columns instead of 5)
    Line #26593 (got 4 columns instead of 5)
    Line #26888 (got 4 columns instead of 5)
    Line #27002 (got 4 columns instead of 5)
    Line #27065 (got 4 columns instead of 5)
    Line #27234 (got 3 columns instead of 5)
    Line #27327 (got 4 columns instead of 5)
    Line #27418 (got 4 columns instead of 5)
    Line #27594 (got 4 columns instead of 5)
    Line #27827 (got 4 columns instead of 5)
    Line #27944 (got 4 columns instead of 5)
    Line #28074 (got 4 columns instead of 5)
    Line #28102 (got 4 columns instead of 5)
    Line #28147 (got 4 columns instead of 5)
    Line #28224 (got 4 columns instead of 5)
    Line #28264 (got 4 columns instead of 5)
    Line #28344 (got 4 columns instead of 5)
    Line #28484 (got 4 columns instead of 5)
  warnings.warn(errmsg, ConversionWarning)
and the output contains strange characters, like:
costing at least \xc2\xa3429
in place of costing at least £429.
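A side note on the strange characters: \xc2\xa3 is the UTF-8 byte sequence for £, so the text is probably being read correctly and only shown as undecoded bytes. A minimal sketch, assuming the file is UTF-8 encoded (the sample string is just the one quoted above):

```python
# Raw bytes as they appear in the printed array.
raw = b"costing at least \xc2\xa3429"

# Decoding as UTF-8 (an assumption about the file's encoding) turns the
# two bytes C2 A3 back into the pound sign.
text = raw.decode("utf-8")
print(text)
```

If the file uses a different encoding, the decode call would need that encoding name instead.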
Can you check lines #14, 21, and 135 of the CSV file? These lines do not contain 5 columns, as the error states (all of them contain 4 columns).
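With tens of thousands of lines, checking by hand gets tedious. Here is a rough sketch (Python 3) that lists every line whose column count differs from the expected one. The helper name find_bad_lines and the sample data are made up for illustration, and the splitting only approximates what genfromtxt does:

```python
import os
import tempfile


def find_bad_lines(path, expected=5, delimiter="\t"):
    """Return (line_number, column_count) for lines with an unexpected column count."""
    bad = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            stripped = line.strip(" \r\n")
            if not stripped:
                continue  # ignore blank lines
            count = len(stripped.split(delimiter))
            if count != expected:
                bad.append((number, count))
    return bad


# Small made-up file standing in for train_set.csv.
sample = (
    "id\ttitle\tprice\tcategory\tnote\n"
    "1\tphone\t429\ttech\tok\n"
    "2\tlaptop\t999\ttech\n"  # only 4 columns
    "3\tbook\t12\tmedia\tok\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

bad_lines = find_bad_lines(tmp.name)
os.remove(tmp.name)
print(bad_lines)
```

On the real file you would call find_bad_lines('train_set.csv') and inspect the reported lines.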
If the 5th column is supposed to be blank, insert a \t character at the end.
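If there are many such lines, the padding can be scripted. A minimal sketch, assuming the missing field really is the last one (if a middle column is missing, this would shift the data into the wrong columns). The helper name pad_line is hypothetical:

```python
def pad_line(line, expected=5, delimiter="\t"):
    """Append empty trailing fields until the line has `expected` columns."""
    body = line.rstrip("\r\n")
    missing = expected - len(body.split(delimiter))
    if missing > 0:
        body += delimiter * missing
    return body


short = "2\tlaptop\t999\ttech"  # made-up 4-column row
padded = pad_line(short)
print(repr(padded))
```

Lines that already have 5 columns are returned unchanged, so the helper can be applied to every line of the file before writing it back out.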
Looking at your data, you want:
my_data = np.genfromtxt('train_set.csv', delimiter='\t', invalid_raise=False, skip_header=1, dtype=None)
invalid_raise: when False, invalid lines (#14, 21, and 135) are skipped with a warning instead of raising an error. Please recheck them. (In LibreOffice: use 'Save As'.)
skip_header: the name explains itself.
dtype: should be None, so that the datatype of each column is determined by the contents of that column.
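To see these options in action without the real file, here is a small self-contained sketch. The sample data is invented, and encoding='utf-8' is an extra argument not in the call above (it needs a reasonably recent NumPy and assumes a UTF-8 file):

```python
import os
import tempfile
import warnings

import numpy as np

# Made-up tab-separated data: one header line and one row with only 4 columns.
sample = (
    "id\ttitle\tprice\tcategory\tnote\n"  # dropped by skip_header=1
    "1\tphone\t429\ttech\tok\n"
    "2\tlaptop\t999\ttech\n"  # 4 columns: skipped with a warning
    "3\tbook\t12\tmedia\tok\n"
    "4\tpen\t2\toffice\tok\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

# Record the warnings so they can be inspected instead of just printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rows = np.genfromtxt(
        tmp.name,
        delimiter="\t",
        invalid_raise=False,  # warn about bad lines instead of raising
        skip_header=1,        # ignore the header line
        dtype=None,           # infer a type per column
        encoding="utf-8",     # assumption: decode text as UTF-8
    )
os.remove(tmp.name)

print(len(rows))  # number of rows that were kept
for warning in caught:
    print(warning.message)
```

Note that skipped lines are simply dropped from the result, so it is still worth rechecking them in the source file if that data matters.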