r - Processing a variable space delimited file limited into 2 columns
For whatever reason, the data is being provided in the following format:
    0001 text 0001
    0002 has spaces in between
    0003 yet supposed 2 columns
    0009 why didn't comma delimit may ask?
    0010 or use quotations?
    001 knows
    0012 i'm here file
    0013 , hoping has elegant solution?
So the above is supposed to be 2 columns. I'd like one column to hold the first entries, i.e. 0001, 0002, 0003, 0009, 0010, 001, 0012, 0013, and the other column to hold everything else.
I recommend the input.file function from the "iotools" package.
Usage would be something like:
    library(iotools)
    input.file("yourfile.txt", formatter = dstrsplit, nsep = " ", col_types = "character")
Here's an example. (I've created a dummy temporary file in my workspace for the purpose of illustration.)
    x <- tempfile()
    writeLines(c("0001 text 0001",
                 "0002 has spaces in between",
                 "0003 yet supposed 2 columns",
                 "0009 why didn't comma delimit may ask?",
                 "0010 or use quotations?",
                 "001 knows",
                 "0012 i'm here file",
                 "0013 , hoping has elegant solution?"),
               con = x)
    library(iotools)
    input.file(x, formatter = dstrsplit, nsep = " ", col_types = "character")
    #   rowindex                                V1
    # 1     0001                         text 0001
    # 2     0002             has spaces in between
    # 3     0003            yet supposed 2 columns
    # 4     0009 why didn't comma delimit may ask?
    # 5     0010                or use quotations?
    # 6      001                             knows
    # 7     0012                     i'm here file
    # 8     0013    , hoping has elegant solution?
Elegant enough? ;-)
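For comparison, the same "split on the first space only" idea can be sketched in base R, without any package. This is only an illustration, not what "iotools" does internally: the names txt, pos and result are made up for this sketch, the column names are chosen to mirror the output above, and it assumes every line contains at least one space.

```r
# Dummy input file with a few of the lines from the example above
x <- tempfile()
writeLines(c("0001 text 0001",
             "0002 has spaces in between",
             "001 knows"),
           con = x)

txt <- readLines(x)

# Position of the first space in each line (assumes each line has one)
pos <- regexpr(" ", txt, fixed = TRUE)

# Everything before the first space is the key, everything after it is the text
result <- data.frame(
  rowindex = substr(txt, 1, pos - 1),
  V1       = substr(txt, pos + 1, nchar(txt)),
  stringsAsFactors = FALSE
)
```

Both columns stay character, so leading zeros such as those in 0001 are preserved.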
Update 1
If you've already read your data in as a single-column data.frame (as in @jaap's answer), you can still benefit from the extreme speed of the "iotools" package by using the formatter directly, rather than calling it through the input.file function.
In other words, use:
    dstrsplit(as.character(mydf$V1), nsep = " ", col_types = "character")
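A minimal, self-contained sketch of that call follows. The data.frame mydf and its column name V1 are stand-ins created just for this illustration (your column may be named differently), and the "iotools" package is assumed to be installed.

```r
library(iotools)

# Hypothetical single-column data.frame holding the raw lines
mydf <- data.frame(
  V1 = c("0001 text 0001",
         "0002 has spaces in between",
         "001 knows"),
  stringsAsFactors = FALSE
)

# nsep = " " splits off everything before the first space as the row index;
# the remainder of each line is kept as one character column
res <- dstrsplit(as.character(mydf$V1), nsep = " ", col_types = "character")
```

The as.character() call is there in case the column was read in as a factor.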
Update 2
In case anyone is interested, I benchmarked the solutions proposed in the other answers, including those by Jaap and akrun, against the "iotools" approach. You can find the results at this gist. Summary: whether you are dealing with a file on disk or a column of a file already in memory, "iotools" is the best performer. I did not test tomtom's solution because it would require further processing, as mentioned in that answer.