r - Processing a variable space delimited file limited into 2 columns

For whatever reason, the data is being provided in the following format:

0001 text 0001
0002 has spaces in between
0003 yet supposed 2 columns
0009 why didn't comma delimit may ask?
0010 or use quotations?
001  knows
0012 i'm here file
0013 , hoping has elegant solution?

So the above is supposed to be 2 columns. I would like to have one column with the first entries, i.e. 0001, 0002, 0003, 0009, 0010, 001, 0012, 0013, and another column with everything else.

I recommend the input.file function from the "iotools" package.

Usage would be something like:

library(iotools)
input.file("yourfile.txt", formatter = dstrsplit, nsep = " ", col_types = "character")

Here's an example. (I've created a dummy temporary file in my workspace for the purpose of illustration.)

# Write the sample lines to a temporary file
x <- tempfile()
writeLines(c("0001 text 0001",
             "0002 has spaces in between",
             "0003 yet supposed 2 columns",
             "0009 why didn't comma delimit may ask?",
             "0010 or use quotations?",
             "001  knows",
             "0012 i'm here file",
             "0013 , hoping has elegant solution?"),
           con = x)

# Everything before the first space becomes the row index;
# the rest of the line stays together as a single character column
library(iotools)
input.file(x, formatter = dstrsplit, nsep = " ", col_types = "character")
#   rowindex                                V1
# 1     0001                         text 0001
# 2     0002             has spaces in between
# 3     0003            yet supposed 2 columns
# 4     0009 why didn't comma delimit may ask?
# 5     0010                or use quotations?
# 6      001                             knows
# 7     0012                     i'm here file
# 8     0013    , hoping has elegant solution?

Elegant enough? ;-)
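For anyone who can't install "iotools", the same "cut at the first space" idea can be sketched in base R. This is not the approach recommended above, just a minimal illustration of what treating the first space as the index separator accomplishes. The names `lines`, `pos`, and `result` are made up for this sketch, and it assumes every line contains at least one space.

```r
# Sample lines, the same as in the example above
lines <- c("0001 text 0001",
           "0002 has spaces in between",
           "0003 yet supposed 2 columns",
           "0009 why didn't comma delimit may ask?",
           "0010 or use quotations?",
           "001  knows",
           "0012 i'm here file",
           "0013 , hoping has elegant solution?")

# Position of the first space in each line
pos <- regexpr(" ", lines, fixed = TRUE)

# Everything before the first space is the id; everything after it is the text
result <- data.frame(
  id   = substr(lines, 1, pos - 1),
  text = substr(lines, pos + 1, nchar(lines)),
  stringsAsFactors = FALSE
)
print(result)
```

Note that a line with two spaces after the id (such as the "001" line) keeps one leading space in the text column; wrap the text in trimws() if that matters.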

Update 1

If you've already read your data in as a single-column data.frame (as in @jaap's answer), you can still benefit from the extreme speed of the "iotools" package by using the formatter directly, rather than calling it in the input.file function.

In other words, use:

dstrsplit(as.character(mydf$V1), nsep = " ", col_types = "character")

Update 2

In case anyone is interested, I benchmarked the solutions proposed in the other answers (including jaap's and akrun's) against the "iotools" approach. You can find the results at this gist. In summary: whether you are dealing with a file on disk or a column of a file already in memory, "iotools" was the best performer. I did not test tomtom's solution because it would require further processing.
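If you'd like to run a rough timing of your own, something along these lines should work. This is only a sketch and not the benchmark from the gist: the input is synthetic, `split_first_space` is a hypothetical base-R helper written for this comparison, and the "iotools" timing only runs if the package is installed. Timings will vary by machine, so no results are shown.

```r
# Synthetic input (made-up data): n lines of "<id> <free text with spaces>"
n <- 100000L
lines <- sprintf("%04d some free text %d", seq_len(n) %% 10000L, seq_len(n))

# Hypothetical base-R helper: cut each line at its first space
split_first_space <- function(x) {
  pos <- regexpr(" ", x, fixed = TRUE)
  data.frame(id   = substr(x, 1, pos - 1),
             text = substr(x, pos + 1, nchar(x)),
             stringsAsFactors = FALSE)
}

# Time the base-R version
base_time <- system.time(res_base <- split_first_space(lines))
print(base_time)

# Time the iotools formatter on the same in-memory vector, if available
if (requireNamespace("iotools", quietly = TRUE)) {
  io_time <- system.time(
    res_io <- iotools::dstrsplit(lines, nsep = " ", col_types = "character")
  )
  print(io_time)
}
```

For anything more serious than a quick look, repeat each timing several times rather than relying on a single system.time call.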

