r - Processing a variable space delimited file limited into 2 columns -


For whatever reason, the data is being provided in the following format:

    0001 text 0001
    0002 has spaces in between
    0003 yet supposed 2 columns
    0009 why didn't comma delimit may ask?
    0010 or use quotations?
    001  knows
    0012 i'm here file
    0013 , hoping has elegant solution?

So the above is supposed to be two columns. I would like one column with the first entries, i.e. 0001, 0002, 0003, 0009, 0010, 001, 0012, 0013, and another column with everything else.

I would recommend the input.file function from the "iotools" package.

The usage would be something like:

    library(iotools)
    input.file("yourfile.txt", formatter = dstrsplit, nsep = " ", col_types = "character")

Here's an example. (I've created a dummy temporary file in my workspace for the purpose of illustration.)

    # Write the sample lines to a temporary file
    x <- tempfile()
    writeLines(c("0001 text 0001",
                 "0002 has spaces in between",
                 "0003 yet supposed 2 columns",
                 "0009 why didn't comma delimit may ask?",
                 "0010 or use quotations?",
                 "001  knows",
                 "0012 i'm here file",
                 "0013 , hoping has elegant solution?"),
               con = x)

    # Read the file, splitting each line at the first space (nsep = " ")
    library(iotools)
    input.file(x, formatter = dstrsplit, nsep = " ", col_types = "character")
    #   rowindex                                V1
    # 1     0001                         text 0001
    # 2     0002             has spaces in between
    # 3     0003            yet supposed 2 columns
    # 4     0009 why didn't comma delimit may ask?
    # 5     0010                or use quotations?
    # 6      001                             knows
    # 7     0012                     i'm here file
    # 8     0013    , hoping has elegant solution?

Elegant enough? ;-)
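To make the intended result concrete without installing anything, here is a minimal base-R sketch that splits each line at its first space. It is not how "iotools" does it internally and it has not been benchmarked; the names `lines`, `key`, `rest` and `result` are made up for illustration, and the sketch assumes every line contains at least one space.

```r
# Sample lines taken from the example above (a subset, for brevity)
lines <- c("0001 text 0001",
           "0002 has spaces in between",
           "001  knows")

# Position of the first space in each line (assumes one exists)
first_space <- regexpr(" ", lines, fixed = TRUE)

# Everything before the first space is the key, everything after is the rest
key  <- substr(lines, 1L, first_space - 1L)
rest <- substr(lines, first_space + 1L, nchar(lines))

# trimws() drops the extra padding in lines such as "001  knows"
result <- data.frame(key = key, rest = trimws(rest), stringsAsFactors = FALSE)
print(result)
```

Lines without any space would end up with an empty key here, so real data may need a check for that case first.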


Update 1

If you've already read your data in as a single-column data.frame (as in @jaap's answer), you can still benefit from the extreme speed of the "iotools" package by using the formatter directly, rather than calling it inside the input.file function.

In other words, use:

    dstrsplit(as.character(mydf$V1), nsep = " ", col_types = "character")
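As a rough end-to-end sketch of that in-memory route: the data.frame `mydf` and its column name `V1` below are assumptions made for illustration (V1 is simply R's default name for an unnamed first column), and the call is guarded so the snippet still runs when "iotools" is not installed.

```r
# Hypothetical setup: a single-column data.frame holding the raw lines
raw_lines <- c("0001 text 0001",
               "0002 has spaces in between",
               "0003 yet supposed 2 columns")
mydf <- data.frame(V1 = raw_lines, stringsAsFactors = FALSE)

# Call the formatter directly on the column; as.character() guards against
# the column having been read in as a factor
if (requireNamespace("iotools", quietly = TRUE)) {
  res <- iotools::dstrsplit(as.character(mydf$V1), nsep = " ",
                            col_types = "character")
  print(res)
}
```

The result should have the same shape as the input.file output shown earlier: a rowindex column holding the leading entry and a second column holding the remainder of each line.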

Update 2

In case anyone is interested, I benchmarked the other proposed solutions, including those from jaap and akrun, against the "iotools" approach. You can find the results at this gist. In summary: whether you are dealing with a file on disk or a column of a file already in memory, "iotools" is the best performer. I did not test tomtom's solution because, as noted in that answer, it would require further processing.

