R - How to create a table from a data frame
I have the following data frame:

    col1 <- c(1960, 1960, 1965, 1986, 1960, 1969, 1960, 1993,
              1983, 1924, 1960, 1993, 1960, 1972, 1960, 1969)
    col2 <- c("a", "c", "a", "b", "a", "c", "b", "a",
              "b", "a", "b", "a", "c", "c", "a", "a")
    mydata <- data.frame(col1, col2)
I want to create a two-way table and calculate the proportion of each category (a, b, c) before 1970 and after 1970, respectively.
The desired output should be:

    year          a      b      c
    before 1970   0.545  0.181  0.272
    after 1970    0.4    0.4    0.2
Any suggestions are appreciated!
We can transform the dataset to create a column with "after 1970" and "before 1970" values. This can be done by first creating a logical vector (col1 <= 1970) and adding 1 to it, so that TRUE becomes 2 and FALSE becomes 1. We use this numeric index to pick the values "after 1970" and "before 1970". Then we get the frequency of the subset of columns ('col2', 'col3') with table. Finally, the proportion by row can be obtained with prop.table, with the margin specified as 1.
    prop.table(table(transform(mydata,
        col3 = c("after 1970", "before 1970")[(col1 <= 1970) + 1L])[3:2]), 1)
    #              col2
    # col3                  a         b         c
    #   after 1970  0.4000000 0.4000000 0.2000000
    #   before 1970 0.5454545 0.1818182 0.2727273
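To make the indexing trick easier to follow, the same one-liner can be unpacked into separate steps. This is only a sketch of the approach described above; the intermediate names (is_before, idx, period, counts, props) are made up for illustration and do not appear in the original answer.

```r
# Data from the question
col1 <- c(1960, 1960, 1965, 1986, 1960, 1969, 1960, 1993,
          1983, 1924, 1960, 1993, 1960, 1972, 1960, 1969)
col2 <- c("a", "c", "a", "b", "a", "c", "b", "a",
          "b", "a", "b", "a", "c", "c", "a", "a")
mydata <- data.frame(col1, col2)

# Step 1: logical vector, TRUE where the year is 1970 or earlier
is_before <- mydata$col1 <= 1970

# Step 2: adding 1L turns FALSE/TRUE into the integer index 1/2
idx <- is_before + 1L

# Step 3: use the index to pick a label for every row
period <- c("after 1970", "before 1970")[idx]

# Step 4: two-way frequency table (rows = period, columns = col2)
counts <- table(period, mydata$col2)

# Step 5: row-wise proportions (margin = 1 makes each row sum to 1)
props <- prop.table(counts, margin = 1)
round(props, 3)
```

Note that because the comparison is `<=`, a year of exactly 1970 would be labelled "before 1970"; the sample data contains no such row, so this does not affect the result here.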
Or an option with data.table:
    library(data.table)
    # convert the 'data.frame' to a 'data.table' (`setDT(mydata)`)
    # and create a "year" column based on the 'col1' values
    setDT(mydata)[col1 <= 1970, year := "before 1970"][
        is.na(year), year := "after 1970"]
    # we can use `dcast` to change from long to wide format,
    # then divide each row by its total
    dcast(mydata, year ~ col2, length)[, .SD / sum(unlist(.SD)), year]
    #           year         a         b         c
    # 1:  after 1970 0.4000000 0.4000000 0.2000000
    # 2: before 1970 0.5454545 0.1818182 0.2727273
Or with dplyr/tidyr:
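    library(dplyr)
    library(tidyr)
    mydata %>%
        mutate(year = ifelse(col1 <= 1970, "before 1970", "after 1970")) %>%
        group_by(year) %>%
        mutate(n1 = n()) %>%               # number of rows per period
        # dplyr >= 1.0.0 spells this argument `.add`
        group_by(col2, n1, add = TRUE) %>%
        tally() %>%                        # count per period and category
        ungroup() %>%
        mutate(n = n / n1) %>%             # count -> proportion
        select(-n1) %>%
        spread(col2, n)                    # long -> wide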
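With more recent tidyverse releases the same pipeline can be written a little shorter. The following is a sketch, not part of the original answer: it assumes tidyr 1.0.0 or later, where pivot_wider() is available as the replacement for the superseded spread(), and it swaps the n1/tally() steps for count(). The names res and prop are illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question
mydata <- data.frame(
  col1 = c(1960, 1960, 1965, 1986, 1960, 1969, 1960, 1993,
           1983, 1924, 1960, 1993, 1960, 1972, 1960, 1969),
  col2 = c("a", "c", "a", "b", "a", "c", "b", "a",
           "b", "a", "b", "a", "c", "c", "a", "a")
)

res <- mydata %>%
  mutate(year = ifelse(col1 <= 1970, "before 1970", "after 1970")) %>%
  count(year, col2) %>%               # rows per (year, col2) pair, stored in n
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%       # share of each category within its period
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = col2, values_from = prop)  # long -> wide

res
```

If some category never occurred in one of the periods, pivot_wider() would leave an NA in that cell; its values_fill argument can be used to put 0 there instead.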