python - Pivot Pandas Dataframe and calculate 'columns' parameter -


say have following dataframe:

import pandas pd  df = pd.dataframe() df['id'] = [1, 1, 1, 2, 2] df['type'] = ['a', 'b', 'q', 'b', 'r'] df['status'] = [0, 0, 1, 0, 1] >>> df    id type  status 0   1          0 1   1    b       0 2   1    q       1 3   2    b       0 4   2    r       1 >>>  

i want group dataframe 'id' , reshape have "type" variable , "status" variable each item within group. see below:

   type1 type2 type3  status1  status2  status3 id                                              1          b     q        0        0        1 2      b     r   nan        0        1      nan 

the number of rows in output dataframe depend on max number of records in 1 group of ids.

i believe pivot function want use here. however, calls "columns" parameter believe should id of each item within each group. have clunky way of calculating this, appreciate advice on better way of doing this:

>>> g=df.groupby('id') >>> df['idingroup']=[item sublist in [range(1,len(i[1])+1) in g] item in sublist] >>> df    id type  status  idingroup 0   1          0          1 1   1    b       0          2 2   1    q       1          3 3   2    b       0          1 4   2    r       1          2 >>>  

then can loop through 'type' , 'status' variables , pivot each , merge them together:

>>> listofvalues=[] >>> valuecol in ['type','status']: ...     f=df.pivot(index='id',columns='idingroup',values=valuecol) ...     f.columns=[valuecol+str(col) col in f.columns] ...     f.columns.name=none ...     listofvalues.append(f) ...  >>> pd.concat(listofvalues,1)    type1 type2 type3  status1  status2  status3 id                                              1          b     q        0        0        1 2      b     r   nan        0        1      nan >>>  

am taking correct approach here? , if so, better way calculate "columns" parameter pivot function? (the id of each item within each group)

try this:

import pandas pd  df = pd.dataframe() df['id'] = [1, 1, 1, 2, 2] df['type'] = ['a', 'b', 'q', 'b', 'r'] df['status'] = [0, 0, 1, 0, 1]  g = df.groupby("id")[["type","status"]] df2 = g.apply(pd.dataframe.reset_index, drop=true).unstack() 

here result:

   type         status              0  1    2      0  1   2 id                            1      b    q      0  0   1 2     b  r  nan      0  1 nan 

the columns multiindex, if want flat it:

df2.columns = [l0 + str(l1+1) l0, l1 in df2.columns] 

the output:

   type1 type2 type3  status1  status2  status3 id                                              1          b     q        0        0        1 2      b     r   nan        0        1      nan 

Comments

Popular posts from this blog

c++ - llvm function pass ReplaceInstWithInst malloc -

Cross-Compiling Linux Kernel for Raspberry Pi - ${CCPREFIX}gcc -v does not work -

java.lang.NoClassDefFoundError When Creating New Android Project -