python - Pivot Pandas Dataframe and calculate 'columns' parameter -

- May 15, 2014

say have following dataframe:

import pandas pd  df = pd.dataframe() df['id'] = [1, 1, 1, 2, 2] df['type'] = ['a', 'b', 'q', 'b', 'r'] df['status'] = [0, 0, 1, 0, 1] >>> df    id type  status 0   1          0 1   1    b       0 2   1    q       1 3   2    b       0 4   2    r       1 >>>

i want group dataframe 'id' , reshape have "type" variable , "status" variable each item within group. see below:

   type1 type2 type3  status1  status2  status3 id                                              1          b     q        0        0        1 2      b     r   nan        0        1      nan

the number of rows in output dataframe depend on max number of records in 1 group of ids.

i believe pivot function want use here. however, calls "columns" parameter believe should id of each item within each group. have clunky way of calculating this, appreciate advice on better way of doing this:

>>> g=df.groupby('id') >>> df['idingroup']=[item sublist in [range(1,len(i[1])+1) in g] item in sublist] >>> df    id type  status  idingroup 0   1          0          1 1   1    b       0          2 2   1    q       1          3 3   2    b       0          1 4   2    r       1          2 >>>

then can loop through 'type' , 'status' variables , pivot each , merge them together:

>>> listofvalues=[] >>> valuecol in ['type','status']: ...     f=df.pivot(index='id',columns='idingroup',values=valuecol) ...     f.columns=[valuecol+str(col) col in f.columns] ...     f.columns.name=none ...     listofvalues.append(f) ...  >>> pd.concat(listofvalues,1)    type1 type2 type3  status1  status2  status3 id                                              1          b     q        0        0        1 2      b     r   nan        0        1      nan >>>

am taking correct approach here? , if so, better way calculate "columns" parameter pivot function? (the id of each item within each group)

try this:

import pandas pd  df = pd.dataframe() df['id'] = [1, 1, 1, 2, 2] df['type'] = ['a', 'b', 'q', 'b', 'r'] df['status'] = [0, 0, 1, 0, 1]  g = df.groupby("id")[["type","status"]] df2 = g.apply(pd.dataframe.reset_index, drop=true).unstack()

here result:

   type         status              0  1    2      0  1   2 id                            1      b    q      0  0   1 2     b  r  nan      0  1 nan

the columns multiindex, if want flat it:

df2.columns = [l0 + str(l1+1) l0, l1 in df2.columns]

the output:

   type1 type2 type3  status1  status2  status3 id                                              1          b     q        0        0        1 2      b     r   nan        0        1      nan

Search This Blog

Erty

python - Pivot Pandas Dataframe and calculate 'columns' parameter -

Comments

Post a Comment

Popular posts from this blog

Cross-Compiling Linux Kernel for Raspberry Pi - ${CCPREFIX}gcc -v does not work -

c++ - llvm function pass ReplaceInstWithInst malloc -

python - malformed header from script index.py Bad header -