garbage collection - How to pass output of one function to another in Spark
I am sending the output of one function, which returns a DataFrame, to another function:

val df1 = fun1
val df11 = df1.collect
val df2 = df11.map(x => fun2(x, df3))

The above lines are written in the main function. Since df1 is large, calling collect on the driver causes out-of-memory or GC issues. Is there another way to send the output of one function to another in Spark?
Spark can run the data processing for you; you don't need the intermediate collect step. Instead, chain the transformations together and add an action at the end to save the resulting data out to disk. Calling collect() is only useful for debugging small results.
For example, like this:

rdd.map(x => fun1(x))
   .map(y => fun2(y))
   .saveAsObjectFile("output/path")
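Since the question uses DataFrames rather than RDDs, the same idea can be sketched with the DataFrame API. This is a minimal sketch, not the asker's actual code: fun1, fun2, the join key, and all file paths are hypothetical stand-ins.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ChainExample {
  // Hypothetical fun1: produces a (possibly large) DataFrame.
  def fun1(spark: SparkSession): DataFrame =
    spark.read.parquet("input/path")

  // Hypothetical fun2: takes a DataFrame and returns a DataFrame,
  // so it composes with fun1 without collecting to the driver.
  def fun2(df: DataFrame, other: DataFrame): DataFrame =
    df.join(other, Seq("id"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("chain").getOrCreate()
    val df3 = spark.read.parquet("lookup/path") // hypothetical lookup data

    // No collect(): fun2 receives the DataFrame itself, so the whole
    // plan runs on the executors; only the final write is an action.
    val result = fun2(fun1(spark), df3)
    result.write.parquet("output/path")

    spark.stop()
  }
}
```

The key change from the question's code is that fun2 accepts a DataFrame instead of a collected Row, so the data never has to fit in driver memory.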
This article might help explain this in more detail:

http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/