garbage collection - How to pass output of one function to another in Spark -


I am passing the output of one function, which returns a DataFrame, to another function.

val df1 = fun1
val df11 = df1.collect
val df2 = df11.map(x => fun2(x, df3))

The lines above are written in the main function. If df1 is large, collecting it on the driver causes an out-of-memory or GC error. Are there other ways to pass the output of one function to another in Spark?

Spark can run the data processing for you. You don't need the intermediate collect step: build a chain of transformations and add an action at the end that saves the resulting data out to disk. Collecting pulls the entire dataset back to the driver, which is exactly what causes the memory pressure you are seeing.

Calling collect() is only useful for debugging small results.

For example, like this:

rdd.map(x => fun1(x))
   .map(y => fun2(y))
   .saveAsObjectFile("/path/to/output")   // saveAsObjectFile requires an output path

This article might be helpful to explain this in more detail:

http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/
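The same idea applies to the DataFrame code in the question. A minimal sketch of how it could look without collect(), assuming fun1 produces a DataFrame and fun2 can be expressed per element (fun1, fun2, and the output path here are placeholders, not real APIs from the question):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("chain-example").getOrCreate()
import spark.implicits._   // provides the encoders that map() needs

// Placeholder for the real fun1: any function returning a DataFrame.
def fun1(spark: SparkSession): DataFrame =
  spark.range(1000).toDF("id")

// Placeholder for the real fun2: a plain per-element function.
def fun2(x: Long): Long = x * 2

val df1 = fun1(spark)

// map() is a transformation: it runs on the executors, so df1 is
// never pulled back to the driver the way collect() pulls it.
val df2 = df1.map(row => fun2(row.getLong(0)))

// The single action at the end writes the result straight to disk.
df2.write.parquet("/tmp/output")
```

Note that nothing is computed until the final write: Spark plans the whole chain lazily and executes it in a distributed fashion, so the driver only coordinates and never holds the full dataset.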

