hadoop - How to set Spark RDD StorageLevel in Hive on Spark?


In a Hive on Spark job, I get this error:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

Thanks to the answer to "Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?", I know my Hive on Spark job has the same problem.

Since Hive translates the SQL into a Hive on Spark job, I don't know how to configure Hive so that the Hive on Spark job uses StorageLevel.MEMORY_AND_DISK instead of StorageLevel.MEMORY_ONLY.

Thanks for your help~~~~

You can use CACHE [LAZY] TABLE <table_name> and UNCACHE TABLE <table_name> to manage caching. See the Spark SQL documentation for more details.
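For example, a minimal sketch run from a SparkSession with Hive support (the table name my_table is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cache-table-example")
  .enableHiveSupport()
  .getOrCreate()

// Mark the table for caching; LAZY defers materialization until first use.
spark.sql("CACHE LAZY TABLE my_table")

// Queries against my_table now read from the in-memory columnar cache.
spark.sql("SELECT count(*) FROM my_table").show()

// Drop the cached data when it is no longer needed.
spark.sql("UNCACHE TABLE my_table")
```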

If you are using DataFrames, you can use persist(...) to specify the StorageLevel. See the API docs here.
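For instance, a minimal sketch (the input path /data/events and the column event_type are placeholders) that persists a DataFrame with MEMORY_AND_DISK so partitions that do not fit in memory spill to disk instead of being recomputed:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("persist-example").getOrCreate()

// Placeholder input path for illustration.
val df = spark.read.parquet("/data/events")

// Spill cached partitions to disk instead of recomputing them
// when they do not fit in memory.
val cached = df.persist(StorageLevel.MEMORY_AND_DISK)

cached.groupBy("event_type").count().show()

// Release the cache when done.
cached.unpersist()
```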

In addition to setting the storage level, you can optimize other things as well. Spark SQL uses a different caching mechanism called columnar storage, which is a more efficient way of caching data (as Spark SQL is schema aware). There is a separate set of config properties for tuning that cache, described in detail here (this links to the latest version of the documentation; refer to the documentation for the version you are using); a sketch of setting them follows the list below.

  • spark.sql.inMemoryColumnarStorage.compressed
  • spark.sql.inMemoryColumnarStorage.batchSize
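As a sketch (the values shown are only illustrative, not recommendations), these properties can be set on the session before caching a table:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("columnar-cache-config").getOrCreate()

// Compress the in-memory columnar cache.
spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")

// Rows per column batch: larger batches improve compression,
// but increase memory pressure while building the cache.
spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")

// "my_table" is a placeholder table name.
spark.sql("CACHE TABLE my_table")
```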
