hadoop - How to set Spark RDD StorageLevel in Hive on Spark?


In a Hive on Spark job, I get this error:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

Thanks to this answer (Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?), I know my Hive on Spark job may have the same problem.

Since Hive translates the SQL into a Hive on Spark job, I don't know how to set things in Hive so that the Hive on Spark job changes from StorageLevel.MEMORY_ONLY to StorageLevel.MEMORY_AND_DISK.

Thanks for your help~~~~

You can use CACHE/UNCACHE [LAZY] TABLE <table_name> to manage caching. More details are in the Spark SQL documentation.
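A minimal sketch (Scala, e.g. in spark-shell): the table name my_table is a placeholder, and an existing SparkSession named spark with Hive support is assumed.

    // LAZY defers materialization until the table is first scanned
    spark.sql("CACHE LAZY TABLE my_table")

    // ... run queries against my_table here ...

    // Drop the cached data once it is no longer needed
    spark.sql("UNCACHE TABLE my_table")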

If you are using DataFrames, you can use persist(...) to specify the StorageLevel. See the API documentation.
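A minimal sketch of persisting a DataFrame read from a Hive table with an explicit StorageLevel; my_table is a placeholder and an existing SparkSession named spark is assumed.

    import org.apache.spark.storage.StorageLevel

    val df = spark.table("my_table")

    // MEMORY_AND_DISK spills partitions that do not fit in memory to disk
    // instead of recomputing them (unlike MEMORY_ONLY)
    df.persist(StorageLevel.MEMORY_AND_DISK)

    df.count()        // an action that materializes the cache
    df.unpersist()    // release the storage when done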

In addition to setting the storage level, you can optimize other things as well. SparkSQL uses a different caching mechanism called columnar storage, which is a more efficient way of caching data (as SparkSQL is schema aware). There is a different set of config properties that can be tuned to manage caching, described in detail in the configuration documentation (that is the latest version of the documentation; refer to the documentation of the version you are using). A small example of setting them follows the list below.

  • spark.sql.inMemoryColumnarStorage.compressed
  • spark.sql.inMemoryColumnarStorage.batchSize
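A minimal sketch of setting those properties at runtime; the values are illustrative rather than recommendations, and an existing SparkSession named spark is assumed.

    // Compress cached columns (trades some CPU for memory)
    spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")

    // Rows per column batch: larger batches improve compression but need
    // more memory while each batch is built
    spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")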
