python - How to keep track of status with multiprocessing and pool.map? -


i'm setting multiprocessing module first time, , basically, planning along lines of

from multiprocessing import pool pool = pool(processes=102) results = pool.map(whateverfunction, myiterable) print 1 

as understand it, 1 printed all processes have come , results complete. have status update on these. best way of implementing that?

i'm kind of hesitant of making whateverfunction() print. if there's around 200 values, i'm going have 'process done' printed 200 times, not useful.

i expect output like

10% of myiterable done 20% of myiterable done 

pool.map blocks until concurrent function calls have completed. pool.apply_async not block. moreover, use callback parameter report on progress. callback function, log_result, called once each time foo completes. passed value returned foo.

from __future__ import division import multiprocessing mp import time  def foo(x):     time.sleep(0.1)     return x  def log_result(retval):     results.append(retval)     if len(results) % (len(data)//10) == 0:         print('{:.0%} done'.format(len(results)/len(data)))  if __name__ == '__main__':     pool = mp.pool()     results = []     data = range(200)     item in data:         pool.apply_async(foo, args=[item], callback=log_result)     pool.close()     pool.join()     print(results) 

yields

10% done 20% done 30% done 40% done 50% done 60% done 70% done 80% done 90% done 100% done [0, 1, 2, 3, ..., 197, 198, 199] 

the log_result function above modifies global variable results , accesses global variable data. can not pass these variables log_result because callback function specified in pool.apply_async called 1 argument, return value of foo.

you can, however, make closure, @ least makes clear variables log_result depends on:

from __future__ import division import multiprocessing mp import time  def foo(x):     time.sleep(0.1)     return x  def make_log_result(results, len_data):     def log_result(retval):         results.append(retval)         if len(results) % (len_data//10) == 0:             print('{:.0%} done'.format(len(results)/len_data))     return log_result  if __name__ == '__main__':     pool = mp.pool()     results = []     data = range(200)     item in data:         pool.apply_async(foo, args=[item], callback=make_log_result(results, len(data)))     pool.close()     pool.join()     print(results) 

Comments

Popular posts from this blog

c++ - llvm function pass ReplaceInstWithInst malloc -

java.lang.NoClassDefFoundError When Creating New Android Project -

Decoding a Python 2 `tempfile` with python-future -