python - How to keep track of status with multiprocessing and pool.map?
I'm using the multiprocessing module for the first time, and basically I'm planning to do something along the lines of:

    from multiprocessing import Pool

    pool = Pool(processes=102)
    results = pool.map(whateverfunction, myiterable)
    print 1
As I understand it, 1 gets printed once all the processes have come back and the results are complete. I would like to have some status update on these. What is the best way of implementing that?
I'm kind of hesitant to make whateverfunction() print. If there are around 200 values, I'm going to have 'Process done' printed 200 times, which is not very useful.
I would expect output like:

    10% of myiterable done
    20% of myiterable done
pool.map blocks until all the concurrent function calls have completed. pool.apply_async does not block. Moreover, you can use its callback parameter to report on progress. The callback function, log_result, gets called once each time foo completes. It is passed the value returned by foo.
    from __future__ import division
    import multiprocessing as mp
    import time

    def foo(x):
        time.sleep(0.1)
        return x

    def log_result(retval):
        # called in the main process, once per completed task
        results.append(retval)
        if len(results) % (len(data)//10) == 0:
            print('{:.0%} done'.format(len(results)/len(data)))

    if __name__ == '__main__':
        pool = mp.Pool()
        results = []
        data = range(200)
        for item in data:
            pool.apply_async(foo, args=[item], callback=log_result)
        pool.close()
        pool.join()
        print(results)
yields
    10% done
    20% done
    30% done
    40% done
    50% done
    60% done
    70% done
    80% done
    90% done
    100% done
    [0, 1, 2, 3, ..., 197, 198, 199]
The log_result function above modifies the global variable results, and accesses the global variable data. You can not pass these variables to log_result directly, because the callback function specified in pool.apply_async is called with just one argument, the return value of foo.
You can, however, make a closure, which at least makes it clear what variables log_result depends on:
    from __future__ import division
    import multiprocessing as mp
    import time

    def foo(x):
        time.sleep(0.1)
        return x

    def make_log_result(results, len_data):
        # returns a callback that closes over results and len_data
        def log_result(retval):
            results.append(retval)
            if len(results) % (len_data//10) == 0:
                print('{:.0%} done'.format(len(results)/len_data))
        return log_result

    if __name__ == '__main__':
        pool = mp.Pool()
        results = []
        data = range(200)
        for item in data:
            pool.apply_async(foo, args=[item],
                             callback=make_log_result(results, len(data)))
        pool.close()
        pool.join()
        print(results)
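Another way to make the dependencies explicit, if you prefer not to nest functions, is functools.partial: partial(log_result, results, len(data)) returns a callable which, when the pool invokes it with foo's return value, calls log_result(results, len(data), retval). A rough sketch of the same idea:

    from __future__ import division
    import functools
    import multiprocessing as mp
    import time

    def foo(x):
        time.sleep(0.1)
        return x

    def log_result(results, len_data, retval):
        # results and len_data are bound by partial; retval comes from the pool
        results.append(retval)
        if len(results) % (len_data//10) == 0:
            print('{:.0%} done'.format(len(results)/len_data))

    if __name__ == '__main__':
        pool = mp.Pool()
        results = []
        data = range(200)
        # partial prepends results and len(data); the pool appends foo's return value
        callback = functools.partial(log_result, results, len(data))
        for item in data:
            pool.apply_async(foo, args=[item], callback=callback)
        pool.close()
        pool.join()
        print(results)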
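If you would rather stay closer to your original pool.map call, pool.imap_unordered returns an iterator that yields each result as soon as it is ready, so the progress report can live in the loop that consumes it. A rough sketch (results arrive in completion order, not input order):

    from __future__ import division
    import multiprocessing as mp
    import time

    def foo(x):
        time.sleep(0.1)
        return x

    if __name__ == '__main__':
        pool = mp.Pool()
        data = range(200)
        results = []
        # imap_unordered hands back each result as soon as it completes
        for retval in pool.imap_unordered(foo, data):
            results.append(retval)
            if len(results) % (len(data)//10) == 0:
                print('{:.0%} done'.format(len(results)/len(data)))
        pool.close()
        pool.join()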