DAPL errors on Azure RDMA-enabled SLES cluster
I set up 2 Azure A8 VMs in an availability set running SLES-HPC 12 (following the tutorial here: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-cluster-rdma/).
When I run the Intel MPI pingpong test, I get DAPL errors:
azureuser@sshvm0:~> /opt/intel/impi/5.0.3.048/bin64/mpirun -hosts 10.0.0.4,10.0.0.5 -ppn 1 -n 2 -env I_MPI_FABRICS=shm:dapl -env I_MPI_DYNAMIC_CONNECTION=0 -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 /opt/intel/impi/5.0.3.048/bin64/IMB-MPI1 pingpong
sshvm1:d28:bef0eb40: 12930 us(12930 us): dapl_rdma_accept: err -1 input/output error
sshvm1:d28:bef0eb40: 12946 us(16 us): dapl err accept input/output error
[1:10.0.0.5][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:622] error(0x40000): ofa-v2-ib0: not accept dapl connection request: dat_internal_error()
assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c @ line 622: 0
internal abort - process 0
I get similar errors when running one of the OSU MPI microbenchmarks (compiled with the Intel MPI compiler).
What is the cause of these errors? How can I fix them and run these microbenchmarks? Please help!
I did verify SSH connectivity between the 2 nodes by running "mpiexec -machinefile machinefile -n 2 hostname".
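For readers reproducing that connectivity check, here is a minimal sketch. It assumes the machinefile simply lists the two private IPs from the mpirun command above, one per line; the actual file contents are not shown in the post. The mpiexec call is skipped when Intel MPI is not on the PATH.

```shell
# Assumed machinefile contents: one host per line, taken from the
# -hosts argument of the mpirun command in the question.
printf '%s\n' 10.0.0.4 10.0.0.5 > machinefile

# Launch "hostname" once per process; with two nodes reachable over SSH,
# each node should report its own name. Skipped if mpiexec is unavailable.
if command -v mpiexec >/dev/null 2>&1; then
    mpiexec -machinefile machinefile -n 2 hostname
fi
```

A passing check here only confirms that the launcher can reach both nodes over SSH; it does not exercise the DAPL/RDMA path, so it can succeed while the pingpong test still fails.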
You need to update the RDMA drivers. We have updated the documentation; follow the link below: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-cluster-rdma/
Please go to the section on updating the Linux RDMA drivers for SLES 12.
Please follow the instructions there and update the RDMA drivers. Please update the drivers if you have provisioned the VMs in one of the following regions: East, North Central, South Central, North Europe.