DAOS Command Fails with "Transport layer mercury error" on CentOS 7.9
-
Hello, I'm encountering an issue when running the `daos cont create` command on my DAOS setup. The command fails with a "Transport layer mercury error." Below are the details of the error and my setup.

Command and error message:

```
[root@client2 ~]# daos cont create tank --label mycont
external ERR # [5323.920594] mercury->msg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/na/na_ofi.c:3047
 # na_ofi_msg_send(): fi_tsend() failed, rc: -2 (No such file or directory)
external ERR # [5323.921055] mercury->hg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/mercury_core.c:2727
 # hg_core_forward_na(): Could not post send for input buffer (NA_NOENTRY)
hg ERR src/cart/crt_hg.c:1104 crt_hg_req_send_cb(0x2a5c8c0) [opc=0x1020004 (DAOS) rpcid=0x18ea69b600000000 rank:tag=0:0] RPC failed; rc: DER_HG(-1020): 'Transport layer mercury error'
mgmt ERR src/mgmt/cli_mgmt.c:882 dc_mgmt_pool_find() tank: failed to get PS replicas from 1 servers, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:198 dc_pool_choose_svc_rank() 00000000:tank: dc_mgmt_pool_find() failed, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:503 dc_pool_connect_internal() 00000000:tank: cannot find pool service: DER_HG(-1020): 'Transport layer mercury error'
ERROR: daos: DER_HG(-1020): Transport layer mercury error
```

Environment details:

- DAOS version: daos-2.0.3-5.el7.x86_64
- DAOS client version: daos-client-2.0.3-5.el7.x86_64
- Libfabric version: libfabric-1.15.1-1.el7.x86_64
- Mercury version: mercury-2.1.0~rc4-9.el7.x86_64
- CentOS version: CentOS 7.9
- Fabric interface: enp0s3

Additional information:

```
[root@server ~]# ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bd:95:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 564sec preferred_lft 564sec
    inet6 fe80::e25:a2fd:9904:a8ac/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp0s8: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bb:cb:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.104/24 brd 192.168.56.255 scope global no
```
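One sanity check on the setup above: the configured fabric interface is enp0s3 (10.0.2.15/24), while the server also has enp0s8 on 192.168.56.104/24. Presumably the client has to reach the server over the network that the fabric interface sits on. The Python sketch below parses an excerpt of the `ip addr` output from the post and reports which local interface shares a subnet with a given peer address. It is only a sketch: the helper names `ipv4_interfaces` and `interface_for` are hypothetical, the post does not give client2's address (so 192.168.56.105 is a made-up example), and the parser only handles the `ip addr` layout shown here.

```python
import ipaddress
import re

# Excerpt of the server's `ip addr` output from the post
# (flags, inet6 entries, and the truncated tail omitted).
IP_ADDR_OUTPUT = """
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
2: enp0s3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
3: enp0s8: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.56.104/24 brd 192.168.56.255 scope global
"""


def ipv4_interfaces(text):
    """Hypothetical helper: map each interface name to its IPv4 addresses."""
    result = {}
    current = None
    for line in text.splitlines():
        # Interface header lines look like "2: enp0s3: mtu 1500 ...".
        header = re.match(r"\s*\d+:\s+(\S+?):", line)
        if header:
            current = header.group(1)
            result.setdefault(current, [])
            continue
        # IPv4 lines look like "inet 10.0.2.15/24 ..." ("inet6" is skipped).
        inet = re.search(r"\binet\s+(\d+\.\d+\.\d+\.\d+/\d+)", line)
        if inet and current is not None:
            result[current].append(ipaddress.ip_interface(inet.group(1)))
    return result


def interface_for(peer, interfaces):
    """Hypothetical helper: return the interface whose subnet contains peer."""
    addr = ipaddress.ip_address(peer)
    for name, addrs in interfaces.items():
        if any(addr in a.network for a in addrs):
            return name
    return None


if __name__ == "__main__":
    ifaces = ipv4_interfaces(IP_ADDR_OUTPUT)
    for name, addrs in ifaces.items():
        print(name, [str(a) for a in addrs])
    # 192.168.56.105 is a made-up client address, for illustration only.
    print(interface_for("192.168.56.105", ifaces))
```

If the interface this reports for the real client address is not the configured fabric interface, that mismatch may be worth ruling out before looking at the OS version.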
I don't know anything about DAOS, so I can't comment on that part. Did you have this working in the past and it stopped, or are you trying to get it running for the first time? Either way, CentOS 7 was initially released in 2014 and reaches end of life (EOL) at the end of the month. What you *might* be seeing is that C7 doesn't support the infrastructure DAOS needs. This would probably be better answered in a forum dedicated to DAOS. In any case, given how little support time C7 has left, I'd recommend moving to something more recent, like CentOS Stream 9 or one of the other RHEL 9-based distros. I hear good things about Rocky Linux. Or move to Ubuntu or Debian.