I could not find the record for when GitLab CI tried to ssh to the machine, but here are the related records.

When shibd started:
2024-03-25 02:21:51 INFO Shibboleth.Listener : listener service starting
When GitLab CI accessed the Shibboleth metadata, which is the job before the ssh step:
138.26.49.1 - - [25/Mar/2024:02:21:56 -0500] "GET /Shibboleth.sso/Metadata HTTP/1.1" 200 7649 "-" "curl/7.88.1"
When the host key was downloaded on the machine:
Mar 25 02:21:58 ood-knightly cloud-init: download: 's3://knightly-key/ssh_host_rsa_key.pub' -> '/etc/ssh/ssh_host_rsa_key.pub' (382 bytes in 0.9 seconds, 416.90 B/s)
This happened 2 seconds after the Shibboleth metadata was accessed.
So I think it is a race condition. Maybe we should put a sleep before the ssh command.
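Instead of a blind sleep, the job could wait until the host actually presents the key we expect, i.e. until cloud-init has finished installing the permanent host keys. A rough sketch, not the actual CI job; the host address, known_hosts path, and function names here are placeholders:

```shell
#!/bin/sh
# Hedged sketch of a pre-ssh wait; all names below are hypothetical.

# Print the ED25519 key recorded for host $1 in known_hosts file $2.
expected_key() {
    awk -v h="$1" '$1 == h && $2 == "ssh-ed25519" { print $3 }' "$2"
}

# Poll (up to ~5 min) until host $1 presents the key recorded in file $2,
# i.e. until cloud-init has swapped in the permanent host keys.
wait_for_host_key() {
    want=$(expected_key "$1" "$2")
    for i in $(seq 1 60); do
        got=$(ssh-keyscan -t ed25519 "$1" 2>/dev/null | awk '{ print $3 }')
        [ -n "$got" ] && [ "$got" = "$want" ] && return 0
        sleep 5
    done
    return 1
}

# Usage in the CI job, before the ssh step (placeholder host):
#   wait_for_host_key 138.26.49.134 ~/.ssh/known_hosts \
#     && ssh root@138.26.49.134 hostname
```

This avoids guessing a sleep duration: the job proceeds as soon as the key matches and fails loudly if it never does.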
I tried to ssh to the machine a few hours after the test and didn't get a host key warning, which supports the race condition theory.
Checking the log on the OpenStack WebUI, the host keys were indeed downloaded:
[ 29.347496] cloud-init[1634]: download: 's3://knightly-key/*' -> '/etc/ssh/*' (1332 bytes in 0.0 seconds, 31.16 KB/s)
[ 29.405514] cloud-init[1634]: download: 's3://knightly-key/ssh_host_ecdsa_key' -> '/etc/ssh/ssh_host_ecdsa_key' (227 bytes in 0.1 seconds, 3.91 KB/s)
[ 30.326510] cloud-init[1634]: download: 's3://knightly-key/ssh_host_ecdsa_key.pub' -> '/etc/ssh/ssh_host_ecdsa_key.pub' (162 bytes in 0.9 seconds, 176.18 B/s)
[ 30.397475] cloud-init[1634]: download: 's3://knightly-key/ssh_host_ed25519_key' -> '/etc/ssh/ssh_host_ed25519_key' (387 bytes in 0.1 seconds, 5.48 KB/s)
[ 30.442422] cloud-init[1634]: download: 's3://knightly-key/ssh_host_ed25519_key.pub' -> '/etc/ssh/ssh_host_ed25519_key.pub' (82 bytes in 0.0 seconds, 1909.11 B/s)
[ 30.477033] cloud-init[1634]: download: 's3://knightly-key/ssh_host_rsa_key' -> '/etc/ssh/ssh_host_rsa_key' (1679 bytes in 0.0 seconds, 50.20 KB/s)
[ 31.395470] cloud-init[1634]: download: 's3://knightly-key/ssh_host_rsa_key.pub' -> '/etc/ssh/ssh_host_rsa_key.pub' (382 bytes in 0.9 seconds, 416.90 B/s)
Last night we experienced this issue again.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:yNnJMJ4WTHY7LDVprTnTn+BEKdLPlXoo3okpkjinh3o.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ED25519 key in /root/.ssh/known_hosts:6
remove with:
ssh-keygen -f "/root/.ssh/known_hosts" -R "138.26.49.134"
Host key for 138.26.49.134 has changed and you have requested strict checking.
Host key verification failed.
We don't really need to access NFS during boot; we only want to mount it after the network is available.
One solution is to move the NFS mount definitions out of fstab and mount them with the mount
command after the instance is completely up and running. Putting the command in user_data
could potentially solve the issue.
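As a sketch of that first option (the server name, export path, and mount point here are made-up placeholders, not our actual config), the user_data could carry a cloud-config fragment like:

```
#cloud-config
# Hypothetical sketch: mount NFS via runcmd after boot instead of fstab,
# so a slow network can no longer block the boot process.
runcmd:
  - mkdir -p /data
  - mount -t nfs -o rw,soft nfs.example.org:/export/data /data
```

cloud-init runs runcmd late in boot, after networking is up, which is exactly the ordering we want.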
Another solution is using an automounter like autofs. We have the autofs definitions ready at build time. Because we do not access the folders during boot, it won't stall the boot process even if the network is not available at the beginning.
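For the autofs option, the maps look roughly like the following; the mount point, map file name, server, and export are placeholders, not the definitions we ship at build time:

```
# /etc/auto.master -- hand /data over to autofs, backed by a map file
/data  /etc/auto.data  --timeout=300

# /etc/auto.data -- mount the export on first access, not at boot
projects  -fstype=nfs,rw,soft  nfs.example.org:/export/projects
```

With this in place, /data/projects is only mounted when something touches it, and it is unmounted again after the idle timeout.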
In !80 (merged), we went with the latter option.
Bo-Chun Chen (f71716da) at 29 Feb 23:43
Merge branch 'feat-autofs' into 'main'
... and 9 more commits
Closes #113
Bo-Chun Chen (0baa78b5) at 14 Feb 09:11
Merge branch 'fix-clean-own-image' into 'main'
... and 1 more commit
Closes #112
https://gitlab.rc.uab.edu/rc/packer-openstack-hpc-image/-/jobs/41597
The assumption was that the image name format is only used by the pipeline, which proved false.
https://gitlab.rc.uab.edu/rc/packer-openstack-hpc-image/-/jobs/41126
Looks like the pipeline failed at the pipeline cleanup step.
https://uab-rc.slack.com/archives/C9YB78YSX/p1705590875167869
The Ops team pointed out that there are a bunch of image artifacts, mostly from the knightly pipeline. The OOD pipeline should have cleaned up the old images, but it doesn't seem to work. Further investigation shows that we did not update the grep pattern when we changed the timestamp format to RFC 3339. (!65 (22e8f3ff))
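As an illustration of how the cleanup silently stopped matching (the image names and patterns below are assumptions, not the pipeline's exact values):

```shell
# Hypothetical image names; the real knightly naming scheme may differ.
old_name="knightly-20240213142500"
new_name="knightly-2024-02-13T14:25:00Z"

# Pattern written for the old compact-timestamp format:
old_pat='^knightly-[0-9]{14}$'
# Pattern updated for RFC 3339 timestamps:
new_pat='^knightly-[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'

# The old pattern never matches RFC 3339 names, so grep selects nothing
# and the cleanup job deletes nothing:
echo "$new_name" | grep -qE "$old_pat" || echo "old pattern misses RFC 3339 names"
echo "$new_name" | grep -qE "$new_pat" && echo "updated pattern matches"
```

Because grep simply returns no matches rather than erroring, the job still exits successfully, which is why the leak went unnoticed until the artifacts piled up.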
Bo-Chun Chen (848d8839) at 13 Feb 14:25
Merge branch 'feat-cleanup-job' into 'main'
... and 7 more commits