(bug) Prevent failing VMs from being retried infinitely (ondemand)

Normally, when a VM fails the vm_ready? check, it is moved to the completed queue, which deletes it. In a pooled config, a replacement VM is then created. For ondemand, we also recreate the task to trigger the creation of a new VM. There was a bug where an ondemand request would be retried infinitely if vm_ready? always failed: we never checked whether the request had been deleted via the API or marked as failed because it had expired (exceeded the ondemand_request_ttl limit).
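
The retry condition introduced by this change can be sketched in isolation as a small predicate. This is a minimal illustration, not vmpooler code: the helper name retry_ondemand_vm? and the TERMINAL_STATUSES constant are hypothetical, and the hash argument stands in for the result of reading vmpooler__odrequest__<request_id> from Redis.

```ruby
# Hypothetical helper (not part of vmpooler): decides whether a VM that
# failed vm_ready? should be re-queued for an ondemand request.
# Statuses after which a request must not be retried.
TERMINAL_STATUSES = %w[failed deleted].freeze

# request_hash stands in for the hash read from
# "vmpooler__odrequest__#{request_id}"; it may be nil.
def retry_ondemand_vm?(request_hash)
  return false if request_hash.nil?

  # Mirrors the condition in the diff: retry unless the request has been
  # marked failed (for example, expired) or deleted via the API.
  !TERMINAL_STATUSES.include?(request_hash['status'])
end

retry_ondemand_vm?({ 'status' => 'pending' }) # => true
retry_ondemand_vm?({ 'status' => 'failed' })  # => false
retry_ondemand_vm?({ 'status' => 'deleted' }) # => false
```

As in the diff, an empty hash (no status recorded) still counts as retryable, because only the two terminal statuses block the retry.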
2026-01-26 01:58:41 -05:00 · 2022-07-25 08:29:11 -05:00 · 2022-07-25 08:29:11 -05:00 · 980344ee24
commit 980344ee24
parent 35102d57cd
1 changed file with 7 additions and 1 deletion
--- a/lib/vmpooler/pool_manager.rb
+++ b/lib/vmpooler/pool_manager.rb
@@ -119,7 +119,13 @@ module Vmpooler
          pool_alias = redis.hget("vmpooler__vm__#{vm}", 'pool_alias') if request_id
          redis.multi
          redis.smove("vmpooler__pending__#{pool}", "vmpooler__completed__#{pool}", vm)
-          redis.zadd('vmpooler__odcreate__task', 1, "#{pool_alias}:#{pool}:1:#{request_id}") if request_id
+          if request_id
+            ondemandrequest_hash = redis.hgetall("vmpooler__odrequest__#{request_id}")
+            if ondemandrequest_hash && ondemandrequest_hash['status'] != 'failed' && ondemandrequest_hash['status'] != 'deleted'
+              # will retry a VM that did not come up as vm_ready? only if it has not been marked failed or deleted
+              redis.zadd('vmpooler__odcreate__task', 1, "#{pool_alias}:#{pool}:1:#{request_id}")
+            end
+          end
          redis.exec
          $metrics.increment("errors.markedasfailed.#{pool}")
          $logger.log('d', "[!] [#{pool}] '#{vm}' marked as 'failed' after #{timeout} minutes")