It appears we renamed `/ondemand/` to `/ondemandvm/` at some point and,
as a result, have not been stripping hostnames from that endpoint's
metrics. This has caused issues with metrics collection due to very high
cardinality.
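As a rough illustration of the kind of stripping described above, the sketch below collapses per-request path segments into fixed placeholders so that each endpoint produces a bounded set of metric labels. The helper name `normalize_metric_path` and the placeholder strings are assumptions for illustration, not the names used in vmpooler.

```ruby
# Hypothetical helper: replace the variable tail of a request path with a
# fixed placeholder so metric label cardinality stays bounded.
def normalize_metric_path(path)
  path
    .sub(%r{/ondemandvm/.+\z}, '/ondemandvm/:requestid') # assumed placeholder
    .sub(%r{/vm/.+\z}, '/vm/:hostname')                  # assumed placeholder
end

puts normalize_metric_path('/api/v1/ondemandvm/1234-abcd')
puts normalize_metric_path('/api/v1/vm/happy-host')
```

Paths without a variable tail, such as a status endpoint, pass through unchanged.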
This change adds detection of instances that are in a running
queue, but have no data in an active queue for the same pool. When this
happens, a machine will live forever, inflating the running count, and
cannot be killed. Without this change, running instances that are not
marked as active are never cleaned up.
This change fixes template alias evaluation to ensure that the correct
data is set when generating on demand requests for pools that have a
backend weight configured with a value of 0. Without this change, the
vmpooler API returns an empty selection during template alias evaluation.
To support this change tests are added that first reproduced the
failure, and then verified that it is resolved with the addition of the
patch. Additionally, test coverage is added to ensure that code paths
that include pickup gem usage are covered.
Introducing the Prometheus Stats code into ABS showed that its clarity
could be improved with better variable naming, some refactoring
to reduce repetition, and documentation of the Metrics table itself.
These changes are now being filtered back to the vmpooler code base.
This commit updates folder purging references to ensure that provider
name references are referring to the named provider, rather than the
provider type. Without this change folder purging fails because it
cannot identify target folders.
Ensure that the correct stats are registered for the Manager and the api
respectively. E.g. all checkout counters are for the api only, whereas
clone times belong to the manager.
Also, the stats for the new ondemand functionality weren't registered, so
add these along with the missing delete stats.
Review feedback suggested revising the Metrics related files into a more
logical class structure.
Also fix up grammar and typos in doc strings, and any trailing metrics
that have been recently added to vmpooler.
Use the example provided in the Ruby Client to provide a customised
collector appropriate for logging all calls to the API. The customised
filtering is used to replace individual node names and templates
for the /vm endpoints, and request IDs for the /ondemand endpoints.
This module was failing our rubocop checks, so it has been updated since
it now forms part of vmpooler.
Separate trapping for litmus jobs is also included so that they don't
interfere with stats from the jenkins pipelines.
Break down the usage stats into smaller groups so as to manage the
number of stat lines collected for Prometheus.
This may need some further revision to filter out Litmus stats, or
otherwise collect litmus usage information.
The redis pooler connection metric used "metric_prefix" which is
misleading, so split this into connpool_type and connpool_provider.
Also remove some earlier jruby compatibility code to reduce
rebase conflicts when this is rebased on top of Matt's changes.
This is a re-architecture of the vmpooler initialisation code to:
1. Allow an API service for both the manager and the api.
2. Add the Prometheus endpoints to the web service.
   The way the Rack service is started needed to change, as instantiating
   using ".new" leads to a failure to initialise the http stats
   collection.
3. Selectively load the pooler api and/or Prometheus endpoints.
4. Rework API spec tests for revised API loading. This required tidying up
   the initialisation and performing a reset! after each test to avoid
   "leaks" and dependencies between the tests.
Add a new Prometheus class as an additional stats feed along with the
existing feeds.
Move the metrics initialisation code into its own class and sub-class
the individual metrics implementations under this.
* (POOLER-174) Reduce duplication of on demand code introduced in POOLER-158
  Refactored all parsing of requests of type 'pool_alias:pool:count' into a
  utility class that is used by pool_manager and the api v1 class
* add some metrics to the od request generation
* fix rubocop offenses, we are now friends
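
The parsing utility mentioned in POOLER-174 above could look roughly like the sketch below. The module name `RequestParser` and the assumption that multiple entries are comma-separated are illustrative only; the commit message confirms just the 'pool_alias:pool:count' shape.

```ruby
# Hypothetical utility: parse on demand request strings in one place so
# that callers do not each re-implement the splitting logic.
module RequestParser
  # Parse a single 'pool_alias:pool:count' entry into [alias, pool, count].
  def self.parse_entry(entry)
    pool_alias, pool, count = entry.split(':')
    if [pool_alias, pool, count].any? { |part| part.nil? || part.empty? }
      raise ArgumentError, "malformed request entry: #{entry.inspect}"
    end

    [pool_alias, pool, Integer(count, 10)]
  end

  # Assumption: several entries are joined with commas.
  def self.parse(requested)
    requested.split(',').map { |entry| parse_entry(entry) }
  end
end

RequestParser.parse('centos:centos-7-x86_64:2').each do |pool_alias, pool, count|
  puts "#{count} x #{pool} (requested as #{pool_alias})"
end
```

Keeping the validation in one place means a malformed entry fails the same way wherever the string is parsed.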
This change adds a capability to vmpooler to provision instances on
demand. Without this change vmpooler only supports retrieving machines
from pre-provisioned pools.
Additionally, this change refactors redis interactions to reduce round
trips to redis. Specifically, multi and pipelined redis commands are
added where possible to reduce the number of times we are calling redis.
To support the redis refactor, the redis interaction now leverages a
connection pool. In addition to offering multiple connections for pool
manager to use, this makes the redis interactions in pool manager
thread safe.
Ready TTL is now a global parameter that can be set as a default for all
pools. A default of 0 has been removed, because this is an unreasonable
default behavior, which would leave a provisioned instance in the pool
indefinitely.
Pool empty messages have been removed when the pool size is set to 0.
Without this change, when a pool was set to a size of 0 the API and pool
manager would both show that a pool is empty.
Before this PR, the current running time was being inspected to decide if the
vm lifetime could be extended. But since vm lifetime is absolute and not relative
this check is now removed.
This commit fixes the purge_unconfigured_folders feature to ensure that it can successfully identify folders and instances that are no longer used. Without this change the feature does not work as advertised.
This commit adds detection for redis connection failures to pool_manager. When a connection fails, the error will be raised to execute, forcing the connection to be re-established. Without this change, a failed redis connection generates a redis connection error that is swallowed by a rescue for StandardError, preventing the manager application component from recovering from the failure.
This commit adds the extra_config option to vmpooler to allow specifying additional configuration files to load from. Without this change vmpooler does not offer a mechanism to provide additional configuration files for the application.
This commit adds a capability to vmpooler to reset a pool, deleting its ready and pending instances and replacing them with fresh ones. Without this change vmpooler does not offer a mechanism to reset a pool without also changing its template.
* (POOLER-123) Implement a max TTL
Before this change, we could check out a vm and set the lifetime to a
very high number, which would essentially keep the vm running forever.
This change implements a config setting, max_lifetime_upper_limit, which
enforces a maximum lifetime in hours both for initial checkout and for
extending a running vm.
* (POOLER-123) Improve PUT vm endpoint error messaging
Prior to this commit the PUT vm endpoint didn't give any useful
information about why a user's request failed.
This commit updates PUT to output a more helpful set of error messages
in the `failure` key that gets returned in the JSON response.
* (POOLER-123) Update max_lifetime_upper_limit key
This commit switches the max_lifetime_upper_limit key from being a
symbol to being a string, which is what the config hash seems to contain.
* (maint) Add option to disable Redis persistence in docker-compose
This commit is just a handy little command override to the redis
container to prevent persistence.
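
The max_lifetime_upper_limit check from POOLER-123 above might be sketched as follows. The helper name `lifetime_allowed?` is hypothetical, and treating the limit as inclusive is an assumption; the commit messages only state that the key is a string and that the limit is expressed in hours.

```ruby
# Hypothetical check: reject lifetimes above the configured upper limit.
# The config key is looked up as a string, matching the config hash.
def lifetime_allowed?(requested_hours, config)
  limit = config['max_lifetime_upper_limit']
  return true if limit.nil? # no limit configured

  # Assumption: a request equal to the limit is still allowed.
  requested_hours.to_i <= limit.to_i
end

config = { 'max_lifetime_upper_limit' => 168 }
puts lifetime_allowed?(24, config)
puts lifetime_allowed?(10_000, config)
```

The same check would apply both at initial checkout and when extending a running vm.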
Prior to this commit the pooler had no awareness of the complete set of
hostnames that are currently in use. This meant that it was possible to
allocate the same hostname twice, which would result in the original
host with that hostname becoming unreachable.
This commit adds a check for the existence of the
`vmpooler__vm__<hostname>` key before attempting to clone the vm.
This should prevent duplicate hostnames.
If the hostname is already taken, `_clone_vm` will retry with a new
random hostname multiple times before raising an exception.
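
A minimal sketch of the retry behaviour described above, with a plain Set standing in for the redis key lookup. The method name `find_unique_hostname`, the retry count, and the `HostnameUnavailable` error are assumptions for illustration; only the `vmpooler__vm__<hostname>` key shape comes from the commit message.

```ruby
require 'securerandom'
require 'set'

# Hypothetical error raised once the retries are exhausted.
class HostnameUnavailable < StandardError; end

# Try a few random hostnames and return the first one whose
# vmpooler__vm__<hostname> key is not already present.
def find_unique_hostname(existing_keys, max_tries: 3,
                         generator: -> { SecureRandom.hex(4) })
  max_tries.times do
    hostname = generator.call
    return hostname unless existing_keys.include?("vmpooler__vm__#{hostname}")
  end
  raise HostnameUnavailable, "no free hostname after #{max_tries} attempts"
end

taken = Set['vmpooler__vm__busy-host']
puts find_unique_hostname(taken)
```

In the real application the membership test would be a redis key existence check rather than a Set lookup.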
Prior to this commit the hostname_shorten regex wouldn't match the
updated human readable hostnames because they contain dashes.
This commit updates the regex to capture dashes in the hostname, and
adds a few specs to verify that behavior.
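
To illustrate why dashes matter here, the sketch below shortens a fully qualified hostname using a character class that includes the dash. The method name `shorten_hostname` and the exact pattern are assumptions, not the regex used in vmpooler.

```ruby
# Hypothetical version of a hostname shortening helper. \w alone does not
# match '-', so the character class must list the dash explicitly.
def shorten_hostname(hostname, domain = nil)
  return hostname if domain.nil?

  match = hostname.match(/\A([\w-]+)\.#{Regexp.escape(domain)}\z/)
  match ? match[1] : hostname
end

puts shorten_hostname('brave-otter.example.com', 'example.com')
```

A pattern built only from `\w+` would fail to match `brave-otter.example.com`, leaving the hostname unshortened.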
This commit adds a shared mutex to vmpooler API so that checkout requests can be synchronized across threads. Without this change it is possible in some scenarios for vmpooler to allocate the same SUT to different API requests for a VM.
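
The idea of a shared mutex around checkout can be sketched as below, using an in-memory array in place of the ready queue. The class name `CheckoutPool` is hypothetical; the point is only that the read and the removal happen inside one synchronized section, so two threads cannot select the same VM.

```ruby
# Hypothetical in-memory stand-in for a ready queue guarded by a mutex.
class CheckoutPool
  def initialize(ready_vms)
    @ready = ready_vms.dup
    @mutex = Mutex.new
  end

  # Select and remove a VM atomically with respect to other threads.
  def checkout
    @mutex.synchronize do
      vm = @ready.first
      @ready.delete(vm) unless vm.nil?
      vm
    end
  end
end

pool = CheckoutPool.new(%w[vm-1 vm-2])
threads = 2.times.map { Thread.new { pool.checkout } }
puts threads.map(&:value).sort.inspect
```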
This commit updates the create_linked_clone pool option to correctly detect when linked clones have been set at a pool level. Without this change a pool setting create_linked_clone to false is not interpreted correctly, and a linked clone is created if possible.
This change adds the running host for a VM to the API data available via /vm/hostname. Without this change the running host would be logged to vmpooler log, but not available any other way. Additionally, the data will specify if a machine has been migrated. Without this change parent host data for a vmpooler machine is not available via the vmpooler API.
This commit adds a new configuration parameter to allow setting whether to create linked clones on a global, or per pool basis. Without this change vmpooler would always attempt to create linked clones. The default behavior of creating linked clones is preserved.
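
One way to resolve the create_linked_clone setting described above is sketched here: the pool value wins when it is set, then the global value, and the default stays true. The helper name `create_linked_clone?` and the hash layout are assumptions. The nil checks are the important part, since a plain `||` would treat an explicit `false` as unset.

```ruby
# Hypothetical resolution order: pool setting, then global setting, then
# the preserved default of creating linked clones.
def create_linked_clone?(pool, global_config)
  return pool['create_linked_clone'] unless pool['create_linked_clone'].nil?
  return global_config['create_linked_clone'] unless global_config['create_linked_clone'].nil?

  true
end

puts create_linked_clone?({ 'create_linked_clone' => false },
                          { 'create_linked_clone' => true })
```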
This allows the user to change the cluster that the targeted pool
will clone to. Upon configuration change, the thread will wake up and
execute the change within 1 second.
This commit duplicates the vm_ready? check to the API layer to allow the API to validate that a VM is alive at checkout. Without this change the API relies upon the checks in pool_manager validating pools. This change should allow for additional insight into whether a machine is in a ready state and responding at checkout time.
Before this change, looping over many pools would query the redis backend
once per pool, leading to slow responses from the backend for
configurations with many pools (60+).
Changed the requests to use redis pipelines: https://redis.io/topics/pipelining
Pipelining has been supported since the beginning, so this will not force
any redis update for users. The pipeline method runs the queries in
batches, so we need to loop over the results. It reduces the number of
requests to redis by N, where N is the number of pools in the
configuration.
This commit updates how a VM is checked out to ensure that there is no window where the VM could be considered discovered, and therefore destroyed. Without this change the VM is retrieved by calling 'spop' on the ready queue, and then adding it to the running queue. This change moves to selecting the VM by retrieving the last member of the set, and moving it with 'smove' from ready to running. As a result of this change, vmpooler moves from retrieving VMs from the ready state randomly to retrieving the oldest VM in the queue. This change should reduce churn where it would otherwise not be required to satisfy demand.