Ensure that the correct stats are registered for the Manager and the api
respectively. E.g. all checkout counters are for the api only, whereas
clone times belong to the manager.
Also new ondemand functionality stats weren't registered, so add these
along with missing delete stats.
Review changes suggested to revise the Metrics related files into a more
logical class structure.
Also fixup grammar typos in docs strings and any trailing metrics that
have been recently added to vmpooler.
Use the example provided in the Ruby Client to provide a customised
collector appropriate to log all calls to the API. The customised
filtering is used to replace individual node names and templates
for the /vm and request ID's for the /ondemand endpoints.
This module was failing our rubocop checks so have updated it since
it now forms part of vmpooler.
Separate trapping for litmus jobs is also included so that they don't
interfere with stats from the jenkins pipelines.
Break down the usage stats into smaller groups so as to manage the
number of stat lines collected for Prometheus.
This may need some further revision to filter out Litmus stats, or
otherwise collect litmus usage information.
The redis pooler connection metric used "metric_prefix" which is
misleading, so split this into connpool_type and connpool_provider.
Also remove some earlier jruby compatibility code to reduce
rebase conflicts when this is rebased on top of Matt's changes.
This is a re-architect of the vmpooler initialisation code to:
1. Allow an API service for both manager and the api
2. Add the Prometheus endpoints to the web service.
Needed to change the way the Rack Service is started as instantiating
using ".New" leads to a failure to initialise the http Stats
collection.
3. Selectively load the pooler api and/or Prometheus endpoints.
4. Rework API Spec tests for revised API loading. Needed to tidy up the
initialisation and perform a reset! after each test to avoid "leaks"
and dependencies between the tests.
Add a new Prometheus class as an additional stats feed along with the
existing feeds.
Move the metrics initialisation code into its own class and sub-class
the individual metrics implementations under this.
* (POOLER-174) Reduce duplicate of on demand code introduced in POOLER-158
refactored every parsing of request of type 'pool_alias:pool:count' into a
utility class, that is used by pool_manager and the api v1 class
* add some metrics to the od request generation
* fix rubocop offenses, we are now friends
This change adds a capability to vmpooler to provision instances on
demand. Without this change vmpooler only supports retrieving machines
from pre-provisioned pools.
Additionally, this change refactors redis interactions to reduce round
trips to redis. Specifically, multi and pipelined redis commands are
added where possible to reduce the number of times we are calling redis.
To support the redis refactor the redis interaction has changed to
leveraging a connection pool. In addition to offering multiple
connections for pool manager to use, the redis interactions in pool
manager are now thread safe.
Ready TTL is now a global parameter that can be set as a default for all
pools. A default of 0 has been removed, because this is an unreasonable
default behavior, which would leave a provisioned instance in the pool
indefinitely.
Pool empty messages have been removed when the pool size is set to 0.
Without this change, when a pool was set to a size of 0 the API and pool
manager would both show that a pool is empty.
Before this PR, the current running time was being inspected to decide if the
vm lifetime could be extended. But since vm lifetime is absolute and not relative
this check is now removed.
This commit fixes the purge_unconfigured_folders feature to ensure that it can successfully identify folders and instances that are no longer used. Without this change the feature does not work as advertised.
This commit adds detection for redis connection failures to pool_manager. When a connection fails the error will be raised to executeforcing the connection to be re-established. Without this change, when a redis connection fails, it generates a redis connection error, which is swallowed by a rescue for StandardError, preventing the manager application component from recovering in the case of a redis connection failure.
This commit adds the extra_config option to vmpooler to allow specifying additional configuration files to load from. Without this change vmpooler does not offer a mechanism to provide additional configuration files for the application.
This commit adds a capability to vmpooler to reset a pool, deleting its ready and pending instances and replacing them with fresh ones. Without this change vmpooler does not offer a mechanism to reset a pool without also changing its template.
* (POOLER-123) Implement a max TTL
Before this change, we could checkout a vm and set the lifetime to a
very high number which would esssentially keep the vm running forever.
Now implementing a config setting max_lifetime_upper_limit which enforces
a maximum lifetime in hours both for initial checkout and extending a
running vm
* (POOLER-123) Improve PUT vm endpoint error messaging
Prior to this commit the PUT vm endpoint didn't give any useful
information about why a user's request failed.
This commit updates PUT to output a more helpful set of error messages
in the `failure` key that gets returned in the JSON response.
* (POOLER-123) Update max_lifetime_upper_limit key
This commit switches the max_lifetime_upper_limit key from being a
symbol to being a string, which is what the config hash seems to contain.
* (maint) Add option to disable Redis persistence in docker-compose
This commit is just a handy little command override to the redis
container to prevent persistence.
Prior to this commit the pooler had no awareness of the complete set of
hostnames that are currently in use. This meant that it was possible to
allocate the same hostname twice, which would result in the original
host with that hostname becoming unreachable.
This commit adds a check for the existence of the
`vmpooler__vm__<hostname>` key before attempting to clone the vm.
This should prevent duplicate hostnames.
If the hostname is already taken, `_clone_vm` will retry with a new
random hostname multiple times before raising an exception.
Prior to this commit the hostname_shorten regex wouldn't match the
updated human readable hostnames because they contain dashes.
This commit updates the regex to capture dashes in the hostname, and
adds a few specs to verify that behavior.
This commit adds a shared mutex to vmpooler API so that checkout requests can be synchronized across threads. Without this change it is possible in some scenarios for vmpooler to allocate the same SUT to different API requests for a VM.
This commit updates the create_linked_clone pool option to correctly detect when linked clones have been set at a pool level. Without this change a pool setting create_linked_clone to false is not interpreted correctly, and a linked clone is created if possible.
This change adds the running host for a VM to the API data available via /vm/hostname. Without this change the running host would be logged to vmpooler log, but not available any other way. Additionally, the data will specify if a machine has been migrated. Without this change parent host data for a vmpooler machine is not available via the vmpooler API.
This commit adds a new configuration parameter to allow setting whether to create linked clones on a global, or per pool basis. Without this change vmpooler would always attempt to create linked clones. The default behavior of creating linked clones is preserved.
This allows the user to change the cluster in which the targeted pool
will clone to. Upon configuration change, the thread will wake up and
execute the change within 1 second.
This commit duplicates the vm_ready? check to the API layer to allow for API to validate that a VM is alive at checkout. Without this change API relies upon the checks in pool_manager validating pools. This change should allow for additional insight into whether a machine is in a ready state and resopnding at checkout time.
Before this change looping over many pools would query the redis backend
for each pool, leading in slow response from the backend for configurations
with many pools (60+)
Changed the requests to use redis pipelines https://redis.io/topics/pipelining
This is supported since the beginning, so will not force any redis update for
users. The pipeline method runs the queries in batches and we need to loop
over the result and reduces the number of requests to redis by N=number of
pools in the configuration.
This commit updates how a VM is checked out to ensure that there is no window where the VM could be considered discovered, and therefore destroyed. Without this change the VM is retrieved by calling 'spop' on the ready queue, and then adding it to the running queue. This change moves to selecting the VM by retrieving the last member of the set, and moving it with 'smove' from ready to running. As a result of this change vmpooler moves from retrieving the VMs from the ready state randomly, to instead retrieve the oldest VM in the queue. This change should reduce churn where it would otherwise not be required to satisfy demand.
This change updates handling of pool aliases to allow for more than a
single pool to be configured as an alias pool. Without this change if
multiple pools are configured as an alias the last one to configure it
is set as the alias for that pool.
Additionally, redis testing requirements are removed in favor of
mock_redis. Without this change a redis server is required to run
vmpooler tests.
This commit updates vmpooler to set configuration values received via environment variables to integer values when an integer value is expected. Without this change vmpooler does not support setting integer values via environment variables. Additionally, testing is added for configuring vmpooler via environment variables.
To support this testing the gem climate_control is added, which allows
for testing environment variables without those set variables leaking to
other tests.
This commit updates vm usage stats collection to replace all instances of '.' characters within node strings. Without this change the node string containing a '.' character causes the metric to be interpreted as containing another node.
This commit updates providers_spec so the test ensures array content of the providers match. Without this change the provider_spec test will fail when comparing providers if the order is not exactly the same in each array.
This commit updates vmpooler to ship VM usage stats when a VM is destroyed. The stats are gathered for jobs based on user and pool name. If a jenkins build URL is present then this is broken down by user, instance, value stream, branch and project. Additionally, if present then the RMM_COMPONENT_TO_TEST_NAME will be listed after project. Without this change we do not collect stats on per VM usage and its correlation to users and pools.
This commit fixes checking of a VM that has already been identified as ready. Without this change a ready VM that has failed will be identified as having failed, but will not successfully be removed from the ready queue. Additionally, the default vm_checktime value has been reduced from 15 to 1 to ensure that ready VMs are checked within one minute of the time they have reached the ready state by default.
Lastly, the docker-compose files are updated to specify that the redis
instance used is a local redis instance.
This commit updates fetch_single_vm to return the name of the template that was requested, instead of the name of the pool providing the VM to meet the request. Without this change, when an alias is used for fetching a VM, a different pool title may be returned containing the requested VMs than the user initially requested.
This commit updates delta disk creation to reduce the likelihood of this being run more than once for any given template. Without this change an error can be generated with vsphere 6.5 or later when a template is updated, and then the update is reverted. The error prevents the image from being used because the template is never marked as prepared. To address this any failure is now logged, and the template is marked as prepared regardless of whether this was successful, or not, which allows the image to be used despite the error. This failure mode is more graceful and allows the pool to continue to function.
This commit refactorss the check_pool method in pool_manager.
Specifically, each commented section describing a stage of check_pool is
broken out into a separate method and check_pool is simplified by
calling these methods. Without this change it is difficult to follow the
intent for or make changes to check_pool.
Additionally, a docker-compose file is added to make it simple to launch
an all-in-one vmpooler instance along with a separate redis server with
docker.
This commit updates destroy_vm to remove references to its mutex tracking object when destroyed. Without this change a VM that is destroyed will leave its mutex tracking object forever causing the pool manager memory footprint to increase.
Prior to this commit the docs and examples used 'company.com'
(a real domain). This commit changes those occurrences to
'example.com', which is a IANA-managed reserved domain.