Previously, it was not able to mock objects to impersonate various RBVMOMI
objects. This commit changes the case statement to use `base.is_a?`
which can be mocked and allow mocked objects to mimic real objects.
Previously, it was not able to mock objects to impersonate a RbVmomi::VIM::Folder
object. This commit changes the case statement to an if statement using is_a?
which can be mocked and allow mocked objects to mimic real objects.
VM provisioning will be handled by VM Providers. This commit renames the use of
vsphere to provider where appropriate and changes the per-pool helper from
vsphere to providers to more accurately represent it's intended use.
Previously all of the VM provisioning code was intertwined with the VM lifecycle
code e.g. The VSphere specific code is mixed with Redis code. This makes it
impossible to add aditional providers or disable VSphere integration. This
commit begins the process to refactor the VSphere code out of the lifecycle code
by introducing the concept of VM Providers. A Provider will contain the logic/
code to manage VMs i.e. create/destroy/inquire. Therefore the Pool Manager can
query a strict interface into one or more Providers. Initially only a VSphere
provider will be available.
This commit adds the base class for all providers and describes the API or
contract that the Pool Manager will use to manage VMs.
In commit 03ad7bfa46 the global variables for credentials was change to an
instance variable however one reference was missed. This commit fixes this
omission.
This commit updates usage of global variables in vsphere_helper to be instance variables. They do not need to be global variables as brought up in issue #194. Without this change we are setting global variables when they are not needed.
This commit updates VM Pooler to fix any existing rubocop offenses and also
fixes any offenses in the lib/vmpooler.rb file. This commit also regenerates
the Todo file.
Previously, the clone_vm method took various VSphere specific parameters e.g.
template folder. However in order make VMPooler less VSphere specific this
method should just take the pool configuration and then it can determine the
appropriate settings itself. This commit also moves the threading to a clone_vm
while the actual method which does the work is now _clone_vm as per all other
multithread worker methods in pool_manager. This commit also updates the spec
tests appropriately.
Previously in check_ready_vm, if the VM is powered off, the VM is moved in
redis however the function doesn't return there, and instead then checks if the
hostname is the same, and then if TCP socket 22 is open. This is unnecessary as
we already know the VM is turned off so of course the hostname is wrong and TCP
22 is unavailable. The same applies for the VM hostname.
This commit instead returns after it is found a VM is no longer ready. This
commit also amends the spec tests for the correct behaviour.
Add spec tests for check_snapshot_queue
Previously the check_snapshot_queue method would execute the loop indefinitely
as it did not have a terminating condition. This made it impossible to test.
This commit modifies the check_snapshot_queue method so that it can take a
maxloop and delay parameter so that it can be tested.
Add spec tests for check_disk_queue
Previously the check_disk_queue method would execute the loop indefinitely as it
did not have a terminating condition. This made it impossible to test. This
commit modifies the check_disk_queue method so that it can take a maxloop and
delay parameter so that it can be tested.
Add spec tests for check_pool
Previously the check_pool method would execute the loop indefinitely as it did not
have a terminating condition. This made it impossible to test. This commit
modifies the check_pool method so that it can take a maxloop and delay parameter
so that it can be tested.
Add spec tests for execute!
Previously the execute! method would execute the loop indefinitely as it did not
have a terminating condition. This made it impossible to test. This commit
modifies the execute! method so that it can take a maxloop and delay parameter
so that it can be tested.
If a user attempts to start vmpooler using dummy authentication
without setting the environment variable VMPOOLER_DEBUG, the vmpooler
will now refuse to start.
Previously it was difficult to do local development as VMPooler requires an LDAP
service for authentication. This commit adds a dummy authentication provider.
The provider has passes authentication if the username and password are
different, and fails if the username and password are the same. This commit
also updates the documentation in the config YML file.
If `ENV['VMPOOLER_CONFIG']` is defined, it is read in as a YAML
configuration. This allows vmpooler to run in a docker daemon via
`docker run -e VMPOOLER_CONFIG -p 80:4567 -it vmpooler` rather than
embedding a YAML file within the container.
This commit updates vmpooler to ensure clone errors, and other pool manager errors are raised to the parent method. Without this change vmpooler gets stuck after a connection fails during clone operations and will not attempt to clone again.
This commit updates vmpooler to clear the migrations queue at application start time. When the application is shut down it is not considerate of any activities, like migrations, in flight. The result is that when the application is started again any stale entries in vmpooler__migration will be left until manually removed, which can prevent migrations from occurring.
This commit adds retry logic and configurable delays to vsphere helper.
Without this change vmpooler instances that have large numbers of pools
can create enough connections in a short period of time to cause vcenter
issues.
This commit updates vmpooler.migrate metric to send the pool name. Without this change the VM name is sent. Additionally, counts are added for migrate_from_#host and migrate_to_#host in order to allow tracking host migrations, and correlating them with CPU ready time changes.
This commit updates vmpooler to understand how to resolve a situation
where a pending VM does not exist. Without this change a pending VM that
does not exist in vmware inventory gets stuck in the pending state,
preventing the pool from ever reaching its target capacity.
As a part of this change the find_vm method is updated to perform a
light, then heavy search each time find_vm is called and all usage of
find_vm || find_vm_heavy is replaced. This makes find_vm usage
consistent across pool_manager.
Additionally, open_socket method is updated to resolve an incorrect
reference to the host name.
This commit updates pool manager to use a method for opening a socket
instead of opening it directly from check_pending_vm. Support is added
for specifying the domain of the VM to connect to, which lays the
groundwork for doing away with the assumption of having DNS search
domains set for vmpooler to move VMs to the ready state.
Additionally, this commit adds a block to ensure open_socket closes open connections. Without this change sockets are opened to each VM before moving to the ready state, and never explicitly closed.
Also, use open socket for check_ready_vm
Update migrate_vm to make clear when an if block is the end of the line by returning. Use scard instead of smembers.size() for determining migrations in progress.
A method is added to make more clear what's happening when checking if a socket can be opened to a pending VM on port 22. Additionally, the connection appends domain from the configuration, when present, to the VM name so DNS search is not required.
This commit adds a capability to pool_manager to migrate VMs placed in the migrating queue. When a VM is checked out an entry is created in vmpooler__migrating. The existing process for evaluating VM states executes the migrate_vm method for the provided VM, and removes it from the queue. The least used compatible host for the provided VM is selected and, if necessary, a migration to the lesser used host is performed. Migration time and time from the task being queued until completion are both tracked with the redis VM object in 'migration_time' and 'checkout_to_migration'. The migration time is logged in the vmpooler.log, or the VM is reported as not requiring migration. Without this change VMs are not evaluated for checkout at request time.
Add a method to wrap find_vm and find_vm_heavy in order to allow a
single operation to be performed that does both.
This commit also adds support for a configuration setting called
migration_limit that makes migration at checkout optional. Additionally,
logging is added to report a VM parent host when it is checked out. Without this
change vmpooler assumes that migration at checkout is always enabled.
If this setting is not present, or if the setting is 0, then migration
at checkout will be disabled. If the setting is greater than 0 then that
setting will be used to enforce a limit for the number of simultaneous
migrations that will be evaluated.
Documentation of this configuration option is added to the
vmpooler.yaml.example file.
Prior to this the only per-pool statistics that could be extracted from the API
were a list of empty pools in the "status" section of the returned results of
the `/status` endpoint.
This adds a new "pools" section to the '/status' results which lists, for each
pool, the following results:
- The number of ready vms in the pool
- The number of running vms in the pool
- The number of pending vms in the pool
- The maximum size of the pool (as specified in the vmpooler configuration)
Example:
```
{
"boot": {
"duration": {
"average": 163.6,
"min": 65.49,
"max": 830.07,
"total": 247744.71000000002
},
"count": {
"total": 1514
}
# ...
"pools": {
"pool1": {
"ready": 5,
"running": 2,
"pending": 1,
"max": 15
},
"pool2": {
"ready": 0,
"running": 10,
"pending": 0,
"max: 10
}
}
}
```
This includes spec coverage for this change (we could use more specs on `/status` in general); as well as a couple of general spec improvements.
* [QENG-3919] Make vmpooler checkouts be all or nothing (#153)
* (QENG-3919) spike for implementation of all-or-nothing checkout
* Fix two botched variable references
* Aggregate API helper methods
* Add specs for failed multi-vm allocation API endpoints
* (QENG-3919) Add tests for multiple vm requests
* (QENG-3919) Add (failing) specs for POST /vm/pool1+pool2 usages
This exposes the old (bad) behavior on this other code path. Will fix this up next.
* (QENG-3919) Bring query params version in line with JSON post version
Not clear to me why these had to be implemented so differently.
* (QENG-3919) extract common method from both methods of VM allocation
* (QENG-3919) Naming fix, cosmetic cleanups
I mean, I presume all these commits are going to get squashed away on merge anyway.
* (QENG-3919) Update API docs
We consider it a bug that the actual behavior was not this behavior, but the
documentation was also silent on this point.
* (QENG-3919) minor readability tweak in refactored method
* (QENG-3919) Clean up interim comments re: status codes
* (QENG-3919) Drop now-orphaned `checkout_vm` method
We kept this up-to-date while we were upgrading and refactoring, but, turns out,
this method is no longer called anywhere. 💀🔥
* (QENG-3919) Return 503 status on failed allocation
Making sure we go back to the original functionality, which was:
- status 200 when vms successfully allocated
- status 404 when a pool name is unknown
- status 404 when no pool name is specified
- status 503 when vm allocation failed
* (QENG-3919) add net-ldap to Gemfile
Maybe we shouldn't foil-ball gems onto servers.
* (QENG-3919) Turns out, spush isn't a redis command
And hence we see once again the weakness of mockist tests.
* (QENG-3919) Pin the net-ldap gem to 0.11 for the jrubies, etc.
* (QENG-3919) Correct an old spelling error in spec descriptions
* (QENG-3919) Further tweak net-ldap version
* (QENG-3919) return_single_vm -> return_vm_to_ready_state
cc @shermdog
* (RE-7014) Add support for statsd
They way we were using graphite was incorrect for the type of data we were sending it. statsd is the appropriate mechanism for our needs.
statsd and graphite are mutually exclusive and configuring statsd will take precendence over Graphite. Example of configuration in vmpooler.yaml.example
* (RE-7014) Add tracking of vm gets via statsd
Add the tracking of successful, failed, invalid, and empty pool vm gets. It is possible we may want to tweak this, but have validated with spec tests and pcaps.
```
vmpooler-tmp-dev.ready.debian-7-x86_64:1|c
vmpooler-tmp-dev.running.debian-7-x86_64:1|c
vmpooler-tmp-dev.checkout.invalid:1|c
vmpooler-tmp-dev.checkout.success.debian-7-x86_64:1|c
vmpooler-tmp-dev.checkout.empty:1|c
vmpooler-tmp-dev.running.debian-7-x86_64:1|c
vmpooler-tmp-dev.clone.debian-7-x86_64:12.10|ms
vmpooler-tmp-dev.ready.debian-7-x86_64:1|c
```
* (RE-7014) statsd nitpicks and additional rspec
Cleaned up some code review nitpicks and added pool_manager_spec for empty pool.
* (RE-7014) update statsd to use gauge for running/ready
Previously was using increment which was incorrect for that particular application.
* Revert "Merge pull request #155 from shermdog/RE-7014-cinext"
This reverts commit cc03a86f6a, reversing
changes made to 5aaab7c5c2.
* (QENG-4070) Consistently return 503 if valid pool is empty
There were several problems with how the pooler checked out vms with
respect to empty pools, invalid pools, and aliases:
- If the vmpooler config did not contain any aliases and the caller
requested a vm from an empty pool or a non-existent one, the vmpooler
would error with:
NoMethodError - undefined method `[]' for nil:NilClass
If the config contained a non-nil alias section, then:
- If the caller requested a vm from an empty pool and either the vm
didn't have an alias or the aliased pool was empty or non-existent, then
the request for that vm would be silently ignored. The vmpooler would
return 200 if the caller asked for multiple vms and the vmpooler was
able to checkout at least one vm. Otherwise it would return 404.
- Similarly, if the caller requested a vm from a non-existent pool, then
the request was silently ignored.
This commit adds a `pool_names` Set to the config containing all valid
pool names including aliases. This is used to determine whether a
requested template name is valid or not. This is necessary because redis
does not distinguish between empty and non-existent sets, e.g. the
following returns false in both cases:
backend.exists('vmpooler__ready__' + key)
If the caller requests a vm (single or multiple), and any vm references
an invalid pool name, we immediately return 404. Otherwise, we know the
request is for valid pool names, since the vmpooler requires a restart
to change pool names and counts.
We then attempt to acquire each vm, trying to match on pool name or
failing back to aliased pool name, as was the previous behavior.
The resulting behavior is:
- If the caller asks for at least one vm from an unknown pool, then
don't try to checkout any vms and respond with 404.
- If the caller asks for a vm, and at least one pool is empty, then
respond with 503, returning checked out vms back to the pool.
- Otherwise return 200 with the list of checked out vms.
This commit also makes `alias` optional again.
This commit also re-enables tests that were merged in from master, but
originally commented out due to the bugs described above..
* (maint) Add json pessimistic pin
json 2.0.x was released on July 1 and is not compatible with ruby < 2.0.
Since we still support that version, add a pessimistic pin, which is
what we were using prior to July 1.
* [QENG-4070] Make json version conditional on RUBY_VERSION
* Drop extraneous mocks from updated test
* Revert "Revert "Merge pull request #155 from shermdog/RE-7014-cinext""
This reverts commit 0fd6fff934.
* Fix some spec errors
These were caused in part by dropping changes from the original PR when we
dropped the v1_spec.rb master test file (in favor of the updated and separated
versions).
* [QENG-4075] Fix bug with template name on allocation failure
We're returning [nil,nil] in this case, meaning that name will not be set. This
means we'll get an error trying to concatenate the stats string. Use the
requested template name here instead.
* [QENG-4075] Refactor statsd methods / classes
Prior to this we could easily run into situations where `statds_prefix` would
be `nil` (and possibly the `statsd` handle itself). There was some significant
complexity and brittleness in how statsd was set up.
Refactored so that:
- `statsd_prefix` is no longer exposed to any callers of statsd methods
- there is now a `Vmpooler::DummyStatsd` class which can be returned when we are not actually going to publish stats, but would like to keep the calling interface consistent
- setup of the statsd handle is via just passing in `config[:statsd]`, if `nil`, this will result in a dummy handle being return
- defaulting of `server` values was fixed -- this did not actually work in the previous implementation. `config[:statsd][:server]` is now required.
- tests use a `DummyStatsd` instance instead of an rspec double.
- calls to `statsd.increment` were taking incorrect arguments (some our fault, some part of the prior implementation), and were not collecting data on which pools were "invalid" or "empty". Fixed this and are now explicitly tracking the invalid/empty pool names.
* [QENG-4075] Drop now-superfluous :statsd config defaulting
* [QENG-4075] Unify graphite and statsd for the pool manager
Prior to this, the `pool_manager.rb` library could take handles for both
graphite and statsd endpoints (which were considered mutually exclusive),
and then would use one. There was a bevy of conditional logic around sending
metrics to the graphite/statsd handles (and actually at least one bug of
omission).
Here we refactor more, building on earlier work:
- Our graphite class comes into line with the API of our Statsd and DummyStatsd classes
- In `pool_manager.rb` we now accept a single "metrics" handle, and we drop all the conditional logic around statsd vs. graphite
- We move the inconsistent error handling out of the calling classes and into our metrics classes, actually logging to `$stderr` when we can't publish metrics
- We unify the setup code to use `config` to determine whether statsd, graphite, or a dummy metrics handle should be used, and make that happen.
- Cleaned up some tests. We could probably stand to do a bit more work in this area.
* [QENG-4075] Clean up pool manager, specs
Prior to this, `pool_manager.rb` allowed the `metrics` argument to be optional,
but at this point it will be an instance of `Vmpooler::Statsd`,
'Vmpooler::Graphite', or `Vmpooler::DummyStatsd`, so making this non-optional.
Cleaned up that file's tests, cosmetically, as well as recognizing that the
behavioral difference between graphite and statsd does not depend on the pool
manager.
* [QENG-4075] update example vmpooler.yaml file
This documents the changes to :server being mandatory for all metrics
endpoints, as well as the graphite endpoint supporting an optional :port
configuration value.
* [QENG-4075] Rename usages of statsd -> metrics
Really, let's just support a generic metrics interface.
* (maint) move statsd-ruby require into Vmpooler::Statsd class
We've managed to move mentions of this out of the calling code, so let's
move the require.
* (maint) metrics.log -> metrics.timing
We missed this during the refactoring. Bringing this up to date.
* [QENG-4075] Allow specifying 'graphs:' for dashboard
Prior to this the dashboard front-end would use the configuration settings
for `graphite[:server]`/`graphite[:prefix]` to locate a graphite server
to use for rendering graphs.
Now that we have multiple possible metrics backends, the front-end graph
host for the dashboard could be entirely different from the back-end metrics
server that we publish to (if any).
This decouples those settings:
- use `graphs[:server]` / `graphs[:prefix]` for the graphite-compatible web front-end to use for dashboard display graphs
- fall back to `graphite[:server]`/`graphite[:prefix]` if `graphs` is not specified, in order to support legacy `vmpooler.yaml` configurations.
Note that since `statsd` takes precedence over `graphite`, it's possible to specify both `statsd` (for publishing) and `graphite` (for reading). We still prefer `graphs` over `graphite`.
Updated the example `vmpooler.yaml` config file.
* (maint) fix variable reference in new_metrics
This was referencing config directly, when what we want is for a
hash to be passed in (derived from config).
* (maint) Fix typo in updated graph link call
* (maint) default :graphs prefix to 'vmpooler'
* (maint) Fix parse error in vmpooler script
The things you find through manual QA 🧌
* (maint) use strings instead of symbols in config
Nested hash data comes back with string keys, not symbols. Be consistent.
* [QENG-4075] Factor out Vmpooler::DummyStatsd
This makes it visible to lib/vmpooler.rb, as well as putting this dummy
metrics endpoint in its own file for easier discovery.
* (maint) clean up statsd inclusion and require lines
The library is actually required as 'statsd' and not 'ruby-statsd', best I can tell.
* (maint) construct ::Statsd instead of Statsd
Because it's ambiguous in this scope, and, well, it doesn't
actually work in production.
* [QENG-4075] Also track completely invalid requests
When we don't even get a pool name we still want metrics to be recorded.
Add an additional disk to a running VM via the vmpooler API.
````
$ curl -X POST -H X-AUTH-TOKEN:a9znth9dn01t416hrguu56ze37t790bl --url vmpooler.company.com/api/v1/vm/fq6qlpjlsskycq6/disk/8
````
````json
{
"ok": true,
"fq6qlpjlsskycq6": {
"disk": "+8mb"
}
}
````
Provisioning and attaching disks can take a moment, but once the task completes it will be reflected in a `GET /vm/<hostname>` query:
````
$ curl --url vmpooler.company.com/api/v1/vm/fq6qlpjlsskycq6
````
````json
{
"ok": true,
"fq6qlpjlsskycq6": {
"template": "debian-7-x86_64",
"lifetime": 2,
"running": 0.08,
"state": "running",
"disk": [
"+8mb"
],
"domain": "delivery.puppetlabs.net"
}
}
This commit adds the following functions:
- `add_disk`: the wrapper function to add a new disk to a VM
Usage is:
````
add_disk(vmname, disksize, datastore)
````
`vmname` is the name of the VM to add the disk to, `disksize` is the
disk size in MB, and `datastore` is the datastore on which to provision
the new disk.
`add_disk` required the addition of the following helper functions:
- `find_device`: locate a device object in vSphere
- `find_disk_controller`: find the disk controller used by a VM
- `find_disk_devices`: find the disk devices used by a VM
- `find_disk_unit_number`: find a free SCSI ID to assign to a new disk
- `find_vmdks`: find names of VMDK disks attached to a VM