423 commits

Author SHA1 Message Date
kirby@puppetlabs.com
f8bd79a8d9 Handle empty queues in pool manager
Remove unneeded begin block in method

Fix formatting of rescue block in fail_pending_vm
2016-11-22 10:03:10 -08:00
kirby@puppetlabs.com
a244f9b92a Stop reloading configuration file from vspherehelper and instead source credentials from the configuration object that itself loads the configuration file when the application starts. Without this change the configuration file is reloaded every time vspherehelper is called. Additionally, this change makes it more straightforward to test vspherehelper connections.
A method is added to make more clear what's happening when checking if a socket can be opened to a pending VM on port 22. Additionally, the connection appends domain from the configuration, when present, to the VM name so DNS search is not required.
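
A minimal sketch of the kind of check described above, not the code from this commit. The helper names `vm_fqdn` and `port_open?` and the 5 second timeout are assumptions made for illustration.

```ruby
require 'socket'

# Hypothetical helper: append the configured domain to the VM name,
# but only when a domain is present in the configuration.
def vm_fqdn(vm_name, domain = nil)
  return vm_name if domain.nil? || domain.empty?

  "#{vm_name}.#{domain}"
end

# Hypothetical helper: true if a TCP connection to host:port can be
# opened within the timeout, false on refusal, timeout or DNS failure.
def port_open?(host, port = 22, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

With these, a pending VM could be probed with `port_open?(vm_fqdn('somevm', 'example.com'))`, which avoids relying on a DNS search path.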
2016-11-22 09:48:28 -08:00
Scott Schneider
dab267a017 Fix JavaScript error on nil weekly_data
Without this patch the dashboard will error-out on unfound `weekly_data`
values.
2016-11-21 17:05:58 -08:00
kirby@puppetlabs.com
58a548bc90 Add support for migrating VMs to pool_manager.
This commit adds a capability to pool_manager to migrate VMs placed in the migrating queue. When a VM is checked out an entry is created in vmpooler__migrating. The existing process for evaluating VM states executes the migrate_vm method for the provided VM, and removes it from the queue. The least used compatible host for the provided VM is selected and, if necessary, a migration to the lesser used host is performed. Migration time and time from the task being queued until completion are both tracked with the redis VM object in 'migration_time' and 'checkout_to_migration'. The migration time is logged in the vmpooler.log, or the VM is reported as not requiring migration. Without this change VMs are not evaluated for checkout at request time.

Add a method to wrap find_vm and find_vm_heavy in order to allow a
single operation to be performed that does both.

This commit also adds support for a configuration setting called
migration_limit that makes migration at checkout optional. Additionally,
logging is added to report a VM parent host when it is checked out. Without this
change vmpooler assumes that migration at checkout is always enabled.
If this setting is not present, or if the setting is 0, then migration
at checkout will be disabled. If the setting is greater than 0 then that
setting will be used to enforce a limit for the number of simultaneous
migrations that will be evaluated.
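
The rules for this setting can be sketched roughly as follows. `migration_allowed?` and its arguments are hypothetical names used only for illustration; the real check lives in pool_manager and reads its state from redis.

```ruby
# Hypothetical helper: decide whether one more migration may be evaluated.
# migration_limit is the configured value (may be nil when the setting is
# absent); in_progress is the number of migrations currently running.
def migration_allowed?(migration_limit, in_progress)
  limit = migration_limit.to_i # nil.to_i is 0, so a missing setting disables migration
  return false unless limit > 0

  in_progress < limit
end
```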

Documentation of this configuration option is added to the
vmpooler.yaml.example file.
2016-11-08 16:32:10 -08:00
kirby@puppetlabs.com
538f30af8e Update find_least_used_host method to evaluate hosts based on utilization. Without this change the determination is based on VM count. Additionally, a method is added to find the least used host compatible with the provided VM in order to support migrating a VM at checkout. Lastly, a capability is added to migrate VMs to a provided host, which also supports migrating VMs at checkout.
Add method to check if vsphere connection is alive. Replace repeated usage of checking the current time in a begin/rescue block with this method.
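
As an illustration only, selecting a host by utilization rather than VM count might look like the sketch below. The `HostLoad` struct and the choice to average CPU and memory percentages are assumptions; the commit message does not say how utilization is computed.

```ruby
# Hypothetical stand-in for a vSphere host and its current load.
HostLoad = Struct.new(:name, :cpu_used_pct, :mem_used_pct)

# Assumed utilization metric: the mean of CPU and memory usage.
def utilization(host)
  (host.cpu_used_pct + host.mem_used_pct) / 2.0
end

# Pick the host with the lowest utilization; nil when no hosts are given.
def least_used_host(hosts)
  hosts.min_by { |host| utilization(host) }
end
```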
2016-10-28 14:24:28 -07:00
kirby@puppetlabs.com
506c124578 Add vmpooler__migrating__vm at VM checkout 2016-10-14 12:40:33 -07:00
Rick Bradley
5a1d547830 [QENG-4181] Add inline documentation for /status endpoint
It's useful to be able to see, in the code, what sort of output we
are generating with endpoints like `/status`.
2016-09-10 12:36:25 -05:00
Rick Bradley
30dc060731 [QENG-4181] Add per-pool stats to /status API
Prior to this the only per-pool statistics that could be extracted from the API
were a list of empty pools in the "status" section of the returned results of
the `/status` endpoint.

This adds a new "pools" section to the `/status` results which lists, for each
pool, the following results:

 - The number of ready vms in the pool
 - The number of running vms in the pool
 - The number of pending vms in the pool
 - The maximum size of the pool (as specified in the vmpooler configuration)

Example:

```
{
  "boot": {
    "duration": {
      "average": 163.6,
      "min": 65.49,
      "max": 830.07,
      "total": 247744.71000000002
    },
    "count": {
      "total": 1514
    }
  },
  # ...
  "pools": {
    "pool1": {
      "ready":   5,
      "running": 2,
      "pending": 1,
      "max":     15
    },
    "pool2": {
      "ready":   0,
      "running": 10,
      "pending": 0,
      "max":     10
    }
  }
}
```

This includes spec coverage for this change (we could use more specs on `/status` in general), as well as a couple of general spec improvements.
2016-09-09 17:00:47 -05:00
Rick Sherman
34b93ca6c3 Merge CI.next into Master (#161)
* [QENG-3919] Make vmpooler checkouts be all or nothing (#153)

* (QENG-3919) spike for implementation of all-or-nothing checkout

* Fix two botched variable references

* Aggregate API helper methods

* Add specs for failed multi-vm allocation API endpoints

* (QENG-3919) Add tests for multiple vm requests

* (QENG-3919) Add (failing) specs for POST /vm/pool1+pool2 usages

This exposes the old (bad) behavior on this other code path. Will fix this up next.

* (QENG-3919) Bring query params version in line with JSON post version

Not clear to me why these had to be implemented so differently.

* (QENG-3919) extract common method from both methods of VM allocation

* (QENG-3919) Naming fix, cosmetic cleanups

I mean, I presume all these commits are going to get squashed away on merge anyway.

* (QENG-3919) Update API docs

We consider it a bug that the actual behavior was not this behavior, but the
documentation was also silent on this point.

* (QENG-3919) minor readability tweak in refactored method

* (QENG-3919) Clean up interim comments re: status codes

* (QENG-3919) Drop now-orphaned `checkout_vm` method

We kept this up-to-date while we were upgrading and refactoring, but, turns out,
this method is no longer called anywhere.  💀 🔥

* (QENG-3919) Return 503 status on failed allocation

Making sure we go back to the original functionality, which was:

 - status 200 when vms successfully allocated
 - status 404 when a pool name is unknown
 - status 404 when no pool name is specified
 - status 503 when vm allocation failed

* (QENG-3919) add net-ldap to Gemfile

Maybe we shouldn't foil-ball gems onto servers.

* (QENG-3919) Turns out, spush isn't a redis command

And hence we see once again the weakness of mockist tests.

* (QENG-3919) Pin the net-ldap gem to 0.11 for the jrubies, etc.

* (QENG-3919) Correct an old spelling error in spec descriptions

* (QENG-3919) Further tweak net-ldap version

* (QENG-3919) return_single_vm -> return_vm_to_ready_state

cc @shermdog

* (RE-7014) Add support for statsd
The way we were using graphite was incorrect for the type of data we were sending it.  statsd is the appropriate mechanism for our needs.
statsd and graphite are mutually exclusive, and configuring statsd will take precedence over graphite.  An example configuration is in vmpooler.yaml.example.

* (RE-7014) Add tracking of vm gets via statsd
Add the tracking of successful, failed, invalid, and empty pool vm gets.  It is possible we may want to tweak this, but have validated with spec tests and pcaps.

```
vmpooler-tmp-dev.ready.debian-7-x86_64:1|c
vmpooler-tmp-dev.running.debian-7-x86_64:1|c

vmpooler-tmp-dev.checkout.invalid:1|c
vmpooler-tmp-dev.checkout.success.debian-7-x86_64:1|c
vmpooler-tmp-dev.checkout.empty:1|c

vmpooler-tmp-dev.running.debian-7-x86_64:1|c

vmpooler-tmp-dev.clone.debian-7-x86_64:12.10|ms

vmpooler-tmp-dev.ready.debian-7-x86_64:1|c
```

* (RE-7014) statsd nitpicks and additional rspec
Cleaned up some code review nitpicks and added pool_manager_spec for empty pool.

* (RE-7014) update statsd to use gauge for running/ready
Previously was using increment which was incorrect for that particular application.

* Revert "Merge pull request #155 from shermdog/RE-7014-cinext"

This reverts commit cc03a86f6a, reversing
changes made to 5aaab7c5c2.

* (QENG-4070) Consistently return 503 if valid pool is empty

There were several problems with how the pooler checked out vms with
respect to empty pools, invalid pools, and aliases:

- If the vmpooler config did not contain any aliases and the caller
requested a vm from an empty pool or a non-existent one, the vmpooler
would error with:

    NoMethodError - undefined method `[]' for nil:NilClass

If the config contained a non-nil alias section, then:

- If the caller requested a vm from an empty pool and either the vm
didn't have an alias or the aliased pool was empty or non-existent, then
the request for that vm would be silently ignored. The vmpooler would
return 200 if the caller asked for multiple vms and the vmpooler was
able to checkout at least one vm.  Otherwise it would return 404.

- Similarly, if the caller requested a vm from a non-existent pool, then
the request was silently ignored.

This commit adds a `pool_names` Set to the config containing all valid
pool names including aliases. This is used to determine whether a
requested template name is valid or not. This is necessary because redis
does not distinguish between empty and non-existent sets, e.g. the
following returns false in both cases:

    backend.exists('vmpooler__ready__' + key)

If the caller requests a vm (single or multiple), and any vm references
an invalid pool name, we immediately return 404. Otherwise, we know the
request is for valid pool names, since the vmpooler requires a restart
to change pool names and counts.

We then attempt to acquire each vm, trying to match on pool name or
falling back to aliased pool name, as was the previous behavior.

The resulting behavior is:

- If the caller asks for at least one vm from an unknown pool, then
don't try to checkout any vms and respond with 404.
- If the caller asks for a vm, and at least one pool is empty, then
respond with 503, returning checked out vms back to the pool.
- Otherwise return 200 with the list of checked out vms.
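
The resulting behavior can be sketched as below. This is a simplified illustration, not the API code: `checkout` is a hypothetical function, the `ready` hash stands in for the redis ready sets, and alias fallback is left out.

```ruby
require 'set'

# requested:  array of pool names, one entry per VM asked for
# pool_names: Set of all valid pool names
# ready:      hash of pool name => array of ready VM hostnames
# Returns [http_status, checked_out_hostnames].
def checkout(requested, pool_names, ready)
  unknown = requested.reject { |name| pool_names.include?(name) }
  return [404, []] if requested.empty? || !unknown.empty?

  taken = []
  requested.each do |name|
    host = (ready[name] ||= []).shift
    if host.nil?
      # All or nothing: hand back everything taken so far.
      taken.reverse_each { |pool, vm| ready[pool].unshift(vm) }
      return [503, []]
    end
    taken << [name, host]
  end
  [200, taken.map { |_pool, vm| vm }]
end
```

Validating names against `pool_names` first is what lets an empty pool (503) be told apart from an unknown one (404).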

This commit also makes `alias` optional again.

This commit also re-enables tests that were merged in from master, but
originally commented out due to the bugs described above.

* (maint) Add json pessimistic pin

json 2.0.x was released on July 1 and is not compatible with ruby < 2.0.
Since we still support that version, add a pessimistic pin, which is
what we were using prior to July 1.

* [QENG-4070] Make json version conditional on RUBY_VERSION
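
A rough sketch of a version-conditional requirement follows. The helper name `json_requirement` and both constraint strings are assumptions for illustration; the actual pin used in the Gemfile is not stated here.

```ruby
require 'rubygems'

# Hypothetical helper: choose a json gem requirement from the Ruby version.
# The constraint strings are illustrative, not the project's real pin.
def json_requirement(ruby_version = RUBY_VERSION)
  if Gem::Version.new(ruby_version) < Gem::Version.new('2.0')
    '~> 1.8' # pessimistic pin keeps json 2.0.x off older rubies
  else
    '>= 1.8'
  end
end
```

In a Gemfile this could be used as `gem 'json', json_requirement`.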

* Drop extraneous mocks from updated test

* Revert "Revert "Merge pull request #155 from shermdog/RE-7014-cinext""

This reverts commit 0fd6fff934.

* Fix some spec errors

These were caused in part by dropping changes from the original PR when we
dropped the v1_spec.rb master test file (in favor of the updated and separated
versions).

* [QENG-4075] Fix bug with template name on allocation failure

We're returning [nil,nil] in this case, meaning that name will not be set. This
means we'll get an error trying to concatenate the stats string. Use the
requested template name here instead.

* [QENG-4075] Refactor statsd methods / classes

Prior to this we could easily run into situations where `statsd_prefix` would
be `nil` (and possibly the `statsd` handle itself). There was some significant
complexity and brittleness in how statsd was set up.

Refactored so that:

 - `statsd_prefix` is no longer exposed to any callers of statsd methods
 - there is now a `Vmpooler::DummyStatsd` class which can be returned when we are not actually going to publish stats, but would like to keep the calling interface consistent
 - setup of the statsd handle is via just passing in `config[:statsd]`, if `nil`, this will result in a dummy handle being returned
 - defaulting of `server` values was fixed -- this did not actually work in the previous implementation. `config[:statsd][:server]` is now required.
 - tests use a `DummyStatsd` instance instead of an rspec double.
 - calls to `statsd.increment` were taking incorrect arguments (some our fault, some part of the prior implementation), and were not collecting data on which pools were "invalid" or "empty". Fixed this and are now explicitly tracking the invalid/empty pool names.
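
The dummy-handle idea is the null object pattern. A minimal sketch, assuming a handle that responds to `increment`, `gauge` and `timing`; the class name `DummyMetrics` and the helper `record_checkout` are hypothetical and are not the classes added in this commit.

```ruby
# Hypothetical null-object metrics handle: it accepts the same calls as a
# real handle but publishes nothing.
class DummyMetrics
  def increment(_label)
    true
  end

  def gauge(_label, _value)
    true
  end

  def timing(_label, _duration)
    true
  end
end

# Callers never need to check whether metrics are configured.
def record_checkout(metrics, pool)
  metrics.increment("checkout.success.#{pool}")
end
```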

* [QENG-4075] Drop now-superfluous :statsd config defaulting

* [QENG-4075] Unify graphite and statsd for the pool manager

Prior to this, the `pool_manager.rb` library could take handles for both
graphite and statsd endpoints (which were considered mutually exclusive),
and then would use one. There was a bevy of conditional logic around sending
metrics to the graphite/statsd handles (and actually at least one bug of
omission).

Here we refactor more, building on earlier work:

 - Our graphite class comes into line with the API of our Statsd and DummyStatsd classes
 - In `pool_manager.rb` we now accept a single "metrics" handle, and we drop all the conditional logic around statsd vs. graphite
 - We move the inconsistent error handling out of the calling classes and into our metrics classes, actually logging to `$stderr` when we can't publish metrics
 - We unify the setup code to use `config` to determine whether statsd, graphite, or a dummy metrics handle should be used, and make that happen.
 - Cleaned up some tests. We could probably stand to do a bit more work in this area.
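
Choosing a single metrics handle from `config` might look like the sketch below. `metrics_backend` is a hypothetical helper that returns a symbol naming the backend instead of constructing a real handle, so the precedence rule can be shown in isolation.

```ruby
# Hypothetical helper: statsd wins over graphite; with neither configured,
# a dummy handle is used.
def metrics_backend(config)
  if config[:statsd]
    :statsd
  elsif config[:graphite]
    :graphite
  else
    :dummy
  end
end
```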

* [QENG-4075] Clean up pool manager, specs

Prior to this, `pool_manager.rb` allowed the `metrics` argument to be optional,
but at this point it will be an instance of `Vmpooler::Statsd`,
`Vmpooler::Graphite`, or `Vmpooler::DummyStatsd`, so making this non-optional.

Cleaned up that file's tests, cosmetically, as well as recognizing that the
behavioral difference between graphite and statsd does not depend on the pool
manager.

* [QENG-4075] update example vmpooler.yaml file

This documents the changes to :server being mandatory for all metrics
endpoints, as well as the graphite endpoint supporting an optional :port
configuration value.

* [QENG-4075] Rename usages of statsd -> metrics

Really, let's just support a generic metrics interface.

* (maint) move statsd-ruby require into Vmpooler::Statsd class

We've managed to move mentions of this out of the calling code, so let's
move the require.

* (maint) metrics.log -> metrics.timing

We missed this during the refactoring. Bringing this up to date.

* [QENG-4075] Allow specifying 'graphs:' for dashboard

Prior to this the dashboard front-end would use the configuration settings
for `graphite[:server]`/`graphite[:prefix]` to locate a graphite server
to use for rendering graphs.

Now that we have multiple possible metrics backends, the front-end graph
host for the dashboard could be entirely different from the back-end metrics
server that we publish to (if any).

This decouples those settings:

 - use `graphs[:server]` / `graphs[:prefix]` for the graphite-compatible web front-end to use for dashboard display graphs
 - fall back to `graphite[:server]`/`graphite[:prefix]` if `graphs` is not specified, in order to support legacy `vmpooler.yaml` configurations.

Note that since `statsd` takes precedence over `graphite`, it's possible to specify both `statsd` (for publishing) and `graphite` (for reading). We still prefer `graphs` over `graphite`.
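
The fallback can be sketched as follows. `graph_settings` is a hypothetical helper, and symbol keys are assumed to match the notation used above.

```ruby
# Hypothetical helper: prefer :graphs, fall back to :graphite for legacy
# configurations, and return nil when neither is present.
def graph_settings(config)
  source = config[:graphs] || config[:graphite]
  return nil if source.nil?

  { server: source[:server], prefix: source[:prefix] }
end
```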

Updated the example `vmpooler.yaml` config file.

* (maint) fix variable reference in new_metrics

This was referencing config directly, when what we want is for a
hash to be passed in (derived from config).

* (maint) Fix typo in updated graph link call

* (maint) default :graphs prefix to 'vmpooler'

* (maint) Fix parse error in vmpooler script

The things you find through manual QA 🧌

* (maint) use strings instead of symbols in config

Nested hash data comes back with string keys, not symbols. Be consistent.

* [QENG-4075] Factor out Vmpooler::DummyStatsd

This makes it visible to lib/vmpooler.rb, as well as putting this dummy
metrics endpoint in its own file for easier discovery.

* (maint) clean up statsd inclusion and require lines

The library is actually required as 'statsd' and not 'ruby-statsd', best I can tell.

* (maint) construct ::Statsd instead of Statsd

Because it's ambiguous in this scope, and, well, it doesn't
actually work in production.

* [QENG-4075] Also track completely invalid requests

When we don't even get a pool name we still want metrics to be recorded.
2016-07-25 10:43:32 -05:00
FOXX
4e2a1fb62c Added IP lookup functionality for /vm/hostname (#154) 2016-06-28 17:29:06 -05:00
Colin
d4f3eb3c5f Merge pull request #147 from sschneid/add_disk
Allow new disks to be added to running VMs via vmpooler API
2016-01-14 14:38:24 -08:00
Scott Schneider
48a1a8d621 Add new disks via API
Add an additional disk to a running VM via the vmpooler API.

````
$ curl -X POST -H X-AUTH-TOKEN:a9znth9dn01t416hrguu56ze37t790bl --url vmpooler.company.com/api/v1/vm/fq6qlpjlsskycq6/disk/8
````
````json
{
  "ok": true,
  "fq6qlpjlsskycq6": {
    "disk": "+8mb"
  }
}
````

Provisioning and attaching disks can take a moment, but once the task completes it will be reflected in a `GET /vm/<hostname>` query:

````
$ curl --url vmpooler.company.com/api/v1/vm/fq6qlpjlsskycq6
````
````json
{
  "ok": true,
  "fq6qlpjlsskycq6": {
    "template": "debian-7-x86_64",
    "lifetime": 2,
    "running": 0.08,
    "state": "running",
    "disk": [
      "+8mb"
    ],
    "domain": "delivery.puppetlabs.net"
  }
}
````
2016-01-14 10:46:57 -08:00
Scott Schneider
7d0f7254ae Disk-adding functionality for vsphere_helper lib
This commit adds the following functions:

- `add_disk`: the wrapper function to add a new disk to a VM

Usage is:

````
add_disk(vmname, disksize, datastore)
````

`vmname` is the name of the VM to add the disk to, `disksize` is the
disk size in MB, and `datastore` is the datastore on which to provision
the new disk.

`add_disk` required the addition of the following helper functions:

- `find_device`: locate a device object in vSphere
- `find_disk_controller`: find the disk controller used by a VM
- `find_disk_devices`: find the disk devices used by a VM
- `find_disk_unit_number`: find a free SCSI ID to assign to a new disk
- `find_vmdks`: find names of VMDK disks attached to a VM
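
For illustration, finding a free SCSI ID could be sketched as below. This is not the code from this commit: `free_unit_number` is a hypothetical name, and the sketch assumes 16 unit numbers per controller with unit 7 reserved for the controller itself.

```ruby
# Assumption: unit 7 is taken by the SCSI controller.
RESERVED_UNIT = 7

# Return the lowest unit number not already in use, or nil if none is free.
def free_unit_number(used, max_units = 16)
  (0...max_units).find { |n| n != RESERVED_UNIT && !used.include?(n) }
end
```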
2016-01-14 10:35:22 -08:00
FOXX
10e507c262 Added prefix parameter to the vmpooler configuration 2016-01-14 11:42:12 -06:00
Colin
ad4e760f56 Merge pull request #109 from sschneid/dashboard2
An updated dashboard
2016-01-13 15:15:56 -08:00
Scott Schneider
5f787a3ca7 dashboard2 2016-01-13 12:01:05 -08:00
Scott Schneider
1a6cd99ed2 Merge pull request #139 from heathseals/extraconfig
add guestinfo.hostname to VirtualMachineConfigSpecs
2015-11-13 10:13:55 -08:00
Scott Schneider
20fa7d20be Merge pull request #138 from sschneid/qeng_2807
(QENG-2807) Allow pool 'alias' names
2015-11-13 09:24:48 -08:00
Heath Seals
6b9bcc4307 add guestinfo.hostname to VirtualMachineConfigSpecs
This commit adds a custom guestinfo keyword and hostname variable
that allows the VMware Tools to query the hostname.
2015-11-10 16:50:27 -08:00
Scott Schneider
17b24d69ad Allow pool 'alias' names
The following pool configuration would allow a pool to be aliased in POST
requests as 'centos-6-x86_64', 'centos-6-amd64', or 'centos-6-64':

````yaml
- name: 'centos-6-x86_64'
  alias: [ 'centos-6-amd64', 'centos-6-64' ]
  template: 'templates/centos-6-x86_64'
  folder: 'vmpooler/centos-6-x86_64'
  datastore: 'instance1'
  size: 5
````

The 'alias' configuration can be either a string or an array.
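
One way to accept either form is Ruby's `Array()` conversion, sketched here under the assumption that pools are hashes with string keys. `pool_aliases` and `resolve_pool` are hypothetical helper names.

```ruby
# Array() turns nil into [], a string into a one-element array, and
# leaves an array unchanged.
def pool_aliases(pool)
  Array(pool['alias'])
end

# Return the canonical pool name for a requested name or alias, or nil.
def resolve_pool(pools, requested)
  match = pools.find do |pool|
    pool['name'] == requested || pool_aliases(pool).include?(requested)
  end
  match && match['name']
end
```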

Note that even when requesting an alias, the pool's 'name' is returned in
the JSON response:

````
$ curl -d '{"centos-6-64":"1"}' --url vmpooler/api/v1/vm
````
````json
{
  "ok": true,
  "centos-6-x86_64": {
    "hostname": "cuna2qeahwlzji7"
  },
  "domain": "company.com"
}
````
2015-11-05 11:51:53 -08:00
Scott Schneider
d74c9ff512 Don't require username/password authentication for GET /token/:token route 2015-11-04 13:19:15 -08:00
Scott Schneider
e0356968df (QENG-2995) Display associated VMs in GET /token/:token endpoint 2015-11-04 12:35:35 -08:00
Colin
7b9b178861 (MAINT) Remove Ping Check on Running VMs
Prior to this commit, a running VM could fail a ping check and be
destroyed. This causes issues when network hiccups occur or the machine
is performing a reboot.

A VM that is in a ready state will now be destroyed when handed back or
it hits the lifetime TTL.
2015-10-02 13:03:48 -07:00
Colin
b8bdfe1301 (maint) Move VM Only When SSH Check Succeeds
An SSH check was added before moving a VM from pending to ready.
However, the result of that check did not matter and move_pending would
still be called. This moves the move_pending call to within the begin
block that holds the SSH check. If the check fails, then only
fail_pending will be called.
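
The control-flow change can be sketched like this. `promote_pending` and its callable arguments are hypothetical stand-ins for the SSH check, move_pending and fail_pending.

```ruby
# check is expected to raise when SSH is unreachable. on_ready sits inside
# the same begin block, so it runs only when the check succeeds.
def promote_pending(vm, check, on_ready, on_fail)
  begin
    check.call(vm)
    on_ready.call(vm)
  rescue StandardError
    on_fail.call(vm)
  end
end
```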
2015-09-17 12:54:29 -07:00
Scott Schneider
5b6985c3a7 (QENG-2952) Check that SSH is available
SSH should be available before a VM is moved from the 'pending' queue to
'ready'.

`check_ssh` should probably be a function in the tradition of DRY; I'm
going to hopefully follow up this PR with a `Vmpooler::Utility` library.
2015-09-17 11:12:51 -07:00
Scott Schneider
906ae89987 Remove duplicate (nested) "ok" responses
As we approach an "official" v1.0.0 of the API I'd like to remove some old
nested "ok" responses.  These were left in as the Beaker vmpooler
hypervisor used them, but I long-ago patched that code and I think it's
time to deprecate these.
2015-08-21 13:58:07 -07:00
Scott Schneider
89ce70dba9 Track token use times
* rename the Redis token 'timestamp' var to 'created'
* update the Redis token 'last' var when token is successfully validated
* expose the Redis token 'last' var in GET /token route
2015-08-20 19:54:59 -07:00
Scott Schneider
492cfb06a3 List tokens via GET /token 2015-08-20 18:50:51 -07:00
Colin
acb95d34c8 (MAINT) Reduce redis Calls in API
The return values from most redis calls inform the caller of whether a
key or hash value exists. Several exists() calls can be removed in
favor of this approach.

Updated spec tests to account for a removal of exists() and ismember()
calls in API tests.
2015-07-28 14:47:01 -07:00
Scott Schneider
add88c7bba (QENG-1304) vmpooler should require an auth key for VM destruction 2015-07-28 12:03:14 -07:00
Scott Schneider
85aad61192 Fix snapshot revert functionality 2015-07-16 11:41:00 -07:00
Scott Schneider
1689133b19 Require an auth token to use snapshots 2015-07-16 10:59:30 -07:00
Scott Schneider
fe65d5b11b Merge branch 'master' into host_snapshots 2015-07-16 10:42:16 -07:00
Scott Schneider
1c3045fd65 Host snapshot functionality 2015-07-16 10:29:49 -07:00
Scott Schneider
821ffd866a Log empty pools
Make a note in the logfile when a pool is detected to be empty.

Also:

- vmpooler__empty__<pool> Redis key to determine when to log
- lifetime/TTL checks moved to `_check_running_vm` method
  - no longer pay attention to VMware-based 'host.runtime.bootTime'

This PR implements a bunch of other stuff to account for rspec testing:

- Thread creation and looping in `check_pool`
- Everything else in `_check_pool`
2015-07-07 11:12:46 -07:00
Scott Schneider
c720f12c05 Move tag-filtering and exporting to API helper methods 2015-06-30 19:45:16 -07:00
Scott Schneider
6523062b62 Allow for only a [configurable] tag set 2015-06-30 12:54:46 -07:00
Scott Schneider
3aa8389749 Discard/skip empty tags 2015-06-30 11:20:13 -07:00
Colin
b6cb20ba9f Merge pull request #108 from sschneid/api_summary_reorg
API summary rework
2015-06-08 11:28:45 -07:00
Scott Schneider
d3f4f6fb77 Rerouting for new /summary routes 2015-06-04 14:55:26 -07:00
Scott Schneider
ce05c94677 Generate summaries from helpers; individual routes
- '/summary*' routes are now generated from helper methods
- many '/summary/...' combinations now possible
  - '/summary/tag'
  - '/summary/tag/beaker_version'
  - '/summary/boot'
  - '/summary/boot/duration'
  - '/summary/clone'
  - '/summary/clone/count?from=2015-06-01'
  - etc.
2015-06-04 14:55:19 -07:00
Scott Schneider
d938a50ee8 Add get_tag_summary and get_task_summary helpers 2015-06-04 10:39:07 -07:00
Scott Schneider
1f62379be8 Only filter regex matches
and a spec test for it.

Previously using the example shown in vmpooler.yaml.example was failing
to tag strings WITHOUT a '/' in them.
2015-06-02 19:12:30 -07:00
Scott Schneider
4bed6edde4 This implements regex-based tag filtering 2015-06-02 10:53:14 -07:00
Roger Ignazio
ae91077494 Merge pull request #104 from colinPL/qeng_2360
(QENG-2360) check_running_vm Spec Tests
2015-05-19 15:06:46 -07:00
Colin
dec95ba693 (QENG-2360) check_running_vm Spec Tests
Add spec tests for pool_manager#check_running_vm. In the process of
writing these tests, the method was broken into smaller methods for
testability reasons.
2015-05-19 10:23:31 -07:00
Scott Schneider
4cfc078684 Create daily tag indexes, report in /summary
- Store daily tag roll-ups in vmpooler__tag__<date>
- GET /summary will display daily tag counts and roll-up
2015-05-07 15:24:08 -07:00
Colin
640b1ef4da Merge pull request #101 from sschneid/token_metadata_in_vm_obj
Store token metadata in vmpooler__vm__ Redis hash
2015-05-06 13:33:01 -07:00
Scott Schneider
64bbd7c973 Display VM state in GET /vm/:hostname route 2015-04-30 19:38:31 -07:00
Scott Schneider
7bddfdef1b Store token metadata in vmpooler__vm__ Redis hash 2015-04-30 19:29:18 -07:00