Agenda:
- Templating LaunchSpecification
- Using overwrites for regions and instanceTypes
- default launchSpecification
- AMI
- Security group (ssh-only, inbound http)
- userData: {data, extra}
- workerType.extra
- workerType.regions[region].extra
- workerType.instanceTypes[instanceType].extra
- extra = _.defaults({}, instance..., regions[r].extra, extra)
- workerType.scopes -> used to create temp creds in data
- for migration, insert .data things into the UserData twice, once as
- Injecting secret data using callback with token from user-data
- Timeout for spot-requests
- Expiration of unclaimed secret data
- Generation of temporary credentials (use '*' scope for now)
- deletion of secret instance data by callback
- Scale down strategy
- Can we track capacity in use? (jonasfj thinks not)
- It should be tracked by a separate new system which lives outside the provisioner
- Adjust polling interval based on the time remaining until the end of the billing cycle
- poll slower towards the end of the billing cycle
- Hard kill nodes that fail to launch
- have a scanner on the secret data table that finds expired secrets and hard kills any associated instances or spot requests
- secretDataTable: {spotRequestId, ...}
- In a second pass, cancel spot requests 10 minutes before they expire
- Max lifetime of workers: 72 hours, hard killed after 96 hours
- we won't make this configurable yet
- worker maintains 72 hour kill
- provisioner maintains 96
- Soft terminate workers after workerType is updated
- pulse exchange worker-type-updated
- Hard kill old instances 1 hour after workerType update
- add lastModified to workerType data table
- AMI Management will happen outside the provisioner
- the tool that creates the Packer images will update the workerType through the provisioner API
- add provisioner endpoint to list all AMIs in all regions
- maybe the provisioner could keep stats on when it provisions an instance with a specific AMI, so we know which AMIs are old
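The polling-interval items above could be sketched as below. This is a minimal illustration, assuming a one-hour billing cycle measured from launch and two hypothetical bounds (`MIN_INTERVAL_MS`, `MAX_INTERVAL_MS`); `pollingInterval` is an illustrative name, not provisioner API. Following the note literally, the interval grows as the end of the cycle approaches.

```javascript
// Hypothetical sketch: pick a polling interval from the position within the
// current billing cycle. ASSUMPTION: one-hour cycles measured from launch.
const BILLING_CYCLE_MS = 60 * 60 * 1000;
const MIN_INTERVAL_MS = 30 * 1000;     // assumed: poll fast early in the cycle
const MAX_INTERVAL_MS = 5 * 60 * 1000; // assumed: poll slowly near the end

function pollingInterval(launchTime, now) {
  const elapsed = now.getTime() - launchTime.getTime();
  // Fraction of the current cycle already used, in [0, 1); the double modulo
  // keeps the result non-negative even if clocks are slightly skewed.
  const used =
    (((elapsed % BILLING_CYCLE_MS) + BILLING_CYCLE_MS) % BILLING_CYCLE_MS) /
    BILLING_CYCLE_MS;
  // Linear interpolation: the closer to the end of the cycle, the slower we poll.
  return Math.round(MIN_INTERVAL_MS + used * (MAX_INTERVAL_MS - MIN_INTERVAL_MS));
}
```

Linear interpolation keeps the sketch short; stepping between a few fixed intervals would work just as well.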
BUGS:
1) 1145687 Insert lastModified into stored WorkerTypes
2) 1145692 rename 'overwrites', create userData and secrets keys in workerTypes and store as cascading-overwrite objects
3) 1145695 store secret data for use in workerType
4) 1145685 hard kill nodes which were created >96 hours ago
5) 1145688 hard kill nodes created before lastModified, once more than one hour has passed since lastModified
6) Store provisionerId/workerType info in UserData.data as well as in the top-level UserData
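The lifetime rules from the agenda (72/96 hours, hard kill of old instances 1 hour after a workerType update) and bugs 4 and 5 amount to a per-instance decision. A minimal sketch, assuming each instance record has a hypothetical `launchTime` Date and the workerType carries `lastModified`, and reading bug 5 as "created before lastModified, with lastModified more than an hour old". How a soft terminate is delivered (the notes mention a worker-type-updated pulse exchange) is out of scope here.

```javascript
// Hypothetical sketch of the lifetime rules in these notes.
// The worker is expected to stop itself after 72 hours; the provisioner
// only enforces the 96-hour hard limit.
const HOUR_MS = 60 * 60 * 1000;
const MAX_LIFETIME_MS = 96 * HOUR_MS;
const UPDATE_GRACE_MS = 1 * HOUR_MS;

// Returns "hard-kill", "soft-terminate" or "keep" for a single instance.
function lifecycleAction(instance, workerType, now) {
  const age = now - instance.launchTime; // Date - Date yields milliseconds
  if (age > MAX_LIFETIME_MS) {
    return "hard-kill";
  }
  // Instances launched before the last workerType update are stale.
  if (instance.launchTime < workerType.lastModified) {
    const sinceUpdate = now - workerType.lastModified;
    return sinceUpdate > UPDATE_GRACE_MS ? "hard-kill" : "soft-terminate";
  }
  return "keep";
}
```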
Example: definition of workerType as we want it:
{
  "workerType": "test",
  "launchSpecification": {
    "SecurityGroups": [
      "default" // We use this for ssh-only and opening-http
    ]
  },
  // this key is added by the provisioner API
  "lastModified": new Date(),
  "userData": {...},
  "secrets": {...},
  "scopes": [...],
  "minCapacity": 1,
  "maxCapacity": 30,
  "scalingRatio": 1.0,
  "minPrice": 0.2, // min price per utility factor
  "maxPrice": 1, // max price per utility factor
  "canUseOndemand": false,
  "canUseSpot": true,
  "instanceTypes": [
    {
      "instanceType": "m3.medium",
      "capacity": 1,
      "utility": 1,
      "instanceUserData": {...<overwriting keys>...},
      "instanceSecrets": {},
      "instanceLaunchSpec": {}
    }
  ],
  "regions": [
    {
      "region": "us-west-1",
      "regionUserData": {...},
      "regionSecrets": {},
      "regionLaunchSpec": {
        "ImageId": "ami-42908907"
      }
    }
  ]
}
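The cascading overwrite of userData and secrets (the agenda's `_.defaults({}, instance..., regions[r].extra, extra)`) can be sketched as follows. The `defaults` helper is a simplified stand-in written for illustration; it mirrors lodash's `_.defaults` precedence (earlier sources win, shallow copy) for plain objects. `resolveCascade` is a hypothetical name and assumes the `instance<Key>`/`region<Key>` naming used in the definition above.

```javascript
// Simplified stand-in for lodash's _.defaults: a key is copied from a source
// only if the target has no defined value yet, so earlier sources win.
// Like _.defaults, this is a shallow merge.
function defaults(target, ...sources) {
  for (const source of sources) {
    if (source == null) continue; // skip missing levels
    for (const key of Object.keys(source)) {
      if (target[key] === undefined) {
        target[key] = source[key];
      }
    }
  }
  return target;
}

// Resolve "userData" or "secrets" for one region/instance type:
// instance type overrides region, region overrides the workerType default.
function resolveCascade(workerType, regionName, instanceTypeName, key) {
  const region = workerType.regions.find(r => r.region === regionName);
  const instanceType = workerType.instanceTypes.find(
    t => t.instanceType === instanceTypeName);
  if (!region || !instanceType) {
    throw new Error(`unknown region or instance type: ${regionName}/${instanceTypeName}`);
  }
  // "userData" -> "instanceUserData"/"regionUserData", "secrets" -> "instanceSecrets"/...
  const suffix = key.charAt(0).toUpperCase() + key.slice(1);
  return defaults(
    {},
    instanceType["instance" + suffix],
    region["region" + suffix],
    workerType[key]
  );
}
```

Starting from a fresh `{}` ensures the stored workerType is never mutated by the merge.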
UserData as given to instance:
{
  provisionerBaseUrl: ..., // to get/delete secretData
  secretToken: ...,        // to get/delete secretData
  data: {},                // opaque
  capacity: ...
}
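Building this UserData could look like the sketch below. It assumes the document is JSON, base64-encoded for the EC2 launch specification (the EXISTING format stores UserData that way: `eyJhIjoxfQ==` decodes to `{"a":1}`), and it copies provisionerId/workerType into both the top level and `data`, as the migration bug asks. `buildUserData`, `encodeUserData` and `decodeUserData` are hypothetical names.

```javascript
// Hypothetical sketch: assemble the UserData handed to an instance.
// provisionerId/workerType go both at the top level and inside `data`
// (migration note); they override same-named keys already in `data`.
function buildUserData({ provisionerBaseUrl, secretToken, data, capacity,
                         provisionerId, workerType }) {
  return {
    provisionerId,
    workerType,
    provisionerBaseUrl, // used with secretToken to get/delete secret data
    secretToken,
    capacity,
    data: Object.assign({}, data, { provisionerId, workerType }),
  };
}

// EC2 expects UserData base64-encoded; here it is base64 of the JSON text.
function encodeUserData(userData) {
  return Buffer.from(JSON.stringify(userData), "utf8").toString("base64");
}

function decodeUserData(encoded) {
  return JSON.parse(Buffer.from(encoded, "base64").toString("utf8"));
}
```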
SecretData as given to the instance in exchange for its secret token:
{
  data: { /* JSON hardcoded in workerType definition */ },
  credentials: { // temp credentials
    clientId,
    accessToken,
    certificate
  },
  spotBid: ...
}
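The secret-data lifecycle from the agenda (store with an expiration, fetch and delete via callback with the token from UserData, scan for unclaimed secrets) could be sketched with an in-memory table. `SecretTable` is a hypothetical stand-in for the real secretDataTable; it only tracks `spotRequestId`, so mapping that to instance IDs for the actual hard kill is left out. The notes describe the 10-minute spot-request cancellation as a second pass; this sketch folds both checks into one loop.

```javascript
const crypto = require("crypto");

const TEN_MINUTES_MS = 10 * 60 * 1000;

// Hypothetical in-memory stand-in for the secret data table, keyed by token.
class SecretTable {
  constructor() {
    this.rows = new Map();
  }

  // Store secret data for a new spot request; the returned token goes into UserData.
  insert({ spotRequestId, secretData, expiration }) {
    const token = crypto.randomBytes(32).toString("hex");
    this.rows.set(token, { spotRequestId, secretData, expiration });
    return token;
  }

  // Worker callback: fetch secret data using the token from its UserData.
  get(token, now) {
    const row = this.rows.get(token);
    if (!row || row.expiration <= now) return null;
    return row.secretData;
  }

  // Worker callback: delete the secret data once it has been claimed.
  remove(token) {
    return this.rows.delete(token);
  }

  // Scanner: rows still present were never claimed. Expired rows mean the node
  // failed to launch (hard kill and drop the row); rows within 10 minutes of
  // expiry get their spot request cancelled.
  scan(now) {
    const hardKill = [];
    const cancel = [];
    for (const [token, row] of this.rows) {
      if (row.expiration <= now) {
        hardKill.push(row.spotRequestId);
        this.rows.delete(token); // deleting during Map iteration is safe
      } else if (row.expiration - now <= TEN_MINUTES_MS) {
        cancel.push(row.spotRequestId);
      }
    }
    return { hardKill, cancel };
  }
}
```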
EXISTING:
{
  "workerType": "test",
  "launchSpecification": {
    "SecurityGroups": [
      "default"
    ],
    "UserData": "eyJhIjoxfQ=="
  },
  "minCapacity": 1,
  "maxCapacity": 30,
  "scalingRatio": 1.0,
  "minPrice": 0.2,
  "maxPrice": 1,
  "canUseOndemand": false,
  "canUseSpot": true,
  "instanceTypes": [{
    "instanceType": "m3.medium",
    "capacity": 1,
    "utility": 1,
    "overwrites": {
      "UserData": "eyJhIjoxfQ=="
    }
  }],
  "regions": [{
    "region": "us-west-1",
    "overwrites": {
      "ImageId": "ami-42908907"
    }
  }]
}
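Renaming `overwrites` (bug 2) could be done with a one-off migration from this EXISTING shape to the proposed one, and once each region's AMI lives in `regionLaunchSpec`, listing AMIs per region (the agenda's AMI-listing endpoint) becomes a simple walk over workerTypes. A minimal sketch, assuming every UserData value is base64-encoded JSON as in the example above; `migrateWorkerType` and `listAmis` are hypothetical names, and no HTTP endpoint is sketched.

```javascript
// ASSUMPTION: UserData values are base64-encoded JSON ("eyJhIjoxfQ==" is {"a":1}).
function decodeJsonUserData(encoded) {
  return encoded === undefined
    ? {}
    : JSON.parse(Buffer.from(encoded, "base64").toString("utf8"));
}

// Split a launch-spec fragment into decoded UserData and the remaining keys.
function splitUserData(spec = {}) {
  const { UserData, ...rest } = spec;
  return { userData: decodeJsonUserData(UserData), launchSpec: rest };
}

// Convert an EXISTING-format workerType into the proposed format.
function migrateWorkerType(old) {
  const { launchSpecification, instanceTypes, regions, ...rest } = old;
  const top = splitUserData(launchSpecification);
  return {
    ...rest,
    launchSpecification: top.launchSpec,
    userData: top.userData,
    secrets: {}, // the old format has no secrets
    instanceTypes: instanceTypes.map(({ overwrites, ...it }) => {
      const s = splitUserData(overwrites);
      return { ...it, instanceUserData: s.userData, instanceSecrets: {},
               instanceLaunchSpec: s.launchSpec };
    }),
    regions: regions.map(({ overwrites, ...r }) => {
      const s = splitUserData(overwrites);
      return { ...r, regionUserData: s.userData, regionSecrets: {},
               regionLaunchSpec: s.launchSpec };
    }),
  };
}

// Collect AMIs referenced by new-format workerTypes, grouped by region.
function listAmis(workerTypes) {
  const byRegion = {};
  for (const wt of workerTypes) {
    for (const r of wt.regions) {
      const ami = (r.regionLaunchSpec || {}).ImageId ||
                  (wt.launchSpecification || {}).ImageId;
      if (!ami) continue;
      byRegion[r.region] = byRegion[r.region] || new Set();
      byRegion[r.region].add(ami);
    }
  }
  // Sorted arrays give a stable, JSON-friendly result.
  const result = {};
  for (const [region, amis] of Object.entries(byRegion)) {
    result[region] = [...amis].sort();
  }
  return result;
}
```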