High Level Options:
(James is leaning toward #3 as the best fit for testers/builders, which all operate in short time windows and then never touch the data at scale again)
#1: Replicate everything as the artifacts are created (assuming they are above a particular size) in N regions and expire them all at once.
#2: Replicate artifacts only as they are requested from other regions:
This method would work by keeping track of how many requests we get for a particular artifact in a given region. For example:
- Request #1
-> we decide to initiate the replication
-> serve from the "default" bucket in whatever region that is
- Request #2
-> we find a pending replication in our desired region (not ready yet)
-> serve from the "default" bucket in whatever region that is
- Request #3
-> we find the completed replication
-> we serve from our desired region
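A rough sketch of what the per-request bookkeeping for #2 could look like (Go, untested; the in-memory store and the bucket URL helpers are placeholders for whatever "storage" table and buckets we actually end up using):

  package replication

  import (
      "net/http"
      "sync"
  )

  // replica tracks whether the copy into a given region has finished.
  type replica struct{ ready bool }

  // Placeholder for the real "storage" table.
  var (
      mu    sync.Mutex
      store = map[string]*replica{} // keyed by artifact + "/" + region
  )

  func defaultBucketURL(artifact string) string { return "https://default-bucket.s3.amazonaws.com/" + artifact }
  func regionBucketURL(region, artifact string) string {
      return "https://replica-" + region + ".s3.amazonaws.com/" + artifact
  }

  // GetArtifact implements the three-request flow described above.
  func GetArtifact(w http.ResponseWriter, r *http.Request, artifact, region string) {
      mu.Lock()
      rep, found := store[artifact+"/"+region]
      if !found {
          // Request #1: decide to initiate the replication
          // (a background worker would perform the copy and flip ready to true).
          store[artifact+"/"+region] = &replica{}
      }
      mu.Unlock()

      if !found || !rep.ready {
          // Requests #1 and #2: no replica yet, or replication still pending;
          // serve from the "default" bucket in whatever region that is.
          http.Redirect(w, r, defaultBucketURL(artifact), http.StatusFound)
          return
      }
      // Request #3 onwards: replication completed; serve from the desired region.
      http.Redirect(w, r, regionBucketURL(region, artifact), http.StatusFound)
  }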
#3: Optimistically replicate into N regions (based on some criteria) and expire if the region becomes "cold" or the artifact is expired.
When creating artifacts we either implicitly (via S3 region sync) or explicitly (through AWS events for S3) copy artifacts to N regions. Insert one record for each region into "storage" with an expiration date (this can be as low as an hour).
When a user requests an artifact:
-> check for the artifact in the desired region
   a. no artifact: serve from the "default" region
   b. artifact found: serve from the desired region
-> increase the expiration time in the record set for this region/artifact combination
This also requires a background process which expires the region-specific caches of artifacts. It simply looks for records whose expiration date is prior to the current date.
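A minimal sketch of #3's read path and the background expiry pass (Go, untested; the in-memory map stands in for the "storage" records, the one-hour bump is just an example TTL, and the actual S3 delete of the regional copy is left out):

  package optimistic

  import (
      "net/http"
      "sync"
      "time"
  )

  // record is one "storage" row for an (artifact, region) pair.
  type record struct{ expires time.Time }

  var (
      mu      sync.Mutex
      storage = map[string]*record{} // keyed by artifact + "/" + region
  )

  const bump = time.Hour // example: how far each hit pushes the expiration out

  func defaultRegionURL(artifact string) string { return "https://default-bucket.s3.amazonaws.com/" + artifact }
  func regionURL(region, artifact string) string {
      return "https://replica-" + region + ".s3.amazonaws.com/" + artifact
  }

  // GetArtifact checks for a live replica in the desired region and extends its life on use.
  func GetArtifact(w http.ResponseWriter, r *http.Request, artifact, region string) {
      mu.Lock()
      rec, ok := storage[artifact+"/"+region]
      live := ok && rec.expires.After(time.Now())
      if live {
          rec.expires = time.Now().Add(bump) // b. found: extend expiration for this region/artifact
      }
      mu.Unlock()

      if !live {
          // a. no (live) replica: serve from the "default" region.
          http.Redirect(w, r, defaultRegionURL(artifact), http.StatusFound)
          return
      }
      // b. replica found: serve from the desired region.
      http.Redirect(w, r, regionURL(region, artifact), http.StatusFound)
  }

  // ExpireSweep is the background process: drop records that expired before now.
  func ExpireSweep() {
      mu.Lock()
      defer mu.Unlock()
      for key, rec := range storage {
          if rec.expires.Before(time.Now()) {
              delete(storage, key)
          }
      }
  }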
E) Replicate with S3 (region sync)
We set up region sync; the regions we sync to have a lifecycle policy of, say, 48 hours.
Whenever we handle GET artifact, we do a HEAD to the replica in the desired region; if it's
there we redirect to the replica in the desired region, otherwise we fall back to us-west-2.
Downside: once the 48-hour lifecycle expires a replica it never comes back, since replication only happens at upload time.
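A small sketch of E)'s GET handling, assuming the replica objects are publicly readable so a plain HTTP HEAD works (signed artifacts would need a HEAD through the S3 API instead; the bucket names are made up):

  package regionsync

  import "net/http"

  // GetArtifact redirects to the regional replica if it still exists, otherwise to us-west-2.
  func GetArtifact(w http.ResponseWriter, r *http.Request, key, region string) {
      replica := "https://replica-" + region + ".s3.amazonaws.com/" + key
      source := "https://source-bucket.s3-us-west-2.amazonaws.com/" + key

      resp, err := http.Head(replica)
      if err == nil {
          resp.Body.Close()
          if resp.StatusCode == http.StatusOK {
              // The lifecycle policy has not expired this replica yet.
              http.Redirect(w, r, replica, http.StatusFound)
              return
          }
      }
      // No replica (or it already expired): fall back to the source bucket in us-west-2.
      http.Redirect(w, r, source, http.StatusFound)
  }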
F) Replicate with S3 COPY (on first GET request)
On GET artifact:
  if request is from another region:
    if artifactEntity.regions[region].expires > now():
      // replica still exists in the desired region
      redirect to desired region replica
    else:
      if artifactEntity.regions[region].replicaRequested == false:
        // this pattern avoids multiple S3 COPY operations (well, limits them a lot for sure)
        azure_queue.put_message(replica_queue, artifactEntity, region)
        artifactEntity.modify ->
          artifactEntity.regions[region].replicaRequested = true
      redirect to us-west-2
In background worker:
  artifactEntity, region = azure_queue.get_message(replica_queue)
  S3 COPY(artifactEntity.source, region)
  artifactEntity.modify ->
    artifactEntity.regions[region].expires = now() + S3_configured_lifecycle
    artifactEntity.regions[region].replicaRequested = false
  azure_queue.delete_message(replica_queue, artifactEntity, region)
G) Replicate in an "s3-copy-proxy" component
On GET /<bucket>/<key>:
  detect region from request
  replicaEntity = ReplicaEntity.load(bucket, key) // or create an empty one
  if replicaEntity.regions[region].expires > now() + 15 min:
    redirect to replicaEntity.regions[region].bucket + / + <key>
  else:
    // Employ long-polling tricks to keep the connection from timing out:
    // https://github.com/extrabacon/http-delayed-response
    // Perhaps modify replicaEntity so only one reader replicates it and the other readers just poll,
    // waiting for the replicating reader to finish the S3_COPY
    S3_COPY(<bucket>/<key>, replicaEntity.regions[region].bucket + / + <key>)
    replicaEntity.modify ->
      replicaEntity.regions[region].expires = now() + S3_configured_lifecycle
    redirect to replicaEntity.regions[region].bucket + / + <key>
Useful info:
- S3 lifecycle policy (to expire replicas)
- eventual consistency
------------------ Temporary (Proxy) Solutions ------------------
Need info:
read-after-write consistency in us-east-1: NOT POSSIBLE!
A) Querystring redirecting proxy (backed by disk):
For IPs in us-east-1 the queue redirects to:
<proxy-host>.com/?fetch=<s3-url>
Problems:
- if <s3-url> is a signed URL... we have a problem. Won't work for secret artifacts
- Can't really see how to make any proxy scheme work for secret artifacts
- Maybe we could validate S3 signed URL :)
- Not scalable...
- Not sure ?fetch= works well with existing proxy schemes...
B) Transparent proxy at docker-worker level:
Probably bad... same as above....
C) Querystring redirecting proxy (backed by S3):
- scalable
- won't work in us-east-1
- Might as well be implemented on the server side... i.e. in the queue.
D) Put CloudFlare in front of S3
- Equivalent of using CloudFront
- Free (not metered by GB)
- We only pay for the GB transferred out of S3 to CloudFlare
- Unreliable... Docker certainly had issues with CloudFlare...
- Worth testing :)
From their site:
The maximum file size CloudFlare's CDN will cache is 512mb.
https://www.cloudflare.com/plans
https://support.cloudflare.com/hc/en-us/articles/200172736-Is-CloudFlare-a-free-CDN-Content-Delivery-Network-
--- Further Ideas ---
Nginx + S3 Upload Proxy
----------------------------------
Both implicit and explicit copies are slow and come at some cost (S3 COPY and automatic cross-region transfer). The approach I am proposing next combines the nginx proxy (which we already tried) with a Go HTTP layer that uploads to S3 _directly_ as it pulls from the "source" region (all while serving content back to nginx). The important bit here is using the Go layer to limit the amount of outbound traffic for the same objects: as we saw in the other proxy attempts, we get overloaded simply by trying to download the same objects over and over.
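A rough Go sketch of that layer, just to pin down its shape: one in-flight fetch per key, streaming the body back to nginx and into the replica upload at the same time. The Uploader interface, the bucket URLs, and the "redirect followers straight to the source" choice are all assumptions, not settled design.

  package copyproxy

  import (
      "io"
      "net/http"
      "sync"
  )

  // Uploader abstracts "stream this object into the replica bucket" (e.g. an S3 multipart upload).
  type Uploader interface {
      Upload(key string, body io.Reader) error
  }

  type Proxy struct {
      SourceURL string // e.g. "https://source-bucket.s3-us-west-2.amazonaws.com"
      Uploader  Uploader

      mu       sync.Mutex
      inflight map[string]bool // keys currently being pulled from the source region
  }

  func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
      key := r.URL.Path

      p.mu.Lock()
      if p.inflight == nil {
          p.inflight = map[string]bool{}
      }
      busy := p.inflight[key]
      if !busy {
          p.inflight[key] = true
      }
      p.mu.Unlock()

      if busy {
          // Another request is already pulling this object; send this client straight to the
          // source region instead of fetching the same bytes through the proxy again.
          // (A fancier version could wait for the upload to finish and redirect to the replica.)
          http.Redirect(w, r, p.SourceURL+key, http.StatusFound)
          return
      }
      defer func() {
          p.mu.Lock()
          delete(p.inflight, key)
          p.mu.Unlock()
      }()

      resp, err := http.Get(p.SourceURL + key)
      if err != nil {
          http.Error(w, "fetch from source region failed", http.StatusBadGateway)
          return
      }
      defer resp.Body.Close()
      if resp.StatusCode != http.StatusOK {
          http.Error(w, "fetch from source region failed", http.StatusBadGateway)
          return
      }

      // Tee the body: every byte goes back to nginx/the client and into the replica upload.
      pr, pw := io.Pipe()
      var wg sync.WaitGroup
      wg.Add(1)
      go func() {
          defer wg.Done()
          if err := p.Uploader.Upload(key, pr); err != nil {
              io.Copy(io.Discard, pr) // keep draining so the client copy is never blocked
          }
      }()
      _, copyErr := io.Copy(io.MultiWriter(w, pw), resp.Body)
      pw.CloseWithError(copyErr)
      wg.Wait()
  }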