Nona Blog

Converting images to WebP from CDN

The rise of WebP: A new image format for the web

The WebP format has become increasingly popular since Google introduced it in 2010. Its biggest selling point is its ability to produce much smaller files while maintaining similar image quality. Smaller files mean faster load times, and faster load times = higher conversion rates.

WebP is a modern image format that provides superior lossless and lossy compression for images on the web. WebP lossless images are 26% smaller in size compared to PNGs. WebP lossy images are 25–34% smaller than comparable JPEG images at equivalent SSIM quality index.

Google
Source: https://bitsofco.de/why-and-how-to-use-webp-images-today/

Tooling and considerations

Tooling: AWS (S3, CloudFront as our CDN, Lambda@Edge), Sharp, and the useragent npm package

There are a few considerations we have to make before getting to the code:

  1. Firstly, not all browsers support WebP. At the time of writing, WebP is natively supported in recent versions of Google Chrome, Firefox, Edge, Opera, Android Browser and Samsung Internet.
  2. We may already have hundreds or thousands of images that we want to convert to WebP on demand, when a supporting browser requests them.
  3. We have to modify the HTTP request and response objects as they pass through the CDN.
WebP support table: https://caniuse.com/#feat=webp

The plan: On-the-fly conversion

We’re going to listen for requests to the CDN and return a WebP image to every supporting browser, provided that a WebP image exists. Otherwise, we’ll fetch the image in its original format, convert it to WebP and return the newly converted image.

It’s going to be…

LEGENDARY!

CDN requests and responses

On top of the considerations, we have to understand what the CDN request and response objects look like:

Figure: the four CloudFront events that can trigger Lambda@Edge functions (viewer request, origin request, origin response and viewer response)

We’ll trigger our Lambdas on the viewer request and origin response events. We use the origin response because it only runs when CloudFront has to go to the origin (a cache miss), so once an image has been converted, the CDN serves the cached result without invoking the function again. On the request side, however, we rewrite the request uri, which changes the cache key, so the rewrite has to happen on every viewer request, before CloudFront checks its cache.
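To make the caching argument concrete, here is a toy model in plain Node.js. It is not how CloudFront is implemented; the hypothetical viewerRequest, originResponse and handle functions only mimic where each trigger runs: the viewer request on every call, the origin response only on a cache miss, with the rewritten URI acting as the cache key.

```javascript
// Toy model (not CloudFront's real implementation) of where the two
// triggers run relative to the cache.
const cache = new Map()
let originResponseRuns = 0

// Runs on every request; the rewritten URI becomes the cache key
function viewerRequest(uri, supportsWebp) {
  return supportsWebp ? uri.replace(/\.(jpe?g|png)$/i, '.webp') : uri
}

// Runs only on cache misses; stands in for the expensive Sharp conversion
function originResponse(uri) {
  originResponseRuns += 1
  return `body-of-${uri}`
}

function handle(uri, supportsWebp) {
  const cacheKey = viewerRequest(uri, supportsWebp)
  if (!cache.has(cacheKey)) {
    cache.set(cacheKey, originResponse(cacheKey))
  }
  return cache.get(cacheKey)
}

handle('/picture.jpg', true)   // miss: converted once
handle('/picture.jpg', true)   // hit: served from cache
handle('/picture.jpg', false)  // miss: separate key for non-WebP browsers
console.log(originResponseRuns) // 2
console.log([...cache.keys()])  // [ '/picture.webp', '/picture.jpg' ]
```

WebP-capable and non-WebP browsers end up with different cache keys, which is exactly why the rewrite must happen before the cache lookup.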

The CDN Request Object

Don’t get intimidated: the important parts of this object are the request headers object and the request uri string.

{
  "Records": [
    {
      "cf": {
        "config": {
          "distributionDomainName": "d123.cloudfront.net",
          "distributionId": "EDFDVBD6EXAMPLE",
          "eventType": "viewer-request",
          "requestId": "MRVMF7KydIvxMWfJIglgwHQwZsbG2IhRJ07sn9AkKUFSHS9EXAMPLE=="
        },
        "request": {
          "body": {
            "action": "read-only",
            "data": "eyJ1c2VybmFtZSI6IkxhbWJkYUBFZGdlIiwiY29tbWVudCI6IlRoaXMgaXMgcmVxdWVzdCBib2R5In0=",
            "encoding": "base64",
            "inputTruncated": false
          },
          "clientIp": "2001:0db8:85a3:0:0:8a2e:0370:7334",
          "querystring": "size=large",
          "uri": "/picture.jpg",
          "method": "GET",
          "headers": {
            "host": [
              {
                "key": "Host",
                "value": "d111111abcdef8.cloudfront.net"
              }
            ],
            "user-agent": [
              {
                "key": "User-Agent",
                "value": "curl/7.51.0"
              }
            ]
          },
          "origin": {
            "custom": {
              "customHeaders": {
                "my-origin-custom-header": [
                  {
                    "key": "My-Origin-Custom-Header",
                    "value": "Test"
                  }
                ]
              },
              "domainName": "example.com",
              "keepaliveTimeout": 5,
              "path": "/custom_path",
              "port": 443,
              "protocol": "https",
              "readTimeout": 5,
              "sslProtocols": [
                "TLSv1",
                "TLSv1.1"
              ]
            }
          }
        }
      }
    }
  ]
}
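As the sample shows, CloudFront stores headers in an object keyed by the lowercase header name, where each entry is an array of { key, value } objects. A small helper keeps that shape out of the handler logic; getHeader below is a hypothetical name, sketched against a trimmed-down copy of the event above.

```javascript
// Hypothetical helper: read the first value of a header from a
// CloudFront headers object (lowercase name -> [{ key, value }]).
function getHeader(headers, name) {
  const entries = headers[name.toLowerCase()]
  return entries && entries.length > 0 ? entries[0].value : null
}

// Trimmed-down version of the viewer request event above
const event = {
  Records: [{
    cf: {
      request: {
        uri: '/picture.jpg',
        headers: {
          'user-agent': [{ key: 'User-Agent', value: 'curl/7.51.0' }]
        }
      }
    }
  }]
}

const { request } = event.Records[0].cf
console.log(request.uri)                              // /picture.jpg
console.log(getHeader(request.headers, 'User-Agent')) // curl/7.51.0
console.log(getHeader(request.headers, 'Accept'))     // null
```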

The CDN Response Object

Again, fear not: what’s really important here is that we have access to the response status and headers, as well as the request uri string.

{
  "Records": [
    {
      "cf": {
        "config": {
          "distributionDomainName": "d123.cloudfront.net",
          "distributionId": "EDFDVBD6EXAMPLE",
          "eventType": "origin-response",
          "requestId": "xGN7KWpVEmB9Dp7ctcVFQC4E-nrcOcEKS3QyAez--06dV7TEXAMPLE=="
        },
        "request": {
          "clientIp": "2001:0db8:85a3:0:0:8a2e:0370:7334",
          "method": "GET",
          "uri": "/picture.jpg",
          "querystring": "size=large",
          "headers": {
            "host": [
              {
                "key": "Host",
                "value": "d111111abcdef8.cloudfront.net"
              }
            ],
            "user-agent": [
              {
                "key": "User-Agent",
                "value": "curl/7.18.1"
              }
            ]
          }
        },
        "response": {
          "status": "200",
          "statusDescription": "OK",
          "headers": {
            "server": [
              {
                "key": "Server",
                "value": "MyCustomOrigin"
              }
            ],
            "set-cookie": [
              {
                  "key": "Set-Cookie",
                  "value": "theme=light"
              },
              {
                  "key": "Set-Cookie",
                  "value": "sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT"
              }
            ]
          }
        }
      }
    }
  ]
}
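Two details of this object trip people up. First, CloudFront passes the status code as a string (note "200" in the sample above), so a comparison against the number 404 never matches. Second, writing a header means using the same lowercase-name-to-array shape. The sketch below uses a hypothetical setHeader helper. Also worth knowing: according to the S3 GetObject documentation, S3 returns 403 rather than 404 for a missing key when the caller lacks the s3:ListBucket permission, which affects which status your function sees.

```javascript
// Hypothetical helper: set (replace) a header on a CloudFront headers
// object using the lowercase-name -> [{ key, value }] shape.
function setHeader(headers, name, value) {
  headers[name.toLowerCase()] = [{ key: name, value }]
}

// Trimmed-down origin response, shaped like the sample above
const response = {
  status: '404',
  statusDescription: 'Not Found',
  headers: {}
}

// The status is a string, so compare against '404'
console.log(response.status === 404)   // false
console.log(response.status === '404') // true

setHeader(response.headers, 'Content-Type', 'image/webp')
console.log(JSON.stringify(response.headers))
// {"content-type":[{"key":"Content-Type","value":"image/webp"}]}
```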

Let’s get coding!

Summary

Here are the steps we’re going to take:

  1. Listen for requests to the CDN, and trigger a Lambda function that hijacks any viewer request.
  2. Determine if the request event is for an image and if the browser requesting the resource supports WebP, based on the user-agent we receive from the request.
  3. If the request is for an image and the browser supports WebP, we replace the image extension in the request uri with .webp and store the original extension in a custom request header.
  4. Next, we trigger a separate Lambda that hijacks any CDN origin response.
  5. If the request uri on the response event has a .webp extension and the response status is 404, we check our S3 bucket for the same image, but with the original extension that we stored in the request header in step 3.
  6. If we find an image with the original extension in S3, we convert it to WebP using Sharp and place it in the origin response; otherwise, we leave the 404 response unaltered.
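Steps 3 and 5 are two halves of a round trip: the viewer request turns an image URI into a .webp URI and remembers the original extension, and the origin response turns it back into the S3 key of the source image. A minimal sketch of that round trip as pure functions; toWebpUri and toOriginalKey are hypothetical names, and it assumes the URI arrives percent-encoded, as sent by the browser.

```javascript
// Only these extensions are rewritten (case-insensitive)
const IMAGE_EXTENSION = /\.(jpe?g|png)$/i

// Step 3: rewrite the URI and remember the original extension
function toWebpUri(uri) {
  const match = uri.match(IMAGE_EXTENSION)
  if (!match) return null // not an image we convert
  return { uri: uri.replace(IMAGE_EXTENSION, '.webp'), format: match[1] }
}

// Step 5: rebuild the S3 key of the original image.
// S3 keys have no leading slash, and the URI is percent-decoded first.
function toOriginalKey(webpUri, format) {
  return decodeURIComponent(webpUri.substring(1)).replace(/\.webp$/, `.${format}`)
}

const rewritten = toWebpUri('/photos/cat.jpeg')
console.log(rewritten) // { uri: '/photos/cat.webp', format: 'jpeg' }
console.log(toOriginalKey(rewritten.uri, rewritten.format)) // photos/cat.jpeg
console.log(toWebpUri('/styles/site.css')) // null
```

One consequence of this scheme: picture.jpg and picture.png would both map to picture.webp, so it assumes each image name exists in only one format.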

The code: Viewer Request

const userAgent = require('useragent')
const path = require('path')

// Browser families (as named by the useragent package) and the first
// major version of each that supports WebP
const browsersToInclude = [
  { browser: 'Chrome', version: 23 },
  { browser: 'Firefox', version: 65 },
  { browser: 'Edge', version: 18 },
  { browser: 'Opera', version: 15 },
  { browser: 'Android', version: 53 },
  { browser: 'Chrome Mobile', version: 55 },
  { browser: 'Opera Mobile', version: 37 },
  { browser: 'UC Browser', version: 11 },
  { browser: 'Samsung Internet', version: 4 }
]

// Only these extensions are rewritten to .webp
const IMAGE_EXTENSION = /\.(jpe?g|png)$/i

exports.handler = async (event) => {
  const request = event.Records[0].cf.request
  const headers = request.headers
  const userAgentString = headers['user-agent'] && headers['user-agent'][0]
    ? headers['user-agent'][0].value
    : ''
  const agent = userAgent.lookup(userAgentString)

  // agent.major is a string, so convert it before comparing
  const supportingBrowser = browsersToInclude
    .some(browser => browser.browser === agent.family && Number(agent.major) >= browser.version)

  if (supportingBrowser && IMAGE_EXTENSION.test(request.uri)) {
    // Remember the original format (e.g. "image/jpg") so the origin
    // response function can find the source image in S3
    const fileFormat = path.extname(request.uri).replace('.', '')
    headers['original-resource-type'] = [{
      key: 'Original-Resource-Type',
      value: `image/${fileFormat}`
    }]

    // Rewriting the URI also changes the CloudFront cache key
    request.uri = request.uri.replace(IMAGE_EXTENSION, '.webp')
  }

  return request
}

The code for the Viewer Request Lambda is straightforward. It compares the browser and browser version from the request against a predefined list of supported browsers to determine WebP support, rewrites png, jpg and jpeg extensions to webp, and leaves all the heavy lifting to the Origin Response Lambda.

This keeps the function lightweight, which is ideal since viewer request and viewer response Lambdas can’t be more than 1 MB in size.
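The browser comparison is easier to test if it is pulled out into a pure function that takes the family and major version the useragent package reports. A minimal sketch, with supportsWebp as a hypothetical name and only an excerpt of the browser list; the real list lives in the handler.

```javascript
// Excerpt of a browser list shaped like the one in the handler.
// Family names follow the useragent package's naming.
const browsersToInclude = [
  { browser: 'Chrome', version: 23 },
  { browser: 'Opera', version: 15 },
  { browser: 'Samsung Internet', version: 4 }
]

function supportsWebp(family, major) {
  // useragent reports major versions as strings, so convert explicitly
  const version = Number(major)
  return browsersToInclude.some(
    entry => entry.browser === family && version >= entry.version
  )
}

console.log(supportsWebp('Chrome', '71')) // true
console.log(supportsWebp('Chrome', '22')) // false
console.log(supportsWebp('Safari', '12')) // false
```

In the handler, this would be called as supportsWebp(agent.family, agent.major).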

The code: Origin Response

const path = require('path')
const AWS = require('aws-sdk')
const Sharp = require('sharp')

const S3 = new AWS.S3({
  signatureVersion: 'v4',
})

// Lambda@Edge functions can't use environment variables, so these are hardcoded
const BUCKET = 'some-bucket'
const QUALITY = 75

exports.handler = async (event) => {
  const { request, response } = event.Records[0].cf
  const { uri } = request
  const headers = response.headers

  // Only rewritten WebP requests are handled here
  if (path.extname(uri) !== '.webp') {
    return response
  }

  // CloudFront passes the status code as a string
  if (response.status !== '404') {
    // The WebP version already exists; make sure it is served as WebP
    headers['content-type'] = [{ key: 'Content-Type', value: 'image/webp' }]
    return response
  }

  const originalType = request.headers['original-resource-type']
  const format = originalType && originalType[0]
    ? originalType[0].value.replace('image/', '')
    : null

  if (!format) {
    return response // no original extension recorded; leave the 404 as-is
  }

  try {
    // S3 keys have no leading slash; the URI may be percent-encoded
    const key = decodeURIComponent(uri.substring(1))
    const s3key = key.replace(/\.webp$/, `.${format}`)

    const bucketResource = await S3.getObject({ Bucket: BUCKET, Key: s3key }).promise()
    const sharpImageBuffer = await Sharp(bucketResource.Body)
      .webp({ quality: QUALITY })
      .toBuffer()

    // Store the converted image so later cache misses are served straight from S3
    await S3.putObject({
      Body: sharpImageBuffer,
      Bucket: BUCKET,
      ContentType: 'image/webp',
      CacheControl: 'max-age=31536000',
      Key: key,
      StorageClass: 'STANDARD'
    }).promise()

    response.status = '200'
    response.statusDescription = 'OK'
    response.body = sharpImageBuffer.toString('base64')
    response.bodyEncoding = 'base64'
    headers['content-type'] = [{ key: 'Content-Type', value: 'image/webp' }]
  } catch (error) {
    // Original image not found (or conversion failed): leave the 404 unaltered
    console.error(error)
  }

  return response
}

The origin response function does all the heavy lifting. If the response status is 404 (which CloudFront passes as the string '404'), it reads the Original-Resource-Type request header to determine the original file extension. It then replaces the .webp extension in the request uri with the original extension and queries S3 with the resulting key (s3key).

If it finds the file in S3, it converts the image to WebP using Sharp, puts the result in the S3 bucket under the .webp key, and places it in the response body as a base64 string with the Content-Type header set to image/webp. If it can’t find the original image in the S3 bucket, it logs the error and leaves the 404 response unaltered. If the WebP image already exists (any status other than 404), it only sets the Content-Type header to image/webp.
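One limit the handler does not guard against: Lambda@Edge caps the size of a response generated by an origin response function (1 MB including headers, per the AWS quotas at the time of writing), and base64 encoding inflates the image by roughly a third. A minimal sketch of a guard, assuming a hypothetical toWebpResponse helper and a hypothetical MAX_BODY_BYTES budget that leaves some headroom for headers. In the handler’s flow the converted file has already been written to S3, so falling back to the original response still lets a later cache miss pick it up.

```javascript
// Assumed budget: 1 MB minus headroom for headers (hypothetical values)
const MAX_BODY_BYTES = 1024 * 1024 - 16 * 1024

// Replace a 404 origin response with an inline WebP body, unless the
// base64-encoded body would exceed the budget.
function toWebpResponse(response, webpBuffer) {
  const body = webpBuffer.toString('base64')
  if (Buffer.byteLength(body) > MAX_BODY_BYTES) {
    return response // too large to inline; keep the original response
  }
  response.status = '200'
  response.statusDescription = 'OK'
  response.body = body
  response.bodyEncoding = 'base64'
  response.headers['content-type'] = [{ key: 'Content-Type', value: 'image/webp' }]
  return response
}

const notFound = { status: '404', statusDescription: 'Not Found', headers: {} }
const result = toWebpResponse(notFound, Buffer.from('RIFF')) // stand-in for Sharp output
console.log(result.status) // 200
```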

That’s it!

Gotchas

  1. If you’re deploying with the Serverless Application Model (I’ve attached a conjoined template in the appendix), make sure you use two separate projects for your viewer request and origin response functions: AWS won’t let you deploy a viewer request function larger than 1 MB, and installing Sharp will push your zip past that limit.
  2. Your functions’ execution role must trust the edgelambda.amazonaws.com service principal, in addition to lambda.amazonaws.com.
  3. CloudFront triggers for Lambda@Edge can only use functions in us-east-1 (N. Virginia), so make sure your Lambdas are deployed in that region.
  4. CloudWatch logs for your Lambdas won’t necessarily be in us-east-1. Instead, they’re written to the region closest to the edge location that handled the request (it’s a CDN, after all), in a log group named /aws/lambda/us-east-1.<function-name>.
  5. If you’re on macOS, Sharp might not run if you install it locally and deploy it to AWS: it needs binaries built specifically for Linux. There are multiple ways to do this. Sharp’s documentation suggested using a t2.micro EC2 instance and ssh’ing into it; I find this unnecessarily complex and difficult to maintain across teams. Instead, I use a Docker container running Linux to install all my npm packages and create a zip that I push using aws sam. I’ve attached it in the appendix.

Appendix

Conjoined SAM Template

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Rewrite jpg and png requests to webp if the browser supports webp
Resources:
  WebpTransformRequest:
    Type: 'AWS::Serverless::Function'
    Properties:
      CodeUri: lambda/resources.zip
      Handler: src/transformRequest.handler
      Runtime: nodejs8.10
      Timeout: 5
      Role: !GetAtt WebpExecutionRole.Arn
      FunctionName: WebpTransformRequest
  WebpTransformResponse:
    Type: 'AWS::Serverless::Function'
    Properties:
      CodeUri: lambda/resources.zip
      Handler: src/transformResponse.handler
      Runtime: nodejs8.10
      Timeout: 5
      Role: !GetAtt WebpExecutionRole.Arn
      FunctionName: WebpTransformResponse

  # ==== ROLES ==== #
  WebpExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
        - Effect: Allow
          Action: sts:AssumeRole
          Principal:
            Service:
            - lambda.amazonaws.com
            - edgelambda.amazonaws.com

  # ==== POLICIES ==== #
  PublishLogsPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Allows functions to write logs
      Roles:
      - !Ref WebpExecutionRole
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
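            Resource: '*'

  S3AccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Allows the origin response function to read originals and write WebP images
      Roles:
      - !Ref WebpExecutionRole
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
            - s3:GetObject
            - s3:PutObject
            Resource: 'arn:aws:s3:::some-bucket/*'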

Creating Functions ZIP from Docker Container

With the Makefile and Dockerfile below in your project root, run make all.

Makefile

.PHONY: all image package rebuild-sharp dist update-zip-contents clean-files remove-docker-image

all: image package rebuild-sharp dist update-zip-contents clean-files

# Build the Linux build image from the Dockerfile
image:
	docker build --tag lambci/lambda:build-nodejs8.10 .

# Install npm dependencies inside the container (lambda/ is mounted at /build)
package: image
	docker run --rm --volume "$(CURDIR)/lambda":/build lambci/lambda:build-nodejs8.10 sh -c "yum -y install make && npm install"

# Reinstall Sharp with its Linux x64 binaries for the Lambda runtime
rebuild-sharp: package
	cd lambda && rm -rf node_modules/sharp && npm install --arch=x64 --platform=linux --target=8.10.0 sharp

dist: rebuild-sharp
	cd lambda && zip -FS -q -r resources.zip *

update-zip-contents: dist
	zip -ur lambda/resources.zip src

clean-files: update-zip-contents
	rm -r lambda/node_modules

remove-docker-image:
	docker rmi --force lambci/lambda:build-nodejs8.10

Dockerfile

FROM amazonlinux

# Download the NodeSource signing key from
# https://rpm.nodesource.com/pub/el/NODESOURCE-GPG-SIGNING-KEY-EL
# and save it as etc/nodesource.gpg.key in your project root
COPY etc/nodesource.gpg.key /etc/nodesource.gpg.key

WORKDIR /tmp

# npm@latest no longer supports Node 6, so npm is pinned to 6.x
RUN yum -y install gcc-c++ && \
    rpm --import /etc/nodesource.gpg.key && \
    curl --location --output ns.rpm https://rpm.nodesource.com/pub_6.x/el/7/x86_64/nodejs-6.10.1-1nodesource.el7.centos.x86_64.rpm && \
    rpm --checksig ns.rpm && \
    rpm --install --force ns.rpm && \
    npm install -g npm@6 && \
    npm cache clean --force && \
    yum clean all && \
    npm i -g node-gyp && \
    rm --force ns.rpm

WORKDIR /build
