Creating a Static Website on Google Cloud (with SSL and Access Logging)

Goals:

Install gsutil

Install the Google Cloud SDK, which includes gsutil (the Cloud Storage command-line tool) as well as gcloud:

$ curl https://sdk.cloud.google.com | bash
$ exec -l $SHELL
$ gcloud init

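Create a bucket

Create a bucket through the web console, or with the gsutil command-line tool. Because the bucket will serve a custom domain, its name must match the hostname exactly (here, mentor.maxmautner.com), and Google will ask you to verify that you own the domain before it lets you create the bucket: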

$ gsutil mb gs://mentor.maxmautner.com

Push files

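Set the bucket's permissions and website settings with gsutil instead of through the web console. The first command grants everyone read access to the bucket itself (which also lets anyone list its objects), the second makes objects uploaded from now on publicly readable by default (it does not affect objects that already exist), and the third sets index.html as the main page and 404.html as the not-found page: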

$ gsutil acl ch -u AllUsers:R gs://mentor.maxmautner.com
$ gsutil defacl set public-read gs://mentor.maxmautner.com
$ gsutil web set -m index.html -e 404.html gs://mentor.maxmautner.com

Create a dummy index.html and 404.html in the current directory.
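A minimal sketch of creating the two placeholder pages from the shell. The page contents are made up; any HTML will do. Run it in an otherwise empty directory, because the rsync step uploads everything under ./.

```shell
# Write a placeholder home page (contents are arbitrary).
cat > index.html <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Hello</title></head>
  <body><h1>It works!</h1></body>
</html>
EOF

# Write a placeholder page for the bucket's not-found (404) responses.
cat > 404.html <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Not found</title></head>
  <body><h1>404: page not found</h1></body>
</html>
EOF
```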

Upload the files to our bucket:

$ gsutil rsync -R ./ gs://mentor.maxmautner.com

Verify that the files were uploaded:

$ gsutil ls -a gs://mentor.maxmautner.com

Set Up Custom Domain

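Create a CNAME record for your subdomain (here, mentor) pointing to c.storage.googleapis.com. The bucket name must match the full hostname exactly, which is why the bucket is called mentor.maxmautner.com. Note that this CNAME setup only serves the site over HTTP; HTTPS needs the load balancer described in the SSL section.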

Access Logs

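Set up log delivery: create a bucket for the logs, give Cloud Storage's analytics group (cloud-storage-analytics@google.com) write access to it, keep new log objects private to the project by default, and turn on logging for the site bucket with the object prefix mentor: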

$ gsutil mb gs://maxmautner-logs
$ gsutil acl ch -g cloud-storage-analytics@google.com:W gs://maxmautner-logs
$ gsutil defacl set project-private gs://maxmautner-logs
$ gsutil logging set on -b gs://maxmautner-logs -o mentor gs://mentor.maxmautner.com

Check that logging is set up for your bucket:

$ gsutil logging get gs://mentor.maxmautner.com
{"logBucket": "maxmautner-logs", "logObjectPrefix": "mentor"}

Request logs are created hourly (source)

I'm copying the official docs on this topic here, since this is critical information (as of March 10th, 2018):

Usage logs are generated hourly when there is activity to report in the monitored bucket. Usage logs are typically created 15 minutes after the end of the hour.

Note:

  • Any log processing of usage logs should take into account the possibility that they may be delivered later than 15 minutes after the end of an hour.
  • Usually, hourly usage log object(s) contain records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
  • Cloud Storage may write multiple log objects for the same hour.
  • Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should be able to remove them if it is critical to your log analysis. You can use the s_request_id field to detect duplicates.
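The last note suggests using s_request_id to detect duplicates. Here's a rough sketch of doing that on the raw CSV log files before loading them anywhere. It assumes every field is wrapped in double quotes, so fields are separated by the three characters ",", and it finds the s_request_id column from the header row rather than hard-coding its position. dedupe_usage_log is a hypothetical helper name, not a gsutil feature.

```shell
# dedupe_usage_log FILE...
# Prints the header once, then every record whose s_request_id has not been seen yet.
# Assumption: all fields are double-quoted, so splitting on "," keeps commas
# inside quoted values (e.g. user agents) intact.
dedupe_usage_log() {
  awk -F'","' '
    FNR == 1 {
      # Find the s_request_id column in this file'"'"'s header row.
      col = 0
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/"/, "", name)
        if (name == "s_request_id") col = i
      }
      if (NR == 1) print   # keep only the first file'"'"'s header
      next
    }
    # Print a record only the first time its request id appears.
    col > 0 && !seen[$col]++ { print }
  ' "$@"
}
```

For example, after copying the usage logs locally:

$ gsutil cp 'gs://maxmautner-logs/mentor_usage_*' .
$ dedupe_usage_log mentor_usage_* > usage_deduped.csv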

Query Logs

Query the access logs with SQL, using BigQuery.

Download the JSON schema file, cloud_storage_usage_schema_v0.json, which maps the log file format onto BigQuery columns.

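Create the storageanalysis dataset (bq load creates the table if needed, but not the dataset), then load the usage log files into a usage table:

$ bq mk storageanalysis
$ bq load --skip_leading_rows=1 storageanalysis.usage 'gs://maxmautner-logs/mentor_usage_*' ./cloud_storage_usage_schema_v0.json

Loading only the mentor_usage_ objects matters: the logs bucket also receives daily storage logs (mentor_storage_*), which have a different schema and would make the load fail.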

From the official docs:

When using wildcards, you might want to move logs already uploaded to BigQuery to another directory (e.g., gs://example-logs-bucket/processed) to avoid uploading data from a log more than once.

For now, let’s not sweat it :)
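If you do want to follow that advice later, here's a sketch. The hypothetical archive_loaded_logs helper reads gs:// URIs on stdin and moves each one under processed/ in the same bucket; setting DRY_RUN=1 prints the gsutil mv commands instead of running them. Whether this actually prevents reloads depends on your load wildcard: gs://maxmautner-logs/* may still match objects under processed/, whereas a name prefix such as mentor_usage_* should not.

```shell
# archive_loaded_logs: move each gs://BUCKET/OBJECT read from stdin to
# gs://BUCKET/processed/OBJECT. With DRY_RUN=1, only print the commands.
archive_loaded_logs() {
  while IFS= read -r src; do
    [ -n "$src" ] || continue          # skip blank lines
    rest=${src#gs://}                  # BUCKET/OBJECT
    bucket=${rest%%/*}                 # BUCKET
    object=${rest#*/}                  # OBJECT
    dst="gs://$bucket/processed/$object"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'gsutil mv %s %s\n' "$src" "$dst"
    else
      gsutil mv "$src" "$dst"
    fi
  done
}
```

For example, to preview the moves after a load:

$ gsutil ls 'gs://maxmautner-logs/mentor_usage_*' | DRY_RUN=1 archive_loaded_logs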

Now open the interactive BigQuery shell and inspect the new table:

$ bq shell
> show storageanalysis.usage

   Last modified               Schema               Total Rows   Total Bytes   Expiration   Time Partitioning   Labels   kmsKeyName
 ----------------- ------------------------------- ------------ ------------- ------------ ------------------- -------- ------------
  11 Mar 01:44:27   |- time_micros: integer         38           15518
                    |- c_ip: string
                    |- c_ip_type: integer
                    |- c_ip_region: string
                    |- cs_method: string
                    |- cs_uri: string
                    |- sc_status: integer
                    |- cs_bytes: integer
                    |- sc_bytes: integer
                    |- time_taken_micros: integer
                    |- cs_host: string
                    |- cs_referer: string
                    |- cs_user_agent: string
                    |- s_request_id: string
                    |- cs_operation: string
                    |- cs_bucket: string
                    |- cs_object: string

This is the schema of the table we can now query. For example:

> select cs_uri, count(*) from [storageanalysis.usage] group by cs_uri;

This gives the number of requests per URI, including query parameters.

Stripping query parameters is left as an exercise for the reader.
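If you'd rather skip SQL for this, here's one possible sketch that works on downloaded CSV usage logs. The hypothetical count_requests_by_path helper finds the cs_uri column from the header, drops everything from the first ? onwards, and prints a request count per path, busiest first. It assumes every field is double-quoted, so fields are separated by ",".

```shell
# count_requests_by_path FILE...
# Counts requests per cs_uri with the query string removed, busiest path first.
count_requests_by_path() {
  awk -F'","' '
    FNR == 1 {
      # Find the cs_uri column in each file'"'"'s header row, then skip the header.
      col = 0
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/"/, "", name)
        if (name == "cs_uri") col = i
      }
      next
    }
    col > 0 {
      path = $col
      sub(/[?].*/, "", path)   # drop the query string, if any
      count[path]++
    }
    END { for (p in count) print count[p], p }
  ' "$@" | sort -rn
}
```

For example, on usage logs copied locally with gsutil cp:

$ count_requests_by_path mentor_usage_*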

Set Up SSL

To serve our Cloud Storage static site over SSL (HTTPS), we need to put it behind an HTTP(S) load balancer, because the c.storage.googleapis.com CNAME setup only serves HTTP.

Adding a Cloud Storage bucket to content-based load balancing

Creating a load balancer

Options for obtaining a certificate: