Static Websites on Google Cloud
Goals:
- install gsutil
- create a bucket
- upload your static site content to the bucket
- set up a custom domain
- log requests
- query access logs
- set up SSL on the custom domain
Install gsutil
Install gsutil, the Google Cloud command line tool:
$ curl https://sdk.cloud.google.com | bash
$ exec -l $SHELL
$ gcloud init
Create a Bucket
Create a bucket through the web interface, or with the gsutil
command line tool:
$ gsutil mb gs://mentor.maxmautner.com
Push files
Set the bucket permissions & settings using gsutil
instead of through the web console:
$ gsutil acl ch -u AllUsers:R gs://mentor.maxmautner.com
$ gsutil defacl set public-read gs://mentor.maxmautner.com
$ gsutil web set -m index.html -e 404.html gs://mentor.maxmautner.com
Create dummy index.html and 404.html files.
Upload the files to our bucket:
$ gsutil rsync -R ./ gs://mentor.maxmautner.com
Verify the files uploaded:
$ gsutil ls -a gs://mentor.maxmautner.com
Setup Custom Domain
Create a CNAME record for your subdomain (e.g. www) to point to c.storage.googleapis.com.
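For example, a minimal zone-file entry for the www subdomain (the TTL is illustrative; note the trailing dot on the target):

```
www  3600  IN  CNAME  c.storage.googleapis.com.
```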
Access Logs
$ gsutil mb gs://maxmautner-logs-bucket
$ gsutil acl ch -g cloud-storage-analytics@google.com:W gs://maxmautner-logs-bucket
$ gsutil defacl set project-private gs://maxmautner-logs-bucket
$ gsutil logging set on -b gs://maxmautner-logs-bucket -o mentor gs://mentor.maxmautner.com
Check that logging is successfully set up for your bucket:
$ gsutil logging get gs://mentor.maxmautner.com
{"logBucket": "maxmautner-logs-bucket", "logObjectPrefix": "mentor"}
Request logs are created hourly (source).
Since this is critical information, I'll quote the official docs on the topic (as of March 10th, 2018):
Usage logs are generated hourly when there is activity to report in the monitored bucket. Usage logs are typically created 15 minutes after the end of the hour.
Note:
- Any log processing of usage logs should take into account the possibility that they may be delivered later than 15 minutes after the end of an hour.
- Usually, hourly usage log object(s) contain records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
- Cloud Storage may write multiple log objects for the same hour.
- Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should be able to remove them if it is critical to your log analysis. You can use the s_request_id field to detect duplicates.
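Given the duplicate-record caveat above, here is a minimal deduplication sketch in Python. The record dicts and their values are made up for illustration; only the s_request_id field name comes from the usage-log schema.

```python
def dedupe_records(records):
    """Yield records, skipping any whose s_request_id was already seen."""
    seen = set()
    for record in records:
        request_id = record["s_request_id"]
        if request_id in seen:
            continue
        seen.add(request_id)
        yield record

# Illustrative rows, including one duplicate delivery:
rows = [
    {"s_request_id": "a1", "cs_uri": "/index.html"},
    {"s_request_id": "a1", "cs_uri": "/index.html"},  # duplicate
    {"s_request_id": "b2", "cs_uri": "/404.html"},
]
unique = list(dedupe_records(rows))
print(len(unique))  # 2
```

This keeps the first copy of each record and drops later duplicates, which matches the docs' advice to key deduplication on s_request_id.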
Query Logs
Query the access logs using SQL with BigQuery.
Download this JSON schema for mapping the usage log format into a BigQuery table.
Run this command to load our log files into the BigQuery usage
table (the mentor_usage* wildcard matches only the usage logs, skipping the storage logs, which have a different schema):
$ bq load --skip_leading_rows=1 storageanalysis.usage gs://maxmautner-logs-bucket/mentor_usage* ./cloud_storage_usage_schema_v0.json
From the official docs:
When using wildcards, you might want to move logs already uploaded to BigQuery to another directory (e.g., gs://example-logs-bucket/processed) to avoid uploading data from a log more than once.
For now, let’s not sweat it :)
Now open your SQL shell:
$ bq shell
> show storageanalysis.usage
Last modified Schema Total Rows Total Bytes Expiration Time Partitioning Labels kmsKeyName
----------------- ------------------------------- ------------ ------------- ------------ ------------------- -------- ------------
11 Mar 01:44:27 |- time_micros: integer 38 15518
|- c_ip: string
|- c_ip_type: integer
|- c_ip_region: string
|- cs_method: string
|- cs_uri: string
|- sc_status: integer
|- cs_bytes: integer
|- sc_bytes: integer
|- time_taken_micros: integer
|- cs_host: string
|- cs_referer: string
|- cs_user_agent: string
|- s_request_id: string
|- cs_operation: string
|- cs_bucket: string
|- cs_object: string
This is the schema of the table we can now query, for example:
> select cs_uri, count(*) from [storageanalysis.usage] group by cs_uri;
This gives us the number of requests grouped by URI, including query parameters.
Stripping query parameters is left as an exercise for the reader.
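If you do want to strip query parameters before grouping, a quick Python sketch using the standard library (the example URIs are made up):

```python
from urllib.parse import urlsplit

def strip_query(uri):
    """Return the URI path without its query string or fragment."""
    return urlsplit(uri).path

print(strip_query("/index.html?utm_source=twitter"))  # /index.html
print(strip_query("/404.html"))                       # /404.html
```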
Setup SSL
In order to use SSL on our Google Cloud Storage static site, we need to put a load balancer in front of the bucket.
See "Adding a Cloud Storage bucket to content-based load balancing" in the official docs.
Options for obtaining a certificate:
- Let's Encrypt with certbot (is it even possible on Google Cloud?)
- a paid SSL certificate vendor