Cloud, GCP

Spanner – GCP custom SQL solution

Google Cloud Platform (GCP) offers 3 SQL engines as a service: MySQL 5.6/5.7, PostgreSQL 9.6 and Spanner. MySQL and PostgreSQL are quite well known across the world. Spanner is completely new player – Google’s own product dedicated to handle thousands of requests.

When we setup the database, it is required to specify number of nodes and its location (regional or multiregional). The single node is composed from 3 replicas. Such setup allows to get 10k reads and 2k writes but do not guarantee any SLA. To get SLA guarantee (99.99% for “regional” setup and 99.999% for multiregional), you need to set at least 3 nodes.

It ensures:

  • Scalability
  • Sharding
  • Synchronous replication (3 replicas, master-master)
  • Access management
  • Auditability
  • Online schema changes
  • Encryption
  • Externally consistent read/write

What is external consistency? Google describes it as when you start 2 transactions, A and B (A starts first and ends second), the result of the transaction B won’t be visible until A won’t complete.

The database consistency (row versioning based on timestamp) is based on True Time API – a complex solution which ensures time synchronization between servers in Google’s data centers.

Available data types:

  • bool
  • int64
  • float64
  • string (length)
  • bytes (length)
  • date (timezone-independent)
  • timestamp

Interesting part is lack of autoincrement (identity) column property. Also there is no decimal data type. Google advises “store arbitrary precision numbers as strings”. Yet another unusual treat is storing data in array. Array cannot be nested one into another and they are limited to 10 MB.
Also struct data type is supported (only in operations like select, update) not to store the data.

Very interesting concept is a DDL clause INTERLEAVE IN PARENT. On table creation we can specify that new table “lives” inside the parent. That causes that physical data are stored on a same shard (e.g. a car with history of his service).

Yet another is tablesample operator. It is used to get sample results from database (e.g. we want to take a look on usual data). They are 2 methods: Reservoir (based on row number) and Bernoulli (based on % of total records).

SELECT * FROM User TABLESAMPLE RESERVOIR (100 ROWS);
SELECT * FROM User TABLESAMPLE BERNOULLI (0.1 PERCENT);

The last feature which is worth to mention is Partitioned DML which can be used to batch update or delete of records. The sad part is that I did not found how to execute it directly using SQL (until the engine is not doing that for us), otherwise using Google’s lib we have to specify that we want to execute partitioned DML.

With partitioned DML only rows which matches where clause are locked, transaction is managed on Cloud Spanner side, DML must be idempotent and statment must be fully partitionable.

/* partitionedstatements */
UPDATE User SET IsActive = true WHERE IsActive IS NULL
UPDATE User SET AccountCreation = '2018-01-01 00:00:00'

/* non-partitioned statements */
UPDATE User SET IsActive = !IsActive WHERE true
UPDATE User SET FailedLoginTry = FailedLoginTry + 1 WHERE true
DELETE FROM User WHERE UserId IN (SELECT UserId FROM UserRole WHERE RoleId = 7)

Leave a Reply

Your email address will not be published. Required fields are marked *