We ran a small-scale TPC-DS benchmark on several cloud data warehouses and GPU databases using a 100 GB TPC-DS dataset, primarily to assess fit for our clients' needs, but also for internal reference and proof-of-concept work.

100 GB is not an official dataset size for TPC-DS benchmarking. Nonetheless, it helped us assess fit for our clients' needs and gave a good indication of the performance of the data warehouses and databases.

The assessment covered several cloud data warehouses and GPU databases, including Azure Synapse, Snowflake, Kinetica, OmniSci, Incorta, and SkySQL. The goal was to evaluate:

  • Load Performance: Time taken to bulk load the 24 tables into each warehouse or GPU database. The 24-table schema is standard per the TPC-DS specification (see the load-timing sketch after this list).
  • Query Performance: Time taken to run 15 selected TPC-DS queries spanning complex, long-running, reporting, interactive, and analytical workloads. TPC-DS defines 99 queries in total; we used a subset of 15.
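
Operationally, "load performance" here is simply the wall-clock time of each engine's bulk-load command per table. The sketch below is a minimal Python harness for capturing it; the connection object, the COPY syntax, and the source URI are placeholders, since each engine (Synapse, Snowflake, Kinetica, and so on) has its own bulk-load path.

    import time

    # Subset of the 24 TPC-DS tables (illustrative; the full list comes from the spec).
    TPCDS_TABLES = [
        "call_center", "catalog_page", "catalog_returns", "catalog_sales",
        "customer", "customer_address", "date_dim", "store_sales",
        # ... remaining tables of the 24-table schema
    ]

    def time_bulk_load(conn, table, source_uri):
        """Return wall-clock seconds for one engine-specific bulk-load command."""
        cur = conn.cursor()
        start = time.perf_counter()
        # Placeholder syntax: each engine has its own COPY / LOAD statement.
        cur.execute(f"COPY INTO {table} FROM '{source_uri}/{table}.dat'")
        conn.commit()
        return time.perf_counter() - start

    def run_load_benchmark(conn, source_uri):
        """Load every table in turn and record the elapsed time for each."""
        return {t: time_bulk_load(conn, t, source_uri) for t in TPCDS_TABLES}
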

The benchmarking process consisted of the following steps:

  1. Generating the TPC-DS dataset (see the dsdgen sketch after this list);
  2. Loading the data into the respective warehouses or GPU databases;
  3. Scaling and running the queries sequentially, one after another; concurrent workloads were not tested (see the query-runner and plotting sketch after this list);
  4. Plotting the results in a graph.
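
For step 1, the TPC-DS toolkit ships a dsdgen utility that writes the flat files. Below is a minimal sketch of how a 100 GB dataset could be generated by driving dsdgen from Python; the toolkit and output paths are assumptions, and the -parallel/-child flags split the generation into chunks.

    import subprocess

    TOOLS_DIR = "/opt/tpcds-kit/tools"    # assumed location of the compiled TPC-DS toolkit
    OUTPUT_DIR = "/data/tpcds/100gb"      # assumed output directory for the flat files

    def generate_dataset(scale_gb=100, parallel=4):
        """Generate the TPC-DS flat files, splitting the work into `parallel` chunks."""
        for child in range(1, parallel + 1):
            subprocess.run(
                [f"{TOOLS_DIR}/dsdgen",
                 "-scale", str(scale_gb),
                 "-dir", OUTPUT_DIR,
                 "-parallel", str(parallel),
                 "-child", str(child),
                 "-force"],
                check=True,
                cwd=TOOLS_DIR,  # dsdgen reads tpcds.idx from its working directory
            )

    if __name__ == "__main__":
        generate_dataset()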
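
For steps 3 and 4, the harness is a simple sequential runner plus a plot. The sketch below assumes the 15 selected queries are stored as .sql files in a directory and that a DB-API style connection is available for the engine under test; matplotlib draws the per-query graph.

    import glob
    import os
    import time
    import matplotlib.pyplot as plt

    def run_queries_sequentially(conn, query_dir):
        """Run each query file one after another (no concurrency) and record timings."""
        timings = {}
        for path in sorted(glob.glob(f"{query_dir}/*.sql")):
            with open(path) as f:
                sql = f.read()
            cur = conn.cursor()
            start = time.perf_counter()
            cur.execute(sql)
            cur.fetchall()  # force the full result set to be returned
            timings[os.path.basename(path)] = time.perf_counter() - start
        return timings

    def plot_timings(timings, engine_name):
        """Plot elapsed time per query for one engine."""
        plt.bar(list(timings.keys()), list(timings.values()))
        plt.ylabel("Elapsed time (s)")
        plt.xticks(rotation=45, ha="right")
        plt.title(f"TPC-DS 100 GB, selected queries on {engine_name}")
        plt.tight_layout()
        plt.savefig(f"{engine_name}_tpcds_timings.png")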

Although the results cannot be considered official, they gave us a decent picture of each system's performance. It was also a good learning experience in:

  • How to conduct TPC-DS benchmarking tests, including generating datasets and queries.
  • The architecture and features (pros and cons) of many data warehouses and GPU databases.

More on TPC-DS can be found in the links given below:

TPC-DS Dataset

http://www.tpc.org/tpcds/default5.asp

https://gerardnico.com/data/type/relation/benchmark/tpcds/schema#table_list

TPC-DS Queries

https://gerardnico.com/data/type/relation/benchmark/tpcds/query