Google Cloud Data Engineering

The questions below show the depth and intensity of reading and practice required for the Data Engineering exam.

  1. Hive vs. Dataproc, and how they work on Google Cloud. docs: https://cloud.google.com/solutions/using-apache-hive-on-cloud-dataproc

  2. How can the workers of a cluster download dependencies from the internet if the cluster nodes have no egress/ingress allowance?

  3. How do Cloud Datastore and Spanner differ, and how do Spanner and Bigtable differ?

  4. Where would you store 2 PB of (key, value) data: Datastore, Spanner, or Bigtable? What about 1 TB?

  5. Storage Transfer Service vs. Transfer Appliance. Can you use a private web address to transfer 2 PB of data over six months with Storage Transfer Service? docs: https://cloud.google.com/storage-transfer/docs/overview https://cloud.google.com/storage-transfer/docs/on-prem-overview#requirements

  6. How would you partition a BigQuery table for easy querying when the dataset has a timestamp and a unique ID?

  7. Dataflow template vs. a DAG on Cloud Composer for running Spark jobs, some of which run in sequence and others concurrently? docs: https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator

  8. fluentd in_tail plugin vs. MySQL plugin for MariaDB with the Stackdriver agent?

  9. Does the Cloud Vision API offer damage detection without training? docs: https://cloud.google.com/vision/docs/features-list

  10. Cloud ML vs. Dataproc Spark ML when starting from existing Spark ML models? And where do you store the data: Cloud Storage or BigQuery?

  11. How do you improve Area Under the Curve (AUC)? Hyperparameter tuning, model deployment?
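
For the AUC question, it may help to recall what the metric measures. The following is a minimal Python sketch, not an official Google Cloud API: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration. It computes AUC as the fraction of (positive, negative) example pairs that the model ranks in the right order, counting ties as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example.
    Tied scores count as half a correctly ordered pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: the same four examples scored by two models.
labels = [0, 0, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.8]  # one positive is ranked below a negative
scores_b = [0.1, 0.2, 0.7, 0.8]   # every positive is ranked above every negative

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(auc_a, auc_b)
```

Because AUC depends only on how the model orders examples, a change that alters the scores (such as different hyperparameters) can move it, while serving the same unchanged model produces the same scores and so the same value on the same data.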
