Google Cloud Data Engineering
The depth and intensity of reading and practice required for the Data Engineering exam
- 1.Hive vs Dataproc and how they work on Google Cloud https://cloud.google.com/solutions/using-apache-hive-on-cloud-dataproc
- 2.How can the workers of a cluster download dependencies from the internet if the cluster nodes have no egress/ingress allowance?
- 3.How do Cloud Datastore and Spanner differ, and how do Spanner and Bigtable differ?
- 4.Where would you store 2 PB of (key, value) data: Datastore, Spanner, or Bigtable? What about 1 TB?
- 5.Storage Transfer Service vs Transfer Appliance. Can you use a private web address to transfer 2 PB of data over six months with Storage Transfer Service? docs: https://cloud.google.com/storage-transfer/docs/overview https://cloud.google.com/storage-transfer/docs/on-prem-overview#requirements
- 6.How do you partition a BigQuery table for easy querying when the dataset has a timestamp and a unique ID?
- 7.Dataflow template vs DAG on Cloud Composer for running Spark jobs, some of which run in sequence and others concurrently? docs: https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator
- 8.fluentd in_tail plugin vs MySQL plugin for MariaDB with the Stackdriver agent?
- 9.Does Cloud Vision API have damage detection without training? docs: https://cloud.google.com/vision/docs/features-list
- 10.Cloud ML vs Dataproc Spark ML when starting from existing Spark ML models? And where do you store the data: Cloud Storage or BigQuery?
- 12.
- 13.BigQuery analytic functions - https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
- 14.How do you improve Area Under the Curve (AUC)? Hyperparameter tuning or model deployment?
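
For question 14, it may help to recall what AUC actually measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below is a minimal pure-Python illustration, not the method any Google Cloud service uses. The labels, the scores, and the learning-rate values are made up for illustration. It computes AUC by pairwise comparison and then picks the hypothetical hyperparameter setting with the highest AUC, which is the essence of what a tuning job automates.

```python
def roc_auc(labels, scores):
    """Return ROC AUC as the fraction of (positive, negative) pairs
    in which the positive example has the higher score; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (assumption, not from the notes).
labels = [0, 0, 1, 1]

# Hypothetical scores produced by the same model trained with two
# different learning rates (both the values and the scores are invented).
candidates = {
    0.01: [0.1, 0.4, 0.35, 0.8],
    0.1: [0.1, 0.2, 0.6, 0.9],
}

# "Tuning" here is simply choosing the setting with the best validation AUC.
auc_by_setting = {lr: roc_auc(labels, s) for lr, s in candidates.items()}
best_setting = max(auc_by_setting, key=auc_by_setting.get)
```

Because AUC depends only on how the model ranks examples, changes that alter the trained model (hyperparameters, features, training data) can move it, whereas deploying an unchanged model leaves it the same.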