Google Cloud Data Engineering
The depth and intensity of reading and practice required for the Data Engineering exam
Hive vs Dataproc and how they work on Google Cloud https://cloud.google.com/solutions/using-apache-hive-on-cloud-dataproc
How can the workers of a cluster download dependencies from the internet if the cluster nodes have no egress/ingress allowance?
How do Cloud Datastore and Spanner differ, and how does Spanner differ from Bigtable?
Where would you store 2 PB of (key, value) data: Datastore, Spanner, or Bigtable? What about 1 TB?
Storage Transfer Service vs Transfer Appliance. Can you use a private web address to transfer 2 PB of data over six months with Storage Transfer Service? docs: https://cloud.google.com/storage-transfer/docs/overview https://cloud.google.com/storage-transfer/docs/on-prem-overview#requirements
BigQuery partitioning for easy querying, with a timestamp and unique ID for the dataset?
Dataflow template vs DAG on Cloud Composer for running Spark, where some of the jobs run in sequence and others concurrently? docs: https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator
fluentd in_tail plugin vs MySQL plugin for MariaDB with the Stackdriver agent?
Does Cloud Vision API have damage detection without training? docs: https://cloud.google.com/vision/docs/features-list
Cloud ML vs Dataproc Spark ML when starting from existing Spark ML models? And where do you store the data: Cloud Storage or BigQuery?
Streaming data into BigQuery - https://cloud.google.com/bigquery/streaming-data-into-bigquery
BigQuery analytic functions - https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
How do you improve Area Under the Curve (AUC)? - Hyperparameter tuning, model deployment?
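For the AUC question, it helps to recall what the metric measures before thinking about how to raise it. The sketch below is only an illustration under stated assumptions: the toy rows, the `blend_score` model, and the single hyperparameter `w` are all hypothetical and not taken from any Google Cloud service. It computes AUC as the fraction of (positive, negative) pairs ranked correctly, then runs a tiny sweep over `w` and keeps the value with the highest AUC, which is the basic idea behind hyperparameter tuning.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy rows: (feature_1, feature_2, label). Made up for illustration.
ROWS = [
    (0.9, 0.2, 1),
    (0.8, 0.7, 1),
    (0.6, 0.1, 1),
    (0.4, 0.9, 0),
    (0.3, 0.5, 0),
    (0.1, 0.8, 0),
]


def blend_score(row, w):
    """Hypothetical model: a weighted blend of the two features."""
    x1, x2, _ = row
    return w * x1 + (1.0 - w) * x2


def sweep(weights):
    """Evaluate AUC for each candidate weight and return (best_w, results)."""
    labels = [row[2] for row in ROWS]
    results = {}
    for w in weights:
        scores = [blend_score(row, w) for row in ROWS]
        results[w] = roc_auc(labels, scores)
    best_w = max(results, key=results.get)
    return best_w, results


if __name__ == "__main__":
    best_w, results = sweep([0.0, 0.5, 1.0])
    for w, auc in results.items():
        print(f"w={w}: AUC={auc:.3f}")
    print("best w:", best_w)
```

A real tuning job would evaluate each candidate on held-out data rather than on the rows used to fit the model. Model deployment by itself does not change AUC, since it serves whatever model was trained.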