# Google Cloud Data Engineering

1. Hive vs Dataproc, and how Hive runs on Dataproc on Google Cloud\
   <https://cloud.google.com/solutions/using-apache-hive-on-cloud-dataproc><br>
2. How can the worker nodes of a cluster download dependencies from the internet if the nodes have no egress/ingress allowance?<br>
3. How do Cloud Datastore and Spanner differ, and how do Spanner and Bigtable differ?<br>
4. Where would you store 2 PB of key-value data: Datastore, Spanner, or Bigtable? What about 1 TB?<br>
5. Storage Transfer Service vs Transfer Appliance. Can you use a private web address to transfer 2 PB of data over six months with Storage Transfer Service?\
   \
   docs: <https://cloud.google.com/storage-transfer/docs/overview>\
   <https://cloud.google.com/storage-transfer/docs/on-prem-overview#requirements><br>
6. How should a BigQuery table with a timestamp and a unique ID be partitioned for easy querying?<br>
7. Dataflow template vs DAG on Cloud Composer for running Spark jobs, where some jobs run in sequence and others run concurrently?\
   \
   docs: <https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator><br>
8. fluentd `in_tail` plugin vs `Mysql` plugin for MariaDB with the Stackdriver agent?<br>
9. Does the Cloud Vision API provide damage detection without custom training?\
   \
   docs: <https://cloud.google.com/vision/docs/features-list><br>
10. Cloud ML vs Dataproc Spark ML for existing Spark ML models? And where do you store the data: Cloud Storage or BigQuery?<br>
11. Kubeflow - <https://www.kubeflow.org/docs/components/pipelines/sdk/sdk-overview/><br>
12. Streaming data into BigQuery - <https://cloud.google.com/bigquery/streaming-data-into-bigquery><br>
13. BigQuery analytic functions - <https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts><br>
14. How do you improve the Area Under the Curve (AUC)? Hyperparameter tuning, model deployment?
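
For item 5 above, a quick bandwidth calculation helps decide between Storage Transfer Service and Transfer Appliance. The sketch below is a back-of-the-envelope helper, not an official sizing tool. It assumes decimal units (1 PB = 10^15 bytes), a 30-day month, and a hypothetical `efficiency` factor for the fraction of the link you can actually use.

```python
# Back-of-the-envelope sizing: sustained bandwidth needed to move a volume by a deadline.
# Assumptions: decimal units (1 PB = 10**15 bytes), 30-day months, and a hypothetical
# `efficiency` factor for protocol overhead and shared links.

SECONDS_PER_DAY = 86_400


def required_gbps(total_bytes: float, days: float, efficiency: float = 1.0) -> float:
    """Return the sustained link speed in Gbit/s needed to copy total_bytes within `days`.

    efficiency is the assumed usable fraction of the link (0 < efficiency <= 1).
    """
    if days <= 0 or not 0 < efficiency <= 1:
        raise ValueError("days must be positive and 0 < efficiency <= 1")
    bits = total_bytes * 8
    seconds = days * SECONDS_PER_DAY
    return bits / seconds / efficiency / 1e9


if __name__ == "__main__":
    two_pb = 2 * 10**15
    six_months = 6 * 30
    print(f"{required_gbps(two_pb, six_months):.2f} Gbit/s at full utilisation")
    print(f"{required_gbps(two_pb, six_months, efficiency=0.7):.2f} Gbit/s at 70% utilisation")
```

For 2 PB over six months this works out to roughly 1 Gbit/s sustained, before any overhead. Compare that figure with the bandwidth you actually have before choosing an online transfer over a physical appliance.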
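
For item 6 above, one common pattern is to partition by the date of the timestamp column and cluster by the unique ID. The DDL string below uses BigQuery standard SQL syntax; the dataset, table, and column names (`mydataset.events`, `event_ts`, `event_id`) are hypothetical. The Python part only simulates what partition pruning buys you: a date-range query reads only the matching daily partitions.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

# Hypothetical table and column names; the DDL follows BigQuery standard SQL syntax.
DDL = """
CREATE TABLE mydataset.events (
  event_ts TIMESTAMP,
  event_id STRING,
  payload  STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY event_id
"""


def partition_rows(rows):
    """Group rows into daily partitions keyed by the UTC date of event_ts."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["event_ts"].astimezone(timezone.utc).date()].append(row)
    return partitions


def scan(partitions, start: date, end: date):
    """Read only partitions with start <= day <= end, mimicking partition pruning.

    Returns (matching rows, number of partitions read).
    """
    touched = sorted(d for d in partitions if start <= d <= end)
    rows = [r for d in touched for r in partitions[d]]
    return rows, len(touched)


if __name__ == "__main__":
    sample = [
        {"event_ts": datetime(2024, 1, day, 12, tzinfo=timezone.utc), "event_id": f"id-{day}"}
        for day in range(1, 8)
    ]
    found, read = scan(partition_rows(sample), date(2024, 1, 2), date(2024, 1, 3))
    print(read, [r["event_id"] for r in found])
```

In BigQuery itself, the pruning only happens when the query filters on the partitioning column (here `event_ts`), so queries should always include a time-range predicate; clustering on `event_id` then helps lookups of individual IDs within those partitions.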
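
For item 7 above, a Composer (Airflow) DAG expresses "some jobs in sequence, some concurrent" directly as task dependencies, for example `extract >> [spark_a, spark_b] >> load`. The sketch below does not use Airflow; it uses the standard library `graphlib` (Python 3.9+) to show how such a dependency graph splits into stages. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on,
# mirroring an Airflow dependency chain such as extract >> [spark_a, spark_b] >> load.
DEPENDENCIES = {
    "extract": set(),
    "spark_a": {"extract"},
    "spark_b": {"extract"},
    "load": {"spark_a", "spark_b"},
}


def execution_waves(deps):
    """Group tasks into waves: tasks inside a wave have no dependencies on each
    other and may run concurrently; successive waves run in sequence."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

The wave view is a simplification: a real scheduler starts each task as soon as its own upstream tasks finish, rather than waiting for a whole wave to complete.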
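
For item 12 above, the legacy streaming API (`tabledata.insertAll`) accepts a request body whose `rows` entries carry an optional `insertId` that BigQuery uses for best-effort de-duplication of retried inserts. The sketch below only builds such request bodies in plain Python; it does not call BigQuery. The batch size of 500 rows and the `event_id` key field are assumptions, so check the current streaming quotas. In the `google-cloud-bigquery` client, `Client.insert_rows_json` accepts a `row_ids` argument for the same purpose, and the newer Storage Write API is an alternative for high-volume streaming.

```python
import hashlib
import json

MAX_ROWS_PER_REQUEST = 500  # assumed batch size; verify against current BigQuery quotas


def insert_id(row: dict, key_fields=("event_id",)) -> str:
    """Deterministic insertId derived from the row's (hypothetical) key fields,
    so a retried request reuses the same ID and can be de-duplicated."""
    key = json.dumps([row[f] for f in key_fields])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_insert_all_bodies(rows, batch_size=MAX_ROWS_PER_REQUEST):
    """Split rows into tabledata.insertAll-style request bodies of at most batch_size rows."""
    bodies = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        bodies.append({
            "skipInvalidRows": False,
            "ignoreUnknownValues": False,
            "rows": [{"insertId": insert_id(r), "json": r} for r in batch],
        })
    return bodies


if __name__ == "__main__":
    sample = [{"event_id": str(i), "value": i} for i in range(1200)]
    print([len(body["rows"]) for body in build_insert_all_bodies(sample)])
```

Deriving the ID from the row's own key, rather than generating a random UUID per attempt, is what makes retries safe: the same logical row always produces the same `insertId`.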
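
For item 13 above, the key property of analytic (window) functions is that they compute over a window of related rows while still returning every input row, unlike `GROUP BY` aggregates. The query string below uses BigQuery standard SQL on a hypothetical `mydataset.purchases` table; the Python function emulates the same result in memory to show what `PARTITION BY`, `ORDER BY`, and the window frame do.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical table and columns; syntax follows BigQuery standard SQL.
QUERY = """
SELECT
  user_id, event_ts, amount,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts) AS rn,
  SUM(amount) OVER (
    PARTITION BY user_id ORDER BY event_ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM mydataset.purchases
"""


def row_number_and_running_total(rows):
    """Emulate QUERY: within each user_id, number rows by event_ts and keep a running sum."""
    out = []
    ordered = sorted(rows, key=itemgetter("user_id", "event_ts"))
    for _, group in groupby(ordered, key=itemgetter("user_id")):
        total = 0
        for rn, row in enumerate(group, start=1):
            total += row["amount"]
            out.append({**row, "rn": rn, "running_total": total})
    return out


if __name__ == "__main__":
    purchases = [
        {"user_id": "u1", "event_ts": 2, "amount": 5},
        {"user_id": "u2", "event_ts": 1, "amount": 7},
        {"user_id": "u1", "event_ts": 1, "amount": 3},
    ]
    for r in row_number_and_running_total(purchases):
        print(r)
```

The explicit `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame keeps the running sum row-by-row; without it, the default frame for an ordered window treats rows with equal `event_ts` as peers and sums them together.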
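
For item 14 above, it helps to be precise about what is being improved. ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, measured on held-out data; hyperparameter tuning improves it by comparing candidate models on that metric, whereas deploying a model does not by itself change its AUC. The sketch below is a plain-Python illustration with hypothetical candidate names, not a substitute for a library implementation or a managed tuning service.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative), ties count 0.5.

    Pairwise O(P*N) version; fine for small validation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pick_best(candidates, labels):
    """Return the hyperparameter setting whose validation scores give the highest AUC.

    candidates maps a (hypothetical) setting name to the model's scores on the validation set.
    """
    return max(candidates, key=lambda name: roc_auc(labels, candidates[name]))


if __name__ == "__main__":
    y_val = [0, 0, 1, 1]
    candidates = {"depth=3": [0.1, 0.4, 0.35, 0.8], "depth=6": [0.1, 0.2, 0.35, 0.8]}
    for name, scores in candidates.items():
        print(name, roc_auc(y_val, scores))
    print("best:", pick_best(candidates, y_val))
```

Evaluating every candidate on the same validation set keeps the comparison fair; the final chosen model should still be checked once on a separate test set.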
