Google Professional Data Engineer Quiz 2 Topic 2 Questions 6-10

Question: 1

MJTelco's Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline configuration setting should you update?

A. The zone

B. The number of workers

C. The disk size per worker

D. The maximum number of workers

Question: 2

Flowlogistic's CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they've purchased a visualization tool to simplify the creation of BigQuery reports. However, they've been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

A. Export the data into a Google Sheet for visualization.

B. Create an additional table with only the necessary columns.

C. Create a view on the table to present to the visualization tool.

D. Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.

Question: 3

Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

A. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

B. Cloud Pub/Sub, Cloud Dataflow, and Local SSD

C. Cloud Pub/Sub, Cloud SQL, and Cloud Storage

D. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

Question: 4

Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

A. Add a node to the MySQL cluster and build an OLAP cube there.

B. Use an ETL tool to load the data from MySQL into Google BigQuery.

C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.

D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Question: 5

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?

A. Grant the consultant the Viewer role on the project.

B. Grant the consultant the Cloud Dataflow Developer role on the project.

C. Create a service account and allow the consultant to log on with it.

D. Create an anonymized sample of the data for the consultant to work with in a different project.
