Google Professional Data Engineer Quiz 4 Topic 5 Questions 16-20

Question: 1

Your company is loading comma-separated values (CSV) files into Google BigQuery. The data is imported successfully; however, the imported data does not match the source file byte for byte. What is the most likely cause of this problem?

A. The CSV data loaded in BigQuery is not flagged as CSV.

B. The CSV data has invalid rows that were skipped on import.

C. The CSV data loaded in BigQuery is not using BigQuery's default encoding.

D. The CSV data has not gone through an ETL phase before loading into BigQuery.
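To see how a load can succeed while the bytes still differ, the sketch below assumes a hypothetical CSV row saved in ISO-8859-1 (Latin-1) and loaded without declaring that encoding. BigQuery's default encoding is UTF-8, and BigQuery attempts to convert CSV data that is not UTF-8 into UTF-8, so any non-ASCII byte can change. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingMismatch {

    // Decode bytes written as ISO-8859-1 and re-encode them as UTF-8,
    // approximating what happens when Latin-1 CSV data is stored as UTF-8.
    public static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String decoded = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical CSV row; U+00E9 (e with acute accent) is one byte
        // in Latin-1 but two bytes in UTF-8.
        String row = "1,caf\u00e9\n";
        byte[] source = row.getBytes(StandardCharsets.ISO_8859_1);
        byte[] stored = latin1ToUtf8(source);

        System.out.println("source bytes: " + source.length);                   // 7
        System.out.println("stored bytes: " + stored.length);                   // 8
        System.out.println("byte-identical: " + Arrays.equals(source, stored)); // false
    }
}
```

Pure ASCII rows come through unchanged, which is why such a mismatch often shows up only in a few rows.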


Question: 2

A live TV show asks viewers to cast votes using their mobile phones. The event generates a large volume of data during a 3-minute period. You are in charge of the voting infrastructure and must ensure that the platform can handle the load and that all votes are processed. You must display partial results while voting is open. After voting closes, you need to count the votes exactly once while optimizing cost. What should you do?


A. Create a Memorystore instance with a high availability (HA) configuration.

B. Write votes to a Pub/Sub topic and have Cloud Functions subscribe to it and write votes to BigQuery.

C. Write votes to a Pub/Sub topic and load them into both Bigtable and BigQuery via a Dataflow pipeline. Query Bigtable for real-time results and BigQuery for later analysis. Shut down the Bigtable instance when voting concludes.

D. Create a Cloud SQL for PostgreSQL database with a high availability (HA) configuration and multiple read replicas.
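Pub/Sub delivers messages at least once, so a subscriber can receive the same vote more than once. The plain-Java sketch below illustrates the "count exactly once" requirement by deduplicating on a vote ID before tallying. It assumes each vote carries a unique ID (the `Vote` class and its fields are hypothetical), and it is only an in-memory illustration of the idea, not how Pub/Sub or Dataflow implement deduplication.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ExactlyOnceTally {

    // Hypothetical vote message: a unique vote ID plus the chosen candidate.
    public static final class Vote {
        final String voteId;
        final String candidate;

        public Vote(String voteId, String candidate) {
            this.voteId = voteId;
            this.candidate = candidate;
        }
    }

    // Count each vote ID only once, so redelivered messages do not inflate totals.
    public static Map<String, Integer> tally(List<Vote> deliveries) {
        Set<String> seen = new HashSet<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Vote v : deliveries) {
            if (seen.add(v.voteId)) { // add() returns false for an ID already seen
                counts.merge(v.candidate, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "v2" arrives twice, as can happen with at-least-once delivery.
        List<Vote> deliveries = List.of(
                new Vote("v1", "alice"),
                new Vote("v2", "bob"),
                new Vote("v2", "bob"),
                new Vote("v3", "alice"));
        Map<String, Integer> counts = tally(deliveries);
        System.out.println("alice=" + counts.get("alice") + ", bob=" + counts.get("bob"));
        // prints "alice=2, bob=1"
    }
}
```

Keying on a unique ID makes the count idempotent: replaying the same deliveries yields the same totals.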


Question: 3

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

The data scientists have written the following code to read the data for new key features in the logs.

    BigQueryIO.Read
        .named("ReadLogData")
        .from("clouddataflow-readonly:samples.log_data")

You want to improve the performance of this data read. What should you do?

A. Specify the TableReference object in the code.

B. Use the .fromQuery operation to read specific fields from the table.

C. Use both the Google BigQuery TableSchema and TableFieldSchema classes.

D. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
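BigQuery stores data by column, so a query that names only the needed fields reads only those columns instead of the whole table. The helper below sketches building such a projected query string to pass to `.fromQuery`. The field names (`event_time`, `severity`, `message`) are hypothetical, since the question does not give the schema, and the bracketed `[project:dataset.table]` form assumes legacy SQL. The Dataflow SDK call appears only as a comment mirroring the question's fragment.

```java
import java.util.List;

public class ProjectedRead {

    // Build a query that selects only the named columns; because BigQuery is
    // columnar, columns that are not referenced are not read.
    public static String projectedQuery(String tableRef, List<String> fields) {
        return "SELECT " + String.join(", ", fields) + " FROM [" + tableRef + "]";
    }

    public static void main(String[] args) {
        // Hypothetical field names; substitute the real key features in the logs.
        String query = projectedQuery(
                "clouddataflow-readonly:samples.log_data",
                List.of("event_time", "severity", "message"));
        System.out.println(query);
        // SELECT event_time, severity, message FROM [clouddataflow-readonly:samples.log_data]

        // With the Dataflow SDK 1.x style used in the question, the read would become roughly:
        //   BigQueryIO.Read.named("ReadLogData").fromQuery(query)
    }
}
```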


Question: 4

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity 'Movie', the properties 'actors' and 'tags' have multiple values but the property 'date_released' does not. A typical query would ask for all movies with actor= ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

[Image: answer options A to D]

A. Option A

B. Option B

C. Option C

D. Option D
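The "combinatorial explosion" comes from composite indexes over more than one multi-valued property: Datastore writes one index entry per combination of values. The arithmetic below assumes a hypothetical Movie entity with 10 actors and 5 tags, and compares one composite index on (actors, tags, date_released) with two separate indexes on (actors, date_released) and (tags, date_released), which together serve the two typical queries.

```java
public class IndexEntryCount {

    // Entries one entity contributes to a composite index: the product of the
    // number of values of each indexed property (a single-valued property counts as 1).
    public static long entriesFor(int... valuesPerProperty) {
        long product = 1;
        for (int n : valuesPerProperty) {
            product *= n;
        }
        return product;
    }

    public static void main(String[] args) {
        int actors = 10;      // hypothetical number of actors on one Movie
        int tags = 5;         // hypothetical number of tags on one Movie
        int dateReleased = 1; // single-valued property

        // One composite index spanning both multi-valued properties.
        long combined = entriesFor(actors, tags, dateReleased);
        // Two separate indexes, one per query pattern.
        long separate = entriesFor(actors, dateReleased) + entriesFor(tags, dateReleased);

        System.out.println("combined index entries: " + combined); // 50
        System.out.println("separate index entries: " + separate); // 15
    }
}
```

The combined index grows with the product of the value counts, while the separate indexes grow only with their sum.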


Question: 5

Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

A. Add a node to the MySQL cluster and build an OLAP cube there.

B. Use an ETL tool to load the data from MySQL into Google BigQuery.

C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.

D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

