You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.
A) Denormalize the data as much as possible.
B) Preserve the structure of the data as much as possible.
C) Use BigQuery UPDATE to further reduce the size of the dataset.
D) Develop a data pipeline where status updates are appended to BigQuery instead of updated.
E) Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.
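Option D describes a common append-only pattern for BigQuery: rather than issuing frequent UPDATEs (which are costly at this scale), each status change is appended as a new row, and queries resolve the latest status per transaction at read time. A minimal sketch in plain Python (hypothetical field names, illustrating the logic a `ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY updated_at DESC)` query would express in BigQuery SQL):

```python
def latest_status(events):
    """Given an append-only log of (txn_id, updated_at, status)
    tuples, return the most recent status per transaction."""
    latest = {}
    for txn_id, updated_at, status in events:
        # Keep only the newest event seen for each transaction.
        if txn_id not in latest or updated_at > latest[txn_id][0]:
            latest[txn_id] = (updated_at, status)
    return {txn: status for txn, (_, status) in latest.items()}

# Append-only log: txn 1 is updated twice, txn 2 once.
events = [
    (1, "2024-01-01T00:00", "PENDING"),
    (2, "2024-01-01T01:00", "PENDING"),
    (1, "2024-01-01T02:00", "SETTLED"),
]
print(latest_status(events))  # {1: 'SETTLED', 2: 'PENDING'}
```

The same resolve-at-read approach is often paired with a scheduled query or view so analysts always see current state without the pipeline ever mutating rows in place.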
Correct Answer: