You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?
A) Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
B) Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
C) Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
D) Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.
Correct Answer: B
A dedicated MapReduce job that calibrates the raw data, with every other job chained after it, guarantees calibration is applied exactly once and systematically, without embedding calibration logic into each downstream step (A), pushing the burden onto consumers (C), or approximating the correction statistically (D).
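The chosen approach can be sketched as a standalone calibration step at the head of the pipeline. Below is a minimal Hadoop Streaming-style mapper in Python; the sensor IDs, the linear gain/offset calibration model, and the tab-separated record layout are all hypothetical assumptions for illustration, not part of the original question.

```python
# Sketch of option B: a dedicated calibration job that runs on raw data
# before any other MapReduce job in the chain. The calibration constants
# and record format below are hypothetical.
import sys

# Hypothetical per-sensor calibration constants: sensor_id -> (gain, offset).
CALIBRATION = {
    "sensor-a": (1.02, -0.5),
    "sensor-b": (0.98, 0.3),
}

def calibrate(sensor_id, raw_value):
    """Apply a linear calibration: corrected = gain * raw + offset."""
    gain, offset = CALIBRATION.get(sensor_id, (1.0, 0.0))
    return gain * raw_value + offset

def run_mapper(lines):
    """Read tab-separated (sensor_id, raw_value) records and emit
    calibrated records in the same format for downstream jobs."""
    for line in lines:
        sensor_id, raw = line.rstrip("\n").split("\t")
        yield f"{sensor_id}\t{calibrate(sensor_id, float(raw)):.4f}"

if __name__ == "__main__":
    # As a Hadoop Streaming mapper, records arrive on stdin.
    for record in run_mapper(sys.stdin):
        print(record)
```

Because calibration is isolated in its own job, every downstream job consumes already-calibrated data, and a future change to the calibration model touches only this one step.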