Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?
A) Migrate the workload to Google Cloud Dataflow
B) Use preemptible virtual machines (VMs) for the cluster
C) Use a higher-memory node so that the job runs faster
D) Use SSDs on the worker nodes so that the job can run faster
Correct Answer: B
Preemptible VMs cost a fraction of standard VMs and fit this workload well: the job is a short (roughly 30-minute), fault-tolerant Spark batch run executed only once a week, so an occasional preemption simply causes tasks to be retried. Options C and D may shorten the runtime but raise per-node cost, and option A would mean rewriting the Spark job for Dataflow rather than optimizing the existing cluster.
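For reference, below is a minimal sketch of how such a cluster might be provisioned with the google-cloud-dataproc Python client, keeping a small core of standard workers and placing the bulk of the capacity on preemptible secondary workers. The project ID, region, cluster name, machine types, and node counts are illustrative assumptions, not values given in the question.

```python
# Hedged sketch: a Dataproc cluster whose extra capacity runs on preemptible
# secondary workers to reduce cost. All identifiers below are placeholders.
from google.cloud import dataproc_v1

project_id = "my-project"        # assumed project ID
region = "us-central1"           # assumed region
cluster_name = "weekly-scoring"  # assumed cluster name

# Regional endpoint is required for Dataproc cluster operations.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        # A couple of standard workers keep HDFS/shuffle data available
        # even if preemptible nodes are reclaimed mid-job.
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # The remaining capacity goes on cheap preemptible secondary workers.
        "secondary_worker_config": {
            "num_instances": 13,
            "preemptibility": "PREEMPTIBLE",
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
result = operation.result()  # blocks until the cluster is ready
print(f"Cluster created: {result.cluster_name}")
```

Since the job runs only weekly, the cluster can also be deleted (or created ephemerally per run) once results land in BigQuery, so you pay only for the roughly 30 minutes of compute each week.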