You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
A) Significantly increase the max_batch_size TensorFlow Serving parameter.
B) Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C) Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
D) Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.
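For context on what options A and C actually tune: TensorFlow Serving's request batching is controlled through a text-protobuf parameters file passed alongside `--enable_batching`. The flag and field names below are real TensorFlow Serving options, but the specific values are illustrative assumptions, not tuned recommendations. A minimal sketch:

```shell
# Write a batching config (text protobuf) for TensorFlow Serving.
# Values here are placeholders for illustration only.
cat > /tmp/batching_parameters.txt <<'EOF'
max_batch_size { value: 128 }        # what option A would increase
batch_timeout_micros { value: 5000 }
max_enqueued_batches { value: 100 }  # what option C would increase
num_batch_threads { value: 8 }
EOF

# The server would then be launched with batching enabled, e.g.
# (not executed here; paths and model name are hypothetical):
# tensorflow_model_server \
#   --port=8500 \
#   --model_base_path=/models/my_model \
#   --enable_batching=true \
#   --batching_parameters_file=/tmp/batching_parameters.txt

# Option D's second half refers to GKE's minimum CPU platform setting,
# set per node pool, e.g. (hypothetical cluster/pool names):
# gcloud container node-pools create serving-pool \
#   --cluster=my-cluster --min-cpu-platform="Intel Skylake"

cat /tmp/batching_parameters.txt
```

Note that larger `max_batch_size` and `max_enqueued_batches` values tend to raise throughput at the cost of per-request latency, which is why they are in tension with the question's stated goal.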
Correct Answer: