A company is planning to do a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. For part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple step, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project. Which solution meets these requirements?
A) Ingest data into Amazon S3 using AWS DataSync and use Apache Spark scrips to curate the data in an Amazon EMR cluster. Store the curated data in Amazon S3 for ML processing.
B) Create custom ETL jobs on-premises to curate the data. Use AWS DMS to ingest data into Amazon S3 for ML processing.
C) Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing.
D) Take a full backup of the data store and ship the backup files using AWS Snowball. Upload Snowball data into Amazon S3 and schedule data curation jobs using AWS Batch to prepare the data for ML.
Correct Answer:
Verified
Q26: A company launched a service that produces
Q27: A company is streaming its high-volume billing
Q28: A company's marketing team has asked for
Q29: A company has developed an Apache Hive
Q30: A financial company hosts a data lake
Q32: A company stores its sales and marketing
Q33: A large financial company is running its
Q34: A company that produces network devices has
Q35: A transportation company uses IoT sensors attached
Q36: A company leverages Amazon Athena for ad-hoc
Unlock this Answer For Free Now!
View this answer and more for free by performing one of the following actions
Scan the QR code to install the App and get 2 free unlocks
Unlock quizzes for free by uploading documents