A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution. Which approach meets these requirements for optimizing and querying the log data?
A) Use an AWS Glue job nightly to transform new log files into .csv format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
B) Launch a long-running Amazon EMR cluster that continuously transforms new log files from Amazon S3 into its Hadoop Distributed File System (HDFS) storage and partitions by year, month, and day. Use Apache Presto to query the optimized format.
C) Launch a transient Amazon EMR cluster nightly to transform new log files into Apache ORC format and partition by year, month, and day. Use Amazon Redshift Spectrum to query the data.
D) Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
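Correct Answer: D
Parquet is a columnar format, so Athena scans only the requested columns; partitioning by year, month, and day limits each query to the matching prefixes; and AWS Glue and Athena are serverless, so there are no clusters to operate.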
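The layout and query pattern described in option D can be sketched roughly as follows. This is only an illustration under stated assumptions: the bucket path `s3://example-bucket/elb-parquet`, the table name `elb_logs`, and the column names are hypothetical, and the partition columns are assumed to be zero-padded strings in Hive-style `key=value` prefixes. The sketch only builds the S3 prefix and the SQL text; it does not call any AWS API.

```python
from datetime import date
from typing import Optional, Sequence


def partition_prefix(base: str, day: date) -> str:
    """Build a Hive-style S3 prefix (year=/month=/day=) for one day's logs."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def build_query(
    table: str,
    columns: Sequence[str],
    year: int,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> str:
    """Build a SQL query that selects a subset of columns and filters on
    partition columns, so only the matching partitions are scanned."""
    if day is not None and month is None:
        raise ValueError("day requires month")
    # Assumption: partition columns are stored as zero-padded strings.
    predicates = [f"year = '{year:04d}'"]
    if month is not None:
        predicates.append(f"month = '{month:02d}'")
    if day is not None:
        predicates.append(f"day = '{day:02d}'")
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE {' AND '.join(predicates)}"
    )


# Hypothetical names, for illustration only.
prefix = partition_prefix("s3://example-bucket/elb-parquet", date(2023, 3, 7))
query = build_query("elb_logs", ["client_ip", "elb_status_code"], 2023, 3)
```

Because the filter is on partition columns and the select list names only two columns, a query of this shape would read only the March 2023 prefixes and only the two requested Parquet columns, which is what keeps the scanned data volume, and therefore the per-query cost, low.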