A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake. The company runs analyses on the last 24 hours of data flow logs to detect abnormalities and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities. The data flow logs contain many fields, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day. How should this data be stored for optimal performance?
A) In Apache ORC partitioned by date and sorted by source IP
B) In compressed .csv partitioned by date and sorted by source IP
C) In Apache Parquet partitioned by source IP and sorted by date
D) In compressed nested JSON partitioned by source IP and sorted by date
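The options differ in two ways: file format (columnar ORC/Parquet versus row-oriented CSV/JSON) and partition key. The following minimal, stdlib-only Python sketch models one of those layouts, date partitions with rows sorted by source IP inside each partition, to show why the choice matters for the 24-hour queries. A date filter lets a reader skip whole partitions, and per-stripe min/max source-IP statistics let it skip chunks inside a partition. The bucket prefix `s3://example-bucket/flow_logs`, the helpers `layout` and `prune`, the two-row stripe size, and the sample events are all hypothetical. Real ORC writers compute stripe statistics themselves and use far larger stripes. For scale, two years of daily partitions is roughly 730 prefixes, whereas partitioning by source IP would create one partition per distinct address, likely millions of small partitions given millions of users.

```python
from collections import defaultdict
from datetime import date, timedelta
from ipaddress import ip_address

# Hypothetical S3 prefix, used only to build illustrative partition paths.
BASE = "s3://example-bucket/flow_logs"


def partition_key(day: date) -> str:
    """Hive-style partition prefix, e.g. .../date=2024-01-02/."""
    return f"{BASE}/date={day.isoformat()}/"


def layout(events, rows_per_stripe=2):
    """Partition events by date, sort each partition by source IP, and cut it
    into stripes carrying min/max source-IP statistics (a toy stand-in for
    ORC stripe statistics)."""
    by_day = defaultdict(list)
    for ev in events:
        by_day[ev["date"]].append(ev)
    partitions = {}
    for day, rows in by_day.items():
        # Numeric IP order, so 10.0.0.10 sorts after 10.0.0.3 (assumes IPv4 only).
        rows.sort(key=lambda ev: ip_address(ev["src_ip"]))
        stripes = []
        for i in range(0, len(rows), rows_per_stripe):
            chunk = rows[i:i + rows_per_stripe]
            stripes.append({
                "min": ip_address(chunk[0]["src_ip"]),
                "max": ip_address(chunk[-1]["src_ip"]),
                "rows": chunk,
            })
        partitions[partition_key(day)] = stripes
    return partitions


def prune(partitions, start: date, end: date, src_ip: str):
    """Return the (prefix, stripe) pairs a reader still has to scan for a
    query filtered on an inclusive date range and a single source IP."""
    target = ip_address(src_ip)
    wanted = {partition_key(start + timedelta(days=d))
              for d in range((end - start).days + 1)}
    return [(prefix, stripe)
            for prefix, stripes in partitions.items()
            if prefix in wanted                           # partition pruning
            for stripe in stripes
            if stripe["min"] <= target <= stripe["max"]]  # min/max skipping


# Usage with hypothetical sample events.
events = [
    {"date": date(2024, 1, 1), "src_ip": "10.0.0.9", "dst_ip": "10.0.1.1"},
    {"date": date(2024, 1, 2), "src_ip": "10.0.0.2", "dst_ip": "10.0.1.1"},
    {"date": date(2024, 1, 2), "src_ip": "10.0.0.10", "dst_ip": "10.0.1.2"},
    {"date": date(2024, 1, 2), "src_ip": "10.0.0.3", "dst_ip": "10.0.1.3"},
    {"date": date(2024, 1, 2), "src_ip": "10.0.0.1", "dst_ip": "10.0.1.4"},
]
parts = layout(events)
hits = prune(parts, date(2024, 1, 2), date(2024, 1, 2), "10.0.0.3")
print(len(parts), len(hits))  # prints: 2 1
```

Sorting uses `ipaddress.ip_address` rather than the raw string so that `10.0.0.10` orders after `10.0.0.3`; a plain string sort would interleave them and widen the min/max ranges. The sketch assumes all addresses are IPv4, because IPv4 and IPv6 address objects do not compare with each other.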
Correct Answer: