Consider a flight data store with the following data structure, where all times are in GMT. Each record consists of the 13 attributes; the set of allowable values of the attributes and format are specified in the description (metadata).
Data Value Description
1 Year 1999-2017
2 Month 1-12
3 Day of Month 1-31
4 Day of the Week 1 (Monday) – 7(Sunday)
5 Departure Time Scheduled Departure time (hhmm)
6 Actual Departure time Recorded Departure time (hhmm)
7 Arrival Time Recorded Arrival time (hhmm)
8 Carrier Carrier code (unique)
9 Flight Number Flight Number
10 Departure Delay minutes
11 Arrival Delay minutes
12 Cancellation Yes or No
13 Weather Delay minutes
An example record would have the following values:
(2015, 4, 20, 5, 1430, 1400, 1820, 131, JL729, 30, 15, No, 0)
Flight monitors would like to determine the number of flights which were delayed for each carrier.
- Assuming that the data is stored in a relational database produce, with justification, the SQL statement to create the table and the SQL statement to determine the number of flights which were delayed for each carrier.
- Assuming that the data is too large to be processed in a centralised manner, and that it is stored in an ordinary file, produce a distributed solution which applies MapReduce to the data processing.
- a) Justify your decisions and all the steps of your solution, and specify clearly the map and reduce functions.
- b) Identify the advantages and drawbacks of this solution.
- c) Use diagrams if required.
- Assuming that the monitors wish to determine the number of delayed flights for a specific year or month for example, comment on the general applicability of your solution.
Please if I can get help in this question it will be helpful