Sequential and parallel processing

tobi123 · November 15, 2018, 11:31pm

Consider a flight data store with the following data structure, where all times are in GMT. Each record consists of the 13 attributes; the set of allowable values of the attributes and format are specified in the description (metadata).

        Data Value 				Description

1 Year 1999-2017
2 Month 1-12
3 Day of Month 1-31
4 Day of the Week 1 (Monday) – 7(Sunday)
5 Departure Time Scheduled Departure time (hhmm)
6 Actual Departure time Recorded Departure time (hhmm)
7 Arrival Time Recorded Arrival time (hhmm)
8 Carrier Carrier code (unique)
9 Flight Number Flight Number
10 Departure Delay minutes
11 Arrival Delay minutes
12 Cancellation Yes or No
13 Weather Delay minutes

An example record would have the following values:

(2015, 4, 20, 5, 1430, 1400, 1820, 131, JL729, 30, 15, No, 0)

Flight monitors would like to determine the number of flights which were delayed for each carrier.

1. Assuming that the data is stored in a relational database produce, with justification, the SQL statement to create the table and the SQL statement to determine the number of flights which were delayed for each carrier.
1. Assuming that the data is too large to be processed in a centralised manner, and that it is stored in an ordinary file, produce a distributed solution which applies MapReduce to the data processing.
a) Justify your decisions and all the steps of your solution, and specify clearly the map and reduce functions.
b) Identify the advantages and drawbacks of this solution.
c) Use diagrams if required.
1. Assuming that the monitors wish to determine the number of delayed flights for a specific year or month for example, comment on the general applicability of your solution.

Please if I can get help in this question it will be helpful

yosiasz · November 16, 2018, 1:16pm

This sounds more like hadoop than ms SQL server. Also sounds like homework or interview