From kenmac Wed Nov 29 18:19:37 -0500 2000
From: "Kenneth M. Mackenzie"
To: gte220x@prism, gte584f@prism, gt4947c@prism, gt8230b@prism,
    gte225u@prism, gte211q@prism, gte020r@prism, gte534w@prism,
    gte572q@prism, gte463q@prism, gt9182b@prism, gt0178f@prism,
    gt6645f@prism, gte823q@prism, gte148r@prism, gte909q@prism,
    gt5231a@prism, gte930q@prism, gte829m@prism, gte163i@prism,
    gt5193d@prism, gte182g@prism, gt7150b@prism, gte409s@prism,
    gte161h@prism, gt5809a@prism, gt6671b@prism, gt7116c@prism
CC: pp
Subject: clarifications on HW8
Reply-to: kenmac@cc.gatech.edu

Hi,

[note: paper for Friday if you didn't get it today:
 http://www3-int.cc.gatech.edu/classes/AY2001/cs6290_fall/papers/codrescu99.pdf
 http://www3-int.cc.gatech.edu/classes/AY2001/cs6290_fall/papers/codrescu99.ps
]

Here are some clarifications about HW8 based on questions folks have
asked. As usual, you can define anything you have to assume, but here
are the assumptions I was making.

o In Problem 1, the work per Jacobi cell, w, is ambiguous. I count
  one unit of "w" for each cell and ignore the number of
  multiplies/adds going on within a cell.

o In Problem, parts D, E and F, ignore "edge" partitions in the
  array, i.e., compute the communication only for internal partitions
  that are bounded on all four sides, e.g., the partitions assigned to
  processors 5, 6, 9 and 10 in the figure on page 6.

o In Problem 2, part A, the difference between synchronous and
  overlapped communication is ambiguous.

  I define synchronous communication as communication in which you
  always wait for round-trips. For instance, to exchange a row of
  data with my neighbor, I send a message containing my row and then
  wait (doing nothing) for a return message containing the neighbor's
  row. Only one message is in flight at a time.

  Overlapped communication assumes that a node does *not* wait for a
  reply when sending a message. To send a message, I pay sender-side
  overhead plus transmit time, but after that I can immediately send
  another message (or receive a message).

o Problem 2 mistakenly gives the computation per cell as three FADDs
  plus one FMUL. There are really four FADDs. I don't care which you
  assume, but in the solutions I stuck with three FADDs rather than
  fixing it.

-- Ken
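
As a rough illustration of the synchronous vs. overlapped distinction in the Problem 2, part A clarification above, here is a minimal Python sketch. The cost parameters (send_overhead, recv_overhead, transmit, latency) and the function names are assumptions made for this illustration only; they are not quantities defined in HW8, and the homework's own cost model may differ.

```python
from dataclasses import dataclass


@dataclass
class LinkCosts:
    # All fields are hypothetical parameters in arbitrary time units.
    send_overhead: float  # sender-side software overhead per message
    recv_overhead: float  # receiver-side software overhead per message
    transmit: float       # time to push one message onto the wire
    latency: float        # network transit time for one message


def one_way_time(c: LinkCosts) -> float:
    """Time for one message to go from sender to receiver."""
    return c.send_overhead + c.transmit + c.latency + c.recv_overhead


def synchronous_time(c: LinkCosts, n_exchanges: int) -> float:
    """Each exchange is a full round trip and only one message is in
    flight at a time, so the node idles while waiting for the reply."""
    return n_exchanges * 2 * one_way_time(c)


def overlapped_sender_time(c: LinkCosts, n_messages: int) -> float:
    """The sender does not wait for replies: it is busy only for the
    sender-side overhead plus transmit time of each message."""
    return n_messages * (c.send_overhead + c.transmit)


if __name__ == "__main__":
    costs = LinkCosts(send_overhead=1, recv_overhead=1, transmit=2, latency=10)
    print("synchronous, 4 exchanges:", synchronous_time(costs, 4))
    print("overlapped,  4 sends:    ", overlapped_sender_time(costs, 4))
```

Under these assumed numbers the synchronous case is dominated by waiting on latency, while the overlapped case charges the node only for the time it is actually busy sending.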