Do we have to worry about the memory disambiguation problem in this assignment?

No. We assume that there is a perfect memory disambiguation predictor.

What should I use to identify memory instructions? Is checking mem_type sufficient, or do I have to check the opcode as well?

If the opcode is OP_ST, mem_type is MEM_ST, and if the opcode is OP_LD, mem_type is MEM_LD. So checking mem_type alone is sufficient.
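
As a minimal sketch of that check: the enum and struct below are hypothetical stand-ins for the simulator's own definitions (only OP_LD, OP_ST, MEM_LD and MEM_ST come from the answer above; OP_ADD, NOT_MEM and the Op layout are assumptions made for illustration). The debug assertions simply restate the opcode/mem_type correspondence.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real framework's enums and Op struct may differ.
enum Opcode  { OP_ADD, OP_LD, OP_ST };      // OP_ADD is an assumed non-memory opcode
enum MemType { NOT_MEM, MEM_LD, MEM_ST };   // NOT_MEM is an assumed "no access" value

struct Op {
    Opcode  opcode;
    MemType mem_type;
};

// Returns true for loads and stores, looking only at mem_type.
bool is_memory_op(const Op& op) {
    // Debug-only check of the invariant stated in the answer above:
    // the opcode and mem_type always agree for loads and stores.
    assert((op.opcode == OP_LD) == (op.mem_type == MEM_LD));
    assert((op.opcode == OP_ST) == (op.mem_type == MEM_ST));
    return op.mem_type == MEM_LD || op.mem_type == MEM_ST;
}
```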

What is the cache miss penalty? Is it KNOB_DCACHE_HIT_LATENCY + KNOB_MEM_LATENCY_ROW_HIT, or just KNOB_MEM_LATENCY_ROW_HIT?

It is KNOB_DCACHE_HIT_LATENCY + KNOB_MEM_LATENCY_ROW_HIT (or KNOB_MEM_LATENCY_ROW_MISS on a row-buffer miss) + any additional queuing delay.
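
The arithmetic can be written out as below. This is only a sketch: the LatencyKnobs struct and the function name are hypothetical, and the fields merely mirror the KNOB_* names from the answer. In a cycle-driven simulator the queuing delay would normally emerge from requests waiting in queues; it is a plain parameter here only to make the sum explicit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container mirroring the knobs named in the answer above.
struct LatencyKnobs {
    uint64_t dcache_hit_latency;    // KNOB_DCACHE_HIT_LATENCY
    uint64_t mem_latency_row_hit;   // KNOB_MEM_LATENCY_ROW_HIT
    uint64_t mem_latency_row_miss;  // KNOB_MEM_LATENCY_ROW_MISS
};

// Total latency seen by an access that misses in the DCACHE.
uint64_t dcache_miss_latency(const LatencyKnobs& k, bool row_hit,
                             uint64_t queuing_delay) {
    // Sanity check: a row-buffer miss should not be cheaper than a row-buffer hit.
    assert(k.mem_latency_row_hit <= k.mem_latency_row_miss);
    uint64_t dram = row_hit ? k.mem_latency_row_hit : k.mem_latency_row_miss;
    return k.dcache_hit_latency + dram + queuing_delay;
}
```

The knob values passed in are whatever the simulator is configured with; none are assumed here.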

Do we need to implement store-load forwarding?

Yes, it will be done through the MSHR. Be careful with memory access sizes. For example, if a store writes address 0x001 with a size of 1 byte, a load from address 0x003 cannot get its data from that store, because the two accesses do not overlap.
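
One way to express the size check is with byte ranges [addr, addr + size). The helper names below are hypothetical. Whether a partially overlapping store is allowed to forward is not specified above, so the stricter store_covers_load is offered as an assumption rather than as the required policy. Address wrap-around at the top of the address space is ignored.

```cpp
#include <cassert>
#include <cstdint>

// True if byte ranges [a_addr, a_addr + a_size) and [b_addr, b_addr + b_size)
// share at least one byte.
bool ranges_overlap(uint64_t a_addr, uint32_t a_size,
                    uint64_t b_addr, uint32_t b_size) {
    assert(a_size > 0 && b_size > 0);
    return a_addr < b_addr + b_size && b_addr < a_addr + a_size;
}

// Stricter variant (an assumption): true only if the store wrote every byte
// that the load reads.
bool store_covers_load(uint64_t st_addr, uint32_t st_size,
                       uint64_t ld_addr, uint32_t ld_size) {
    assert(st_size > 0 && ld_size > 0);
    return st_addr <= ld_addr && ld_addr + ld_size <= st_addr + st_size;
}
```

With the example from the answer, a 1-byte store to 0x001 and a load from 0x003 do not overlap, so no forwarding takes place.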

What if a load/store access maps to two different cache blocks (an unaligned access)?

Just access the first cache block.
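
A sketch of what "the first cache block" means in terms of addresses, assuming the block size is a power of two. The function names are hypothetical, and the block size is passed in rather than taken from any particular knob.

```cpp
#include <cassert>
#include <cstdint>

// Address of the cache block containing addr (block_size must be a power of two).
uint64_t block_address(uint64_t addr, uint64_t block_size) {
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return addr & ~(block_size - 1);
}

// True if an access of `size` bytes starting at addr touches two blocks.
bool spans_two_blocks(uint64_t addr, uint32_t size, uint64_t block_size) {
    assert(size > 0);
    return block_address(addr, block_size) !=
           block_address(addr + size - 1, block_size);
}
```

Under the rule above, an access that spans two blocks is looked up using only block_address(addr, block_size); the second block is never accessed.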

Do we need to translate virtual addresses to physical addresses?

No. We assume that the addresses provided by Pin are physical addresses.

Is the DCACHE pipelined?

Yes, we assume that the DCACHE is pipelined, so the processor can access the cache every cycle.

When can an instruction retire? Can it retire out of order?

No, you need to enforce in-order retirement. You need a buffer in the WB stage so that instructions retire in order.
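
One possible shape for such a buffer is sketched below. It assumes every instruction carries a sequence number assigned consecutively from 0 in program order; that numbering, the RetireBuffer class and its method names are all hypothetical and not part of the provided framework. No retire-width limit is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Holds instructions that have finished execution but cannot retire yet
// because an older instruction is still in flight.
class RetireBuffer {
public:
    // Called when an instruction reaches WB, possibly out of program order.
    void complete(uint64_t inst_id) {
        assert(inst_id >= next_to_retire_);  // already-retired ids must not reappear
        completed_.insert(inst_id);
    }

    // Retires the longest run of consecutive completed instructions,
    // starting at the oldest unretired one. Returns their ids in order.
    std::vector<uint64_t> retire() {
        std::vector<uint64_t> retired;
        while (!completed_.empty() && *completed_.begin() == next_to_retire_) {
            retired.push_back(next_to_retire_);
            completed_.erase(completed_.begin());
            ++next_to_retire_;
        }
        return retired;
    }

private:
    uint64_t next_to_retire_ = 0;    // oldest instruction not yet retired
    std::set<uint64_t> completed_;   // finished but still waiting, kept sorted
};
```

If instructions 1 and 2 complete before instruction 0, retire() returns nothing until 0 completes, and then returns 0, 1, 2 in that order.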

Do we need to handle write traffic from the cache?

To simplify the assignment, ignore write traffic from the cache (writing values to DRAM), which would occur on a cache hit for a store instruction. Note that on a cache miss for a store, the simulator should still send the request to DRAM.

Do we need to make sure that only one instruction can be sent to the WB stage per cycle?

No, you do not have to model that behavior.