| |
Sim-MASE Issues
|
|
|
I've been hacking sim-mase (from the
SimpleScalar
v4.0 pre-release)
for a little bit, and I've collected some of my bug finds/fixes here.
These are all presented in the form of email correspondences with some of
the MASE folks. Since all of my fixes have been applied to fairly
hacked up versions of MASE, I don't have "patched" source
files online (although there are code snippets). If you're a seasoned
SimpleScalar hacker, you should be able to follow the discussions and make
the fixes yourself.
Bugs I've Found So Far
Blind load speculation/partial store forward interaction
Date: Fri, 30 Aug 2002 00:07:52 -0700
From: Gabriel Loh
Subject: MASE, possible bug?
I ran across a funny case in sim-mase where I would get a simulator
panic in certain situations. First of all if you're not maintaining the
code, just let me know who to send this to. This is probably a fairly
rare bug, but it killed a simulation run of mine.
This code is from mase-exec.c, starting at line 1096 (using blind load
speculation):
transfer_ok = transfer_store_data(re, load_re);
if (transfer_ok) {
load_re->match_in_LSQ = TRUE;
if (load_re->issued) execute_inst(load_re);
}
else {
reset_inst(load_re);
}
if (load_re->completed) {
recovery_needed = TRUE;
break;
}
(I'm using the code from the simplescalar.com v4.0 pre-release.)
Normally if a blind speculated load causes a memory ordering violation,
recovery_needed gets flagged, and a pipeline flush occurs in the code
that follows the snippet above. But if the conflicting store causes a
partial store forward, then transfer_ok is set to false. This in turn
causes reset_inst to get called, which resets load_re's completed flag.
This in turn causes recovery_needed to *not* get set, and the
misspeculated load never gets cleaned up. On the following cycle,
lsq_refresh will notice that the load is ready to execute (operands
ready, blind speculate ok, and not queued/issued/completed due to the
reset_inst call), stick the load back on the ready_queue, and then the
panic in mase-commit.c:468 (in mase_writeback) gets called (children
woken up twice).
I think the following modification should take care of things:
transfer_ok = transfer_store_data(re, load_re);
if (transfer_ok) {
load_re->match_in_LSQ = TRUE;
if (load_re->issued) execute_inst(load_re);
}
if (load_re->completed) {
recovery_needed = TRUE;
break;
}
else if(!transfer_ok) {
reset_inst(load_re);
}
In this code, a recovery happens if there was a store-load conflict,
regardless of whether it was a partial store forward or not. If the
load is completed, we'll have to go and clean up its children. Only in
the case that the load hasn't completed yet, we can reset it without a
pipeline flush.
-Gabe
|
And a response from the MASE folks:
Date: Tue, 3 Sep 2002 09:25:46 -0400 (EDT)
To: Gabriel Loh
Subject: Re: MASE, possible bug?
After looking it at more closely, I believe you still need to reset the
instruction if the transfer was not successful and the instruction is
completed:
transfer_ok = transfer_store_data(re, load_re);
if (transfer_ok) {
load_re->match_in_LSQ = TRUE;
if (load_re->issued) execute_inst(load_re);
}
if (load_re->completed) {
recovery_needed = TRUE;
if (!transfer_ok) reset_inst(load_re);
break;
}
if (!transfer_ok) reset_inst(load_re);
If you don't reset the instruction, the load itself is not cleaned up,
causing a checker error. The key is to check the completed flag before
reseting.
Again, thanks for finding the bug.
|
NaN != NaN
Date: Thu, 07 Nov 2002 13:04:59 -0800
From: Gabriel Loh
Subject: Re: MASE, possible bug?
Found a bug in mase-checker.c, checker_check().
When comparing the register values of floating point registers to see if
they agree, you'll get a false positive if both the re->odep_value and
the isq->odep_value contain NaN. The IEEE f.p. standard actually states
that a NaN is not equal to itself, so the comparison will always return
false! My fix is to just check for NaN's first. I don't know how
portable the function isnan() is, but it's available under rh linux and
alpha osf.
case vt_sfloat:
if (isnan(re->odep_value[i].f) &&
isnan(isq->reg_state[j].reg_value.f))
{
/* reg contents equal, although IEEE754 says NaN != NaN */
;
}
else if (re->odep_value[i].f != isq->reg_state[j].reg_value.f)
found_error = 1;
break;
case vt_dfloat:
if (isnan(re->odep_value[i].d) &&
isnan(isq->reg_state[j].reg_value.d))
{
/* ditto */
;
}
else if (re->odep_value[i].d != isq->reg_state[j].reg_value.d)
found_error = 1;
break;
I found this in running vpr (Spec2k,ref-place,fastfwd 200M insts,
SimpleScalar for AlphaAXP), which reported a fair number of checker errors.
-Gabe
|
Store forward of single precision FP value; partial forwards
Date: Fri, 08 Nov 2002 15:25:58 -0800
From: Gabriel Loh
Subject: Re: MASE, possible bug?
> Thanks - got to love FP formats. A quick websearch shows that isnan is
> defined in math.h so I'm guessing it is portable. I will add this fix
> eventually.
More FP madness... ;-)
The Alpha instruction LDS loads a single-precision fp value. In the
current machine.def, the "implementation" code unpacks the
single-precision value (a float) into a double-precision value (a
double). There's a problem when you do store-to-load forwarding. In
mase-exec.c:transfer_store_data, we copy a value from the store's source
register into mem_value, but we do this based on the architectural
"mem_size", which does not take into account the fact that our
implementation unpacks floats into doubles. So what happens is that we
only copy 32 bits from the source register, even though the proper value
has been expanded into a full 64-bit double. As a temporary hack, I've
just added an extra conversion:
(in transfer_store_data)
#ifdef TARGET_ALPHA
if (src->op == STS)
{
/* double->float conversion stolen from STS implementation
in machine.def */
sqword_t _longhold;
sword_t _inthold;
enum md_fault_type _fault;
_longhold = src_value.q;
_inthold = (((_longhold >> 32) & ULL(0xc0000000))
| ((_longhold >> 29) & ULL(0x3fffffff)));
dest->mem_value.w = _inthold;
}
else
#endif
{
switch (dest->mem_size) {
case 1: dest->mem_value.b = src_value.b; break;
case 2: dest->mem_value.h = src_value.h; break;
case 4: dest->mem_value.w = src_value.w; break; /*!!!*/
case 8: dest->mem_value.q = src_value.q; break;
default: fatal("Unsupported memory size");
}
}
/* the case marked !!! is what originally would have been
called, thus only grabbing 32 bits from a src_value that
actually contains a 64-bit double. */
Note: instead of the STS conversion code, I initially just tried a
regular C conversion (mem_value.f = (float)src_value.d), but I actually
got different (although very close) results -- you can see a single bit
change if you print out the values in hex in checker_print_errors).
This means that the current implementations of STS/LDS do not do
float/double conversions in the same way as gcc 2.96 does (I'm running
on RH Linux 7.1, x86). Anyone have any idea why the original
machine.def implementation didn't just let the compiler handle it?
Of course the "proper" thing to do is to go and implement
single-precision operations correctly (i.e. not by doing it with
doubles), but I don't have the time at the moment. Figured you'd at
least like to be aware of this bug though.
I was also running into some store forwarding issues for the case when
the load and store addresses are not equal (but still overlap). For
example, in the sqrt() function, the function preamble writes a double
value (64 bits) to the stack:
0x12007b060: 9e1e0038 stt $f16, 56(sp)
later, we read out only the most significant 32 bits because we want to
do some bit-twiddling with the f.p. exponent:
0x12007b078: a05e003c ldl t1, 60(sp)
(the stack pointer hasn't changed between these two insts, so both
instructions refer to the same 64-bit piece of memory)
[code is from Spec2k/eon downloaded from simplescalar.com]
The load-long (LDL) does not recognize the store-double (STT) as a
"parent" store in lsq_refresh, and will go ahead and issue. The ideal
thing to do would be to recognize this case and perform the proper
partial store forwarding. An easier (in terms of simulator hacking)
solution is to just recognize this case and prevent the load from
issuing until the store has cleared out (in the same way a stl->ldq
partial store forward is currently handled when the addresses are the same).
-Gabe
|
And my "fix" for this
(updated 5/22/03; was missing the break after setting prevent_queue)
>
> It does indeed seem to be the problem I am getting.
>
> Eon Twolf and VPR all had between .4% and 1% error rate for 1B ff and 1B
> simulated state.
>
> The problem is not so much the performance hit, but rather that my changes
> to the code change the number of errors, and therefore it would cast their
> verificability in doubt.
>
> I'll check out their paper and see what they had to say.
>
> Do you happen to know if anyone has tackled the problem of correcting
> this, and do you know if SimpleScalar 3.0 suffers from the same problem?
>
I do think SS3.0 does suffer from the same problem, although I haven't looked
at it in a while (without the checker infrastructure, who could have known how
much impact it was having?); so I don't remember if they do anything and what
that might be. What I do right now is that I check for any possible partial
store fowarding cases, and punt on it (hold up the load until the
possible-parent store commits and clears out of the LSQ, and then the load just
grabs its value from the cache).
On line 944 of mase-exec.c (in lsq_refresh, at the end of the if block that
starts on line 898 with "if (sinfo_table[j].addr == load_addr)", I added the
code:
if (sinfo_table[j].addr == load_addr) { /* line 898 */
...
} /* line 944 */
else
{
int size = LSQ[sinfo_table[j].index].mem_size;
int mask;
if(LSQ[index].mem_size > size)
size = LSQ[index].mem_size;
mask = ~(size-1);
if ((sinfo_table[j].addr&mask) == (load_addr&mask))
{
prevent_queue = 1;
break;
}
}
This finds the larger of the two memory sizes between the store and the load,
and uses that to mask out the low order bits of the address when making the
address comparison. Any overlapping memory references should be detected in
this fashion, and we just set the prevent_queue flag to 1 to prevent this load
from getting its value directly from the store.
|
SQRTT Definition
Date: Sat, 06 Mar 2004 14:57:48 -0800
From: Gabriel Loh
Subject: MASE, minor bug
In target-alpha/alpha.def, the definition for SQRTT is not filled in properly,
which results in sim-mase dying for about five or so of the SpecFP benchmarks
(I'm using the single-sample 100M Sim-Points). It currently looks like
(alpha.def:2658):
DEFINST(SQRTT, 0x2b,
"sqrtt", "B,C",
NA, NA,
DNA, DNA, DNA, DNA, DNA)
Whereas I believe it should be:
DEFINST(SQRTT, 0x2b,
"sqrtt", "B,C",
FloatSQRT, F_FCOMP,
DFPR(RC), DNA, DFPR(RB), DNA, DNA)
Note that in the Simplescalar 3.0d maintenance release, the register specifier
fields are properly filled in for SQRTT, but the op-class and flags fields are
not. This could lead to some minor accuracy bugs (for example, not allocating
the fp-sqrt unit correctly).
|
|
|