Sim-MASE Issues



I've been hacking sim-mase (from the SimpleScalar v4.0 pre-release) for a little bit, and I've collected some of my bug finds/fixes here. These are all presented in the form of email correspondences with some of the MASE folks. Since all of my fixes have been applied to fairly hacked up versions of MASE, I don't have "patched" source files online (although there are code snippets). If you're a seasoned SimpleScalar hacker, you should be able to follow the discussions and make the fixes yourself.

Bugs I've Found So Far



Blind load speculation/partial store forward interaction

Date: Fri, 30 Aug 2002 00:07:52 -0700
From: Gabriel Loh
Subject: MASE, possible bug?

I ran across a funny case in sim-mase where I would get a simulator 
panic in certain situations.  First of all if you're not maintaining the 
code, just let me know who to send this to.  This is probably a fairly 
rare bug, but it killed a simulation run of mine.

This code is from mase-exec.c, starting at line 1096 (using blind load 
speculation):

           transfer_ok = transfer_store_data(re, load_re);
           if (transfer_ok) {
             load_re->match_in_LSQ = TRUE;
             if (load_re->issued) execute_inst(load_re);
           }
           else {
             reset_inst(load_re);
           }

           if (load_re->completed) {
             recovery_needed = TRUE;
             break;
           }

(I'm using the code from the simplescalar.com v4.0 pre-release.)

Normally if a blind speculated load causes a memory ordering violation, 
recovery_needed gets flagged, and a pipeline flush occurs in the code 
that follows the snippet above.  But if the conflicting store causes a 
partial store forward, then transfer_ok is set to false.  This in turn 
causes reset_inst to get called, which resets load_re's completed flag. 
  This in turn causes recovery_needed to *not* get set, and the 
misspeculated load never gets cleaned up.  On the following cycle, 
lsq_refresh will notice that the load is ready to execute (operands 
ready, blind speculate ok, and not queued/issued/completed due to the 
reset_inst call), stick the load back on the ready_queue, and then the 
panic in mase-commit.c:468 (in mase_writeback) gets called (children 
woken up twice).

I think the following modification should take care of things:

           transfer_ok = transfer_store_data(re, load_re);
           if (transfer_ok) {
             load_re->match_in_LSQ = TRUE;
             if (load_re->issued) execute_inst(load_re);
           }

           if (load_re->completed) {
             recovery_needed = TRUE;
             break;
           }
           else if(!transfer_ok) {
             reset_inst(load_re);
           }

In this code, a recovery happens if there was a store-load conflict, 
regardless of whether it was a partial store forward or not.  If the 
load is completed, we'll have to go and clean up its children.  Only in 
the case that the load hasn't completed yet, we can reset it without a 
pipeline flush.

-Gabe
          

And a response from the MASE folks:
Date: Tue, 3 Sep 2002 09:25:46 -0400 (EDT)
To: Gabriel Loh
Subject: Re: MASE, possible bug?

After looking it at more closely, I believe you still need to reset the
instruction if the transfer was not successful and the instruction is
completed:

                transfer_ok = transfer_store_data(re, load_re);
                if (transfer_ok) {
                  load_re->match_in_LSQ = TRUE;
                  if (load_re->issued) execute_inst(load_re);
                }

                if (load_re->completed) {
                  recovery_needed = TRUE;
                  if (!transfer_ok) reset_inst(load_re);
                  break;
                }
                if (!transfer_ok) reset_inst(load_re);

If you don't reset the instruction, the load itself is not cleaned up,
causing a checker error.  The key is to check the completed flag before
reseting.

Again, thanks for finding the bug.

          


NaN != NaN

Date: Thu, 07 Nov 2002 13:04:59 -0800
From: Gabriel Loh
Subject: Re: MASE, possible bug?


Found a bug in mase-checker.c, checker_check().

When comparing the register values of floating point registers to see if 
they agree, you'll get a false positive if both the re->odep_value and 
the isq->odep_value contain NaN.  The IEEE f.p. standard actually states 
that a NaN is not equal to itself, so the comparison will always return 
false!  My fix is to just check for NaN's first.  I don't know how 
portable the function isnan() is, but it's available under rh linux and 
alpha osf.


         case vt_sfloat:
           if (isnan(re->odep_value[i].f) &&
               isnan(isq->reg_state[j].reg_value.f))
           {
             /* reg contents equal, although IEEE754 says NaN != NaN */
             ;
           }
           else if (re->odep_value[i].f != isq->reg_state[j].reg_value.f)
             found_error = 1;
           break;
         case vt_dfloat:
           if (isnan(re->odep_value[i].d) &&
               isnan(isq->reg_state[j].reg_value.d))
           {
             /* ditto */
             ;
           }
           else if (re->odep_value[i].d != isq->reg_state[j].reg_value.d)
             found_error = 1;
           break;


I found this in running vpr (Spec2k,ref-place,fastfwd 200M insts, 
SimpleScalar for AlphaAXP), which reported a fair number of checker errors.

-Gabe
          


Store forward of single precision FP value; partial forwards

Date: Fri, 08 Nov 2002 15:25:58 -0800
From: Gabriel Loh
Subject: Re: MASE, possible bug?

> Thanks - got to love FP formats.  A quick websearch shows that isnan is
> defined in math.h so I'm guessing it is portable.  I will add this fix
> eventually.

More FP madness... ;-)

The Alpha instruction LDS loads a single-precision fp value.  In the 
current machine.def, the "implementation" code unpacks the 
single-precision value (a float) into a double-precision value (a 
double).  There's a problem when you do store-to-load forwarding.  In 
mase-exec.c:transfer_store_data, we copy a value from the store's source 
register into mem_value, but we do this based on the architectural 
"mem_size", which does not take into account the fact that our 
implementation unpacks floats into doubles.  So what happens is that we 
only copy 32 bits from the source register, even though the proper value 
has been expanded into a full 64-bit double.  As a temporary hack, I've 
just added an extra conversion:


(in transfer_store_data)

#ifdef TARGET_ALPHA
     if (src->op == STS)
     {
       /* double->float conversion stolen from STS implementation
          in machine.def */
       sqword_t _longhold;							
       sword_t _inthold;							
       enum md_fault_type _fault;						

       _longhold = src_value.q;
       _inthold = (((_longhold >> 32) & ULL(0xc0000000))			
       | ((_longhold >> 29) & ULL(0x3fffffff)));		
       dest->mem_value.w = _inthold;
     }
     else
#endif
     {
       switch (dest->mem_size) {
         case 1: dest->mem_value.b = src_value.b; break;
         case 2: dest->mem_value.h = src_value.h; break;
         case 4: dest->mem_value.w = src_value.w; break;  /*!!!*/
         case 8: dest->mem_value.q = src_value.q; break;
         default: fatal("Unsupported memory size");
       }
     }

/* the case marked !!! is what originally would have been
    called, thus only grabbing 32 bits from a src_value that
    actually contains a 64-bit double. */

Note: instead of the STS conversion code, I initially just tried a 
regular C conversion (mem_value.f = (float)src_value.d), but I actually 
got different (although very close) results -- you can see a single bit 
change if you print out the values in hex in checker_print_errors). 
This means that the current implementations of STS/LDS do not do 
float/double conversions in the same way as gcc 2.96 does (I'm running 
on RH Linux 7.1, x86).  Anyone have any idea why the original 
machine.def implementation didn't just let the compiler handle it?

Of course the "proper" thing to do is to go and implement 
single-precision operations correctly (i.e. not by doing it with 
doubles), but I don't have the time at the moment.  Figured you'd at 
least like to be aware of this bug though.




I was also running into some store forwarding issues for the case when 
the load and store addresses are not equal (but still overlap).  For 
example, in the sqrt() function, the function preamble writes a double 
value (64 bits) to the stack:


   0x12007b060:  9e1e0038        stt     $f16, 56(sp) 


later, we read out only the most significant 32 bits because we want to 
do some bit-twiddling with the f.p. exponent:

   0x12007b078:  a05e003c        ldl     t1, 60(sp) 


(the stack pointer hasn't changed between these two insts, so both 
instructions refer to the same 64-bit piece of memory)

[code is from Spec2k/eon downloaded from simplescalar.com]


The load-long (LDL) does not recognize the store-double (STT) as a 
"parent" store in lsq_refresh, and will go ahead and issue.  The ideal 
thing to do would be to recognize this case and perform the proper 
partial store forwarding.  An easier (in terms of simulator hacking) 
solution is to just recognize this case and prevent the load from 
issuing until the store has cleared out (in the same way a stl->ldq 
partial store forward is currently handled when the addresses are the same).

-Gabe
          

And my "fix" for this
(updated 5/22/03; was missing the break after setting prevent_queue)

> 
> It does indeed seem to be the problem I am getting.
> 
> Eon Twolf and VPR all had between .4% and 1% error rate for 1B ff and 1B 
> simulated state.
> 
> The problem is not so much the performance hit, but rather that my changes 
> to the code change the number of errors, and therefore it would cast their 
> verificability in doubt.
> 
> I'll check out their paper and see what they had to say.
> 
> Do you happen to know if anyone has tackled the problem of correcting 
> this, and do you know if SimpleScalar 3.0 suffers from the same problem?
> 


I do think SS3.0 does suffer from the same problem, although I haven't looked
at it in a while (without the checker infrastructure, who could have known how
much impact it was having?); so I don't remember if they do anything and what
that might be.  What I do right now is that I check for any possible partial
store fowarding cases, and punt on it (hold up the load until the
possible-parent store commits and clears out of the LSQ, and then the load just
grabs its value from the cache).

On line 944 of mase-exec.c (in lsq_refresh, at the end of the if block that
starts on line 898 with "if (sinfo_table[j].addr == load_addr)", I added the
code:

        if (sinfo_table[j].addr == load_addr) {  /* line 898 */
          ...
        }                /* line 944 */
        else
        {
          int size = LSQ[sinfo_table[j].index].mem_size;
          int mask;
          if(LSQ[index].mem_size > size)
            size = LSQ[index].mem_size;
          mask = ~(size-1);
 
          if ((sinfo_table[j].addr&mask) == (load_addr&mask))
          {
            prevent_queue = 1;
            break;
          }
        }

This finds the larger of the two memory sizes between the store and the load,
and uses that to mask out the low order bits of the address when making the
address comparison.  Any overlapping memory references should be detected in
this fashion, and we just set the prevent_queue flag to 1 to prevent this load
from getting its value directly from the store.
          


SQRTT Definition

Date: Sat, 06 Mar 2004 14:57:48 -0800
From: Gabriel Loh
Subject: MASE, minor bug

In target-alpha/alpha.def, the definition for SQRTT is not filled in properly,
which results in sim-mase dying for about five or so of the SpecFP benchmarks
(I'm using the single-sample 100M Sim-Points).  It currently looks like
(alpha.def:2658):

DEFINST(SQRTT,         0x2b,
        "sqrtt",       "B,C",
        NA,            NA,
        DNA, DNA,      DNA, DNA, DNA)


Whereas I believe it should be:

DEFINST(SQRTT,         0x2b,
        "sqrtt",       "B,C",
        FloatSQRT,     F_FCOMP,
        DFPR(RC), DNA, DFPR(RB), DNA, DNA)


Note that in the Simplescalar 3.0d maintenance release, the register specifier
fields are properly filled in for SQRTT, but the op-class and flags fields are
not.  This could lead to some minor accuracy bugs (for example, not allocating
the fp-sqrt unit correctly).

          


Gabriel H. Loh, © 2008
Last generated 3 Oct '08