[Common-dev] Bug in HXAtomicAddUINT32()?
Wolfgang Schildbach wschildbach at helixcommunity.orgI believe there is a subtle bug in all of our 0x80000000-marker based
HXAtomicAddUINT32() implementations. This bug has probably never been
exercised, and probably never will, but I'd like to put this forth for
discussion nevertheless.
This is the description for our "marker algorithm":
/*
* This implementation sacrifices being able to store the value
* 0x800000000 in the INT32 value, which is a special "busy" marker value.
* Since these are intended for use primarily with AddRef/Release and
* resource usage counters, this should be acceptable for now. If a counter
* is incremented to the point it would conflict with the flag, it is
* incremented one more to hop over it. The same in reverse for decrement.
*
* Basic design of the flag-based implementation:
* 1. Load a register with 0x80000000
* 2. _atomically_ swap it with the INT32 (critical!)
* 3. Compare what we got with 0x80000000
* 4. Branch if equal to #2
* 5. Increment (or decrement) the result
* 6. Compare to 0x80000000
* 7. Increment (or decrement) again if equal
* 8. Save the new value to the INT32's location in memory
* 9. Return new INT32 result if required
*/
The problem appears if HXAtomicAddUINT32() and HXAtomicDecUINT32() are
mixed (and I don't think there is a rule that says they can't). Consider
this scenario:
counter operation
0x7ffffffe Inc
0x7fffffff Add(2)
0x80000001 Dec
0x80000000 (intermediate value 0x80000000 is decremented again)
0x7fffffff Sub(2)
0x7ffffffd
Note that we end up with a different counter than we started with. The
problem is that step (6) compares against equality. It rather should check
against wraparound/overflow (as the comments in the assembly say).
Here is my proposal for a solution:
// implement these using atomic swap assembly insn
static HX_INLINE_ALWAYS UINT32 hxAtomicSwap(UINT32* pNum, UINT32 b)
{ UINT32 c = *pNum; *pNum = b; return c ; }
static HX_INLINE_ALWAYS UINT32 hxAddAvoidMarker(UINT32 a, UINT32 b)
{
// can implement this with overflow flag if provided by CPU
UINT32 ovf = a ;
a += b ;
ovf = ~ovf & (ovf ^ a) ; /* if a was < 0x80000000 before, and is
now >= marker,set bit 31 */
return a + (ovf >> 31) ; /* check for overflow and increment again if
so */
}
static HX_INLINE_ALWAYS UINT32 hxSubAvoidMarker(UINT32 a, UINT32 b)
{
// can implement this with overflow flag if provided by CPU
UINT32 ovf = a ;
a -= (b+1) ;
ovf = ~(ovf & (ovf ^ a)) ; /* if a was > 0x80000000 before, and is now
< marker,set bit 31*/
return a + (ovf >> 31) ; /* check for overflow and increment again
if so */
}
static HX_INLINE_ALWAYS UINT32 HXAtomicAddRetUINT32(UINT32* pNum, UINT32 ulNum)
{
UINT32 b,c = 0x80000000 ;
do {b = hxAtomicSwap(pNum, c);} while (b == c) ;
return (*pNum = hxAddAvoidMarker(b,ulNum));
}
static HX_INLINE_ALWAYS UINT32 HXAtomicSubRetUINT32(UINT32* pNum, UINT32 ulNum)
{
UINT32 b,c = 0x80000000 ;
do {b = hxAtomicSwap(pNum, c);} while (b == c) ;
return (*pNum = hxSubAvoidMarker(b,ulNum));
}
Of course, this may be considered overkill since the fault case is so
extremely improbable. However, if overflow detection is done in assembly,
the code can actually be more compact than what we are using now:
static HX_INLINE_ALWAYS UINT32 hxAtomicSwap(UINT32* pNum, UINT32 a)
{ // ARM ADS
__asm { swp a, a, [pNum] }
return a ;
}
static HX_INLINE_ALWAYS UINT32 hxAddAvoidMarker(UINT32 a, UINT32 b)
{
__asm { adds a,a,b; addvs a,a,#1 }
return a ;
}
results in this:
HXAtomicAddRetUINT32 PROC
MOV r2,r0
MOV r3,#0x80000000
|L2.8|
SWP r0,r3,[r2]
CMP r0,r3
BEQ |L2.8|
ADDS r0,r0,r1
ADDVS r0,r0,#1
STR r0,[r2,#0]
MOV pc,lr
ENDP
Comments?
-
- Wolfgang Schildbach, Principal Codec Engineer, RealNetworks Codec Group
- email wschildbach at helixcommunity.org