When I politely asked the compiler to optimize a little, the results
got just barely more accurate:
Code:
0.00000000000000000000 0.00000000000000000000
0.10000000149011612000 0.10000000000000001000
0.20000000298023224000 0.20000000000000001000
0.30000000447034836000 0.30000000000000004000
0.40000001341104507000 0.40000000000000002000
0.50000000745058060000 0.50000000000000000000
0.60000000149011612000 0.59999999999999998000
0.70000002533197403000 0.69999999999999996000
0.80000004917383194000 0.79999999999999993000
0.90000007301568985000 0.89999999999999991000
1.00000009685754780000 0.99999999999999989000
1.10000012069940570000 1.09999999999999990000
1.20000014454126360000 1.20000000000000000000
1.30000016838312150000 1.30000000000000000000
1.40000019222497940000 1.40000000000000010000
1.50000021606683730000 1.50000000000000020000
1.60000023990869520000 1.60000000000000030000
1.70000026375055310000 1.70000000000000040000
1.80000028759241100000 1.80000000000000050000
1.90000031143426900000 1.90000000000000060000
102.00000182539225000000 102.00000000000000000000
The cause:
Unoptimized code:
Code:
LC1:
.long 1036831949 /* (float) 0.1f */
.align 8
LC2:
.long -1717986918 /* (double) 0.1 (lower 4 bytes, x86 is little-endian) */
.long 1069128089 /* (double) 0.1 (upper 4 bytes) */
/* ..... */
/* the loop */
L3:
movl $1000, -4(%ebp) /* for (i = 1000 */
L6:
leal -4(%ebp), %eax
decl (%eax) /* i-- */
cmpl $-1, -4(%ebp) /* i >= 0 */
jne L9 /* if so, go to body */
jmp L7 /* otherwise, jump to call printf */
L9:
flds -8(%ebp) /* "download" f from memory */
flds LC1 /* "download" 0.1f from memory */
faddp %st, %st(1) /* add */
fstps -8(%ebp) /* "upload" result to memory */
fldl -16(%ebp) /* "download" d from memory */
fldl LC2 /* "download" 0.1 from memory */
faddp %st, %st(1) /* add */
fstpl -16(%ebp) /* "upload" result to memory */
jmp L6 /* end of for */
L7:
/* printf goes here */
leave
ret
Optimized code:
Code:
LC3:
.long 1036831949 /* (float) 0.1f */
.align 8
LC4:
.long -1717986918 /* (double) 0.1 (upper 4 bytes) */
.long 1069128089 /* (double) 0.1 (lower 4 bytes) */
/* ..... */
/* the loop */
movl $999, %ebx /* for (i = 1000 */
L11:
/* at this point st(0) contains d, st(1) contains f */
fxch %st(1) /* swap st(0) with st(1) */
decl %ebx /* i-- */
fadds LC3 /* add 0.1f to st(0) (i.e. f) */
fxch %st(1) /* swap st(0) with st(1) */
faddl LC4 /* add 0.1 to st(0) (i.e. d) */
cmpl $-1, %ebx /* i >= 0 */
jne L11 /* end of for */
/* printf goes here */
leave
ret
Clearly, the difference in accuracy comes from the fact that in the
unoptimized version the result is stored to memory after every +=,
whereas in the optimized version the whole loop runs without ever
"letting" f and d out of the FPU.
If I understood correctly, this happens because Intel processors do all
internal FPU calculations in 80-bit precision and convert to the
required (32/64-bit) format only when copying values into or out of the
FPU. So in the unoptimized version the error arises because after every
operation the result is rounded on the way to memory: to 32 bits for f
(single precision) and to 64 bits for d (double precision).
Any floating point gurus around? Please confirm or refute.
BTW, that run of nines when printing the double comes from error
accumulating inside printf(), not from error in the calculations.
-rtfb