mirror of
git://git.gnupg.org/gnupg.git
synced 2024-11-04 20:38:50 +01:00
.. | ||
distfiles | ||
mpih-add1.S | ||
mpih-shift.S | ||
mpih-sub1.S | ||
README | ||
udiv-qrnnd.S |
This directory contains mpn functions for various HP PA-RISC chips. Code that runs faster on the PA7100 and later implementations, is in the pa7100 directory. RELEVANT OPTIMIZATION ISSUES Load and Store timing On the PA7000 no memory instructions can issue the two cycles after a store. For the PA7100, this is reduced to one cycle. The PA7100 has a lookup-free cache, so it helps to schedule loads and the dependent instruction really far from each other. STATUS 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the instructions bwlow (but some sw pipelining is needed to avoid the xmpyu-fstds delay): fldds s1_ptr xmpyu fstds N(%r30) xmpyu fstds N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) addc stws res_ptr addc stws res_ptr addib Loop 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb (asymptotically) on the PA7100, using the instructions below. With proper sw pipelining and the unrolling level below, the speed becomes 8 cycles/limb. fldds s1_ptr fldds s1_ptr xmpyu fstds N(%r30) xmpyu fstds N(%r30) xmpyu fstds N(%r30) xmpyu fstds N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) addc addc addc addc addc %r0,%r0,cy-limb ldws res_ptr ldws res_ptr ldws res_ptr ldws res_ptr add stws res_ptr addc stws res_ptr addc stws res_ptr addc stws res_ptr addib