SM2调研

大整数乘法

基线算法

使用 双精度 中间变量，算法复杂度 $O(n^2)$，进位累加次数 $n*n$
Comba 算法

使用 三精度 中间变量，算法复杂度 $O(n^2)$，进位累加次数 $2*n$
Karatsuba 算法

使用 分而治之 思想，算法复杂度 $O(n^{log_23}) \approx O(n^{1.585})$
Montgomery 算法（蒙哥马利算法）

快速计算模乘、模幂运算（蒙哥马利算法）

大数库

MIRACL

MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

针对椭圆曲线进行优化
GMP

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.

主要用于整数、有理数和浮点数运算，针对椭圆曲线支持较少
OpenSSL

包含大数库

测试

SM2-Sign 中的关键步骤为 $k*G$，其中 $G$ 为椭圆曲线生成元。由于椭圆曲线参数是确定的，因此可针对 $G$ 预先生成查找表，从而加速点乘运算。

SM2-Verify 中的关键步骤为 $s^{\prime}G + r^{\prime}P$, 由于 $P$ 为用户公钥，是可变参数，因此无法针对 $r^{\prime}*P$ 预先生成查找表，只能在验签过程中实时计算部分查找表加速验签过程。

SM2-GmSSL
1. 不预先生成查找表(no EC_KEY_precompute_mult)
  
  wang@ubuntu:~/project/sm2-base$ ./main
  test gmssl sm2 speed in 10s
  sm2 verify count: 8176.000
  time cost: 11.314
  verify speed: 722.662287/s
  test gmssl sm2 verify speed in 10s
  gmssl sm2 verify count: 8176.000
  time cost: 13.246
  verify speed: 617.263539/s
2. 预先生成乘法查找表(EC_KEY_precompute_mult)
  
  test gmssl sm2 speed in 10s
  sm2 verify count: 65520.000
  time cost: 18.185
  verify speed: 3603.007366/s
  test gmssl sm2 verify speed in 10s
  gmssl sm2 verify count: 8176.000
  time cost: 12.835
  verify speed: 637.009360/s

 wang@ubuntu:~$ gmssl speed sm2
                               sign    verify    sign/s verify/s
  <font color=red>256 bit sm2 (sm2p256v1)   0.0003s   0.0015s   3601.3    668.8</font>

最新版
- 两个查找表—关闭 SM2_NO_CONST_TIME
  GmSSL 2.1.0 - OpenSSL 1.1.0d 03 Jan 2018
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
  compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack
```
                          sign    verify    sign/s verify/s
```
  256 bit sm2 (sm2p256v1) 0.0002s 0.0008s 5196.5 1306.8
```
                          encrypt decrypt   enc/s  dec/s
```
  256 bit sm2 (sm2p256v1) 0.0009s 0.0007s 1165.1 1397.3
- 两个查找表—开启 SM2_NO_CONST_TIME
  GmSSL 2.1.0 - OpenSSL 1.1.0d 03 Jan 2018
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
  compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack
```
                          sign    verify    sign/s verify/s
```
  256 bit sm2 (sm2p256v1) 0.0002s 0.0008s 5203.7 1331.6
```
                          encrypt decrypt   enc/s  dec/s
```
  256 bit sm2 (sm2p256v1) 0.0009s 0.0007s 1160.1 1423.
- SM2默认曲线——使用NISTP查表（关闭 ec_nistp_64_gcc_128, 开启 SM2_NO_CONST_TIME）
  
  GmSSL 2.5.4 - OpenSSL 1.1.0d 19 Jun 2019
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
  compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack
```
                          sign    verify    sign/s verify/s
```
  256 bit sm2 (sm2p256v1) 0.0002s 0.0007s 6457.7 1340.3
```
                          encrypt decrypt   enc/s  dec/s
```
  256 bit sm2 (sm2p256v1) 0.0008s 0.0007s 1190.5 1406.4
- SM2默认曲线——使用SM2P256查表（开启 ec_nistp_64_gcc_128, 开启 SM2_NO_CONST_TIME）
  
  GmSSL 2.5.4 - OpenSSL 1.1.0d 19 Jun 2019
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
  compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DL_ENDIAN -D__arch64__ -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack
```
                          sign    verify    sign/s verify/s
```
  256 bit sm2 (sm2p256v1) 0.0001s 0.0004s 7275.6 2594.0
```
                          encrypt decrypt   enc/s  dec/s
```
  256 bit sm2 (sm2p256v1) 0.0007s 0.0006s 1394.6 1691.9
- SM2默认曲线——32个查找表（开启 ec_nistp_64_gcc_128, 开启 SM2_NO_CONST_TIME, 开启SM2P256TABLE）
  
  GmSSL 2.5.4 - OpenSSL 1.1.0d 19 Jun 2019
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
  compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DSM2P256TABLE -DL_ENDIAN -D__arch64__ -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack
```
                          sign    verify    sign/s verify/s
```
  256 bit sm2 (sm2p256v1) 0.0001s 0.0003s 12838.6 2915.4
```
                          encrypt decrypt   enc/s  dec/s
```
  256 bit sm2 (sm2p256v1) 0.0007s 0.0006s 1512.6 1718.2

# gmssl ecparam -text -noout -name sm2p256v1 -param_enc explicit
Field Type: prime-field
Prime:
    00:ff:ff:ff:fe:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
    ff:ff:ff:ff:ff:ff:00:00:00:00:ff:ff:ff:ff:ff:
    ff:ff:ff
A:   
    00:ff:ff:ff:fe:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
    ff:ff:ff:ff:ff:ff:00:00:00:00:ff:ff:ff:ff:ff:
    ff:ff:fc
B:   
    28:e9:fa:9e:9d:9f:5e:34:4d:5a:9e:4b:cf:65:09:
    a7:f3:97:89:f5:15:ab:8f:92:dd:bc:bd:41:4d:94:
    0e:93
Generator (uncompressed):
    04:32:c4:ae:2c:1f:19:81:19:5f:99:04:46:6a:39:
    c9:94:8f:e3:0b:bf:f2:66:0b:e1:71:5a:45:89:33:
    4c:74:c7:bc:37:36:a2:f4:f6:77:9c:59:bd:ce:e3:
    6b:69:21:53:d0:a9:87:7c:c6:2a:47:40:02:df:32:
    e5:21:39:f0:a0
Order: 
    00:ff:ff:ff:fe:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
    ff:ff:72:03:df:6b:21:c6:05:2b:53:bb:f4:09:39:
    d5:41:23
Cofactor:  1 (0x1)

对于 P256 上的一个点 x，其大小为 256 bits = 32 bytes = 8 words = 4 dwords，对于定义在 P256 上的椭圆曲线点 G=(Gx, Gy)，其大小为 8 dwords。考虑 $k*G$：

$\begin{align} k*G &= \left(k_{3}*2^{192} + k_{2}*2^{128} + k_{1}*2^{64} + k_{0}\right)*G\\ &= \left((k_{3,1}*2^{32}+k_{3,0})*2^{192} + \right.\\ &\quad\quad (k_{2,1}*2^{32}+k_{2,0})*2^{128} + \\ &\quad\quad (k_{1,1}*2^{32}+k_{1,0})*2^{64} + \\ &\left.\quad\quad (k_{0,1}*2^{32}+k_{0,0})\right)*G \\ &=\left(k_{3,1}*2^{192+32}+k_{2,1}*2^{128+32}+k_{1,1}*2^{64+32}+k_{0,1}*2^{32}\right)*G+\\ &\quad \left(k_{3,0}*2^{192}+k_{2,0}*2^{128}+k_{1,0}*2^{64}+k_{0,0}\right)*G \end{align}$

/*-
 * Base point pre computation
 * --------------------------
 *
 * Two different sorts of precomputed tables are used in the following code.
 * Each contain various points on the curve, where each point is three field
 * elements (x, y, z).
 *
 * For the base point table, z is usually 1 (0 for the point at infinity).
 * This table has 2 * 16 elements, starting with the following:
 * index | bits    | point
 * ------+---------+------------------------------
 *     0 | 0 0 0 0 | 0G
 *     1 | 0 0 0 1 | 1G
 *     2 | 0 0 1 0 | 2^64G
 *     3 | 0 0 1 1 | (2^64 + 1)G
 *     4 | 0 1 0 0 | 2^128G
 *     5 | 0 1 0 1 | (2^128 + 1)G
 *     6 | 0 1 1 0 | (2^128 + 2^64)G
 *     7 | 0 1 1 1 | (2^128 + 2^64 + 1)G
 *     8 | 1 0 0 0 | 2^192G
 *     9 | 1 0 0 1 | (2^192 + 1)G
 *    10 | 1 0 1 0 | (2^192 + 2^64)G
 *    11 | 1 0 1 1 | (2^192 + 2^64 + 1)G
 *    12 | 1 1 0 0 | (2^192 + 2^128)G5
 *    13 | 1 1 0 1 | (2^192 + 2^128 + 1)G
 *    14 | 1 1 1 0 | (2^192 + 2^128 + 2^64)G
 *    15 | 1 1 1 1 | (2^192 + 2^128 + 2^64 + 1)G
 * followed by a copy of this with each element multiplied by 2^32.
 *
 * The reason for this is so that we can clock bits into four different
 * locations when doing simple scalar multiplies against the base point,
 * and then another four locations using the second 16 elements.
 *
 * Tables for other points have table[i] = iG for i in 0 .. 16. */



/*
 * select_point selects the |idx|th point from a precomputation table and
 * copies it to out.
 */
static void select_point(const u64 idx, unsigned int size,
                         const smallfelem pre_comp[16][3], smallfelem out[3])
{
    unsigned j;
    u64 *outlimbs = &out[0][0];

#ifdef SM2_NO_CONST_TIME
    const u64 *inlimbs = (u64 *)&pre_comp[idx][0][0];
    for (j = 0; j < NLIMBS * 3; j++) {
        outlimbs[j] = inlimbs[j];
    }
#else
    int i;
    memset(out, 0, sizeof(*out) * 3);

    for (i = 0; i < size; i++) {
        const u64 *inlimbs = (u64 *)&pre_comp[i][0][0];
        u64 mask = i ^ idx;
        mask |= mask >> 4;
        mask |= mask >> 2;
        mask |= mask >> 1;
        mask &= 1;
        mask--;
        for (j = 0; j < NLIMBS * 3; j++)
            outlimbs[j] |= inlimbs[j] & mask;
    }
#endif
}

/*
 * Interleaved point multiplication using precomputed point multiples: The
 * small point multiples 0*P, 1*P, ..., 17*P are in pre_comp[], the scalars
 * in scalars[]. If g_scalar is non-NULL, we also add this multiple of the
 * generator, using certain (large) precomputed multiples in g_pre_comp.
 * Output point (X, Y, Z) is stored in x_out, y_out, z_out
 */
static void batch_mul(felem x_out, felem y_out, felem z_out,
                      const felem_bytearray scalars[],
                      const unsigned num_points, const u8 *g_scalar,
                      const int mixed, const smallfelem pre_comp[][17][3],
                      const smallfelem g_pre_comp[2][16][3])
{
    int i, skip;
    unsigned num, gen_mul = (g_scalar != NULL);
    felem nq[3], ftmp;
    smallfelem tmp[3];
    u64 bits;
    u8 sign, digit;

    /* set nq to the point at infinity */
    memset(nq, 0, sizeof(nq));


    /*
     * Loop over all scalars msb-to-lsb, interleaving additions of multiples
     * of the generator (two in each of the last 32 rounds) and additions of
     * other points multiples (every 5th round).
     */
    skip = 1;                   /* save two point operations in the first
                                 * round */
    for (i = (num_points ? 255 : 31); i >= 0; --i) {
        /* double */
        if (!skip)
            point_double(nq[0], nq[1], nq[2], nq[0], nq[1], nq[2]);

        /* add multiples of the generator */
        if (gen_mul && (i <= 31)) {
            /* first, look 32 bits upwards */
            bits = get_bit(g_scalar, i + 224) << 3;
            bits |= get_bit(g_scalar, i + 160) << 2;
            bits |= get_bit(g_scalar, i + 96) << 1;
            bits |= get_bit(g_scalar, i + 32);
            /* select the point to add, in constant time */
            select_point(bits, 16, g_pre_comp[1], tmp);

            if (!skip) {
                /* Arg 1 below is for "mixed" */
                point_add(nq[0], nq[1], nq[2],
                          nq[0], nq[1], nq[2], 1, tmp[0], tmp[1], tmp[2]);
            } else {
                smallfelem_expand(nq[0], tmp[0]);
                smallfelem_expand(nq[1], tmp[1]);
                smallfelem_expand(nq[2], tmp[2]);
                skip = 0;
            }

            /* second, look at the current position */
            bits = get_bit(g_scalar, i + 192) << 3;
            bits |= get_bit(g_scalar, i + 128) << 2;
            bits |= get_bit(g_scalar, i + 64) << 1;
            bits |= get_bit(g_scalar, i);
            /* select the point to add, in constant time */
            select_point(bits, 16, g_pre_comp[0], tmp);
            /* Arg 1 below is for "mixed" */
            point_add(nq[0], nq[1], nq[2],
                      nq[0], nq[1], nq[2], 1, tmp[0], tmp[1], tmp[2]);
        }

        /* do other additions every 5 doublings */
        if (num_points && (i % 5 == 0)) {
            /* loop over all scalars */
            for (num = 0; num < num_points; ++num) {
                bits = get_bit(scalars[num], i + 4) << 5;
                bits |= get_bit(scalars[num], i + 3) << 4;
                bits |= get_bit(scalars[num], i + 2) << 3;
                bits |= get_bit(scalars[num], i + 1) << 2;
                bits |= get_bit(scalars[num], i) << 1;
                bits |= get_bit(scalars[num], i - 1);
                ec_GFp_nistp_recode_scalar_bits(&sign, &digit, bits);
                /*
                 * select the point to add or subtract, in constant time
                 */
                select_point(digit, 17, pre_comp[num], tmp);
                smallfelem_neg(ftmp, tmp[1]); /* (X, -Y, Z) is the negative
                                               * point */
                copy_small_conditional(ftmp, tmp[1], (((limb) sign) - 1));
                felem_contract(tmp[1], ftmp);

                if (!skip) {
                    point_add(nq[0], nq[1], nq[2],
                              nq[0], nq[1], nq[2],
                              mixed, tmp[0], tmp[1], tmp[2]);
                } else {
                    smallfelem_expand(nq[0], tmp[0]);
                    smallfelem_expand(nq[1], tmp[1]);
                    smallfelem_expand(nq[2], tmp[2]);
                    skip = 0;
                }
            }
        }
    }
    felem_assign(x_out, nq[0]);
    felem_assign(y_out, nq[1]);
    felem_assign(z_out, nq[2]);
}

GmSSL SM2 依赖项

wang@ubuntu:~/project/sm2-base$ gcc -M sm2test.c | grep openssl
/usr/local/include/openssl/bn.h /usr/local/include/openssl/e_os2.h \
/usr/local/include/openssl/opensslconf.h /usr/include/inttypes.h \
/usr/local/include/openssl/ossl_typ.h \
/usr/local/include/openssl/crypto.h /usr/include/time.h \
/usr/local/include/openssl/stack.h \
/usr/local/include/openssl/safestack.h \
/usr/local/include/openssl/opensslv.h \
/usr/local/include/openssl/symhacks.h /usr/include/pthread.h \
/usr/local/include/openssl/ec.h /usr/local/include/openssl/asn1.h \
/usr/local/include/openssl/bio.h /usr/local/include/openssl/evp.h \
/usr/local/include/openssl/objects.h \
/usr/local/include/openssl/obj_mac.h /usr/local/include/openssl/rand.h \
/usr/local/include/openssl/sm2.h /usr/local/include/openssl/err.h \
/usr/local/include/openssl/lhash.h /usr/include/errno.h \
/usr/local/include/openssl/kdf2.h /usr/local/include/openssl/kdf.h \
/usr/local/include/openssl/x509.h /usr/local/include/openssl/buffer.h \
/usr/local/include/openssl/paillier.h /usr/local/include/openssl/sm9.h \
/usr/local/include/openssl/rsa.h /usr/local/include/openssl/dsa.h \
/usr/local/include/openssl/dh.h /usr/local/include/openssl/sha.h \
/usr/local/include/openssl/x509_vfy.h /usr/local/include/openssl/pkcs7.h \
/usr/local/include/openssl/ecies.h /usr/local/include/openssl/sm3.h