0%

SM2调研

大整数乘法

  • 基线算法

    使用 双精度 中间变量,算法复杂度 $O(n^2)$,进位累加次数 $n*n$

  • Comba 算法

    使用 三精度 中间变量,算法复杂度 $O(n^2)$,进位累加次数 $2*n$

  • Karatsuba 算法

    使用 分而治之 思想,算法复杂度 $O(n^{log_23}) \approx O(n^{1.585})$

  • Montgomery 算法(蒙哥马利算法)

    快速计算模乘、模幂运算(蒙哥马利算法

大数库

  • MIRACL

    MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

    针对椭圆曲线进行优化

  • GMP

    GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.

    主要用于整数、有理数和浮点数运算,针对椭圆曲线支持较少

  • OpenSSL

    包含大数库

测试

SM2-Sign 中的关键步骤为 $k*G$,其中 $G$ 为椭圆曲线生成元。由于椭圆曲线参数是确定的,因此可针对 $G$ 预先生成查找表,从而加速点乘运算。

SM2-Verify 中的关键步骤为 $s^{\prime}G + r^{\prime}P$, 由于 $P$ 为用户公钥,是可变参数,因此无法针对 $r^{\prime}*P$ 预先生成查找表,只能在验签过程中实时计算部分查找表加速验签过程。

  • SM2-GmSSL

    1. 不预先生成查找表(no EC_KEY_precompute_mult)

      wang@ubuntu:~/project/sm2-base$ ./main
      test gmssl sm2 speed in 10s
      sm2 verify count: 8176.000
      time cost: 11.314

      verify speed: 722.662287/s

      test gmssl sm2 verify speed in 10s
      gmssl sm2 verify count: 8176.000
      time cost: 13.246

      verify speed: 617.263539/s
    2. 预先生成乘法查找表(EC_KEY_precompute_mult)

      test gmssl sm2 speed in 10s
      sm2 verify count: 65520.000
      time cost: 18.185

      verify speed: 3603.007366/s

      test gmssl sm2 verify speed in 10s
      gmssl sm2 verify count: 8176.000
      time cost: 12.835

      verify speed: 637.009360/s
 wang@ubuntu:~$ gmssl speed sm2
                               sign    verify    sign/s verify/s
  <font color=red>256 bit sm2 (sm2p256v1)   0.0003s   0.0015s   3601.3    668.8</font>
  1. 最新版

    • 两个查找表—关闭 SM2_NO_CONST_TIME
      GmSSL 2.1.0 - OpenSSL 1.1.0d 03 Jan 2018
      built on: reproducible build, date unspecified
      options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
      compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack

                                sign    verify    sign/s verify/s
      

      256 bit sm2 (sm2p256v1) 0.0002s 0.0008s 5196.5 1306.8

                                encrypt decrypt   enc/s  dec/s
      

      256 bit sm2 (sm2p256v1) 0.0009s 0.0007s 1165.1 1397.3

    • 两个查找表—开启 SM2_NO_CONST_TIME
      GmSSL 2.1.0 - OpenSSL 1.1.0d 03 Jan 2018
      built on: reproducible build, date unspecified
      options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
      compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack

                                sign    verify    sign/s verify/s
      

      256 bit sm2 (sm2p256v1) 0.0002s 0.0008s 5203.7 1331.6

                                encrypt decrypt   enc/s  dec/s
      

      256 bit sm2 (sm2p256v1) 0.0009s 0.0007s 1160.1 1423.

    • SM2默认曲线——使用NISTP查表(关闭 ec_nistp_64_gcc_128, 开启 SM2_NO_CONST_TIME

      GmSSL 2.5.4 - OpenSSL 1.1.0d 19 Jun 2019
      built on: reproducible build, date unspecified
      options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
      compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack

                                sign    verify    sign/s verify/s
      

      256 bit sm2 (sm2p256v1) 0.0002s 0.0007s 6457.7 1340.3

                                encrypt decrypt   enc/s  dec/s
      

      256 bit sm2 (sm2p256v1) 0.0008s 0.0007s 1190.5 1406.4

    • SM2默认曲线——使用SM2P256查表(开启 ec_nistp_64_gcc_128, 开启 SM2_NO_CONST_TIME

      GmSSL 2.5.4 - OpenSSL 1.1.0d 19 Jun 2019
      built on: reproducible build, date unspecified
      options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
      compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DL_ENDIAN -D__arch64__ -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack

                                sign    verify    sign/s verify/s
      

      256 bit sm2 (sm2p256v1) 0.0001s 0.0004s 7275.6 2594.0

                                encrypt decrypt   enc/s  dec/s
      

      256 bit sm2 (sm2p256v1) 0.0007s 0.0006s 1394.6 1691.9

    • SM2默认曲线——32个查找表(开启 ec_nistp_64_gcc_128, 开启 SM2_NO_CONST_TIME, 开启SM2P256TABLE

      GmSSL 2.5.4 - OpenSSL 1.1.0d 19 Jun 2019
      built on: reproducible build, date unspecified
      options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
      compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DSM2P256TABLE -DL_ENDIAN -D__arch64__ -DSM2_NO_CONST_TIME -DGMSSL_NO_TURBO -DOPENSSLDIR=”\”/home/wang/local/gmssl\”” -DENGINESDIR=”\”/home/wang/local/gmssl/lib/engines-1.1\”” -Wa,—noexecstack

                                sign    verify    sign/s verify/s
      

      256 bit sm2 (sm2p256v1) 0.0001s 0.0003s 12838.6 2915.4

                                encrypt decrypt   enc/s  dec/s
      

      256 bit sm2 (sm2p256v1) 0.0007s 0.0006s 1512.6 1718.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# gmssl ecparam -text -noout -name sm2p256v1 -param_enc explicit
Field Type: prime-field
Prime:
00:ff:ff:ff:fe:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
ff:ff:ff:ff:ff:ff:00:00:00:00:ff:ff:ff:ff:ff:
ff:ff:ff
A:
00:ff:ff:ff:fe:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
ff:ff:ff:ff:ff:ff:00:00:00:00:ff:ff:ff:ff:ff:
ff:ff:fc
B:
28:e9:fa:9e:9d:9f:5e:34:4d:5a:9e:4b:cf:65:09:
a7:f3:97:89:f5:15:ab:8f:92:dd:bc:bd:41:4d:94:
0e:93
Generator (uncompressed):
04:32:c4:ae:2c:1f:19:81:19:5f:99:04:46:6a:39:
c9:94:8f:e3:0b:bf:f2:66:0b:e1:71:5a:45:89:33:
4c:74:c7:bc:37:36:a2:f4:f6:77:9c:59:bd:ce:e3:
6b:69:21:53:d0:a9:87:7c:c6:2a:47:40:02:df:32:
e5:21:39:f0:a0
Order:
00:ff:ff:ff:fe:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
ff:ff:72:03:df:6b:21:c6:05:2b:53:bb:f4:09:39:
d5:41:23
Cofactor: 1 (0x1)

对于 P256 上的一个点 x,其大小为 256 bits = 32 bytes = 8 words = 4 dwords,对于定义在 P256 上的椭圆曲线点 G=(Gx, Gy),其大小为 8 dwords。考虑 $k*G$:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
/*-
* Base point pre computation
* --------------------------
*
* Two different sorts of precomputed tables are used in the following code.
* Each contain various points on the curve, where each point is three field
* elements (x, y, z).
*
* For the base point table, z is usually 1 (0 for the point at infinity).
* This table has 2 * 16 elements, starting with the following:
* index | bits | point
* ------+---------+------------------------------
* 0 | 0 0 0 0 | 0G
* 1 | 0 0 0 1 | 1G
* 2 | 0 0 1 0 | 2^64G
* 3 | 0 0 1 1 | (2^64 + 1)G
* 4 | 0 1 0 0 | 2^128G
* 5 | 0 1 0 1 | (2^128 + 1)G
* 6 | 0 1 1 0 | (2^128 + 2^64)G
* 7 | 0 1 1 1 | (2^128 + 2^64 + 1)G
* 8 | 1 0 0 0 | 2^192G
* 9 | 1 0 0 1 | (2^192 + 1)G
* 10 | 1 0 1 0 | (2^192 + 2^64)G
* 11 | 1 0 1 1 | (2^192 + 2^64 + 1)G
* 12 | 1 1 0 0 | (2^192 + 2^128)G5
* 13 | 1 1 0 1 | (2^192 + 2^128 + 1)G
* 14 | 1 1 1 0 | (2^192 + 2^128 + 2^64)G
* 15 | 1 1 1 1 | (2^192 + 2^128 + 2^64 + 1)G
* followed by a copy of this with each element multiplied by 2^32.
*
* The reason for this is so that we can clock bits into four different
* locations when doing simple scalar multiplies against the base point,
* and then another four locations using the second 16 elements.
*
* Tables for other points have table[i] = iG for i in 0 .. 16. */
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141


/*
* select_point selects the |idx|th point from a precomputation table and
* copies it to out.
*/
static void select_point(const u64 idx, unsigned int size,
const smallfelem pre_comp[16][3], smallfelem out[3])
{
unsigned j;
u64 *outlimbs = &out[0][0];

#ifdef SM2_NO_CONST_TIME
const u64 *inlimbs = (u64 *)&pre_comp[idx][0][0];
for (j = 0; j < NLIMBS * 3; j++) {
outlimbs[j] = inlimbs[j];
}
#else
int i;
memset(out, 0, sizeof(*out) * 3);

for (i = 0; i < size; i++) {
const u64 *inlimbs = (u64 *)&pre_comp[i][0][0];
u64 mask = i ^ idx;
mask |= mask >> 4;
mask |= mask >> 2;
mask |= mask >> 1;
mask &= 1;
mask--;
for (j = 0; j < NLIMBS * 3; j++)
outlimbs[j] |= inlimbs[j] & mask;
}
#endif
}

/*
* Interleaved point multiplication using precomputed point multiples: The
* small point multiples 0*P, 1*P, ..., 17*P are in pre_comp[], the scalars
* in scalars[]. If g_scalar is non-NULL, we also add this multiple of the
* generator, using certain (large) precomputed multiples in g_pre_comp.
* Output point (X, Y, Z) is stored in x_out, y_out, z_out
*/
static void batch_mul(felem x_out, felem y_out, felem z_out,
const felem_bytearray scalars[],
const unsigned num_points, const u8 *g_scalar,
const int mixed, const smallfelem pre_comp[][17][3],
const smallfelem g_pre_comp[2][16][3])
{
int i, skip;
unsigned num, gen_mul = (g_scalar != NULL);
felem nq[3], ftmp;
smallfelem tmp[3];
u64 bits;
u8 sign, digit;

/* set nq to the point at infinity */
memset(nq, 0, sizeof(nq));


/*
* Loop over all scalars msb-to-lsb, interleaving additions of multiples
* of the generator (two in each of the last 32 rounds) and additions of
* other points multiples (every 5th round).
*/
skip = 1; /* save two point operations in the first
* round */
for (i = (num_points ? 255 : 31); i >= 0; --i) {
/* double */
if (!skip)
point_double(nq[0], nq[1], nq[2], nq[0], nq[1], nq[2]);

/* add multiples of the generator */
if (gen_mul && (i <= 31)) {
/* first, look 32 bits upwards */
bits = get_bit(g_scalar, i + 224) << 3;
bits |= get_bit(g_scalar, i + 160) << 2;
bits |= get_bit(g_scalar, i + 96) << 1;
bits |= get_bit(g_scalar, i + 32);
/* select the point to add, in constant time */
select_point(bits, 16, g_pre_comp[1], tmp);

if (!skip) {
/* Arg 1 below is for "mixed" */
point_add(nq[0], nq[1], nq[2],
nq[0], nq[1], nq[2], 1, tmp[0], tmp[1], tmp[2]);
} else {
smallfelem_expand(nq[0], tmp[0]);
smallfelem_expand(nq[1], tmp[1]);
smallfelem_expand(nq[2], tmp[2]);
skip = 0;
}

/* second, look at the current position */
bits = get_bit(g_scalar, i + 192) << 3;
bits |= get_bit(g_scalar, i + 128) << 2;
bits |= get_bit(g_scalar, i + 64) << 1;
bits |= get_bit(g_scalar, i);
/* select the point to add, in constant time */
select_point(bits, 16, g_pre_comp[0], tmp);
/* Arg 1 below is for "mixed" */
point_add(nq[0], nq[1], nq[2],
nq[0], nq[1], nq[2], 1, tmp[0], tmp[1], tmp[2]);
}

/* do other additions every 5 doublings */
if (num_points && (i % 5 == 0)) {
/* loop over all scalars */
for (num = 0; num < num_points; ++num) {
bits = get_bit(scalars[num], i + 4) << 5;
bits |= get_bit(scalars[num], i + 3) << 4;
bits |= get_bit(scalars[num], i + 2) << 3;
bits |= get_bit(scalars[num], i + 1) << 2;
bits |= get_bit(scalars[num], i) << 1;
bits |= get_bit(scalars[num], i - 1);
ec_GFp_nistp_recode_scalar_bits(&sign, &digit, bits);
/*
* select the point to add or subtract, in constant time
*/
select_point(digit, 17, pre_comp[num], tmp);
smallfelem_neg(ftmp, tmp[1]); /* (X, -Y, Z) is the negative
* point */
copy_small_conditional(ftmp, tmp[1], (((limb) sign) - 1));
felem_contract(tmp[1], ftmp);

if (!skip) {
point_add(nq[0], nq[1], nq[2],
nq[0], nq[1], nq[2],
mixed, tmp[0], tmp[1], tmp[2]);
} else {
smallfelem_expand(nq[0], tmp[0]);
smallfelem_expand(nq[1], tmp[1]);
smallfelem_expand(nq[2], tmp[2]);
skip = 0;
}
}
}
}
felem_assign(x_out, nq[0]);
felem_assign(y_out, nq[1]);
felem_assign(z_out, nq[2]);
}

GmSSL SM2 依赖项

wang@ubuntu:~/project/sm2-base$ gcc -M sm2test.c | grep openssl
/usr/local/include/openssl/bn.h /usr/local/include/openssl/e_os2.h \
/usr/local/include/openssl/opensslconf.h /usr/include/inttypes.h \
/usr/local/include/openssl/ossl_typ.h \
/usr/local/include/openssl/crypto.h /usr/include/time.h \
/usr/local/include/openssl/stack.h \
/usr/local/include/openssl/safestack.h \
/usr/local/include/openssl/opensslv.h \
/usr/local/include/openssl/symhacks.h /usr/include/pthread.h \
/usr/local/include/openssl/ec.h /usr/local/include/openssl/asn1.h \
/usr/local/include/openssl/bio.h /usr/local/include/openssl/evp.h \
/usr/local/include/openssl/objects.h \
/usr/local/include/openssl/obj_mac.h /usr/local/include/openssl/rand.h \
/usr/local/include/openssl/sm2.h /usr/local/include/openssl/err.h \
/usr/local/include/openssl/lhash.h /usr/include/errno.h \
/usr/local/include/openssl/kdf2.h /usr/local/include/openssl/kdf.h \
/usr/local/include/openssl/x509.h /usr/local/include/openssl/buffer.h \
/usr/local/include/openssl/paillier.h /usr/local/include/openssl/sm9.h \
/usr/local/include/openssl/rsa.h /usr/local/include/openssl/dsa.h \
/usr/local/include/openssl/dh.h /usr/local/include/openssl/sha.h \
/usr/local/include/openssl/x509_vfy.h /usr/local/include/openssl/pkcs7.h \
/usr/local/include/openssl/ecies.h /usr/local/include/openssl/sm3.h

BUG

  • src/bn_sm2p256.c

    当未定义 L_ENDIAN 时,编译的 BN 大数库有问题

参考链接

大整数算法[09] Comba乘法(原理)

Karatsuba Algorithm (for fast integer multiplication)

OpenSSL密码库算法笔记——第1.2.2章 comba乘法

img