CXX特性与语法记录

最近在读BakrN集成了Tensor Core的RISC-V GPGPU (vortex) 源码。当我看到这代码时，顿感压力山大。于是决定写下这篇笔记，记录一些C++的小特性和语法。

位域

struct float16{
  unsigned short sign     : 1;
  unsigned short exponent : 5;
  unsigned short mantissa : 10;
};

有如下内存分布：

 1 1       1 
 5 4       0 9                 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |exponent |     matissa       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~~~
 |
sign

C++

std::cout << "sizeof(float16): " << sizeof(float16) << std::endl;

Output

sizeof(float16): 2

所以，这个float16结构体事实上只占用了一个unsigned short的大小，16位。并且每一个段都有自己的名称，便于访问。

一个位域不可以具有超过其类型的长度，因此禁止出现unsigned int ui33: 33。

一个位域不可以跨越基本类型的边界，如果跨越，则将该域对齐到下一个基本类型开始的位置：

struct inst1{
    unsigned int opcode: 6;
    unsigned int ui30  : 30;
};

有如下内存分布：

 3         3 3 3 2
 7         2 1 0 9                                                         0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   opcode  |   |                          ui30                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            ~~~~~
              |
            unused

C++

std::cout << "sizeof(inst1): " << sizeof(inst1) << std::endl;

Output

sizeof(inst1): 8

同样，也可以人工制定不具名块，将下一个域放置到下一个基本类型开始的位置：

struct inst2{
    unsigned int opcode: 6;
    unsigned int       : 0;
    unsigned int ui12  : 12;
};

C++

std::cout << "sizeof(inst2): " << sizeof(inst2) << std::endl;

Output

sizeof(inst2): 8

在上个例子中，opcode与ui12即使长度小于一个基本类型（unsigned int），也还是会被分配到两个基本类型的空间中。

当然，请不要对其内部元素取址，C++编译器将会给你一个沉重的error。

`enum class`

枚举类enum class是C++11引入的语法，可以将枚举值限制在域内，禁止隐式转换，同时支持设定默认类型。

使用enum可能发生一些奇怪的事：

C++

enum class person{
  good = 0,
  bad
};

enum class colour{
  red = 0,
  blue
};

int main(int argc, char **argv) {
  std::cout << "good == red: " << (good == red) << std::endl;
  std::cout << "good == 0: " << (good == 0) << std::endl;
  return 0;
}

Output

good == red: 1
good == 0: 1

枚举值可以相互隐式转换，可能导致一些意料之外的状况。

当使用enum class时，枚举值被限制在class内，必须通过::访问。同时enum class禁止隐式转换，person::good == 0这类问题在编译时即可发现.

C++

enum class person{
  good = 0,
  bad
};

int main(int argc, char **argv) {
  std::cout << (person::good == 0) << std::endl;
  return 0;
}

Output

/main.cpp: In function 'int main(int, char**)':
/main.cpp:193:30: error: no match for 'operator==' (operand types are 'person' and 'int')
  193 |   std::cout << (person::good == 0) << std::endl;
      |                 ~~~~~~~~~~~~ ^~ ~
      |                         |       |
      |                         person  int

可变参数

可变参数...函数实际上是一个在C++中非常非常常见的东西，比如printf就是靠可变参数...实现的，因此printf可以接收任意个参数。即使格式化字符串不要求也是可以接收的：例如以下代码就是合法的。

printf("Nothing here", 1, 2, 3, 4, 5);

对于可变参数模板函数，花样就更多了：

template<int i, int i_end, typename Func, typename... Args>
inline void unrolled_for_func(Func &&func, Args &&... args) {
  if constexpr (i < i_end) {
    func(args...);
    unrolled_for_func<i + 1, i_end>(func, args...);
  }
}

以上代码可以执行i_end - i次Func(args)，手动展开了for循环。例如

C++

int main(int argc, char **argv) {
  unrolled_for_func<0, 3>(printf, "This is a %s func call...\n", "printf");
  return 0;
}

Output

This is a printf func call...
This is a printf func call...
This is a printf func call...

如果你注意到了if后的constexpr，就会想到没有这个constexpr会发生什么？

C++

template<int i, int i_end, typename Func, typename... Args>
inline void unrolled_for_func(Func &&func, Args &&... args) {
  if (i < i_end) {
    func(args...);
    unrolled_for_func<i + 1, i_end>(func, args...);
  }
}

Output

/main.cpp: In instantiation of 'void unrolled_for_func(Func&&, Args&& ...) [with int i = 899; int i_end = 3; Func = int (&)(const char*, ...); Args = {const char (&)[27], const char (&)[7]}]':
/main.cpp:186:36:   recursively required from 'void unrolled_for_func(Func&&, Args&& ...) [with int i = 1; int i_end = 3; Func = int (&)(const char*, ...); Args = {const char (&)[27], const char (&)[7]}]'
/main.cpp:186:36:   required from 'void unrolled_for_func(Func&&, Args&& ...) [with int i = 0; int i_end = 3; Func = int (&)(const char*, ...); Args = {const char (&)[27], const char (&)[7]}]'
/main.cpp:192:26:   required from here
/main.cpp:186:36: fatal error: template instantiation depth exceeds maximum of 900 (use '-ftemplate-depth=' to increase the maximum)
  186 |     unrolled_for_func<i + 1, i_end>(func, args...);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
compilation terminated.

编译器会因为超出递归深度限制而终止编译。由此可见，C++编译器会将所有出现的unrolled_for_func实例化，而不会检查i < i_end条件。当使用constexpr关键字限制后，编译器才会发现i < i_end这个限制条件，从而终止实例化unrolled_for_func。

Reference

Nada, A., Sarda, G. M., & Lenormand, E. (2025). Cooperative Warp Execution in Tensor Core for RISC-V GPGPU. 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 1422–1436.