c - assembly intrinsic to do a masked load

Question:

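#include <stdio.h>

int main()
{
    const int STRIDE=2,SIZE=8192;
    int i=0;
    double u[SIZE][STRIDE];

    /* Intel compiler hint: assume aligned accesses in the loop below */
    #pragma vector aligned
    for(i=0;i<SIZE;i++)
    {
        /* only the last element of each row is written (stride-2 access) */
        u[i][STRIDE-1]= i;
    }
    printf("%lf\n",u[7][STRIDE-1]);
    return 0;
}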

The compiler uses XMM registers here. The access has a stride of 2, and I want the compiler to ignore that: do a regular load from memory and then mask out alternate lanes, so that I would be using 50% of each SIMD register. I need intrinsics that load a register and then mask it bitwise before storing it back to memory.

P.S. I have never written assembly code before.

Answer:

Use a masked store with a mask value of 0xAA (binary 10101010).

Answer:

With plain SSE you can't do a masked load (only a masked store); masked loads only arrive with AVX (VMASKMOVPD). The easiest alternative is to do a normal load and then mask the register yourself (e.g. using intrinsics).
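As a rough illustration of "load and then mask it yourself", here is a minimal SSE2 sketch. The helper name load_keep_high is made up for this example, and it assumes the element you care about is the second double of each pair (the STRIDE-1 element in the question).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: load two adjacent doubles from src and zero the low
   lane, keeping only src[1]. */
static __m128d load_keep_high(const double *src)
{
    /* All-ones bit pattern in the upper 64 bits, zeros in the lower 64 bits. */
    const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(-1, -1, 0, 0));
    __m128d v = _mm_loadu_pd(src);  /* unaligned load of src[0] and src[1] */
    return _mm_and_pd(v, mask);     /* bitwise AND clears lane 0 */
}
```

Calling load_keep_high(&u[i][0]) would give a register holding {0.0, u[i][1]}. Swapping the mask to _mm_set_epi32(0, 0, -1, -1) would keep the low lane instead.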

A potentially better alternative would be to change your array to "double u[STRIDE][SIZE];" so that you don't need to mask anything and don't end up with half an XMM register wasted/masked.
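A sketch of what the transposed layout buys you, assuming SSE2 and the same STRIDE and SIZE as in the question. The helper name fill_last_row is invented for illustration, and the array is made static here only to keep it off the stack.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

enum { STRIDE = 2, SIZE = 8192 };

/* Transposed layout: each component is contiguous in memory. */
static double u[STRIDE][SIZE];

/* Hypothetical helper: fill the last row with 0, 1, 2, ... using full
   two-double stores, so no lane of the XMM register is wasted. */
static void fill_last_row(void)
{
    __m128d idx = _mm_set_pd(1.0, 0.0);      /* low lane 0.0, high lane 1.0 */
    const __m128d step = _mm_set1_pd(2.0);   /* advance both lanes by 2 */
    for (int i = 0; i < SIZE; i += 2) {
        _mm_storeu_pd(&u[STRIDE - 1][i], idx);  /* writes u[1][i], u[1][i+1] */
        idx = _mm_add_pd(idx, step);
    }
}
```

With this layout a plain scalar loop over u[STRIDE-1][i] is also unit-stride, so the compiler has a much easier time vectorizing it without any intrinsics.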

Answer:

Without AVX, half a SIMD register is only one double anyway, so there seems little wrong with regular 64-bit stores.
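For reference, a regular 64-bit store of one lane of an XMM register might look like the sketch below (SSE2 assumed; the helper names are hypothetical).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: write only the low double of v to dst,
   leaving the neighbouring element in memory untouched. */
static void store_low_lane(double *dst, __m128d v)
{
    _mm_store_sd(dst, v);   /* 64-bit store of lane 0 */
}

/* Same idea for the upper lane. */
static void store_high_lane(double *dst, __m128d v)
{
    _mm_storeh_pd(dst, v);  /* 64-bit store of lane 1 */
}
```

These are ordinary cached stores, so no masking is needed to protect u[i][0] when writing u[i][1].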

If you want to use masked stores (MASKMOVDQU/MASKMOVQ), note that they carry a non-temporal hint: like non-temporal stores such as MOVNTPS, they bypass the cache and write directly to DRAM. This may or may not be what you want. If the data fits in cache and you plan to read it soon, it is likely better not to use them.

Certain AMD processors can do a 64-bit non-temporal store from an XMM register using MOVNTSD; this may simplify things slightly compared to MASKMOVDQU.
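If you do want to try the masked store, a minimal sketch with the SSE2 intrinsic for MASKMOVDQU could look like this. The helper name masked_store_high is made up, and the choice to store the upper lane is an assumption based on the question's u[i][STRIDE-1] access.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: store only the upper double of v to dst[1],
   leaving dst[0] unchanged. MASKMOVDQU writes a byte only when the most
   significant bit of the corresponding mask byte is set. */
static void masked_store_high(double *dst, __m128d v)
{
    /* Bytes 8..15 selected, bytes 0..7 skipped. */
    const __m128i mask = _mm_set_epi32(-1, -1, 0, 0);
    _mm_maskmoveu_si128(_mm_castpd_si128(v), mask, (char *)dst);
    _mm_sfence();  /* order the non-temporal store before later stores */
}
```

Note that the mask is a full 128-bit register with one control bit per byte, not an 8-bit immediate, and that the cache-bypassing behaviour described above applies to every call.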
