Is there a CPU shift-and-copy instruction that can be accessed from C#?

I need to take an 8-bit number on a 64-bit CPU and shift it left by 8 bits, 8 times. Each time I shift the number, I need to shift the same 8-bit number in behind it, so that I end up with the same 8-bit number repeated 8 times. Currently I do this by shifting, adding the 8-bit number, shifting, adding, and so on, which ends up at 40+ cycles (correct me if I am wrong).

Is there a way to perform this operation (shift and copy) in 1 cycle, so that in the end I get the same value?

    long _value = 0;
    byte _number = 7;
    for (int i = 0; i < 8; i++)
    {
        _value = (_value << 8) + _number;
    }

EDIT: I am trying to compare a character stream to identify keywords. I cannot use string.Contains because the string value may span a buffer boundary. In addition, the application has to run on an embedded ARM processor as well as on desktop and server processors. Memory usage and CPU cycles are very important.

+6
3 answers

Another idea would be to precompute the result for every byte value in a lookup table.

    var lu = new long[256];
    // init
    var n = 7;
    var v = lu[n];
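The `// init` step above is left open in the answer. One way it might be filled in is sketched below; the class name `ByteRepeatTable`, the method names, and the choice of `ulong` instead of `long` (to keep the top bit from acting as a sign bit) are assumptions for illustration, not part of the original answer.

```csharp
public static class ByteRepeatTable
{
    // Hypothetical table: entry b holds the byte b repeated eight times.
    private static readonly ulong[] Table = BuildTable();

    private static ulong[] BuildTable()
    {
        var table = new ulong[256];
        for (int b = 0; b < 256; b++)
        {
            ulong v = 0;
            // Shift the previous bytes up and append b, eight times.
            for (int i = 0; i < 8; i++)
            {
                v = (v << 8) | (ulong)b;
            }
            table[b] = v;
        }
        return table;
    }

    // After the one-time initialization, a lookup is a single array read.
    public static ulong Repeat(byte b) => Table[b];
}
```

The table costs 256 × 8 = 2048 bytes, which may matter on the embedded target mentioned in the question.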

Update

Some test results (in ms per 100,000,000 iterations):

  • Loop: 272
  • Unrolled: 207
  • Unsafe: 351
  • Lookup: 250
  • HenkH: 216

Unrolled version:

    long _value = 0;
    byte _number = 7;
    // Shift first, then add, so the last byte is not left as zero.
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;
    _value = (_value << 8) + _number;

Unsafe version:

    long _value = 0;
    byte _number = 7;
    byte* p = (byte*)&_value;
    *p++ = _number;
    *p++ = _number;
    *p++ = _number;
    *p++ = _number;
    *p++ = _number;
    *p++ = _number;
    *p++ = _number;
    *p++ = _number;

Unfortunately, it does not perform well :(

The lookup is just a read from the array.

Everything compiled for x64 / release.

+4

Nowadays there is no direct relationship between the number of instructions executed and the number of CPU cycles needed to execute them. You also seem to assume that one statement in C# corresponds to a single assembly/CPU instruction, which is also incorrect.

Your code does what your algorithm description says (note that long is signed; use ulong for unsigned behavior).
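To illustrate the signed/unsigned remark, here is a small sketch that runs the same loop from the question on a `long` and on a `ulong`. The class and method names are hypothetical. The bit patterns come out identical; what differs is how the top bit is interpreted once the repeated byte is 0x80 or larger.

```csharp
public static class SignedVsUnsigned
{
    // Same loop as in the question, accumulating into a signed long.
    public static long RepeatSigned(byte number)
    {
        long value = 0;
        for (int i = 0; i < 8; i++)
        {
            value = (value << 8) + number;
        }
        return value;
    }

    // Same loop, accumulating into an unsigned ulong.
    public static ulong RepeatUnsigned(byte number)
    {
        ulong value = 0;
        for (int i = 0; i < 8; i++)
        {
            value = (value << 8) + number;
        }
        return value;
    }
}
```

With `number = 0x80`, `RepeatSigned` returns a negative value and a later `>> 56` sign-extends to -128, while `RepeatUnsigned` yields 0x8080808080808080 and `>> 56` gives 0x80. For comparisons and right shifts on the packed value, `ulong` is therefore the less surprising choice.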

If you want to use specialized CPU extensions (for example MMX, SSE, or similar) that might perform the shift-and-add in one instruction, you need to write assembly code. But I'm not sure such a specific instruction exists; it may depend on the type of processor you have.

You cannot use assembly code directly from C#, but you can use assembly from C (either by linking an object file or by writing inline assembly). The compiled C code can then be called from C#/.NET through interop.

But the first and most important question for you is: what are you trying to accomplish?

I doubt that performance is that critical for your application, and even if it is, you should honestly ask yourself whether C# is the best language for your purpose.

+6

If you want it to be fast, you could at least unroll your loop:

    ulong _value = 0;
    byte _number = 7;
    _value = _number;
    _value = (_value << 8) + _value;
    _value = (_value << 16) + _value;
    _value = (_value << 32) + _value;

It will also have fewer branches.
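As a sanity check, the doubling steps above can be wrapped in a method and compared with the plain loop from the question. This is only a sketch: the class and method names are hypothetical, and it checks equivalence, not speed.

```csharp
public static class ByteBroadcast
{
    // Reference: the eight-step loop from the question, on ulong.
    public static ulong RepeatByLoop(byte number)
    {
        ulong value = 0;
        for (int i = 0; i < 8; i++)
        {
            value = (value << 8) + number;
        }
        return value;
    }

    // Doubling: 1 byte -> 2 bytes -> 4 bytes -> 8 bytes in three steps.
    public static ulong RepeatByDoubling(byte number)
    {
        ulong value = number;
        value = (value << 8) + value;   // 2 copies
        value = (value << 16) + value;  // 4 copies
        value = (value << 32) + value;  // 8 copies
        return value;
    }

    // Returns true if both methods agree for all 256 byte values.
    public static bool AgreeForAllBytes()
    {
        for (int b = 0; b <= 255; b++)
        {
            if (RepeatByLoop((byte)b) != RepeatByDoubling((byte)b))
            {
                return false;
            }
        }
        return true;
    }
}
```

Each doubling step copies the filled lower part into the empty bits directly above it, so the additions never carry and three shift-and-add steps replace eight.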

+3

Source: https://habr.com/ru/post/897754/
