fun1 Newbie cheater
Reputation: 0
Joined: 04 May 2025 Posts: 16
Posted: Sun May 04, 2025 6:11 am Post subject: load most/least significant bytes of memory dword
Easy question for the pros: I have 2 words (each 2 bytes long) in different memory locations. I want to load one word into the most-significant (left) bits of a dword memory location, and the other word into its least-significant bits.
I came up with this code. It works, but it's messy: lots of instructions, and it ties up 2 registers for such a simple task...
In my code I load [esi+1ED] into the left half of [coordinates], and [esi+1E9] into the right half of [coordinates]...
Code: |
mov ax,[esi+1ED]
shl eax,#16
mov bx,[esi+1E9]
add eax,ebx
mov [coordinates],eax
|
Is there a smarter way to do that? Thanks
ParkourPenguin I post too much
Reputation: 150
Joined: 06 Jul 2014 Posts: 4652
Posted: Sun May 04, 2025 2:33 pm
That's pretty much it. Compilers would use `movzx` to eliminate false dependencies (writing only the lower 16 bits of a register leaves its upper 16 bits unchanged) and `or` in place of `add`, but those are both very low-level optimizations you'd never notice. `movzx` also guarantees that the upper 16 bits of the second register are zero, which combining the two halves relies on.
Code: |
movzx eax, word ptr [esi+1ED]
movzx ecx, word ptr [esi+1E9]
shl eax,#16
or eax,ecx
mov [coordinates],eax
|
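For comparison, the same packing can be sketched in C. This is only an illustration, not something from the original code: the helper name `pack_words` and its parameter names are made up here. The casts to `uint32_t` play the role of `movzx`, and the shift and `|` mirror `shl` and `or`.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the thread):
   pack two 16-bit words into one 32-bit value. */
static uint32_t pack_words(uint16_t high, uint16_t low)
{
    /* Each cast zero-extends a word to 32 bits, much like movzx.
       The high word is shifted into bits 16..31, then the low word
       is merged into bits 0..15 with a bitwise OR. */
    return ((uint32_t)high << 16) | (uint32_t)low;
}
```

With `high = 0x1234` and `low = 0xABCD`, the result is `0x1234ABCD`. Because both halves are zero-extended first, `|` and `+` would give the same result here.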
_________________
I don't know where I'm going, but I'll figure it out when I get there. |
fun1 Newbie cheater
Posted: Sun May 04, 2025 4:03 pm
I finally understood what "MOVZX zero-extends" means, and I much appreciate the `or`...
I don't know why, but I was hoping there was a quicker, more elegant way to copy 4 bytes. What about treating the 2 words and [coordinates] as strings and using the related instructions (LODS, STOS, MOVS...)?
Last edited by fun1 on Sun May 04, 2025 4:59 pm; edited 1 time in total |
ParkourPenguin I post too much
Posted: Sun May 04, 2025 4:51 pm
The problem is that those 2 halves of the 4-byte value you want are at different arbitrary memory addresses. The vast majority of instructions can access only 1 memory address at a time.
String instructions don't work that way.
fun1 Newbie cheater
Posted: Sun May 04, 2025 5:04 pm
I see...
If you feel like it, purely for educational purposes, would you mind posting a solution using string instructions (LODS, STOS, MOVS...) so I can understand the difference? Thanks
ParkourPenguin I post too much
Posted: Sun May 04, 2025 7:51 pm
Code: |
// backup registers
push edi
push esi
pushfd
// setup registers for string ops
cld // clear direction flag (esi/edi are incremented on string ops)
add esi,1E9
lea edi,[coordinates]
// move 2-byte value from [esi] to [edi] (lower word of 4-byte value at coordinates)
movsw
// offset from original esi is now +1EB, make it +1ED by adding 2
add esi,2
// move 2-byte value from [esi] to [edi] (higher word of 4-byte value at coordinates)
movsw
// restore registers
popfd
pop esi
pop edi
|
For 64-bit code, use rsi/rdi and pushfq/popfq.
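A rough C analogue of the two `movsw` steps, for illustration only: each `memcpy` copies one 2-byte piece from its own source address into the next 2 bytes of the destination. The helper name `copy_halves` and its parameters are made up for this sketch, and a compiler is free to turn these calls into ordinary loads and stores rather than string instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name is an assumption, not from the thread):
   fill a 4-byte destination from two unrelated 2-byte sources. */
static void copy_halves(uint8_t dst[4],
                        const uint8_t *low_src,
                        const uint8_t *high_src)
{
    memcpy(dst, low_src, 2);       /* like the first movsw: bytes 0..1  */
    memcpy(dst + 2, high_src, 2);  /* like the second movsw: bytes 2..3 */
}
```

On a little-endian CPU such as x86, bytes 0..1 are the lower word and bytes 2..3 are the higher word of the resulting 4-byte value, which matches the comments in the assembly above.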
fun1 Newbie cheater
Posted: Sun May 04, 2025 8:24 pm
Very interesting...
Excluding the register/flag backup, the string solution is 1 instruction longer, so is it fair to conclude that it uses more CPU time? Thanks
ParkourPenguin I post too much
Posted: Sun May 04, 2025 9:10 pm
Apart from `rep movs`/`rep stos` block copies and fills, string ops are almost never used anywhere, so architecture vendors don't spend much effort to optimize them. The code using movzx and bitwise ops is better than using string ops in every regard I can think of.
Optimization of a CPU's microarchitecture is a very complicated topic to discuss. One instruction is not one clock cycle: e.g. `mov` instructions often take 0 clock cycles to execute (ignoring memory accesses), while `div` can take dozens of cycles. String ops are bad compared to `mov`, but even an L1 cache access to memory is probably more significant.
Again, these are micro-optimizations you'll never notice.
fun1 Newbie cheater
Posted: Mon May 05, 2025 3:17 am
The superiority of the code using movzx and bitwise ops is fully clear.
But you also said something very interesting: that string ops are almost never used anywhere.
Could you please elaborate a bit on that with regard to Java and/or C?
The reason I ask is that in my higher-level coding I love to solve most problems with twisted string manipulations. In fact, I would say I mostly use strings, a real string junkie...
So now I fear that every time I go for strings I waste precious CPU cycles, making the waste a lot more noticeable...
Or maybe, in Java and/or C, string manipulations are not implemented with assembler string instructions (e.g. MOVSW, etc.)?
In that case, would you suggest an alternative way to deal with strings, e.g. in Java and/or C, for better-optimized CPU utilization? Thanks
ParkourPenguin I post too much
Posted: Mon May 05, 2025 1:32 pm
Strings are used everywhere. x86 string ops (e.g. `movs`, `cmps`) are almost never used.
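When you do some operation on a string in Java and/or C, the compiler generates basic instructions like `mov` and `cmp` to do those operations. The compiler will rarely use string ops; the main exception is `rep movs`/`rep stos`, which some compilers and libraries emit for block copies and fills.
If you're using a higher-level programming language, don't worry about assembly. The compiler is usually far better at picking instructions than the programmer.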
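As a small illustration in C (a sketch only; the helper name `strings_equal` is made up here), a string comparison is just a loop over bytes. A compiler will typically turn a loop like this into ordinary loads, compares, and branches, though the exact output depends on the compiler, its flags, and the target CPU. With GCC, for instance, the generated assembly can be inspected with `gcc -S`.

```c
#include <stddef.h>

/* Hypothetical helper (name is an assumption, not from the thread):
   returns 1 if two NUL-terminated strings are equal, 0 otherwise. */
static int strings_equal(const char *a, const char *b)
{
    size_t i = 0;

    /* Walk both strings while the bytes match and a has not ended. */
    while (a[i] != '\0' && a[i] == b[i]) {
        i++;
    }

    /* Equal only if both strings ended at the same position. */
    return a[i] == b[i];
}
```

In real code the standard `strcmp` from `<string.h>` does this job; the loop is written out here only to show that nothing string-specific is needed at the instruction level.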