| View previous topic :: View next topic |
| Author |
Message |
Csimbi I post too much
Reputation: 98
Joined: 14 Jul 2007 Posts: 3345
|
Posted: Tue Sep 14, 2010 11:55 am Post subject: [asm] Efficient string comparison |
|
|
I am looking for a fast string comparison routine in assembly; it might run several times a sec (depending how fast the user can click).
Comparing byte-by-byte is slow, so I thought I could do it in 4, 8, or even 16 byte groups.
- I have the addresses of the strings in the memory (which might not be aligned to a boundary).
- I know the length of one of the strings (which in practice, will be the maximum length to compare).
- Both strings are terminated with a NULL character, and unused memory may follow (need to avoid access violation).
- All strings are <=64 bytes, including the NULL character.
- The code should work on both AMD and Intel processors (which means the code must instructions that are supported by both).
- The code is to be injected into a game using CE's auto-assembler.
Naturally, the winner code will get +1rep.
Thank you.
PS. Yes, I used Google to find some examples, but those are either AMD or Intel specific. Regardless of that fact, I would like to see a code that's actually used in practice.
Last edited by Csimbi on Tue Sep 14, 2010 12:23 pm; edited 3 times in total |
|
| Back to top |
|
 |
justa_dude Grandmaster Cheater
Reputation: 23
Joined: 29 Jun 2010 Posts: 893
|
Posted: Tue Sep 14, 2010 12:08 pm Post subject: |
|
|
| memcmp is usually implemented with repe cmpsb. You're not ever going to get faster than linear time, and you'd have to do really awful coding to do worse than linear time. Assuming moderately sized strings, you're not going to see a huge performance delta between a great implementation and a naive one - a few executions a second doesn't sound terribly taxing.
|
|
| Back to top |
|
 |
Csimbi I post too much
Reputation: 98
Joined: 14 Jul 2007 Posts: 3345
|
Posted: Tue Sep 14, 2010 12:21 pm Post subject: |
|
|
Memory operations are taxing. That's one thing I'd like avoid; the fewer, the better. For a 10-byte string comparison, that's 20 memory operations using repe cmpsb.
Remember that most people can click 6-8 times a sec, gamers even faster.
|
|
| Back to top |
|
 |
Flyte Peanuts!!!!
Reputation: 6
Joined: 19 Apr 2006 Posts: 1887 Location: Canada
|
Posted: Tue Sep 14, 2010 12:27 pm Post subject: |
|
|
| Csimbi wrote: | Memory operations are taxing. That's one thing I'd like avoid; the fewer, the better. For a 10-byte string comparison, that's 20 memory operations using repe cmpsb.
Remember that most people can click 6-8 times a sec, gamers even faster. |
If computers had trouble calculating 20 memory operations 6-8 times a second, I most certainly would not be typing this post right now.
|
|
| Back to top |
|
 |
justa_dude Grandmaster Cheater
Reputation: 23
Joined: 29 Jun 2010 Posts: 893
|
Posted: Tue Sep 14, 2010 12:30 pm Post subject: |
|
|
| Profile first, optimize later. In practice, comparing one byte at a time versus a full-on SSIMD Boyer-Moore (sp?) nets like a 7% speedup, and that's assuming strings large enough to dwarf yours. With your strings being so small, are you even sure that taking the time to align boundaries is a computational savings? Profile first, optimize later.
|
|
| Back to top |
|
 |
Csimbi I post too much
Reputation: 98
Joined: 14 Jul 2007 Posts: 3345
|
Posted: Tue Sep 14, 2010 1:39 pm Post subject: |
|
|
| Flyte wrote: | | If computers had trouble calculating 20 memory operations 6-8 times a second, I most certainly would not be typing this post right now. |
Absolutely.
But we are talking about injecting code to a game, that is most likely using the processor to its maximum capacity.
| justa_dude wrote: | | Profile first, optimize later. |
That's a nice motto. I'll give it a go with memcmp - though I do not have a clue about profiling (other than a hick-up and/or and FPS hit that's directly visible in the game when I compare a 100 strings in a row).
| justa_dude wrote: | | In practice, comparing one byte at a time versus a full-on SSIMD Boyer-Moore (sp?) nets like a 7% speedup |
This article says different (hence my concern):
http://www.arstdesign.com/articles/fastsearch.html
| justa_dude wrote: | | , and that's assuming strings large enough to dwarf yours. With your strings being so small, are you even sure that taking the time to align boundaries is a computational savings? |
That's a good point. Taken. I found this:
http://www.freevec.org/function/strcmp
Seems that the sweet spot in around 50 chars; if it's shorter, there is not much benefit.
Most of the strings are shorter so, I'll stick with the naïve method.
Thanks for all inputs.
But, I would still like to see actual, working code for 50 strings longer than 50 chars
|
|
| Back to top |
|
 |
hcavolsdsadgadsg I'm a spammer
Reputation: 26
Joined: 11 Jun 2007 Posts: 5801
|
Posted: Tue Sep 14, 2010 7:36 pm Post subject: |
|
|
just use strcmp (or std::string) and be done with it, there's usually no point in trying to beat it.
several times a second is nothing.
|
|
| Back to top |
|
 |
Dark Byte Site Admin
Reputation: 471
Joined: 09 May 2003 Posts: 25859 Location: The netherlands
|
Posted: Tue Sep 14, 2010 8:23 pm Post subject: |
|
|
| Quote: | | depending how fast the user can click |
| Quote: | | All strings are <=64 bytes, including the NULL character. |
Yeah, optimizing the routine for a small string check routine isn't worth it for something that is triggered by the user
Even if the user can click 1000 times in a second then even the slowest string compare routine is fast enough (you could even pass it off to a text only basic interpreter and it'd be fast enough)
1000 clicks/second is 0.001 seconds between a click
on a 2GHz system that's 2.000.000 instructions the cpu can handle between a click (and not to mention that cpu tricks like prediction can increase that quite a bit)
so let's say every character iteration would take 5000 cpu cycles (really slow) and 64 characters, would make it take up a max of 320.000 instructions
That'd still leave 1.680.000 cpu cycles for other stuff like the game itself (84%, so taxes the cpu by 16%)
and the chance that all 1000 clicks get handled is pretty small as well.
Lets say you've put your key/click handler code on the game tick routine in the game. That's usually linked to the fps. So at most 60 to 120 fps/ticks, so at most 120 click handles a second
so 0.0083 seconds between each click
2GHz that'd be 16.666.666 instructions for the cpu between click handles
click handle took 320.000 cycles so leaves 16.346.666 cpu cycles for other stuff (which comes down to 98%, so the click handling taxes the cpu by 2% at most)
_________________
Do not ask me about online cheats. I don't know any and wont help finding them.
Like my help? Join me on Patreon so i can keep helping |
|
| Back to top |
|
 |
TROLOLOLOLOLOLOLOLOLOLOLO Expert Cheater
Reputation: -1
Joined: 27 Dec 2009 Posts: 100
|
Posted: Tue Sep 14, 2010 10:05 pm Post subject: |
|
|
| Dark Byte wrote: | | Quote: | | depending how fast the user can click |
| Quote: | | All strings are <=64 bytes, including the NULL character. |
Yeah, optimizing the routine for a small string check routine isn't worth it for something that is triggered by the user
Even if the user can click 1000 times in a second then even the slowest string compare routine is fast enough (you could even pass it off to a text only basic interpreter and it'd be fast enough)
1000 clicks/second is 0.001 seconds between a click
on a 2GHz system that's 2.000.000 instructions the cpu can handle between a click (and not to mention that cpu tricks like prediction can increase that quite a bit)
so let's say every character iteration would take 5000 cpu cycles (really slow) and 64 characters, would make it take up a max of 320.000 instructions
That'd still leave 1.680.000 cpu cycles for other stuff like the game itself (84%, so taxes the cpu by 16%)
and the chance that all 1000 clicks get handled is pretty small as well.
Lets say you've put your key/click handler code on the game tick routine in the game. That's usually linked to the fps. So at most 60 to 120 fps/ticks, so at most 120 click handles a second
so 0.0083 seconds between each click
2GHz that'd be 16.666.666 instructions for the cpu between click handles
click handle took 320.000 cycles so leaves 16.346.666 cpu cycles for other stuff (which comes down to 98%, so the click handling taxes the cpu by 2% at most) |
Like a fucking boss. God you're smart, how is it even that possible to be that computer literate
|
|
| Back to top |
|
 |
justa_dude Grandmaster Cheater
Reputation: 23
Joined: 29 Jun 2010 Posts: 893
|
Posted: Tue Sep 14, 2010 10:32 pm Post subject: |
|
|
| Csimbi wrote: | | ...a hick-up and/or and FPS hit that's directly visible in the game when I compare a 100 strings in a row... |
BTW, if the strings you're comparing are static (i.e., cheat codes), then you'll want to create a dictionary - a sorted data-structure. This will allow you, at the expense of a little extra memory, to compare each input (i.e., console command) against a single string instead of all possible strings.
|
|
| Back to top |
|
 |
|