Slugsnack Grandmaster Cheater Supreme
Reputation: 71
Joined: 24 Jan 2007 Posts: 1857
Posted: Thu Mar 17, 2011 6:45 am Post subject: GPGPU Memory Scanner
Just thought I'd throw this idea out there since I recently picked up CUDA (and some OpenCL). I was wondering what people think about creating a memory scanner using this technology. The problem of searching memory (especially small sizes) is extremely parallel and lends itself to GPGPU. I'm sure significant performance gains can be made, but I'm not sure I can be bothered. This project would be more for fun and learning, although it is quite possible it could result in a faster CE.
The process I imagine is fairly simple:
- Copy pages to be scanned from main memory to GPU
- Each SIMD engine runs a warp/wavefront which fills a bitmap representing matches
- GPU result copied back to CPU
I think thread divergence can be avoided at stage 2 by using XOR operators and the like, though I'm not sure about that.
Any thoughts or input? Has anyone attempted a similar project before? Suggestions?
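To make stage 2 a bit more concrete, here is a rough CPU-side C++ sketch of what a single GPU thread might do. It is only an illustration under assumptions: the function name `scan_exact_u32`, the 4-byte aligned candidates, and the one-thread-per-32-bit-word layout are all made up for this example, and a real version would be a CUDA or OpenCL kernel. The XOR result is turned into a match bit without an `if` on the data, which is the kind of trick that should keep a warp from diverging.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from CE or any library): scans `data` for a 4-byte
// `target` at every 4-byte-aligned offset and returns a bitmap with one bit
// per candidate (bit set = match).
std::vector<std::uint32_t> scan_exact_u32(const std::vector<std::uint8_t>& data,
                                          std::uint32_t target) {
    const std::size_t candidates = data.size() / sizeof(std::uint32_t);
    std::vector<std::uint32_t> bitmap((candidates + 31) / 32, 0);

    // Each iteration of the outer loop stands in for one GPU thread, which
    // owns one 32-bit word of the bitmap, so no two "threads" write the same word.
    for (std::size_t word = 0; word < bitmap.size(); ++word) {
        std::uint32_t bits = 0;
        for (std::size_t lane = 0; lane < 32; ++lane) {
            const std::size_t i = word * 32 + lane;
            if (i >= candidates) break;  // tail guard, only taken in the last word

            std::uint32_t value;
            std::memcpy(&value, data.data() + i * sizeof(value), sizeof(value));

            // XOR is zero exactly when the value matches; the comparison result
            // is used as data (0 or 1) instead of steering a branch.
            const std::uint32_t diff = value ^ target;
            bits |= static_cast<std::uint32_t>(diff == 0) << lane;
        }
        bitmap[word] = bits;
    }
    return bitmap;
}
```

The bitmap is what would be copied back to the CPU in stage 3; with one bit per 4-byte candidate it is 1/32 the size of the scanned data.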
 |
hcavolsdsadgadsg I'm a spammer
Reputation: 26
Joined: 11 Jun 2007 Posts: 5801
Posted: Thu Mar 17, 2011 4:07 pm
i'm not so sure this will lend itself to GPUs as well as you think.
 |
HomerSexual Grandmaster Cheater Supreme
Reputation: 5
Joined: 03 Feb 2007 Posts: 1657
Posted: Thu Mar 17, 2011 6:18 pm
slovach wrote: "i'm not so sure this will lend itself to GPUs as well as you think."
Why not? GPUs are used for high performance computing all the time (Folding@home, for example).
 |
hcavolsdsadgadsg I'm a spammer
Reputation: 26
Joined: 11 Jun 2007 Posts: 5801
Posted: Thu Mar 17, 2011 8:18 pm
it doesn't sound like a meaningful amount of work. even if you could get everything scannable into memory in the right format, there would be all kinds of gotchas to avoid stalling, etc. it sounds like it could be branchy, which is the last thing GPUs want to see. performance may be iffy without certain hardware support.
i don't know too much about it, but it doesn't seem like a very applicable GPU problem. encoding videos and crunching math for folding proteins sound sensible enough; this doesn't.
 |
atom0s Moderator
Reputation: 205
Joined: 25 Jan 2006 Posts: 8585 Location: 127.0.0.1
Posted: Fri Mar 18, 2011 12:57 am
I could have sworn I've seen something like this before. It sounds really familiar, so I'll keep an eye out and post it if I can find it again.
 |
Slugsnack Grandmaster Cheater Supreme
Reputation: 71
Joined: 24 Jan 2007 Posts: 1857
Posted: Fri Mar 18, 2011 3:27 am
slovach wrote: "it doesn't sound like a meaningful amount of work. even if you could get everything scannable into memory in the right format, there would be all kinds of gotchas to avoid stalling, etc. it sounds like it could be branchy, which is the last thing GPUs want to see. performance may be iffy without certain hardware support.
i don't know too much about it, but it doesn't seem like a very applicable GPU problem. encoding videos and crunching math for folding proteins sound sensible enough; this doesn't."
the point of gpu processing is that you don't care about stalls: there are so many threads that instead of stalling, the gpu simply switches to another one.
i think for shorter scans there is no point, but there are times when ce takes minutes to scan. i don't remember exactly which cases since i haven't used ce for a long time, but i think it's something like scanning for an unknown initial value and then scanning for increased values.
HomerSexual wrote: "slovach wrote: 'i'm not so sure this will lend itself to GPUs as well as you think.'
Why not? GPUs are used for high performance computing all the time (Folding@home, for example)."
gpu programming is not really for high performance computation as such; it is only useful for particular problems. the main thing to note about gpus is that the emphasis is not on the latency and speed of an individual thread (as it is on a cpu) but on the throughput of many, many threads.
gpu programming is extremely fast for processing lots of different things using the same algorithm, that is, running the same code in many threads with the only difference being the memory they operate on. that is why gpus are great for things like bruteforcing as well. however, as you can see, gpgpu suits a very specific set of problems.
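As a rough illustration of "same code, different memory", here is a CPU-side C++ sketch of the slow case mentioned above: an unknown initial value scan followed by an "increased value" scan. Everything here is an assumption made for the example (the names `first_scan_unknown` and `next_scan_increased`, 32-bit unsigned values, one bitmap bit per candidate); it is not how CE actually stores its results. Each outer loop iteration plays the role of one GPU thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical first scan: with an unknown initial value every candidate is
// still possible, so every candidate bit starts out set.
std::vector<std::uint32_t> first_scan_unknown(std::size_t candidates) {
    std::vector<std::uint32_t> bitmap((candidates + 31) / 32, 0xFFFFFFFFu);
    const std::size_t tail = candidates % 32;
    if (tail != 0) bitmap.back() &= (1u << tail) - 1u;  // clear bits past the end
    return bitmap;
}

// Hypothetical next scan: keep only candidates that are still set in `bitmap`
// and whose current value is greater than the value in the earlier snapshot.
// Precondition (assumed): bitmap has at least ceil(n / 32) words.
void next_scan_increased(const std::vector<std::uint32_t>& previous,
                         const std::vector<std::uint32_t>& current,
                         std::vector<std::uint32_t>& bitmap) {
    const std::size_t n = std::min(previous.size(), current.size());
    const std::size_t words = (n + 31) / 32;

    for (std::size_t word = 0; word < words; ++word) {  // "thread index"
        std::uint32_t keep = 0;
        for (std::size_t lane = 0; lane < 32; ++lane) {
            const std::size_t i = word * 32 + lane;
            if (i >= n) break;  // tail guard, only taken in the last word
            // The comparison result (0 or 1) is shifted into place as data.
            keep |= static_cast<std::uint32_t>(current[i] > previous[i]) << lane;
        }
        bitmap[word] &= keep;  // never re-adds a candidate that was already dropped
    }
}
```

Every "thread" runs identical instructions and only the index differs, so this kind of filter fits the model described above. Whether it is actually faster than the CPU once the transfers are counted is a separate question.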
 |
hcavolsdsadgadsg I'm a spammer
Reputation: 26
Joined: 11 Jun 2007 Posts: 5801
Posted: Fri Mar 18, 2011 11:46 am
you're right that there's a lot of latency on the GPU and that the throughput is high, but you can stall it all the same. the pipeline is long, but the throughput masks the latency. it's one of the reasons you have to batch work so aggressively: the GPU can't feed itself.
i think you can synchronize explicitly, but in case of a stall it's possible for some threads to finish before others. data dependencies / memory accesses may be problematic.
 |
Slugsnack Grandmaster Cheater Supreme
Reputation: 71
Joined: 24 Jan 2007 Posts: 1857
Posted: Sun Mar 20, 2011 6:44 am
i don't understand where you see a data dependency problem in scanning memory, which is all reads. you cannot get conflicts (and hence stalls) from just reading and comparing.
of course you will get compulsory cache misses when the data is first copied to gpu memory before it reaches the caches, but this is unavoidable either way. as for synchronization, i don't see it as a problem: there are functions that block until the gpu completes, such as cudaThreadSynchronize().
maybe it is worth coding a proof-of-concept comparison kernel, since i think the main cause of lengthy scans in ce is likely cache misses.
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25785 Location: The netherlands
Posted: Sun Mar 20, 2011 9:45 am
the speed of scanning depends on the speed of the ram, how much ram you have, and the harddisk speed
If you don't have much ram (e.g. less than 4GB, or you're on a 32-bit os), scanning a full game will be affected by the paging system because windows will be busy paging the game's memory in and out almost constantly
Most of the time when scanning on an old system, the time spent waiting for the harddisk is greater than the time spent comparing the memory
also, it's a bad idea to run ce 5.X and older versions from a USB stick, as they store the temp results in the folder they are running from
 |
Slugsnack Grandmaster Cheater Supreme
Reputation: 71
Joined: 24 Jan 2007 Posts: 1857
Posted: Sun Mar 20, 2011 5:14 pm
have you ever run a profiler to determine the ratio of time spent computing compares to time spent on memory loads? if it is half, it may still be a worthwhile project
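Short of a real profiler, a crude way to get a first estimate is to time the two phases separately. The C++ sketch below is only that, a sketch: `ScanTiming` and `time_scan` are made-up names, and a plain `memcpy` inside one process stands in for the load phase, which is far cheaper than reading another process's memory or waiting on the page file, so the numbers it gives are optimistic for the load side.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result type for this example.
struct ScanTiming {
    double load_seconds;     // time to copy the source buffer into a scratch buffer
    double compare_seconds;  // time to compare every value against the target
    std::size_t matches;     // number of values equal to the target
};

// Times a copy phase and a compare phase separately with a monotonic clock.
ScanTiming time_scan(const std::vector<std::uint32_t>& source, std::uint32_t target) {
    using clock = std::chrono::steady_clock;
    std::vector<std::uint32_t> scratch(source.size());

    const auto t0 = clock::now();
    if (!source.empty()) {
        std::memcpy(scratch.data(), source.data(),
                    source.size() * sizeof(std::uint32_t));
    }
    const auto t1 = clock::now();

    std::size_t matches = 0;
    for (std::uint32_t v : scratch) {
        matches += (v == target);  // comparison result added as 0 or 1
    }
    const auto t2 = clock::now();

    return {std::chrono::duration<double>(t1 - t0).count(),
            std::chrono::duration<double>(t2 - t1).count(),
            matches};
}
```

The share of time spent comparing would then be `compare_seconds / (load_seconds + compare_seconds)`, measured on a buffer large enough to not fit in cache and averaged over several runs.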