The Notebook Review forums were hosted by TechTarget, who shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    The 9262 vs XPS1730 3Dmark06 score fiasco

    Discussion in 'Sager and Clevo' started by WackMan, Apr 2, 2008.

  1. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    the PLL isn't written into any software to allow for this.

    so I am pretty much the FIRST person alive that successfully overclocked this beast.

    taaadaaa!
     
  2. eleron911

    eleron911 HighSpeedFreak

    Reputations:
    3,886
    Messages:
    11,104
    Likes Received:
    7
    Trophy Points:
    456
    So Dexgo, how about posting a new 3dmark06 score and updating your sig? :D
    12k is as much as some SLI dudes get. Pretty wicked.
     
  3. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    [​IMG]
     
    Last edited by a moderator: May 6, 2015
  4. Vedya

    Vedya There Is No Substitute...

    Reputations:
    2,846
    Messages:
    3,568
    Likes Received:
    0
    Trophy Points:
    105
    *bows down*
    :notworthy: :notworthy: :notworthy:

    GREAT JOB :)
     
  5. eleron911

    eleron911 HighSpeedFreak

    Reputations:
    3,886
    Messages:
    11,104
    Likes Received:
    7
    Trophy Points:
    456
    Now I just have to find the Dell scores and compare SM2 and Sm3 scores.
    As I said Dexgo, you`re the OCing legend around here :D
     
  6. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    :notworthy: :notworthy: :notworthy: :notworthy: :notworthy:
    :notworthy: :notworthy: :notworthy: :notworthy:
    :notworthy: :notworthy: :notworthy:
    :notworthy: :notworthy:
    :notworthy:
     
  7. duane123

    duane123 Notebook Consultant

    Reputations:
    72
    Messages:
    233
    Likes Received:
    0
    Trophy Points:
    30
    Holy crap nd4spvdn over on the 1730 owners thread overclocked his card to 725/1812/1033 after volt modding it to 1.1. That just seems insane to me, although his CPU seems to be bottlenecking him in benches because he didn't get much better 3dmark results than me even with those insane clocks.

    It's tempting to try it though =p.
     
  8. duane123

    duane123 Notebook Consultant

    Reputations:
    72
    Messages:
    233
    Likes Received:
    0
    Trophy Points:
    30
    Grats dex, I knew you were determined and would find out what was going on ;).

    I wonder if it's the FSB mod that is making the difference in the SM scores. That would be my guess.
     
  9. eleron911

    eleron911 HighSpeedFreak

    Reputations:
    3,886
    Messages:
    11,104
    Likes Received:
    7
    Trophy Points:
    456
    Oh boy, I wonder who will hit 16k first :D
     
  10. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    thanks to this, it puts our rigs back in the running and, who knows, back on top?
     
  11. duane123

    duane123 Notebook Consultant

    Reputations:
    72
    Messages:
    233
    Likes Received:
    0
    Trophy Points:
    30
    With the added CPU score I'm positive it will come out ahead in 3dmark scores, no question on that. This brought our SM scores within spitting distance of each other so add in the extra CPU and you are going to be way ahead. Which is the way it should be with you guys having the desktop CPUs.

    That said I'm still very much in love with my 1730 ;).
     
  12. psycroptik

    psycroptik Notebook Consultant

    Reputations:
    117
    Messages:
    246
    Likes Received:
    0
    Trophy Points:
    30
    MEEEEEE!!!!!
     
  13. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    I'm curious about your thoughts on that, because, in my own semi-clueless manner, I suspect that the FSB is a big bottleneck on the quad-cores as compared to the dual-cores.
     
  14. wobble

    wobble Notebook Evangelist

    Reputations:
    68
    Messages:
    340
    Likes Received:
    0
    Trophy Points:
    30
    I would assume so too since CPU clock speed alone doesn't seem to have much effect on sm2/sm3 scores. Dex said his sm2 score went up considerably... don't know what happened to his sm3 scores.
     
  15. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    it is the reason that it went up. that much is clear.

    CPU makes a diff
     
  16. wobble

    wobble Notebook Evangelist

    Reputations:
    68
    Messages:
    340
    Likes Received:
    0
    Trophy Points:
    30
    How much did your sm3 scores go up, dex?
     
  17. eleron911

    eleron911 HighSpeedFreak

    Reputations:
    3,886
    Messages:
    11,104
    Likes Received:
    7
    Trophy Points:
    456
    about 400 points, he said it.
     
  18. duane123

    duane123 Notebook Consultant

    Reputations:
    72
    Messages:
    233
    Likes Received:
    0
    Trophy Points:
    30
    Right but the FSB went through a huge jump there as well, I'm thinking maybe that is what is making the big jump in SM? Mostly because in my own testing changing the CPU speed did affect SM but not in a major way. 1-2% at best, so I'm thinking maybe it's the FSB boost that is helping out.
     
  19. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    perhaps but the 6850 @3ghz doesn't do that good in scores.

    my oc'd quad is now as good as a stock qx6750.

    maybe my quad just came to life :D

    the Dragon has awoke from a deep Slumber. :D
     
  20. eleron911

    eleron911 HighSpeedFreak

    Reputations:
    3,886
    Messages:
    11,104
    Likes Received:
    7
    Trophy Points:
    456
    Let`s hope the dragon doesn`t burn himself. :)
     
  21. WackMan

    WackMan Notebook Guru

    Reputations:
    60
    Messages:
    65
    Likes Received:
    0
    Trophy Points:
    15
    I smell singed Dragon skin all the way from Canada.. :D
     
  22. wobble

    wobble Notebook Evangelist

    Reputations:
    68
    Messages:
    340
    Likes Received:
    0
    Trophy Points:
    30
    I'm starting to get lost in this thread...

    Did Justin post his Crysis benchmark scores for the dual processor in SLI?
     
  23. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    That is actually very interesting because it seems to dovetail with my very, very shot-in-the-dark hypothesis that the FSB is getting swamped in part by cache coherency transactions between the two dual-core CPUs that are stitched together to create the current quad-core CPUs. If that's the case, then OC'ing the FSB should, just by itself, help to alleviate the problem.
     
  24. nhat2991

    nhat2991 Notebook Consultant

    Reputations:
    54
    Messages:
    205
    Likes Received:
    0
    Trophy Points:
    30
    That's a really long writing @@
    Can I just make it short like this "The thread migration and cache coherency transactions between dies stop other parts accessing the CPU during these processes"
     
  25. ARGH

    ARGH Notebook Deity

    Reputations:
    391
    Messages:
    1,883
    Likes Received:
    24
    Trophy Points:
    56
    i don't believe we really got confirmation that the fsb actually runs at 1333. i think it runs at 1066 with the ability to use a 1333 cpu but mobo still stays at 1066.
     
  26. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    my motherboard supports 1333mhz natively.

    I got confirmation of this.

    I asked the vendors specifically; there are threads on this if you look back.

    because I know if people tried to use the older motherboard with an e6850

    it would downclock to the 1066mhz fsb and only effectively give the user 2.4 or 2.6 ghz.

    but now since the newer mobo supports it it is actually 1333mhz.
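
    The downclocking arithmetic above can be sketched roughly as follows. This is only an illustration: it assumes the FSB rating is quad-pumped (base clock = rating / 4) and that the E6850 runs a fixed 9x multiplier; the function name is made up for the example.

```python
def core_clock_ghz(fsb_rating_mhz, multiplier):
    """Core clock in GHz for a quad-pumped FSB rating and a fixed multiplier."""
    base_clock_mhz = fsb_rating_mhz / 4  # assumption: quad-pumped bus
    return base_clock_mhz * multiplier / 1000


E6850_MULTIPLIER = 9  # assumption: 3.0 GHz rated speed on a 333 MHz base clock

for fsb in (1333, 1066):
    print(f"FSB {fsb}: about {core_clock_ghz(fsb, E6850_MULTIPLIER):.2f} GHz")
```

    With those assumptions, the same chip lands near 3.0 GHz on a 1333 board and near 2.4 GHz on a 1066 board, which matches the lower figure quoted above.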
     
  27. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    Traffic-Cone.jpg Traffic-Cone.jpg

    That's one way of putting part of the problem. The other part of the problem is that, because the two dies do not share L2 cache, each coherency transaction that takes place cross-die takes up 5.5 bus cycles on top of the normal 14 core cycles that it would take if the coherency transaction were strictly intra-cache.

    Thus, cache coherency, by itself, adds significant overhead to the processor performance, and this overhead is substantially increased when the operating system or a poorly-written application causes a lot of what is known as false sharing - i.e., data items that look to the memory controller like shared data for which coherency must be maintained, but which in fact are not shared.

    An example would be a thread running on core 0 that has cached data that is interrupted, and is then re-scheduled to run on core 1 by the load-balancing algorithm of the scheduler - a problem that is exacerbated in the quad-core as compared to the dual-core because the greater number of available cores means that thread/core affinity is less likely to be respected.

    For example, typically the scheduler tries to keep threads on the same processor core; however, if a greater number of cores are available, the load-balancing algorithm, which spreads available threads as evenly as possible over available cores, will in effect "force" a core-change on a thread if the thread's original core has become involved in working on another core-intensive thread with greater priority even if thread/core affinity is enabled.

    Basically, as per Intel's own documentation on the development of the dual-core pentiums, a quad-core consists of two dual-core CPUs, each with their own separate L1 and a shared L2 cache, but which do not share L2 cache between the two dual-core CPUs.

    Based on the Intel whitepaper CMP Implementation in Systems Based on the Intel Core Duo Processor, a false-sharing transaction that requires a coherency transaction within the same shared L2 cache imposes a performance penalty equal to 14 core cycles (see Table 1); however, since the dual-core processors in the Core 2 Duo do not share L1 cache, a coherency transaction that goes from one L1 cache to the other L1 cache must transit the FSB, thereby imposing a performance penalty of 14 core cycles plus 5.5 bus cycles.

    Now, the situation with the L2 caches in the current quad-cores is most likely to be analogous to the situation with the L1 caches in the dual-core CPU - i.e., core0 and core2 do not share L2 cache with core1 and core3, and thus a false-sharing coherency transaction between cores 0/2 and cores 1/3 will have to transit the FSB, thereby imposing a latency penalty of 14 core cycles plus 5.5 bus cycles. In particular, if a data line in the L2 cache is treated as shared data between all 4 cores, a read/write by core0 to L2 cache that triggers a coherency transaction will impose the following latency penalty on the entire system:
    • core0 L2 to core2 L2 - 14 core cycles;
    • core0 L2 to core1 L2 - 14 core cycles plus 5.5 bus cycles; and
    • core0 L2 to core3 L2 - 14 core cycles plus 5.5 bus cycles.
    That results in a total latency penalty of 42 core cycles plus 11 bus cycles. In terms of the amount of time that takes, assuming a 2.4GHz core clock and a 1.066GHz bus clock (and ignoring any latency induced by the core/bus clock differences), this amounts to a time delay of approximately 0.02781 microseconds. That's a pretty big latency, particularly since during all or part of that time the cores aren't available and, most particularly, for the 11 bus cycles (about 0.0103 microseconds, or about 37% of the total latency) the bus is occupied by the coherency transactions and thus unavailable to any other device in the system. That means, for example, that even such mundane chores as DMA (direct memory access) for hard drives are blocked while the cores undergo a four-core false-sharing-induced coherency transaction.

    By comparison, if the FSB clock is boosted from 1.066GHz to 1.333GHz, the 42-cycle core latency is still there, but the 11-cycle bus latency drops to 0.00825 microseconds from 0.0103 microseconds - a 20% increase in performance due solely to shortening the latency penalty imposed by the necessity of performing cache coherency transactions across the FSB. This is a particularly useful performance increase since it reduces by 20% the time during which every other device is excluded from the FSB.
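
    The arithmetic in the last two paragraphs can be checked with a short sketch. It only takes the cycle counts and clocks quoted above at face value (42 core cycles, 11 bus cycles, 2.4GHz core, 1.066GHz vs 1.333GHz bus) and, like the text, ignores any core/bus clock interaction; the function name is made up for illustration.

```python
# Cycle counts quoted in the post for one four-core coherency round.
CORE_CYCLES = 42   # 3 transactions x 14 core cycles
BUS_CYCLES = 11    # 2 cross-die transactions x 5.5 bus cycles
CORE_CLOCK_HZ = 2.4e9


def coherency_latency_us(bus_clock_hz, core_clock_hz=CORE_CLOCK_HZ):
    """Return (core_us, bus_us, total_us) in microseconds."""
    core_us = CORE_CYCLES / core_clock_hz * 1e6
    bus_us = BUS_CYCLES / bus_clock_hz * 1e6
    return core_us, bus_us, core_us + bus_us


for label, bus_hz in (("1.066GHz FSB", 1.066e9), ("1.333GHz FSB", 1.333e9)):
    core_us, bus_us, total_us = coherency_latency_us(bus_hz)
    print(f"{label}: core {core_us:.5f} us, bus {bus_us:.5f} us, "
          f"total {total_us:.5f} us, bus share {bus_us / total_us:.0%}")
```

    Under these assumptions the bus portion shrinks by about 20%, while the whole round trip shrinks by roughly 7%, since the 42 core cycles are unaffected by the FSB clock.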

    Finally, it should be noted that a system running on a dual-core processor (such as the Dell M-1730), will, at worst, only suffer a latency penalty of 14 core cycles plus 5.5 bus cycles due to a false-sharing-induced coherency transaction, and then only if the coherency is from L1 to L1; if the coherency is only from L2 to L2, then the latency is only 14 core cycles.

    That alone would explain the significant performance differences that dexgo was finding between the Dell M-1730 and the quad-core NP9262 - even solely with respect to the graphics cards, since the cards typically must wait on the CPU to complete processing such as collision detection before they can render a scene (as well as with respect to any other functions the GPUs offload onto the CPU).

    Unfortunately, it is highly doubtful that this problem will be completely fixed for the owners of the current quad-cores (e.g., the Q6600 and Q6700) because Intel is moving to a shared L3 cache in the next generation of processors (the Nehalem architecture) as well as a new connection mechanism now officially called "QuickPath Interconnect" - the new connection will be a big improvement over the current FSB architecture, in significant part because it is a point-to-point connection instead of the single-bus connection of the current FSB, which means that other devices that use QuickPath Interconnect will not be excluded once one device accesses the interconnect.

    Because this particular latency problem is basically a minor roadbump from the perspective of a big company like Microsoft, I would also not expect Microsoft to expend any significant resources to improve the false-sharing performance of Windows to optimize the OS for the current quad-cores, so if any fixes are going to come, they will come, if at all, from application writers and driver writers like NVidia and/or Sager.

    I'm not going to hold my breath on that one, so basically, if you want to get the full effect out of your Q6600 or Q6700, the only apparently viable solution is the one dexgo figured out.

    I think we all owe dexgo a bigger debt of gratitude than many are willing to admit on the forum. Thanks dexgo. :notworthy:
     
  28. eleron911

    eleron911 HighSpeedFreak

    Reputations:
    3,886
    Messages:
    11,104
    Likes Received:
    7
    Trophy Points:
    456
    You forgot the cones :D
    Other than that,specific and to the point as usual :twitcy:
     
  29. Vedya

    Vedya There Is No Substitute...

    Reputations:
    2,846
    Messages:
    3,568
    Likes Received:
    0
    Trophy Points:
    105
    Shyster, Maybe you should change ur avy to the cones instead??? :p

    Dex are you planning on going sli anytime soon?
     
  30. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    @eleron911
    @hkman

    Done and done!
     
  31. nhat2991

    nhat2991 Notebook Consultant

    Reputations:
    54
    Messages:
    205
    Likes Received:
    0
    Trophy Points:
    30
    WOW, that's really something. Thxs for these.
    p/s: it's really cracking my 16-years-old brain
     
  32. pasoleatis

    pasoleatis Notebook Deity

    Reputations:
    59
    Messages:
    948
    Likes Received:
    0
    Trophy Points:
    30
    Does this mean that the q9450 will give a 20% increase in performance/tests compared to the q6700 because it has the 1.333 GHz bus?
     
  33. dexgo

    dexgo Freedom Fighter

    Reputations:
    320
    Messages:
    1,371
    Likes Received:
    2
    Trophy Points:
    56
    I benchmark-tested my 'puter, now oc'd, and I am getting the same performance as a qx6800
     
  34. wobble

    wobble Notebook Evangelist

    Reputations:
    68
    Messages:
    340
    Likes Received:
    0
    Trophy Points:
    30
    You're very lucky.

    To the best of my recollection, when I was 16 I didn't even have a brain. Come to think of it, after reading Shyster's tome, I'm not completely sure I have one now. :eek:
     
  35. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    No. It does suggest that the Q9450 would have some sort of improved performance over the Q6700; however, my discussion was too simple to support straight-forward, deterministic extrapolations of real-world performance.

    In terms of actual observed performance, I would suggest a perusal of the www.hardwarezone.com review comparing the Q9550 and Q9450 to the Q6600 and Q6700 (the "Review").

    As indicated in the Review, the clocks for the Q6700 and the Q9450 are, respectively, 2.67GHz and 2.66GHz (now, this is small potatoes, but that does mean that the Q6700 is, in fact, 10MHz faster than the Q9450, and that will have some indeterminate effect on the comparison).

    The Review also notes the following benchmark results for the two processors:
    1. SPEC CPU2000 1.3(SPECint_peak):
      • Q6700 - 2830
      • Q9450 - 3005
      SPEC CPU2000 1.3(SPECfp_peak):
      • Q6700 - 2910
      • Q9450 - 3249
    2. SPEC CPU2000 1.3(SPECint_rate2000)(4 users):
      • Q6700 - 114.0
      • Q9450 - 126.0
    3. SPEC CPU2000 1.3(SPECfp_rate2000)(4 users):
      • Q6700 - 78.8
      • Q9450 - 99.8
    4. SYSmark 2007 Preview v1.02(Overall):
      • Q6700 - 154
      • Q9450 - 152
    5. SYSmark 2007 Preview v1.02(Workload Breakdown)(3D workload):
      • Q6700 - 164
      • Q9450 - 173
    6. Futuremark PCMark05(CPU Score):
      • Q6700 - 8628
      • Q9450 - 8618
    7. Futuremark PCMark05(Memory Score):
      • Q6700 - 5658
      • Q9450 - 6305
    8. Lightwave 3D 7.5(Tracer-Radiosity)(2/4/8 Threads)(smaller is better):
      • Q6700 - 266.8/160.4/94.1
      • Q9450 - 250.0/150.1/81.6
    9. Lightwave 3D 7.5(Sunset)(2/4/8 Threads)(smaller is better):
      • Q6700 - 58.3/40.0/30.4
      • Q9450 - 56.2/38.7/28.4
    10. ....(I got tired of listing all of the d**ned benchmark results, if anyone wants to see them all, they should go read the Review for themselves)
    11. Futuremark 3DMark06(CPU Score):
      • Q6700 - 4276
      • Q9450 - 4483
    12. Futuremark 3DMark06(3DMark Score @ Defaults):
      • Q6700 - 12394
      • Q9450 - 12588
    13. ....(remaining benchmarks omitted, see the Review)

    Now, just focusing on a few of the benchmarks listed above, I would note that on the CPU Score for 3dMark06, the Q9450 provides a score boost of 5% (i.e., its score is about 105% of the score for the Q6700). On the 3DMark06 Score @ Defaults, the Q9450 score is only about 102% of the score for the Q6700. Looking at the other scores, nothing seems to stand out as a trend indicating a 20% performance increase of the Q9450 over the Q6700, so the answer is almost certainly no, the Q9450 does not give a 20% performance boost over the Q6700 on account of the fact that the Q9450 runs on a native 1.333GHz FSB.
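
    For anyone who wants to redo the percentages, here is a minimal sketch using only the higher-is-better scores copied from the list above (the Lightwave timings are left out because smaller is better there). The dictionary layout and function name are just for illustration.

```python
# (Q6700 score, Q9450 score), copied from the list above; higher is better.
scores = {
    "SPECint_peak": (2830, 3005),
    "SPECfp_peak": (2910, 3249),
    "SPECint_rate2000": (114.0, 126.0),
    "SPECfp_rate2000": (78.8, 99.8),
    "SYSmark 2007 overall": (154, 152),
    "SYSmark 2007 3D workload": (164, 173),
    "PCMark05 CPU": (8628, 8618),
    "PCMark05 memory": (5658, 6305),
    "3DMark06 CPU": (4276, 4483),
    "3DMark06 overall": (12394, 12588),
}


def relative_gain(q6700, q9450):
    """Fractional gain of the Q9450 over the Q6700 (0.05 means +5%)."""
    return q9450 / q6700 - 1


for name, (q6700, q9450) in scores.items():
    print(f"{name:26s} {relative_gain(q6700, q9450):+.1%}")
```

    On these numbers most gains sit in the single digits or low teens, two are slightly negative, and the only entry above 20% is SPECfp_rate2000 at roughly +27%, so there is no across-the-board 20% trend.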

    Now, keep in mind, (1) the Q6700 is, actually, slightly faster than the Q9450, (2) there are lots of other factors that go into the performance of a CPU, and (3) last but most important, the 20% increase that I arrived at in my original post was only with respect to the latency penalty caused by coherency transactions going over the FSB - since that is not the only thing the CPU uses the bus for, even if the 20% increase is legitimately there - i.e., not just a figment of my imagination :D - that 20% increase would only apply with respect to the percentage of overall CPU time that was devoted to cache coherency transactions.

    For example, to take a totally hypothetical setup for illustrative purposes, suppose that a given setup could run 1,000 operations per second without any cache coherency transactions, but that the setup had to dedicate 30 operations to cache coherency transactions, leaving only 970 operations per second for "real" work. In that case, the performance of the system would be 970/sec.

    Now, suppose that OCing the FSB resulted in cache coherency transactions taking 20% less time, that would mean (in an approximate case - the example is for clarity and simple illustration, not technical accuracy :D ) that the coherency transactions that used to occupy 30 operations' worth of time now only take up the same time as 24 operations. As a result, per second, that setup can now do 6 more "real" operations, or a total of 976 "real" operations per second. In this latter case, the performance of the system would be 976/sec.

    976 is only 100.6186% of 970, so even though cache coherency transactions are now done with 20% more efficiency time-wise, the overall system does not receive a 20% overall performance boost.
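
    The same hypothetical can be written out as a tiny sketch. The numbers (1,000 operations per second, 30 of them spent on coherency, 20% time saving) are the illustrative ones from above, not measurements, and the function name is made up.

```python
def real_ops_per_sec(total_ops, coherency_ops, saving_pct):
    """Operations left for 'real' work after coherency overhead.

    saving_pct is the percentage by which coherency transactions get faster.
    """
    remaining_overhead = coherency_ops * (100 - saving_pct) / 100
    return total_ops - remaining_overhead


before = real_ops_per_sec(1000, 30, 0)    # stock FSB
after = real_ops_per_sec(1000, 30, 20)    # coherency traffic 20% faster
print(before, after, f"{after / before:.4%}")
```

    The overall gain depends on how big a share of the time goes to coherency traffic in the first place: with 30 out of 1,000 it is well under 1%, and it would only approach 20% if nearly all of the time were spent on coherency.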

    I hope that helps. :)
     
  36. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    Aah, join the club - I lost mine so long ago, I've almost forgotten what it was like to have a brain. :D
     
  37. WackMan

    WackMan Notebook Guru

    Reputations:
    60
    Messages:
    65
    Likes Received:
    0
    Trophy Points:
    15
    Suddenly my eyes hurt and I have a splitting headache
     
  38. Shyster1

    Shyster1 Notebook Nobel Laureate

    Reputations:
    6,926
    Messages:
    8,178
    Likes Received:
    0
    Trophy Points:
    205
    Hmmm...., that is a bit of a problem. Here's the Cooks' Tour version of the Cooks' Tour version - cache coherency transactions in the quad-cores are causing the FSB to bottleneck the NP9262 in a noticeable way.
     