 Requesting Large Pages
 Shared Segment Pointer =  504403158265495552
 Segment Size (DW) =  268435456  (MB =  2048 )
 Vector  Size (DW) =  67108864  (MB =  512 )
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 rebind: num_parthds is  8
 Starting Initialization
 Done With Initialization
 a(1) 1.00000000000000000
 a(N) 0.000000000000000000E+00
 Base Offset =  67108864
 Incremental Offset =  2048
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  111654  microseconds
    (=  111654  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:       9230.0307       .1178       .1163       .1222
Scale:      9100.1022       .1186       .1180       .1198
Add:        8079.9400       .2005       .1993       .2016
Triad:      8239.3571       .1963       .1955       .1970
 Sum of a is =  101921587200000.000
 Sum of b is =  20384317440000.0000
 Sum of c is =  27179089920000.0000
 Base Offset =  67108864
 Incremental Offset =  2304
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  114342  microseconds
    (=  114342  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:       9307.2943       .1928       .1154       .1222
Scale:      9100.1022       .1947       .1180       .1198
Add:        9615.0088       .2617       .1675       .2016
Triad:      9487.6141       .2617       .1698       .1970
 Sum of a is =  101921587200000.000
 Sum of b is =  20384317440000.0000
 Sum of c is =  27179089920000.0000
 Base Offset =  67108864
 Incremental Offset =  2560
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  114523  microseconds
    (=  114523  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:       9307.2943       .2299       .1154       .1246
Scale:      9488.6735       .2280       .1132       .1198
Add:       10139.4896       .2792       .1588       .2016
Triad:     10372.7592       .2771       .1553       .1970
 Sum of a is =  101921587200000.000
 Sum of b is =  20384317440000.0000
 Sum of c is =  27179089920000.0000
 Base Offset =  67108864
 Incremental Offset =  2816
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  113709  microseconds
    (=  113709  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:       9365.2418       .2436       .1147       .1246
Scale:      9488.6735       .2452       .1132       .1214
Add:       10139.4896       .2912       .1588       .2016
Triad:     10372.7592       .2894       .1553       .1970
 Sum of a is =  101921587200000.000
 Sum of b is =  20384317440000.0000
 Sum of c is =  27179089920000.0000
 Base Offset =  67108864
 Incremental Offset =  3072
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  113055  microseconds
    (=  113055  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:       9365.2418       .2510       .1147       .1246
Scale:      9488.6735       .2513       .1132       .1214
Add:       10139.4896       .3101       .1588       .2016
Triad:     10372.7592       .3087       .1553       .1970
 Sum of a is =  101921587200000.000
 Sum of b is =  20384317440000.0000
 Sum of c is =  27179089920000.0000
bindprocessor successful: thread_self() 581777 cpu_id 2
bindprocessor successful: thread_self() 405589 cpu_id 3
bindprocessor successful: thread_self() 618541 cpu_id 7
bindprocessor successful: thread_self() 381029 cpu_id 6
bindprocessor successful: thread_self() 622695 cpu_id 1
bindprocessor successful: thread_self() 651441 cpu_id 0
bindprocessor successful: thread_self() 589853 cpu_id 5
bindprocessor successful: thread_self() 630995 cpu_id 4
