 Requesting Large Pages
 Shared Segment Pointer =  504403158265495552
 Segment Size (DW) =  268435456  (MB =  2048 )
 Vector  Size (DW) =  67108864  (MB =  512 )
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 Num_threads =  8
 rebind: num_parthds is  8
 Starting Initialization
 Done With Initialization
 a(1) 1.00000000000000000
 a(N) 0.000000000000000000E+00
 Base Offset =  67108864
 Incremental Offset =  2048
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  93615  microseconds
    (=  93615  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:      14346.1691       .0753       .0748       .0760
Scale:     14314.8193       .0752       .0750       .0754
Add:       13134.2402       .1230       .1226       .1234
Triad:     13140.8063       .1227       .1226       .1231
 Sum of a is =  101921066268750.000
 Sum of b is =  20384213253750.0000
 Sum of c is =  27178951005000.0000
 Base Offset =  67108864
 Incremental Offset =  2304
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  93207  microseconds
    (=  93207  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:      16106.3155       .1397       .0667       .0760
Scale:     15722.2515       .1405       .0683       .0754
Add:       15881.8475       .1869       .1014       .1234
Triad:     15793.3853       .1871       .1020       .1231
 Sum of a is =  101921066268750.000
 Sum of b is =  20384213253750.0000
 Sum of c is =  27178951005000.0000
 Base Offset =  67108864
 Incremental Offset =  2560
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  93483  microseconds
    (=  93483  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:      16106.3155       .1832       .0667       .0760
Scale:     16075.4716       .1805       .0668       .0754
Add:       15881.8475       .2202       .1014       .1234
Triad:     15793.3853       .2201       .1020       .1231
 Sum of a is =  101921066268750.000
 Sum of b is =  20384213253750.0000
 Sum of c is =  27178951005000.0000
 Base Offset =  67108864
 Incremental Offset =  2816
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  93726  microseconds
    (=  93726  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:      16119.6036       .2027       .0666       .0760
Scale:     16075.4716       .2015       .0668       .0754
Add:       15881.8475       .2336       .1014       .1234
Triad:     15960.4201       .2329       .1009       .1231
 Sum of a is =  101921066268750.000
 Sum of b is =  20384213253750.0000
 Sum of c is =  27178951005000.0000
 Base Offset =  67108864
 Incremental Offset =  3072
 Number of Threads =  8
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   67108864
 Offset     =          0
 The total memory requirement is 1536 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order 
 of  92898  microseconds
    (=  92898  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:      16119.6036       .2149       .0666       .0760
Scale:     16075.4716       .2142       .0668       .0754
Add:       15881.8475       .2497       .1014       .1251
Triad:     15960.4201       .2490       .1009       .1244
 Sum of a is =  101921066268750.000
 Sum of b is =  20384213253750.0000
 Sum of c is =  27178951005000.0000
bindprocessor successful: thread_self() 43665 cpu_id 6
bindprocessor successful: thread_self() 44715 cpu_id 2
bindprocessor successful: thread_self() 44403 cpu_id 3
bindprocessor successful: thread_self() 50407 cpu_id 7
bindprocessor successful: thread_self() 63391 cpu_id 5
bindprocessor successful: thread_self() 50143 cpu_id 4
bindprocessor successful: thread_self() 54519 cpu_id 0
bindprocessor successful: thread_self() 49719 cpu_id 1
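
Note on the Rate column: the figures in the tables above appear to follow the usual STREAM convention, where the rate is the number of bytes moved per iteration divided by the best (minimum) time, with 1 MB = 10^6 bytes for rates and 1 MB = 2^20 bytes for the memory requirement. The following is a minimal sketch under those assumptions, not the benchmark's own code. The names `stream_rate_mb_s` and `total_memory_mb` are hypothetical, and the per-kernel array counts (2 for Copy and Scale, 3 for Add and Triad) are assumed from the standard STREAM definition. Because the logged Min time is rounded to four decimals, recomputed rates only approximately match the logged ones.

```python
ARRAY_SIZE = 67_108_864   # "Array size" reported in the log
BYTES_PER_WORD = 8        # "Assuming 8 bytes per DOUBLE PRECISION word"

# Arrays read or written by each kernel (assumed standard STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=ARRAY_SIZE):
    """Bandwidth in MB/s (1 MB = 1e6 bytes) from the best time of one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1.0e6


def total_memory_mb(n=ARRAY_SIZE, arrays=3):
    """Memory for the three work arrays in MB (1 MB = 2**20 bytes)."""
    return arrays * BYTES_PER_WORD * n // (1024 * 1024)


if __name__ == "__main__":
    # Min times copied from the first results table (Incremental Offset = 2048).
    first_run = [("Copy", 0.0748), ("Scale", 0.0750),
                 ("Add", 0.1226), ("Triad", 0.1226)]
    print("Total memory (MB):", total_memory_mb())
    for kernel, min_time in first_run:
        print(f"{kernel:6s} {stream_rate_mb_s(kernel, min_time):12.1f} MB/s")
```

Comparing the recomputed values against the logged Rate column is a quick consistency check on a results table: a mismatch larger than the rounding error in Min time would suggest that the two columns do not come from the same run.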
