3. OpenMP: sections, single, nowait, barrier, master, ordered
Published: 2019-05-26


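Basic idea: in OpenMP, sections is a work-sharing construct for non-iterative tasks. It always reminds me of the separate blocks in FPGA hardware logic code.

(1) Running code in separate sections: sections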

    #pragma omp parallel sections
    {
    #pragma omp section
        for (int i = 0; i < num/2; i++) {
            .....
        }
    #pragma omp section
        for (int i = num/2; i < num; i++) {
            ........
        }
    }

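Test code

[Listing truncated in the source. The surviving fragments show three #include lines, using-directives for std and chrono, a sequentialProgram(int num) function that loops over i, and timing code that prints (end_time-start_time).count() followed by " seconds".]

Test results: the two for loops are assigned to different threads and processed independently.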

    F:\OpenMP\cmake-build-debug\OpenMP.exe
    i=0 the current thread id: 0
    i=1 the current thread id: 0
    i=2 the current thread id: 0
    i=3 the current thread id: 0
    i=4 the current thread id: 0
    i=5 the current thread id: 0
    i=6 the current thread id: 0
    i=7 the current thread id: 0
    i=8 the current thread id: 0
    i=9 the current thread id: 0
    i=10 the current thread id: 0
    i=11 the current thread id: 0
    sequentialProgram elapse time: 0.0373026 seconds
    i=0 the current thread id: 1
    i=1 the current thread id: 1
    i=2 the current thread id: 1
    i=3 the current thread id: 1
    i=4 the current thread id: 1
    i=5 the current thread id: 1
    i=6 the current thread id: 0
    i=7 the current thread id: 0
    i=8 the current thread id: 0
    i=9 the current thread id: 0
    i=10 the current thread id: 0
    i=11 the current thread id: 0
    parallelProgram elapse time: 0.0211751 seconds
    Process finished with exit code 0
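A minimal sketch of the pattern in (1), assuming a compiler with OpenMP enabled (for example the -fopenmp flag on GCC or Clang). The names current_thread, owners_with_sections and seconds_taken are illustrative and do not come from the original test program. The sketch records which thread ran each iteration and times a call with std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Thread id inside a parallel region; 0 when OpenMP is not enabled.
static int current_thread() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Splits [0, num) into two halves, one per section, and records
// which thread handled each index.
std::vector<int> owners_with_sections(int num) {
    assert(num >= 0);
    std::vector<int> owner(num, -1);
#pragma omp parallel sections
    {
#pragma omp section
        for (int i = 0; i < num / 2; i++) {
            owner[i] = current_thread();  // first half
        }
#pragma omp section
        for (int i = num / 2; i < num; i++) {
            owner[i] = current_thread();  // second half
        }
    }  // implicit barrier: both sections have finished here
    return owner;
}

// Runs f once and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Printing owner[i] for each i gives output of the same shape as the run above: every index in one half reports the same thread id, because a section is always executed by a single thread.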

(2) Running two groups of sections in one parallel region: sections

    #pragma omp parallel
    {
    #pragma omp sections
        {
    #pragma omp section
            for (int i = 0; i < num/4; i++) {
                .....
            }
    #pragma omp section
            for (int i =  num/4; i < num/2; i++) {
                .....
            }
        }
    #pragma omp sections
        {
    #pragma omp section
            for (int i = num/2; i < 3*num/4; i++) {
                .....
            }
    #pragma omp section
            for (int i =  3*num/4; i < num; i++) {
                .....
            }
        }
    }

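Test code

[Listing truncated in the source. The surviving fragments show three #include lines, using-directives for std and chrono, a sequentialProgram(int num) function that loops over i, and timing code that prints (end_time-start_time).count() followed by " seconds".]

Test results: each section runs on its own thread. In this run the sections appear in source order, but only the ordering between the two sections constructs is guaranteed: a sections construct ends with an implicit barrier (unless nowait is given), so the second construct starts only after the first has finished, while the section blocks inside one construct may run concurrently and in any order. The workload of the sections should also be kept as balanced as possible, since a construct finishes only when its slowest section does.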

    F:\OpenMP\cmake-build-debug\OpenMP.exe
    i=0 the current thread id: 0
    i=1 the current thread id: 0
    i=2 the current thread id: 0
    i=3 the current thread id: 0
    i=4 the current thread id: 0
    i=5 the current thread id: 0
    i=6 the current thread id: 0
    i=7 the current thread id: 0
    i=8 the current thread id: 0
    i=9 the current thread id: 0
    i=10 the current thread id: 0
    i=11 the current thread id: 0
    sequentialProgram elapse time: 0.0212041 seconds
    i=0 the current thread id: 2
    i=1 the current thread id: 2
    i=2 the current thread id: 2
    i=3 the current thread id: 8
    i=4 the current thread id: 8
    i=5 the current thread id: 8
    i=6 the current thread id: 11
    i=7 the current thread id: 11
    i=8 the current thread id: 11
    i=9 the current thread id: 9
    i=10 the current thread id: 9
    i=11 the current thread id: 9
    parallelProgram elapse time: 0.0202144 seconds
    Process finished with exit code 0
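To illustrate why the order between two sections constructs matters, here is a small sketch (the function name two_stage_sections is made up for this example). The second stage reads elements written by both sections of the first stage, which is only safe because a sections construct ends with an implicit barrier.

```cpp
#include <cassert>
#include <vector>

// Stage 1 fills a[] in two halves; stage 2 combines elements from both halves.
std::vector<int> two_stage_sections(int num) {
    assert(num >= 0);
    std::vector<int> a(num, 0), b(num, 0);
#pragma omp parallel
    {
#pragma omp sections
        {
#pragma omp section
            for (int i = 0; i < num / 2; i++) a[i] = i;
#pragma omp section
            for (int i = num / 2; i < num; i++) a[i] = i;
        }  // implicit barrier: all of a[] is written before any thread continues

#pragma omp sections
        {
#pragma omp section
            for (int i = 0; i < num / 2; i++) b[i] = a[i] + a[num - 1 - i];
#pragma omp section
            for (int i = num / 2; i < num; i++) b[i] = a[i] + a[num - 1 - i];
        }
    }
    return b;  // every element equals num - 1
}
```

Adding nowait to the first sections construct would remove that barrier, and the second stage could then read elements of a[] that have not been written yet.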

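(3) The single directive: run a block on one thread

    #pragma omp parallel
    {
    #pragma omp single
        ....
        for (int i = 0; i < num; i++) {
            ....
        }
    };

    #pragma omp parallel
    {
    #pragma omp single
        {
            ....
        }
    #pragma omp single nowait
        {
            ....
        }
        for (int i = 0; i < num; i++) {
            ....
        }
    };

single guarantees that the block it governs is executed by exactly one thread of the team; the other threads wait at the implicit barrier at the end of the block.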

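Test code

[Listing truncated in the source. The surviving fragments show three #include lines, using-directives for std and chrono, a sequentialProgram(int num) function that loops over i, and timing code that prints (end_time-start_time).count() followed by " seconds".]

With nowait, the block is still executed by a single thread, but the other threads do not wait for it to finish; they simply continue with the code that follows.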

    F:\OpenMP\cmake-build-debug\OpenMP.exe
    i=0 the current thread id: 0
    sequentialProgram elapse time: 0.0025314 seconds
    i am students the current thread id: 1
    i=0 the current thread id: 0
    i=0 the current thread id: 4
    i=0 the current thread id: 5
    i=0 the current thread id: 7
    i=0 the current thread id: 3
    i=0 the current thread id: 8
    i=0 the current thread id: 10
    i=0 the current thread id: 9
    i=0 the current thread id: 2
    i=0 the current thread id: 6
    i=0 the current thread id: 11
    i=0 the current thread id: 1
    --------------------
    i am students the current thread id: 5
    i am college the current thread id: 7
    i=0 the current thread id: 7
    i=0 the current thread id: 6
    i=0 the current thread id: 9
    i=0 the current thread id: 11
    i=0 the current thread id: 1
    i=0 the current thread id: 8
    i=0 the current thread id: 0
    i=0 the current thread id: 4
    i=0 the current thread id: 10
    i=0 the current thread id: 2
    i=0 the current thread id: 5
    i=0 the current thread id: 3
    parallelProgram elapse time: 0.0525391 seconds
    Process finished with exit code 0

In the output, note these two lines from the same thread (the nowait case):

    i am college the current thread id: 7
    i=0 the current thread id: 7
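The behaviour of single and single nowait can be checked with counters instead of printed lines. This is a sketch under the assumption that the loop after the single blocks is a plain (not work-shared) loop, as in the snippet in (3); the struct SingleCounts and the function count_single_runs are illustrative names.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

struct SingleCounts {
    int first;      // how often the `single` block ran
    int second;     // how often the `single nowait` block ran
    int loop_runs;  // total loop iterations over all threads
    int threads;    // team size (1 when OpenMP is not enabled)
};

SingleCounts count_single_runs(int num) {
    assert(num >= 0);
    int first = 0, second = 0, loop_runs = 0, threads = 1;
#pragma omp parallel
    {
#pragma omp single
        {
            first++;  // one thread only; the others wait at the implicit barrier
#ifdef _OPENMP
            threads = omp_get_num_threads();
#endif
        }
#pragma omp single nowait
        {
            second++;  // one thread only; the others skip ahead to the loop
        }
        // Not a work-sharing loop: every thread runs all num iterations.
        for (int i = 0; i < num; i++) {
#pragma omp atomic
            loop_runs++;
        }
    }
    return SingleCounts{first, second, loop_runs, threads};
}
```

Each single block runs once, while the loop body runs num times per thread, which matches the run above where every thread printed its own i=0 line.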

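Another test: nowait can also be used on a work-sharing for loop or on a sections construct, so that threads no longer have to wait at the synchronization point at its end.

[Listing truncated in the source. The surviving fragments show five #include lines, using-directives for std and chrono, a sequentialProgram(int num) function that loops over i, and timing code that prints (end_time-start_time).count() followed by " seconds".]

The results show that the two parallel for loops no longer have a fixed before/after relationship: a thread that finishes its share of loop A moves on to loop B straight away, so the A and B lines interleave.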

    F:\OpenMP\cmake-build-debug\OpenMP.exe
    i=0 the current thread id: 0
    i=1 the current thread id: 0
    i=2 the current thread id: 0
    i=3 the current thread id: 0
    i=4 the current thread id: 0
    i=5 the current thread id: 0
    i=6 the current thread id: 0
    i=7 the current thread id: 0
    i=8 the current thread id: 0
    i=9 the current thread id: 0
    i=10 the current thread id: 0
    i=11 the current thread id: 0
    sequentialProgram elapse time: 0.0194814 seconds
    A i=1 the current thread id: 1
    B i=7 the current thread id: 1
    A i=2 the current thread id: 2
    B i=8 the current thread id: 2
    A i=0 the current thread id: 0
    B i=6 the current thread id: 0
    A i=3 the current thread id: 3
    B i=9 the current thread id: 3
    A i=4 the current thread id: 4
    A i=5 the current thread id: 5
    B i=11 the current thread id: 5
    B i=10 the current thread id: 4
    parallelProgram elapse time: 0.0217952 seconds
    Process finished with exit code 0
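A possible shape for two loops joined by nowait, as a sketch only (fill_two_arrays is an illustrative name, not the original test code). Dropping the barrier is safe here because loop B never reads anything that loop A writes.

```cpp
#include <cassert>
#include <vector>

// Fills a[i] = i and b[i] = 2 * i with two work-sharing loops.
void fill_two_arrays(std::vector<int>& a, std::vector<int>& b) {
    const int na = static_cast<int>(a.size());
    const int nb = static_cast<int>(b.size());
    assert(na >= 0 && nb >= 0);
#pragma omp parallel
    {
        // Loop A: nowait removes the implicit barrier at the end of the loop.
#pragma omp for nowait
        for (int i = 0; i < na; i++) {
            a[i] = i;
        }
        // Loop B: a thread may start here while others are still in loop A.
#pragma omp for
        for (int i = 0; i < nb; i++) {
            b[i] = 2 * i;
        }
    }  // barrier at the end of the parallel region: both arrays are complete
}
```

If loop B depended on values from loop A that another thread computes, the nowait clause would have to go, or an explicit barrier would be needed between the loops.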

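(4) barrier: set a barrier so that no thread continues until all threads have reached it

    #pragma omp barrier
    {
        .....
    }

barrier is a standalone directive with no associated block; the braces after it are simply the code that runs once every thread has arrived.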

Test code

[Listing truncated in the source. The surviving fragments show five #include lines, using-directives for std and chrono, a sequentialProgram(int num) function that loops over i, and timing code that prints (end_time-start_time).count() followed by " seconds".]

Test results

    F:\OpenMP\cmake-build-debug\OpenMP.exe
    i=0 the current thread id: 0
    i=1 the current thread id: 0
    i=2 the current thread id: 0
    i=3 the current thread id: 0
    i=4 the current thread id: 0
    i=5 the current thread id: 0
    i=6 the current thread id: 0
    i=7 the current thread id: 0
    i=8 the current thread id: 0
    i=9 the current thread id: 0
    i=10 the current thread id: 0
    i=11 the current thread id: 0
    sequentialProgram elapse time: 0.0240967 seconds
    A i=1 the current thread id: 1
    A i=2 the current thread id: 2
    A i=4 the current thread id: 4
    A i=3 the current thread id: 3
    A i=5 the current thread id: 5
    A i=0 the current thread id: 0
    B i=7 the current thread id: 1
    B i=9 the current thread id: 3
    B i=11 the current thread id: 5
    B i=8 the current thread id: 2
    B i=10 the current thread id: 4
    B i=6 the current thread id: 0
    parallelProgram elapse time: 0.0256972 seconds
    Process finished with exit code 0
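A barrier is needed when one phase reads what other threads wrote in the previous phase. The following sketch (neighbour_exchange is a made-up name) lets each thread publish a value and then read the value of the next thread; without the barrier the read could happen before the neighbour has written.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns, for each thread t, the value published by thread (t + 1) % n.
std::vector<int> neighbour_exchange() {
    std::vector<int> slot, seen;
#pragma omp parallel
    {
        int tid = 0, n = 1;  // fallback values when OpenMP is not enabled
#ifdef _OPENMP
        tid = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
#pragma omp single
        {
            slot.assign(n, -1);
            seen.assign(n, -1);
        }  // implicit barrier after single: the vectors are sized before use

        slot[tid] = tid * 10;              // phase A: publish my value
#pragma omp barrier
        seen[tid] = slot[(tid + 1) % n];   // phase B: read my neighbour's value
    }
    return seen;
}
```

This is the same A-then-B ordering visible in the run above, where no B line appears before the last A line.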

(5) master: only one thread, the master thread, executes the task

    #pragma omp master
    for (int i = 0; i < num; i++) {
        ......
    }
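A short sketch comparing master with single (the struct Who and the function master_vs_single are illustrative names). It records how often each block runs and which thread runs the master block.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

struct Who {
    int master_tid;   // id of the thread that ran the master block
    int master_runs;  // how often the master block ran
    int single_runs;  // how often the single block ran
};

Who master_vs_single() {
    int master_tid = -1, master_runs = 0, single_runs = 0;
#pragma omp parallel
    {
#pragma omp master
        {
            // Always the master thread (id 0); no barrier at the end of this block.
            master_tid = 0;
#ifdef _OPENMP
            master_tid = omp_get_thread_num();
#endif
            master_runs++;
        }
#pragma omp single
        {
            // Any one thread of the team; implicit barrier at the end.
            single_runs++;
        }
    }
    return Who{master_tid, master_runs, single_runs};
}
```

Both blocks run exactly once per parallel region. The master block is always executed by thread 0, while the single block goes to whichever thread reaches it first.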

Test code

[Listing truncated in the source. The surviving fragments show five #include lines, using-directives for std and chrono, a sequentialProgram(int num) function that loops over i, and timing code that prints (end_time-start_time).count() followed by " seconds".]

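Test results. Setting aside the finer points of single, single and master are similar in that both restrict a task to one thread. The differences are that master always uses thread 0 and has no implicit barrier at its end, while single may be run by any thread and ends with an implicit barrier unless nowait is given.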

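ordered lets threads process the iterations one after another in sequence. The ordering applies to the code inside a #pragma omp ordered region within the loop body; the ordered clause on the for directive alone, without such a region, does not serialize anything.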

    #pragma omp for ordered
    for (int i = 0; i < num; i++) {
        .......
    }

    F:\OpenMP\cmake-build-debug\OpenMP.exe
    i=0 the current thread id: 0
    i=1 the current thread id: 0
    i=2 the current thread id: 0
    i=3 the current thread id: 0
    i=4 the current thread id: 0
    i=5 the current thread id: 0
    i=6 the current thread id: 0
    i=7 the current thread id: 0
    i=8 the current thread id: 0
    i=9 the current thread id: 0
    i=10 the current thread id: 0
    i=11 the current thread id: 0
    sequentialProgram elapse time: 0.0232429 seconds
    A i=1 the current thread id: 1
    A i=2 the current thread id: 2
    A i=3 the current thread id: 3
    A i=5 the current thread id: 5
    A i=11 the current thread id: 11
    A i=0 the current thread id: 0
    A i=4 the current thread id: 4
    A i=8 the current thread id: 8
    A i=6 the current thread id: 6
    A i=7 the current thread id: 7
    A i=9 the current thread id: 9
    A i=10 the current thread id: 10
    B i=0 the current thread id: 1
    B i=1 the current thread id: 1
    B i=2 the current thread id: 1
    B i=3 the current thread id: 1
    B i=4 the current thread id: 1
    B i=5 the current thread id: 1
    B i=6 the current thread id: 1
    B i=7 the current thread id: 1
    B i=8 the current thread id: 1
    B i=9 the current thread id: 1
    B i=10 the current thread id: 1
    B i=11 the current thread id: 1
    C i=0 the current thread id: 0
    C i=1 the current thread id: 0
    C i=2 the current thread id: 0
    C i=3 the current thread id: 0
    C i=4 the current thread id: 0
    C i=5 the current thread id: 0
    C i=6 the current thread id: 0
    C i=7 the current thread id: 0
    C i=8 the current thread id: 0
    C i=9 the current thread id: 0
    C i=10 the current thread id: 0
    C i=11 the current thread id: 0
    D i=0 the current thread id: 0
    D i=8 the current thread id: 8
    D i=2 the current thread id: 2
    D i=1 the current thread id: 1
    D i=9 the current thread id: 9
    D i=3 the current thread id: 3
    D i=5 the current thread id: 5
    D i=4 the current thread id: 4
    D i=11 the current thread id: 11
    D i=7 the current thread id: 7
    D i=10 the current thread id: 10
    D i=6 the current thread id: 6
    parallelProgram elapse time: 0.112705 seconds
    Process finished with exit code 0

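Testing with the yolov5.cpp source shipped with ncnn: the timings differ on every run, but the modified version seems to be a little faster most of the time.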

Code snippet

    // anchor setting from yolov5/models/yolov5s.yaml
    auto start_time=std::chrono::steady_clock::now();
    //#pragma omp parallel sections firstprivate(ex)
    {
    //#pragma omp section
        // stride 8
        {
            ncnn::Mat out;
            ex.extract("output", out);
            ncnn::Mat anchors(6);
            anchors[0] = 10.f;
            anchors[1] = 13.f;
            anchors[2] = 16.f;
            anchors[3] = 30.f;
            anchors[4] = 33.f;
            anchors[5] = 23.f;
            generate_proposals(anchors, 8, in_pad, out, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    //#pragma omp  section
        // stride 16
        {
            ncnn::Mat out;
            ex.extract("781", out);
            ncnn::Mat anchors(6);
            anchors[0] = 30.f;
            anchors[1] = 61.f;
            anchors[2] = 62.f;
            anchors[3] = 45.f;
            anchors[4] = 59.f;
            anchors[5] = 119.f;
            generate_proposals(anchors, 16, in_pad, out, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    //#pragma omp section
        // stride 32
        {
            ncnn::Mat out;
            ex.extract("801", out);
            ncnn::Mat anchors(6);
            anchors[0] = 116.f;
            anchors[1] = 90.f;
            anchors[2] = 156.f;
            anchors[3] = 198.f;
            anchors[4] = 373.f;
            anchors[5] = 326.f;
            generate_proposals(anchors, 32, in_pad, out, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    }
    // sort all proposals by score from highest to lowest
    auto end_time=std::chrono::steady_clock::now();
    // the duration cast and the end of this statement are truncated in the source
    std::cout<<"output elapse time: "<< ... (end_time-start_time).count()<<" seconds"<< ...

Timing

    the current thread id: 0
    the current thread id: 0
    the current thread id: 0
    output elapse time: 0.278379 seconds
    yolov5s elapse time: 0.338553 seconds
    15 = 0.54197 at 256.27 15.57 826.90 x 603.65

Modified code

    auto start_time=std::chrono::steady_clock::now();
    ncnn::Mat out0;
    ex.extract("output", out0);
    ncnn::Mat out1;
    ex.extract("781", out1);
    ncnn::Mat out2;
    ex.extract("801", out2);
    #pragma omp parallel sections
    {
    #pragma omp section
        // stride 8
        {
            ncnn::Mat anchors(6);
            anchors[0] = 10.f;
            anchors[1] = 13.f;
            anchors[2] = 16.f;
            anchors[3] = 30.f;
            anchors[4] = 33.f;
            anchors[5] = 23.f;
            generate_proposals(anchors, 8, in_pad, out0, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    #pragma omp  section
        // stride 16
        {
            ncnn::Mat anchors(6);
            anchors[0] = 30.f;
            anchors[1] = 61.f;
            anchors[2] = 62.f;
            anchors[3] = 45.f;
            anchors[4] = 59.f;
            anchors[5] = 119.f;
            generate_proposals(anchors, 16, in_pad, out1, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    #pragma omp section
        // stride 32
        {
            ncnn::Mat anchors(6);
            anchors[0] = 116.f;
            anchors[1] = 90.f;
            anchors[2] = 156.f;
            anchors[3] = 198.f;
            anchors[4] = 373.f;
            anchors[5] = 326.f;
            generate_proposals(anchors, 32, in_pad, out2, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    }
    // sort all proposals by score from highest to lowest
    auto end_time=std::chrono::steady_clock::now();
    // the duration cast and the end of this statement are truncated in the source
    std::cout<<"output elapse time: "<< ... (end_time-start_time).count()<<" seconds"<< ...

Timing: this seems to be faster than the original in most runs.

    F:\window10_yolo5_mingw32\cmake-build-debug\window10_yolo5_mingw32.exe
    the current thread id: 5
    the current thread id: 3
    the current thread id: 11
    output elapse time: 0.244162 seconds
    yolov5s elapse time: 0.303863 seconds
    15 = 0.54197 at 256.27 15.57 826.90 x 603.65
    Process finished with exit cod
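One caveat about the parallel sections version: as written, all three sections fill the same objects vector and append to the same proposals vector, and both are shared between threads, so the sections race with each other. A common way around this is to give each section its own local vector and merge under a critical directive. The sketch below shows only that pattern; fake_proposals is a hypothetical stand-in for generate_proposals, and plain floats stand in for the detection objects.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for generate_proposals(): `stride` dummy scores.
static std::vector<float> fake_proposals(int stride) {
    assert(stride > 0);
    return std::vector<float>(stride, static_cast<float>(stride));
}

// Appends src to dst; only one thread at a time may be inside the critical region.
static void append_locked(std::vector<float>& dst, const std::vector<float>& src) {
#pragma omp critical(merge_proposals)
    dst.insert(dst.end(), src.begin(), src.end());
}

std::vector<float> collect_proposals() {
    std::vector<float> proposals;  // shared result, written only via append_locked
#pragma omp parallel sections
    {
#pragma omp section
        {
            std::vector<float> local = fake_proposals(8);   // private to this section
            append_locked(proposals, local);
        }
#pragma omp section
        {
            std::vector<float> local = fake_proposals(16);
            append_locked(proposals, local);
        }
#pragma omp section
        {
            std::vector<float> local = fake_proposals(32);
            append_locked(proposals, local);
        }
    }
    // The merge order depends on thread timing, so sort for a stable result.
    std::sort(proposals.begin(), proposals.end());
    return proposals;
}
```

The expensive part (producing the local vector) still runs in parallel; only the short append is serialized.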

Modified this way, the run takes longer:

    auto start_time=std::chrono::steady_clock::now();
    #pragma omp parallel sections firstprivate(ex)
    {
    #pragma omp section
        // stride 8
        {
            ncnn::Mat out;
            ex.extract("output", out);
            ncnn::Mat anchors(6);
            anchors[0] = 10.f;
            anchors[1] = 13.f;
            anchors[2] = 16.f;
            anchors[3] = 30.f;
            anchors[4] = 33.f;
            anchors[5] = 23.f;
            generate_proposals(anchors, 8, in_pad, out, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    #pragma omp  section
        // stride 16
        {
            ncnn::Mat out;
            ex.extract("781", out);
            ncnn::Mat anchors(6);
            anchors[0] = 30.f;
            anchors[1] = 61.f;
            anchors[2] = 62.f;
            anchors[3] = 45.f;
            anchors[4] = 59.f;
            anchors[5] = 119.f;
            generate_proposals(anchors, 16, in_pad, out, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    #pragma omp section
        // stride 32
        {
            ncnn::Mat out;
            ex.extract("801", out);
            ncnn::Mat anchors(6);
            anchors[0] = 116.f;
            anchors[1] = 90.f;
            anchors[2] = 156.f;
            anchors[3] = 198.f;
            anchors[4] = 373.f;
            anchors[5] = 326.f;
            generate_proposals(anchors, 32, in_pad, out, prob_threshold, objects);
            proposals.insert(proposals.end(), objects.begin(), objects.end());
            printf("the current thread id: %d\n",omp_get_thread_num());
        }
    }
    // sort all proposals by score from highest to lowest
    auto end_time=std::chrono::steady_clock::now();
    // the duration cast and the end of this statement are truncated in the source
    std::cout<<"output elapse time: "<< ... (end_time-start_time).count()<<" seconds"<< ...

Timing

    F:\window10_yolo5_mingw32\cmake-build-debug\window10_yolo5_mingw32.exe
    the current thread id: 1
    the current thread id: 0
    the current thread id: 7
    output elapse time: 0.829006 seconds
    yolov5s elapse time: 0.895948 seconds
    15 = 0.54197 at 256.27 15.57 826.90 x 603.65

Reposted from: http://dtyai.baihongyu.com/
