2020-05-09

TVM graph_runtime analysis

Overall runtime logic

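The overall runtime logic is: load the compiled computation graph (which contains the binary code and its description); allocate storage for each storage node according to the graph information; build the function body for each executable OP (which in practice just calls the already-compiled code); then execute the executable OPs one by one.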
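
The four stages above can be sketched with a toy executor. This is only an illustration under simplifying assumptions: the JSON field names, the `COMPILED_LIB` table and the `ToyGraphRuntime` class are all hypothetical and do not match TVM's real graph schema or API.

```python
import json

def add_kernel(a, b, out):
    for i in range(len(out)):
        out[i] = a[i] + b[i]

def mul_kernel(a, b, out):
    for i in range(len(out)):
        out[i] = a[i] * b[i]

# Stand-in for the compiled library: function name -> callable (hypothetical).
COMPILED_LIB = {"add": add_kernel, "mul": mul_kernel}

# Hypothetical graph description; "null" marks an input node.
GRAPH_JSON = json.dumps({
    "length": 3,
    "nodes": [
        {"name": "x", "op": "null", "inputs": []},
        {"name": "y", "op": "null", "inputs": []},
        {"name": "sum", "op": "add", "inputs": [0, 1]},
        {"name": "out", "op": "mul", "inputs": [2, 1]},
    ],
})

class ToyGraphRuntime:
    def __init__(self, graph_json, lib):
        graph = json.loads(graph_json)              # 1. load the graph
        self.nodes = graph["nodes"]
        n = graph["length"]
        # 2. allocate storage (one buffer per node, for simplicity)
        self.storage = [[0.0] * n for _ in self.nodes]
        # 3. build one closure per OP; it only calls the compiled kernel
        self.op_execs = []
        for nid, node in enumerate(self.nodes):
            if node["op"] == "null":
                self.op_execs.append(None)          # inputs have nothing to run
                continue
            fn = lib[node["op"]]
            args = [self.storage[i] for i in node["inputs"]] + [self.storage[nid]]
            self.op_execs.append(lambda fn=fn, args=args: fn(*args))
        self.input_map = {node["name"]: nid
                          for nid, node in enumerate(self.nodes)
                          if node["op"] == "null"}

    def set_input(self, name, values):
        # Copy into the existing buffer so the closures keep seeing it.
        self.storage[self.input_map[name]][:] = values

    def run(self):
        for op in self.op_execs:                    # 4. run the OPs one by one
            if op is not None:
                op()

    def get_output(self):
        return list(self.storage[-1])

rt = ToyGraphRuntime(GRAPH_JSON, COMPILED_LIB)
rt.set_input("x", [1.0, 2.0, 3.0])
rt.set_input("y", [4.0, 5.0, 6.0])
rt.run()
result = rt.get_output()
```

Here `result` holds `(x + y) * y` computed element by element.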

Code logic

A typical Python snippet that a user writes to compile and run a deep learning model looks like this:

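from tvm import relay
from tvm.contrib import graph_runtime

# net, params, target_n, cpu_ctx and gpu_ctx are defined elsewhere
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(net, target=target_n, params=params)

ctx = [cpu_ctx, gpu_ctx]  # one context per device used by the graph
module = graph_runtime.create(graph, lib, ctx)
module.run()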

Calling graph_runtime.create on the Python side leads to GraphRuntimeCreate and then to GraphRuntime::Init, which builds the runtime structure and returns a Module to Python. The Python side then runs the model through module.run().

Module GraphRuntimeCreate(const std::string& sym_json,
                          const tvm::runtime::Module& m,
                          const std::vector<TVMContext>& ctxs) {
  auto exec = make_object<GraphRuntime>();
  exec->Init(sym_json, m, ctxs);
  return Module(exec);
}

void GraphRuntime::Init(const std::string& graph_json,
                        tvm::runtime::Module module,
                        const std::vector<TVMContext>& ctxs) {
  std::istringstream is(graph_json);
  dmlc::JSONReader reader(&is);
  this->Load(&reader);
  module_ = module;
  ctxs_ = ctxs;
  this->SetupStorage();
  this->SetupOpExecs();
  for (size_t i = 0; i < input_nodes_.size(); i++) {
    const uint32_t nid = input_nodes_[i];
    std::string& name = nodes_[nid].name;
    input_map_[name] = i;
  }
}

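GraphRuntime::Init does two main things: first, it reads the compiled computation graph from the JSON string (this->Load(&reader)); second, it initializes the execution environment (SetupStorage and SetupOpExecs).
Load is straightforward and is not covered here.
The core of SetupStorage is to read from the JSON the information about every tensor that needs storage, and then allocate memory for it on the corresponding device (by calling NDArray::Empty(shape, DLDataType{kDLFloat, 32, 1}, ctx)). Note that memory is not allocated per tensor: each storage pool entry (identified by storage_id) is allocated only once on its device, sized to the largest tensor that shares it.
The core of SetupOpExecs is to build, for each OP, its function body (the real work was compiled earlier, so the body only makes the call) together with the args structure it needs.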
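
The "allocate the maximum required storage" idea in SetupStorage can be sketched as follows. This is a simplified model, not the TVM code: the `Tensor` record and `plan_storage_pool` are hypothetical names, and it assumes that tensors sharing a storage slot carry the same storage_id. Since the buffers are created with a 32-bit float dtype, the sketch rounds each byte size up to a whole number of 4-byte elements.

```python
from collections import namedtuple

# Hypothetical per-tensor record: byte size, shared storage slot, device name.
Tensor = namedtuple("Tensor", ["name", "nbytes", "storage_id", "device"])

def plan_storage_pool(tensors):
    """Return {storage_id: (device, num_float32_elements)}.

    Each slot is sized for the largest tensor assigned to it, so it only
    needs to be allocated once.
    """
    max_bytes = {}
    for t in tensors:
        device, size = max_bytes.get(t.storage_id, (t.device, 0))
        if device != t.device:
            raise ValueError("a storage slot cannot span multiple devices")
        max_bytes[t.storage_id] = (device, max(size, t.nbytes))
    # Round bytes up to whole 4-byte float32 elements.
    return {sid: (dev, (nbytes + 3) // 4)
            for sid, (dev, nbytes) in max_bytes.items()}

tensors = [
    Tensor("conv_out", 4096, 0, "gpu"),
    Tensor("relu_out", 4096, 1, "gpu"),
    Tensor("pool_out", 1024, 0, "gpu"),  # reuses slot 0 after conv_out is dead
    Tensor("fc_out", 10, 2, "cpu"),
]
pool = plan_storage_pool(tensors)
```

Four tensors end up in three allocations; `pool_out` fits into the buffer already sized for `conv_out`.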

No OP-level parallelism

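The current TVM graph_runtime is just a simple static executor.
A typical illustration is the Run function below, which simply runs the OP function bodies serially, one after another.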

void GraphRuntime::Run() {
  // setup the array and requirements.
  for (size_t i = 0; i < op_execs_.size(); ++i) {
    if (op_execs_[i]) op_execs_[i]();
  }
}

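Moreover, every OP executes synchronously (the OP function body returns only after its result is ready), so the top level of the runtime has no parallelism.
In theory, the TVM runtime currently cannot run computation on the CPU and the GPU at the same time (unless heterogeneous execution code is embedded inside a single OP, but there currently seems to be no way to construct such an OP?).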
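
To make "OP-level parallelism" concrete, the sketch below groups the nodes of a small graph into dependency levels: nodes in the same level do not depend on each other, so a scheduler with OP-level parallelism could in principle run them at the same time. The function `dependency_levels` and the example graph are hypothetical and only illustrate what the serial Run loop does not exploit.

```python
def dependency_levels(inputs_of):
    """Group node ids into levels of mutually independent nodes.

    inputs_of[i] lists the node ids that node i reads; nodes are assumed
    to be listed in topological order.
    """
    level = []
    for deps in inputs_of:
        # A node sits one level below its deepest input; inputs are level 0.
        level.append(1 + max((level[d] for d in deps), default=-1))
    groups = {}
    for nid, lv in enumerate(level):
        groups.setdefault(lv, []).append(nid)
    return [groups[lv] for lv in sorted(groups)]

# Node 0 is the input, nodes 1 and 2 both read only node 0, node 3 reads 1 and 2.
inputs_of = [[], [0], [0], [1, 2]]

serial_order = list(range(len(inputs_of)))  # what a serial loop executes
levels = dependency_levels(inputs_of)       # what could run side by side
```

Nodes 1 and 2 land in the same level, yet a loop over `serial_order` still runs them one after the other.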

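Multithreaded execution is started by the generated functions themselves, not by the graph runtime.

CUDA computation and copy operations are executed directly, without going through multithreaded execution. The gdb backtrace below shows a generated function calling TVMBackendParallelLaunch (in thread_pool.cc) from within GraphRuntime::Run: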
#0 TVMBackendParallelLaunch (flambda=0x7ff20bc98a20, cdata=0x7fffce203b60, num_task=0) at /home/majiang/hd/opensource/tvm/src/runtime/thread_pool.cc:398
#1 0x00007ff20bc98688 in ?? ()
#2 0x00007ff1dbf3ab4e in tvm::runtime::<lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue *) const (
__closure=0x3664bf0, args=…, rv=0x7fffce203eb0) at /home/majiang/hd/opensource/tvm/src/runtime/library_module.cc:88
#3 0x00007ff1dbf3bfbc in std::_Function_handler<void(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue), tvm::runtime::WrapPackedFunc(TVMBackendPackedCFunc, const tvm::runtime::ObjectPtr<tvm::runtime::Object>&)::<lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue)> >::_M_invoke(const std::_Any_data &, tvm::runtime::TVMArgs &&, tvm::runtime::TVMRetValue *&&) (__functor=…, __args#0=…, __args#1=@0x7fffce203e10: 0x7fffce203eb0) at /usr/include/c++/7/bits/std_function.h:316
#4 0x00007ff1db3b52ec in std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const (
this=0x36e8f10, __args#0=…, __args#1=0x7fffce203eb0) at /usr/include/c++/7/bits/std_function.h:706
#5 0x00007ff1db3b4e32 in tvm::runtime::PackedFunc::CallPacked (this=0x36e8f10, args=…, rv=0x7fffce203eb0)
at /home/majiang/hd/opensource/tvm/include/tvm/runtime/packed_func.h:1040
#6 0x00007ff1dbfa6c55 in tvm::runtime::GraphRuntime::<lambda()>::operator()(void) const (__closure=0x36e8f00)
at /home/majiang/hd/opensource/tvm/src/runtime/graph/graph_runtime.cc:402
#7 0x00007ff1dbfaa837 in std::_Function_handler<void(), tvm::runtime::GraphRuntime::CreateTVMOp(const tvm::runtime::TVMOpParam&, const std::vector&, size_t)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=…) at /usr/include/c++/7/bits/std_function.h:316
#8 0x00007ff1db439068 in std::function<void ()>::operator()() const (this=0x392ff70) at /usr/include/c++/7/bits/std_function.h:706
#9 0x00007ff1dbfa2fb9 in tvm::runtime::GraphRuntime::Run (this=0x4076790) at /home/majiang/hd/opensource/tvm/src/runtime/graph/graph_runtime.cc:56
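
The call pattern in frame #0 (a function pointer `flambda`, an opaque `cdata` and a `num_task` count) can be mimicked in Python to show how a generated function, rather than the graph runtime, decides to go parallel. This is a rough analogue under stated assumptions: `parallel_launch` and `generated_add_one` are hypothetical, the callback signature is simplified, and treating `num_task == 0` as "use a default number of workers" is an assumption made for this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_launch(flambda, cdata, num_task=0):
    """Run flambda(task_id, num_task, cdata) for every task and wait for all.

    Hypothetical analogue of a parallel-launch helper; num_task == 0 is
    treated here as "pick a default number of workers".
    """
    if num_task == 0:
        num_task = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=num_task) as pool:
        futures = [pool.submit(flambda, tid, num_task, cdata)
                   for tid in range(num_task)]
        for f in futures:
            f.result()  # blocks until done and re-raises worker errors
    return num_task

def generated_add_one(task_id, num_task, cdata):
    """Stand-in for a generated kernel body: each task handles one slice."""
    src, dst = cdata
    n = len(src)
    chunk = (n + num_task - 1) // num_task
    for i in range(task_id * chunk, min(n, (task_id + 1) * chunk)):
        dst[i] = src[i] + 1

src = list(range(10))
dst = [0] * 10
used = parallel_launch(generated_add_one, (src, dst), num_task=4)
```

The caller blocks until every task has finished, which matches the observation that each OP looks synchronous from the point of view of GraphRuntime::Run even when it uses several threads internally.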