VPI - Vision Programming Interface

3.2 Release

KLT Bounding Box Tracker

Overview

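This application tracks bounding boxes in an input video, draws them on each frame and saves the result in a video file. You can choose which backend performs the processing.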

Note
The output will be grayscale, because the algorithm currently doesn't support color inputs.
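Because the tracker only accepts single-channel input, both samples convert each BGR frame to grayscale before wrapping it. On PVA they also widen it to 16 bits without rescaling. The following is a minimal NumPy-only sketch of that preprocessing. It assumes OpenCV's documented BGR-to-gray weights (Y = 0.299 R + 0.587 G + 0.114 B). The helper name `to_tracker_gray` is hypothetical; the samples themselves call `cv2.cvtColor` / `cv::cvtColor`. Results may differ by one gray level from OpenCV's fixed-point implementation.

```python
import numpy as np


def to_tracker_gray(frame_bgr: np.ndarray, pva: bool = False) -> np.ndarray:
    """Hypothetical helper: BGR uint8 frame -> single-channel tracker input."""
    if frame_bgr.ndim == 3 and frame_bgr.shape[2] == 3:
        b = frame_bgr[..., 0].astype(np.float64)
        g = frame_bgr[..., 1].astype(np.float64)
        r = frame_bgr[..., 2].astype(np.float64)
        # BT.601 luma weights, as documented for OpenCV's COLOR_BGR2GRAY
        gray = np.clip(np.rint(0.299 * r + 0.587 * g + 0.114 * b), 0, 255).astype(np.uint8)
    else:
        # Already single-channel: keep it as 8-bit
        gray = frame_bgr.astype(np.uint8)
    if pva:
        # PVA path: widen to 16-bit unsigned, values stay in the 0..255 range
        return gray.astype(np.uint16)
    return gray
```

The PVA branch mirrors the samples' comment that no rescaling is needed: the values keep their 0..255 range and only the storage type changes.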

Instructions

The command-line parameters are:

<backend> <input video> <input bboxes>

where

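  • backend: either cpu, cuda or pva; it defines the backend that will perform the processing.
  • input video: input video file name; it accepts all video types that OpenCV's cv::VideoCapture accepts.
  • input bboxes: file with the input bounding boxes and the frame in which each one appears. The file consists of multiple lines in the following format:
       <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
    The lines must be sorted by frame in ascending order.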
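The bounding-box file format above can be parsed and checked in a few lines of Python. This is a sketch, not code from the samples; `parse_bbox_file` and `active_box_count` are hypothetical names. The parser rejects lines whose frame number goes backwards. It also returns the cumulative number of boxes at each frame that introduces new ones, which is the same bookkeeping the C++ sample keeps in `bboxes_size_at_frame`. The lookup mirrors that sample's `std::map::upper_bound` step and returns 0 before the first box appears.

```python
import bisect
from typing import Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def parse_bbox_file(lines: Iterable[str]) -> Tuple[Dict[int, List[Box]], Dict[int, int]]:
    """Parse '<frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>' lines.

    Returns (boxes introduced at each frame, cumulative box count at each such frame).
    """
    boxes_at: Dict[int, List[Box]] = {}
    cumulative: Dict[int, int] = {}
    total = 0
    last_frame = -1
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. a trailing newline
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        frame = int(fields[0])
        if frame < last_frame:
            raise ValueError(f"line {lineno}: frame {frame} appears after frame {last_frame}")
        x, y, w, h = (float(v) for v in fields[1:])
        boxes_at.setdefault(frame, []).append((x, y, w, h))
        total += 1
        cumulative[frame] = total
        last_frame = frame
    return boxes_at, cumulative


def active_box_count(cumulative: Dict[int, int], frame: int) -> int:
    """Number of boxes introduced at or before `frame` (0 if none yet)."""
    frames = sorted(cumulative)
    idx = bisect.bisect_right(frames, frame)  # analogous to std::map::upper_bound
    return cumulative[frames[idx - 1]] if idx > 0 else 0
```

For example, with boxes at frames 0 and 61, `active_box_count` reports the frame-0 count for every frame up to 60 and the larger count from frame 61 on.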

Here's one example:

  • C++
    ./vpi_sample_06_klt_tracker cuda ../assets/dashcam.mp4 ../assets/dashcam_bboxes.txt
  • Python
    python3 main.py cuda ../assets/dashcam.mp4 ../assets/dashcam_bboxes.txt
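This uses the CUDA backend with the provided sample video and bounding boxes. The C++ sample renders the tracked bounding boxes into klt_cuda.mp4; the Python sample, run with Python 3, writes them to klt_python3_cuda.mp4.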

Results

[Video: Tracking result]

Source Code

For convenience, here's the code that is also installed in the samples directory.

Python

from __future__ import print_function

import sys
from argparse import ArgumentParser
import numpy as np
import vpi
import cv2


# Convert a colored input frame to grayscale (if needed)
# and then, if using PVA backend, convert it to 16-bit unsigned pixels.
# The converted frame is copied before wrapping it as a VPI image so
# later draws in the gray frame do not change the reference VPI image.
def convertFrameImage(inputFrame, backend):
    if inputFrame.ndim == 3 and inputFrame.shape[2] == 3:
        grayFrame = cv2.cvtColor(inputFrame, cv2.COLOR_BGR2GRAY)
    else:
        grayFrame = inputFrame
    if backend == vpi.Backend.PVA:
        # PVA only supports 16-bit unsigned inputs,
        # where each element is in 0-255 range, so
        # no rescaling is needed.
        grayFrame = grayFrame.astype(np.uint16)
    grayImage = vpi.asimage(grayFrame.copy())
    return grayFrame, grayImage


# Write the input gray frame to output video with
# input bounding boxes and predictions
def writeOutput(outVideo, cvGray, inBoxes, inPreds, colors, backend):
    try:
        if cvGray.dtype == np.uint16:
            cvGray = cvGray.astype(np.uint8)
        if cvGray.dtype != np.uint8:
            raise Exception('Input frame format must be grayscale, 8-bit unsigned')
        cvGrayBGR = cv2.cvtColor(cvGray, cv2.COLOR_GRAY2BGR)

        # Tracking the number of valid bounding boxes in the current frame
        numValidBoxes = 0

        # Draw the input bounding boxes considering the input predictions
        with inBoxes.rlock_cpu(), inPreds.rlock_cpu() as pred:
            # Array of bounding boxes (bbox) and predictions (pred)
            bbox = inBoxes.cpu().view(np.recarray)

            for i in range(inBoxes.size):
                if bbox[i].tracking_status == vpi.KLTTrackStatus.LOST:
                    # If the tracking status of the current bounding box is lost, skip it
                    continue

                # Gather information of the current (i) bounding box and prediction
                # Prediction scaling width, height and x, y
                predScaleWidth = pred[i][0, 0]
                predScaleHeight = pred[i][1, 1]
                predX = pred[i][0, 2]
                predY = pred[i][1, 2]

                # Bounding box scaling width, height and x, y and bbox width, height
                bboxScaleWidth = bbox[i].bbox.xform.mat3[0, 0]
                bboxScaleHeight = bbox[i].bbox.xform.mat3[1, 1]
                bboxX = bbox[i].bbox.xform.mat3[0, 2]
                bboxY = bbox[i].bbox.xform.mat3[1, 2]
                bboxWidth = bbox[i].bbox.width
                bboxHeight = bbox[i].bbox.height

                # Compute corrected x, y and width, height (w, h) by adding the
                # bounding box and prediction x, y, and by multiplying the
                # bounding box w, h by its own scaling and the prediction scaling
                x = bboxX + predX
                y = bboxY + predY
                w = bboxWidth * bboxScaleWidth * predScaleWidth
                h = bboxHeight * bboxScaleHeight * predScaleHeight

                # Start and end points of the bounding box for OpenCV drawing,
                # truncated to plain Python ints as cv2.rectangle expects
                startPoint = (int(x), int(y))
                endPoint = (int(x) + int(w), int(y) + int(h))

                # The color of the bounding box to be drawn
                bboxColor = tuple([ int(c) for c in colors[0, i] ])
                cv2.rectangle(cvGrayBGR, startPoint, endPoint, bboxColor, 2)

                # Incrementing the number of valid bounding boxes in the current frame
                numValidBoxes += 1

        print(' Valid: {:02d} boxes'.format(numValidBoxes))

        outVideo.write(cvGrayBGR)
    except Exception as e:
        print('Error while writing output video:\n', e, file=sys.stderr)
        sys.exit(1)


# ----------------------------
# Parse command line arguments

parser = ArgumentParser()
parser.add_argument('backend', choices=['cpu','cuda','pva'],
                    help='Backend to be used for processing')

parser.add_argument('input',
                    help='Input video')

parser.add_argument('boxes',
                    help='Text file with bounding boxes description')

args = parser.parse_args()

if args.backend == 'cpu':
    backend = vpi.Backend.CPU
elif args.backend == 'cuda':
    backend = vpi.Backend.CUDA
else:
    assert args.backend == 'pva'
    backend = vpi.Backend.PVA

# -----------------------------
# Open input and output videos

inVideo = cv2.VideoCapture(args.input)

fourcc = cv2.VideoWriter_fourcc(*'MPEG')
inSize = (int(inVideo.get(cv2.CAP_PROP_FRAME_WIDTH)), int(inVideo.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fps = inVideo.get(cv2.CAP_PROP_FPS)

outVideo = cv2.VideoWriter('klt_python'+str(sys.version_info[0])+'_'+args.backend+'.mp4',
                           fourcc, fps, inSize)

if not outVideo.isOpened():
    print("Error creating output video", file=sys.stderr)
    sys.exit(1)

# -----------------------------
# Reading input bounding boxes

# allBoxes is a dictionary of all bounding boxes to be tracked in the input video,
# where each value is a list of new bounding boxes to track at the frame indicated by its key
allBoxes = {}
totalNumBoxes = 0

# Array capacity 0 means no restricted maximum number of bounding boxes
arrayCapacity = 0

if backend == vpi.Backend.PVA:
    # PVA requires an array capacity (maximum number of bounding boxes) of 128
    arrayCapacity = 128

with open(args.boxes) as f:
    # The input file (f) should have one bounding box per line as:
    # "startFrame bboxX bboxY bboxWidth bboxHeight"; e.g.: "61 547 337 14 11"
    for line in f:
        fields = line.split()
        if not fields:
            # Skip blank lines, e.g. a trailing newline at the end of the file
            continue
        startFrame, x, y, w, h = [ float(v) for v in fields ]
        bb = (x, y, w, h)
        if startFrame not in allBoxes:
            allBoxes[startFrame] = [bb]
        else:
            allBoxes[startFrame].append(bb)
        totalNumBoxes += 1
        if totalNumBoxes == arrayCapacity:
            # Stop adding boxes once the total reaches the array capacity
            break

curFrame = 0
# No boxes may start at frame 0, so default to an empty list
curNumBoxes = len(allBoxes.get(curFrame, []))

# ------------------------------------------------------------------------------
# Initialize VPI array with all input bounding boxes (same as C++ KLT sample)

if arrayCapacity == 0:
    arrayCapacity = totalNumBoxes

inBoxes = vpi.Array(arrayCapacity, vpi.Type.KLT_TRACKED_BOUNDING_BOX)

inBoxes.size = totalNumBoxes
with inBoxes.wlock_cpu():
    data = inBoxes.cpu().view(np.recarray)

    # Global index i of all bounding boxes data, starting at 0
    i = 0

    for f in sorted(allBoxes.keys()):
        for bb in allBoxes[f]:
            # Each bounding box bb is a tuple of (x, y, w, h)
            x, y, w, h = bb

            # The bounding box data is the identity for the scaling part,
            # meaning no scaling, and the offset part is its position x, y
            data[i].bbox.xform.mat3[0, 0] = 1
            data[i].bbox.xform.mat3[1, 1] = 1
            data[i].bbox.xform.mat3[2, 2] = 1
            data[i].bbox.xform.mat3[0, 2] = x
            data[i].bbox.xform.mat3[1, 2] = y

            # The bounding box data stores its width and height w, h
            data[i].bbox.width = w
            data[i].bbox.height = h

            # Initially all boxes have status tracked and update needed
            data[i].tracking_status = vpi.KLTTrackStatus.TRACKED
            data[i].template_status = vpi.KLTTemplateStatus.UPDATE_NEEDED

            # Incrementing the global index for the next bounding box
            i += 1

#-------------------------------------------------------------------------------
# Generate random colors for bounding boxes, as in the C++ KLT sample

hues = np.zeros((totalNumBoxes,), dtype=np.uint8)

if int(cv2.__version__.split('.')[0]) >= 3:
    cv2.setRNGSeed(1)
    hues = cv2.randu(hues, 0, 180)
else:
    # Random number generation differs in OpenCV-2.4
    rng = cv2.cv.RNG(1)
    hues = cv2.cv.fromarray(np.array([[ h for h in hues ]], dtype=np.uint8))
    cv2.cv.RandArr(rng, hues, cv2.cv.CV_RAND_UNI, 0, 180)
    hues = [ hues[0, i] for i in range(totalNumBoxes) ]

colors = np.array([[ [int(h), 255, 255] for h in hues ]], dtype=np.uint8)
colors = cv2.cvtColor(colors, cv2.COLOR_HSV2BGR)

#-------------------------------------------------------------------------------
# Initialize the KLT Feature Tracker algorithm

# Load up first frame
validFrame, cvFrame = inVideo.read()
if not validFrame:
    print("Error reading first input frame", file=sys.stderr)
    sys.exit(1)

# Convert OpenCV frame to gray, also returning the VPI image for the given backend
cvGray, imgTemplate = convertFrameImage(cvFrame, backend)

# Create the KLT Feature Tracker object using the backend specified by the user
klt = vpi.KLTFeatureTracker(imgTemplate, inBoxes, backend=backend)

#-------------------------------------------------------------------------------
# Main processing loop

while validFrame:
    print('Frame: {:04d} ; Total: {:02d} boxes ;'.format(curFrame, curNumBoxes), end='')

    # Adjust input boxes and predictions to the current number of boxes
    inPreds = klt.in_predictions()

    inPreds.size = curNumBoxes
    inBoxes.size = curNumBoxes

    # Write current frame to the output video
    writeOutput(outVideo, cvGray, inBoxes, inPreds, colors, backend)

    # Read next input frame
    curFrame += 1
    validFrame, cvFrame = inVideo.read()
    if not validFrame:
        break

    cvGray, imgReference = convertFrameImage(cvFrame, backend)

    outBoxes = klt(imgReference)

    if curFrame in allBoxes:
        curNumBoxes += len(allBoxes[curFrame])

outVideo.release()

# vim: ts=8:sw=4:sts=4:et:ai
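In `writeOutput` above, each drawn rectangle combines two 3x3 matrices: the tracked box's own transform `bbox.xform.mat3` and the input prediction. Only the translation terms (`[0, 2]`, `[1, 2]`) and the diagonal scale terms (`[0, 0]`, `[1, 1]`) are used, so any rotation or shear is ignored. The translations are summed rather than composed by matrix multiplication. The sketch below isolates that arithmetic with plain NumPy arrays. `rect_from_xforms` is a hypothetical name, and it reproduces the sample's formula rather than a general homography composition.

```python
import numpy as np


def rect_from_xforms(bbox_xform: np.ndarray, width: float, height: float,
                     pred_xform: np.ndarray):
    """Hypothetical helper reproducing the samples' box-drawing formula.

    Returns (start_point, end_point) as integer pixel tuples, like the
    points passed to cv2.rectangle in writeOutput.
    """
    # Translations are added, exactly as the sample code does
    x = bbox_xform[0, 2] + pred_xform[0, 2]
    y = bbox_xform[1, 2] + pred_xform[1, 2]
    # Width/height are scaled by both diagonal (scale) terms
    w = width * bbox_xform[0, 0] * pred_xform[0, 0]
    h = height * bbox_xform[1, 1] * pred_xform[1, 1]
    # int() truncates toward zero
    start = (int(x), int(y))
    end = (int(x) + int(w), int(y) + int(h))
    return start, end
```

With an identity prediction, the rectangle is just the input box. For example, the box "547 337 14 11" maps to corners (547, 337) and (561, 348).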
C++

#include <opencv2/core/version.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <vpi/OpenCVInterop.hpp>

#include <vpi/Array.h>
#include <vpi/Image.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/KLTFeatureTracker.h>

#include <cassert>
#include <cstring> // for memset
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <vector>

#define CHECK_STATUS(STMT)                                    \
    do                                                        \
    {                                                         \
        VPIStatus status = (STMT);                            \
        if (status != VPI_SUCCESS)                            \
        {                                                     \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
            vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
            std::ostringstream ss;                            \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str());               \
        }                                                     \
    } while (0)

// Utility to draw the bounding boxes into a BGR copy of the image; the caller writes it to disk.
static cv::Mat WriteKLTBoxes(VPIImage img, VPIArray boxes, VPIArray preds)
{
    // Convert img into a cv::Mat
    cv::Mat out;
    {
        VPIImageData imgdata;
        CHECK_STATUS(vpiImageLockData(img, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &imgdata));

        assert(imgdata.bufferType == VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR);
        VPIImageBufferPitchLinear &imgPitch = imgdata.buffer.pitch;

        int cvtype;
        switch (imgPitch.format)
        {
        case VPI_IMAGE_FORMAT_U8:
            cvtype = CV_8U;
            break;

        case VPI_IMAGE_FORMAT_S8:
            cvtype = CV_8S;
            break;

        case VPI_IMAGE_FORMAT_U16:
            cvtype = CV_16UC1;
            break;

        case VPI_IMAGE_FORMAT_S16:
            cvtype = CV_16SC1;
            break;

        default:
            throw std::runtime_error("Image type not supported");
        }

        cv::Mat cvimg(imgPitch.planes[0].height, imgPitch.planes[0].width, cvtype, imgPitch.planes[0].data,
                      imgPitch.planes[0].pitchBytes);

        if (cvimg.type() == CV_16U)
        {
            cvimg.convertTo(out, CV_8U);
            cvimg = out;
            out   = cv::Mat();
        }

        cvtColor(cvimg, out, cv::COLOR_GRAY2BGR);

        CHECK_STATUS(vpiImageUnlock(img));
    }

    // Now draw the bounding boxes.
    VPIArrayData boxdata;
    CHECK_STATUS(vpiArrayLockData(boxes, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &boxdata));

    VPIArrayData preddata;
    CHECK_STATUS(vpiArrayLockData(preds, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &preddata));

    auto *pboxes = reinterpret_cast<VPIKLTTrackedBoundingBox *>(boxdata.buffer.aos.data);
    auto *ppreds = reinterpret_cast<VPIHomographyTransform2D *>(preddata.buffer.aos.data);

    // Use random high-saturated colors
    static std::vector<cv::Vec3b> colors;
    if ((int)colors.size() != *boxdata.buffer.aos.sizePointer)
    {
        colors.resize(*boxdata.buffer.aos.sizePointer);

        cv::RNG rand(1);
        for (size_t i = 0; i < colors.size(); ++i)
        {
            colors[i] = cv::Vec3b(rand.uniform(0, 180), 255, 255);
        }
        cvtColor(colors, colors, cv::COLOR_HSV2BGR);
    }

    // For each tracked bounding box...
    for (int i = 0; i < *boxdata.buffer.aos.sizePointer; ++i)
    {
        if (pboxes[i].trackingStatus == 1)
        {
            continue;
        }

        float x, y, w, h;
        x = pboxes[i].bbox.xform.mat3[0][2] + ppreds[i].mat3[0][2];
        y = pboxes[i].bbox.xform.mat3[1][2] + ppreds[i].mat3[1][2];
        w = pboxes[i].bbox.width * pboxes[i].bbox.xform.mat3[0][0] * ppreds[i].mat3[0][0];
        h = pboxes[i].bbox.height * pboxes[i].bbox.xform.mat3[1][1] * ppreds[i].mat3[1][1];

        rectangle(out, cv::Rect(x, y, w, h), cv::Scalar(colors[i][0], colors[i][1], colors[i][2]), 2);
    }

    CHECK_STATUS(vpiArrayUnlock(preds));
    CHECK_STATUS(vpiArrayUnlock(boxes));

    return out;
}

int main(int argc, char *argv[])
{
    // OpenCV image that will be wrapped by a VPIImage.
    // Define it here so that it's destroyed *after* wrapper is destroyed
    cv::Mat cvTemplate, cvReference;

    // Arrays that will store our input bboxes and predicted transform.
    VPIArray inputBoxList = NULL, inputPredList = NULL;

    // Other VPI objects that will be used
    VPIStream stream         = NULL;
    VPIArray outputBoxList   = NULL;
    VPIArray outputEstimList = NULL;
    VPIPayload klt           = NULL;
    VPIImage imgReference    = NULL;
    VPIImage imgTemplate     = NULL;

    int retval = 0;
    try
    {
        if (argc != 4)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + " <cpu|pva|cuda> <input_video> <bbox descr>");
        }

        std::string strBackend     = argv[1];
        std::string strInputVideo  = argv[2];
        std::string strInputBBoxes = argv[3];

        // Load the input video
        cv::VideoCapture invid;
        if (!invid.open(strInputVideo))
        {
            throw std::runtime_error("Can't open '" + strInputVideo + "'");
        }

        // Open the output video for writing using input's characteristics
        int w      = invid.get(cv::CAP_PROP_FRAME_WIDTH);
        int h      = invid.get(cv::CAP_PROP_FRAME_HEIGHT);
        int fourcc = cv::VideoWriter::fourcc('M', 'P', 'E', 'G');
        double fps = invid.get(cv::CAP_PROP_FPS);

        cv::VideoWriter outVideo("klt_" + strBackend + ".mp4", fourcc, fps, cv::Size(w, h));
        if (!outVideo.isOpened())
        {
            throw std::runtime_error("Can't create output video");
        }

        // Load the bounding boxes
        // Format is: <frame number> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
        // Important assumption: bboxes must be sorted with increasing frame numbers.

        // These arrays will actually wrap these vectors.
        std::vector<VPIKLTTrackedBoundingBox> bboxes;
        int32_t bboxesSize = 0;
        std::vector<VPIHomographyTransform2D> preds;
        int32_t predsSize = 0;

        // Stores the cumulative bbox count at each frame where new
        // bboxes appear (the count only changes at those frames).
        std::map<int, size_t> bboxes_size_at_frame; // frame -> bbox count

        // PVA requires that array capacity is 128.
        bboxes.reserve(128);
        preds.reserve(128);

        // Read bounding boxes
        {
            std::ifstream in(strInputBBoxes);
            if (!in)
            {
                throw std::runtime_error("Can't open '" + strInputBBoxes + "'");
            }

            // For each bounding box,
            int frame, x, y, w, h;
            while (in >> frame >> x >> y >> w >> h)
            {
                if (bboxes.size() == 64)
                {
                    throw std::runtime_error("Too many bounding boxes");
                }

                // Convert the axis-aligned bounding box into our tracking
                // structure.

                VPIKLTTrackedBoundingBox track = {};
                // scale
                track.bbox.xform.mat3[0][0] = 1;
                track.bbox.xform.mat3[1][1] = 1;
                // position
                track.bbox.xform.mat3[0][2] = x;
                track.bbox.xform.mat3[1][2] = y;
                // must be 1
                track.bbox.xform.mat3[2][2] = 1;

                track.bbox.width     = w;
                track.bbox.height    = h;
                track.trackingStatus = 0; // valid tracking
                track.templateStatus = 1; // must update

                bboxes.push_back(track);

                // Identity predicted transform.
                VPIHomographyTransform2D xform = {};
                xform.mat3[0][0]               = 1;
                xform.mat3[1][1]               = 1;
                xform.mat3[2][2]               = 1;
                preds.push_back(xform);

                bboxes_size_at_frame[frame] = bboxes.size();
            }

            if (!in && !in.eof())
            {
                throw std::runtime_error("Can't parse bounding boxes, stopped at bbox #" +
                                         std::to_string(bboxes.size()));
            }

            // Wrap the input arrays into VPIArray's
            VPIArrayData data           = {};
            data.bufferType             = VPI_ARRAY_BUFFER_HOST_AOS;
            data.buffer.aos.type        = VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX;
            data.buffer.aos.capacity    = bboxes.capacity();
            data.buffer.aos.sizePointer = &bboxesSize;
            data.buffer.aos.data        = &bboxes[0];
            CHECK_STATUS(vpiArrayCreateWrapper(&data, 0, &inputBoxList));

            data.buffer.aos.type        = VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D;
            data.buffer.aos.sizePointer = &predsSize;
            data.buffer.aos.data        = &preds[0];
            CHECK_STATUS(vpiArrayCreateWrapper(&data, 0, &inputPredList));
        }

        // Now parse the backend
        VPIBackend backend;

        if (strBackend == "cpu")
        {
            backend = VPI_BACKEND_CPU;
        }
        else if (strBackend == "cuda")
        {
            backend = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "pva")
        {
            backend = VPI_BACKEND_PVA;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend +
                                     "' not recognized, it must be either cpu, cuda or pva.");
        }

        // Create the stream for the given backend.
        CHECK_STATUS(vpiStreamCreate(backend, &stream));

        // Helper function to fetch a frame from input
        int nextFrame   = 0;
        auto fetchFrame = [&invid, &nextFrame, backend]() {
            cv::Mat frame;
            if (!invid.read(frame))
            {
                return cv::Mat();
            }

            // We only support grayscale inputs
            if (frame.channels() == 3)
            {
                cvtColor(frame, frame, cv::COLOR_BGR2GRAY);
            }

            if (backend == VPI_BACKEND_PVA)
            {
                // PVA only supports 16-bit unsigned inputs,
                // where each element is in 0-255 range, so
                // no rescaling needed.
                cv::Mat aux;
                frame.convertTo(aux, CV_16U);
                frame = aux;
            }
            else
            {
                assert(frame.type() == CV_8U);
            }

            ++nextFrame;
            return frame;
        };

        // Fetch the first frame and wrap it into a VPIImage.
        // Templates will be based on this frame.
        cvTemplate = fetchFrame();
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvTemplate, 0, &imgTemplate));

        // Create the reference image wrapper. For now, wrap cvTemplate just to
        // create the wrapper; later we'll set it to wrap the actual reference image.
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvTemplate, 0, &imgReference));

        VPIImageFormat imgFormat;
        CHECK_STATUS(vpiImageGetFormat(imgTemplate, &imgFormat));

        // Using this first frame's characteristics, create a KLT Bounding Box Tracker payload.
        // We're limiting the template dimensions to 64x64.
        CHECK_STATUS(vpiCreateKLTFeatureTracker(backend, cvTemplate.cols, cvTemplate.rows, imgFormat, NULL, &klt));

        // Parameters we'll use. No need to change them on the fly, so just define them here.
        VPIKLTFeatureTrackerParams params;
        CHECK_STATUS(vpiInitKLTFeatureTrackerParams(&params));

        // Output array with estimated bbox for current frame.
        CHECK_STATUS(vpiArrayCreate(128, VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX, 0, &outputBoxList));

        // Output array with estimated transform of input bbox to match output bbox.
        CHECK_STATUS(vpiArrayCreate(128, VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D, 0, &outputEstimList));

        size_t curNumBoxes = 0;

        do
        {
            size_t curFrame = nextFrame - 1;

            // Get the number of bounding boxes in current frame.
            // If no box has appeared yet, upper_bound returns begin(),
            // which must not be decremented; the count is then 0.
            auto tmp          = bboxes_size_at_frame.upper_bound(curFrame);
            size_t bbox_count = 0;
            if (tmp != bboxes_size_at_frame.begin())
            {
                --tmp;
                bbox_count = tmp->second;
            }

            assert(bbox_count >= curNumBoxes && "input bounding boxes must be sorted by frame");

            // Does current frame have new bounding boxes?
            if (curNumBoxes != bbox_count)
            {
                // Update the input array sizes. The new boxes are already there,
                // since we populated these arrays with all input bounding boxes.
                CHECK_STATUS(vpiArraySetSize(inputBoxList, bbox_count));
                CHECK_STATUS(vpiArraySetSize(inputPredList, bbox_count));

                for (size_t i = 0; i < bbox_count - curNumBoxes; ++i)
                {
                    std::cout << curFrame << " -> new " << curNumBoxes + i << std::endl;
                }
                assert(bbox_count <= bboxes.capacity());
                assert(bbox_count <= preds.capacity());

                curNumBoxes = bbox_count;
            }

            // Save this frame to disk.
            outVideo << WriteKLTBoxes(imgTemplate, inputBoxList, inputPredList);

            // Fetch a new frame
            cvReference = fetchFrame();

            // Video ended?
            if (cvReference.data == NULL)
            {
                // Just end gracefully.
                break;
            }

            // Make the reference wrapper point to the reference frame
            CHECK_STATUS(vpiImageSetWrappedOpenCVMat(imgReference, cvReference));

            // Estimate the bounding boxes in the current frame (reference) given
            // their position in the previous frame (template).
            CHECK_STATUS(vpiSubmitKLTFeatureTracker(stream, backend, klt, imgTemplate, inputBoxList, inputPredList,
                                                    imgReference, outputBoxList, outputEstimList, &params));

            // Wait for processing to finish.
            CHECK_STATUS(vpiStreamSync(stream));

            // Now the input and output arrays are locked to properly set up the input for the next iteration.
            // Input arrays will be updated based on tracking information produced in this iteration.
            VPIArrayData updatedBBoxData;
            CHECK_STATUS(vpiArrayLockData(outputBoxList, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &updatedBBoxData));

            VPIArrayData estimData;
            CHECK_STATUS(vpiArrayLockData(outputEstimList, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &estimData));

            // Since these arrays are actually wrappers of external data, we don't need to retrieve
            // the VPI array contents: the wrapped buffers will be updated directly. The arrays
            // must still be locked for reading/writing anyway.
            CHECK_STATUS(vpiArrayLock(inputBoxList, VPI_LOCK_READ_WRITE));
            CHECK_STATUS(vpiArrayLock(inputPredList, VPI_LOCK_READ_WRITE));

            auto *updated_bbox = reinterpret_cast<VPIKLTTrackedBoundingBox *>(updatedBBoxData.buffer.aos.data);
            auto *estim        = reinterpret_cast<VPIHomographyTransform2D *>(estimData.buffer.aos.data);

            // For each bounding box,
            for (size_t b = 0; b < curNumBoxes; ++b)
            {
                // Did tracking fail?
                if (updated_bbox[b].trackingStatus)
                {
                    // Do we have to update the input bbox's tracking status too?
                    if (bboxes[b].trackingStatus == 0)
                    {
                        std::cout << curFrame << " -> dropped " << b << std::endl;
                        bboxes[b].trackingStatus = 1;
                    }

                    continue;
                }

                // Must the template of this bounding box be updated?
                if (updated_bbox[b].templateStatus)
                {
                    std::cout << curFrame << " -> update " << b << std::endl;

                    // There are usually two approaches here:
                    // 1. Redefine the bounding box using a feature detector such as the
                    //    Harris corners detector, or
                    // 2. Use updated_bbox[b], which is still valid, although tracking
                    //    errors might accumulate over time.
                    //
                    // We'll go for the second option: less robust, but simple enough
                    // to implement.
                    bboxes[b] = updated_bbox[b];

                    // Signal the input that the template for this bounding box must be updated.
                    bboxes[b].templateStatus = 1;

                    // Predicted transform is now identity as we reset the tracking.
                    preds[b]            = VPIHomographyTransform2D{};
                    preds[b].mat3[0][0] = 1;
                    preds[b].mat3[1][1] = 1;
                    preds[b].mat3[2][2] = 1;
                }
                else
                {
                    // Inform the input that the template for this bounding box doesn't need to be updated.
                    bboxes[b].templateStatus = 0;

                    // We just update the input transform with the estimated one.
                    preds[b] = estim[b];
                }
            }

            // We're finished working with the input and output arrays.
            CHECK_STATUS(vpiArrayUnlock(inputBoxList));
            CHECK_STATUS(vpiArrayUnlock(inputPredList));

            CHECK_STATUS(vpiArrayUnlock(outputBoxList));
            CHECK_STATUS(vpiArrayUnlock(outputEstimList));

            // The current reference frame becomes the template for the next iteration.
            std::swap(imgTemplate, imgReference);
            std::swap(cvTemplate, cvReference);
        } while (true);
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    vpiStreamDestroy(stream);
    vpiPayloadDestroy(klt);
    vpiArrayDestroy(inputBoxList);
    vpiArrayDestroy(inputPredList);
    vpiArrayDestroy(outputBoxList);
    vpiArrayDestroy(outputEstimList);
    vpiImageDestroy(imgReference);
    vpiImageDestroy(imgTemplate);

    return retval;
}
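The per-box bookkeeping at the end of the C++ main loop decides what the next iteration's inputs look like. The Python sketch below restates that decision table with plain data types, so it can be read or tested without VPI. The `Track` dataclass and the `next_inputs` function are hypothetical stand-ins for `VPIKLTTrackedBoundingBox`, `VPIHomographyTransform2D` and the loop body, and 3x3 matrices are represented as NumPy arrays. Like the sample, it reuses the updated box when a template refresh is requested (option 2 in the code comments) rather than re-detecting features.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Track:
    """Hypothetical stand-in for VPIKLTTrackedBoundingBox."""
    xform: np.ndarray = field(default_factory=lambda: np.eye(3))
    width: float = 0.0
    height: float = 0.0
    tracking_status: int = 0  # 0 = tracked, nonzero = lost
    template_status: int = 1  # nonzero = template must be updated


def next_inputs(inputs, preds, updated, estimates):
    """Update input boxes and predictions in place from one tracker iteration."""
    for b, out in enumerate(updated):
        if out.tracking_status:
            # Tracking failed: mark the input box as lost and stop updating it
            inputs[b].tracking_status = 1
            continue
        if out.template_status:
            # Re-anchor on the updated box, request a new template,
            # and restart from an identity prediction
            inputs[b] = Track(out.xform.copy(), out.width, out.height, 0, 1)
            preds[b] = np.eye(3)
        else:
            # Keep the current template; carry the estimated transform forward
            inputs[b].template_status = 0
            preds[b] = estimates[b].copy()
```

Resetting the prediction to identity after a template update matters because the new template already sits at the updated position. Carrying the old estimate forward would apply the motion twice.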