Shanghai Neardi Technology Co., Ltd. sales@neardi.com 86-021-20952021

From Algorithm Logic to Chip-side Deployment: The Evolution of YOLO Object Detection and Rockchip's Practice

2026-01-12

Standing at a crossroads, you need only a fleeting glance for your brain to label everything in view: the red bus pulling into the station, the child running on the sidewalk, the food delivery scooter speeding past. This almost intuitive reaction was once extremely difficult for computers to learn, until YOLO came along. YOLO stands for "You Only Look Once": classification and localization are completed together in a single pass over the image. It let object detection move away from exhaustive region search and brought machines much closer to real-time perception.

Visual "Intuition": The Regression Philosophy of YOLO

Before YOLO, computer vision had long been dominated by two-stage architectures. To detect objects, an algorithm first had to extract thousands of region proposals and then classify them one by one. YOLO's key insight was to discard this cumbersome "propose, then verify" process and recast object detection from a per-region classification task as a single end-to-end regression problem.

When an image is fed into the YOLO network, it is divided directly into an S × S grid. Each grid cell is not only a slice of the image but also corresponds to one position in the network's output tensor.

Integrated Tensor Prediction: Each grid cell directly predicts the coordinates (x, y, w, h) of multiple bounding boxes, along with a confidence score for each box indicating whether an object is present.

Parallel Classification and Localization: While predicting coordinates, each grid cell also outputs a set of class probabilities. Localization and classification are therefore produced in parallel by the same output layer of the network.

Global Feature Coupling: Because the network is end-to-end, it sees the entire image when making each prediction. Compared with traditional algorithms that examine only local region proposals, this "big-picture view" helps YOLO reject background noise more reliably, making it less likely to misclassify, say, an irregularly shaped cloud as a bird.
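
To make the tensor layout concrete, here is a minimal sketch that assumes the original YOLOv1 convention: each of the S × S cells outputs B boxes of (x, y, w, h, confidence) followed by C class probabilities, so the full output holds S × S × (B × 5 + C) values. The function name decode_cell and the toy numbers are illustrative only and do not come from any library.

```python
def decode_cell(cell, row, col, S, B, C):
    """Decode one grid cell's prediction vector into image-relative boxes.

    Assumed layout (YOLOv1-style): B * (x, y, w, h, conf) followed by C class probs.
    x, y are offsets inside the cell in [0, 1]; w, h are relative to the whole image.
    """
    assert len(cell) == B * 5 + C
    class_probs = cell[B * 5:]
    best_class = max(range(C), key=lambda k: class_probs[k])
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        cx = (col + x) / S          # box center, as a fraction of image width
        cy = (row + y) / S          # box center, as a fraction of image height
        score = conf * class_probs[best_class]  # class-specific confidence
        boxes.append((cx, cy, w, h, best_class, score))
    return boxes


# Toy example: a 7 x 7 grid, 2 boxes per cell, 3 classes.
S, B, C = 7, 2, 3
cell = [0.5, 0.5, 0.2, 0.4, 0.9,    # box 0: confident
        0.1, 0.1, 0.1, 0.1, 0.05,   # box 1: almost certainly empty
        0.1, 0.8, 0.1]              # class probabilities shared by the cell
boxes = decode_cell(cell, row=3, col=3, S=S, B=B, C=C)
values_per_image = S * S * (B * 5 + C)
```

Every cell is decoded the same way from one forward pass, which is why no separate proposal stage is needed. Later YOLO versions change the exact encoding, so treat this layout as one historical instance rather than a universal format.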

YOLO in Industrial AI Vision

Many people think of AI as something distant, but YOLO has long been hard at work in places most of us never see.

Smart Construction Sites: In tunnel construction sites filled with dust or with extremely poor lighting, YOLOv9 demonstrates extremely strong feature extraction capabilities.

Behavior Compliance Detection: It can not only identify the presence or absence of safety helmets and reflective vests, but also determine whether they are worn properly (e.g., whether the helmet strap is fastened, or the zipper is fully zipped) through detailed features.

High-concurrency Processing: It supports large-scale real-time detection of over 50 people per frame. Combined with infrared imaging technology, it realizes the leap from "manual monitoring" to "24/7 automatic early warning".

Urban Governance: Urban management and comprehensive governance scenarios impose high requirements on the anti-interference capability of algorithms.

Static Governance: By combining historical image comparison with semantic segmentation, the system can accurately identify newly built illegal structures, garbage accumulation, or businesses occupying the roadway, and can even automatically quantify the area and volume of each violation.

Dynamic Security: Based on pose recognition (OpenPose/YOLO-Pose), the system can promptly capture abnormal behaviors such as a person falling to the ground and link to emergency medical systems. In dense crowds, it uses a density-based clustering algorithm (DBSCAN) to monitor crowd density in real time and reduce the risk of stampedes.
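
As a rough illustration of the crowd-density step, the sketch below runs a minimal pure-Python DBSCAN over the ground positions of detected people. The coordinates, the eps radius, and the min_pts threshold are made-up values for demonstration. A production system would more likely use an optimized library implementation such as scikit-learn's DBSCAN.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 = noise, 0..k-1 = cluster id)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical ground positions (meters) of people detected in one frame.
people = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (0.3, 0.2),
          (10.0, 10.0), (20.0, 5.0)]
labels = dbscan(people, eps=1.0, min_pts=4)
cluster_sizes = Counter(label for label in labels if label != -1)
```

Here the five people standing close together form one cluster and the two isolated people are labeled as noise. An alert could then be raised whenever a cluster size exceeds a site-specific limit.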

Power Inspection: Targets high-risk areas such as underground cable tunnels and high-voltage transmission towers.

Multimodal Fusion: By fusing lidar point clouds with infrared thermal imaging, the system can detect abnormal transformer heating, arrester leakage current, or tower tilt (to an accuracy of 0.1°) without contact, from a distance of 30 meters.

Automatic Defect Judgment: For minor hidden dangers such as cable damage and bracket corrosion, the recognition accuracy exceeds 92%, which greatly improves operation and maintenance efficiency and ensures personnel safety.

Forest Fire Prevention: For large-area, irregularly-shaped smoke and fire detection, YOLO demonstrates ultra-fast response capability.

Accurate Smoke and Fire Identification: Combining image features and thermal radiation data, it can distinguish wildfires, campfires or farmland burning within 2 seconds, with extremely strong anti-interference capability against clouds and vegetation shadows.

Situation Awareness: By integrating GIS geographic information with a random forest model, the system can not only detect fires but also predict how they will spread based on wind speed and terrain, providing visual maps for on-site scheduling.

Ultimate Computing Power Optimization for RK3588/RK3576

Benchmarking on a desktop graphics card is just a warm-up. What truly gets YOLO into the field is porting it to compact SoCs such as Rockchip's RK3588 or RK3576. This is not a simple code migration but an exercise in squeezing everything out of the available computing power, bandwidth, and memory. Achieving millisecond-level object detection on these SoC platforms typically requires the following steps:

"Translate" the Model: The chip’s NPU (Neural Processing Unit) has its own specifications and cannot interpret PyTorch’s native .pt training files. Using RKNN-Toolkit2, the model is converted to ONNX format, then disassembled and reconstructed into the .rknn format that the chip can understand—watching complex operators be rearranged into the computation paths favored by the NPU.

"Slim Down" via Compression: Native FP32 (32-bit floating-point) models have an enormous number of parameters, imposing a heavy burden on the bandwidth and storage of embedded chips. Quantization algorithms compress weights and activations from 32-bit to 8-bit, reducing memory usage by a full 75%. This not only alleviates DDR bandwidth pressure but also effectively lowers computational power consumption.

"Data Transfer" Optimization: Even if the model is fast enough, the NPU will still "sit idle" if the CPU is busy moving video streams in memory. To avoid wasting a single millisecond, DMA-BUF zero-copy technology is used to enable video stream data sharing in video memory among the ISP, GPU, and NPU, completely eliminating CPU copy overhead. Combined with parallel logic for asynchronous inference, the next frame is already queued for processing while the current frame is still undergoing convolution operations. This seamless coordination is what allows real-time video streams to run smoothly on the chip.

Which YOLO Version Is Your "Go-to Choice"?

When deploying on embedded devices, the choice of version is not simply about "chasing the latest"; instead, it requires balancing computing power overhead, operator compatibility, and the accuracy requirements of specific tasks.

Engineering Benchmark: YOLOv5

As the version with the most mature ecosystem, YOLOv5 boasts extremely high stability and deployment coverage in the industrial sector.

  • Technical Features: Adopts an Anchor-based mechanism with a flexible architecture (available in multiple scales from Nano to Extra-large).
  • Deployment Advantages: Rockchip’s RKNN toolchain provides the most comprehensive support for it with excellent operator compatibility, making it the first choice for pursuing rapid project deployment and high stability.
All-round Architecture: YOLOv8

YOLOv8 introduces an Anchor-free mechanism, achieving a unified architecture for detection, segmentation, and pose estimation (Pose).

  • Technical Features: Utilizes the C2f module to enhance feature flow and improves regression accuracy through a Decoupled Head.
  • Deployment Advantages: It strikes an excellent balance between accuracy and speed when handling multi-task parallelism (e.g., simultaneous object detection and human keypoint extraction), making it the mainstream solution on high-performance SoCs such as RK3588 at present.
End-to-End Performance Leap: YOLOv10

YOLOv10 has made breakthrough progress in addressing the post-processing bottleneck in real-time detection.

  • Technical Features: Introduces an NMS-free (Non-Maximum Suppression-free) design. The model is trained with consistent dual assignments, combining one-to-many and one-to-one label matching, so that inference no longer needs NMS and its variable latency.
  • Deployment Advantages: At the edge, NMS often accounts for a significant share of CPU time. YOLOv10 removes this step, so inference latency on SoC hardware becomes more stable and predictable.
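
For context, the post-processing step that NMS-free models avoid can be sketched as the classic greedy NMS below. Its cost depends on how many candidate boxes a frame produces, which is one reason latency varies with scene content. The boxes, scores, and the 0.5 threshold are illustrative values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy case the second box overlaps the first heavily and is suppressed, while the distant third box is kept.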
High-Precision Evolution: YOLOv11 and VajraV1

These represent the latest technological iterations for complex scenarios, focusing on capturing fine-grained features.

  • Technical Features: YOLOv11 introduces the C3k2 block and the lightweight C2PSA attention module, and VajraV1 builds on this with deep customization for edge devices. By widening core convolutions and adopting a low-rank guided design, it significantly improves robustness in complex environments.
  • Deployment Advantages: These models have distinct advantages in dense object detection, occlusion scenarios, and high-precision pose perception (e.g., details of safety helmet wearing, fine-grained action recognition), and currently represent the upper end of detection accuracy achievable by the YOLO family on embedded devices.

The evolution of algorithms has lowered the threshold for perception, while the popularization of chips has expanded the boundaries of intelligence.
