Palletizing, also known as the 3D bin packing problem, is critical for optimizing space utilization and automating packing processes in the logistics industry. Handling mixed palletizing scenarios—where boxes of various sizes arrive in real-time—is particularly challenging. Existing methods often overlook practical constraints such as stability and robustness encountered in real-world applications.
In this work, we propose a practical mixed palletizing manipulator system designed for structured real-world warehouse environments. The system comprises two main components:
The complete manipulator system includes:
The vision-based box recognition process extracts box dimensions (width, height, depth) and orientation from RGB-D images through background subtraction, Canny edge detection, and Hough line detection.
The palletizing process is formulated as a Markov Decision Process (MDP):
We adopt a tree-based representation using a relational graph neural network that captures spatial relationships between placed boxes, enabling efficient exploration of valid placements while avoiding prohibitively large action spaces.
To ensure stable stacking during online deployment, we designed three reward components:
$r_{\text{total}} = \omega_1 \cdot r_{\text{UR}} + \omega_2 \cdot r_{\text{CoM}} + \omega_3 \cdot r_{\text{BSR}}$
$r_{\text{UR}}$ (utility ratio): Rewards a high ratio of occupied volume to total pallet volume, which encourages the RL agent to stack as many boxes as possible.
$r_{\text{CoM}}$ (center of mass): Promotes stable stacking by keeping the overall center of mass (CoM) close to the bottom center of the pallet. The reward penalizes the 3D Euclidean distance between the CoM and the pallet center, with extra weight on the height component.
$r_{\text{BSR}}$: Encourages placements that maximize the contact area between a newly placed box and the boxes beneath it, which reduces the probability of boxes falling and promotes space-efficient configurations.
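The three reward terms and their weighted sum can be illustrated with a minimal sketch. The following is not the paper's implementation: the `Box` class, the function names, the normalization of distances by pallet size, the `height_weight` value, and the choice to treat floor contact as full support are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (hypothetical type, not from the paper)."""
    x: float  # minimum corner of the box on the pallet
    y: float
    z: float
    w: float  # extent along x
    d: float  # extent along y
    h: float  # extent along z
    mass: float = 1.0


def utility_reward(boxes, pallet):
    """Occupied volume divided by total pallet volume."""
    pw, pd, ph = pallet
    return sum(b.w * b.d * b.h for b in boxes) / (pw * pd * ph)


def com_reward(boxes, pallet, height_weight=2.0):
    """Negative weighted distance from the stack's CoM to the pallet's
    bottom center. Distances are normalized by pallet size and the height
    term gets extra weight (both choices are assumptions)."""
    pw, pd, ph = pallet
    total = sum(b.mass for b in boxes)
    cx = sum(b.mass * (b.x + b.w / 2) for b in boxes) / total
    cy = sum(b.mass * (b.y + b.d / 2) for b in boxes) / total
    cz = sum(b.mass * (b.z + b.h / 2) for b in boxes) / total
    dx = (cx - pw / 2) / pw
    dy = (cy - pd / 2) / pd
    dz = cz / ph
    return -sqrt(dx * dx + dy * dy + height_weight * dz * dz)


def _overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def contact_reward(new, placed, eps=1e-6):
    """Fraction of the new box's bottom face resting on boxes below it.
    A box on the pallet floor is treated as fully supported (assumption)."""
    if new.z <= eps:
        return 1.0
    area = 0.0
    for b in placed:
        if abs(b.z + b.h - new.z) <= eps:  # top face touches new box's bottom
            area += (_overlap(new.x, new.x + new.w, b.x, b.x + b.w)
                     * _overlap(new.y, new.y + new.d, b.y, b.y + b.d))
    return area / (new.w * new.d)


def total_reward(new, placed, pallet, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms after placing `new`."""
    w1, w2, w3 = weights
    stack = list(placed) + [new]
    return (w1 * utility_reward(stack, pallet)
            + w2 * com_reward(stack, pallet)
            + w3 * contact_reward(new, placed))
```

For example, with a 2 x 2 x 2 pallet, a unit box at the origin, and a second unit box shifted by 0.5 along x on top of it, `contact_reward` returns 0.5 because half of the upper box's bottom face is supported.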
To ensure applicability to real-world settings:
Implements a stability check where a box is considered stable only if its center of mass lies within the convex hull of underlying boxes (stability threshold > 80%). Invalid placements trigger selection of the next best action.
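A convex hull support check of this kind can be sketched as follows. This is an illustrative version under stated assumptions, not the authors' code: boxes are hypothetical `(x, y, z, w, d, h)` tuples with uniform density (so the center of mass projects onto the center of the footprint), the support region is built from the contact rectangles with the boxes directly below, and the 80% threshold is not modeled because its exact definition is not given here.

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_convex(point, hull):
    """True if point lies inside or on the boundary of a CCW convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


def is_stable(new, placed, eps=1e-6):
    """Check whether the new box's footprint center lies within the convex
    hull of its contact regions with the boxes directly underneath."""
    x, y, z, w, d, h = new
    if z <= eps:  # resting on the pallet floor
        return True
    support = []
    for bx, by, bz, bw, bd, bh in placed:
        if abs(bz + bh - z) > eps:  # top face does not touch the new box
            continue
        x0, x1 = max(x, bx), min(x + w, bx + bw)
        y0, y1 = max(y, by), min(y + d, by + bd)
        if x1 > x0 and y1 > y0:  # non-empty contact rectangle
            support += [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return inside_convex((x + w / 2, y + d / 2), convex_hull(support))
```

In a planner, a placement for which `is_stable` returns `False` would be rejected and the next-ranked action tried instead, mirroring the fallback described above.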
Adds 5% extra space between boxes (proportional to width and height) to account for placement tolerance and prevent collisions due to simulation-to-reality discrepancies.
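As a small worked example of the margin, one way to apply it is to inflate the box dimensions that the planner sees while the physical box keeps its measured size. The function name and the choice to scale exactly the width and height (leaving depth unchanged) are assumptions based on the description above.

```python
def padded_dimensions(width, height, depth, margin_ratio=0.05):
    """Return planning dimensions with a proportional margin added to the
    width and height; depth is left unchanged (assumed interpretation)."""
    return (width * (1 + margin_ratio), height * (1 + margin_ratio), depth)


# A 0.40 m x 0.30 m x 0.20 m box is planned as roughly 0.42 m x 0.315 m x 0.20 m.
planned = padded_dimensions(0.40, 0.30, 0.20)
```

Planning with the padded size leaves a gap of about 5% of the box's own width and height, which absorbs small placement errors on the real robot.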
Allows limited overhang (pallet extended by up to 10%, individual boxes up to 75% extension) to offset the reduction in utility ratio caused by the margins while maintaining practical stability.
Experiments conducted in both simulation and real-world environments demonstrate that the manipulator system can handle complex palletizing tasks with high efficiency and stability. The PMP-RL model maximizes pallet volume utilization, while the practical reward functions and convex hull checks ensure stable box configurations.
The full paper is published in IEEE Transactions on Automation Science and Engineering (November 2025).