1. Objective

Palletizing, commonly formulated as a 3D bin packing problem, is critical for optimizing space utilization and automating packing processes in the logistics industry. Mixed palletizing, where boxes of various sizes arrive in real time, is particularly challenging. Existing methods often overlook practical constraints encountered in real-world applications, such as stability and robustness.

In this work, we propose a practical mixed palletizing manipulator system designed for structured real-world warehouse environments. The system comprises two main components:

PMP-RL (Practical Mixed Palletizing with Reinforcement Learning): Facilitates stable and efficient box placement
CMPNet (Configuration-space Motion Planning Network): Achieves robust and collision-free robot movement

2. System Overview

The complete manipulator system includes:

An automated conveyor belt for incoming boxes
A camera-based recognition system using RGB-D sensors
The PMP-RL model for optimal placement decisions
CMPNet for real-time motion trajectory generation

The vision-based box recognition process extracts box dimensions (width, height, depth) and orientation from RGB-D images through background subtraction, Canny edge detection, and Hough line detection.

3. PMP-RL Framework

The palletizing process is formulated as a Markov Decision Process (MDP):

State s_t: Current pallet configuration and incoming box dimensions (b_w, b_h, b_d)
Action a_t: Placement coordinates (p_x, p_y, p_z) derived through convex hull checks
Reward r_t: Computed from utility ratio, center-of-mass alignment, and bottom surface ratio
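
As a rough illustration, the MDP elements above can be written as plain data types. This is a minimal sketch, not the authors' implementation: the class names (`Box`, `Placement`, `PalletState`), the `step` helper, and the convention that `(x, y, z)` is the minimum corner of the placed box are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Incoming box dimensions (b_w, b_h, b_d)."""
    w: float
    h: float
    d: float


@dataclass(frozen=True)
class Placement:
    """Action a_t: placement coordinates (p_x, p_y, p_z) for one box."""
    box: Box
    x: float
    y: float
    z: float


@dataclass
class PalletState:
    """State s_t: boxes already on the pallet plus the incoming box."""
    placed: List[Placement] = field(default_factory=list)
    incoming: Optional[Box] = None


def step(
    state: PalletState,
    action: Placement,
    next_box: Optional[Box],
    reward_fn: Callable[[PalletState], float],
) -> Tuple[PalletState, float]:
    """Apply one placement and return (s_{t+1}, r_t) without mutating s_t."""
    next_state = PalletState(placed=state.placed + [action], incoming=next_box)
    return next_state, reward_fn(next_state)


# Usage with a placeholder reward that simply counts placed boxes.
s0 = PalletState(incoming=Box(0.4, 0.3, 0.2))
a0 = Placement(s0.incoming, 0.0, 0.0, 0.0)
s1, r1 = step(s0, a0, Box(0.2, 0.2, 0.2), reward_fn=lambda s: float(len(s.placed)))
```

Returning a new state rather than mutating the old one keeps earlier states usable, which is convenient if candidate placements are explored in a tree.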

We adopt a tree-based representation using a relational graph neural network that captures spatial relationships between placed boxes, enabling efficient exploration of valid placements while avoiding prohibitively large action spaces.

4. Reward Engineering

To ensure stable stacking during online deployment, we designed three reward components:

r_total = ω₁ · r_UR + ω₂ · r_CoM + ω₃ · r_BSR
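
The weighted sum can be sketched in a few lines. The weight values below are placeholders chosen for illustration, since this page does not state the values of ω₁, ω₂, and ω₃; the function name `total_reward` is likewise hypothetical.

```python
def total_reward(r_ur, r_com, r_bsr, weights=(1.0, 1.0, 1.0)):
    """Combine the three reward components as a weighted sum.

    weights = (w1, w2, w3) are placeholder values, not taken from the paper.
    """
    w1, w2, w3 = weights
    return w1 * r_ur + w2 * r_com + w3 * r_bsr


# Example with arbitrary component values and weights.
example = total_reward(0.5, 0.2, 0.8, weights=(1.0, 0.5, 0.25))
```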

Utility Ratio (r_UR)

Measures the ratio of occupied volume to total pallet volume, which encourages the RL agent to stack as many boxes as possible.
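
A minimal sketch of this ratio, assuming each box is given as a `(w, d, h)` tuple and the pallet's usable volume is a cuboid `(W, D, H)` with a maximum stacking height `H`. Both conventions are assumptions for the example.

```python
def utility_ratio(boxes, pallet_dims):
    """Occupied volume divided by total pallet volume.

    boxes: iterable of (w, d, h) tuples for boxes already placed.
    pallet_dims: (W, D, H) of the usable pallet volume (assumed cuboid).
    """
    occupied = sum(w * d * h for w, d, h in boxes)
    pallet_w, pallet_d, pallet_h = pallet_dims
    return occupied / (pallet_w * pallet_d * pallet_h)


# Two 2x2x2 boxes on a 4x4 pallet with stacking height 2 fill half the volume.
ratio = utility_ratio([(2, 2, 2), (2, 2, 2)], (4, 4, 2))
```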

Center of Mass (r_CoM)

Ensures stable stacking by keeping the overall center of mass (CoM) close to the bottom center of the pallet. The reward penalizes the 3D Euclidean distance between the CoM and the pallet's bottom center, with extra weight on the height component.
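
One possible form of this reward is sketched below. The exact formula is not given on this page, so several things here are assumptions: boxes are `(x, y, z, w, d, h)` tuples with `(x, y, z)` the minimum corner, mass is taken as proportional to volume (uniform density), and the "extra weight on height" is modeled by a hypothetical `height_weight` factor on the squared vertical offset.

```python
import math


def center_of_mass(placed):
    """Volume-weighted center of mass of axis-aligned boxes (uniform density assumed)."""
    total = cx = cy = cz = 0.0
    for x, y, z, w, d, h in placed:
        mass = w * d * h
        total += mass
        cx += mass * (x + w / 2)
        cy += mass * (y + d / 2)
        cz += mass * (z + h / 2)
    return cx / total, cy / total, cz / total


def com_reward(placed, pallet_w, pallet_d, height_weight=2.0):
    """Negative weighted distance from the CoM to the pallet's bottom center.

    height_weight is a placeholder; larger values penalize tall stacks more.
    """
    if not placed:
        return 0.0
    cx, cy, cz = center_of_mass(placed)
    dx = cx - pallet_w / 2
    dy = cy - pallet_d / 2
    dz = cz  # the pallet's bottom center sits at z = 0
    return -math.sqrt(dx * dx + dy * dy + height_weight * dz * dz)


# A single centered 2x2x2 box: only the vertical offset (1.0) contributes.
r = com_reward([(0, 0, 0, 2, 2, 2)], pallet_w=2, pallet_d=2, height_weight=4.0)
```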

Bottom Surface Ratio (r_BSR)

Encourages placements that maximize the contact surface area between newly placed boxes and underlying boxes, reducing the probability of boxes falling and promoting space-efficient configurations.
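
The contact ratio can be approximated for axis-aligned boxes as follows. This is a sketch under stated assumptions: boxes are `(x, y, z, w, d, h)` tuples with `(x, y, z)` the minimum corner, a box resting on the pallet floor is treated as fully supported, and supporting boxes do not overlap one another, so their contact areas can simply be summed.

```python
def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def bottom_surface_ratio(new_box, placed, tol=1e-6):
    """Fraction of the new box's bottom face resting on boxes directly below it."""
    x, y, z, w, d, h = new_box
    if z <= tol:
        return 1.0  # assumed: the pallet floor supports the whole face
    supported = 0.0
    for px, py, pz, pw, pd, ph in placed:
        # Only boxes whose top face touches the new box's bottom face count.
        if abs(pz + ph - z) <= tol:
            supported += overlap_1d(x, x + w, px, px + pw) * overlap_1d(y, y + d, py, py + pd)
    return supported / (w * d)


# A 2x2 box shifted halfway off a 2x2 base has half its bottom face supported.
bsr = bottom_surface_ratio((1, 0, 1, 2, 2, 1), [(0, 0, 0, 2, 2, 1)])
```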

5. Practical Considerations

To ensure applicability to real-world settings:

Convex Hull

Implements a stability check where a box is considered stable only if its center of mass lies within the convex hull of underlying boxes (stability threshold > 80%). Invalid placements trigger selection of the next best action.
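
The check can be sketched in 2D by projecting the box's center of mass onto the support plane. This is an illustrative sketch, not the paper's implementation: it assumes axis-aligned boxes given as `(x, y, z, w, d, h)` with uniform density, builds the hull from the corners of the contact regions using Andrew's monotone chain, and does not model the 80% threshold, whose precise definition is not given on this page. The helper names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(point, hull):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


def is_stable(new_box, placed, tol=1e-6):
    """CoM projection must fall within the hull of the contact regions below."""
    x, y, z, w, d, h = new_box
    if z <= tol:
        return True  # resting directly on the pallet
    corners = []
    for px, py, pz, pw, pd, ph in placed:
        if abs(pz + ph - z) > tol:
            continue  # this box is not directly underneath
        x0, x1 = max(x, px), min(x + w, px + pw)
        y0, y1 = max(y, py), min(y + d, py + pd)
        if x1 > x0 and y1 > y0:
            corners += [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return inside_hull((x + w / 2, y + d / 2), convex_hull(corners))


def first_stable(ranked_candidates, placed):
    """Fall back to the next best action until a stable placement is found."""
    for candidate in ranked_candidates:
        if is_stable(candidate, placed):
            return candidate
    return None
```

Here `ranked_candidates` is assumed to be sorted from best to worst by the policy, so returning the first stable entry mirrors the fallback to the next best action.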

Box Margin

Adds 5% extra space between boxes (proportional to width and height) to account for placement tolerance and prevent collisions due to simulation-to-reality discrepancies.
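
One simple way to apply such a margin is to inflate the dimensions used for planning while leaving the physical box unchanged. The sketch below reads the description literally and inflates width and height by 5% in total; whether the margin is applied per side or in total, and the function name `with_margin`, are assumptions.

```python
def with_margin(w, h, d, margin=0.05):
    """Return planning dimensions with extra space on width and height.

    The margin is treated as a total fraction of each dimension (assumed);
    depth is left unchanged, following the description above.
    """
    return (w * (1 + margin), h * (1 + margin), d)


# A 100 x 50 x 80 box is planned as roughly 105 x 52.5 x 80.
planned = with_margin(100.0, 50.0, 80.0)
```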

Overpallet

Allows limited overhang (pallet extended by up to 10%, individual boxes up to 75% extension) to offset utility ratio reduction from margins while maintaining practical stability.
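
The pallet extension can be sketched as a relaxed bounds check on a box footprint. This example assumes the 10% extension is split evenly between opposite sides of the pallet and that footprints are `(x, y, w, d)` tuples with the pallet's corner at the origin. The per-box limit is not modeled because its exact definition is not given on this page.

```python
def extended_bounds(pallet_w, pallet_d, extension=0.10):
    """Pallet footprint enlarged by `extension`, split evenly per side (assumed)."""
    margin_x = pallet_w * extension / 2
    margin_y = pallet_d * extension / 2
    return (-margin_x, -margin_y, pallet_w + margin_x, pallet_d + margin_y)


def within_overpallet(footprint, pallet_w, pallet_d, extension=0.10, tol=1e-9):
    """True if the box footprint (x, y, w, d) fits inside the extended pallet."""
    x, y, w, d = footprint
    x0, y0, x1, y1 = extended_bounds(pallet_w, pallet_d, extension)
    return (
        x >= x0 - tol
        and y >= y0 - tol
        and x + w <= x1 + tol
        and y + d <= y1 + tol
    )


# On a 100 x 100 pallet, a box may overhang each edge by up to 5 units.
fits = within_overpallet((50, 0, 54, 40), 100, 100)
too_far = within_overpallet((50, 0, 56, 40), 100, 100)
```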

6. Results

Experiments conducted in both simulation and real-world environments demonstrate that the manipulator system can handle complex palletizing tasks with high efficiency and stability. The PMP-RL model maximizes pallet volume utilization, while the practical reward functions and convex hull checks keep box configurations stable.

7. Supplementary Materials

IEEE TASE Paper

The full paper is published in IEEE Transactions on Automation Science and Engineering (November 2025).

Practical Mixed Palletizing Manipulator System: Incorporating Practical Reinforcement Learning and Configuration-Space Motion Planning