Practical Mixed Palletizing Manipulator System: Incorporating Practical Reinforcement Learning and Configuration-Space Motion Planning


1. Objective

Palletizing, commonly formulated as the 3D bin packing problem, is critical for optimizing space utilization and automating packing processes in the logistics industry. Mixed palletizing, where boxes of various sizes arrive in real time, is particularly challenging. Existing methods often overlook practical constraints, such as stability and robustness, that arise in real-world applications.

In this work, we propose a practical mixed palletizing manipulator system designed for structured real-world warehouse environments. The system comprises two main components:

  • PMP-RL (Practical Mixed Palletizing with Reinforcement Learning): Facilitates stable and efficient box placement
  • CMPNet (Configuration-space Motion Planning Network): Achieves robust and collision-free robot movement

2. System Overview

[Figure: System diagram]

The complete manipulator system includes:

  • An automated conveyor belt for incoming boxes
  • A camera-based recognition system using RGB-D sensors
  • The PMP-RL model for optimal placement decisions
  • CMPNet for real-time motion trajectory generation

The vision-based box recognition process extracts box dimensions (width, height, depth) and orientation from RGB-D images through background subtraction, Canny edge detection, and Hough line detection.

3. PMP-RL Framework

The palletizing process is formulated as a Markov Decision Process (MDP):

  • State st: Current pallet configuration and incoming box dimensions (bw, bh, bd)
  • Action at: Placement coordinates (px, py, pz) derived through convex hull checks
  • Reward rt: Computed from utility ratio, center-of-mass alignment, and bottom surface ratio
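
As a rough illustration of the MDP above, the state and action can be held in small data containers. This is only a sketch under stated assumptions: the class names `PlacedBox`, `State`, and `Action`, the `step` helper, and the choice to store each placed box as a corner position plus its dimensions are hypothetical and are not taken from the PMP-RL implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class PlacedBox:
    # Hypothetical layout: placement coordinates plus the box dimensions.
    px: float
    py: float
    pz: float
    bw: float
    bh: float
    bd: float


@dataclass
class State:
    # Current pallet configuration: every box placed so far.
    placed: List[PlacedBox] = field(default_factory=list)
    # Dimensions (bw, bh, bd) of the incoming box.
    incoming: Tuple[float, float, float] = (0.0, 0.0, 0.0)


@dataclass(frozen=True)
class Action:
    # Placement coordinates (px, py, pz) chosen for the incoming box.
    px: float
    py: float
    pz: float


def step(state: State, action: Action,
         next_box: Tuple[float, float, float]) -> State:
    """Place the incoming box at the chosen coordinates and return the
    next state. The input state is left unmodified."""
    bw, bh, bd = state.incoming
    new_box = PlacedBox(action.px, action.py, action.pz, bw, bh, bd)
    return State(placed=state.placed + [new_box], incoming=next_box)
```

Returning a new `State` instead of mutating the old one keeps transitions easy to replay, which tends to be convenient when collecting rollouts.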

We adopt a tree-based representation using a relational graph neural network that captures spatial relationships between placed boxes, enabling efficient exploration of valid placements while avoiding prohibitively large action spaces.

4. Reward Engineering

To ensure stable stacking during online deployment, we designed three reward components:

rtotal = ω1 · rUR + ω2 · rCoM + ω3 · rBSR
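
The weighted sum can be written out directly. In this minimal sketch the weight values are placeholders chosen for illustration only; the weights used in the paper are not stated here.

```python
def total_reward(r_ur, r_com, r_bsr, weights=(0.5, 0.3, 0.2)):
    """Combine the three reward components with weights (w1, w2, w3).

    The default weights are hypothetical placeholders, not tuned values.
    """
    w1, w2, w3 = weights
    return w1 * r_ur + w2 * r_com + w3 * r_bsr
```

For example, with these placeholder weights, components of 0.8, 0.5, and 1.0 combine to 0.5 · 0.8 + 0.3 · 0.5 + 0.2 · 1.0 = 0.75.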

Utility Ratio (rUR)

Maximizes the number of boxes stacked by measuring the ratio of occupied space to total pallet volume. This encourages the RL agent to stack as many boxes as possible.
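
A minimal sketch of this ratio, assuming each box is given only by its three dimensions and that the pallet volume is the full loading volume (footprint times maximum stacking height). The function name and argument layout are hypothetical.

```python
def utility_ratio(boxes, pallet_dims):
    """Return occupied volume divided by total pallet volume.

    boxes       -- iterable of (w, h, d) box dimensions
    pallet_dims -- (W, H, D) dimensions of the pallet loading volume
    """
    pallet_volume = pallet_dims[0] * pallet_dims[1] * pallet_dims[2]
    if pallet_volume <= 0:
        raise ValueError("pallet volume must be positive")
    occupied = sum(w * h * d for w, h, d in boxes)
    return occupied / pallet_volume
```

Two unit cubes in a 2 × 2 × 2 loading volume occupy 2 of 8 volume units, giving a ratio of 0.25.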

Center of Mass (rCoM)

Ensures stable stacking by keeping the overall center of mass close to the bottom-center of the pallet. The reward minimizes the 3D Euclidean distance between the CoM and pallet center, with extra weight on height.
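
One way to realize such a reward is a weighted Euclidean distance in which the vertical term counts more. The sketch below rests on several assumptions that are not from the paper: boxes have uniform density (mass proportional to volume), each box is given as `(x, y, z, sx, sy, sz)` with `(x, y, z)` its minimum corner and z pointing up, and the `height_weight` factor of 2.0 is an arbitrary placeholder.

```python
import math


def center_of_mass(boxes):
    """Volume-weighted center of mass of axis-aligned boxes.

    Each box is (x, y, z, sx, sy, sz): minimum corner plus sizes.
    Assumes uniform density, so mass is proportional to volume.
    """
    total = 0.0
    cx = cy = cz = 0.0
    for x, y, z, sx, sy, sz in boxes:
        m = sx * sy * sz
        total += m
        cx += m * (x + sx / 2)
        cy += m * (y + sy / 2)
        cz += m * (z + sz / 2)
    if total <= 0:
        raise ValueError("at least one box with positive volume is required")
    return cx / total, cy / total, cz / total


def com_reward(boxes, pallet_xy, height_weight=2.0):
    """Negative weighted distance from the CoM to the pallet bottom-center.

    pallet_xy     -- (width, depth) of the pallet footprint
    height_weight -- hypothetical factor (> 1) that penalizes height more
    """
    cx, cy, cz = center_of_mass(boxes)
    tx, ty = pallet_xy[0] / 2, pallet_xy[1] / 2
    dist = math.sqrt((cx - tx) ** 2 + (cy - ty) ** 2 + height_weight * cz ** 2)
    return -dist
```

Because the reward is the negated distance, a lower and more centered stack scores closer to zero, so maximizing it pulls the center of mass toward the bottom-center of the pallet.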

Bottom Surface Ratio (rBSR)

Encourages placements that maximize the contact surface area between newly placed boxes and underlying boxes, reducing the probability of boxes falling and promoting space-efficient configurations.
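
For axis-aligned boxes, the supported fraction of the bottom face reduces to summing rectangle overlaps. This sketch assumes boxes are given as `(x, y, z, sx, sy, sz)` (minimum corner plus sizes, z up), that a box supports the new one only when its top face is at the new box's bottom height within a tolerance, and that a box resting on the pallet floor counts as fully supported. These conventions are illustrative, not the paper's definition.

```python
def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def bottom_surface_ratio(new_box, placed, tol=1e-6):
    """Fraction of the new box's bottom face resting on boxes below it.

    new_box, placed[i] -- (x, y, z, sx, sy, sz) axis-aligned boxes
    """
    x, y, z, sx, sy, _ = new_box
    if z <= tol:
        return 1.0  # resting on the pallet floor (overhang ignored here)
    supported = 0.0
    for px, py, pz, psx, psy, psz in placed:
        # Only boxes whose top face touches the new box's bottom face count.
        if abs((pz + psz) - z) <= tol:
            supported += (overlap_1d(x, x + sx, px, px + psx)
                          * overlap_1d(y, y + sy, py, py + psy))
    return supported / (sx * sy)
```

A unit cube shifted half its width off a unit cube beneath it is supported over half of its bottom face, so the ratio is 0.5.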

5. Practical Considerations

To ensure applicability to real-world settings:

Convex Hull

Implements a stability check where a box is considered stable only if its center of mass lies within the convex hull of underlying boxes (stability threshold > 80%). Invalid placements trigger selection of the next best action.
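
The geometric core of such a check can be sketched as follows. The assumptions are loud ones: boxes are axis-aligned and given as `(x, y, z, sx, sy, sz)`, the box's center of mass projects to the center of its footprint, and the support region is the convex hull of the contact rectangles with boxes directly beneath. The 80% threshold mentioned above is not modeled here, and all function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(point, hull):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


def support_points(new_box, placed, tol=1e-6):
    """Corners of the contact rectangles between new_box and boxes below."""
    x, y, z, sx, sy, _ = new_box
    pts = []
    for px, py, pz, psx, psy, psz in placed:
        if abs(pz + psz - z) > tol:
            continue  # top face is not at the new box's bottom height
        x0, x1 = max(x, px), min(x + sx, px + psx)
        y0, y1 = max(y, py), min(y + sy, py + psy)
        if x1 > x0 and y1 > y0:
            pts += [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return pts


def is_stable(new_box, placed, tol=1e-6):
    """Check that the projected box center lies within the support hull."""
    x, y, z, sx, sy, _ = new_box
    if z <= tol:
        return True  # resting directly on the pallet
    hull = convex_hull(support_points(new_box, placed, tol))
    return inside_hull((x + sx / 2, y + sy / 2), hull)
```

In this sketch a box bridging two separated supports is accepted as long as its center projects between them, while a box whose center hangs past the edge of a single support is rejected.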

Box Margin

Adds 5% extra space between boxes (proportional to width and height) to account for placement tolerance and prevent collisions due to simulation-to-reality discrepancies.
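
A minimal sketch of the margin, under one possible reading of the description: the planner treats each box as 5% larger in width and height and leaves depth unchanged. Which physical axes "width" and "height" map to is an assumption here, as is the function name.

```python
def with_margin(dims, margin=0.05):
    """Return planning dimensions inflated by a proportional margin.

    dims   -- (bw, bh, bd) measured box dimensions
    margin -- fractional margin applied to width and height only
    """
    bw, bh, bd = dims
    return (bw * (1 + margin), bh * (1 + margin), bd)
```

The placement policy would then reason about the inflated dimensions, while the physical box keeps its measured size, leaving a gap that absorbs placement error.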

Overpallet

Allows limited overhang (pallet extended by up to 10%, individual boxes up to 75% extension) to offset utility ratio reduction from margins while maintaining practical stability.
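
The pallet-level part of this rule can be sketched as a bounds check against an enlarged footprint. The interpretation below is an assumption: the 10% extension is split evenly between opposite sides, so each edge moves outward by 5% of the pallet dimension. The per-box limit is not modeled, and the function name and argument layout are hypothetical.

```python
def within_extended_pallet(x, y, sx, sy, pallet_w, pallet_d, extension=0.10):
    """Check that a box footprint fits inside the extended pallet area.

    (x, y)   -- minimum corner of the box footprint, pallet origin at (0, 0)
    (sx, sy) -- footprint size of the box
    extension -- total fractional enlargement of each pallet dimension,
                 assumed to be split evenly between opposite sides
    """
    ex = pallet_w * extension / 2
    ey = pallet_d * extension / 2
    return (x >= -ex and y >= -ey
            and x + sx <= pallet_w + ex
            and y + sy <= pallet_d + ey)
```

Setting `extension=0.0` recovers a strict in-bounds check, which makes it easy to compare utility ratios with and without overhang.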

6. Results

[Figure: Palletizing results]

Experiments in both simulation and real-world environments demonstrate that the manipulator system handles complex palletizing tasks efficiently and stably. The PMP-RL model achieves high pallet volume utilization, while the practical reward functions and convex hull checks keep box configurations stable.

7. Supplementary Materials

IEEE TASE Paper

The full paper is published in IEEE Transactions on Automation Science and Engineering (November 2025).