PF-Net 3D Point-Cloud Completion in Practice

The earlier portfolio projects all work on 2D images, text, or multimodal data. This one switches to a different kind of data: unordered sets of 3D points. The task is point-cloud completion: given a partial cloud with a missing region, regenerate that region. The model is PF-Net (Point Fractal Network), a GAN-based architecture. The material comes from chapter 14 of the course (3D point clouds), and the code was read end to end.

The task: point-cloud completion

A point cloud is a set of N unordered 3D points — no pixel grid, no fixed neighbor structure. Completion: input a partial cloud (occluded / under-scanned), output the points of the full shape that are missing. PF-Net's trick is to generate only the missing region rather than reconstruct the whole shape — keep the known part, focus on the hole.

Data and self-supervised cropping

The dataset is ShapeNet-Part (shapenet_part_loader.py), 16 categories. Each shape is resampled to npoints=2048 and normalized to the unit sphere: subtract the centroid, divide by the max radius.
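
As a minimal sketch of the normalization step described above (not the loader's actual code), the following centers a cloud on its centroid and rescales it so the farthest point sits at radius 1. The function name `normalize_to_unit_sphere` and the four-point toy cloud are illustrative assumptions.

```python
import math

def normalize_to_unit_sphere(points):
    """Center a cloud on its centroid and scale so the farthest point has radius 1."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    max_radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if max_radius == 0.0:  # degenerate cloud: all points coincide
        return centered
    return [[c / max_radius for c in p] for p in centered]

# Toy cloud (hypothetical data): a small cross centered at x = 3.
cloud = [[2.0, 0.0, 0.0], [4.0, 0.0, 0.0], [3.0, 1.0, 0.0], [3.0, -1.0, 0.0]]
unit = normalize_to_unit_sphere(cloud)
```

After normalization every shape lives in the same coordinate range, which is what lets a fixed set of viewpoints be reused across all categories.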

The supervision signal comes from self-supervised cropping (Train_PFNet.py):

Pick one of 5 fixed viewpoints at random
Sort all points by squared distance to that viewpoint
Zero out the nearest crop_point_num=512 points by setting their coordinates to (0, 0, 0); this is the hole, and the input keeps its 2048 rows
Keep the original coordinates of those 512 points as real_center, the ground truth to regenerate

Because the target is cut from the shape itself, crop-and-complete needs no extra labels.
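
The cropping steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code from Train_PFNet.py: the `VIEWPOINTS` values are placeholders (the text only says there are 5 fixed viewpoints), the helper name `crop_nearest` is hypothetical, and a random cloud stands in for a ShapeNet shape.

```python
import random

# Placeholder viewpoints: the write-up says there are 5 fixed ones but does not list them.
VIEWPOINTS = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0),
              (-1.0, 0.0, 0.0), (-1.0, 1.0, 0.0)]

def crop_nearest(points, viewpoint, crop_point_num):
    """Return (partial, real_center): a copy with the nearest points zeroed, and those points."""
    def sq_dist(p):
        return sum((p[i] - viewpoint[i]) ** 2 for i in range(3))

    # Indices sorted from nearest to farthest from the viewpoint.
    order = sorted(range(len(points)), key=lambda i: sq_dist(points[i]))
    hole = order[:crop_point_num]
    real_center = [list(points[i]) for i in hole]   # ground truth to regenerate
    partial = [list(p) for p in points]
    for i in hole:
        partial[i] = [0.0, 0.0, 0.0]                # zeroed, not deleted: row count is unchanged
    return partial, real_center

rng = random.Random(0)
full = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2048)]  # stand-in shape
view = rng.choice(VIEWPOINTS)
partial, real_center = crop_nearest(full, view, crop_point_num=512)
```

Zeroing rather than deleting keeps the input tensor at a fixed size, so batches of different shapes can share one layout.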

Multi-scale FPS encoder (MRE feature extraction)

The partial cloud is fed at 3 resolutions via FPS downsampling: point_scales_list = [2048, 1024, 512]. Each scale runs a PointNet-style Convlayer:

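# Convlayer: one independent copy per scale
Conv2d 1 → 64 → 64 → 128 → 256 → 512 → 1024  # per-point MLP
max-pool over points                            # symmetric aggregation (order-invariant)
# per scale: concat the pooled 128 / 256 / 512 / 1024-d features → 1920-d
# across scales: fuse the three 1920-d vectors → one 1920-d latent

Max-pooling is a symmetric function, so the result does not depend on point order (permutation invariance). The 1920 figure is 128 + 256 + 512 + 1024: it matches the reference PF-Net design, in which the pooled outputs of the last four layers of each Convlayer are concatenated. The three per-scale vectors are then fused into a single 1920-d global latent; plainly concatenating three 1024-d vectors would give 3072 instead.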
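
Farthest point sampling (FPS), which builds the three input resolutions, can be sketched as a greedy loop. This is a plain-Python illustration rather than the course's implementation: the function name, the fixed start index, and the scaled-down sizes [256, 128, 64] (standing in for [2048, 1024, 512]) are assumptions made to keep the example fast.

```python
import random

def farthest_point_sample(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from everything picked so far."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(points)")
    min_sq = [float("inf")] * n   # squared distance from each point to the selected set
    selected = []
    current = start
    for _ in range(k):
        selected.append(current)
        cx, cy, cz = points[current]
        for i, (x, y, z) in enumerate(points):
            d = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
            if d < min_sq[i]:
                min_sq[i] = d
        # Next pick: the point whose nearest selected neighbor is farthest away.
        current = max(range(n), key=min_sq.__getitem__)
    return selected

rng = random.Random(0)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(256)]
scales = [256, 128, 64]   # scaled-down stand-in for [2048, 1024, 512]
pyramid = [[cloud[i] for i in farthest_point_sample(cloud, k)] for k in scales]
```

Compared with uniform random subsampling, FPS spreads the kept points evenly over the shape, so the lower resolutions still cover its full extent.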

Pyramid / fractal decoder (hierarchical prediction)

The decoder _netG is the heart of PF-Net — a coarse-to-fine three-level pyramid, each level a residual refinement:

Level	Output	How
center1	64 pts	an FC head predicts a coarse skeleton straight from the latent
center2	128 pts	adds a residual on top of center1
fine	512 pts	adds another residual on top of center2 → fills the missing region

That is what "Fractal" means: generate the cloud hierarchically and self-similarly, coarse to fine.
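
The coarse-to-fine residual idea can be illustrated with a toy expansion in which each parent point spawns a few children at parent + offset. This is a sketch, not the _netG code: the expansion factors 2 and 4 are inferred from the point counts (64 × 2 = 128, 128 × 4 = 512), and `fake_offsets` is a hypothetical stand-in that draws random displacements where the real network would predict them.

```python
import random

def expand(parents, offsets):
    """Each parent point spawns len(offsets[i]) children at parent + offset."""
    children = []
    for parent, parent_offsets in zip(parents, offsets):
        for off in parent_offsets:
            children.append([parent[i] + off[i] for i in range(3)])
    return children

def fake_offsets(rng, n_parents, factor, scale):
    """Stand-in for a learned residual head: small random displacements."""
    return [[[rng.gauss(0.0, scale) for _ in range(3)] for _ in range(factor)]
            for _ in range(n_parents)]

rng = random.Random(0)
center1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(64)]  # coarse skeleton
center2 = expand(center1, fake_offsets(rng, 64, 2, 0.05))               # 64 * 2 = 128 points
fine = expand(center2, fake_offsets(rng, 128, 4, 0.02))                 # 128 * 4 = 512 points
```

Because every level only predicts small displacements around points that already exist, a mistake at a fine level stays local, while the coarse skeleton fixes the overall shape.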

Discriminator

_netlocalD is a local discriminator:

Conv2d 1 → 64 → 128 → 256
max-pool                      # pool the 64-, 128-, and 256-channel features, concat: 64 + 128 + 256 = 448
FC 448 → 256 → 128 → 16 → 1

It judges whether a 512-point patch is the real cropped region or the generator's fine output, pushing the generator toward a more realistic surface.

Loss: Chamfer Distance + adversarial

Reconstruction quality uses a symmetric Chamfer Distance (PointLoss, ×100). The generator's total loss:

errG = (1 − wtl2)·BCE_adv + wtl2·errG_l2          # wtl2 = 0.95
errG_l2 = CD(fine, gt) + α₁·CD(center1, key1) + α₂·CD(center2, key2)

key1 / key2 are FPS-downsampled versions of the GT missing region, so each of the three levels has its own target. wtl2=0.95 means the reconstruction term dominates and the adversarial term assists; the per-level weights α₁ and α₂ ramp with epoch.
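
A small sketch of the two pieces of the loss follows. It assumes one common form of the symmetric Chamfer Distance (the average of the two directed mean nearest-neighbor squared distances, then ×100); the exact normalization inside PointLoss is an assumption here, and the α values in the usage line are placeholders, not the course's schedule.

```python
def _mean_min_sq_dist(src, dst):
    """Average, over src, of the squared distance to the nearest point in dst."""
    total = 0.0
    for p in src:
        total += min(sum((p[i] - q[i]) ** 2 for i in range(3)) for q in dst)
    return total / len(src)

def chamfer_distance(a, b, scale=100.0):
    """Symmetric Chamfer Distance: average of both directions, then scaled."""
    return scale * 0.5 * (_mean_min_sq_dist(a, b) + _mean_min_sq_dist(b, a))

def generator_loss(adv_loss, cd_fine, cd_center1, cd_center2, alpha1, alpha2, wtl2=0.95):
    """errG = (1 - wtl2) * adversarial + wtl2 * errG_l2."""
    err_l2 = cd_fine + alpha1 * cd_center1 + alpha2 * cd_center2
    return (1.0 - wtl2) * adv_loss + wtl2 * err_l2

# Tiny hand-checkable example: one matching point, one point off by 1 along z.
pred = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
gt = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
cd = chamfer_distance(pred, gt)

# Placeholder alpha values, chosen only to exercise the formula.
total = generator_loss(adv_loss=1.0, cd_fine=cd, cd_center1=0.0, cd_center2=0.0,
                       alpha1=0.1, alpha2=0.2)
```

Chamfer Distance needs no point-to-point correspondence and tolerates clouds of different sizes, which is why it suits unordered sets.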

Training config

batch_size      8
epochs          201
optimizer       Adam(lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-3)
scheduler       StepLR(step_size=40, gamma=0.2)
D_choose        1     # toggles adversarial training on/off
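
To make the scheduler line concrete, the closed form of a step schedule can be written in a few lines. This assumes the scheduler is stepped once per epoch, and `step_lr` is a hypothetical helper rather than a PyTorch call.

```python
def step_lr(base_lr, epoch, step_size=40, gamma=0.2):
    """Learning rate under a step schedule: multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Print the rate at each boundary of the 201-epoch run (epochs 0..200).
for epoch in (0, 39, 40, 80, 120, 160, 200):
    print(f"epoch {epoch:3d}: lr = {step_lr(1e-4, epoch):.2e}")
```

With these settings the rate drops by a factor of 5 every 40 epochs, so the final epoch trains at 1e-4 × 0.2⁵ = 3.2e-8.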

What this signals (what it adds)

Unordered 3D set data: FPS downsampling, Chamfer Distance, permutation-invariant max-pooling — a data paradigm entirely unlike the portfolio's 2D / multimodal work
Encoder-decoder + GAN, beyond 2D: not classification / detection but structured generation
Multi-scale residual design: the coarse→fine pyramid, and why residual-per-level beats regressing 512 points directly
3D point-cloud lineage: connects to the course's PointNet → PointNet++ → PF-Net → RPMNet thread

Demo strategy

What the demo replays

The demo replays one completion: full cloud → crop the 512 points nearest to one of 5 viewpoints, leaving a hole → multi-scale FPS pyramid [2048, 1024, 512] → the pyramid decoder center1(64) → center2(128) → fine(512) fills the missing region level by level → a Chamfer Distance readout plus an adversarial-loss toggle.

The crop strategy, point_scales_list, the 1920-d latent, the three-level residual decoder, the errG / errG_l2 composite loss, and the hyperparameters all come from the course's 3D point-cloud source code, which was read end to end. The course is HLS-streamed video with no subtitles, so the architecture was reconstructed from the code plus chapter titles (high confidence).

Weights are not shipped, the cloud shape, coordinates, and CD values are illustrative, and no model runs in the browser. The demo only illustrates the real crop → encode → hierarchical-completion mechanism.
