All Work Logs
Day 5
Work Done
- Trained and evaluated policies from full training runs.
- Plotted training curves from the JSON log (sketch below).
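For reference, this is roughly how I plot them. A minimal sketch, assuming the run writes a JSON-lines log (one JSON object per line); the file name `logs.json.txt` and the keys `global_step`/`train_loss` come from my setup and may differ for other configs.

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt

# Parse the JSON-lines log, skipping malformed lines.
records = []
for line in Path("outputs/run/logs.json.txt").read_text().splitlines():
    try:
        records.append(json.loads(line))
    except json.JSONDecodeError:
        continue

# Keep only records that carry a training loss.
steps = [r["global_step"] for r in records if "train_loss" in r]
losses = [r["train_loss"] for r in records if "train_loss" in r]

plt.plot(steps, losses, label="train_loss")
plt.xlabel("global step")
plt.ylabel("loss")
plt.legend()
plt.savefig("train_loss.png")
```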
WIP
- Analyzing training results.
- Writing documentation.
TODO
- Complete documentation and wrap-up.
- Understanding
  - Methodology section: diffusion model training, evaluation.
  - Data preprocessing: how image and low-dimension tasks utilize data.
Day 4
Work Done
- Evaluated policies from full training runs.
- Completed dataset analysis notebook.
WIP
- Training CNN-based policy model on state-based dataset.
TODO
- Training
  - Compare results between CNN and transformer-based policy models.
  - Plot training curves from the JSON log.
  - Analyze training results and compare with the original paper.
- Documentation
  - Methodology section: diffusion model training, evaluation.
  - Data preprocessing: how image and low-dimension tasks utilize data.
Day 3
Work Done
- Dataset analysis: played with both versions of the dataset in `data_analysis.ipynb`.
- `diffusion_policy`
  - Successfully installed and imported as a package after patching it with the necessary `__init__.py` files.
- Understood the success rate and other metrics, and how evaluation works (sketch below).
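A minimal sketch of my understanding of the two metrics. I'm assuming each episode yields a per-step overlap ratio between the T block and the goal region, and that an episode counts as a success once the overlap exceeds a threshold (0.95 is my reading of the PushT code, so treat it as an assumption):

```python
import numpy as np

def summarize(coverages: list[np.ndarray], threshold: float = 0.95):
    """coverages[i]: per-step overlap ratio of the T block with the goal in episode i."""
    max_overlaps = np.array([c.max() for c in coverages])
    avg_max_overlap = max_overlaps.mean()             # "average max. overlap ratio"
    success_rate = (max_overlaps > threshold).mean()  # fraction of successful episodes
    return avg_max_overlap, success_rate
```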
WIP
- Still figuring out what this repository should look like. It feels like I'd only need some basic Python scripts to launch the training and evaluation jobs, so why bother installing the original repo?
TODO
- Look into colab notebooks.
- Training understanding
  - Policy input/output.
  - How is a diffusion model trained?
  - Hyperparameters.
- Evaluation understanding: how does validation work?
- PushT environment: compare it to `lerobot/gym-pusht`.
- Data preprocessing
  - How image and low-dimension tasks utilize data.
Day 2
Work Done
Training with real-stanford/diffusion_policy
Ran two experiment setups using the `real-stanford/diffusion_policy` repository.
- Transformer + state-based observations.
- UNet + image-based observations.
Both configurations were trained on two datasets, making a 2x2 matrix. Most default settings were adopted, except for the number of epochs and the learning rate scheduler: the number of epochs was set to 1000 for all cases to get a quick first taste, and the learning rate scheduler was set to constant so the model still makes enough progress within the shortened run (launch sketch below).
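How I launch the 2x2 grid, as a sketch. The config names and override keys (`training.num_epochs`, `training.lr_scheduler`, `task.dataset.zarr_path`) are assumptions from my local setup of `real-stanford/diffusion_policy`, and the dataset paths are placeholders.

```python
import itertools
import subprocess

# Two model configs x two datasets = the 2x2 matrix.
models = {
    "transformer_lowdim": "train_diffusion_transformer_lowdim_workspace",
    "unet_image": "train_diffusion_unet_image_workspace",
}
datasets = ["data/v1.zarr", "data/v2.zarr"]  # placeholder paths

for (name, config), data in itertools.product(models.items(), datasets):
    print(f"launching {name} on {data}")
    subprocess.run([
        "python", "train.py",
        f"--config-name={config}",
        "training.num_epochs=1000",       # shortened run for a quick taste
        "training.lr_scheduler=constant", # keep the LR flat for the short run
        f"task.dataset.zarr_path={data}",
    ], check=True)
```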
Insert table
Analysis
Interestingly, the models trained on dataset v1 both outperformed the ones trained on dataset v2. Dataset v2 is roughly double the size of v1, so I initially expected scaling to help. Two potential reasons:
- Data quality: maybe v2 is of lower quality?
- A larger dataset requires longer training.
On the other hand, the state-based models outperform the image-based ones. I think this is expected: vision inevitably introduces estimation errors that ground-truth state observations avoid.
WIP
- Still fighting with environment setup 😢.
  - Currently maintaining two environments: one for `diffusion_policy`, the other for my repository.
- Change of plan again.
  - Don't want to fork `diffusion_policy` \(\to\) temporarily use a git submodule instead, for reproduction purposes.
  - Tried to install `diffusion_policy` with `pip` or `conda` but had no luck: the flat layout and the lack of `__init__.py` files prevented me from importing it as a module.
  - The Colab notebook is almost self-contained training + evaluation code. Adopt that, but restructure it into a `tiny_dp` Python submodule.
TODO
- Look into colab notebooks.
- Training understanding
  - Policy input/output.
  - How is a diffusion model trained?
  - Hyperparameters.
- Evaluation understanding:
  - How do validation and test work?
  - Definition of metrics: success rate, reward.
- PushT environment: compare it to `lerobot/gym-pusht`.
- Data preprocessing
  - How image and low-dimension tasks utilize data.
  - Convert to a lerobot-style dataset?
Day 1
Work Done
Test Evaluation Script and Environment
Ran the evaluation command from `lerobot/diffusion_policy`.
python -m lerobot.scripts.eval --policy.path=lerobot/diffusion_pusht --output_dir ./output --env.type=pusht --eval.n_episodes=500 --eval.batch_size=50
And the results:
| | Mine | lerobot/diffusion_pusht | Paper |
| --- | --- | --- | --- |
| Average max. overlap ratio | 0.962 | 0.955 | 0.957 |
| Success rate for 500 episodes (%) | 64.2 | 65.4 | 64.2 |
Dataset Discovery
I opened up a Jupyter notebook playground and fiddled with the data a little bit. Here's the structure of the data from `zarr.open_group(path).tree()`:
/
├── data
│ ├── action (N, 2) float32
│ ├── img (N, 96, 96, 3) float32
│ ├── keypoint (N, 9, 2) float32
│ ├── n_contacts (N, 1) float32
│ └── state (N, 5) float32
└── meta
└── episode_ends (K,) int64
Initially, I compared it to the `lerobot/pusht` dataset released by HuggingFace. However, the entries are so different that it's difficult to match them. I printed the arrays and displayed the images, trying to get a sense of what those values mean. Here's my attempt:
- `episode_ends` marks the ending index of each episode. Use it to split the data into K episodes and label them with episode indices from 0 to K - 1 (see the sketch after this list).
- `state`: I made my guesses by simultaneously looking at the corresponding image.
  - The first two numbers are the position of the tooltip.
  - The 3rd & 4th are the position of the T-shaped object.
  - The 5th looks like the orientation of the object in radians.
- `img` visualizes the current state (potentially rendered from the `keypoint`s and `state`s).
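A small sketch of how I slice the flat arrays into episodes with `episode_ends` (the zarr path is a placeholder for my local copy):

```python
import numpy as np
import zarr

root = zarr.open_group("data/pusht.zarr", mode="r")  # placeholder path
episode_ends = root["meta/episode_ends"][:]  # (K,) cumulative end indices
states = root["data/state"][:]               # (N, 5), flat across all episodes

# Episode i spans [starts[i], episode_ends[i]).
starts = np.concatenate(([0], episode_ends[:-1]))
episodes = [states[s:e] for s, e in zip(starts, episode_ends)]

for i, ep in enumerate(episodes[:3]):
    # State layout (my guess): tooltip xy, block xy, block angle in radians.
    print(f"episode {i}: {len(ep)} steps, first state = {ep[0]}")
```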
At some point I came up with the idea that I should also check the dataset released by the authors of the original paper. BINGO!
real-stanford/diffusion_policy
Naturally, the next step is to explore the diffusion policy paper and code. Their README suggests running the notebook in Colab, but I failed to open it due to some issues. I then turned to the example commands in the README.
I started with the low-dimension setup. Using the exact configuration from the paper, my results (0.944@750 and 0.948@950) seemed to match the authors' checkpoints. However, this comparison is based entirely on the checkpoint names; further investigation is required to determine whether this is a successful reproduction.
WIP
- Reproducing both image and low-dimension experiments.
- Naively training on the custom v1 dataset, swapping in the dataset and changing nothing else.
TODO
Focus on `real-stanford/diffusion_policy`.
- Look into colab notebooks.
- Training understanding
  - Policy input/output.
  - How is a diffusion model trained?
  - Hyperparameters.
- Evaluation understanding:
  - How do validation and test work?
  - Definition of metrics: success rate, reward.
- PushT environment: compare it to `lerobot/gym-pusht`.
- Data preprocessing
  - How image and low-dimension tasks utilize data.
  - Convert to a lerobot-style dataset?
- Set up local `wandb`?
- Code cleanup and commit.
Random Notes
`AttributeError: 'Space' object has no attribute 'add_collision_handler'`
Looks like `pymunk` removed the method in version 7.0. Running `uv add 'pymunk<7'` solves the issue.
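For context, a minimal sketch of the pre-7.0 API that triggers the error; the collision types 0/1 are illustrative, not the env's actual values.

```python
import pymunk

space = pymunk.Space()
# Removed in pymunk 7.0 -> the AttributeError above; works with pymunk<7.
handler = space.add_collision_handler(0, 1)

def _begin(arbiter, space, data) -> bool:
    # e.g. react to a contact between the two collision types.
    return True

handler.begin = _begin
```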
Environment Preparation for diffusion_policy
Setting up the environment wasn't the easiest. This is a two-year-old project, HuggingFace libraries have been moving fast and are not afraid of breaking things, and Python's dependency management via `conda` and `pip` isn't the best. All three factors led to hours of fixing module/attribute-not-found errors and chasing version combinations that simply don't exist. Eventually, I had a fragile but working environment. Time to run some code!
Project start
Work Done
- Watched the YouTube video of the diffusion policy presentation. My takeaways:
  - Human visual reaction time is ~300 ms. If the training data is collected from humans, each action sequence/chunk should be on the same order of magnitude as that.
  - Diffusion policy works well in both joint space and action space. However, working in action space requires a good IK.
- Created repository.
- Draft plan:
  - `huggingface/lerobot`: start from the training and evaluation scripts there. Maybe reproduce `lerobot/diffusion_pusht` if feasible.
  - Per request, use `huggingface/gym-pusht` for the simulation environment.
  - Maybe Material for MkDocs for documentation and report.
    - Or maybe just the paper-style, good old \(\LaTeX\).
  - `uv` for package management? Not sure if this would work, since most of the environment requires `conda`/`mamba` for non-Python dependencies.
  - `marimo` or `jupyter notebook` for interactive sessions? Or use the Jupyter notebook extension for MkDocs.
TODO
- Understand the difference between DDPM and DDIM.
- Fiddle with `lerobot/diffusion_pusht`.
  - Understand the workflow.
  - Get a feel for how resource-hungry the training & evaluation scripts are.
- Discover the custom pusht dataset.
- Perhaps read the paper?