I’ll attempt to summarize my professional background, experience, and what I do for a living. For an even denser format, please refer to my resume.

Education

  • I studied mechanical engineering for my bachelor’s degree at National Taiwan University from 2014 to 2018.
    • I took an interest in control systems because controlling robots sounds cool, the math is beautiful, and I wasn’t smart enough to understand fluid dynamics 😵‍💫
  • After that, I pursued my master’s in robotics at the University of Michigan, Ann Arbor, and began pivoting toward programming and machine learning.

Work

My first and only job is at an IC design house in Taiwan. We design neural network accelerators for vision and language tasks.

My title is system software engineer in the algorithm department. My projects can be summarized as neural network compression, with the goal of reducing memory footprint and improving throughput simultaneously. My responsibilities therefore overlap between:

  • Research scientist: researching and verifying state-of-the-art compression algorithms, including pruning, quantization, and knowledge distillation, both post-training and training-aware. This includes
    • Reviewing the literature
    • Prototyping and implementing novel algorithms
    • Designing and executing experiments
    • Providing recipes
  • Software engineer: integrating the algorithms into the company’s toolchain. This includes
    • Library design and development
    • CI/CD
    • Documentation

My work serves as the first and foremost stage of model deployment. These tools operate at the Python/machine-learning-framework level, e.g. PyTorch or TensorFlow. The process can be described as

  • Input: model architecture and floating-point model weights in its training framework’s format.
  • Target: some compression ratio.
  • Variables:
    • Architecture: modify kernels and operators by replacing/fusing them.
    • Weights:
      • Pruning: set some portion of weights to 0.
      • Quantization: represent weight values in lower-bit formats, e.g. int4, fp4, or newer formats like mxfp4.
  • Metric: minimize accuracy loss and preserve model behavior. The metric depends on the model type and the target application.
    • Image classification tasks: accuracy, precision, recall, F1.
    • Language modeling: zero-shot benchmark scores.
    • Model behavior: KL divergence.
  • Output: an optimized architecture and compressed model weights, still compatible with the original training framework.
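The two weight transforms above can be sketched in a few lines of NumPy. This is a toy illustration, not the company toolchain: the function names, the magnitude-based pruning criterion, and the symmetric per-tensor int4 scheme are my own simplifications.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction are 0."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int4(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization into the int4 range [-8, 7]."""
    scale = float(np.max(np.abs(weights))) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map quantized integers back to floating point for accuracy checks."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for one layer's weights

w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int4(w_pruned)
w_hat = dequantize(q, scale)

print(f"sparsity: {np.mean(w_pruned == 0):.2f}")
print(f"reconstruction MSE: {np.mean((w_pruned - w_hat) ** 2):.6f}")
```

In practice the compression ratio target decides the sparsity and bit width, and the reconstruction error here is only a proxy; the real acceptance criteria are the task metrics and behavior checks listed above.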

After this initial compression stage, the optimized model will be handed over to the deployment toolchain, where it’ll be parsed and compiled into lower-level operations.