Preface
This is a series of notes I took while preparing for a DevOps interview. I asked Gemini a bunch of back-and-forth questions based on my understanding and relevant background. I am the one who edited and produced the end result. However, I haven’t fully verified the correctness of all the details, so the content may be misleading.
Python
- Scope: manages Python packages under `site-packages/`.
- Tools:
  - `pip` with `requirements.txt` or `pyproject.toml`.
  - `uv` with `uv.lock`.
- Limitation: only manages Python packages. Python is largely a glue language, and much of the underlying code is written in other languages (e.g., C or CUDA), so managing Python dependencies alone does not guarantee a bit-for-bit identical environment.
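To see the boundary of what `pip` and `uv` manage, you can ask the interpreter where its `site-packages/` directory is and which distributions it can see. A minimal standard-library sketch (`sysconfig`, `importlib.metadata`); the helper names `site_packages_dir` and `installed_distributions` are hypothetical:

```python
import sysconfig
from importlib import metadata


def site_packages_dir() -> str:
    """Return the directory where pip/uv install pure-Python packages."""
    return sysconfig.get_paths()["purelib"]


def installed_distributions() -> dict[str, str]:
    """Map each Python distribution visible to this interpreter to its version."""
    result: dict[str, str] = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # distribution directory without a metadata file
            continue
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:  # skip entries with broken metadata
            result[name] = version
    return result


if __name__ == "__main__":
    print("site-packages:", site_packages_dir())
    for name, version in sorted(installed_distributions().items()):
        print(f"{name}=={version}")
```

Everything this lists is a Python distribution; shared system libraries such as `libGL.so.1` never appear, which is the limitation described in the list.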
Example Failure
- The Scenario: You develop a script that uses `opencv-python`. On your machine (Ubuntu), it runs perfectly. You push it to a minimal Alpine Linux Docker container.
- The Error: `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`.
- Why it happened: `uv` ensured you had the `opencv` Python package. However, OpenCV’s compiled module is linked against a system-level shared library (`libGL`, the OpenGL library) that the wheel does not ship. `uv` doesn’t manage OS libraries, so the code crashes because the new machine is “too lean.”
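This kind of gap can be caught before `import cv2` by asking the dynamic linker whether the library is loadable. A minimal sketch using the standard-library `ctypes` module; `can_load` and `missing_system_libs` are hypothetical helpers, and `libGL.so.1` is simply the library from this example:

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic linker can dlopen() this shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def missing_system_libs(sonames: list[str]) -> list[str]:
    """Return the shared libraries that cannot be loaded on this machine."""
    return [name for name in sonames if not can_load(name)]


if __name__ == "__main__":
    # Hypothetical pre-flight check for an image that will run opencv-python.
    missing = missing_system_libs(["libGL.so.1"])
    if missing:
        print("Install via the OS package manager (uv cannot):", missing)
```

Running a check like this as a container build or start-up step turns a confusing import-time crash into an explicit "install this OS package" message.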
Conda
- Scope: manages both Python packages and layer 2 (C libraries).
  - Can install `cuda-toolkit` or `libgcc` without requiring `sudo`.
- Tools:
  - `conda` / `miniconda`
  - `mamba` / `micromamba`
- Limitation: borrows the core C library (`glibc`) and the dynamic loader from the host OS.
Example Failure
- The Scenario: You use a high-end workstation with the latest Ubuntu 24.04. You create a Conda environment and successfully install the latest `PyTorch` and `CUDA` toolkit. You then copy that environment to an old production server running CentOS 7.
- The Error: `/lib64/libc.so.6: version 'GLIBC_2.28' not found`.
- Why it happened: Conda installs almost everything, but it borrows the core C library (`glibc`) from the host OS. Ubuntu 24.04 ships a very new `glibc` (2.39), while CentOS 7 ships a very old one (2.17). The binaries inside the Conda environment need a newer `glibc` than CentOS 7 provides: they are “too modern” for the old OS “heart.”
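Before shipping an environment, you can compare the host’s `glibc` with the version a binary demands (the `GLIBC_2.28` tag in the error message). A minimal sketch assuming a glibc-based Linux host; `os.confstr("CS_GNU_LIBC_VERSION")` is a standard-library call, while `host_glibc_version`, `parse_symbol_version`, and `glibc_satisfies` are hypothetical helpers:

```python
from __future__ import annotations

import os


def host_glibc_version() -> tuple[int, int] | None:
    """Return the running glibc version as (major, minor), or None if unavailable."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.39"
    except (AttributeError, ValueError, OSError):
        return None  # not a glibc system (e.g. musl, macOS, Windows)
    if not value or not value.startswith("glibc "):
        return None
    major, minor = value.split()[1].split(".")[:2]
    return int(major), int(minor)


def parse_symbol_version(tag: str) -> tuple[int, int]:
    """Turn a symbol-version tag such as 'GLIBC_2.28' into (2, 28)."""
    major, minor = tag.removeprefix("GLIBC_").split(".")[:2]
    return int(major), int(minor)


def glibc_satisfies(required_tag: str, available: tuple[int, int]) -> bool:
    """True if a glibc of version `available` provides the `required_tag` symbols."""
    return available >= parse_symbol_version(required_tag)
```

For the scenario above, `glibc_satisfies("GLIBC_2.28", (2, 17))` returns `False` (CentOS 7), while `glibc_satisfies("GLIBC_2.28", (2, 39))` returns `True` (Ubuntu 24.04).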
Nix
NOTE
I included Nix because I was interested in Nix, NixOS, and the declarative approach. However, my personal experience running NixOS on my laptop has been both interesting and painful. Skill issue, I suppose.
- A declarative, functional tool to define inputs (dependencies) and build immutable, identical environments.
- Scope: above the Linux kernel, alongside the host filesystem.
  - However, it mostly ignores the standard paths and uses its own packages from `/nix/store`.
- Tools:
  - `nix-shell`
  - `nix develop`
  - `flake.nix` with `flake.lock`
- Limitation:
  - Not FHS-compliant: doesn’t use the standard (often hard-coded) paths such as `/lib64` or `/usr/lib`. Precompiled binaries and software that relies on global state won’t run out of the box; everything needs to be explicitly declared.
  - Steep learning curve and ecosystem barrier. Not the industry standard.
  - Still dependent on the host Linux kernel. If the code relies on a brand-new kernel feature, it will fail on a host running an older kernel.
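The `/nix/store` idea is that a package’s path is derived from a hash of all of its inputs, so changing any dependency produces a new, separate path instead of mutating an existing one. The sketch below is a deliberately simplified model, not Nix’s actual algorithm (real Nix hashes a full derivation and encodes the digest in its own base-32 alphabet); `Package`, `input_hash`, and `store_path` are hypothetical names:

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical, simplified model of a package build recipe."""
    name: str
    version: str
    build_script: str
    deps: tuple[Package, ...] = ()

    def input_hash(self) -> str:
        """Hash this package's own inputs together with its dependencies' hashes."""
        h = hashlib.sha256()
        h.update(f"{self.name}\0{self.version}\0{self.build_script}".encode())
        # Sorted so the result does not depend on the order deps are listed in.
        for dep_hash in sorted(dep.input_hash() for dep in self.deps):
            h.update(dep_hash.encode())
        return h.hexdigest()[:32]

    def store_path(self) -> str:
        """Simplified store path: /nix/store/<hash>-<name>-<version>."""
        return f"/nix/store/{self.input_hash()}-{self.name}-{self.version}"


if __name__ == "__main__":
    glibc_239 = Package("glibc", "2.39", "./configure && make")
    glibc_240 = Package("glibc", "2.40", "./configure && make")
    app_a = Package("my-tool", "1.0", "make", deps=(glibc_239,))
    app_b = Package("my-tool", "1.0", "make", deps=(glibc_240,))
    # Same name and version but a different dependency -> different store path,
    # so both builds can coexist without overwriting each other.
    print(app_a.store_path())
    print(app_b.store_path())
```

Because identity comes from inputs rather than from a fixed location, two builds of `my-tool` against different `glibc` versions live side by side, which is also why such software cannot rely on a fixed path like `/lib64`.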
Example Failure
- The Scenario: You use Nix to build a perfect, reproducible environment. You need to use a proprietary binary tool (pre-compiled) to quantize your model. You add it to your Nix shell.
- The Error: `bash: ./binary-tool: No such file or directory` (even though the file is clearly there!).
- Why it happened: The binary is hard-coded to look for a loader at `/lib64/ld-linux-x86-64.so.2`. On NixOS, which is not FHS-compliant, that path doesn’t exist, and the kernel reports the missing loader as if the binary itself were missing. Nix is so “pure” that it breaks “dirty” software that expects a standard Linux layout.
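You can confirm this diagnosis by reading the loader path the binary requests. It is stored in the ELF `PT_INTERP` program header (the same value `patchelf --print-interpreter` reports). A minimal sketch that parses only 64-bit little-endian ELF files (e.g., x86-64); `read_elf_interpreter` and `loader_is_missing` are hypothetical helpers:

```python
from __future__ import annotations

import os
import struct

PT_INTERP = 3  # ELF program-header type that names the dynamic loader


def read_elf_interpreter(path: str) -> str | None:
    """Return the loader path a 64-bit little-endian ELF binary requests, or None."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            raise ValueError(f"{path} is not an ELF file")
        if ident[4] != 2 or ident[5] != 1:  # ELFCLASS64, little-endian
            raise ValueError("this sketch only handles 64-bit little-endian ELF")
        # Remaining 48 bytes of the Elf64 header, after e_ident.
        (_type, _machine, _version, _entry, phoff, _shoff, _flags,
         _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = struct.unpack(
            "<HHIQQQIHHHHHH", f.read(48))
        for i in range(phnum):
            f.seek(phoff + i * phentsize)
            p_type, _pflags, p_offset, _vaddr, _paddr, p_filesz, _memsz, _align = (
                struct.unpack("<IIQQQQQQ", f.read(56)))
            if p_type == PT_INTERP:
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\x00").decode()
    return None  # e.g. statically linked: no loader requested


def loader_is_missing(path: str) -> bool:
    """True if the binary asks for a loader that does not exist on this system."""
    interp = read_elf_interpreter(path)
    return interp is not None and not os.path.exists(interp)
```

On NixOS, `loader_is_missing("./binary-tool")` would be expected to return `True` for a binary requesting `/lib64/ld-linux-x86-64.so.2`; the usual remedy is rewriting that field (e.g., `patchelf --set-interpreter`) to point at a loader in `/nix/store`.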
Container
- Builds the environment imperatively (step by step, e.g., `RUN` and `COPY` instructions in a `Dockerfile`) and snapshots the result of each step as a layer.
- Scope: above the Linux kernel.
  - Compared to Nix, containers bring their own isolated filesystem.
- Tools:
  - `docker`
  - `podman`
- Limitation:
  - Containers still depend on the host Linux kernel and hardware drivers.
  - Since an image captures an entire OS user space, containers can take more space than Nix environments.
  - Container builds are not always reproducible: e.g., `apt update` may fetch newer package versions, so images built from the same `Dockerfile` at different times can differ.
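The layer snapshots are content-addressed. In the OCI image specification, each layer’s uncompressed digest (its DiffID) is chained with everything below it to form a ChainID, so any byte that changes in a lower layer (for example, a different result from `apt update`) changes the identity of every layer above it. A minimal sketch of that chaining rule; `digest` and `chain_ids` are hypothetical helpers and the layer contents are made-up placeholders:

```python
import hashlib


def digest(data: bytes) -> str:
    """OCI-style digest string: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids: list[str]) -> list[str]:
    """Compute ChainIDs for a stack of layers given their DiffIDs (bottom first).

    ChainID(L0)     = DiffID(L0)
    ChainID(L0..Ln) = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
    """
    chains: list[str] = []
    for diff_id in diff_ids:
        if not chains:
            chains.append(diff_id)
        else:
            chains.append(digest(f"{chains[-1]} {diff_id}".encode()))
    return chains


if __name__ == "__main__":
    # Placeholder contents standing in for uncompressed layer tarballs.
    base = digest(b"debian base filesystem")
    apt_monday = digest(b"packages installed on Monday")
    apt_friday = digest(b"packages installed on Friday")  # same Dockerfile line
    app = digest(b"COPY app.py")
    print(chain_ids([base, apt_monday, app])[-1])
    print(chain_ids([base, apt_friday, app])[-1])  # differs, though app.py did not
```

This is why a drifting `apt update` layer invalidates the cache and identity of every layer built on top of it.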
Example Failure
- The Scenario: You build a Docker image that includes the CUDA toolkit libraries and your ML code. It works on your laptop. You deploy it to a server that has a different version of the NVIDIA kernel driver installed on the host.
- The Error: `CUDA error: unknown error` or `No CUDA-capable device is detected`.
- Why it happened: Docker isolates the filesystem, but it must share the Linux kernel and the GPU driver with the host. If the CUDA libraries inside the container are incompatible with the driver version on the host’s metal, the “box” cannot talk to the “hardware.”
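Inside a GPU container, the driver side of this contract can be queried through `libcuda.so.1`, which is typically mounted in from the host by the NVIDIA container tooling. Its `cuDriverGetVersion` reports the newest CUDA version the driver supports as an integer such as `12020` for 12.2. A minimal sketch that compares it with the CUDA version an image was built for, assuming the simplified rule “driver version ≥ runtime version” and ignoring CUDA’s minor-version compatibility; the function names are hypothetical:

```python
from __future__ import annotations

import ctypes


def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """CUDA encodes versions as 1000*major + 10*minor, e.g. 12020 -> (12, 2)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_cuda_version() -> tuple[int, int] | None:
    """Ask the host driver (libcuda.so.1) which CUDA version it supports."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # no driver visible, e.g. container started without GPU access
    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:  # 0 == CUDA_SUCCESS
        return None
    return decode_cuda_version(version.value)


def runtime_is_supported(runtime: tuple[int, int], driver: tuple[int, int] | None) -> bool:
    """Simplified check: the driver must support at least the runtime's CUDA version."""
    return driver is not None and driver >= runtime
```

For example, an image built for CUDA 12.4 on a host whose driver reports 12.2 gives `runtime_is_supported((12, 4), (12, 2)) == False` under this simplified rule, while older runtimes on newer drivers pass.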
Virtual Machine
- Hosts the OS, core libraries (`glibc`), C libraries, and language-specific packages.
- Sits on top of the hypervisor, which abstracts the underlying hardware and exposes generic virtual hardware to the VM.
- Scope: almost everything.
- Tools:
  - VirtualBox
  - VMware
  - Hyper-V
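- Limitation: resource-intensive (each VM runs a full guest OS and kernel), higher overhead, and arguably lower portability (large images, often tied to a specific hypervisor).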
Example Failure
- The Scenario: You create a VM on a modern Intel laptop that supports AVX-512 (512-bit SIMD vector instructions). You enable AVX-512 to speed up your ML code. You ship the entire VM image to an older server whose CPU doesn’t support AVX-512.
- The Error: `Illegal instruction (core dumped)`.
- Why it happened: A VM brings its own OS and kernel, but guest instructions still execute on the physical CPU. If the code is compiled for an instruction-set extension that the target CPU lacks, it will crash. Even a VM cannot “emulate” missing physical transistors without a massive performance hit.
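A cheap pre-flight check is to compare the instruction-set extensions a build assumes against what the target CPU advertises. On x86 Linux, `/proc/cpuinfo` lists them on its `flags` lines (AVX-512 Foundation appears as `avx512f`); inside a VM, these are the features the hypervisor exposes to the guest. A minimal sketch; `cpu_flags` and `missing_cpu_flags` are hypothetical helpers:

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect feature flags from /proc/cpuinfo-formatted text (x86 Linux)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_cpu_flags(required: set[str], cpuinfo_text: str) -> set[str]:
    """Return the required extensions this CPU does not advertise."""
    return required - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        print(missing_cpu_flags({"avx2", "avx512f"}, cpuinfo.read_text()))
```

Running this on the target server before deployment would report `{"avx512f"}` in the scenario above instead of an `Illegal instruction` crash at runtime.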
Comparisons
Different Layers of Environment/Dependency Management
| Layer | Component | Managed by |
|---|---|---|
| 3. Language | Python, PyTorch, NumPy | pip, uv, conda |
| 2. Distribution | CUDA, C++ libraries (BLAS), Compilers | conda, nix |
| 1. OS Core | Kernel, glibc (Core C library), Drivers | Docker, NixOS |
| Component | Standard App | Nix Flake | Container | VM |
|---|---|---|---|---|
| Libraries (glibc) | Host | Brings its own | Brings its own | Brings its own |
| File system (/) | Host | Host | Brings its own | Brings its own |
| Kernel | Host | Host | Shared host kernel | Brings its own |
| Layer | Component | Standard App | Nix Flake | Container (Docker) | Virtual Machine |
|---|---|---|---|---|---|
| 3. Application | Python, PyTorch, NumPy | Host / pip | Brings its own | Brings its own | Brings its own |
| 2. Distribution | C++ / CUDA / BLAS | Host / apt | Brings its own | Brings its own | Brings its own |
| 1.5 Core Libs | glibc (The C Library) | Host | Brings its own | Brings its own | Brings its own |
| 1. OS Core | Root FS (/etc, /var) | Host | Host | Brings its own | Brings its own |
| 0.5 Hardware | Kernel Drivers (NVIDIA) | Host | Host | Shared Host | Brings its own* |
| 0. Kernel | The Linux Kernel | Host | Host | Shared Host | Brings its own |
Comparison between Environment Management Technologies
| Technology | Isolation Level | Primary Focus | Shared w/ Host | Reproducibility | Where it Fails |
|---|---|---|---|---|---|
| uv/pip | Language | Python packages (uv also manages Python versions) | Everything except Python site-packages | Medium, depends on host OS/C libraries | System libraries (.so files) |
| Conda | User-space | Python + non-Python binary dependencies (CUDA, C++) | Host kernel and core C-library (glibc) | High, but fails across vastly different OS versions. | Core OS libraries (glibc) |
| Nix | Logical/Pure | Entire dependency graph | Host kernel | Extreme, highest logical guarantee | Standard paths (precompiled binaries) |
| Container | OS Filesystem | The entire root filesystem (user-space) | Host kernel only | Very high, the industry standard for portability | Kernel version & hardware drivers |
| VM | Hardware | The entire machine, including the OS kernel | Physical hardware via Hypervisor | Total, can run a different kernel entirely | Physical CPU instructions |
Container vs. Nix
| Feature | Container | Nix |
|---|---|---|
| Philosophy | Snapshotting | Functional |
| Storage | Layers: a stack of filesystem changes | Store: a flat collection of unique, hashed packages |
| Mutation | Mutable during build | Immutable |
| Compatibility | High: runs almost any Linux software out of the box | Low: not FHS-compliant, so precompiled binaries often fail |
| Efficiency | Medium: images include a full OS user space | High: only installs exactly what’s needed |