# Nvidia GPU Device Plugin

Name: `nvidia-gpu`

The Nvidia device plugin is used to expose Nvidia GPUs to Nomad. The Nvidia plugin is built into Nomad and does not need to be downloaded separately.
## Fingerprinted Attributes

| Attribute          | Unit     |
|--------------------|----------|
| `memory`           | MiB      |
| `power`            | W (watt) |
| `bar1`             | MiB      |
| `driver_version`   | string   |
| `cores_clock`      | MHz      |
| `memory_clock`     | MHz      |
| `pci_bandwidth`    | MB/s     |
| `display_state`    | string   |
| `persistence_mode` | string   |
## Runtime Environment

The `nvidia-gpu` device plugin exposes the following environment variables:

- `NVIDIA_VISIBLE_DEVICES` - List of Nvidia GPU IDs available to the task.
## Additional Task Configurations

Additional environment variables can be set by the task to influence the runtime environment. See Nvidia's documentation.
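For example, a task could restrict the driver capabilities exposed inside the container by setting an environment variable in its `env` stanza. This is a sketch only; `NVIDIA_DRIVER_CAPABILITIES` and its values come from Nvidia's container runtime documentation, not from Nomad itself:

```hcl
task "example" {
  driver = "docker"

  # Read by Nvidia's container runtime inside the task environment.
  # "compute,utility" limits the container to CUDA and nvidia-smi support.
  env {
    NVIDIA_DRIVER_CAPABILITIES = "compute,utility"
  }
}
```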
## Installation Requirements

In order to use the `nvidia-gpu` device plugin, the following prerequisites must be met:

- GNU/Linux x86_64 with kernel version > 3.10
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers >= 340.29 with the binary `nvidia-smi`
### Docker Driver Requirements

In order to use the Nvidia device plugin with the Docker driver, please follow the installation instructions for nvidia-docker.
## Plugin Configuration

The `nvidia-gpu` device plugin supports the following configuration in the agent config:

- `ignored_gpu_ids` `(array<string>: [])` - Specifies the set of GPU UUIDs that should be ignored when fingerprinting.
- `fingerprint_period` `(string: "1m")` - The period in which to fingerprint for device changes.
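A minimal agent configuration sketch showing both options; the GPU UUID below is a made-up placeholder:

```hcl
plugin "nvidia-gpu" {
  config {
    # Placeholder UUID -- replace with a real GPU UUID from `nvidia-smi -L`.
    ignored_gpu_ids    = ["GPU-fef8089b-4820-abfc-e83e-94318197576e"]
    fingerprint_period = "1m"
  }
}
```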
## Restrictions

The Nvidia integration only works with drivers that natively integrate with Nvidia's container runtime library. Nomad has tested support with the `docker` driver and plans to bring support to the built-in `exec` and `java` drivers. Support for `lxc` should be possible by installing the Nvidia hook, but this is not tested or documented by Nomad.
## Examples
Inspect a node with a GPU:
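For instance, using the `nomad node status` command (the node ID below is a placeholder):

```shell-session
$ nomad node status <node-id>
```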
Display detailed statistics on a node with a GPU:
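The `-verbose` flag includes the fingerprinted device attributes and resource utilization in the output (node ID is a placeholder):

```shell-session
$ nomad node status -verbose <node-id>
```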
Run the following example job to see that the GPU was mounted in the container:
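A sketch of such a job, which requests one GPU via the `device` stanza and runs `nvidia-smi` once as a batch task; the datacenter name and CUDA image tag are assumptions to adapt to your environment:

```hcl
job "gpu-test" {
  datacenters = ["dc1"] # assumed datacenter name
  type        = "batch"

  group "smi" {
    task "smi" {
      driver = "docker"

      config {
        image   = "nvidia/cuda:11.0-base" # assumed image tag
        command = "nvidia-smi"
      }

      resources {
        # Request a single GPU from the nvidia-gpu device plugin.
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```

If the GPU was mounted correctly, the task's logs contain the familiar `nvidia-smi` table for the allocated device.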