[Meta] Improve kubernetes backend #3126

@un-def

Essential:

  • Request resources according to the dstack configuration
  • Multi-node support (distributed tasks running on fleets with cluster placement)
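A rough sketch of what the first item could involve: translating a run's resource requirements into the `resources` section of a Kubernetes container spec. `ResourceSpec` and `to_container_resources` are hypothetical names for illustration, not dstack's actual models or the backend's code. The sketch assumes the cluster runs the NVIDIA or AMD device plugin, which expose GPUs as the extended resources `nvidia.com/gpu` and `amd.com/gpu`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceSpec:
    """Hypothetical, simplified stand-in for a run's resource requirements."""

    cpus: int
    memory_gib: int
    gpu_count: int = 0
    gpu_vendor: Optional[str] = None  # "nvidia" or "amd"


# Extended resource names exposed by the NVIDIA and AMD device plugins.
GPU_RESOURCE_NAMES = {"nvidia": "nvidia.com/gpu", "amd": "amd.com/gpu"}


def to_container_resources(spec: ResourceSpec) -> dict:
    """Build the `resources` mapping of a container spec as a plain dict."""
    requests = {"cpu": str(spec.cpus), "memory": f"{spec.memory_gib}Gi"}
    limits = {}
    if spec.gpu_count > 0:
        if spec.gpu_vendor not in GPU_RESOURCE_NAMES:
            raise ValueError(f"unsupported GPU vendor: {spec.gpu_vendor!r}")
        name = GPU_RESOURCE_NAMES[spec.gpu_vendor]
        # For extended resources, a request must equal the limit when both
        # are set, so the same value goes into both mappings.
        requests[name] = str(spec.gpu_count)
        limits[name] = str(spec.gpu_count)
    return {"requests": requests, "limits": limits}
```

The result is a plain dict, so it can be placed into a pod manifest or passed to a Kubernetes client as is. Only GPUs get limits here; whether CPU and memory should also be capped is a separate design decision.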

Strategic:

  • AMD GPU support
  • Allow configuring multiple clusters per backend (e.g. one per region)
  • Auto-scaling support (ideally, find a way to support it for any cloud)

Improvements:

  • Update the jump pod: use a lightweight image, restrict SSH access (see TODOs in _create_jump_pod_service)
  • Test and update (if required) the gateway functionality on managed/self-hosted Kubernetes other than EKS (see TODO in KubernetesCompute.create_gateway)
