Kubernetes v1.36 Beta: Adjusting Job Resources on the Fly for Suspended Workloads
Introduction
Kubernetes v1.36 elevates the ability to modify container resource requests and limits in the pod template of a suspended Job from alpha to beta. Initially introduced in v1.35, this feature empowers queue controllers and cluster administrators to tweak CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended, before it begins or resumes execution. This capability addresses a critical gap in batch and machine learning workflows where resource demands are not always known at Job creation time.
Why Mutable Pod Resources for Suspended Jobs?
Batch and machine learning workloads often face fluctuating resource requirements that depend on current cluster capacity, queue priorities, and the availability of specialized hardware like GPUs. Before this feature, once a Job’s pod template was set, its resource fields were immutable. If a queue controller such as Kueue determined that a suspended Job should run with different resources, the only recourse was to delete and recreate the Job entirely. That approach meant losing metadata, status, and history—an expensive and disruptive process.
This new functionality offers a more graceful path: a specific Job instance triggered by a CronJob can progress with reduced resources rather than failing outright when the cluster is heavily loaded. It also allows queue controllers to optimize resource allocation dynamically, improving overall cluster utilization and Job success rates.
Example: Machine Learning Training Job
Consider a machine learning training Job that initially requests 4 GPUs:
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: training-job-example-abcd123
      labels:
        app.kubernetes.io/name: trainer
    spec:
      suspend: true
      template:
        metadata:
          annotations:
            kubernetes.io/description: "ML training, ID abcd123"
        spec:
          containers:
          - name: trainer
            image: example-registry.example.com/training:2026-04-23T150405.678
            resources:
              requests:
                cpu: "8"
                memory: "32Gi"
                example-hardware-vendor.com/gpu: "4"
              limits:
                cpu: "8"
                memory: "32Gi"
                example-hardware-vendor.com/gpu: "4"
          restartPolicy: Never

A queue controller managing cluster resources might determine that only 2 GPUs are available. With this feature, the controller can update the Job’s resource requests before resuming it:
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: training-job-example-abcd123
      labels:
        app.kubernetes.io/name: trainer
    spec:
      suspend: true
      template:
        metadata:
          annotations:
            kubernetes.io/description: "ML training, ID abcd123"
        spec:
          containers:
          - name: trainer
            image: example-registry.example.com/training:2026-04-23T150405.678
            resources:
              requests:
                cpu: "4"
                memory: "16Gi"
                example-hardware-vendor.com/gpu: "2"
              limits:
                cpu: "4"
                memory: "16Gi"
                example-hardware-vendor.com/gpu: "2"
          restartPolicy: Never

After the resources are updated, the controller resumes the Job by setting spec.suspend to false, and the new Pods are created with the adjusted resource specifications. This process avoids deletion and preserves all associated metadata and history.
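The update-then-resume sequence described above can be expressed as two patch bodies. The following is a minimal sketch using only the Python standard library: it builds the request bodies but does not send them to a cluster. The helper names `resources_patch` and `resume_patch` are hypothetical, and the sketch assumes a strategic merge patch, in which entries of the `containers` list are matched by their `name`.

```python
import json

# Extended resource name taken from the example Job in this article.
GPU = "example-hardware-vendor.com/gpu"


def resources_patch(container: str, cpu: str, memory: str, gpus: str) -> dict:
    """Build a patch body that sets equal requests and limits for one container."""
    values = {"cpu": cpu, "memory": memory, GPU: gpus}
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,  # merge key for the containers list
                            "resources": {
                                "requests": dict(values),
                                "limits": dict(values),
                            },
                        }
                    ]
                }
            }
        }
    }


def resume_patch() -> dict:
    """Build the patch body that resumes a suspended Job."""
    return {"spec": {"suspend": False}}


if __name__ == "__main__":
    # Step 1: shrink the trainer container while the Job is still suspended.
    print(json.dumps(resources_patch("trainer", "4", "16Gi", "2"), indent=2))
    # Step 2: resume the Job in a separate request.
    print(json.dumps(resume_patch()))
```

One way to apply such bodies by hand would be to save each one to a file and pass it to `kubectl patch job training-job-example-abcd123 --type strategic --patch-file <file>`. Sending the resource change and the resume as two separate requests keeps the resource update strictly within the suspended state.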
How It Works
The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for Jobs that are suspended. No new API types are introduced; the existing Job and pod template structures accommodate the change through a controlled relaxation of validation rules. The feature is enabled by default in v1.36 as a beta feature, meaning cluster operators can rely on it without needing to explicitly enable a feature gate.
Key technical aspects include:
- Resource field mutability is allowed only when spec.suspend is true.
- Changes apply to container-level resource requests and limits, including extended resources.
- The controller or user must modify the Job object and set the new pod template resources; the API server validates the changes.
- When the Job is resumed (suspend set to false), the new pod template is used to create Pods.
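The rule in the list above can be modeled in a few lines. This is a simplified illustration, not the API server's validation code: the function names are hypothetical, Jobs are plain dictionaries shaped like the manifests in this article, and only container resources are compared. The sketch also makes the conservative assumption that the Job must be suspended both before and after the update; the text above only states that spec.suspend must be true.

```python
def container_resources(job: dict) -> dict:
    """Map each container name to its resources block in the Job's pod template."""
    containers = job["spec"]["template"]["spec"].get("containers", [])
    return {c["name"]: c.get("resources", {}) for c in containers}


def is_suspended(job: dict) -> bool:
    """True only when spec.suspend is explicitly set to true."""
    return job["spec"].get("suspend") is True


def resource_change_allowed(old: dict, new: dict) -> bool:
    """Simplified model: resource edits are accepted only for a suspended Job."""
    if container_resources(old) == container_resources(new):
        return True  # no resource change, so this rule does not apply
    # Assumption: suspended in both the stored and the updated object.
    return is_suspended(old) and is_suspended(new)


def make_job(suspend: bool, gpus: str) -> dict:
    """Hypothetical helper that builds a minimal Job dictionary for the demo."""
    resources = {"requests": {"example-hardware-vendor.com/gpu": gpus}}
    return {
        "spec": {
            "suspend": suspend,
            "template": {"spec": {"containers": [{"name": "trainer", "resources": resources}]}},
        }
    }


if __name__ == "__main__":
    print(resource_change_allowed(make_job(True, "4"), make_job(True, "2")))
    print(resource_change_allowed(make_job(False, "4"), make_job(False, "2")))
```

In this model, the first call represents a suspended Job being resized and the second represents the same edit attempted on a Job that is not suspended.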
Use Cases for Mutable Resources
- Queue Controllers: Kueue and similar controllers can adjust resources based on cluster availability and job priorities, reducing the need for preemption or job rejection.
- CronJob Adaptability: A CronJob-driven Job can downgrade its resource footprint during periods of high cluster load, ensuring it still runs (albeit slower) rather than failing.
- Cost Optimization: Administrators can delay resource-intensive Jobs until cheaper or more abundant compute becomes available, then adjust resources accordingly before resumption.
Benefits and Limitations
This feature provides significant operational flexibility for batch and ML workloads. However, it comes with some important considerations:
- Scope: Only Jobs with spec.suspend: true can have their pod template resources modified. Active, running Jobs remain immutable for resource changes.
- Metadata preservation: Unlike the delete-and-recreate approach, all Job metadata (labels, annotations, status) is retained.
- Security: Only users or controllers with update permission on the Job can modify resources, maintaining existing access controls.
Getting Started
To use this feature, you need a Kubernetes cluster running v1.36 or later. The feature is enabled by default. You can suspend a Job by setting spec.suspend: true, update the pod template’s resources section, and then resume the Job. For queue controllers, integrate with the Kubernetes API to watch suspended Jobs and apply resource modifications programmatically.
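For the queue-controller case, the first step is deciding which suspended Jobs need adjusting. The sketch below shows one possible selection step, under stated assumptions: Jobs are plain dictionaries in the shape the API returns as JSON, the GPU resource name is the one from this article's example, and the function names and the "requests more GPUs than are free" policy are hypothetical. It does not talk to a cluster.

```python
# Extended resource name taken from the example Job in this article.
GPU = "example-hardware-vendor.com/gpu"


def requested_gpus(job: dict) -> int:
    """Sum the GPU requests across all containers in the Job's pod template."""
    containers = job["spec"]["template"]["spec"]["containers"]
    return sum(
        int(c.get("resources", {}).get("requests", {}).get(GPU, "0"))
        for c in containers
    )


def jobs_to_shrink(jobs: list, available_gpus: int) -> list:
    """Names of suspended Jobs that request more GPUs than are currently free."""
    return [
        job["metadata"]["name"]
        for job in jobs
        # Only suspended Jobs are candidates; others cannot be resized.
        if job["spec"].get("suspend") is True
        and requested_gpus(job) > available_gpus
    ]


def example_job(name: str, suspend: bool, gpus: str) -> dict:
    """Hypothetical helper that builds a minimal Job dictionary for the demo."""
    container = {"name": "trainer", "resources": {"requests": {GPU: gpus}}}
    return {
        "metadata": {"name": name},
        "spec": {"suspend": suspend, "template": {"spec": {"containers": [container]}}},
    }


if __name__ == "__main__":
    jobs = [
        example_job("big-suspended", True, "4"),
        example_job("small-suspended", True, "1"),
        example_job("big-running", False, "4"),
    ]
    print(jobs_to_shrink(jobs, available_gpus=2))
```

A real controller would obtain the Job list from a watch on the API, then update the selected Jobs' pod template resources before setting spec.suspend to false.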
For more details, refer to the official Kubernetes documentation on job suspension and resource management for containers.
Conclusion
The promotion of mutable pod resources for suspended Jobs to beta in Kubernetes v1.36 marks a meaningful step toward more resource-efficient batch processing. By allowing resource adjustments on a suspended Job without deleting and recreating it, and therefore without losing its metadata, status, and history, the feature makes Kubernetes better suited to dynamic, large-scale workloads whose resource needs are not known at creation time.