Training
HyperPodPytorchJob
- class sagemaker.hyperpod.training.hyperpod_pytorch_job.HyperPodPytorchJob
Bases: _HyperPodPytorchJob
HyperPod PyTorch job for distributed training on Amazon SageMaker HyperPod clusters.
This class provides methods to create, manage, and monitor PyTorch training jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS.
- create(debug=False)
Create and submit the HyperPod PyTorch job to the Kubernetes cluster.
Parameters:
debug (bool, optional): Enable debug logging. Defaults to False.
Raises:
Exception: If job creation or the underlying Kubernetes API call fails.
Usage Examples
>>> job = HyperPodPytorchJob(metadata=Metadata(name="my-job"), ...)
>>> job.create()
>>>
>>> # Create with debug logging
>>> job.create(debug=True)
- classmethod list(namespace=None) → List[HyperPodPytorchJob]
List all HyperPod PyTorch jobs in the specified namespace.
Parameters:
namespace (str, optional): The Kubernetes namespace to list jobs from. If None, the default namespace of the current context is used.
Returns:
List[HyperPodPytorchJob]: The HyperPodPytorchJob instances found in the namespace.
Raises:
Exception: If the Kubernetes API call fails or the jobs cannot be retrieved.
Notes
This method requires a valid kubeconfig and loads it automatically if it is not already loaded.
Usage Examples
>>> jobs = HyperPodPytorchJob.list()
>>> print(f"Found {len(jobs)} jobs")
>>>
>>> # List jobs in a specific namespace
>>> jobs = HyperPodPytorchJob.list(namespace="my-namespace")
- delete()
Delete the HyperPod PyTorch job from the Kubernetes cluster.
Raises:
Exception: If job deletion or the underlying Kubernetes API call fails.
Usage Examples
>>> job = HyperPodPytorchJob.get("my-job")
>>> job.delete()
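Because list() returns job instances that each support delete(), a bulk cleanup can be composed from the two documented methods. The following is a minimal sketch: delete_jobs_with_prefix is a hypothetical helper, not part of the SDK, and it relies only on the metadata.name attribute and the delete() method used in the examples above. It defaults to a dry run so the matching names can be reviewed before anything is removed.

```python
from typing import Iterable, List


def delete_jobs_with_prefix(jobs: Iterable, prefix: str, dry_run: bool = True) -> List[str]:
    """Hypothetical cleanup helper (not part of the SDK).

    `jobs` is meant to be the result of HyperPodPytorchJob.list(); each item
    is assumed to expose metadata.name and delete(). With dry_run=True (the
    default) nothing is deleted and the matching names are only returned.
    """
    matched = [job for job in jobs if job.metadata.name.startswith(prefix)]
    if not dry_run:
        for job in matched:
            job.delete()  # raises on failure, leaving later jobs untouched
    return [job.metadata.name for job in matched]
```

For example, delete_jobs_with_prefix(HyperPodPytorchJob.list(namespace="my-namespace"), "sweep-") returns the names that would be deleted; passing dry_run=False then deletes them one at a time.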
- exec_command(command: List[str], pod: str | None = None, all_pods: bool = False, container: str | None = None)
Execute a command in one or all pods associated with this job.
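exec_command takes the command as a list of argument strings rather than a single shell string. Assuming the list is forwarded to the Kubernetes exec API as an argv vector (as kubectl exec does), no shell runs inside the pod unless you start one explicitly. The hypothetical helper below uses Python's shlex module to split a command string with shell-style quoting, or wraps it in /bin/sh -c when pipes or variable expansion are needed; that branch assumes the container image provides /bin/sh.

```python
import shlex
from typing import List


def to_exec_argv(command: str, use_shell: bool = False) -> List[str]:
    """Hypothetical helper: build the List[str] passed to exec_command.

    use_shell=False: tokenize with shell-like quoting only; pipes, redirects
    and $VARS are passed through literally because no shell interprets them.
    use_shell=True: hand the whole string to /bin/sh -c (assumes /bin/sh exists).
    """
    if use_shell:
        return ["/bin/sh", "-c", command]
    return shlex.split(command)
```

For example, job.exec_command(to_exec_argv("nvidia-smi --query-gpu=name --format=csv"), all_pods=True) would run the same query in every pod of the job.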
- classmethod get(name, namespace=None) → HyperPodPytorchJob
Get a specific HyperPod PyTorch job by name.
Parameters:
name (str): The name of the HyperPod PyTorch job to retrieve.
namespace (str, optional): The Kubernetes namespace to search in. If None, the default namespace of the current context is used.
Returns:
HyperPodPytorchJob: The requested HyperPod PyTorch job instance.
Raises:
Exception: If the job is not found or the Kubernetes API call fails.
Usage Examples
>>> job = HyperPodPytorchJob.get("my-job")
>>> print(job.metadata.name)
>>>
>>> # Get a job from a specific namespace
>>> job = HyperPodPytorchJob.get("my-job", namespace="my-namespace")
- refresh() → HyperPodPytorchJob
Refresh the job status by fetching the latest state from the Kubernetes cluster.
Returns:
HyperPodPytorchJob: The updated job instance with refreshed status.
Raises:
Exception: If the refresh operation or the underlying Kubernetes API call fails.
Usage Examples
>>> job = HyperPodPytorchJob.get("my-job")
>>> updated_job = job.refresh()
>>> print(updated_job.status)
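refresh() re-reads the job from the cluster, so waiting for a job to reach a given state amounts to calling it in a loop. The sketch below is hypothetical: wait_until is not an SDK method, and because the exact structure of the status object is not documented in this section, the completion check is left to a caller-supplied predicate.

```python
import time
from typing import Any, Callable


def wait_until(job: Any, is_done: Callable[[Any], bool], timeout_s: float = 3600.0,
               interval_s: float = 30.0, sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> Any:
    """Hypothetical polling helper: refresh `job` until is_done(job) is true.

    Raises TimeoutError once timeout_s has elapsed. `sleep` and `clock` are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        job = job.refresh()  # refresh() returns the job with its latest status
        if is_done(job):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job not done after {timeout_s} s")
        sleep(interval_s)
```

For example, wait_until(job, is_done=my_check, timeout_s=7200) polls every 30 seconds for up to two hours, where my_check is your own (hypothetical) function that inspects job.status.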
- list_pods() → List[str]
List all pods associated with this HyperPod PyTorch job.
Returns:
List[str]: The names of the pods associated with this job.
Raises:
Exception: If listing pods or the underlying Kubernetes API call fails.
Usage Examples
>>> job = HyperPodPytorchJob.get("my-job")
>>> pods = job.list_pods()
>>> print(f"Found {len(pods)} pods: {pods}")
- get_logs_from_pod(pod_name: str, container: str | None = None) → str
Get logs from a specific pod associated with this HyperPod PyTorch job.
Parameters:
pod_name (str): The name of the pod to get logs from.
container (str, optional): The container name within the pod. If None, the first container is used.
Returns:
str: The log output from the specified pod and container.
Raises:
Exception: If getting logs or the underlying Kubernetes API call fails.
Usage Examples
>>> job = HyperPodPytorchJob.get("my-job")
>>> pods = job.list_pods()
>>> logs = job.get_logs_from_pod(pods[0])
>>> print(logs)
>>>
>>> # Get logs from a specific container
>>> logs = job.get_logs_from_pod(pods[0], container="pytorch")
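To gather logs from every worker at once, list_pods() and get_logs_from_pod() can be combined. In this sketch (collect_logs is a hypothetical helper, not an SDK method), a pod whose logs cannot be fetched, for example because it is still pending, is recorded with an error message instead of aborting the whole collection.

```python
from typing import Dict, Optional


def collect_logs(job, container: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: return {pod_name: log_text} for every pod of `job`.

    Uses only list_pods() and get_logs_from_pod(); a failure for one pod is
    stored as an error string so the remaining pods are still collected.
    """
    logs: Dict[str, str] = {}
    for pod in job.list_pods():
        try:
            logs[pod] = job.get_logs_from_pod(pod, container=container)
        except Exception as exc:  # the documented methods raise plain Exception
            logs[pod] = f"<failed to fetch logs: {exc}>"
    return logs
```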
HyperPodPytorchJob Configs
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Conditions
Bases: BaseModel
JobCondition describes the current state of a job.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.JobPods
Bases: BaseModel
ObjectReference contains enough information to let you inspect or modify the referred object.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ManagerPods
Bases: BaseModel
Pod Manager pods.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PodManagerStatuses
Bases: BaseModel
ObjectReference contains enough information to let you inspect or modify the referred object.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Tolerations
Bases: BaseModel
The pod this Toleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
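The matching rule in the Tolerations description can be made concrete. The sketch below mirrors the Kubernetes rule for a single toleration and taint: an empty effect matches every effect, an empty key with the Exists operator matches every key, Exists ignores the value, and Equal (the default operator) requires equal values. It assumes the standard Kubernetes field names key, operator, value and effect, since this model's fields are not listed here, and it uses plain dicts rather than the pydantic models.

```python
from typing import Dict, List


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Mirror the Kubernetes toleration-matches-taint rule for one pair."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False  # an empty effect tolerates every effect
    key = toleration.get("key", "")
    if key and key != taint.get("key"):
        return False  # an empty key (with Exists) tolerates every key
    operator = toleration.get("operator") or "Equal"  # Equal is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def untolerated_hard_taints(tolerations: List[Dict[str, str]],
                            taints: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Taints that keep the pod off a node (PreferNoSchedule is only a soft preference)."""
    return [t for t in taints
            if t.get("effect") in ("NoSchedule", "NoExecute")
            and not any(tolerates(tol, t) for tol in tolerations)]
```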
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PodSetInfo
Bases: BaseModel
DEPRECATED. Pod set information provided by Kueue is now carried in podSetInfos. This field holds the PodSetInformation that Kueue assigned to the HyperPodPytorchJob’s PodSet and is retained only to support operator upgrades.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- tolerations: List[Tolerations] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PodSetInfos
Bases: BaseModel
PodSetInformation contains the data that Kueue wants to inject into an admitted PodSpecTemplate.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- tolerations: List[Tolerations] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Metadata
Bases: BaseModel
Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
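Kubernetes validates object names server-side, so checking a job name locally before calling create() can save a round trip. The sketch below checks the RFC 1123 DNS label rule (lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters). Whether HyperPod enforces this label rule or the looser DNS subdomain rule for job names is an assumption here; a name that passes the label check is valid under both. is_valid_job_name is a hypothetical helper.

```python
import re

# RFC 1123 DNS label, the pattern Kubernetes uses for many object names.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is a valid RFC 1123 DNS label."""
    return len(name) <= 63 and _DNS1123_LABEL.fullmatch(name) is not None
```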
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.MatchExpressions
Bases: BaseModel
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.MatchFields
Bases: BaseModel
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Preference
Bases: BaseModel
A node selector term, associated with the corresponding weight.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- matchExpressions: List[MatchExpressions] | None
- matchFields: List[MatchFields] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PreferredDuringSchedulingIgnoredDuringExecution
Bases: BaseModel
An empty preferred scheduling term matches all objects with implicit weight 0 (i.e. it’s a no-op). A null preferred scheduling term matches no objects (i.e. is also a no-op).
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- preference: Preference
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.NodeSelectorTerms
Bases: BaseModel
A null or empty node selector term matches no objects. Its requirements are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- matchExpressions: List[MatchExpressions] | None
- matchFields: List[MatchFields] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.RequiredDuringSchedulingIgnoredDuringExecution
Bases: BaseModel
If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- nodeSelectorTerms: List[NodeSelectorTerms]
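NodeSelectorTerms and RequiredDuringSchedulingIgnoredDuringExecution combine two levels of logic: the requirements inside one term are ANDed, and, as in Kubernetes, the terms listed in nodeSelectorTerms are ORed. The following sketch evaluates matchExpressions against a node's labels using the standard Kubernetes operators (In, NotIn, Exists, DoesNotExist, Gt, Lt). It assumes the key, operator and values field names, ignores matchFields (so a term that uses only matchFields is treated as non-matching), and uses plain dicts; the label names in the test are illustrative.

```python
from typing import Dict, List


def requirement_matches(req: Dict, labels: Dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry (key, operator, values) against node labels."""
    key, op, values = req["key"], req["operator"], req.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        if not present:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False  # non-integer label or missing bound never matches
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unknown operator {op!r}")


def node_matches(node_selector_terms: List[Dict], labels: Dict[str, str]) -> bool:
    """Terms are ORed; expressions in one term are ANDed; an empty term matches nothing."""
    return any(
        bool(term.get("matchExpressions"))
        and all(requirement_matches(r, labels) for r in term["matchExpressions"])
        for term in node_selector_terms
    )
```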
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.NodeAffinity
Bases: BaseModel
Describes node affinity scheduling rules for the pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- preferredDuringSchedulingIgnoredDuringExecution: List[PreferredDuringSchedulingIgnoredDuringExecution] | None
- requiredDuringSchedulingIgnoredDuringExecution: RequiredDuringSchedulingIgnoredDuringExecution | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PodAffinity
Bases: BaseModel
Describes pod affinity scheduling rules (e.g. co-locate this pod in the same node, zone, etc. as some other pod(s)).
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- preferredDuringSchedulingIgnoredDuringExecution: List[PreferredDuringSchedulingIgnoredDuringExecution] | None
- requiredDuringSchedulingIgnoredDuringExecution: List[RequiredDuringSchedulingIgnoredDuringExecution] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PodAntiAffinity
Bases: BaseModel
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod in the same node, zone, etc. as some other pod(s)).
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- preferredDuringSchedulingIgnoredDuringExecution: List[PreferredDuringSchedulingIgnoredDuringExecution] | None
- requiredDuringSchedulingIgnoredDuringExecution: List[RequiredDuringSchedulingIgnoredDuringExecution] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Affinity
Bases: BaseModel
If specified, the pod’s scheduling constraints.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- nodeAffinity: NodeAffinity | None
- podAffinity: PodAffinity | None
- podAntiAffinity: PodAntiAffinity | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ConfigMapKeyRef
Bases: BaseModel
Selects a key of a ConfigMap.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.FieldRef
Bases: BaseModel
Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels['<KEY>'], metadata.annotations['<KEY>'], spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ResourceFieldRef
Bases: BaseModel
Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.SecretKeyRef
Bases: BaseModel
Selects a key of a secret in the pod’s namespace.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ValueFrom
Bases: BaseModel
Source for the environment variable’s value. Cannot be used if value is not empty.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- configMapKeyRef: ConfigMapKeyRef | None
- resourceFieldRef: ResourceFieldRef | None
- secretKeyRef: SecretKeyRef | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Env
Bases: BaseModel
EnvVar represents an environment variable present in a Container.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
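An Env entry supplies either a literal value or a valueFrom source, and valueFrom cannot be used if value is not empty. The sketch below checks that rule plus the Kubernetes requirement that a valueFrom names exactly one source. It assumes the standard Kubernetes field names, and it includes fieldRef as a source on the assumption that the FieldRef model above plugs into ValueFrom, even though ValueFrom's field list here does not show it. env_errors is a hypothetical helper working on plain dicts.

```python
from typing import Dict, List

# Sources accepted by a Kubernetes EnvVarSource (fieldRef is an assumption here).
VALUE_FROM_SOURCES = ("fieldRef", "resourceFieldRef", "configMapKeyRef", "secretKeyRef")


def env_errors(env: List[Dict]) -> List[str]:
    """Hypothetical helper: list problems in a container's env entries."""
    errors = []
    for i, var in enumerate(env):
        name = var.get("name") or f"env[{i}]"
        value_from = var.get("valueFrom")
        if var.get("value") and value_from:
            errors.append(f"{name}: valueFrom cannot be used when value is not empty")
        if value_from is not None:
            sources = [s for s in VALUE_FROM_SOURCES if value_from.get(s)]
            if len(sources) != 1:
                errors.append(f"{name}: valueFrom needs exactly one source, got {sources}")
    return errors
```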
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ConfigMapRef
Bases: BaseModel
The ConfigMap to select from.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.SecretRef
Bases: BaseModel
The Secret to select from.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.EnvFrom
Bases: BaseModel
EnvFromSource represents the source of a set of ConfigMaps.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- configMapRef: ConfigMapRef | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Exec
Bases: BaseModel
Exec specifies the action to take.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.HttpHeaders
Bases: BaseModel
HTTPHeader describes a custom header to be used in HTTP probes.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.HttpGet
Bases: BaseModel
HTTPGet specifies the HTTP request to perform.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- httpHeaders: List[HttpHeaders] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Sleep
Bases: BaseModel
Sleep represents the duration that the container should sleep before being terminated.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.TcpSocket
Bases: BaseModel
Deprecated. TCPSocket is NOT supported as a LifecycleHandler and is kept only for backward compatibility. This field is not validated, and lifecycle hooks will fail at runtime when a TCP handler is specified.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PostStart
Bases: BaseModel
PostStart is called immediately after a container is created. If the handler fails, the container is terminated and restarted according to its restart policy. Other management of the container blocks until the hook completes. More info: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PreStop
Bases: BaseModel
PreStop is called immediately before a container is terminated due to an API request or management event such as liveness/startup probe failure, preemption, resource contention, etc. The handler is not called if the container crashes or exits. The Pod’s termination grace period countdown begins before the PreStop hook is executed. Regardless of the outcome of the handler, the container will eventually terminate within the Pod’s termination grace period (unless delayed by finalizers). Other management of the container blocks until the hook completes or until the termination grace period is reached. More info: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
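Because the termination grace period countdown begins before the PreStop hook runs, and Kubernetes sends SIGTERM to the container's main process only after the hook returns, time spent in the hook is subtracted from the time the process has to shut down cleanly (for example, to write a final checkpoint). A worked sketch of that arithmetic, assuming the Kubernetes default grace period of 30 seconds; sigterm_window_s is a hypothetical helper.

```python
def sigterm_window_s(prestop_s: float, grace_period_s: float = 30.0) -> float:
    """Approximate seconds the main process has between SIGTERM and SIGKILL.

    The grace-period countdown starts before the PreStop hook runs and
    SIGTERM is sent only after the hook returns, so the hook's run time is
    subtracted. The small one-off extension Kubernetes grants when a hook
    overruns the grace period is ignored.
    """
    return max(0.0, grace_period_s - prestop_s)
```

With a 5-second sleep-based PreStop hook and the default grace period, sigterm_window_s(5) leaves about 25 seconds; a hook that outlasts the grace period leaves effectively none.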
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Lifecycle
Bases: BaseModel
Actions that the management system should take in response to container lifecycle events. Cannot be updated.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Grpc
Bases: BaseModel
GRPC specifies an action involving a GRPC port.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.LivenessProbe
Bases: BaseModel
Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Ports
Bases: BaseModel
ContainerPort represents a network port in a single container.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ReadinessProbe
Bases: BaseModel
Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ResizePolicy
Bases: BaseModel
ContainerResizePolicy represents resource resize policy for the container.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Claims
Bases: BaseModel
ResourceClaim references one entry in PodSpec.ResourceClaims.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Resources
Bases: BaseModel
Compute resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
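For GPUs and other extended resources, Kubernetes does not allow overcommitment: a request must be accompanied by a limit, the two must be equal, and a limit on its own implies an equal request. The sketch below checks those rules for one resource name in a plain-dict resources block. It assumes the standard Kubernetes requests/limits field names; gpu_resource_errors is a hypothetical helper, and nvidia.com/gpu is only the default example name.

```python
from typing import Dict, List


def gpu_resource_errors(resources: Dict[str, Dict[str, object]],
                        resource_name: str = "nvidia.com/gpu") -> List[str]:
    """Hypothetical helper: check the non-overcommit rule for one extended resource."""
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    errors = []
    if resource_name in requests and resource_name not in limits:
        errors.append(f"{resource_name}: a limit must be set when a request is set")
    elif resource_name in requests and int(requests[resource_name]) != int(limits[resource_name]):
        errors.append(f"{resource_name}: request {requests[resource_name]} "
                      f"!= limit {limits[resource_name]}")
    return errors  # a limit without a request is fine: the request defaults to the limit
```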
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.AppArmorProfile
Bases: BaseModel
appArmorProfile holds the AppArmor options used by this container. If set, this profile overrides the pod’s appArmorProfile. Note that this field cannot be set when spec.os.name is windows.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Capabilities
Bases: BaseModel
The capabilities to add/drop when running containers. Defaults to the default set of capabilities granted by the container runtime. Note that this field cannot be set when spec.os.name is windows.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.SeLinuxOptions
Bases: BaseModel
The SELinux context to be applied to the container. If unspecified, the container runtime will allocate a random SELinux context for each container. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.SeccompProfile
Bases: BaseModel
The seccomp options used by this container. If seccomp options are provided at both the pod and container level, the container options override the pod options. Note that this field cannot be set when spec.os.name is windows.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.WindowsOptions
Bases: BaseModel
The Windows-specific settings applied to all containers. If unspecified, the options from the PodSecurityContext will be used. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is linux.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.SecurityContext
Bases: BaseModel
SecurityContext defines the security options the container should be run with. If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext. More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- appArmorProfile: AppArmorProfile | None
- capabilities: Capabilities | None
- seLinuxOptions: SeLinuxOptions | None
- seccompProfile: SeccompProfile | None
- windowsOptions: WindowsOptions | None
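The precedence rule above applies only to fields that exist at both levels; pod-only fields such as fsGroup and container-only fields such as capabilities are not merged. The sketch below resolves the effective values for the shared fields, assuming the standard Kubernetes field names for PodSecurityContext and SecurityContext. effective_security_context is a hypothetical helper that works on plain dicts and replaces nested objects such as seLinuxOptions whole rather than merging them key by key.

```python
from typing import Any, Dict

# Fields present on both the Kubernetes PodSecurityContext and container SecurityContext.
SHARED_FIELDS = ("runAsUser", "runAsGroup", "runAsNonRoot", "seLinuxOptions",
                 "seccompProfile", "windowsOptions", "appArmorProfile")


def effective_security_context(pod_ctx: Dict[str, Any],
                               container_ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: resolve the shared fields a container runs with."""
    resolved: Dict[str, Any] = {}
    for field in SHARED_FIELDS:
        if container_ctx.get(field) is not None:
            resolved[field] = container_ctx[field]  # container-level setting wins
        elif pod_ctx.get(field) is not None:
            resolved[field] = pod_ctx[field]        # otherwise inherit the pod-level value
    return resolved
```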
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.StartupProbe
Bases: BaseModel
StartupProbe indicates that the Pod has successfully initialized. If specified, no other probes are executed until this completes successfully. If this probe fails, the Pod will be restarted, just as if the livenessProbe failed. This can be used to provide different probe parameters at the beginning of a Pod’s lifecycle, when it might take a long time to load data or warm a cache, than during steady-state operation. This cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
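A StartupProbe's time budget follows from its timing fields: the container gets roughly initialDelaySeconds + failureThreshold × periodSeconds before it is restarted, and liveness and readiness probes do not run until the startup probe succeeds. The hypothetical helpers below compute that budget and, conversely, the smallest failureThreshold for a given warm-up time. They assume the Kubernetes defaults (periodSeconds 10, failureThreshold 3, initialDelaySeconds 0) and ignore per-probe timeouts.

```python
def startup_budget_s(period_seconds: int = 10, failure_threshold: int = 3,
                     initial_delay_seconds: int = 0) -> int:
    """Approximate worst-case seconds a container has to pass its startup probe."""
    return initial_delay_seconds + failure_threshold * period_seconds


def failure_threshold_for(startup_s: int, period_seconds: int = 10,
                          initial_delay_seconds: int = 0) -> int:
    """Smallest failureThreshold giving the container at least startup_s seconds."""
    remaining = max(0, startup_s - initial_delay_seconds)
    return max(1, -(-remaining // period_seconds))  # ceiling division, at least 1
```

For a training container that needs about 15 minutes to load data, failure_threshold_for(900) gives 90 probes at the default 10-second period.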
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.VolumeDevices
Bases: BaseModel
volumeDevice describes a mapping of a raw block device within a container.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.VolumeMounts
Bases: BaseModel
VolumeMount describes a mounting of a Volume within a container.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Containers
Bases: BaseModel
A single application container that you want to run within a pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- livenessProbe: LivenessProbe | None
- readinessProbe: ReadinessProbe | None
- resizePolicy: List[ResizePolicy] | None
- securityContext: SecurityContext | None
- startupProbe: StartupProbe | None
- volumeDevices: List[VolumeDevices] | None
- volumeMounts: List[VolumeMounts] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Options
Bases: BaseModel
PodDNSConfigOption defines DNS resolver options of a pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.DnsConfig
Bases: BaseModel
Specifies the DNS parameters of a pod. Parameters specified here will be merged into the generated DNS configuration based on DNSPolicy.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.EphemeralContainers
Bases: BaseModel
An EphemeralContainer is a temporary container that you may add to an existing Pod for user-initiated activities such as debugging. Ephemeral containers have no resource or scheduling guarantees, and they will not be restarted when they exit or when a Pod is removed or restarted. The kubelet may evict a Pod if an ephemeral container causes the Pod to exceed its resource allocation. To add an ephemeral container, use the ephemeralcontainers subresource of an existing Pod. Ephemeral containers may not be removed or restarted.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- livenessProbe: LivenessProbe | None
- readinessProbe: ReadinessProbe | None
- resizePolicy: List[ResizePolicy] | None
- securityContext: SecurityContext | None
- startupProbe: StartupProbe | None
- volumeDevices: List[VolumeDevices] | None
- volumeMounts: List[VolumeMounts] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.HostAliases[source]#
Bases:
BaseModelHostAlias holds the mapping between IP and hostnames that will be injected as an entry in the pod’s hosts file.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [
ConfigDict][pydantic.config.ConfigDict].
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ImagePullSecrets[source]#
Bases:
BaseModelLocalObjectReference contains enough information to let you locate the referenced object inside the same namespace.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [
ConfigDict][pydantic.config.ConfigDict].
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.InitContainers[source]#
Bases:
BaseModelA single application container that you want to run within a pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [
ConfigDict][pydantic.config.ConfigDict].
- livenessProbe: LivenessProbe | None#
- readinessProbe: ReadinessProbe | None#
- resizePolicy: List[ResizePolicy] | None#
- securityContext: SecurityContext | None#
- startupProbe: StartupProbe | None#
- volumeDevices: List[VolumeDevices] | None#
- volumeMounts: List[VolumeMounts] | None#
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Os
Bases: BaseModel
Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset:
  - securityContext.windowsOptions
If the OS field is set to windows, the following fields must be unset:
  - spec.hostPID
  - spec.hostIPC
  - spec.hostUsers
  - spec.securityContext.appArmorProfile
  - spec.securityContext.seLinuxOptions
  - spec.securityContext.seccompProfile
  - spec.securityContext.fsGroup
  - spec.securityContext.fsGroupChangePolicy
  - spec.securityContext.sysctls
  - spec.shareProcessNamespace
  - spec.securityContext.runAsUser
  - spec.securityContext.runAsGroup
  - spec.securityContext.supplementalGroups
  - spec.securityContext.supplementalGroupsPolicy
  - spec.containers[*].securityContext.appArmorProfile
  - spec.containers[*].securityContext.seLinuxOptions
  - spec.containers[*].securityContext.seccompProfile
  - spec.containers[*].securityContext.capabilities
  - spec.containers[*].securityContext.readOnlyRootFilesystem
  - spec.containers[*].securityContext.privileged
  - spec.containers[*].securityContext.allowPrivilegeEscalation
  - spec.containers[*].securityContext.procMount
  - spec.containers[*].securityContext.runAsUser
  - spec.containers[*].securityContext.runAsGroup
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
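The Os rules above amount to a lookup from the OS name to a set of fields that must stay unset. The hypothetical helper os_violations below checks a set of dotted field paths against abbreviated copies of those lists; it is a sketch, not the validation that Kubernetes or this SDK performs.

```python
# Abbreviated copies of the restriction lists above; extend them with the full lists as needed.
RESTRICTED_WHEN = {
    "linux": {"securityContext.windowsOptions"},
    "windows": {
        "spec.hostPID",
        "spec.hostIPC",
        "spec.hostUsers",
        "spec.securityContext.runAsUser",
        "spec.containers[*].securityContext.privileged",
    },
}


def os_violations(os_name: str, set_fields: set) -> list:
    """Return, sorted, the set fields that must be unset for the given OS."""
    return sorted(set_fields & RESTRICTED_WHEN.get(os_name, set()))


print(os_violations("windows", {"spec.hostPID", "spec.nodeName"}))  # ['spec.hostPID']
print(os_violations("linux", {"spec.hostPID"}))  # []
```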
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ReadinessGates
Bases: BaseModel
PodReadinessGate contains a reference to a pod condition.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
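In Kubernetes, a pod that declares readiness gates is reported Ready only when all of its containers are ready and, for every gate, the pod condition named by the gate's conditionType has status "True"; a missing condition counts as False. The sketch below restates that rule over plain dicts. conditionType, type, and status are Kubernetes field names; this reference does not list the fields of ReadinessGates, so their use here is an assumption.

```python
def pod_is_ready(containers_ready: bool, readiness_gates: list, conditions: list) -> bool:
    """Kubernetes readiness rule for pods that declare readiness gates (sketch)."""
    status_by_type = {cond["type"]: cond["status"] for cond in conditions}
    # A gate whose condition is absent from the pod status is treated as False.
    gates_pass = all(
        status_by_type.get(gate["conditionType"]) == "True" for gate in readiness_gates
    )
    return containers_ready and gates_pass


gates = [{"conditionType": "example.com/feature-1"}]
print(pod_is_ready(True, gates, []))  # False: the gate's condition is missing
print(pod_is_ready(True, gates, [{"type": "example.com/feature-1", "status": "True"}]))  # True
```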
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ResourceClaims
Bases: BaseModel
PodResourceClaim references exactly one ResourceClaim, either directly or by naming a ResourceClaimTemplate which is then turned into a ResourceClaim for the pod. It adds a name to it that uniquely identifies the ResourceClaim inside the Pod. Containers that need access to the ResourceClaim reference it with this name.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.SchedulingGates
Bases: BaseModel
PodSchedulingGate is associated with a Pod to guard its scheduling.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
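Kubernetes keeps a pod out of scheduling while its schedulingGates list is non-empty; the controllers that own the gates remove them, and the scheduler considers the pod only after the last gate is gone. The helpers below are a hypothetical sketch of that lifecycle over a plain pod-spec dict; the gate's name field follows the Kubernetes PodSchedulingGate schema and is assumed to carry over to this model.

```python
def remove_gate(pod_spec: dict, gate_name: str) -> dict:
    """Return a copy of pod_spec without the named scheduling gate."""
    remaining = [g for g in pod_spec.get("schedulingGates") or [] if g["name"] != gate_name]
    return {**pod_spec, "schedulingGates": remaining}


def is_schedulable(pod_spec: dict) -> bool:
    """A pod is eligible for scheduling only when no scheduling gates remain."""
    return not pod_spec.get("schedulingGates")


spec = {"schedulingGates": [{"name": "example.com/quota"}, {"name": "example.com/data-ready"}]}
spec = remove_gate(spec, "example.com/quota")
print(is_schedulable(spec))  # False: one gate remains
spec = remove_gate(spec, "example.com/data-ready")
print(is_schedulable(spec))  # True
```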
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.LabelSelector
Bases: BaseModel
LabelSelector is used to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- matchExpressions: List[MatchExpressions] | None
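The sketch below evaluates a selector against a pod's labels using the standard Kubernetes semantics, in which matchLabels and every entry of matchExpressions must all hold. It assumes this model mirrors the Kubernetes LabelSelector: matchLabels and the key, operator, and values fields of a match expression come from the Kubernetes API, not from this reference, which lists only matchExpressions.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True if labels satisfy a Kubernetes-style label selector (sketch)."""
    for key, value in (selector.get("matchLabels") or {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions") or []:
        key, op, values = expr["key"], expr["operator"], expr.get("values") or []
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True  # an empty selector matches every pod


selector = {
    "matchLabels": {"app": "trainer"},
    "matchExpressions": [
        {"key": "accelerator", "operator": "In", "values": ["gpu", "trainium"]},
        {"key": "spot", "operator": "DoesNotExist"},
    ],
}
print(selector_matches(selector, {"app": "trainer", "accelerator": "gpu"}))  # True
print(selector_matches(selector, {"app": "trainer", "accelerator": "cpu"}))  # False
```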
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.NamespaceSelector
Bases: BaseModel
A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. A null selector combined with a null or empty namespaces list means “this pod’s namespace”. An empty selector ({}) matches all namespaces.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- matchExpressions: List[MatchExpressions] | None
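The namespace rules above can be restated as a small function: a null selector with a null or empty namespaces list means only the pod's own namespace, an empty selector ({}) selects every namespace, and otherwise the result is the union of the listed namespaces and the namespaces the selector matches. This is a hypothetical sketch; to stay short it supports only matchLabels, a Kubernetes selector field that this reference does not list.

```python
def affinity_namespaces(pod_namespace, namespaces, namespace_selector, namespace_labels):
    """Resolve which namespaces an affinity term applies to (sketch).

    namespace_labels maps each namespace name to its labels.
    """
    if namespace_selector is None and not namespaces:
        return {pod_namespace}  # null selector + null/empty list: the pod's own namespace
    selected = set(namespaces or [])
    if namespace_selector is not None:
        wanted = namespace_selector.get("matchLabels") or {}
        # With no requirements (the empty selector {}), every namespace matches.
        selected |= {
            ns for ns, labels in namespace_labels.items()
            if all(labels.get(k) == v for k, v in wanted.items())
        }
    return selected


cluster = {"default": {}, "team-a": {"team": "a"}, "team-b": {"team": "b"}}
print(sorted(affinity_namespaces("default", None, None, cluster)))  # ['default']
print(sorted(affinity_namespaces("default", ["team-b"], {"matchLabels": {"team": "a"}}, cluster)))
# ['team-a', 'team-b']
```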
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.TopologySpreadConstraints
Bases: BaseModel
TopologySpreadConstraint specifies how to spread matching pods among the given topology.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- labelSelector: LabelSelector | None
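For a topology spread constraint, skew is the number of matching pods in a topology domain minus the global minimum across domains. Under whenUnsatisfiable: DoNotSchedule, a new pod is placed in a domain only if the skew after placement stays within maxSkew. The sketch below applies that arithmetic to per-domain pod counts; maxSkew and whenUnsatisfiable are Kubernetes fields not listed in this reference, and the sketch ignores refinements such as minDomains and node-affinity filtering.

```python
def allowed_domains(matching_pods: dict, max_skew: int) -> list:
    """Domains where one more matching pod keeps skew within max_skew (DoNotSchedule)."""
    global_min = min(matching_pods.values())
    return sorted(
        domain for domain, count in matching_pods.items()
        if (count + 1) - global_min <= max_skew
    )


# Two pods in each of zone-a and zone-b, none in zone-c.
print(allowed_domains({"zone-a": 2, "zone-b": 2, "zone-c": 0}, max_skew=1))  # ['zone-c']
print(allowed_domains({"zone-a": 1, "zone-b": 1}, max_skew=1))  # ['zone-a', 'zone-b']
```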
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.AwsElasticBlockStore
Bases: BaseModel
awsElasticBlockStore represents an AWS Disk resource that is attached to a kubelet’s host machine and then exposed to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes#awselasticblockstore
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.AzureDisk
Bases: BaseModel
azureDisk represents an Azure Data Disk mount on the host and bind mount to the pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.AzureFile
Bases: BaseModel
azureFile represents an Azure File Service mount on the host and bind mount to the pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Cephfs
Bases: BaseModel
cephFS represents a Ceph FS mount on the host that shares a pod’s lifetime.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Cinder
Bases: BaseModel
cinder represents a cinder volume attached and mounted on the kubelet’s host machine. More info: https://examples.k8s.io/mysql-cinder-pd/README.md
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Items
Bases: BaseModel
Maps a string key to a path within a volume.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ConfigMap
Bases: BaseModel
configMap represents a configMap that should populate this volume.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.NodePublishSecretRef
Bases: BaseModel
nodePublishSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodePublishVolume and NodeUnpublishVolume calls. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secret references are passed.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Csi
Bases: BaseModel
csi (Container Storage Interface) represents ephemeral storage that is handled by certain external CSI drivers (Beta feature).
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- nodePublishSecretRef: NodePublishSecretRef | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.DownwardApi
Bases: BaseModel
downwardAPI represents downward API information about the pod that should populate this volume.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.EmptyDir
Bases: BaseModel
emptyDir represents a temporary directory that shares a pod’s lifetime. More info: https://kubernetes.io/docs/concepts/storage/volumes#emptydir
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.DataSource
Bases: BaseModel
The dataSource field can be used to specify either:
  - an existing VolumeSnapshot object (snapshot.storage.k8s.io/VolumeSnapshot), or
  - an existing PVC (PersistentVolumeClaim).
If the provisioner or an external controller can support the specified data source, it will create a new volume based on the contents of the specified data source. When the AnyVolumeDataSource feature gate is enabled, dataSource contents will be copied to dataSourceRef, and dataSourceRef contents will be copied to dataSource when dataSourceRef.namespace is not specified. If the namespace is specified, then dataSourceRef will not be copied to dataSource.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.DataSourceRef
Bases: BaseModel
dataSourceRef specifies the object from which to populate the volume with data, if a non-empty volume is desired. This may be any object from a non-empty API group (non-core object) or a PersistentVolumeClaim object. When this field is specified, volume binding will only succeed if the type of the specified object matches some installed volume populator or dynamic provisioner. This field will replace the functionality of the dataSource field, so if both fields are non-empty, they must have the same value. For backwards compatibility, when namespace isn’t specified in dataSourceRef, both fields (dataSource and dataSourceRef) will be set to the same value automatically if one of them is empty and the other is non-empty. When namespace is specified in dataSourceRef, dataSource isn’t set to the same value and must be empty. There are three important differences between dataSource and dataSourceRef:
  - While dataSource only allows two specific types of objects, dataSourceRef allows any non-core object, as well as PersistentVolumeClaim objects.
  - While dataSource ignores disallowed values (dropping them), dataSourceRef preserves all values, and generates an error if a disallowed value is specified.
  - While dataSource only allows local objects, dataSourceRef allows objects in any namespace.
(Beta) Using this field requires the AnyVolumeDataSource feature gate to be enabled. (Alpha) Using the namespace field of dataSourceRef requires the CrossNamespaceVolumeDataSource feature gate to be enabled.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
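The dataSource/dataSourceRef defaulting rules described above can be summarized in a short function. This is a hypothetical sketch over plain dicts (reconcile_data_sources is not part of this SDK or of Kubernetes); it covers only the copy and consistency rules, not feature-gate checks or populator matching.

```python
def reconcile_data_sources(data_source, data_source_ref):
    """Apply the dataSource/dataSourceRef rules described above (sketch).

    Returns the (dataSource, dataSourceRef) pair after defaulting.
    """
    if data_source_ref and data_source_ref.get("namespace"):
        # Cross-namespace reference: dataSource is not copied and must stay empty.
        if data_source:
            raise ValueError("dataSource must be empty when dataSourceRef.namespace is set")
        return None, data_source_ref
    if data_source and data_source_ref:
        if data_source != data_source_ref:
            raise ValueError("dataSource and dataSourceRef must have the same value")
        return data_source, data_source_ref
    # At most one is set: copy it into the other field.
    value = data_source or data_source_ref
    return value, value


snapshot = {"apiGroup": "snapshot.storage.k8s.io", "kind": "VolumeSnapshot", "name": "ckpt-1"}
print(reconcile_data_sources(snapshot, None) == (snapshot, snapshot))  # True
```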
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Selector
Bases: BaseModel
selector is a label query over volumes to consider for binding.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- matchExpressions: List[MatchExpressions] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.VolumeClaimTemplateSpec
Bases: BaseModel
The specification for the PersistentVolumeClaim. The entire content is copied unchanged into the PVC that gets created from this template. The same fields as in a PersistentVolumeClaim are also valid here.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- dataSource: DataSource | None
- dataSourceRef: DataSourceRef | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.VolumeClaimTemplate
Bases: BaseModel
Used to create a stand-alone PVC to provision the volume. The pod in which this EphemeralVolumeSource is embedded will be the owner of the PVC, i.e. the PVC will be deleted together with the pod. The name of the PVC will be <pod name>-<volume name>, where <volume name> is the name from the PodSpec.Volumes array entry. Pod validation will reject the pod if the concatenated name is not valid for a PVC (for example, too long). An existing PVC with that name that is not owned by the pod will not be used for the pod, to avoid using an unrelated volume by mistake. Starting the pod is then blocked until the unrelated PVC is removed. If such a pre-created PVC is meant to be used by the pod, the PVC has to be updated with an owner reference to the pod once the pod exists. Normally this should not be necessary, but it may be useful when manually reconstructing a broken cluster. This field is read-only and no changes will be made by Kubernetes to the PVC after it has been created. Required, must not be nil.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- volumeClaimTemplateSpec: VolumeClaimTemplateSpec
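The PVC naming rule above (<pod name>-<volume name>, rejected if not valid for a PVC) can be checked before submitting a job. The sketch assumes the standard Kubernetes constraint that PVC names are DNS-1123 subdomains: at most 253 characters of lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric. ephemeral_pvc_name is a hypothetical helper, not an SDK function.

```python
import re

# DNS-1123 subdomain: dot-separated labels of [a-z0-9-], each starting and ending alphanumeric.
DNS1123_SUBDOMAIN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")
MAX_LENGTH = 253


def ephemeral_pvc_name(pod_name: str, volume_name: str) -> str:
    """Build the PVC name for a generic ephemeral volume and validate it (sketch)."""
    name = f"{pod_name}-{volume_name}"
    if len(name) > MAX_LENGTH or not DNS1123_SUBDOMAIN.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid PVC name")
    return name


print(ephemeral_pvc_name("my-job-worker-0", "scratch"))  # my-job-worker-0-scratch
```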
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Ephemeral
Bases: BaseModel
ephemeral represents a volume that is handled by a cluster storage driver. The volume’s lifecycle is tied to the pod that defines it: it will be created before the pod starts, and deleted when the pod is removed. Use this if: a) the volume is only needed while the pod runs, b) features of normal volumes like restoring from snapshot or capacity tracking are needed, c) the storage driver is specified through a storage class, and d) the storage driver supports dynamic volume provisioning through a PersistentVolumeClaim (see EphemeralVolumeSource for more information on the connection between this volume type and PersistentVolumeClaim). Use PersistentVolumeClaim or one of the vendor-specific APIs for volumes that persist for longer than the lifecycle of an individual pod. Use CSI for light-weight local ephemeral volumes if the CSI driver is meant to be used that way; see the documentation of the driver for more information. A pod can use both types of ephemeral volumes and persistent volumes at the same time.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- volumeClaimTemplate: VolumeClaimTemplate | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Fc
Bases: BaseModel
fc represents a Fibre Channel resource that is attached to a kubelet’s host machine and then exposed to the pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.FlexVolume
Bases: BaseModel
flexVolume represents a generic volume resource that is provisioned/attached using an exec-based plugin.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Flocker
Bases: BaseModel
flocker represents a Flocker volume attached to a kubelet’s host machine. This depends on the Flocker control service running.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.GcePersistentDisk
Bases: BaseModel
gcePersistentDisk represents a GCE Disk resource that is attached to a kubelet’s host machine and then exposed to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.GitRepo
Bases: BaseModel
gitRepo represents a git repository at a particular revision. DEPRECATED: GitRepo is deprecated. To provision a container with a git repo, mount an EmptyDir into an InitContainer that clones the repo using git, then mount the EmptyDir into the Pod’s container.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
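The replacement pattern recommended above (an emptyDir volume, an init container that clones the repository into it, and the same volume mounted in the main container) looks like this as a plain pod-spec dict. The field names are standard Kubernetes pod-spec fields; the image names, repository URL, and the pod_spec_with_cloned_repo helper are placeholders for illustration, and the alpine/git image is assumed to use git as its entrypoint.

```python
def pod_spec_with_cloned_repo(repo_url: str, image: str, mount_path: str = "/workspace/repo") -> dict:
    """Pod spec that clones repo_url into a shared emptyDir before the main container starts."""
    return {
        "volumes": [{"name": "repo", "emptyDir": {}}],
        "initContainers": [
            {
                "name": "git-clone",
                "image": "alpine/git",  # placeholder: any image whose entrypoint is git
                "args": ["clone", "--depth=1", repo_url, mount_path],
                "volumeMounts": [{"name": "repo", "mountPath": mount_path}],
            }
        ],
        "containers": [
            {
                "name": "trainer",
                "image": image,  # placeholder training image
                "volumeMounts": [{"name": "repo", "mountPath": mount_path}],
            }
        ],
    }


spec = pod_spec_with_cloned_repo("https://example.com/org/repo.git", "my-training-image:latest")
```

Because init containers run to completion before the main containers start, the repository is already in place when the trainer container launches.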
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Glusterfs
Bases: BaseModel
glusterfs represents a Glusterfs mount on the host that shares a pod’s lifetime. More info: https://examples.k8s.io/volumes/glusterfs/README.md
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.HostPath
Bases: BaseModel
hostPath represents a pre-existing file or directory on the host machine that is directly exposed to the container. This is generally used for system agents or other privileged things that are allowed to see the host machine. Most containers will NOT need this. More info: https://kubernetes.io/docs/concepts/storage/volumes#hostpath
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Image
Bases: BaseModel
image represents an OCI object (a container image or artifact) pulled and mounted on the kubelet’s host machine. The volume is resolved at pod startup depending on which PullPolicy value is provided:
  - Always: the kubelet always attempts to pull the reference. Container creation will fail if the pull fails.
  - Never: the kubelet never pulls the reference and only uses a local image or artifact. Container creation will fail if the reference isn’t present.
  - IfNotPresent: the kubelet pulls if the reference isn’t already present on disk. Container creation will fail if the reference isn’t present and the pull fails.
The volume gets re-resolved if the pod gets deleted and recreated, which means that new remote content will become available on pod recreation. A failure to resolve or pull the image during pod startup will block containers from starting and may add significant latency. Failures will be retried using normal volume backoff and will be reported on the pod reason and message. The types of objects that may be mounted by this volume are defined by the container runtime implementation on a host machine and at minimum must include all valid types supported by the container image field. The OCI object gets mounted in a single directory (spec.containers[*].volumeMounts.mountPath) by merging the manifest layers in the same way as for container images. The volume will be mounted read-only (ro) and non-executable files (noexec). Sub path mounts for containers are not supported (spec.containers[*].volumeMounts.subpath). The field spec.securityContext.fsGroupChangePolicy has no effect on this volume type.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
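The three PullPolicy cases above reduce to a small decision table. The function below is a hypothetical sketch of that table, not kubelet code: it answers whether container creation can proceed, given the policy, whether the reference is already on disk, and whether a pull would succeed.

```python
def image_volume_ready(pull_policy: str, present_locally: bool, pull_succeeds: bool) -> bool:
    """Return True if container creation can proceed for an image volume (sketch)."""
    if pull_policy == "Always":
        return pull_succeeds  # always pulls; a failed pull fails container creation
    if pull_policy == "Never":
        return present_locally  # never pulls; the reference must already be present
    if pull_policy == "IfNotPresent":
        return present_locally or pull_succeeds  # pulls only when missing
    raise ValueError(f"unknown pull policy: {pull_policy!r}")


print(image_volume_ready("IfNotPresent", present_locally=True, pull_succeeds=False))  # True
print(image_volume_ready("Always", present_locally=True, pull_succeeds=False))  # False
```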
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Iscsi
Bases: BaseModel
iscsi represents an ISCSI Disk resource that is attached to a kubelet’s host machine and then exposed to the pod. More info: https://examples.k8s.io/volumes/iscsi/README.md
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Nfs
Bases: BaseModel
nfs represents an NFS mount on the host that shares a pod’s lifetime. More info: https://kubernetes.io/docs/concepts/storage/volumes#nfs
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PersistentVolumeClaim
Bases: BaseModel
persistentVolumeClaimVolumeSource represents a reference to a PersistentVolumeClaim in the same namespace. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistentvolumeclaims
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PhotonPersistentDisk
Bases: BaseModel
photonPersistentDisk represents a PhotonController persistent disk attached and mounted on the kubelet’s host machine.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PortworxVolume
Bases: BaseModel
portworxVolume represents a Portworx volume attached and mounted on the kubelet’s host machine.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ClusterTrustBundle
Bases: BaseModel
ClusterTrustBundle allows a pod to access the spec.trustBundle field of ClusterTrustBundle objects in an auto-updating file. Alpha, gated by the ClusterTrustBundleProjection feature gate. ClusterTrustBundle objects can either be selected by name, or by the combination of signer name and a label selector. Kubelet performs aggressive normalization of the PEM contents written into the pod filesystem. Esoteric PEM features such as inter-block comments and block headers are stripped. Certificates are deduplicated. The ordering of certificates within the file is arbitrary, and Kubelet may change the order over time.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- labelSelector: LabelSelector | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Secret
Bases: BaseModel
secret contains information about the secret data to project.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ServiceAccountToken
Bases: BaseModel
serviceAccountToken is information about the serviceAccountToken data to project.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Sources
Bases: BaseModel
Projection that may be projected along with other supported volume types. Exactly one of these fields must be set.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- clusterTrustBundle: ClusterTrustBundle | None
- downwardAPI: DownwardApi | None
- serviceAccountToken: ServiceAccountToken | None
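Because exactly one field of a projection source must be set, a quick pre-check can catch manifests that set none or several. The sketch below assumes the Kubernetes VolumeProjection field names; configMap and secret are Kubernetes fields that this reference does not list under Sources, so treat them as assumptions. check_single_projection is a hypothetical helper.

```python
PROJECTION_FIELDS = ("clusterTrustBundle", "configMap", "downwardAPI", "secret", "serviceAccountToken")


def check_single_projection(source: dict) -> str:
    """Return the one projection field that is set; raise if zero or several are set."""
    set_fields = [field for field in PROJECTION_FIELDS if source.get(field) is not None]
    if len(set_fields) != 1:
        raise ValueError(f"exactly one projection must be set, found {set_fields}")
    return set_fields[0]


print(check_single_projection({"serviceAccountToken": {"path": "token"}}))  # serviceAccountToken
```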
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Projected
Bases: BaseModel
projected items for all-in-one resources: secrets, configmaps, and downward API.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Quobyte
Bases: BaseModel
quobyte represents a Quobyte mount on the host that shares a pod’s lifetime.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Rbd
Bases: BaseModel
rbd represents a Rados Block Device mount on the host that shares a pod’s lifetime. More info: https://examples.k8s.io/volumes/rbd/README.md
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ScaleIo
Bases: BaseModel
scaleIO represents a ScaleIO persistent volume attached and mounted on Kubernetes nodes.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Storageos
Bases: BaseModel
storageOS represents a StorageOS volume attached and mounted on Kubernetes nodes.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.VsphereVolume
Bases: BaseModel
vsphereVolume represents a vSphere volume attached and mounted on the kubelet’s host machine.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Volumes
Bases: BaseModel
Volume represents a named volume in a pod that may be accessed by any container in the pod.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- awsElasticBlockStore: AwsElasticBlockStore | None
- downwardAPI: DownwardApi | None
- flexVolume: FlexVolume | None
- gcePersistentDisk: GcePersistentDisk | None
- persistentVolumeClaim: PersistentVolumeClaim | None
- photonPersistentDisk: PhotonPersistentDisk | None
- portworxVolume: PortworxVolume | None
- vsphereVolume: VsphereVolume | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Spec
Bases: BaseModel
Specification of the desired behavior of the pod. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- containers: List[Containers]
- ephemeralContainers: List[EphemeralContainers] | None
- hostAliases: List[HostAliases] | None
- imagePullSecrets: List[ImagePullSecrets] | None
- initContainers: List[InitContainers] | None
- readinessGates: List[ReadinessGates] | None
- resourceClaims: List[ResourceClaims] | None
- schedulingGates: List[SchedulingGates] | None
- securityContext: SecurityContext | None
- tolerations: List[Tolerations] | None
- topologySpreadConstraints: List[TopologySpreadConstraints] | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Template
Bases: BaseModel
template is the Pod template. The only allowed fields in template.metadata are labels and annotations. If requests are omitted for a container or initContainer, they default to the limits if they are explicitly specified for the container or initContainer. During admission, the rules in nodeSelector and nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution that match the keys in the nodeLabels from the ResourceFlavors considered for this Workload are used to filter the ResourceFlavors that can be assigned to this podSet.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
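In Kubernetes, the request-defaulting rule in the Template description (omitted requests default to explicitly specified limits) applies per resource: any resource with a limit but no request gets a request equal to its limit. The sketch below applies that rule to a container's resources dict; default_requests is a hypothetical helper, and quantities are kept as opaque strings.

```python
def default_requests(resources: dict) -> dict:
    """Return a copy of resources where each limit without a request becomes its request."""
    limits = resources.get("limits") or {}
    requests = dict(resources.get("requests") or {})
    for resource_name, quantity in limits.items():
        requests.setdefault(resource_name, quantity)  # explicit requests are kept
    result = dict(resources)
    if requests:
        result["requests"] = requests
    return result


container_resources = {"limits": {"nvidia.com/gpu": "8", "memory": "64Gi"}, "requests": {"memory": "32Gi"}}
print(default_requests(container_resources)["requests"])
# {'memory': '32Gi', 'nvidia.com/gpu': '8'}
```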
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ReplicaSpec
Bases: BaseModel
ReplicaSpec is a description of the replica.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ElasticPolicy
Bases: BaseModel
ElasticPolicy defines the elastic training policy.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.LogMonitoringConfiguration
Bases: BaseModel
LogMonitoringRule defines the criteria used to detect a SLOW or HANGING job.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.RestartPolicy
Bases: BaseModel
Additional restart-limiting configuration.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.RunPolicy
Bases: BaseModel
RunPolicy
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- logMonitoringConfiguration: List[LogMonitoringConfiguration] | None
- restartPolicy: RestartPolicy | None
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.PodSets
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.Pods
Bases: BaseModel
DEPRECATED. pods was used to include job pod statuses, which are now reported in jobPods and associated with replicaSpecs. pods is retained here to support operator upgrades.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.ElasticScalingStatus
Bases: BaseModel
ElasticScalingStatus represents the current state of elastic scaling operations.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.RestartStatus
Bases: BaseModel
Additional restart-limiting status.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.FaultyPodInstanceList
Bases: BaseModel
FaultyPodInstanceRecord tracks faulty pods/instances for each restart.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- class sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config.HyperPodPytorchJobStatus
Bases: BaseModel
HyperPodPytorchJobStatus defines the observed state of HyperPodPytorchJob.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.
- conditions: List[Conditions] | None
- latestFaultyPodInstanceList: FaultyPodInstanceList | None
- managerPods: ManagerPods | None
- podManagerStatuses: List[PodManagerStatuses] | None
- podSetInfos: List[PodSetInfos] | None
- restartStatus: RestartStatus | None
- elasticScalingStatus: ElasticScalingStatus | None