Phase 107: AI Infrastructure and Distributed Training
This phase of the Insightful AI World encyclopedia covers AI Infrastructure and Distributed Training (topics 2121–2140). Each of the 20 concepts listed below is a planned future article.
2121 GPU Computing
2122 CUDA Basics
2123 TPU Systems
2124 AI Accelerators
2125 Distributed Training
2126 Data Parallelism
2127 Model Parallelism
2128 Tensor Parallelism
2129 Pipeline Parallelism
2130 ZeRO Optimization
2131 Gradient Accumulation
2132 Mixed Precision Training
2133 Activation Checkpointing
2134 Memory Optimization
2135 Checkpoint Management
2136 Cluster Scheduling
2137 Training Fault Tolerance
2138 Large-Scale Data Loading
2139 Compute Cost Estimation
2140 Training Infrastructure Design