🔹 1. vSAN Performance Tuning
Scenario 1: vSAN Resync Impact & Congestion Issues
🔹 Symptoms:
- High vSAN congestion impacting VM performance.
- Resync operations slowing down active workloads.
🔹 Tuning Steps:
1️⃣ Check vSAN Cluster Congestion Levels
```bash
esxcli vsan debug object list
```
- Look for congested components (a `Congestion` value above 60% indicates a problem).
2️⃣ Enable Adaptive Resync (If Not Already Enabled)
```bash
esxcfg-advcfg -s 1 /VSAN/ResyncTrafficThrottling
```
- Helps balance resync vs. VM workloads dynamically.
3️⃣ Manually Throttle Resync Traffic (If Needed)
```bash
esxcfg-advcfg -s <value> /VSAN/ResyncIopsLimit
```
- Set `<value>` based on workload requirements (default: 100 MB/s).
✅ Best Practices:
- Use Adaptive Resync instead of manual limits for most cases.
- Monitor resync impact via vSAN Performance Service.
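Beyond the vSAN Performance Service UI, resync progress can also be checked directly from any host in the cluster. A minimal sketch, assuming vSAN 6.7 or later where the `esxcli vsan debug resync` namespace is available:
```bash
# Summary of pending resync work (objects affected, bytes left to resync)
esxcli vsan debug resync summary get

# Per-object detail, useful for spotting a single large component dominating resync
esxcli vsan debug resync list
```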
Scenario 2: vSAN RAID Level Performance Optimization
🔹 Symptoms:
- High latency on RAID-5/6 vSAN policies.
- Excessive write amplification causing IO bottlenecks.
🔹 Tuning Steps:
1️⃣ Check RAID Level in Storage Policies
```bash
esxcli vsan policy getdefault
```
- Ensure RAID-1 (Mirroring) is used for latency-sensitive workloads.
- RAID-5/6 should only be used for space savings (not high performance).
2️⃣ Increase Stripe Width for RAID-5/6
- Stripe width ("Number of disk stripes per object") is set in the VM's vSAN storage policy in vCenter; raise it from the default of 1 to 4 or higher for better parallelism (see the sketch after the best practices below).
✅ Best Practices:
- Use RAID-1 for performance, RAID-5/6 for space efficiency.
- If using RAID-5/6, increase stripe width to optimize parallel I/O.
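As referenced above, a minimal sketch of changing stripe width from the ESXi shell. This only affects the host's default policy for objects created without an SPBM policy; per-VM stripe width is still edited in the VM Storage Policy in vCenter. The policy-expression syntax and the grep field names are assumptions to verify against your ESXi release:
```bash
# Raise the default stripe width for new virtual disks to 4
esxcli vsan policy setdefault -c vdisk \
  -p '(("hostFailuresToTolerate" i1) ("stripeWidth" i4))'

# Spot-check the policy actually applied to existing objects
esxcli vsan debug object list | grep -E "Object UUID|stripeWidth|hostFailuresToTolerate"
```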
Scenario 3: vSAN Deduplication & Compression Tuning
🔹 Symptoms:
- High CPU utilization due to vSAN deduplication.
- High write amplification slowing down NVMe-based vSAN.
🔹 Tuning Steps:
1️⃣ Check If Deduplication & Compression Are Enabled
```bash
esxcli vsan cluster get
```
2️⃣ Disable Deduplication If Using High-Performance NVMe Storage
- Deduplication & compression are a cluster-level vSAN service; disable them from vCenter under the cluster's Configure > vSAN > Services page (disabling triggers a rolling reformat of every disk group).
- Deduplication/compression are CPU-intensive; disable them for latency-sensitive workloads.
✅ Best Practices:
- Use deduplication/compression for capacity savings, not for performance-critical workloads.
- If using high-speed NVMe, disable deduplication for lower write latency.
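`esxcli vsan cluster get` reports cluster membership rather than space-efficiency settings, so a quick per-host check of whether disk groups are running with dedup/compression can look like the sketch below; the exact field names are an assumption and vary by vSAN release:
```bash
# List vSAN-claimed disks and their space-efficiency flags
esxcli vsan storage list | grep -E "Device:|Deduplication:|Compression:"
```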
🔹 2. NVMe Optimization
Scenario 4: NVMe Queue Depth & IOPS Tuning
🔹 Symptoms:
- NVMe storage experiencing high latency under load.
- ESXi logs show “Nvme Queue Depth Exceeded” errors.
🔹 Tuning Steps:
1️⃣ Check NVMe Queue Depth
```bash
esxcli storage core device list -d nvmeX | grep "Queue Depth"
```
- Default queue depth may be too low for high IOPS workloads.
2️⃣ Increase NVMe Queue Depth (If Needed)
```bash
esxcli system settings advanced set -o /Disk/NVMe/QueueDepth -i 128
```
- Increase value to 128 or 256 based on workload requirements.
3️⃣ Enable NVMe Polling to Reduce Latency
```bash
esxcli system settings advanced set -o /Disk/NVMe/Polling -i 1
```
- Reduces CPU interrupts and lowers NVMe read latency.
✅ Best Practices:
- Increase queue depth if handling high IOPS workloads.
- Enable polling mode for ultra-low latency NVMe tuning.
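To see what queue depth ESXi actually exposes per device before changing anything, a small loop over all storage devices works from the ESXi shell; `Device Max Queue Depth` is the relevant field, and the busybox `awk`/`grep` usage below is a sketch rather than a polished script:
```bash
# Print the max queue depth ESXi reports for every storage device on the host
for dev in $(esxcli storage core device list | awk '/^[^ ]/ {print $1}'); do
  printf '%s: ' "$dev"
  esxcli storage core device list -d "$dev" | grep "Device Max Queue Depth"
done
```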
Scenario 5: NVMe Interrupt Moderation Optimization
🔹 Symptoms:
- High CPU utilization under heavy NVMe I/O load.
🔹 Tuning Steps:
1️⃣ Check Interrupt Moderation Settings
```bash
esxcli system settings advanced list -o /Net/InterruptModeration
```
- If moderation is disabled or set too low, every I/O completion raises an interrupt, driving up CPU usage.
2️⃣ Enable Interrupt Moderation for High IOPS Environments
```bash
esxcli system settings advanced set -o /Net/InterruptModeration -i 1
```
- Reduces CPU load while maintaining high throughput.
✅ Best Practices:
- Use interrupt moderation to balance CPU usage vs. I/O performance.
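On many NICs, interrupt moderation is governed by the uplink driver module rather than a single global advanced option, so it is worth checking the driver's own parameters as well; `ixgben` below is only an example driver name:
```bash
# Identify which driver backs each uplink
esxcli network nic list

# Inspect that driver's tunables (interrupt-throttling / ITR-style knobs, if exposed)
esxcli system module parameters list -m ixgben
```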
🔹 3. Fibre Channel (FC) Performance Tuning
Scenario 6: FC Queue Depth & Buffer Credit Tuning
🔹 Symptoms:
- High latency on FC-connected datastores.
- Storage LUN queue depth issues slowing down performance.
🔹 Tuning Steps:
1️⃣ Check HBA Queue Depth
```bash
esxcli storage core device list | grep -E "^naa\.|Device Max Queue Depth"
```
- Look for `Device Max Queue Depth` values below 64 (too low).
2️⃣ Increase LUN Queue Depth (If Needed)
```bash
esxcli system settings advanced set -o /Disk/MaxQueueDepth -i 128
```
- Higher values help increase parallel I/O requests.
3️⃣ Check Buffer Credits on FC Switch (Cisco/Brocade)
- On Cisco MDS:
```
show fc buffers
```
- On Brocade:
```
portbuffershow
```
- Ensure adequate buffer credits for high-bandwidth storage links.
✅ Best Practices:
- Increase LUN queue depth if handling high IOPS workloads.
- Tune buffer credits to avoid FC congestion issues.
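On the host side, FC HBA queue depth is normally raised through the HBA driver's module parameter rather than a generic disk option; the module and parameter names below (QLogic and Emulex examples) are illustrative, and a host reboot is required for the change to take effect:
```bash
# Find which FC driver each vmhba uses
esxcli storage core adapter list

# QLogic example: raise the per-LUN queue depth to 128
esxcli system module parameters set -m qlnativefc -p "ql2xmaxqdepth=128"

# Emulex example (use whichever driver applies)
# esxcli system module parameters set -m lpfc -p "lpfc_lun_queue_depth=128"

# Verify the setting, then reboot the host
esxcli system module parameters list -m qlnativefc | grep ql2xmaxqdepth
```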
Scenario 7: NPIV & Zoning Optimization
🔹 Symptoms:
- LUN masking issues causing random FC disconnections.
- NPIV-enabled VMs unable to see LUNs.
🔹 Tuning Steps:
1️⃣ Verify NPIV Is Enabled in ESXi
```bash
esxcli storage san npiv status
```
2️⃣ Check If the NPIV WWPNs Are Logged Into the Fabric
```bash
esxcli storage san fc wwn list
```
3️⃣ Ensure Proper Zoning Configuration
- Single Initiator – Multiple Targets zoning for best performance.
- Avoid overlapping zones that cause LUN conflicts.
✅ Best Practices:
- Use NPIV for VM direct FC access, but ensure correct zoning.
- Avoid excessive zoning overlaps, which can cause FC pathing issues.
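For the WWPN and zoning checks above, the sketch below shows what is typically available: the esxcli call lists the physical adapters' WWNNs/WWPNs and link state (NPIV virtual WWPNs are assigned per VM in its settings), while fabric login of those WWPNs is confirmed on the switch; the switch command assumes Cisco MDS NX-OS:
```bash
# WWNN/WWPN and port state of each FC adapter on the host
esxcli storage san fc list

# On a Cisco MDS switch (run on the switch, not on ESXi):
# confirm which WWPNs have logged into the fabric
# show flogi database
```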