Skip to content

Performance#

Utilizing an iGPU for AI Acceleration

Key Points

  • iGPUs (built directly into many modern server CPUs) can provide accelerated AI inference for small to medium-sized AI models
  • An iGPU can deliver lower heat output, reduced power consumption, and solid throughput compared to CPU‑only execution or discrete GPUs
  • Using iGPU enables AI workloads to take advantage of onboard GPU power that would often be underutilized
More Information

An iGPU (integrated graphics processing unit) is built directly into the CPU rather than existing as a separate graphics card. Within VergeOS, an iGPU can be used for private AI acceleration, allowing you to leverage hardware already present in many modern servers. This approach reduces the processing burden on CPU cores while typically requiring less power and cooling than dedicated add‑on GPUs. Because iGPUs are optimized for parallel matrix and vector operations, they are generally more efficient than CPUs for inference workloads.

iGPUs share system memory with the CPU, which can be a bottleneck. However, for small to medium models this limitation is usually manageable. If your goal is maximum performance, discrete GPU hardware remains the preferred option.

For lightweight inference tasks (e.g., small language models), iGPUs are often more power‑efficient. For heavy training workloads, neither CPU nor iGPU is ideal.

On personal servers running modest AI models, iGPU inference offers a sweet spot: less heat, lower power draw, and decent throughput.

Benefits of Using an iGPU

  • Lower power consumption and reduced cooling requirements compared to CPU‑only execution
  • Leverages hardware already included in CPUs, avoiding additional purchase costs
  • Better throughput per watt than CPU cores
  • Offloads compute burden from CPU cores, improving overall system responsiveness

Modern servers often include capable iGPUs. Even older or less powerful servers can provide meaningful acceleration by utilizing this built‑in hardware.

Prerequisites

  • Server CPU must have an integrated GPU (iGPU)
  • Verify iGPU is enabled in BIOS settings
  • IOMMU/VT-d (Intel) or AMD-Vi (AMD) may need to be enabled for passthrough

High-Level Steps

To use iGPU for private AI models, the following high-level steps are necessary:

  • Create Resource Group: type = "Host GPU"
  • Add iGPU device(s) from physical nodes to the resource group
  • Assign the resource group to one or more AI models

Configuring a Resource Group with an iGPU

  1. Navigate to Infrastructure > Nodes and double‑click the node containing the iGPU.
  2. Select the PCI Devices card or left‑menu option.
  3. From the device list, set the Type filter (top of the Type column) to Display Controller to show only display devices.
  4. Select the iGPU device from the filtered list. ⚠️ Be careful when selecting devices for resource groups. Choosing the wrong device can cause issues.
  5. Click Make Resource in the left menu.
  6. Create a new Resource Group:
    If no groups exist, the entry form will appear automatically.
    Otherwise, select Attach to: --New Group-- and click Next
  7. Configure Resource Group fields:
    * Name: Provide a descriptive name, e.g. "iGPU"
    * Type: Select Host GPU
    * Max vRAM: Limit the amount of system RAM available to the iGPU (default = 0; no limit).
    • On systems running other workloads, set a max vRAM to prevent contention between workloads and iGPU usage.
    • If max vRAM is set too low, models may fail to load and produce errors.
  8. Click Submit to save the new Resource Group with the selected iGPU.
  9. Follow the prompts at the top of the dashboard to View the node, place the node into Maintenance Mode (see message at top of node dashboard), and Reload the driver (prompted at the top of the dashboard once the node is in maintenance mode).
  10. Exit Maintenance Mode on the node once the driver reload completes.
  11. Navigate to Infrastructure > Resources > Groups and verify your new resource group appears with Enabled checked. Double-click the group to confirm the Node Resources section lists your iGPU device.

Adding Additional Node iGPUs to the Same Resource Group

Pooling iGPUs Across Nodes

By adding iGPUs from multiple nodes into the same resource group, you create a shared pool of acceleration resources that AI models can draw from. For more details, see Resource Groups

To add more iGPUs:

  • Repeat steps 1–5 above.
  • When prompted for a resource group to Attach to, select the existing Host GPU resource group.
  • Follow dashboard prompts to place the node into maintenance mode and reload the driver.
  • Exit maintenance mode after the reload completes.

Assigning the iGPU Resource Group to a Model

Once configured, assign the Host GPU resource group to your AI model:

  1. Navigate to Private AI > Models.
  2. Select the AI model you want to accelerate.
  3. Click Edit in the left menu.
  4. In the GPU Resource Group Allocation field, select your iGPU resource group.
  5. Click Submit to save.

The model will now draw from any available iGPUs in the resource group. For more information on Private AI configuration, see Private AI Configuration.

Understanding VergeOS VM Memory Management

Overview

VergeOS takes a different approach to virtual machine memory management compared to platforms like VMware and Nutanix. Understanding how VergeOS handles memory allocation and reporting is essential for effective capacity planning, performance optimization, and troubleshooting. This article explains why VergeOS memory usage reporting differs from guest operating system reports and the advantages of this design choice.

What You'll Learn

  • Why VergeOS shows allocated memory rather than active memory usage
  • How VergeOS memory management differs from memory ballooning platforms
  • The performance and reliability benefits of VergeOS's approach
  • Best practices for monitoring memory across host and guest levels
  • How this design improves capacity planning and workload migration reliability

Key Concepts

Memory Allocation vs. Memory Usage

Memory Allocation: The amount of physical RAM reserved by the hypervisor for a virtual machine, regardless of how much the guest OS and applications are actively using.

Memory Usage: The amount of memory actually consumed by applications and the operating system within the virtual machine.

In VergeOS, when you assign 8GB of RAM to a VM, the hypervisor immediately reserves 8GB of physical memory on the host, even if the guest OS shows only 2GB in use.

Why VergeOS Shows Allocated Memory

When memory is allocated to a VM, the hypervisor must reserve that full amount in physical RAM regardless of what applications inside the VM are actually using. This is because the guest operating system could request access to any portion of its allocated memory at any time, and the hypervisor must guarantee that memory is available.

VergeOS displays this reserved/allocated memory because it represents the actual physical resources consumed on the host, providing a true picture of resource utilization for capacity planning and performance management.

How VergeOS Differs from Other Platforms

VergeOS Approach: No Memory Ballooning

VergeOS intentionally avoids memory ballooning techniques used by other virtualization platforms. Key characteristics include:

  • Allocated vs. Used: VergeOS shows what's allocated to each VM, which typically isn't the same as guest-level usage
  • Performance First: This eliminates ballooning overhead and complexity
  • Predictable Resource Allocation: What you see is exactly what's reserved on the physical host

Traditional Ballooning Approach

Many virtualization platforms use memory ballooning drivers that:

  • Dynamically report memory usage to the hypervisor
  • Allow memory "overcommitment" by sharing unused memory between VMs
  • Require special drivers (balloon drivers) within each guest OS
  • Create complexity in memory management and potential performance impacts

Benefits of VergeOS's Memory Management Design

1. Predictable Performance

By eliminating balloon driver overhead, VergeOS provides more predictable VM performance. There's no dynamic memory management that could impact application response times or cause unexpected memory pressure.

2. Simplified Capacity Planning

With clear allocation visibility, administrators can easily calculate: - Total memory committed across all VMs - Available memory capacity for new workloads - Resource utilization without complex ballooning calculations

3. Enhanced Reliability

VergeOS avoids dynamic memory management issues that can occur with ballooning, such as: - Memory reclamation delays - Guest OS memory pressure during balloon inflation - Potential application instability during memory operations

4. Guaranteed Migration Success

Critical for High Availability: VergeOS's approach ensures predictable workload migration during node failures. Since the full allocated memory is always reserved, the system can guarantee that all VMs can be migrated to available nodes without memory overcommitment surprises.

If VergeOS used memory ballooning, it could not ensure reliable migration of all workloads to another node during a failure scenario, as the actual memory requirements might exceed the target node's capacity when balloons are deflated.

Migration Reliability

Memory ballooning can create unpredictable migration scenarios. When VMs that appeared to use less memory suddenly require their full allocation during migration, target nodes may lack sufficient resources, potentially causing migration failures during critical moments.

Memory Monitoring Best Practices

Host-Level Monitoring (VergeOS Dashboard)

Use VergeOS dashboards to monitor: - Total allocated memory across all VMs on each node - Available physical memory for new VM deployments
- Memory utilization trends for capacity planning - Node memory status during maintenance and migration operations

Guest-Level Monitoring

Within each VM, use appropriate tools to monitor: - Application memory consumption for performance tuning - Operating system memory usage for optimization - Memory leaks or excessive usage by specific processes - Guest-level performance metrics for troubleshooting

Combined Monitoring Strategy

For comprehensive memory management:

  1. Capacity Planning: Use VergeOS allocation data to plan hardware expansion
  2. Performance Optimization: Use guest-level data to tune applications
  3. Troubleshooting: Compare host allocation with guest usage to identify issues
  4. Resource Optimization: Right-size VMs based on actual guest usage patterns

Practical Example

Consider this scenario:

  • VM Allocated Memory: 8GB (shown in VergeOS)
  • Windows Task Manager: Shows 3GB used
  • Physical Host: Has 8GB reserved for this VM

This is normal and expected behavior. The VergeOS dashboard correctly shows that 8GB of physical memory is committed to this VM, while the guest OS shows its internal usage of that allocated memory.

Troubleshooting Memory Issues

When VergeOS Shows High Memory Usage

If VergeOS shows high memory utilization:

  1. Review VM allocations: Check if VMs are over-allocated for their actual needs
  2. Plan capacity expansion: High allocation percentages indicate need for more physical RAM
  3. Optimize VM sizing: Consider reducing allocations for underutilized VMs

When Guest OS Shows Memory Pressure

If applications report memory issues while VergeOS shows available allocation:

  1. Check guest OS configuration: Verify VM has adequate allocated memory
  2. Review application requirements: Ensure sufficient memory is allocated
  3. Monitor memory leaks: Look for applications consuming excessive memory over time

Memory Performance Issues

For memory-related performance problems:

  1. Verify adequate allocation: Ensure VMs have sufficient memory allocated
  2. Check host memory pressure: Avoid overcommitting total physical RAM
  3. Review storage impact: Memory pressure can cause increased swap activity

Best Practices for Memory Management

Right-Sizing Virtual Machines

  • Start with manufacturer-recommended memory allocations
  • Monitor guest-level usage over time to identify optimization opportunities
  • Avoid significant over-allocation that wastes physical resources
  • Leave adequate buffer for memory spikes and growth

Capacity Planning

  • Plan physical memory capacity based on total VM allocations, not guest usage
  • Account for hypervisor overhead and system memory requirements
  • Maintain 10-15% free capacity for maintenance and unexpected demand
  • Consider future growth when sizing new nodes

Performance Optimization

  • Allocate sufficient memory to avoid guest OS memory pressure
  • Use memory monitoring tools within VMs to identify optimization opportunities
  • Consider workload patterns when planning memory allocation
  • Test application performance with different memory allocations

Next Steps

To deepen your understanding of VergeOS memory management:

Additional Resources

For specific questions about memory allocation in your environment, consult the VergeOS support team at support@verge.io or review the performance monitoring sections in the product documentation.


Document Information

  • Last Updated: 2024-08-15
  • VergeOS Version: 4.12.6+