Skip to content

Nodes Overview#

A node in VergeOS represents a physical or virtual server that contributes compute, storage, and networking resources to your environment. The Nodes dashboard provides comprehensive monitoring and management capabilities for each node in your system.

Key Components#

Node Status Information#

  • Status: Shows the current operational state (Running/Offline)
  • Running: Indicates the node is actively operational
  • Maintenance Mode: Indicates if the node is in maintenance state
  • Last Powered On: Timestamp of when the node was last started
  • Local Time: Current time on the node
  • IPMI Status: Displays the status of the Intelligent Platform Management Interface
  • IPMI Network Address: Network address for remote management access
  • System Version: Shows current VergeOS version (OS Version, vSAN Version, Appserver Version, Kernel Version)

Hardware Configuration#

  • CPU: Processor model and generation
  • CPU Cores: Number of physical processor cores
  • RAM: Total physical memory capacity
  • Failover RAM: Memory reserved for failover scenarios
  • Overcommit RAM: Memory available for oversubscription
  • Cluster: The cluster this node belongs to
  • PXE Network: Network used for PXE boot operations
  • Model: Hardware platform information
  • Series: Product series designation

Asset Tags and Features#

  • Physical: Indicates physical hardware asset
  • Capture System Logs: System logging status
  • LLDP: Link Layer Discovery Protocol status
  • iOMMU (VT-d) Support: Hardware virtualization support
  • Needs Reboot: Indicates if a reboot is required
  • Needs Driver Reload: Indicates if driver updates require reload

Resource Metrics#

The dashboard provides real-time and historical monitoring through:

  • CPU Usage Graph: Displays current, average, and maximum CPU utilization with historical trend visualization
    • Total CPU
    • Core peak
    • Usage (User, System, IO Wait, VM Usage, IRQ)

Node Statistics#

The dashboard provides key metrics organized by category:

  • Physical RAM: Current memory utilization percentage and capacity
  • Virtual RAM: Allocated virtual memory (typically 0% when not overcommitted)
  • Temperature: Current node temperature with visual indicator (green/yellow/red based on thresholds)
  • Running Machines: Count of active workloads on the node
  • Cores Usage: Percentage of CPU cores in use
  • RAM Usage: Memory consumption across running machines

Hardware Resources#

Drives#

Displays all physical drives attached to the node:

  • Status: Online/Offline indicator
  • Name: Drive identifier (e.g., nvme0n1, nvme1n1)
  • Model: Manufacturer and model number
  • Tier: vSAN storage tier assignment
  • vSAN Drive ID: Unique identifier within vSAN
  • Firmware: Drive firmware version
  • Bus: Hardware bus connection
  • Usage: Capacity utilization with visual indicator
  • Repairing: Drive rebuild status
  • Read/Write Errors: Error counters for drive health monitoring

Network Interface Cards (NICs)#

Comprehensive NIC information and status:

  • Status: Operational state (Up/Down)
  • Fabric Status: Core fabric connectivity status (Confirmed/Not Confirmed) with visual indicator
    • The globe icon indicates NICs connected to the core fabric
    • Confirmed status means the NIC is properly integrated into the network fabric
  • Name: Interface identifier (e.g., enp2s0f0, enp8s0)
  • Model: Network adapter model
  • Vendor: Network adapter manufacturer
  • Driver: Network driver in use
  • Port: Physical port number
  • Speed: Link speed (e.g., 10000Mb/s, 2500Mb/s)
  • MAC: Hardware MAC address
  • Network: Associated VergeOS network
  • RX/TX: Received and transmitted data volume
  • RX/TX Rate: Current transfer rates
  • Actions: Quick access icons for NIC operations (console, graphs, configuration)

Memory Modules#

Displays installed RAM configuration:

  • Module count and capacity
  • Memory type and specifications

LLDP Neighbors#

Shows Link Layer Discovery Protocol information for connected network devices:

  • Connected switch information
  • Port mappings
  • Network topology visualization

PCI Devices#

Lists all PCI/PCIe devices:

  • Device types and models
  • Bus assignments
  • Passthrough availability

SR-IOV NIC Devices#

Virtual function information for Single Root I/O Virtualization capable NICs:

  • Virtual function count
  • Assignment status

NVIDIA vGPU DEVICES#

GPU resources available for virtual GPU assignments (if applicable):

  • GPU model and capacity
  • vGPU profiles available
  • Allocation status

USB Devices#

Connected USB devices:

  • Device identification
  • Passthrough capability

Resources#

Resource allocation and limits:

  • Core assignments
  • Memory allocations
  • Storage assignments

Tasks#

Active and scheduled tasks:

  • Running operations
  • Queued jobs
  • Task history

Running Machines#

Displays active workloads with detailed resource consumption:

  • Status: Running state indicator
  • Type: Workload type (Virtual Machine, vNet Container)
  • Name: Machine identifier
  • CPU Cores: Number of assigned cores
  • CPU Usage: Current processor utilization percentage
  • RAM: Allocated memory with utilization percentage and visual indicator
  • Last Started: Timestamp of when the machine was started

Common machine types include:

  • Virtual Machines (VMs)
  • vNet Containers (network services)
  • System services (NAS, DMZ, External, etc.)

Key Features#

Management Operations#

Available from the left-side menu:

  • Power Control:
    • Power On/Off
    • Reboot
    • Kill Power (force shutdown)
  • Maintenance Operations:
    • Enable/Disable Maintenance mode
    • Scale Up (add resources to cluster)
  • Remote Console: Direct access to node console for troubleshooting
  • Configuration:
    • Edit node settings
    • Assign Tags for organization
    • Delete node (when appropriate)
    • Add notes for documentation
  • Refresh: Update dashboard data
  • IPMI Management: Remote management interface access

Monitoring#

  • Real-time Resource Utilization: Live graphs for CPU, memory, and temperature
  • Historical Performance Data: Trend analysis through the History link
  • Event Logs: Comprehensive logging of node events with:
    • Level: Warning/Info/Error classification
    • Time: Timestamp of each event
    • Source: Originating component (e.g., node1)
    • Message: Detailed event description (e.g., temperature warnings)
    • View More option for extended log history
  • Hardware Health Status: Drive errors, NIC status, and component monitoring

Logs and Diagnostics#

The Logs section provides critical system events:

  • Temperature Warnings: Core temperature threshold alerts
  • Hardware Events: Drive status changes, NIC events
  • System Events: Power state changes, maintenance mode transitions
  • Performance Alerts: Resource utilization warnings

Common log entries include core temperature warnings with specific threshold values (e.g., "Core has reached warning temperature '96 / 95").

Best Practices#

  • Monitor Node Health: Regularly review node health metrics, especially temperature and drive status
  • Maintenance Mode: Always enable maintenance mode before performing system updates or hardware changes
  • Review Logs: Periodically check logs for potential issues, particularly temperature warnings and drive errors
  • Load Balancing: Maintain balanced workload distribution across nodes in your cluster
  • Stay Updated: Keep firmware and system software up to date through the System menu
  • Remote Console Access: Use the remote console feature for direct troubleshooting when physical console or SSH access is needed
  • Fabric Status Monitoring: Monitor the fabric status of core NICs to ensure proper network integration and redundancy
  • Drive Health: Monitor drive error counters and repairing status to proactively address storage issues
  • Temperature Management: Address repeated temperature warnings by checking cooling systems and airflow