vSAN

vSAN Diagnostics Guide

Overview

This guide provides comprehensive information about the vSAN diagnostic options available in the user interface. These diagnostic tools enable system administrators to monitor, troubleshoot, and maintain vSAN deployments effectively.

Critical Warning

The diagnostic commands detailed in this guide are powerful administrative tools. Improper usage can result in:

  • System outages
  • Service interruptions
  • Potential data loss

Exercise extreme caution and ensure proper understanding before execution.

Prerequisites

To use these diagnostic tools, you must have:

  • Root-level access to your VergeIO cluster
  • Note: Tenants do not have a vSAN.

Accessing vSAN Diagnostics

  1. Navigate to vSAN Diagnostics using either method:
    • From the home screen: Select the vSAN Tiers count box → vSAN Diagnostics (left menu)
    • Alternative path: Home screen → System (left menu) → vSAN Diagnostics
  2. Command execution:
    • Select desired command from the dropdown menu
    • Configure available options if applicable
    • Click SEND → to execute

Command Visibility

Enable the "Show Command" option to view the exact command being executed. This can be valuable for:

  • SSH execution
  • BASH script integration
  • Advanced command automation

Diagnostic Commands

Add Drive to vSAN

This command manually adds a drive through the UI. Drives are normally added either during installation or from the Nodes > Nodes Drives page. However, drives added from that page cannot be assigned to Tier 0.

Prerequisites:

  • Drive must be physically present in the system
  • Drive must be visible from Nodes > Nodes Drives page

Usage Parameters:

  • Select Add Drive to vSAN from the dropdown menu.
  • From the right menu, select the Node to which the drive will be added.
  • Enter the device path, e.g. /dev/nvme0n1. You can use the "Click here to view devices" option to find the path.
  • Warning: The contents of this drive will be overwritten.
  • Select the Tier you want to assign the drive to.
  • Check the Swap box if you want swap enabled on this drive. This uses the cluster settings for the swap size.
  • Verify: type Yes I know what I'm doing in the Verify box.
  • Select SEND →

CLI Syntax:

vcmd newdevice --path=PATH [OPTIONS]
  --path=PATH    Path to target device
  --tier=NUM     Tier number assignment
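
When this command is scripted rather than sent from the UI, it can help to assemble and validate the arguments in one place before anything runs. The sketch below is a hypothetical Python helper (`build_newdevice_command` is not part of VergeOS) that uses only the `--path` and `--tier` options listed above. It returns the argument list without executing it, and the `/dev/` prefix check is an assumption based on the example path.

```python
def build_newdevice_command(path, tier=None):
    """Build the argument list for `vcmd newdevice` (hypothetical helper).

    Only the --path and --tier options shown in the CLI syntax are used.
    The command is returned, never executed, because adding a drive
    overwrites its contents.
    """
    # Assumption: device paths live under /dev/, as in the /dev/nvme0n1 example.
    if not path.startswith("/dev/"):
        raise ValueError(f"unexpected device path: {path!r}")

    args = ["vcmd", "newdevice", f"--path={path}"]
    if tier is not None:
        if not isinstance(tier, int) or tier < 0:
            raise ValueError(f"tier must be a non-negative integer, got {tier!r}")
        args.append(f"--tier={tier}")
    return args


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(" ".join(build_newdevice_command("/dev/nvme0n1", tier=0)))
```

Printing the command for review mirrors the "Show Command" option in the UI; whether to run it afterwards remains a manual decision.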

Cancel Integrity Check

Terminates any active integrity check operations. See Integrity Check for additional information.

CLI Syntax:

vcmd cancelintegcheck

Clear Reference Counts

Reference counts are how the vSAN tracks the number of times a file is referenced in the vSAN. Clearing this count will force a full vSAN walk and a refresh of the Reference Counts.

Function:

  • Clears existing reference counts
  • Initiates full vSAN traversal
  • Rebuilds reference count data

Support Authorization Required

Execute only under direct support guidance.

Usage Parameters:

  • Verify: type Yes I know what I'm doing in the Verify box.
  • Select SEND →

CLI Syntax:

vcmd clearrefcounts

Find Inode

This query shows what an inode (index node) references. An inode is a data structure that stores information about a file or directory, such as its owner, access rights, creation and modification times, size, and location on the vSAN. Each file or directory has its own unique inode number, which can be used to perform various operations on that file or directory. This query can help troubleshoot errors in the vSAN.

Purpose:

  • Retrieves inode reference information
  • Maps inode numbers to filesystem entities
  • Assists in vSAN troubleshooting

CLI Syntax:

find /vsan -inum inode_number_here -printf '/%P\n'
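
For scripted use, the same lookup can be sketched in Python. This is an illustrative equivalent of the `find` command above, not a VergeOS tool: `find_paths_by_inode` is a hypothetical name, and the function simply walks a directory tree and compares each entry's inode number, returning paths relative to the starting directory with a leading slash.

```python
import os


def find_paths_by_inode(root, inode):
    """Return paths under `root` whose inode number equals `inode`.

    Paths are reported relative to `root` with a leading '/', similar to
    the '/%P' format used by the find command. Symbolic links are not
    followed; the link itself is examined.
    """
    matches = []
    # The starting directory itself is also a candidate.
    if os.lstat(root).st_ino == inode:
        matches.append("/")
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            try:
                if os.lstat(full_path).st_ino == inode:
                    matches.append("/" + os.path.relpath(full_path, root))
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
    return matches


if __name__ == "__main__":
    # Look up this script's own inode in its own directory.
    here = os.path.dirname(os.path.abspath(__file__))
    print(find_paths_by_inode(here, os.stat(__file__).st_ino))
```

A full walk of a large tree can take a long time, so this is best reserved for one-off troubleshooting.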

Get Cache Info

Retrieves detailed cache information for specified nodes.

Output Information:

  • Total cache capacity
  • Available cache space
  • Cache page statistics
  • Performance metrics

CLI Syntax:

vcmd getcacheinfo

Get Clients

Retrieves client connection information for specified nodes.

Output Information:

  • Connected node information
  • IP address mappings
  • Worker thread statistics

CLI Syntax:

vcmd getclients

Get Cluster Rates

Retrieves cluster-wide performance metrics.

Output Information:

  • Read/write rates
  • Throttle status
  • Performance statistics

CLI Syntax:

vcmd getclusterrates

Get Cluster Usage

Provides cluster-wide storage utilization information.

Output Information:

  • Maximum storage capacity
  • Current utilization
  • Repair operation counts

CLI Syntax:

vcmd getclusterusage

Get Current Master

Retrieves master node information from each cluster member.

Output Information:

  • Master node identification
  • Online status
  • Transaction logging information

CLI Syntax:

vcmd getcurmaster

Get Device Integrity

Retrieves integrity check results for specified nodes.

CLI Syntax:

vcmd getdeviceinteg

Get Device List

Provides comprehensive device inventory.

Output Information:

  • Device identifiers
  • System paths
  • Tier assignments

CLI Syntax:

vcmd getdevicelist

Get Device Status

Retrieves detailed device status information.

Output Information:

  • Device paths
  • Operational status
  • Capacity metrics
  • Performance statistics

CLI Syntax:

vcmd getdevicestatus

Get Device Usage

Provides device utilization metrics.

Output Information:

  • Total capacity
  • Current utilization
  • Usage trends

CLI Syntax:

vcmd getdeviceusage

Get File Status

Retrieves detailed file metadata.

Output Information:

  • Inode information
  • File type
  • Tier assignment
  • Hash key data

CLI Syntax:

vcmd stat /path/to/file.raw

Get Fuse Info

Retrieves FUSE (Filesystem in Userspace) statistics.

Output Information:

  • Mount point information
  • Thread statistics
  • Throttling metrics

CLI Syntax:

vcmd getfuseinfo

Get Integrity Check Status

Retrieves results from the most recent integrity check.

Output Information:

  • Check status
  • Path information
  • Temporal data
  • Verification results

CLI Syntax:

vcmd getintegcheckstatus

Get Journal Status

Retrieves journal system status information.

Output Information:

  • Operational status
  • Redundancy status
  • System metadata

CLI Syntax:

vcmd getjournalstatus

Get Node Device List

Retrieves detailed hardware information for storage devices.

Output Information:

  • Driver information
  • Model specifications
  • Firmware versions
  • Physical attributes

CLI Syntax:

vcmd getnodedevicelist

Get Node Info

Retrieves comprehensive node configuration data.

Output Information:

  • Node identification
  • Cluster configuration
  • System parameters
  • Operational status

CLI Syntax:

vcmd getnodeinfo

Get Node List

Provides cluster-wide node inventory.

Output Information:

  • Node identification
  • Online status
  • Version information
  • Tier utilization

CLI Syntax:

vcmd getnodelist

Get Path from Inode

Resolves filesystem paths from inode numbers.

CLI Syntax:

vcmd getpathfromino $1

Get Read Ahead

Retrieves read-ahead buffer statistics.

Output Information:

  • Queue statistics
  • Thread utilization
  • System status

CLI Syntax:

vcmd getreadahead

Get Repair Status

Monitors ongoing repair operations.

Output Information:

  • Device repair status
  • Operation progress
  • System health

CLI Syntax:

vcmd getrepairstatus

Get Running Configuration

Retrieves active system configuration.

Output Information:

  • Worker thread allocation
  • System throttles
  • Operational parameters

CLI Syntax:

vcmd getrunningconf

Get Sync List

Monitors synchronization operations.

Output Information:

  • Operation frequency
  • Start times
  • File processing status

CLI Syntax:

vcmd getsynclist

Get Tier Device Maps

Retrieves tier-to-device mapping information.

Output Information:

  • Physical device mappings
  • Tier assignments
  • System configuration

CLI Syntax:

vcmd gettierdevicemaps

Get Tier Node Maps

Retrieves tier-to-node mapping information.

Technical Details:

  • Base-0 indexing (0=Node1, 1=Node2, etc.)
  • 65536 buckets per tier map
  • Primary (tier_x.0) and redundant (tier_x.1) mappings
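
The base-0 indexing is easy to misread when comparing map entries with node names in the UI. The sketch below illustrates the translation in Python. The exact output format of the command is not shown in this guide, so the `tier_map` input (one base-0 node index per bucket) is an assumption, and both function names are hypothetical.

```python
from collections import Counter

# From the technical details above: each tier map has 65536 buckets.
BUCKETS_PER_TIER_MAP = 65536


def node_label(index):
    """Translate a base-0 map index into a node name (0 -> Node1)."""
    if index < 0:
        raise ValueError("map index must be non-negative")
    return f"Node{index + 1}"


def buckets_per_node(tier_map):
    """Count how many buckets of one tier map point at each node.

    Assumption: `tier_map` is a sequence of base-0 node indexes,
    one entry per bucket.
    """
    return dict(Counter(node_label(i) for i in tier_map))


if __name__ == "__main__":
    # Synthetic example: buckets alternate between two nodes.
    example_map = [0, 1] * (BUCKETS_PER_TIER_MAP // 2)
    print(buckets_per_node(example_map))
```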

CLI Syntax:

vcmd gettiernodemaps

Get Tier Status

Retrieves comprehensive tier health information.

Output Information:

  • Redundancy status
  • Walk statistics
  • Transaction data
  • Health metrics

CLI Syntax:

vcmd gettierstatus

Get Top Usage Rates

Monitors real-time I/O statistics.

Real-time Data

Multiple executions may be necessary for trend analysis.

CLI Syntax:

vcmd getfhlist | grep -Eo '(ino|rrate|wrate)\b.*'

Get Volume Usage

Retrieves detailed volume utilization statistics.

Parameters:

  • Path specification (optional)
  • Recursive flag
  • Human-readable output
  • Preferred tier display

CLI Syntax:

vcmd getvolusage --path=/ --recursive=1 --human=1

Integrity Check

Initiates system integrity verification.

Parameters:

  • Path specification (required)
  • Recursive operation
  • Fix mode (destructive)
  • Meta-tier only option

Data Loss Risk

Fix mode zeros bad blocks. THIS IS DESTRUCTIVE. Use only under support guidance.

CLI Syntax:

vcmd integcheck /vol

Integrity Check Device

Performs device-level integrity verification.

Parameters:

  • Node selection
  • Device ID (-1 for all devices)

CLI Syntax:

vcmd integcheckdevice --id=x

Summarize Disk Usage

Generates storage utilization summaries.

Parameters:

  • Path specification
  • Recursive operation
  • Preferred tier display
  • Deduplication analysis
  • Fast deduplication option

CLI Syntax:

vcmd du /vol

Document Information

  • Last Updated: 2024-12-27
  • VergeOS Version: 4.13.2

Adding Tier 0 to an Existing System

Overview

Key Points

  • Tier 0 is normally configured during initial installation
  • This procedure is for special cases requiring post-installation configuration
  • Requires careful attention to device paths and hardware compatibility

This guide outlines the process for adding Tier 0 storage to an existing VergeOS system. While Tier 0 is typically configured during installation, these steps provide a method for adding it to production systems that cannot be reinstalled.

Critical Warning

  • This procedure should only be performed by qualified VergeOS engineers or under direct support guidance
  • Selected devices will be formatted and all existing data will be destroyed
  • Incorrect device path selection can seriously damage your system

Prerequisites

Before beginning this procedure, ensure:

  • Storage devices are physically installed in the system
  • Tier 0 devices are consistent across controller nodes
  • Hardware meets specifications from the Node Sizing Guide

Steps

1. Identify Device Paths

  1. Navigate to System > vSAN Diagnostics from the Main Dashboard
  2. Select Get Node Device List from the Query dropdown
  3. Click Send
  4. Identify unused devices (marked as "vsan = false")
  5. Note the device paths (/dev/sd*) for each controller node

Tip

Verify current vSAN drive assignments by checking vSAN Tiers > [select tier] > Drives to avoid selecting drives already in use.

2. Add Drives to Tier 0

For each drive:

  1. In vSAN Diagnostics:
    • Set Query to Add Drive to vSAN
    • Select the appropriate Node (node0 or node1)
    • Enter the correct Path for the device
    • Set Tier to Tier 0
    • Configure Swap setting
  2. Enter the verification phrase: Yes I know what I'm doing
  3. Click Send to execute

Swap Configuration

  • Enable swap on only ONE storage tier
  • If swap is enabled on another tier, disable it for Tier 0
  • Contact VergeOS Support for guidance on swap configuration if needed

3. Verify Configuration

  1. Monitor the system dashboard for tier status:
    • Status will show "online-no redundancy" during meta migration
  2. Refresh node information:
    • Navigate to each controller node's dashboard
    • Select Refresh > Drives & NICs

Post-Configuration

Monitor the vSAN tier status in the system dashboard. The tier should transition from "online-no redundancy" to "online" once meta migration completes.

Document Information

  • Last Updated: 2024-11-25
  • VergeOS Version: 4.13

Scaling Up a vSAN

To scale up a vSAN, follow the steps below. However, before proceeding, ensure that your current vSAN has at least 10% free capacity.

Important

  • All drives in a tier must be alike. If a drive of an incorrect size is added to an existing tier, the tier will only be able to use the space of the smallest drive.
  • Ensure that your vSAN has at least 10% free capacity unless you are doubling the capacity. If the free space is less than 10% and you are not doubling the drive count, consider scaling out by adding a node.
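
The two rules above can be expressed as a quick pre-check. The following Python sketch is illustrative only: the function names are hypothetical, "uses the space of the smallest drive" is interpreted as every drive contributing only the smallest drive's size, and "doubling the capacity" is approximated by adding at least as many drives as the tier already has.

```python
def usable_tier_capacity(drive_sizes):
    """Estimate usable raw capacity of a tier with mixed drive sizes.

    Interpretation (assumption): every drive contributes only as much
    space as the smallest drive in the tier.
    """
    if not drive_sizes:
        return 0
    return min(drive_sizes) * len(drive_sizes)


def scale_up_is_advisable(used, total, current_drives, added_drives):
    """Apply the 10% free-capacity rule, with the doubling exception."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    free_fraction = (total - used) / total
    doubling = added_drives >= current_drives
    return free_fraction >= 0.10 or doubling


if __name__ == "__main__":
    # Three drives of 4, 4 and 2 TB: only 2 TB of each is usable.
    print(usable_tier_capacity([4, 4, 2]))
    # 5% free, adding 2 drives to 4: consider scaling out instead.
    print(scale_up_is_advisable(used=95, total=100, current_drives=4, added_drives=2))
```

When the check returns False, the guidance above suggests scaling out by adding a node instead.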

Steps to Scale Up

  1. Physically add the drives or Fibre Channel LUNs on the node you want to scale up.

  2. Log in to the host system's UI and, in the top compute cluster section on the home page, select the cluster you want to scale up.

  3. Select the node that you are scaling up.

  4. Refresh the system to recognize the new drives:
    • Select Refresh from the left menu, and choose Drives & NICs from the dropdown.
    • Confirm by selecting Yes.

  5. Select the Scale Up option on the left menu.

  6. The page will now show the newly inserted drives in an offline state. Select the drive(s), then under Node Drives, select the Scale Up function.

  7. Select the appropriate tier for the drive(s) and submit.

Upon completion, the screen will refresh and the drives will disappear from the view. Go back to the main page, where the vSAN tiers will have turned yellow, indicating that they are in a repair state. This is expected. The vSAN will return to a green/healthy state after a few minutes, showing the newly added tier or the increased space on an existing tier.

Repeat these steps for each node as necessary.


Document Information

  • Last Updated: 2024-08-29
  • VergeOS Version: 4.13

Preferred Tier Usage

How Preferred Tier Settings Determine Which Tier to Use

When creating or modifying a virtual machine (VM) disk drive in VergeOS, users can set a Preferred Tier. In most cases, this is left at default, which can be configured under System > System Settings > Default VM Drive Tier. However, the system's behavior when a specified tier does not exist can be unexpected. Here's how VergeOS determines which tier to use in such cases:

  • Setting a preferred tier to a non-existent higher tier:

    • Example: If a user selects Tier 3 in a system that only has Tier 1 and Tier 4 storage available, the system will attempt to pick the next higher (slower) tier. In this case, the system will default to Tier 4.
  • Setting a preferred tier to a non-existent lower tier:

    • Example: If a user selects Tier 3 in a system that only has Tier 1 and Tier 2 storage, the system will pick the next lower (faster) tier. In this case, the system will default to Tier 2.

In both scenarios, VergeOS ensures that the closest available tier is selected based on the user’s preference.
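
The fallback described above can be sketched as a small Python function. This is an illustration of the two documented examples, not VergeOS source code: `pick_tier` is a hypothetical name, and trying the slower tier first when both a slower and a faster tier are available is an assumption drawn from the first example.

```python
def pick_tier(preferred, available):
    """Choose the tier to use when the preferred tier may not exist.

    `available` is an iterable of tier numbers present in the system.
    Assumption: a higher-numbered (slower) tier is tried first, then a
    lower-numbered (faster) one.
    """
    tiers = set(available)
    if not tiers:
        raise ValueError("no storage tiers available")
    if preferred in tiers:
        return preferred

    slower = [t for t in tiers if t > preferred]
    if slower:
        return min(slower)  # next higher (slower) tier

    faster = [t for t in tiers if t < preferred]
    return max(faster)      # next lower (faster) tier


if __name__ == "__main__":
    print(pick_tier(3, [1, 4]))  # first example in the text
    print(pick_tier(3, [1, 2]))  # second example in the text
```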


Document Information

  • Last Updated: 2024-08-29
  • VergeOS Version: 4.12.6

vSAN Encryption Information

You can confirm that the vSAN has encryption enabled by navigating to Nodes > Node 1 > Drives and then double-clicking the first drive in the list. The Encrypted checkbox is checked if the vSAN is encrypted.

  • Encryption for the vSAN is configured during the initial installation only.

  • System startup on an encrypted system can be configured in two ways:

    1. The most common method is to have the encryption keys written to a USB drive during the initial installation. In this scenario, the USB drives are typically plugged into the first two nodes of the encrypted system so that it boots normally. No other nodes require them, because Node 1 and Node 2 are the controller nodes. The USB drive needs very little storage (less than 1 GB).
    2. If the controller nodes do not have USB encryption keys connected, the system prompts an operator to type the encryption password to complete the power-up process.

  • Encryption is enabled by default for all snapshot synchronizations performed through a site sync.

Information about encrypting a Site Synchronization can be found in the Product Guide.


Document Information

  • Last Updated: 2024-09-03
  • VergeOS Version: 4.12.6

Reasons for Unexpected / Unexplained vSAN Growth

There are several reasons for the vSAN to start growing at a rate faster than anticipated. Administrators should first determine when the unexplained growth occurred by reviewing the vSAN Tiers' growth history, and then assess potential areas for unexpected growth.

Review vSAN Tiers for Growth History

To isolate unexplained growth, first narrow down when the growth rate jumped. Using the steps below, administrators can review storage growth and distinguish normal growth from daily operations from sudden spikes, which are typically unexpected.

  1. Navigate to the vSAN Tiers from the Main Dashboard. If vSAN Tiers is not present, then this environment is a tenant of a parent system, and the vSAN tier needs to be examined at the parent system.
  2. Open the vSAN Tier with unexpected growth (for example, vSAN Tier 0).
  3. On the left navigation menu, click on History.
  4. A new menu will appear showing history in various graphs. Modify the filter period to isolate any growth on this tier.
    • It is recommended to start with a custom filter of 1 day and review the Storage Usage graph.

Things to Note:

  • If you see dips and spikes every hour or once a day, this is likely the result of snapshots falling out of retention (old ones expiring, new ones being created). Note whether the total storage consumed at the start of the day is nearly equivalent to the end of the day. If so, expand the custom filter to a week.
  • When reviewing by week, check if the total storage consumed at the start of the week is similar to the end. If, for example, the growth is roughly 10%, repeat for the previous week. If the weekly growth percentage is consistent, this represents your average weekly growth rate, which can help plan for hardware expansion.
  • Filter the current month and check for any sudden spikes in storage consumption on the Storage Usage graph. Click and drag over the time in question to zoom in on the data, and hover over the graph for specific date/time information.
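
The weekly comparison described above reduces to simple arithmetic. The Python sketch below is a rough planning aid, not a VergeOS feature: the function names are hypothetical, the inputs are whatever start and end values you read from the Storage Usage graph, and the projection assumes the weekly percentage stays constant and compounds.

```python
import math


def growth_percent(start_used, end_used):
    """Percentage growth between two storage readings."""
    if start_used <= 0:
        raise ValueError("start_used must be positive")
    return (end_used - start_used) * 100 / start_used


def weeks_until_full(used, capacity, weekly_growth_pct):
    """Estimate weeks until `used` reaches `capacity`.

    Assumption: usage grows by the same percentage every week
    (compound growth). Returns infinity if there is no growth.
    """
    if used <= 0 or capacity <= 0:
        raise ValueError("used and capacity must be positive")
    if weekly_growth_pct <= 0:
        return math.inf
    if used >= capacity:
        return 0.0
    return math.log(capacity / used) / math.log(1 + weekly_growth_pct / 100)


if __name__ == "__main__":
    weekly = growth_percent(start_used=100, end_used=110)
    print(f"weekly growth: {weekly:.1f}%")
    print(f"weeks until full: {weeks_until_full(50, 100, weekly):.1f}")
```

A tier that is half full and growing 10% per week would, under these assumptions, fill in a little over seven weeks.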

[Image: vsan_unexpected_growth.png]

Possible Reasons for Storage Increase

Several areas in the VergeOS platform may contribute to unexpected storage growth. Common areas to check include:

  • Cloud Snapshots:

    • Navigate to System > Cloud Snapshots.
    • Are any being held past their expected expiration time?
    • Are there snapshots without a Snapshot Profile? These may have been taken manually. Investigate when and why they were taken.
    • Are any snapshots set to "Never Expire"? This can lead to large data consumption over time.

  • Virtual Machine (VM) Snapshots:

    • Navigate to the Machines Dashboard. The Snapshots count box shows the number of machine-level snapshots present. Click this box to list all VM snapshots and their creation date/time. Review whether any can be removed.
    • Navigate to Machines > Virtual Machines. Sort by the Snapshot Profile column to identify VMs with machine-level snapshots. These VMs are already included in the recurring cloud snapshots, so review whether the individual snapshots are necessary or can be removed.

  • VMware Backup Jobs:

    • Navigate to Backup/DR > VMware Services and review each VMware Service instance for Backup Job history.
    • On the left menu, click Backup Jobs to review each specific instance. Check the Expires column for each backup and review whether it can be removed.

  • Media Images:

    • Navigate to Media Images and sort by Modified. Check whether any upload dates/times match the unexplained growth period.
    • Review whether media images, especially other hypervisor formats (e.g., .ova or .vhdx), can be removed.

  • Incoming Site Syncs:

    • Navigate to Backup/DR > Incoming Syncs. Open each Incoming Sync dashboard and check the Received Snapshots count. Investigate the source (origin) site for increased storage matching the timeframe.

  • Tenant Storage:

    • Navigate to Tenants and open each tenant's dashboard.
    • Review Total Storage Used by clicking History in the left menu. Follow the same process described above to review growth history.
    • If unexpected growth is found, investigate within the tenant for the possible causes of storage increase (as listed above), and within any sub-tenants if applicable.

Document Information

  • Last Updated: 2024-09-03
  • VergeOS Version: 4.12.6

How To Identify a Failed Disk In Your VergeOS Environment

VergeOS offers a diagnostic function that allows system administrators to turn a disk drive's LED light on or off, making it easier to physically identify a failed or problematic drive. Follow the steps below to locate a failed disk drive for replacement.

Steps to Identify a Failed Disk

  1. Log in to the VergeOS UI and navigate to the dashboard of the node where the failed disk resides.
  2. On the Node Dashboard, locate and select Diagnostics from the left-hand column.
  3. In the Diagnostics page, change the Query to LED Control (Drive).
  4. In the LED Control (Drive) details section:
    • Path: Enter the path to the drive you want to locate (e.g., /dev/sdb). If you're unsure of the path, check the system alerts and logs for recent error or warning messages.
    • State: Set the LED state to On, then click Send to activate the LED light on the drive.
  5. Locate the drive with the active LED indicator in your physical server.
  6. Once the drive has been identified and replaced, set the State to Off and click Send to deactivate the LED light.

For detailed instructions on drive replacement, refer to the Maintenance section in the inline help under Drive Replacement. This section guides you through the entire process.


Document Information

  • Last Updated: 2024-08-29
  • VergeOS Version: 4.12.6