The following is a comprehensive troubleshooting guide to the x360Recover agent for Linux.
Components of the x360Recover agent for Linux
The core foundation of any image-based backup solution is the file system snapshot.
Taking a snapshot of the file system allows data to be frozen at a moment in time. This provides a consistent backup of everything at a single point in time.
Under Windows, the Microsoft Volume Shadow Copy Service (VSS) provides this snapshot functionality.
Unfortunately, under Linux, there is no native universal snapshot feature built into the kernel, so we have to provide our own: the Elastio snapshot driver.
What is Elastio?
Elastio consists of three components:
- A kernel module driver
- A system service
- A library of utilities for managing snapshots
How does the Elastio snapshot driver work?
The Elastio driver is embedded in the kernel. This means it is able to intercept file system operations before they are committed to disk.
Elastio has two operational modes:
- actively maintaining a snapshot of disk volumes
- tracking a map of blocks which have been changed since the last backup
Actively maintaining a snapshot of disk volumes
To create and maintain a snapshot of one or more disk volumes, Elastio momentarily freezes disk writing. It then inserts a copy-on-write virtual disk volume overlay for each volume being frozen. Ongoing disk writing continues to occur normally.
Data is written to the disk volume without interference or queuing. At no point can the snapshot system cause a write to be ‘missed’.
Instead, prior to allowing each block to be written, Elastio makes a copy of the original data block within the COW image disk. This copy preserves the original data state for the snapshot. A virtual device is created representing the ‘snapshot’ that overlays the COW file data onto the live filesystem in a read-only view. This presents a frozen image of the system, and provides a backup image.
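The copy-on-write behavior described above can be illustrated with a small in-memory model. This is only a sketch of the general technique, not Elastio's actual implementation; the class and method names are hypothetical. The original content of a block is preserved the first time it is overwritten, and snapshot reads are served from the preserved copy when one exists.

```python
class CowSnapshotVolume:
    """Toy in-memory model of a volume with one copy-on-write snapshot.

    Hypothetical illustration only; the real driver works on block devices
    inside the kernel.
    """

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live volume data
        self.cow = None             # block index -> original data; None = no snapshot

    def create_snapshot(self):
        # Start with an empty COW store; nothing is copied until a write occurs.
        self.cow = {}

    def write(self, index, data):
        # Before the first overwrite of a block, preserve its original content.
        if self.cow is not None and index not in self.cow:
            self.cow[index] = self.blocks[index]
        # The write itself always goes straight to the live volume.
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Snapshot view: preserved copy if the block changed, live data otherwise.
        if self.cow is None:
            raise RuntimeError("no active snapshot")
        return self.cow.get(index, self.blocks[index])

    def drop_snapshot(self):
        # Discard the COW store once the backup is complete.
        self.cow = None


volume = CowSnapshotVolume([b"A", b"B", b"C"])
volume.create_snapshot()
volume.write(1, b"X")
volume.write(1, b"Y")  # second write to the same block copies nothing new
live_view = list(volume.blocks)
snapshot_view = [volume.read_snapshot(i) for i in range(3)]
```

In this model the live volume sees every write immediately, while the snapshot view continues to return the data as it was when the snapshot was created.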
Tracking a map of blocks changed since the last backup
Once the backup is complete, the snapshot COW file is discarded and Elastio is transitioned into tracking mode.
Rather than maintaining a snapshot and copying original blocks into a COW file, in this mode the driver simply maintains a bitmap of blocks which have changed since the last backup.
Operationally, Elastio is constantly tracking block changes throughout the system.
Creating a new snapshot marks a demarcation between one bitmap and the next, providing an accurate list of changes between each snapshot / backup point.
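As a rough sketch of the tracking mode described above, the following model keeps one flag per block and starts a fresh bitmap whenever a snapshot is taken. The names are hypothetical and the real driver's data structures are not documented here; the sketch only shows how closing one bitmap and opening the next yields the list of changes between two backup points.

```python
class ChangeTracker:
    """Toy model of a changed-block bitmap (hypothetical, for illustration)."""

    def __init__(self, block_count):
        self.block_count = block_count
        self.bitmap = bytearray(block_count)  # one flag per block, 1 = changed

    def record_write(self, index):
        # Repeated writes to the same block leave a single flag set.
        self.bitmap[index] = 1

    def start_snapshot(self):
        """Close the current bitmap, open a fresh one, return changed block indexes."""
        changed = [i for i, flag in enumerate(self.bitmap) if flag]
        self.bitmap = bytearray(self.block_count)
        return changed


tracker = ChangeTracker(8)
tracker.record_write(2)
tracker.record_write(5)
tracker.record_write(2)
first_backup = tracker.start_snapshot()   # changes since tracking began
tracker.record_write(7)
second_backup = tracker.start_snapshot()  # changes since the first snapshot
```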
Elastio stores its COW files and metadata in a hidden folder named
Snapshots are maintained by Elastio and managed by the x360Recover agent. There should never be a reason to manually modify snapshot metadata.
Note: the _<id> portion of the folder name is a hash of the mount point path.
The backup agent itself is named xcloud-agent and its roles include
- running as a service
- communicating with the appliance or vault
- managing Elastio snapshots
- performing all backup operations and functions
Most common agent configuration options are available from the appliance or vault UI using the Agent Configuration option. We recommend managing all of your agent configurations using this method (unless you have specific instructions supplied by the Axcient support team).
All relevant configuration information for the x360Recover agent for Linux
Most agent configuration parameters
Local cache credentials and the Direct-to-Cloud access token
Note: The local cache network share user password is stored in secure_config.toml in encrypted format. If you need to change the user password manually within the config file, you may save the password in plain text and it will be encrypted on the next agent service restart.
All relevant agent log files
Detailed agent troubleshooting information
User-friendly Windows event log style information
Agent update information
Note: Agent 3.x is a converged code project with both Windows and Linux versions of the agent. (The Windows agent is not yet released.) All ‘Windows Event Log’ entries and event IDs are the same for both Windows and Linux. Because Linux does not have a ‘Windows Event Log’, these user-friendly information events are stored in the agent-events.log file on Linux.
What are the steps in a Linux backup operation?
In general terms, each backup cycle follows these steps:
- The appliance signals the agent to start a backup.
- The agent collects backup job information (such as throttling parameters) and starts the backup.
- A snapshot is created of the system to freeze data for backup.
- Change detection is performed to identify new data to send to the backup server.
- Change data for all volumes is transmitted to the backup server.
- A Backup Complete check is performed to verify data integrity.
- If successful, the snapshot is committed on the backup server; otherwise, the backup job fails.
- The snapshot is removed and the agent begins waiting for the next backup cycle.
How does the agent calculate which data has changed?
The agent uses two methods to calculate which data has changed:
Method #1: “Slow algorithm”
File system metadata is used to determine the date/time of each block. Any blocks identified as newer than the previous backup snapshot time are defined as ‘changed’. This process is slow and can take minutes to complete on large filesystems. This method is sometimes referred to as the ‘slow algorithm’.
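The idea behind the slow algorithm can be sketched as a timestamp comparison. This is an assumption-laden illustration, not the agent's code: the block_mtimes mapping (block index to last-modified time) is hypothetical and stands in for whatever the agent derives from file system metadata.

```python
def changed_blocks_slow(block_mtimes, last_snapshot_time):
    """Return sorted indexes of blocks modified after the previous backup snapshot.

    block_mtimes: hypothetical mapping of block index -> last-modified timestamp.
    Every entry must be examined, which is why this approach scales with the
    size of the file system rather than with the amount of change.
    """
    return sorted(
        index
        for index, mtime in block_mtimes.items()
        if mtime > last_snapshot_time
    )


block_mtimes = {0: 100.0, 1: 250.0, 2: 90.0, 3: 300.0}
changed = changed_blocks_slow(block_mtimes, last_snapshot_time=200.0)
```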
Method #2: “Fast algorithm” or “FastDelta”
The native snapshot mechanism uses its own built-in change tracking functionality to identify changed blocks. This method is known as the ‘Fast Algorithm’ or ‘FastDelta’.
For Windows, this method can be tricky, because there are several Microsoft applications (such as Exchange and SQL) that ‘cheat’ when operating with the Volume Shadow Copy Service (VSS) for performance reasons. VSS does not accurately calculate change data for these database files. For this reason, on Windows, the agent has a special driver that reads and interrogates the VSS data directly on disk to perform block change calculations.
Note: Without the FastDelta mechanism, Microsoft Exchange and SQL databases on Windows must be parsed in their entirety for block changes. This process can take a considerable amount of time for very large databases. We recommend enabling FastDelta for Windows Exchange and SQL servers.
Fortunately, the Linux snapshot driver (Elastio) is embedded directly into the Linux kernel as a driver module. There is no way to ‘cheat’ the snapshot layer on Linux, so the changed block data provided by the driver is accurate.
Under Linux, changes to the kernel structure in more recent versions (v5.4.0+) have removed the mechanism used to detect and manage volume mounts and dismounts by the Elastio module.
This means that we cannot quiesce and save changed block states during system shutdown and startup. Every time the system is restarted the first backup will be required to use the ‘Slow Algorithm’ method to ensure all block changes are captured successfully. All subsequent backups will leverage the FastDelta mechanism for determining changed blocks.
- Agent 3.x, for both Linux and Windows, enables FastDelta by default.
- Agent 2.x disables FastDelta by default.
- Configuration for FastDelta can be managed via the Agent configuration option on the Protected System Details page.
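The selection rules above can be summarized in a short sketch. The function names and the "fast"/"slow" labels are hypothetical, and the sketch assumes a kernel on which tracking state is not preserved across a restart (v5.4.0+ as described above); it is not the agent's actual decision logic.

```python
def fastdelta_default(agent_version):
    """Default FastDelta setting by agent major version (3.x on, 2.x off)."""
    major = int(agent_version.split(".")[0])
    return major >= 3


def choose_change_detection(fastdelta_enabled, first_backup_since_boot):
    """Pick the change detection method for the next backup.

    Assumption: after a restart the driver's changed-block state is not
    trusted, so the first backup falls back to the slow algorithm.
    """
    if fastdelta_enabled and not first_backup_since_boot:
        return "fast"
    return "slow"


enabled = fastdelta_default("3.3.0")
after_reboot = choose_change_detection(enabled, first_backup_since_boot=True)
later_backup = choose_change_detection(enabled, first_backup_since_boot=False)
```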
Linux agent installation
During installation of the Linux agent on Ubuntu (or other Ubuntu-based distributions, such as Zorin or Mint), you may encounter an appstream error. The steps to resolve this error are described below.
How to resolve an appstream error during installation
During the installation of the Linux agent under Ubuntu (or other Ubuntu-based distributions, like Zorin, or Mint), you may encounter an error message similar to this example:
This is a known Ubuntu issue with apt-get update.
For details, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906538
To resolve the error, reinstall libappstream:
1. Open a terminal, SSH to the system, or log in to the console, and switch to the root user.
2. Run the following command to reinstall libappstream:
apt-get install --reinstall libappstream4
Potential issues with local cache are essentially the same for both Windows and Linux.
For assistance in troubleshooting local cache, see Local Cache for Direct to Cloud.
Errors creating a snapshot
- Issue: If the agent.log file contains errors pertaining to failures to create a snapshot, it is possible that the Elastio driver is in an invalid state.
- Solution: Reboot the system. The agent is aware of most potential Elastio issues and will attempt to reset the snapshot system after a failed backup, but on rare occasions a system reboot may be necessary.
Insufficient snapshot space
- Issue: As with Windows, snapshot storage within Linux requires some free space on the volume. By default, both Windows and Linux allow for using 10% of the volume size for storing snapshot data. Snapshot data is sparsely configured, so storage is only used on demand. If either (a) the configured maximum snapshot storage size is exceeded, or (b) the volume runs out of free space during a backup, then the snapshot will be lost and the backup will fail.
- Solution: Agent configuration parameters exist to adjust the amount of space available for snapshot usage:
- bdsnap_file_size_percent = <integer> sets the maximum snapshot space globally for all volumes, as an integer percentage of each volume’s total size.
- To adjust only specific volumes, you may use:
- bdsnap_file_size_percent_by_volume = ["<path>;<percentage>"] where each entry is a string consisting of a path to a volume and a percentage, separated by a semicolon. For example: bdsnap_file_size_percent_by_volume = ["/dev/sda5;15", "/mnt/data;5"]
- Path may be a device path, mount point, UUID, or label of a volume.
- Configuring custom snapshot size limits is available in Agent release 3.2.0 and higher.
Note: These parameters apply to the Linux agent only. To configure snapshot storage limits for Windows, please use vssadmin.
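To show how the two snapshot size parameters relate, the following sketch parses entries in the "<path>;<percentage>" form and computes the resulting limit in bytes. The helper names are hypothetical, and the sketch matches volumes by exact string only; how the agent resolves device paths, UUIDs, and labels to the same volume is not covered here.

```python
GIB = 2 ** 30


def parse_percent_by_volume(entries):
    """Turn ["/dev/sda5;15", "/mnt/data;5"] into {"/dev/sda5": 15, "/mnt/data": 5}.

    A malformed entry (missing semicolon or non-integer percentage) raises
    ValueError.
    """
    overrides = {}
    for entry in entries:
        path, _, percent = entry.partition(";")
        overrides[path.strip()] = int(percent)
    return overrides


def snapshot_limit_bytes(volume, volume_size_bytes, global_percent=10, overrides=None):
    """Maximum snapshot space for a volume: per-volume percentage if set, else global."""
    percent = (overrides or {}).get(volume, global_percent)
    return volume_size_bytes * percent // 100


overrides = parse_percent_by_volume(["/dev/sda5;15", "/mnt/data;5"])
data_limit = snapshot_limit_bytes("/mnt/data", 200 * GIB, overrides=overrides)
home_limit = snapshot_limit_bytes("/home", 100 * GIB, overrides=overrides)
```

Here a 200 GiB volume limited to 5% may use up to 10 GiB for snapshot data, and a volume with no override falls back to the global 10% default.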
Insufficient free disk space for snapshots
Note: The two suggested solutions below (storage_redirect and storage_redirect_by_volume) are only available with Linux agent 3.3.0 and higher.
- Issue: If a volume has limited free disk space, the Elastio snapshot data may be relocated to another volume.
- Solution: To relocate all snapshot data to a single location, use:
storage_redirect = "<path>"
where <path> may be a device path, mount point, UUID, or label of a volume.
If specified, all snapshot data will be stored at the specified path.
To redirect the snapshots of specific volumes to another location, use:
storage_redirect_by_volume = ["<path1>;<path2>"]
where the parameter is a list of strings.
Each string contains two paths, separated by a semicolon.
Each path may be a device path, mount point, UUID, or label of a volume.
FAT volume support
Agent 3.x has explicit support for backup of FAT/FAT32 volumes under both Windows and Linux.
By default, the agent will always back up the system boot-related volumes (both the required EFI partition and the FAT-formatted Windows system partitions).
To enable backup of other FAT/FAT32 volumes, you must (a) enable backup for FAT and (b) explicitly include the target volume(s) in the backup_volumes parameter.
Note: When you specify a FAT volume in the backup_volumes parameter, you must also explicitly declare all other volumes that are included in the backup.
When backup_volumes is not blank, it means “Backup only these specified volumes.”
- set the parameter enable_fat = true
- set backup_volumes = ["<path1>", "<path2>", …] to designate all volumes to be included in the backup.
Each path may be a device path, mount point, UUID, or label of a volume.
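The interaction between enable_fat and backup_volumes can be sketched as a simple selection function. This is a hypothetical model, not the agent's logic: the Volume type and select_volumes function are invented for illustration, volumes are matched by exact path string, and the sketch assumes that boot-related volumes remain included even when backup_volumes is set.

```python
from dataclasses import dataclass

FAT_TYPES = {"fat", "fat16", "fat32", "vfat"}


@dataclass(frozen=True)
class Volume:
    path: str
    filesystem: str
    is_boot: bool = False


def select_volumes(volumes, backup_volumes=(), enable_fat=False):
    """Return the paths that would be backed up under the rules described above."""
    selected = []
    for vol in volumes:
        is_fat = vol.filesystem.lower() in FAT_TYPES
        if vol.is_boot:
            # Assumption: boot-related volumes are always backed up.
            selected.append(vol.path)
        elif backup_volumes and vol.path not in backup_volumes:
            # A non-blank list means "back up only these specified volumes".
            continue
        elif is_fat and not (enable_fat and vol.path in backup_volumes):
            # Other FAT volumes need both enable_fat and an explicit listing.
            continue
        else:
            selected.append(vol.path)
    return selected


vols = [
    Volume("/boot/efi", "vfat", is_boot=True),
    Volume("/", "ext4"),
    Volume("/mnt/data", "xfs"),
    Volume("/mnt/usb", "vfat"),
]
default_selection = select_volumes(vols)
fat_selection = select_volumes(vols, backup_volumes=["/", "/mnt/usb"], enable_fat=True)
```

In the second call /mnt/data drops out of the backup because it was not listed, which is the situation the note above warns about.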
Hole punch activity is causing poor backup performance
By default, the agent tries to reduce storage space on the backup server by advising the server when blocks on the filesystem have been deleted.
The agent does this by sending a special ‘hole punch’ command for each deleted block. This hole punch activity physically deletes and clears the unused blocks from the ZFS file system on the server.
On rare occasions, some systems have reported either (a) excessively high block deletions or (b) particularly slow disk performance where detecting and sending ‘hole punch’ data has unacceptably degraded backup performance.
You may identify hole punch activity within the log files as such:
If this process appears to be taking an excessively long time, you may disable hole punch operations with this setting:
inc_send_punchs = false
Setting this parameter is available on Agent version 3.3.0 and newer.