You’ve probably heard about Kdump before, but you might not know how to debug it when things go wrong in your virtual machines.
First off, let’s cover what Kdump is and why it matters for virtual machines. Kdump is a kernel crash dumping mechanism: when the kernel panics, kexec boots a small capture kernel from memory that was reserved at boot, and that capture kernel saves the crashed kernel’s memory to disk. This can be incredibly useful when troubleshooting issues with your VM because you can analyze the contents of that memory dump to figure out what went wrong.
Now, let’s say you have a virtual machine running on Ubuntu Server 18.04 and it suddenly crashes for no apparent reason. You might see something like this in your logs:
[ 2.537697] watchdog: BUG: soft lockup - CPU#1 stuck for 16s! [<task>:<pid>]
# The watchdog reports that CPU #1 spent 16 seconds in kernel mode without giving other tasks a chance to run.
# The bracketed field names the task and PID that were running; it is shown here as a placeholder.
# A full report continues with the list of loaded modules, a register dump, and a "Call Trace:" section showing which kernel functions were executing.
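To pull soft lockup reports out of a longer log, a small filter can help. The sketch below is a minimal example and not part of Kdump itself: `soft_lockup_summary` is a hypothetical helper name, and it assumes the kernel's usual `BUG: soft lockup - CPU#N stuck for Ts!` wording.

```shell
# soft_lockup_summary (hypothetical helper): read kernel log text on stdin and
# print one line per soft lockup report, in the form "CPU <n> stuck for <t>s".
# Lines that do not contain a soft lockup report are dropped.
soft_lockup_summary() {
    sed -n 's/.*BUG: soft lockup[ -]*CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s!.*/CPU \1 stuck for \2s/p'
}
```

For example, `dmesg | soft_lockup_summary` covers the current boot, while `soft_lockup_summary < /var/log/kern.log` scans a persisted kernel log, assuming your system writes one to that path.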
The soft lockup report means that CPU#1 ran kernel code for 16 seconds without rescheduling, which is a classic sign of a system hang. The message alone rarely explains the cause, and this is where a crash dump helps. So how do you go about debugging this issue? Let’s start by checking whether the Kdump packages are installed:
# List all installed packages with dpkg -l and send the output through the | (pipe) symbol to grep.
# grep -E keeps only the lines that mention kdump-tools or kexec-tools.
# The "ii" at the start of a line means the package is selected for installation and is currently installed.
# The remaining columns are the package name, version, architecture, and a short description.
# The architecture column reads amd64 for a 64-bit x86 build and all for an architecture-independent package.
# Version and description are shown as placeholders here because they depend on your release.
$ dpkg -l | grep -E 'kdump-tools|kexec-tools'
ii kexec-tools <version> amd64 <description>
ii kdump-tools <version> all <description>
Great, it looks like both kexec-tools and kdump-tools are installed! If you only want the package names, for example to use them in a script, you can trim the output by running:
# List all installed packages with dpkg, keep the lines that mention 'kdump' or 'kexec',
# print the second column (the package name) with awk, then sort the names and drop duplicates.
# Note that this only shows which packages are present; it does not tell you whether Kdump is enabled.
$ dpkg -l | grep 'kdump\|kexec' | awk '{print $2}' | sort -u
# The output should include the following package names:
# kdump-tools
# kexec-tools
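A plain `grep` also matches packages that were removed but still have configuration files left behind (status `rc`). As a sketch, the hypothetical helper `installed_packages` below filters `dpkg -l` output on the status column instead, and strips an architecture suffix such as `:amd64` from the name if one is present.

```shell
# installed_packages (hypothetical helper): read `dpkg -l` output on stdin and
# print the names of packages whose status column is exactly "ii",
# sorted and with duplicates removed.
installed_packages() {
    awk '$1 == "ii" { sub(/:.*/, "", $2); print $2 }' | sort -u
}
```

A possible use is `dpkg -l | installed_packages | grep -E '^(kdump-tools|kexec-tools)$'`, which prints a name only when that exact package is fully installed.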
Both packages are installed, but installed is not the same as enabled. Now let’s check whether Kdump is actually configured and armed:
# Kdump needs memory reserved for the capture kernel, which is requested with the crashkernel= boot parameter.
# The "cat" command displays the contents of a file.
# /proc/cmdline holds the parameters the running kernel was booted with.
# If crashkernel= is missing from the output, no memory was reserved and Kdump cannot work until you reboot with it set.
$ cat /proc/cmdline
# /sys/kernel/kexec_crash_loaded shows whether a capture kernel is currently loaded.
# It prints 1 when one is loaded and 0 when it is not.
$ cat /sys/kernel/kexec_crash_loaded
1
A value of 1 means that a capture kernel is loaded and Kdump is ready to take over if the kernel panics. If you see 0, or if crashkernel= is missing from the command line, Kdump is not armed and a crash will not produce a dump. The kdump-tools package also ships `kdump-config show`, which summarizes the same state in one place.
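Whether memory was reserved for a capture kernel can also be checked from a script by looking for the `crashkernel=` parameter on the kernel command line. The function below is a minimal sketch; `crashkernel_param` is a hypothetical name, and it takes the command line as a string so it can be tried on sample input.

```shell
# crashkernel_param (hypothetical helper): print the value of the first
# crashkernel= parameter in the kernel command line passed as $1.
# Returns non-zero and prints nothing when the parameter is absent.
crashkernel_param() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | head -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}
```

On a running system you could call it as `crashkernel_param "$(cat /proc/cmdline)"`; an empty result suggests that no crash kernel memory was requested at boot.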
Now let’s say you want to manually enable Kdump on your virtual machine. You can do this by running:
# Reconfigure the kexec-tools and kdump-tools packages. sudo runs the command with root privileges, which changing system packages requires.
sudo dpkg-reconfigure kexec-tools kdump-tools
The `dpkg-reconfigure` command re-runs the configuration questions of a previously installed package. Here it asks how the `kexec-tools` and `kdump-tools` packages should be set up, including whether Kdump should be enabled.
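To confirm what the reconfiguration recorded, you can read the setting back. The sketch below assumes the Ubuntu layout in which kdump-tools keeps its settings in `/etc/default/kdump-tools` as shell variable assignments, with `USE_KDUMP=1` meaning enabled; `use_kdump_value` is a hypothetical helper name.

```shell
# use_kdump_value (hypothetical helper): print the value of USE_KDUMP from a
# kdump-tools defaults file. The path is taken from $1 and defaults to
# /etc/default/kdump-tools (an assumption about the distribution's layout).
# Prints "unset" when the file does not assign USE_KDUMP; returns non-zero
# when the file cannot be read.
use_kdump_value() {
    conf="${1:-/etc/default/kdump-tools}"
    [ -r "$conf" ] || return 1
    # Source the file in a subshell so its variables do not leak into the caller.
    ( unset USE_KDUMP; . "$conf"; printf '%s\n' "${USE_KDUMP:-unset}" )
}
```

Calling `use_kdump_value` with no argument reads the default path; a result of `1` would indicate that Kdump is enabled in the configuration. Because the file is sourced, only point the helper at configuration files you trust.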
# Enable the kdump-tools service so that it starts on every boot
sudo systemctl enable kdump-tools
# Start the service now, which loads the capture kernel
sudo systemctl start kdump-tools
# Check the status of the service
sudo systemctl status kdump-tools
The `kdump-tools` service is responsible for loading the capture kernel into the reserved memory. `systemctl enable` makes that happen on every boot, `systemctl start` does it immediately, and `systemctl status` reports whether the last start succeeded.
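Beyond the service status, the kernel itself reports whether a capture kernel is loaded through `/sys/kernel/kexec_crash_loaded`, which reads `1` when one is in place. The check below is a minimal sketch; `crash_kernel_loaded` is a hypothetical helper name, and the optional path argument exists only so the function can be tried against a sample file.

```shell
# crash_kernel_loaded (hypothetical helper): succeed only when the given flag
# file is readable and contains exactly "1". The path defaults to the kernel's
# /sys/kernel/kexec_crash_loaded.
crash_kernel_loaded() {
    flag="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}
```

A typical use would be `crash_kernel_loaded && echo "Kdump is armed" || echo "Kdump is not armed"`, run after starting the service.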
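The configuration prompts come from the `dpkg-reconfigure` step: it asks whether kexec-tools should handle reboots and whether kdump-tools should be enabled. Answer ‘Yes’ to the kdump-tools question. Because the memory for the capture kernel is reserved at boot through the crashkernel= parameter, reboot the virtual machine before relying on Kdump.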
And there you have it! You now know how to check and enable Kdump in virtual machines using standard Linux tools like dpkg, awk, and systemctl. Remember, always be careful when working with kernel memory dumps: they can contain sensitive information such as passwords and encryption keys, so they should not be shared publicly.