Cleaning up zombie processes is an important part of managing our systems. It is not only good server hygiene: in GreenOps practice, it also has several benefits.
We usually manage zombie processes for the following reasons:
- Freeing System Resources: Zombie processes still occupy a slot in the process table. Eliminating them helps free up these limited system resources, ensuring smoother operations and avoiding potential system slowdowns.
- Improving System Performance: While zombies don’t use CPU or memory, having many zombie processes can clutter the process table, making system management tasks more complex and slightly impacting performance. Removing them contributes to overall system efficiency.
- Preventing System Instability: Accumulating too many zombie processes can eventually fill the process table, leading to the inability to create new processes. Killing them ensures the system remains stable and operational.
- Improving Monitoring and Management: Zombie processes can confuse monitoring systems or administrators by appearing in process lists even though they’re not doing any useful work. Removing them simplifies system diagnostics and improves management clarity.
- Avoiding Memory Leaks: While zombies themselves don’t consume memory, their parent processes might hold on to resources waiting to retrieve the zombie’s exit status. Killing them ensures proper release of those resources.
Green Software practice is about maximizing the efficiency of hardware and computing resources, so the first two benefits above contribute directly to it.
In addition, zombie processes can point to a larger attack surface. An accumulation of zombies often indicates bugs or misconfigurations in the parent processes, which may also be holding unnecessary ports open. Attackers could exploit these weaknesses to disrupt the system. Because attacks such as DDoS generate carbon emissions through excessive traffic, a well-maintained system also helps reduce emissions. Managing zombie processes effectively therefore helps prevent attacks and contributes to reducing CO2 emissions.
Now we understand why managing zombie processes is important. However, zombie processes are typically inconspicuous, so the process list has to be monitored regularly to find and handle them. Doing this manually is time-consuming and prone to oversight.
In this article, I’ll show how to manage zombie processes automatically with Ansible. Automating this task brings our operations closer to the GreenOps way.
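As a rough illustration of what such monitoring involves, the sketch below counts zombie children per parent PID. The function name count_zombies_by_parent is hypothetical and not part of the article's repository, and the live usage line assumes a ps that accepts -e and -o (as procps does).

```shell
#!/bin/sh
# count_zombies_by_parent: hypothetical helper (an assumption, not from the article).
# Reads "PPID STAT" pairs on stdin, the format printed by `ps -e -o ppid= -o stat=`,
# and prints "<parent PID> <number of zombie children>" for each parent.
count_zombies_by_parent() {
    awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Demo on canned data: parent 100 has two zombies, parent 200 has one.
printf '  100 Z+\n  100 Z\n    1 Ss\n  200 Z\n' | count_zombies_by_parent
# prints:
# 100 2
# 200 1

# Live usage (assumes a procps-style ps):
#   ps -e -o ppid= -o stat= | count_zombies_by_parent
```

Grouping by parent is a design choice here: the parent is the process that has to reap its children, so it is the one to investigate.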
Note:
- This article draws on the book Building Green Software: A Sustainable Approach to Software Development and Operations. Refer to it for more information about GreenOps. https://www.amazon.co.jp/Building-Green-Software-Sustainable-Development/dp/1098150627
- You can download the example source code from here:
https://github.com/mahoutukaisali/LightSwitchOps_Lab/tree/main
Implementation
- Step 1. Prepare a program that generates a zombie process for testing purposes.
zombie.c
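#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { // fork failed
        perror("fork");
        exit(1);
    }
    if (pid == 0) { // child process: exits after 5 seconds
        execl("/bin/sleep", "sleep", "5", (char *)NULL);
        perror("execl"); // reached only if execl fails
        _exit(127);
    } else { // parent process
        int status;
        // Do not reap the child for 180 seconds, so it stays a zombie
        // from the moment it exits until waitpid() is called.
        sleep(180);
        waitpid(pid, &status, 0);
        printf("PID %d has finished; exit status: %d\n", pid, WEXITSTATUS(status));
        exit(0);
    }
}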
Once you have created zombie.c, compile it:
$ gcc -o zombie zombie.c
- Step 2. Implement the Ansible playbook. It detects zombie processes and cleans them up by terminating their parent processes.
---
- name: Kill zombie processes
  hosts: localhost
  become: yes
  tasks:
    - name: Find zombie processes
      ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }'"
      register: zombie_processes
      changed_when: false
    - name: Terminate zombie processes if exist
      block:
        - name: Debug show zombie processes
          debug:
            msg: "{{ zombie_processes.stdout_lines }}"
        - name: Identify parent PIDs of zombie processes
          ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }' | xargs -I{} ps -o ppid= -p {}"
          register: parent_pids
          changed_when: false
        - name: Debug Show parent PIDs
          debug:
            msg: "{{ parent_pids.stdout_lines }}"
        - name: Terminate parent processes with SIGTERM
          ansible.builtin.shell: kill {{ item }}
          loop: "{{ parent_pids.stdout_lines }}"
          register: terminated_parent_processes
          ignore_errors: yes
        - name: Output message
          debug:
            msg: "Terminated parent PIDs: {{ terminated_parent_processes.results | selectattr('rc', 'equalto', 0) | map(attribute='item') | list }}"
      when: zombie_processes.stdout != ""
Here’s a breakdown of what each part does:
1. Find Zombie Processes:
    - name: Find zombie processes
      ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }'"
      register: zombie_processes
      changed_when: false
- Description: This task uses the ps aux command combined with awk to identify zombie processes (those whose process status, column $8, contains Z). It prints the process IDs (PIDs) of these zombie processes.
- Register: The result is stored in a variable named zombie_processes.
- Changed_when: Set to false because this task only reads the process list and does not change the system state. By default, Ansible reports every command run through the “shell” module as “changed”.
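To see what this filter actually matches, the sketch below runs the same awk expression over three sample lines in ps aux format. The wrapper name filter_zombie_pids is hypothetical; the header line is written out by hand, and the two process lines are the ones shown in the test section of this article.

```shell
#!/bin/sh
# filter_zombie_pids: hypothetical wrapper around the article's awk filter.
# Reads `ps aux`-formatted lines on stdin and prints the PID (column 2)
# of every line whose STAT field (column 8) contains "Z".
filter_zombie_pids() {
    awk '{ if ($8 ~ /Z/) print $2 }'
}

# Sample lines in `ps aux` format: header, parent, zombie child.
sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user01 104160 0.0 0.0 4316 932 pts/2 S+ 16:52 0:00 ./zombie
user01 104161 0.0 0.0 0 0 pts/2 Z+ 16:52 0:00 [sleep] <defunct>'

printf '%s\n' "$sample" | filter_zombie_pids   # prints 104161
```

The header line is skipped because its eighth field, STAT, contains no "Z", and the parent is skipped because its status is S+.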
2. Terminate Zombie Processes if They Exist: This block is executed only if zombie processes are found.
- Debug Show Zombie Processes:
    - name: Debug show zombie processes
      debug:
        msg: "{{ zombie_processes.stdout_lines }}"
- Description: Displays the list of zombie process PIDs found in the previous step for debugging purposes.
- Identify Parent PIDs of Zombie Processes:
    - name: Identify parent PIDs of zombie processes
      ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }' | xargs -I{} ps -o ppid= -p {}"
      register: parent_pids
      changed_when: false
- Description: Finds the parent process ID (PPID) of each zombie by passing every zombie PID to ps -o ppid=.
- Register: The result is stored in the variable parent_pids.
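One practical wrinkle: a parent with several zombie children is listed once per child, and ps -o ppid= typically pads its output with spaces. The sketch below normalizes such a list. The helper name normalize_parent_pids is hypothetical, and skipping PID 1 is my own assumption as a safety measure; the playbook in this article does not do it.

```shell
#!/bin/sh
# normalize_parent_pids: hypothetical helper (an assumption, not from the article).
# Reads one PPID per line on stdin, strips padding, drops duplicates,
# and skips PID 1 so that init is never handed to kill.
normalize_parent_pids() {
    awk '$1 ~ /^[0-9]+$/ && $1 != 1 && !seen[$1]++ { print $1 }'
}

# Demo on canned data: a duplicate parent, PID 1, and a second parent.
printf ' 104160\n 104160\n      1\n   2048\n' | normalize_parent_pids
# prints:
# 104160
# 2048
```

In a pipeline, this would sit right after the xargs step, before the PIDs are passed to kill.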
- Debug Show Parent PIDs:
    - name: Debug Show parent PIDs
      debug:
        msg: "{{ parent_pids.stdout_lines }}"
- Description: Displays the parent PIDs found in the previous step for debugging purposes.
- Terminate Parent Processes with SIGTERM:
    - name: Terminate parent processes with SIGTERM
      ansible.builtin.shell: kill {{ item }}
      loop: "{{ parent_pids.stdout_lines }}"
      register: terminated_parent_processes
      ignore_errors: yes
- Description: Sends a signal to each parent process identified in the previous step. Because kill is called without a signal option, it sends SIGTERM, the default, which asks the process to terminate. (Sending SIGHUP, which many daemons treat as a request to reload their configuration, would require kill -HUP.) Once the parent exits, its zombie children are adopted by init (PID 1), which reaps them.
- Loop: Iterates over the list of parent PIDs.
- Register: Stores the result of the kill commands in terminated_parent_processes.
- Ignore_errors: Set to yes, meaning that errors during this task are ignored, allowing the playbook to continue.
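Because this step terminates real processes, it can help to review the commands before sending any signal. The sketch below is a dry run under that assumption: it only prints the kill command for each PID and sends nothing. The function name print_kill_commands and its optional signal argument are hypothetical.

```shell
#!/bin/sh
# print_kill_commands: hypothetical dry-run helper (an assumption, not from the article).
# Reads one PID per line on stdin and prints the kill command that would be
# run for each, without sending any signal. The signal name defaults to TERM,
# which is what a plain `kill <pid>` sends.
print_kill_commands() {
    sig=${1:-TERM}
    while read -r pid; do
        case $pid in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric input
        esac
        printf 'kill -s %s %s\n' "$sig" "$pid"
    done
}

printf ' 104160\n' | print_kill_commands
# prints: kill -s TERM 104160
```

Passing HUP as the first argument prints kill -s HUP lines instead, which makes the difference between the two signals explicit before anything is run.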
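- Output Message:
    - name: Output message
      debug:
        msg: "Terminated parent PIDs: {{ terminated_parent_processes.results | selectattr('rc', 'equalto', 0) | map(attribute='item') | list }}"
- Description: Displays the parent PIDs for which the kill command succeeded, using the results stored in terminated_parent_processes. Because the task runs in a loop, the registered variable holds one entry per item under results rather than a single stdout_lines.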
Let’s test it!
After creating all of these files, let’s try them out.
First, run the compiled program “zombie” to generate a zombie process.
$ ./zombie
Open another terminal and run the command below:
$ ps aux
Once the child’s sleep 5 has finished, we can find a zombie process in the output:
user01 104161 0.0 0.0 0 0 pts/2 Z+ 16:52 0:00 [sleep] <defunct>
Let’s extract only its process ID.
$ ps aux | awk '{ if ($8 ~ /Z/) print $2 }'
104161
Identify the parent’s process ID:
$ ps aux | awk '{ if ($8 ~ /Z/) print $2 }' | xargs -I{} ps -o ppid= -p {}
104160
We can check the detected parent process against the `ps aux` output above. It matches:
user01 104160 0.0 0.0 4316 932 pts/2 S+ 16:52 0:00 ./zombie
Finally, let’s get rid of this zombie process. To do this, we need to kill its parent process:
$ kill 104160
Summary:
This Ansible playbook does not contain tasks to restart the parent processes, but we can add tasks that restart the parent services immediately after terminating them. We can also use the Ansible execution log to find out which services often generate zombie processes, and use that information to improve the software’s source code.
As we have seen, we can detect and manage zombie processes automatically with Ansible. This helps us administer our systems in a greener way.