Automate Zombie Process Management for Greener Ops

Cleaning up zombie processes is an important part of managing our systems. It is not only good server hygiene: within GreenOps practice, it also brings several benefits.

Usually, we manage zombie processes for the following reasons:

  1. Freeing System Resources: A zombie process still occupies a slot in the process table. Cleaning zombies up frees these limited slots and helps keep the system running smoothly.
  2. Improving System Performance: Zombies don’t use CPU or memory, but a large number of them clutters the process table, makes system management tasks more complex, and may slightly affect performance. Removing them contributes to overall system efficiency.
  3. Preventing System Instability: If zombies accumulate until the process table is full, the system can no longer create new processes. Cleaning them up keeps the system stable and operational.
  4. Improving Monitoring and Management: Zombie processes can confuse monitoring systems or administrators by appearing in process lists even though they’re not doing any useful work. Removing them simplifies system diagnostics and improves management clarity.
  5. Avoiding Memory Leaks: While zombies themselves don’t consume memory, a parent process that never retrieves its children’s exit status may be holding on to other resources as well. Dealing with that parent ensures those resources are properly released.
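
To get a feel for how much of the process table zombies occupy, the reasons above can be turned into a quick check. This is a minimal sketch rather than part of the article's playbook: the function name zombie_share is hypothetical, and the usage line assumes a procps-style ps on Linux.

```shell
# zombie_share: read one process state per line (for example the output
# of `ps -e -o stat=`) and report how many of the entries are zombies.
# The function name is illustrative, not part of any standard tool.
zombie_share() {
    awk '
        $1 ~ /^Z/ { zombies++ }   # state codes starting with Z are zombies
        NF > 0    { total++ }     # every non-empty line is one process
        END       { printf "%d of %d processes are zombies\n", zombies, total }
    '
}

# Typical use on a Linux host (assumes procps ps):
#   ps -e -o stat= | zombie_share
```

Reading the states from standard input keeps the function independent of the ps flavor installed on the host.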

In the practice of Green Software, it is essential to maximize the efficiency of hardware and computing resources. Therefore, the benefits highlighted in points 1 and 2 above are crucial aspects that contribute significantly to the implementation of Green Software practices.

    In addition, zombie processes can enlarge the attack surface. The zombies themselves hold no open ports or files, but their accumulation often indicates bugs or misconfigurations in the parent processes, and attackers could take advantage of these weaknesses to disrupt the system. Because attacks such as DDoS generate excessive traffic, and with it carbon emissions, a well-maintained system helps reduce emissions. Managing zombie processes effectively therefore not only helps defend against attacks but also contributes to reducing CO2 emissions.

    Now we understand why managing zombie processes is important. However, zombie processes are typically inconspicuous, so the process list must be monitored regularly to identify and manage them. Done manually, this task is time-consuming and prone to oversight.
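
The manual monitoring described above can be sketched as a small shell filter. This is only an illustration under stated assumptions: the function name list_zombies is hypothetical, and the usage line assumes a procps-style ps that accepts repeated -o options.

```shell
# list_zombies: read "PID PPID STAT COMMAND" lines on stdin and print one
# summary line per zombie. The function name is illustrative.
list_zombies() {
    awk '$3 ~ /^Z/ { printf "zombie pid=%s parent=%s cmd=%s\n", $1, $2, $4 }'
}

# Typical use (assumes procps ps; each "-o name=" suppresses that header):
#   ps -e -o pid= -o ppid= -o stat= -o comm= | list_zombies
```

Selecting the columns explicitly avoids depending on the column order of ps aux.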

    In this article, I’ll show how to manage zombie processes automatically with Ansible. Automating this task moves our operations a step closer to GreenOps.

    Note:

  1. This article draws on the book Building Green Software: A Sustainable Approach to Software Development and Operations, which you can consult for more information about GreenOps. https://www.amazon.co.jp/Building-Green-Software-Sustainable-Development/dp/1098150627
  2. You can download the example source code from here:
    https://github.com/mahoutukaisali/LightSwitchOps_Lab/tree/main

Implementation

  • Step 1. Prepare a small C program that generates a zombie process for testing.


zombie.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main(void) {
    pid_t pid = fork();

    if (pid < 0) { // fork failed
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) { // child process: exits after 5 seconds
        execl("/bin/sleep", "sleep", "5", (char *)NULL);
        perror("execl"); // reached only if exec fails
        _exit(127);
    }

    // parent process: deliberately never calls waitpid(), so the
    // exited child stays in the process table as a zombie until
    // the parent itself exits 180 seconds later
    sleep(180);
    return 0;
}

Once you have created zombie.c, compile it:

$ gcc -o zombie zombie.c

  • Step 2. Write the Ansible playbook. It detects zombie processes and terminates their parent processes so that the zombies are cleaned up.
---
- name: Kill zombie processes
  hosts: localhost
  become: yes
  tasks:
    - name: Find zombie processes
      ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }'"
      register: zombie_processes
      changed_when: false

    - name: Terminate zombie processes if exist
      block:
        - name: Debug show zombie processes
          debug:
            msg: "{{ zombie_processes.stdout_lines }}"
        
        - name: Identify parent PIDs of zombie processes
          ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }' | xargs -I{} ps -o ppid= -p {}"
          register: parent_pids
          changed_when: false

        - name: Debug Show parent PIDs
          debug:
            msg: "{{ parent_pids.stdout_lines }}"

        - name: Terminate parent processes with SIGTERM
          ansible.builtin.shell: kill {{ item }}
          loop: "{{ parent_pids.stdout_lines }}"
          register: terminated_parent_processes
          ignore_errors: yes

        - name: Output message
          debug:
            msg: "Terminated parent PIDs: {{ terminated_parent_processes.results | selectattr('rc', 'equalto', 0) | map(attribute='item') | map('trim') | join(', ') }}"

      when: zombie_processes.stdout != ""

Here’s a breakdown of what each part does:

  1. Find Zombie Processes:
- name: Find zombie processes
  ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }'"
  register: zombie_processes
  changed_when: false
  • Description: This task combines ps aux with awk to find zombie processes, that is, rows whose STAT column ($8) contains Z, and prints their process IDs (PIDs).
  • Register: The result is stored in a variable named zombie_processes.
  • Changed_when: Set to false because this task only reads the process list and does not change the system state. By default, Ansible reports every “shell” task as “changed”.

2. Terminate Zombie Processes if They Exist: This block is executed only if zombie processes are found.

  • Debug Show Zombie Processes:
- name: Debug show zombie processes
  debug:
    msg: "{{ zombie_processes.stdout_lines }}"

Description: Displays the list of zombie process PIDs found in the previous step for debugging purposes.

  • Identify Parent PIDs of Zombie Processes:
- name: Identify parent PIDs of zombie processes
  ansible.builtin.shell: "ps aux | awk '{ if ($8 ~ /Z/) print $2 }' | xargs -I{} ps -o ppid= -p {}"
  register: parent_pids
  changed_when: false

Description: Finds the parent process ID (PPID) of each zombie by passing every zombie PID to ps -o ppid= -p <PID>. The trailing = suppresses the column header.

Register: The result is stored in the variable parent_pids.
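
When one parent has several zombie children, the pipeline above prints the same parent PID more than once. A possible refinement, shown here only as a sketch, is to deduplicate the list and leave out PID 1. The function name zombie_parents is hypothetical, and the usage line assumes a procps-style ps.

```shell
# zombie_parents: read "PID PPID STAT" lines on stdin and print each
# distinct parent PID of a zombie once, skipping PID 1 (init), which
# must never be killed. The function name is illustrative.
zombie_parents() {
    awk '$3 ~ /^Z/ && $2 != 1 { print $2 }' | sort -un
}

# Typical use (assumes procps ps):
#   ps -e -o pid= -o ppid= -o stat= | zombie_parents
```

Because the parent PID is read from the same ps snapshot as the state, no second ps call per zombie is needed.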

  • Debug Show Parent PIDs:
- name: Debug Show parent PIDs
  debug:
    msg: "{{ parent_pids.stdout_lines }}"

  • Terminate Parent Processes with SIGTERM:

- name: Terminate parent processes with SIGTERM
  ansible.builtin.shell: kill {{ item }}
  loop: "{{ parent_pids.stdout_lines }}"
  register: terminated_parent_processes
  ignore_errors: yes
  • Description: Runs kill on each parent process identified in the previous step. Without an explicit signal option, kill sends SIGTERM, which asks the process to shut down. Once the parent exits, its zombie children are adopted by init (PID 1), which reaps them.
  • Loop: Iterates over the list of parent PIDs.
  • Register: Stores the result of the kill commands in terminated_parent_processes.
  • Ignore_errors: Set to yes, meaning that errors during this task are ignored, allowing the playbook to continue.
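
The kill task above sends a single SIGTERM and does not check whether the parent actually exited. A more cautious variant could wait briefly and escalate only if needed. The following is a sketch under assumptions, not the article's implementation: the function name terminate_parent and the default timeout of 5 seconds are hypothetical choices.

```shell
# terminate_parent: ask a process to exit with SIGTERM, wait up to
# TIMEOUT seconds, and fall back to SIGKILL only if it is still there.
# Usage: terminate_parent PID [TIMEOUT_SECONDS]
terminate_parent() {
    pid=$1
    timeout=${2:-5}

    # If the signal cannot be delivered (no such process, or no
    # permission), report failure and stop.
    kill -TERM "$pid" 2>/dev/null || return 1

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 sends no signal; it only checks that the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Last resort: SIGKILL cannot be caught or ignored.
    kill -KILL "$pid" 2>/dev/null
}
```

Note that kill -0 also succeeds for a process that has exited but has not yet been reaped by its own parent, so the fallback SIGKILL may occasionally be sent to a process that is already dead. That is harmless.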

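  • Output Message:

- name: Output message
  debug:
    msg: "Terminated parent PIDs: {{ terminated_parent_processes.results | selectattr('rc', 'equalto', 0) | map(attribute='item') | map('trim') | join(', ') }}"
  • Description: Displays which parent processes were terminated. Because the kill task runs in a loop, its registered variable holds one entry per PID under results; the message lists the PIDs whose kill command returned 0.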

Let’s test it!

With these files in place, let’s try it out.

First, run the compiled zombie binary to generate a zombie process.

$ ./zombie

Open another terminal, wait about five seconds for the child sleep to exit, and run the command below:

$ ps aux

We can see a zombie process (state Z, marked <defunct>):

user01      104161  0.0  0.0      0     0 pts/2    Z+   16:52   0:00 [sleep] <defunct>

Let’s extract only the PIDs of the zombie processes.

$ ps aux | awk '{ if ($8 ~ /Z/) print $2 }'
104161

Next, identify the PID of the zombie’s parent:

$ ps aux | awk '{ if ($8 ~ /Z/) print $2 }' | xargs -I{} ps -o ppid= -p {}
104160

We can check the detected parent process against the `ps aux` output. It matches:

user01       104160  0.0  0.0   4316   932 pts/2    S+   16:52   0:00 ./zombie

Finally, let’s get rid of this zombie. A zombie has already exited, so it cannot be killed directly. Instead, we terminate its parent process:

$ kill 104160

Summary:

    This Ansible playbook does not include tasks to restart the parent processes, but we could add tasks that restart the parent services immediately after terminating them. We can also use the Ansible execution log to find out which services frequently generate zombie processes, and use that information to improve their source code.
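
As a rough illustration of finding the services that generate the most zombies, the sketch below tallies zombies per parent. It works on a ps snapshot rather than on the Ansible execution log mentioned above, the function name zombies_by_parent is hypothetical, and the usage line assumes a procps-style ps.

```shell
# zombies_by_parent: read "PID PPID STAT COMMAND" lines on stdin and print
# "COUNT PARENT_PID PARENT_COMMAND", most zombie children first.
# The function name is illustrative.
zombies_by_parent() {
    awk '
        { name[$1] = $4 }             # remember each command by PID
        $3 ~ /^Z/ { count[$2]++ }     # tally zombies under their parent
        END {
            for (ppid in count)
                printf "%d %s %s\n", count[ppid], ppid, name[ppid]
        }
    ' | sort -k1,1nr -k2,2n
}

# Typical use (assumes procps ps):
#   ps -e -o pid= -o ppid= -o stat= -o comm= | zombies_by_parent
```

Running this periodically and keeping the output would show which parents appear again and again.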

    As we have seen, we can detect zombie processes automatically and manage them with Ansible, which helps us administer our systems in a greener way.
