Setting up a quota watcher agent in Python

This is the documentation for QuotaWatcher, a small cron job that monitors disk usage on our servers.

In this post I explain how the agent works, the steps needed to build it, and how it could be improved. Feel free to comment and add your input.

All the code is heavily pep8'd :). I use PyCharm and Sublime to catch formatting and code quality problems.

Importing needed Libraries

    from __future__ import division

    __author__ = "Rad"
    __license__ = "GNU General Public License version 3"
    __date__ = "06/30/2015"
    __version__ = "0.2"

    try:
        import os
        from quota_logger import init_log
        import subprocess
        from prettytable import PrettyTable
        from smtplib import SMTP
        from smtplib import SMTPException
        from email.mime.text import MIMEText
        from argparse import ArgumentParser
    except ImportError:
        # Checks the installation of the necessary python modules
        import os
        import sys

        print((os.linesep * 2).join(
            ["An error found importing one module:", str(sys.exc_info()[1]), "You need to install it Stopping..."]))
        sys.exit(-2)

I like this way of importing libraries: if any of them is not installed, the program prints the import error and exits. There is room for improvement here: if a library is missing, it is possible to install it automatically, provided the code runs as admin or with sufficient permissions.
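
As a first step towards that improvement, here is a minimal sketch that only reports which modules are missing. The helper name missing_modules is hypothetical and not part of QuotaWatcher, and the install step itself is deliberately left out.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "surely_not_installed_xyz" is a made-up name used to force a failure.
print(missing_modules(["os", "subprocess", "surely_not_installed_xyz"]))
```

With the list of missing names in hand, the script could either exit with a clearer message or hand the names to an installer.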

The Notifier Class

    class Notifier(object):

        suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

        def __init__(self, **kwargs):

            self.threshold = None
            self.path = None
            self.list = None
            self.email_sender = None
            self.email_password = None
            self.gmail_smtp = None
            self.gmail_smtp_port = None
            self.text_subtype = None
            self.cap_reached = False
            self.email_subject = None

            for (key, value) in kwargs.iteritems():
                if hasattr(self, key):
                    setattr(self, key, value)

            self._log = init_log()

We initialise the class with a set of attributes. The object has a threshold above which an email is sent to a list of recipients, and it looks at the size of each subdirectory in path. You need to create an email address and set some environment variables (discussed later).
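
To make the keyword-argument handling concrete, here is a small sketch of the same pattern. The Settings class and the colour keyword are hypothetical stand-ins; the sketch uses items() so that it runs on Python 3 as well.

```python
class Settings(object):
    """Hypothetical stand-in for Notifier, with just two attributes."""

    def __init__(self, **kwargs):
        self.threshold = None
        self.path = None
        # Only keys that match an attribute declared above are applied.
        for key, value in kwargs.items():
            if hasattr(self, key):
                setattr(self, key, value)


settings = Settings(threshold=100, path="/tmp", colour="blue")
print(settings.threshold, settings.path)
print(hasattr(settings, "colour"))
```

Unknown keywords such as colour are silently ignored, which is convenient but also hides typos in argument names.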

        @property
        def loggy(self):
            return self._log

We expose the logger created by init_log (see the code of the logger later) through a read-only property. This allows us to log from within the class itself and from the code that uses it.

        @staticmethod
        def load_recipients_emails(emails_file):
            recipients = [line.rstrip('\n') for line in open(emails_file) if not line[0].isspace()]
            return recipients

We need to load the emails from a file created by the user. I usually create two files: development_list, containing only the email addresses I use for testing, and production_list, containing the addresses I want to notify in production. Lines that start with whitespace, including blank lines, are skipped.
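
Here is a short sketch of how such a file is read, using a throwaway file with placeholder addresses. The function name load_recipients is hypothetical, and unlike the method above it closes the file with a with block.

```python
import os
import tempfile


def load_recipients(emails_file):
    """Return one address per line, skipping blank or indented lines."""
    with open(emails_file) as handle:
        return [line.rstrip("\n") for line in handle
                if line and not line[0].isspace()]


# Write a throwaway list; the addresses are placeholders.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alice@example.com\n\n  indented@example.com\nbob@example.com\n")

recipients = load_recipients(tmp.name)
os.remove(tmp.name)
print(recipients)
```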

        @staticmethod
        def load_message_content(message_template_file, table):
            template_file = open(message_template_file, 'rb')
            template_file_content = template_file.read().replace(
                "{{table}}", table.get_string())
            template_file.close()
            return template_file_content

Inspired by MVC apps, we load the message body from a template. The template contains a placeholder called {{table}} that is replaced by the table of subdirectories and their respective sizes.
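
The substitution itself is a plain string replacement. The sketch below shows it on made-up strings; render_template is a hypothetical name, and the real method reads the template from a file and gets the table text from PrettyTable.

```python
def render_template(template_text, table_text):
    """Replace the {{table}} placeholder with the rendered table."""
    return template_text.replace("{{table}}", table_text)


# Both strings are made up for illustration.
template = "Disk usage report:\n\n{{table}}\n\nPlease clean up."
table = "alice    1.2 TB\nbob      300 GB"
rendered = render_template(template, table)
print(rendered)
```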

        def notify_user(self, email_receivers, table, template):
            """This method sends an email
            :rtype : email sent to specified members
            """
            # Create the message
            input_file = os.path.join(
                os.path.dirname(__file__), "templates/" + template + ".txt")
            content = self.load_message_content(input_file, table)

            msg = MIMEText(content, self.text_subtype)

            msg["Subject"] = self.email_subject
            msg["From"] = self.email_sender
            msg["To"] = ','.join(email_receivers)

            try:
                smtpObj = SMTP(self.gmail_smtp, self.gmail_smtp_port)
                # Identify yourself to GMAIL ESMTP server.
                smtpObj.ehlo()
                # Put SMTP connection in TLS mode and call ehlo again.
                smtpObj.starttls()
                smtpObj.ehlo()
                # Login to service
                smtpObj.login(user=self.email_sender, password=self.email_password)
                # Send email
                smtpObj.sendmail(self.email_sender, email_receivers, msg.as_string())
                # close connection and session.
                smtpObj.quit()
            except SMTPException as error:
                print "Error: unable to send email :  {err}".format(err=error)

notify_user is the method that sends the email. It loads the message body template, injects the table into it, and sends the result to the recipients through the SMTP server.
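
To inspect a message without contacting any SMTP server, the message-building part can be isolated as in this sketch. The function build_message and all addresses are hypothetical; only MIMEText from the standard library is used.

```python
from email.mime.text import MIMEText


def build_message(sender, receivers, subject, body, subtype="plain"):
    """Build the email without sending it."""
    msg = MIMEText(body, subtype)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ",".join(receivers)
    return msg


msg = build_message("watcher@example.com",
                    ["alice@example.com", "bob@example.com"],
                    "Disk usage alert", "table goes here")
print(msg.as_string())
```

Printing the message this way is a cheap check of the headers and body before pointing the script at a real mail server.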

        @staticmethod
        def du(path):
            """disk usage in kilobytes"""
            # return subprocess.check_output(['du', '-s',
            # path]).split()[0].decode('utf-8')
            try:
                p1 = subprocess.Popen(('ls', '-d', path), stdout=subprocess.PIPE)
                p2 = subprocess.Popen((os.environ["GNU_PARALLEL"], '--no-notice', 'du', '-s', '2>&1'), stdin=p1.stdout,
                                      stdout=subprocess.PIPE)
                p3 = subprocess.Popen(
                    ('grep', '-v', 'Permission denied'), stdin=p2.stdout, stdout=subprocess.PIPE)
                output = p3.communicate()[0]
            except subprocess.CalledProcessError as e:
                raise RuntimeError("command '{0}' return with error (code {1}): {2}".format(
                    e.cmd, e.returncode, e.output))
            # return ''.join([' '.join(hit.split('\t')) for hit in output.split('\n')
            # if len(hit) > 0 and not "Permission" in hit and output[0].isdigit()])
            result = [' '.join(hit.split('\t')) for hit in output.split('\n')]
            for line in result:
                if line and len(line.split('\n')) > 0 and "Permission" not in line and line[0].isdigit():
                    return line.split(" ")[0]

This is a wrapper around the well-known du command. I use GNU Parallel (its path is read from the GNU_PARALLEL environment variable) in case we have a lot of subdirectories and don't want to wait for sequential processing. Note that we could have done this with multithreading as well.
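
For the multithreading alternative, here is a rough sketch written for Python 3. It is not a drop-in replacement: instead of calling du it sums apparent file sizes with os.walk, so the numbers will differ from du's block-based usage. The names dir_size and sizes_in_parallel and the worker count are assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def dir_size(path):
    """Sum the apparent size in bytes of the regular files under path."""
    total = 0
    # os.walk skips directories it cannot read unless onerror is given.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # unreadable or vanished file: skip it
    return total


def sizes_in_parallel(paths, workers=4):
    """Return {path: size} with one dir_size call per path, run in threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(dir_size, paths)))


# Demo on a throwaway tree with two subdirectories.
base = tempfile.mkdtemp()
for name, nbytes in (("a", 10), ("b", 2048)):
    os.mkdir(os.path.join(base, name))
    with open(os.path.join(base, name, "data.bin"), "wb") as handle:
        handle.write(b"x" * nbytes)

paths = [os.path.join(base, name) for name in ("a", "b")]
result = sizes_in_parallel(paths)
print(result)
```

Threads fit here because the work is dominated by filesystem calls rather than by Python computation.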

        def du_h(self, nbytes):
            if nbytes == 0:
                return '0 B'
            i = 0
            while nbytes >= 1024 and i < len(self.suffixes) - 1:
                nbytes /= 1024.
                i += 1
            f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
            return '%s %s' % (f, self.suffixes[i])

I didn't want to use du's -h flag because we may want to sum up subdirectory sizes or do other post-processing, so it is better to keep all sizes in a single unit. For a more human-readable format, we can use the du_h() method.
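
The conversion can be tried on its own with the standalone sketch below. It mirrors the logic of du_h with 1024-based units; the function name human_readable and the module-level SUFFIXES list are only for illustration.

```python
SUFFIXES = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_readable(nbytes):
    """Format a byte count using 1024-based units."""
    if nbytes == 0:
        return "0 B"
    value = float(nbytes)
    i = 0
    while value >= 1024 and i < len(SUFFIXES) - 1:
        value /= 1024.0
        i += 1
    # Two decimals, then drop trailing zeros and a dangling dot.
    text = "{0:.2f}".format(value).rstrip("0").rstrip(".")
    return "{0} {1}".format(text, SUFFIXES[i])


print(human_readable(1536))           # 1.5 KB
print(human_readable(5 * 1024 ** 3))  # 5 GB
```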

        @staticmethod
        def list_folders(given_path):
            user_list = []
            for path in os.listdir(given_path):
                if not os.path.isfile(os.path.join(given_path, path)) and not path.startswith(".") and not path.startswith(
                        "archive"):
                    user_list.append(path)
            return user_list

At some point we need to get the list of subdirectories; each one is passed to the same function (du). Hidden directories and directories whose name starts with archive are skipped.
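
The filtering can be checked on a throwaway directory, as in this sketch. It is a slight variation on the method above: it tests os.path.isdir rather than "not a file" and sorts the result, both of which are my assumptions rather than the original behaviour.

```python
import os
import tempfile


def list_folders(given_path):
    """Return subdirectory names, skipping hidden and archive* entries."""
    return sorted(
        name for name in os.listdir(given_path)
        if os.path.isdir(os.path.join(given_path, name))
        and not name.startswith((".", "archive"))
    )


# Throwaway tree: one normal dir, one hidden, one archive, one file.
base = tempfile.mkdtemp()
for name in ("alice", ".cache", "archive_2014"):
    os.mkdir(os.path.join(base, name))
with open(os.path.join(base, "notes.txt"), "w") as handle:
    handle.write("not a directory")

folders = list_folders(base)
print(folders)
```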

        def notify(self):
            self._log.info("Loading recipient emails...")
            list_of_recievers = self.load_recipients_emails(self.list)
            paths = self.list_folders(self.path)
            # os.path.join works whether or not self.path ends with a slash
            paths = [os.path.join(self.path, user) for user in paths]
            sizes = []
            for size in paths:
                try:
                    self._log.info("calculating disk usage for " + size + " ...")
                    sizes.append(int(self.du(size)))
                except Exception as e:
                    self._log.exception(e)
                    sizes.append(0)
            # sizes = [int(du(size).split(' ')[0]) for size in paths]
            # convert du's 1024-byte blocks (the GNU du default) to bytes
            sizes = [int(element) * 1024 for element in sizes]
            table = PrettyTable(["Directory", "Size"])
            table.align["Directory"] = "l"
            table.align["Size"] = "r"
            table.padding_width = 5
            table.border = False
            for account, size_of_account in zip(paths, sizes):
                if int(size_of_account) > int(self.threshold):
                    table.add_row(
                        ["*" + os.path.basename(account) + "*", "*" + self.du_h(size_of_account) + "*"])
                    self.cap_reached = True
                else:
                    table.add_row([os.path.basename(account), self.du_h(size_of_account)])
            # notify Admins
            table.add_row(["TOTAL", self.du_h(sum(sizes))])
            # usage as a percentage of 70000000000000 bytes
            table.add_row(["Usage", "{0:.3f}%".format(100 * sum(sizes) / 70000000000000)])
            self.notify_user(list_of_recievers, table, "only_admins")
            if self.cap_reached:
                self.notify_user(list_of_recievers, table, "default_size_limit")

        def run(self):
            self.notify()

Finally, we create the method that brings all of this together:

  • Read the list of receivers
  • Load the path we want to look into
  • For each subdirectory, calculate its size and append it to a list
  • Create a table to be populated row by row
  • Add the subdirectories and their sizes
  • Calculate the total size of the subdirectories
  • If one of the subdirectories is larger than the specified threshold, trigger the alert email
  • Report the usage as a percentage

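
The flagging and totalling steps above can be sketched without PrettyTable or email, using plain tuples. The function build_rows, the sizes, the threshold and the capacity are all made up for illustration.

```python
def build_rows(sizes_by_dir, threshold, capacity):
    """Return (rows, cap_reached) for a list of (directory, bytes) pairs."""
    rows = []
    cap_reached = False
    for name, size in sizes_by_dir:
        if size > threshold:
            # Mark offenders with asterisks, as in the email table.
            rows.append(("*" + name + "*", size))
            cap_reached = True
        else:
            rows.append((name, size))
    total = sum(size for _name, size in sizes_by_dir)
    rows.append(("TOTAL", total))
    rows.append(("Usage", "{0:.3f}%".format(100.0 * total / capacity)))
    return rows, cap_reached


# Made-up sizes, threshold and capacity, in bytes.
rows, cap_reached = build_rows([("alice", 300), ("bob", 50)],
                               threshold=100, capacity=1000)
for row in rows:
    print(row)
print(cap_reached)
```

Keeping this step free of side effects makes the threshold logic easy to test before any email is sent.
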
    def arguments():
        """Defines the command line arguments for the script."""
        main_desc = """Monitors changes in the size of dirs for a given path"""

        parser = ArgumentParser(description=main_desc)
        parser.add_argument("path", default=os.path.expanduser('~'), nargs='?',
                            help="The path to monitor. If none is given, takes the  home directory")
        parser.add_argument("list", help="text file containing the list of persons to be notified, one per line")
        parser.add_argument("-s", "--notification_subject", default=None, help="Email subject of the notification")
        parser.add_argument("-t", "--threshold", default=2500000000000,
                            help="The threshold that will trigger the notification")
        parser.add_argument("-v", "--version", action="version",
                            version="%(prog)s {0}".format(__version__),
                            help="show program's version number and exit")
        return parser

The program takes into account: the path to examine, the file containing the list of emails, the subject of the alert, and the threshold that triggers the email (2.5 TB by default, given in bytes).
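
To see how these arguments are parsed, here is a trimmed sketch of the parser fed with an explicit argument list. The function name build_parser and the type=int on the threshold are my additions; the original parser keeps a threshold passed on the command line as a string and converts it later.

```python
import os
from argparse import ArgumentParser


def build_parser():
    """Trimmed version of the parser, without help texts."""
    parser = ArgumentParser(description="Monitors the size of dirs for a given path")
    parser.add_argument("path", default=os.path.expanduser("~"), nargs="?")
    parser.add_argument("list")
    parser.add_argument("-s", "--notification_subject", default=None)
    # type=int is an addition in this sketch, not in the original parser.
    parser.add_argument("-t", "--threshold", default=2500000000000, type=int)
    return parser


args = build_parser().parse_args(["dev_list", "-s", "Hey Test", "-t", "1000"])
print(args.path, args.list, args.notification_subject, args.threshold)
```

With a single positional argument, argparse leaves the optional path at its default (the home directory) and assigns the value to list.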

    def main():

        args = arguments().parse_args()
        notifier = Notifier()
        loggy = notifier.loggy
        # Set parameters
        loggy.info("Starting QuotaWatcher session...")
        loggy.info("Setting parameters ...")
        notifier.list = args.list
        notifier.threshold = args.threshold
        notifier.path = args.path

        # Configure the app
        try:
            loggy.info("Loading environment variables ...")
            notifier.email_sender = os.environ["NOTIFIER_SENDER"]
            notifier.email_password = os.environ["NOTIFIER_PASSWD"]
            notifier.gmail_smtp = os.environ["NOTIFIER_SMTP"]
            notifier.gmail_smtp_port = os.environ["NOTIFIER_SMTP_PORT"]
            notifier.text_subtype = os.environ["NOTIFIER_SUBTYPE"]
            notifier.email_subject = args.notification_subject
            notifier.cap_reached = False
        except Exception as e:
            loggy.exception(e)

        notifier.run()
        loggy.info("End of QuotaWatcher session")

Note that main loads some environment variables that you should set in advance. It is up to the user to fill these in. Values such as the sender's password are confidential, so it is safer to keep them in environment variables than to show them in the code.
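
One possible refinement, sketched below, is to check all the variables up front and name every missing one before doing any work. The function load_settings and the REQUIRED list are hypothetical; the variable names are the ones used in main.

```python
import os

REQUIRED = ["NOTIFIER_SENDER", "NOTIFIER_PASSWD", "NOTIFIER_SMTP",
            "NOTIFIER_SMTP_PORT", "NOTIFIER_SUBTYPE"]


def load_settings(environ=os.environ):
    """Return the NOTIFIER_* values, or raise naming every missing one."""
    missing = [name for name in REQUIRED if name not in environ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = dict((name, environ[name]) for name in REQUIRED)
    settings["NOTIFIER_SMTP_PORT"] = int(settings["NOTIFIER_SMTP_PORT"])
    return settings


# A fake environment with placeholder values, so nothing real is needed.
fake_env = {
    "NOTIFIER_SENDER": "watcher@example.com",
    "NOTIFIER_PASSWD": "not-a-real-password",
    "NOTIFIER_SMTP": "smtp.example.com",
    "NOTIFIER_SMTP_PORT": "587",
    "NOTIFIER_SUBTYPE": "plain",
}
print(load_settings(fake_env)["NOTIFIER_SMTP_PORT"])
```

Passing the environment as a parameter keeps the check testable without touching the real os.environ.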

That's it.

This is an example of the log output:

2015-07-03 10:40:46,968 - quota_logger - INFO - Starting QuotaWatcher session...  
2015-07-03 10:40:46,969 - quota_logger - INFO - Setting parameters ...  
2015-07-03 10:40:46,969 - quota_logger - INFO - Loading environment variables ...  
2015-07-03 10:40:46,969 - quota_logger - INFO - Loading recipient emails...  
2015-07-03 10:40:47,011 - quota_logger - INFO - calculating disk usage for amcpherson ...  
2015-07-03 11:21:09,442 - quota_logger - INFO - calculating disk usage for andrewjlroth ...  
2015-07-03 15:31:41,500 - quota_logger - INFO - calculating disk usage for asteif ...  
2015-07-03 15:40:34,268 - quota_logger - INFO - calculating disk usage for clefebvre ...  
2015-07-03 15:42:47,483 - quota_logger - INFO - calculating disk usage for dgrewal ...  
2015-07-03 16:01:30,588 - quota_logger - INFO - calculating disk usage for fdorri ...  
2015-07-03 16:03:43,850 - quota_logger - INFO - calculating disk usage for fong ...  
2015-07-03 16:16:13,781 - quota_logger - INFO - calculating disk usage for gha ...  
2015-07-03 16:16:38,673 - quota_logger - INFO - calculating disk usage for jding ...  
2015-07-03 16:16:50,820 - quota_logger - INFO - calculating disk usage for cdesouza ...  
2015-07-03 16:16:52,585 - quota_logger - INFO - calculating disk usage for jrosner ...  
2015-07-03 16:27:30,684 - quota_logger - INFO - calculating disk usage for jtaghiyar ...  
2015-07-03 16:28:16,982 - quota_logger - INFO - calculating disk usage for kareys ...  
2015-07-03 19:21:07,607 - quota_logger - INFO - calculating disk usage for hfarahani ...  
2015-07-03 19:22:07,618 - quota_logger - INFO - calculating disk usage for jzhou ...  
2015-07-03 19:38:28,147 - quota_logger - INFO - calculating disk usage for pipelines ...  
2015-07-03 19:53:20,771 - quota_logger - INFO - calculating disk usage for projects ...  
2015-07-03 20:52:45,001 - quota_logger - INFO - calculating disk usage for raniba ...  
2015-07-03 20:59:50,543 - quota_logger - INFO - calculating disk usage for tfunnell ...  
2015-07-03 21:00:47,216 - quota_logger - INFO - calculating disk usage for ykwang ...  
2015-07-03 21:03:30,277 - quota_logger - INFO - calculating disk usage for azhang ...  
2015-07-03 21:03:30,820 - quota_logger - INFO - calculating disk usage for softwares ...  
2015-07-03 21:03:42,679 - quota_logger - INFO - calculating disk usage for sjewell ...  
2015-07-03 21:03:51,711 - quota_logger - INFO - calculating disk usage for kastonl ...  
2015-07-03 21:04:52,536 - quota_logger - INFO - calculating disk usage for amazloomian ...  
2015-07-03 21:07:43,501 - quota_logger - INFO - End of QuotaWatcher session  

As for the email that gets triggered, it looks like this:

** THIS IS AN ALERT MESSAGE : DISK USAGE SPIKE **

This is a warning message about the disk usage relative to the Shahlab group at GSC

We detected a spike > 2.5 T for some accounts and here is a list of the space usage per account reported today


    Directory                   Size     
    amcpherson               1.96 TB     
    andrewjlroth           390.19 GB     
    asteif                   2.05 TB     
    clefebvre               16.07 GB     
    dgrewal                  1.61 TB     
    fdorri                 486.49 GB     
    *fong*                 *9.67 TB*     
    gha                      50.7 GB     
    jding                  638.72 GB     
    cdesouza                56.15 GB     
    jrosner                  1.82 TB     
    jtaghiyar              253.84 GB     
    *kareys*              *11.26 TB*     
    hfarahani                1.09 TB     
    jzhou                    1.19 TB     
    pipelines                 2.1 TB     
    *projects*             *4.09 TB*     
    raniba                   2.03 TB     
    tfunnell                 1.02 TB     
    ykwang                   1.71 TB     
    azhang                  108.4 MB     
    softwares               34.67 GB     
    sjewell                 24.53 GB     
    kastonl                118.51 GB     
    amazloomian              1.71 TB     
    TOTAL                   45.34 TB     
    Usage                    71.218%     


Please do the necessary to remove temporary files and take the time to clean up your working directories

Thank you for your cooperation

(am a cron job, don't reply to this message, if you have questions ask Ali)



PS : This is a very close estimation, some directories may have strict permissions, for an accurate disk usage please make sure that you set your files permissions so that anyone can see them.  

The logger

    import logging
    import datetime

    def init_log():
        current_time = datetime.datetime.now()
        logger = logging.getLogger(__name__)
        logger.setLevel(logging.INFO)
        handler = logging.FileHandler(current_time.isoformat()+'_quotawatcher.log')
        handler.setLevel(logging.INFO)
        # create a logging format
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        return logger

Before you start

export NOTIFIER_SENDER="your_email@gmail.com"  
export NOTIFIER_PASSWD="passwordhere"  
export NOTIFIER_SMTP="smtp.gmail.com"  
export NOTIFIER_SMTP_PORT=587  
export NOTIFIER_SUBTYPE="plain"  
export GNU_PARALLEL="/path/to/your/gnu/parallel"

How to run the program

python quotawatcher.py dev_list -s "Hey Test" -t 2500000000000  


