Monitoring Tools for Cloud Applications

There are a million tools out there for monitoring cloud applications.  The following is a list of the tools my current company has selected to monitor all levels of our stack.  Overall we are very happy with this suite.  Please post in the comments if you can recommend others!

Machine Level

Nagios – Nagios is used to monitor your IT infrastructure.  This tool will generate alerts (via email, text, etc.) when your CPU, threads, disk space, etc. exceed a given threshold.  The great thing about Nagios is that the open source version, Nagios Core, is completely free!
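
To make the threshold idea concrete, here is a minimal sketch of a Nagios-style disk check in Python.  The exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL) follow the standard Nagios plugin convention; the function names and the 80/90 percent thresholds are assumptions picked for illustration, not values from a real setup.

```python
import shutil

# Exit codes follow the standard Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a (status_code, label) pair.

    The warn/crit defaults are illustrative assumptions.
    """
    if percent_used >= crit:
        return CRITICAL, "CRITICAL"
    if percent_used >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status_code, message) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    code, label = classify(percent_used, warn, crit)
    return code, f"DISK {label} - {percent_used:.1f}% used on {path}"


def main():
    """Print a one-line status and return the exit code for the scheduler."""
    code, message = check_disk()
    print(message)
    return code

# To run this as a plugin script, end the file with:
#   import sys; sys.exit(main())
```

The monitoring server only looks at the exit code and the first line of output, so the same pattern should carry over to CPU, thread counts, or anything else you can turn into a number.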

Logging

Papertrail – Papertrail is a log aggregator and search tool.  Like many cloud applications, our system runs across a wide variety of EC2 instances on AWS, across many different deployment environments.  Papertrail is great because it aggregates all your logs together in one place.  No need to figure out which machine your user is on, SSH into it, etc.  Just use the web search on Papertrail and then easily drill down into your logs.  Well worth the money!

Exceptions

Sentry – Sentry is a great tool that captures the exceptions your application throws (through its SDK or logging integrations) and aggregates them.  It creates a nice dashboard of all of your system exceptions and groups similar ones together.  It allows you to track who is looking into the exceptions, assign them to developers, and even export them to JIRA.  Overall it’s just a great way to give visibility to the system exceptions without having to go through your logs manually.
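
To give a feel for what "grouping similar exceptions" means, here is a small Python sketch.  It is not Sentry's actual grouping algorithm; it simply assumes that two exceptions are "similar" when they share a type and were raised from the same function.  The names fingerprint, group_exceptions, and parse_age are made up for the example.

```python
import traceback
from collections import defaultdict


def fingerprint(exc):
    """Build a grouping key from the exception type and where it was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        last = frames[-1]  # innermost Python frame
        location = f"{last.filename}:{last.name}"
    else:
        location = "<no traceback>"
    return f"{type(exc).__name__}@{location}"


def group_exceptions(excs):
    """Bucket exceptions by fingerprint so repeats collapse into one entry."""
    groups = defaultdict(list)
    for exc in excs:
        groups[fingerprint(exc)].append(exc)
    return dict(groups)


# Hypothetical application code that fails in a few different ways.
def parse_age(value):
    return int(value)


captured = []
for raw in ["12", "abc", "x1", None]:
    try:
        parse_age(raw)
    except Exception as exc:
        captured.append(exc)

groups = group_exceptions(captured)
for key, members in groups.items():
    print(f"{len(members)}x {key}")
```

Because the key ignores the error message, the two ValueErrors with different messages land in the same bucket, while the TypeError gets its own.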

Performance

New Relic – New Relic is a beast.  They have been rolling out tool after tool.  The one we use most is APM (Application Performance Monitoring), which gives you alerting on system errors and performance bottlenecks.  It provides a great way to drill down into your most time-consuming transactions and helps guide refactoring.  They provide a host of other tools but we don’t use them that often.
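
The "most time-consuming transactions" view boils down to timing named units of work and ranking them by total time.  Here is a rough Python sketch of that idea; it has nothing to do with New Relic's agent or API, and TransactionStats and the transaction names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TransactionStats:
    """Accumulate wall-clock time per named transaction."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, name, seconds):
        """Add one observation for the transaction `name`."""
        self._totals[name] += seconds
        self._counts[name] += 1

    @contextmanager
    def timed(self, name):
        """Time the enclosed block and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, time.perf_counter() - start)

    def slowest(self, n=5):
        """Return up to n (name, total_seconds, call_count), slowest first."""
        ranked = sorted(self._totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, total, self._counts[name]) for name, total in ranked[:n]]


stats = TransactionStats()
with stats.timed("GET /search"):
    sum(range(100_000))  # stand-in for real request handling
print(stats.slowest())
```

Ranking by total time rather than average is a deliberate choice here: a moderately slow endpoint that is called constantly is often a better refactoring target than a very slow one that is hit once a day.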

API

Runscope – Runscope is freakin’ awesome!  It provides an easy way to create tests that validate REST APIs.  We’ve found this tool to be key for all integration points across our organization.  If a team provides a service that others can take a dependency on, we ensure that a Runscope test is in place.  If the API is ever broken, everyone will know immediately.  The tests can be run from locations around the globe, and there is some performance monitoring included as well.
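
For anyone who hasn't seen this kind of API test, here is a sketch of the idea in plain Python using only the standard library.  It is not how Runscope itself is configured; the function names and the required fields are assumptions for illustration.  The check passes when the endpoint returns the expected status and a JSON object containing the fields that downstream teams depend on.

```python
import json
import urllib.error
import urllib.request


def validate(status, body, required_fields, expected_status=200):
    """Return a list of human-readable failures; empty means the check passed."""
    failures = []
    if status != expected_status:
        failures.append(f"expected status {expected_status}, got {status}")
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        failures.append("body is not valid JSON")
        return failures
    if not isinstance(payload, dict):
        failures.append("body is not a JSON object")
        return failures
    for field in required_fields:
        if field not in payload:
            failures.append(f"missing field: {field}")
    return failures


def check_endpoint(url, required_fields, timeout=5.0):
    """Fetch `url` and validate it; network errors count as failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return validate(resp.status, resp.read(), required_fields)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status and body worth checking.
        return validate(err.code, err.read(), required_fields)
    except (urllib.error.URLError, OSError) as err:
        return [f"request failed: {err}"]
```

Run something like check_endpoint on a schedule against each service you depend on, and alert whenever the returned list is non-empty.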

I know I’m probably missing a bunch of tools, but these are the ones we currently use, and they seem to cover all the bases.  Also, check out my other article, “When your production system goes down”, for my personal strategy for handling prod issues when the alarms do start ringing!

Please comment and tell me what tools you like to use!

9 thoughts on “Monitoring Tools for Cloud Applications”

  1. We developed CronAlarm for what we consider to be an often overlooked area of enterprise monitoring – cron jobs and scheduled tasks. Other options didn’t quite meet our needs so we came up with this and have found it to be an integral part of our monitoring solution.

    1. Nice! This has been a big pain point for us. Right now we find out from customers when one of our crons doesn’t run, which is pretty embarrassing. I will totally check this out, thanks!

    1. Thanks for visiting the site, John! Keep up the great work! We use Runscope daily and it has saved our butts many times! Worth every penny!

  2. Dave – Please take a look at a new Cloud Uptime Monitoring service: HappyApps (http://www.happyapps.io):
    – Easy setup for monitoring Apps, DBs, & IT systems across multiple clouds
    – Modern, Clean, and easy to understand status and incident dashboard
    – Ability to monitor systems as groups
    – Clear visibility into group hierarchy and system dependence mapping
    – Unique noise reduction technology that only delivers meaningful alerts (Email & SMS). No more false positives and unnecessary alerts.

    Do give this a try. It’s FREE! http://www.happyapps.io

    1. Thanks for visiting Ashish! We are currently missing a tool like this to pull all the reporting together in one place. We were thinking about using something like domo, but this looks like it could be a better solution. Thanks for letting us know!

  3. The large companies known for their traditional data center monitoring applications have been slow to hit the cloud market, and what products they do have are rehashes of existing applications that provide little more than reporting and alerting tools. CA is on an acquisition spree to fix this and just acquired 3Tera, a cloud provisioning player.
