Monitoring Tools for Cloud Applications

There are a million tools out there for monitoring cloud applications.  The following is a list of the tools my current company has selected to monitor all levels of our stack.  Overall we are very happy with this suite.  Please post in the comments if you can recommend others!

Machine Level

Nagios – Nagios is used to monitor your IT infrastructure.  This tool will generate alerts (via email, text, etc.) when your CPU, threads, disk space, etc. exceed a given threshold.  The great thing about Nagios is that the core product is free and open source!
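
To make the threshold idea concrete, here is a minimal sketch of a disk space check in Python.  It is not taken from our setup.  The function names and the 80/90 percent thresholds are assumptions for illustration; the only Nagios-specific part is the plugin convention that a check reports 0 for OK, 1 for WARNING, and 2 for CRITICAL.

```python
import shutil

# Status codes follow the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def disk_used_percent(path="/"):
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a (status code, message) pair.

    The warn/crit defaults are illustrative, not recommended values.
    """
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"
```

A real plugin would print the message and exit with the status code, for example by passing the result of check_disk(disk_used_percent("/")) to print and sys.exit.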

Logging

Papertrail – Papertrail is a log aggregator and search tool.  Like most applications, our system runs across a wide variety of EC2 instances on AWS, across many different deployment environments.  Papertrail is great because it aggregates all your logs together in one place.  No need to figure out which machine your user is on, ssh into it, etc.  Just use the web search on Papertrail and then easily drill down into your logs.  Well worth the money!

Exceptions

Sentry – Sentry is a great tool that will parse your system logs and aggregate exception errors.  It will create a nice dashboard of all of your system exceptions and group similar ones together.  It allows you to track who is looking into the exceptions, assign them to developers, and even export them to JIRA.  Overall it’s just a great way to give visibility to the system exceptions without having to go through your logs manually.
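
To show what grouping similar exceptions together can look like, here is a small Python sketch.  This is not how Sentry actually does it; the fingerprint and group_exceptions helpers are hypothetical, and the grouping rule (exception type plus the function and line where it was raised) is a simplifying assumption.

```python
import traceback
from collections import defaultdict


def fingerprint(exc):
    """Build a grouping key from the exception type and the innermost frame."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        last = frames[-1]
        location = f"{last.name}:{last.lineno}"
    else:
        # The exception was created but never raised, so it has no traceback.
        location = "unknown"
    return f"{type(exc).__name__}@{location}"


def group_exceptions(exceptions):
    """Bucket caught exceptions so repeats of the same failure share one entry."""
    groups = defaultdict(list)
    for exc in exceptions:
        groups[fingerprint(exc)].append(exc)
    return dict(groups)
```

With a rule like this, a thousand occurrences of the same failure show up as one dashboard entry with a count, rather than a thousand separate log lines.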




Performance

New Relic – New Relic is a beast.  They have been rolling out tool after tool.  The one we use most is APM (Application Performance Monitoring), which gives you alerting on system errors and performance bottlenecks.  It provides a great way to drill down into your most time-consuming transactions and helps guide refactoring.  They provide a host of other tools but we don’t use them that often.
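
For a rough feel of what finding your most time-consuming transactions means, here is a hand-rolled Python sketch.  It does not use New Relic’s agent or API; the timed decorator, the timings store, and the slowest helper are all hypothetical names made up for this example.

```python
import time
from collections import defaultdict
from functools import wraps

# Transaction name -> list of observed durations in seconds.
timings = defaultdict(list)


def timed(name):
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raised.
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


def slowest(n=5):
    """Return up to n (name, total_seconds) pairs, most time-consuming first."""
    totals = {name: sum(samples) for name, samples in timings.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

Ranking by total time rather than average is a deliberate choice here: a moderately slow transaction that runs constantly is often a better refactoring target than a very slow one that runs once a day.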

API

Runscope – Runscope is freakin’ awesome!  It provides an easy way to create tests that validate REST APIs.  We’ve found this tool to be key for all integration points across our organization.  If a team provides a service that others can take a dependency on, we ensure that a Runscope test is in place.  If the API is ever broken, everyone will know immediately.  The tests can be run globally, and some performance monitoring is included as well.
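
The kind of assertions such an API test makes can be sketched in a few lines of Python.  This is not Runscope’s test format; validate_response is a hypothetical helper, and the expected status of 200 and the 500 ms limit are assumptions chosen for illustration.

```python
import json


def validate_response(status, body, elapsed_ms, required_fields, max_ms=500):
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    if status != 200:
        failures.append(f"expected status 200, got {status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        failures.append("body is not valid JSON")
        payload = {}
    if not isinstance(payload, dict):
        failures.append("body is not a JSON object")
        payload = {}
    for field in required_fields:
        if field not in payload:
            failures.append(f"missing field: {field}")
    if elapsed_ms > max_ms:
        failures.append(f"response took {elapsed_ms} ms (limit {max_ms} ms)")
    return failures
```

Collecting every failure instead of stopping at the first one means a single alert tells the owning team everything that is wrong with the endpoint.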

 

I know I’m probably missing a bunch of tools, but these are the ones we currently use, and they seem to cover all the bases.  Also, check out my other article “When your production system goes down”, for my personal strategy for handling prod issues, when the alarms do start ringing!

Please comment and tell me what tools you like to use.

 

Promote Yourself

Like most, I first started in the software industry as an engineer.  I was taught growing up that hard work in the end will be rewarded.  My strong work ethic as an engineer resulted in me quickly being promoted from individual contributor to team lead, manager, and then eventually director.

Then I noticed a big change.

When you are an individual contributor, you are often recognized by your peers and management for your contributions.  You are asked to complete a specific task or project, and the results are usually demonstrable and easily recognizable.

As you move up into management everything changes.

When you move into management, there are fewer people above you to recognize your contributions.  Oftentimes, your manager doesn’t understand the technical contributions you are making to the team, because they are non-technical.

Also, the role of a manager is much fuzzier.  How can you quantify if you were successful at motivating or mentoring your team?

When I first moved into a management position, I thought it was most important to focus on managing down.  My primary responsibility is to get my team to execute, right?

Well, that is only PART of the role.  The other part of the manager’s job is to clearly communicate up (your boss) and across (your peers) your personal accomplishments as well as those of your team.

You have to become a salesman!  This is not easy to do, especially for those of us introverted engineers.

You need to get out of your comfort zone!  Instead of spending your whole day with the team, force yourself to walk around the office and meet one new person a week.  Tell them what you do and what your team is working on.  What your challenges are.  See if there is any way you can help these people that you run into.  Are any of the projects or initiatives that your team is involved in relevant to this person?

When your team hits a big milestone, be sure to communicate it out to the organization.  Are the other dev teams aware?  How about the rest of the product group?  Promote your team to the organization.  Be proud of their accomplishments.  This is not slimy or devious.  This is basic business communication.

You need to do this; there is no choice.  If you don’t do it, no one else will.  The reputation of your team will suffer for it.  Your reputation as a leader will suffer.

And you can’t just promote your team; you need to promote yourself as well.  No one else will be an advocate for you, except you!

Some people do the above naturally.  I’d bet that most engineering managers don’t.  I know I don’t.  I’m still not good at doing the above.  However, I’ve seen other managers excel at this and reap the benefits.

Check out a book on this entire topic here:

How to have a successful 1-on-1 with your boss

Ah, the dreaded weekly 1-on-1!  Do you get nervous leading up to your 1-on-1 with your boss?  Are you sometimes caught off guard or feel unprepared during the discussion?  Do you ever feel like the time isn’t valuable?

Here are some tips I’ve picked up over the years to ensure a successful 1-on-1 with your boss:

Before the meeting

  • Be prepared.  This meeting is regularly scheduled, and it’s important.  You have it every week so you know what it’s going to be like.  There is no reason to not be prepared for this meeting.
  • Give them a heads up.  If there is a specific topic you want to cover, give your boss a heads up a day or so beforehand.  This will give them time to think about it, rather than catching them off guard in the meeting.
  • Review the past week.  Spend 10 minutes reviewing what happened in your group over the past week.  I typically write down a bulleted list because my memory is bad.  Were there any production issues?  Be prepared to answer questions regarding any event that may have made its way to your boss via other channels.
  • No surprises.  Don’t wait for your 1-on-1 to let your boss know of any big or urgent news.  See this post for tips on managing production issues.

During the meeting

  • Be on time.  Your boss’s time is valuable; don’t disrespect them by being late.
  • Let them lead.  Even though you’ve come prepared with a list of topics and questions, let your boss lead the discussion.  Remember, people have their own agendas and interests.  If your boss doesn’t have any topics to cover then you can move on to your agenda.
  • Raise Issues.  It’s important that your boss hears about issues going on within your team from you first.  It demonstrates that you are the leader of your team and have things under control.  However, as mentioned above, you should be in constant communication with your boss about any news on your team.  Use the 1-on-1 time to raise project risks or other concerns, rather than news.
  • Listen.  Pay close attention to your boss’s body language and the questions they ask.  What are they really interested in?  Do they want a status update, or just to brainstorm and bounce ideas off of you?  Let them lead and run with it, but find ways to weave in the questions you need answered.  If that doesn’t work, try to move on to your questions and issues about halfway through the meeting.
  • Take Notes.  I find that I need to take notes in my 1-on-1 to ensure I don’t drop anything.  I usually bring a notebook rather than a computer, as it demonstrates that I am focused on the meeting and not distracted by email, chat, etc.
  • Learn their style.  You can learn so much from a person by observing their behavior in these 1-on-1 settings.  You should start to see a pattern emerge over a few weeks on what your boss likes to cover in these meetings.  If they are a seasoned manager they will be effective, but that won’t always be the case.  Use the ‘heads up’ before the meeting to ensure the topics you want addressed are covered.  Don’t wait for your boss to discuss your career goals or potential growth opportunities; bring them up yourself.

After the meeting

  • Take Notes.  If you didn’t do so in the meeting, immediately afterwards jot down some notes from the meeting.  Pay attention to the topics that they raised.
  • Take Action.  Were there action items?  If so, make sure there is some progress on them by next week’s meeting!

Hopefully you find some of the tips above to be useful.  I’d love to hear other tactics that people employ to ensure they have a successful 1-on-1!

 

When ramping up new engineers, focus on the product!


When ramping new hires up, it’s very tempting to quickly throw them into the fire, fix bugs, start building features, etc.  After they’ve completed their orientation and filled out their paperwork, what better way for them to learn the system?

Stop!

It’s critically important that your engineers know how the business operates, who the customers are, their needs, and how your product fills that need.

The company I currently work for provides a SaaS offering that is VERY workflow intensive.  We have 20+ roles in the system with around 5 major personas, across 3 different applications.  I made the mistake described in the first paragraph, throwing new hires straight into the code, and am now regretting it.  We were under high growth at the time, hiring as fast as we could, and our backlog was growing.

Now, these engineers have been on board for several months and know nothing about the product.  When building new features, they don’t have the customer in mind.

Bottom line, when onboarding new employees focus on the product and end users first, THEN have them learn the code.  This may take a week or more, depending on your product, but it will pay dividends down the road.

 

What does a Software Engineering Manager actually do?

The role of the Software Engineering Manager in an organization is extremely varied.  This can be a benefit to the job, in that you are wearing so many hats and there is hardly any routine from day to day.  However, without careful time management skills it can feel overwhelming.

Some of the typical roles for a SW Manager include:

  • Project Management
    • Breaking a project or work down into smaller chunks, and assigning to the right developer
    • Work with product owner to define the requirements / user stories, to ensure they are fully vetted
    • Establish schedule / estimates / delivery timelines / etc
    • Oftentimes you will be stuck in the ScrumMaster role, if no one else on the team wants to do it.
  • Performance Management
    • Performance reviews
    • Salary adjustments (bonus, raises, etc)
    • Performance plans (i.e. PIPs)
  • Mentorship
    • 1-on-1s
    • Career development of your team
  • Communication – You are the voice of your team, and as such need to communicate:
    • Up – Communicating status of your team up your management chain.
    • Across – Communicating with peers and other functional groups across the organization
    • Down – Communicating news and decisions to your team.
  • Recruitment – You will need to work with HR to create job postings, screen resumes, interview candidates, etc.

Note that none of the items above includes anything technical!  Depending on the size of your team you may also be serving as the lead developer / architect on the project.

 

 

Firedrill! What to do when your production system goes down

There’s no worse feeling than when your production system goes down.  The business relies on your system’s availability.  Something happened: a bug, a bad code push, a customer inserting crazy data, or whatever.

Now everyone is looking at you to fix it.  You are completely dependent upon your team, both operations and engineering, to come together, diagnose the issue, address the root cause, and deploy a fix ASAP.

Your ass is on the line and you are pretty much helpless.

What can you do to help?

Here are my tips:

  • Make sure you have the right people on the scene.  Have at least 1 engineer and 1 ops person on the issue together.  Open a dedicated Skype room or Google Hangout where information can flow freely.
  • Quickly assess the severity of the service degradation.
  • Notify your management chain, product team, and various other relevant internal stakeholders ASAP.  Be honest.
  • Provide cover for the team diagnosing the issue.  Limit distractions.
  • Get out of the way.  Your job is to ensure the right people are on the issue, and the org is up to date on the status.
  • Once the issue is identified and a patch is deployed, communicate out to the org what happened.
  • Afterwards, gather the team together and hold a quick post mortem to find out what went wrong.  Some key questions:
    • What services were affected?
    • What actually happened?
    • What is the root cause?
    • How can this be prevented in the future?  Is additional logging or instrumentation needed to diagnose the issue more quickly next time?
  • Thank the team for their teamwork and quick resolution.
  • Send out a service incident report to the company that is transparent.  Describe the information gathered from the post mortem and explain it in simple terms.  Remember, the rest of the company wants to know that you have things under control, and you are taking the necessary steps to ensure it won’t happen again.  Most people understand that things go wrong and people make mistakes.

What other steps do you take?

 

Ten Potential Blog Posts

Here are 10 potential blog posts that I can write about:

  1. Why I enjoy being a software manager
  2. Why I loathe being a software manager
  3. Challenging personnel situations
  4. Common situations and how to react
  5. Time management / being overwhelmed
  6. Agile / Lean adaptation
  7. Challenges in embedded vs. SaaS
  8. Tackling difficult conversations
  9. Architectural Discussions + Patterns
  10. Recommended Books

There are probably a ton more, but these are the ones that come to mind off the top of my head.

 

My first post!

Hey everyone.  This is my first blog post.

I’ve been working in the software industry for 12+ years, and managing teams and people for the past 5 years.

This blog is my attempt to share my thoughts and lessons learned in the field.

My goal is to sharpen my own thoughts by writing them down, and potentially to connect and share ideas with like-minded folks in the future!

Thanks,

-Dave