Your Certificate Expired and Broke My Device

Certs are easy to manage, said no one

Certificates are a wonderful thing for security but a horrid one for management, especially if you have no formal process in place for renewing them. A quick web search will turn up countless posts and articles about expired certs breaking organizations and causing chaos.

This post is not about managing certificate expiration; I believe operational leaders need to make that a priority in the months beforehand and have a formalized plan. Instead, I will focus on the technical side of supporting users and departments from a network administration perspective.

Most network professionals know the certificates on their devices do not hinder client connectivity. The obvious caveat is when you forget to update your EAP cert(s) or use your infrastructure as the CA. However, when something doesn't connect to the network, we all know the phone calls or tickets that come in saying the network is down or "your certificate expired and broke my device."

It's not me, it's you

This was my dilemma recently. With the root certificate expiring soon, I got to work updating our NAC with all the necessary new certificates, most importantly the one used for our EAP methods (PEAP-MSCHAPv2 and EAP-TLS). My goal was to update the EAP certificate first, as I knew this would be the biggest cause for concern and I did not want that update to coincide with the root expiring. After some difficulty getting our NAC updated, the nodes were using the new certificate with minimal impact to endpoints. I say minimal because some of our laptops were unable to reconnect until the user clicked accept to trust the network. Even after speaking with our device admin, we were unable to determine why this only happened on laptops and not other wireless PCs. In any case, the critical piece of my puzzle was complete and everything was authenticating. Yay!

That is, until the following week when the root certificate finally expired. I fell asleep before the expiration (because there was nothing I could do) and made it almost through to the morning before the on-call woke me up, saying our service desk claimed the primary SSID for VoWiFi phones was down. I thought this was an interesting assertion: we use a central wireless controller with centralized switching, so if we had a problem, surely we'd see issues on other production SSIDs too. Sleepily, I hopped on my computer and began to parse through RADIUS logs. A short time later I confirmed my initial suspicion: the devices had not received the updated root certificate.
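As a rough illustration of that kind of log triage, here is a minimal Python sketch that tallies rejected authentications by SSID and failure reason. The record fields, SSID names, and reason strings are all made up for the example; a real NAC exports its own vendor-specific format, so treat this as the idea rather than a working parser.

```python
from collections import Counter

# Hypothetical, simplified RADIUS log records. Real exports use
# vendor-specific field names and reason codes.
records = [
    {"ssid": "VOICE", "result": "reject", "reason": "unknown_ca"},
    {"ssid": "VOICE", "result": "reject", "reason": "unknown_ca"},
    {"ssid": "CORP", "result": "accept", "reason": ""},
    {"ssid": "GUEST", "result": "reject", "reason": "bad_password"},
]


def summarize_failures(records):
    """Count rejected authentications per (ssid, reason) pair."""
    return Counter(
        (r["ssid"], r["reason"]) for r in records if r["result"] == "reject"
    )


if __name__ == "__main__":
    # Print the most frequent failure combinations first.
    for (ssid, reason), count in summarize_failures(records).most_common():
        print(f"{ssid}: {reason} x{count}")
```

If one SSID dominates the rejects with a trust-related reason while the others authenticate normally, that points at the clients on that SSID rather than the wireless infrastructure.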

After that call, my week became consumed by troubleshooting requests even though there was nothing to troubleshoot from my perspective. Still, I believe in teamwork to an extent, primarily for those who are on the ground helping out individuals. Unfortunately, that did not remove the frustration of trying to explain the issue to everyone. As everyone became aware of the issue, other teams began double-checking their devices and the problems slowly cleared up. The focus quickly moved away from us and towards the root (pun intended) issue.

Did we learn anything?

No project, good or bad, should be put to rest without some lessons learned. For me, the biggest lesson is to do your due diligence and start a new or otherwise unknown process far in advance. This was my first time updating certificates for an organization this large using an enterprise-grade NAC. During the process I uncovered many issues that required extensive troubleshooting before I could begin renewing everything, from communication errors between nodes, necessary restarts, and failed backups to implementing best practices for this particular appliance. The other primary lesson is to always document and be able to give an executive summary of what you are doing. Striking a balance between too much and too little information is key to giving leaders enough detail to pass on to other leaders who may be less technical. Lastly, be prepared to troubleshoot. Know the possible issues and know how to look for them. Set aside time after the go-live (the overall one, not just your piece) to assist and understand how other teams' changes interact with yours.
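Starting far in advance is easier when something tells you how much runway is left. Below is a minimal Python sketch, using only the standard library, that turns a certificate's notAfter string into days remaining and flags it once it falls inside a lead-time window. The 90-day default and the function names are my own assumptions for illustration, not a recommendation from any vendor.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a cert's notAfter timestamp.

    `not_after` uses the "Mon DD HH:MM:SS YYYY GMT" form that
    Python's ssl module reports for peer certificates.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, lead_days: int = 90) -> bool:
    """True once the cert is within `lead_days` of expiring (assumed window)."""
    return days_until_expiry(not_after, now) <= lead_days


if __name__ == "__main__":
    # Example notAfter value; substitute the one from your own certificate.
    example = "Jun  1 00:00:00 2030 GMT"
    print(days_until_expiry(example, datetime.now(timezone.utc)))
```

Passing `now` in explicitly keeps the check easy to test and lets you ask "where will we be in three months?" by supplying a future date.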

