Reinout van Rees: Easy maintainance: script that prints out repair steps

At my work we have quite a number of different sites/apps. Sometimes it is just a regular django website. Sometimes django + celery. Sometimes it also has extra django management commands, running from cronjobs. Sometimes Redis is used. Sometimes there are a couple of servers working together....

Anyway, life is interesting if you're the one that people go to when something is (inexplicably) broken :-) What are the moving parts? What do you need to check? Running top to see if there's a stuck process running at 100% CPU. Or if something eats up all the memory. df -h to check for a disk that's full. Or looking at performance graphs in Zabbix. Checking our "sentry" instance for error messages. And so on.

You can solve the common problems that way. Restart a stuck server, clean up some files. But what about a website that depends on background jobs, run periodically from celery? If there are 10 similar processes stuck? Can you kill them all? Will they restart?

I had just such a problem a while ago. So I sat down with the developer. Three things came out of it.

I was told I could just kill the smaller processes. They can be re-run later. This means it is a good, loosely-coupled design: fine :-)
The README now has a section called "troubleshooting" with a couple of command line examples. For instance the specific celery command to purge a specific queue that's often troublesome.
This is essential! I'm not going to remember that. There are too many different sites/apps to keep all those troubleshooting commands in my head.
A handy script (bin/repair) that prints out the commands that need to be executed to get everything right again. Re-running previously-killed jobs, for instance.

The script grew out of the joint debugging session. My colleague was telling me about the various types of jobs and celery/redis queues. And showing me redis commands that told me which jobs still needed executing. "Ok, so how do I then run those jobs? What should I type in?"

And I could check serveral directories to see which files were missing. Plus commands to re-create them. "So how am I going to remember this?"

In the end, I asked him if he could write a small program that did all the work we just did manually. Looking at the directories, looking at the redis queue, printing out the relevant commands?

Yes, that was possible. So a week ago, when the site broke down and the colleague was away on holiday, I could kill a few stuck processes, restart celery and run bin/repair. And copy/paste the suggested commands and execute them. Hurray!

So... make your sysadmin/devops/whatever happy and...

Provide a good README with troubleshooting info. Stuff like "you can always run bin/supervisorctl restart all without everything breaking. Or warnings not to do that but to instead do xyz.
Provide a script that prints out what needs doing to get everything OK again.

Reinout van Rees: Easy maintainance: script that prints out repair steps

Trending Articles

RAMAYAMPET Mandal Sarpanch | Upa-Sarpanch | Ward member Mobile Numbers Medak...

लड़कियां सेक्स के दौरान क्यों करती है उह! आह!लड़कियां सेक्स के दौरान क्यों करती...

Neem Baba Extra Questions Answer Class 6 English Poorvi

Throw Back: 4×4 — Sikilitele (Ft Castro) Prod by JQ

Rajasthan Board 10th Result 2016 Roll No wise & Name Wise

Lowe faces four theft charges

Practice Sheet of Right form of verbs for HSC Students

Mafia, Murder & Mayhem In The Motor City: Detroit Mob Hit Timeline (1937-2007)

The 10 Tennessee Cities With The Largest Black Population For 2021

Materials Around Us Class 6 Worksheet Science Chapter 6

デスクトップヒープの枯渇

Best Suvichar in Hindi |बेस्ट सुविचार |शुभ विचार हिंदी में

Kanulanu Thaake Lyrics and translation | Manam (2014)

Korean Sex Porn Videos: XXX Videos & Free Porn Movies

Teen Shot In Miami Drive-By Dies From Injuries

Download: IQ Muzatasha feat Shy D & Pmj – Ulesi NiFertilizer Yamavuto

Mahakal Attitude Status

Property developer set up cannabis factory to help pay off debts...

♡

KB: How to troubleshoot issues when adding a Hyper-V host in System Center...