We’ve had a bunch of new servers in place for around 3 months now. They had been working well and performing just fine.
Then, out of the blue, our monitoring started throwing alerts on seemingly random servers. Our queues were building up – basically, database performance had dropped dramatically and our processing scripts couldn’t stuff data into the DBs fast enough.
What could be causing it?
I use cobbler to provision our new Dell servers, which is great but it needs the MAC addresses of the servers to identify each machine.
Previously, I have been doing this manually:
- log in to the DRAC web interface
- launch the java console
- reboot the server
- go into the BIOS
- navigate to Embedded Devices
- manually record the MAC addresses
This takes quite a while, and is prone to error.
I recently had another 42 servers to deploy, so I looked for a way to automate this process. I found one!
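The gist of it: the DRAC will hand over the embedded NIC MAC addresses remotely via racadm, with no reboot, java console, or BIOS spelunking required. Here's a rough sketch of the idea – the IP range, credentials, and grep pattern are placeholders, so treat it as an illustration rather than the finished script:

# Query each DRAC remotely for its system inventory and pull out the
# MAC address section (adjust the pattern to suit your DRAC firmware's
# getsysinfo output).
for drac in 10.0.0.{101..142}; do
    echo "== $drac =="
    racadm -r "$drac" -u root -p 'calvin' getsysinfo | grep -A4 -i 'MAC Address'
done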
I use the toggl time-tracking service to keep track of the hours I work for my various clients.
toggl makes desktop clients available for Windows, Mac, & Linux, but the Linux packages are in .deb format for Ubuntu and, until recently, they did not provide x86_64 packages.
toggl recently released the desktop client as open source so I grabbed it and have built an RPM.
RPM (Fedora 12, x86_64): TogglDesktop-2.5.1-1.fc12.x86_64.rpm
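If you just want to install it, yum can handle a local RPM and resolve the dependencies for you:

# Install the downloaded package; add --nogpgcheck if yum complains
# about the package being unsigned.
sudo yum localinstall TogglDesktop-2.5.1-1.fc12.x86_64.rpm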
It seems that several people have been having problems getting Dell OMSA 6.2 to work correctly on CentOS 5.4 x86_64. Specifically, the software does not detect any storage controllers, and therefore finds no disks either. For example:
[root@b034 ~]# omreport storage pdisk controller=0
Invalid controller value. Read, controller=0
No controllers found.
After a little investigation, I found the source of the problem.
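I'll save the details for the write-up, but a useful first check if you're seeing the same thing is whether the OMSA services all actually came up – omreport can't see any controllers when the storage/data manager service isn't running. The path below assumes a default OMSA 6.2 install and may differ on your system:

# Show the status of all Server Administrator services; the data
# engine (dataeng / dsm_sa_datamgrd) must be running for
# 'omreport storage' to find any controllers.
/opt/dell/srvadmin/sbin/srvadmin-services.sh status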
Update: see my recent post describing a better way to do this.
I often need to deploy Ruby gems across many CentOS servers. I prefer to use the native OS package management tools (rpm + yum) rather than using Ruby gems.
Here’s how to build RPMs from Ruby gems using gem2rpm.
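As a taste of the workflow, here's the basic sequence, using the rake gem as a stand-in and assuming the stock CentOS 5 rpmbuild layout under /usr/src/redhat:

gem fetch rake                           # download rake-<version>.gem
gem2rpm rake-*.gem > rubygem-rake.spec   # generate a spec from the gem metadata
cp rake-*.gem /usr/src/redhat/SOURCES/   # rpmbuild looks for the gem here
rpmbuild -ba rubygem-rake.spec           # build the binary and source RPMs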
When creating backups or log files, I like to name the files with a timestamp, i.e. the date plus the time.
I use the date command to produce timestamps in the appropriate format, but I find the format specifier a bit long-winded and difficult to remember – is %m minutes or month?
There is a better way… date -I
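A couple of examples (the exact timezone suffix varies slightly between coreutils versions):

date -I            # date only, e.g. 2010-03-05
date -Iseconds     # date plus time, e.g. 2010-03-05T14:30:02+0000
date -Iminutes     # e.g. 2010-03-05T14:30+0000

# Handy in a backup filename:
tar czf "backup-$(date -Iseconds).tar.gz" /etc/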
I use puppet to manage the configuration of my machines. So far, I've been rolling out new resources to machines, but recently I've wanted to remove resources as well. Here's how I modified my cron classes so I could remove cron jobs as well as create them.
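The crux of it is that puppet's built-in cron type understands ensure => absent, so removing a job is just another managed state. A minimal sketch with a made-up job name (the real classes are a bit more involved):

# The resource that originally created the job:
cron { 'prune-old-logs':
  ensure  => present,
  command => '/usr/local/bin/prune-old-logs.sh',
  user    => 'root',
  hour    => '3',
  minute  => '15',
}

# Later, flip the same resource to absent and puppet deletes the
# crontab entry on its next run:
cron { 'prune-old-logs':
  ensure => absent,
}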
In my iptables configurations, I generally allow all traffic I am interested in and deny the rest, logging anything that is denied.
I found that this can get a bit noisy with loads of connections to udp:137 and udp:500, etc. so I decided to deny the more common ports without logging. But which are the most common ports?
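Counting them straight out of the logs answers that. A sketch, assuming the LOG rule uses a prefix like "DENIED:" (substitute whatever --log-prefix your rules actually set):

# Tally the destination ports of logged, denied packets.
grep 'DENIED:' /var/log/messages \
  | grep -o 'DPT=[0-9]*' \
  | sort | uniq -c | sort -rn | head

# Then drop the noisiest ports without logging; -I inserts them ahead
# of the existing LOG rule.
iptables -I INPUT -p udp --dport 137 -j DROP
iptables -I INPUT -p udp --dport 500 -j DROP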
I've used subversion for quite a while now – I vaguely remember using CVS when working with some Sourceforge projects, but most of my experience is with subversion.
I've used the command svn status (or svn st, for short) to show me what changes there are in my working copy. However, I've occasionally thought it would be nice to see what updates are available in the repository but I've never bothered to find out how to do it. Until now…
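For the impatient, the answer turns out to be a flag on the same command:

# --show-updates (-u) contacts the repository and marks items that
# have a newer revision available with a '*'.
svn status -u
svn st -u        # the short form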
I'm using the net-snmp-lvs module to interface LVS statistics to SNMP so I can graph them (I'm using OpenNMS).
I have a virtual HTTP service that is balanced across eight real servers. In testing, everything seemed to work just fine and I got some nice graphs that show the Connection Rate, Packet Rate, and Byte Rate for the virtual service and each of the real servers.
This morning, we attempted a cutover, i.e. we redirected real traffic to the new service. Sadly, our perimeter firewall hit > 90% CPU, so we had to revert. But in the time we were live, I noticed that the Connection Rate statistics were missing for both the virtual service and the real servers for the period the service was under high load:
Notice the gap in the Connection Rate graph when the Packet & Byte rate graphs show high values.
I am currently investigating the cause of this issue.
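While I dig, a sensible first step is to compare the kernel's own rate counters with what the SNMP agent reports; something along these lines (the LVS-MIB table name is from memory, so check it against the MIB file shipped with the module):

# The kernel's view: per-service and per-real-server rates (CPS column).
ipvsadm -L -n --rate

# The agent's view: walk the LVS service table over SNMP.
snmpwalk -v2c -c public localhost LVS-MIB::lvsServiceTable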