Uptime
by Lee LeClair, 06/20/2008
As seen in Inside Tucson Business
As businesses have become dependent on their computers and network systems, "uptime" has become critical. No one can afford to be "down" these days; downtime directly affects customers and operations. So if you need your systems and networks to be up all the time, take the following into consideration: power, simplicity, redundancy, and continuity of operations. This discussion does not assume you are planning for full disaster recovery, only for very reliable uptime.
The first consideration is power, since it is the crux of all IT equipment. Reliable power means accounting for the occasional power hiccup as well as being able to deal with longer outages. We are fortunate in Arizona to have fairly reliable power and few sources of environmental outage (tornado, hurricane, earthquake, flood, etc.). Nevertheless, we are not immune: some of you may have suffered outages from trees blown over during a storm, vehicle accidents involving power poles, and, of course, surges from lightning storms. Be sure that critical systems are identified and sit on fuse-protected Uninterruptible Power Supplies (UPS). These can be multiple small units or larger units that handle several systems. Note that most small UPS units can carry their load for only 10-15 minutes and that their performance degrades over time; their internal battery packs will need to be replaced every 3-5 years. UPS units need to be sized appropriately for your system loads, and every system you want to stay up needs to be on one (network components too). A UPS will ride out short outages, but it will not provide lengthy uptime. For that, you will need a generator of some type; generators are available that run on diesel, gasoline, or natural gas. At this level, you may want to consult an electrical power engineer or electrician to ensure your generator is sized correctly and all systems are connected correctly. Test it at least quarterly.
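The UPS sizing advice can be turned into a quick check. The Python sketch below is a rough, hypothetical example: it adds up the wattage of every device on one UPS, compares the total against the UPS's watt rating with some headroom, flags a battery that is due for replacement, and estimates runtime as usable battery energy divided by load. The device names, wattages, battery capacity, 80% headroom, 90% inverter efficiency, and 3-year replacement threshold are all assumptions for illustration. The linear runtime estimate is optimistic because batteries deliver less energy at high loads, so the manufacturer's runtime chart should take precedence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Device:
    name: str
    watts: float  # steady-state draw, measured or from the nameplate


@dataclass
class Ups:
    rated_watts: float        # real-power (W) rating, not the larger VA figure
    battery_wh: float         # usable battery energy in watt-hours (assumed)
    battery_age_years: float


def check_ups(ups: Ups, devices: List[Device], headroom: float = 0.8,
              efficiency: float = 0.9,
              replace_after_years: float = 3.0) -> Tuple[float, float, List[str]]:
    """Return (total load in W, rough runtime in minutes, warnings).

    headroom, efficiency, and replace_after_years are illustrative
    assumptions, not vendor specifications.
    """
    load = sum(d.watts for d in devices)
    warnings = []
    if load > ups.rated_watts * headroom:
        warnings.append(f"load of {load:.0f} W exceeds {headroom:.0%} "
                        f"of the {ups.rated_watts:.0f} W rating")
    if ups.battery_age_years >= replace_after_years:
        warnings.append(f"battery is {ups.battery_age_years:g} years old; "
                        "plan a replacement")
    # Linear estimate only: real batteries give less energy at high loads.
    runtime_min = ups.battery_wh * efficiency / load * 60 if load else float("inf")
    return load, runtime_min, warnings


# Hypothetical closet: every device that must stay up, network gear included.
closet = [Device("file server", 300), Device("core switch", 40),
          Device("firewall", 30), Device("internet router", 20)]
load, minutes, warnings = check_ups(
    Ups(rated_watts=900, battery_wh=100, battery_age_years=4), closet)
print(f"{load:.0f} W load, about {minutes:.0f} minutes of runtime")
for w in warnings:
    print("WARNING:", w)
```

With these made-up numbers the estimate lands in the 10-15 minute range the column mentions, and the 4-year-old battery is flagged for replacement.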
The next consideration is simplicity. Ensure your network and systems are designed to operate in a simple, understandable way, with clear and up-to-date documentation. This helps immensely in keeping the network and systems operational: when problems do occur, they are much easier to troubleshoot if you have stayed away from overly complex designs. Also, if you have IT staff turnover, a simple design and good documentation make the system easier for newcomers to understand during the transition. For campus area networks, go with a very simple layer 3 (routed) core and keep layer 2 (switched) segments only at the edges.
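One way to keep a campus design honest is to check your documented inventory against the "layer 3 core, layer 2 at the edges" rule. The sketch below assumes a hypothetical inventory in which each switch is tagged with a role (core or edge), an edge block (for example, a building), and the VLANs configured on it, and it assumes core links are routed point-to-point links that carry no VLANs. It flags any VLAN on a core switch and any VLAN that shows up in more than one edge block, since either would mean stretching a layer 2 segment into or across the core. The inventory format and names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical inventory: switch -> (role, edge block, VLAN IDs configured).
# Assumes core links are routed point-to-point, so core switches carry no VLANs.
Inventory = Dict[str, Tuple[str, str, Set[int]]]


def find_l2_violations(inventory: Inventory) -> List[str]:
    """Flag layer 2 segments that reach into, or across, the core."""
    problems = []
    blocks_per_vlan: Dict[int, Set[str]] = defaultdict(set)
    for switch, (role, block, vlans) in sorted(inventory.items()):
        if role == "core" and vlans:
            problems.append(f"{switch}: core switch carries VLANs {sorted(vlans)}")
        elif role == "edge":
            for vlan in vlans:
                blocks_per_vlan[vlan].add(block)
    # A VLAN seen in two edge blocks would have to be bridged through the core.
    for vlan, blocks in sorted(blocks_per_vlan.items()):
        if len(blocks) > 1:
            problems.append(f"VLAN {vlan} is stretched across edge blocks {sorted(blocks)}")
    return problems


campus = {
    "core-1": ("core", "", set()),
    "core-2": ("core", "", set()),
    "bldg-a-sw1": ("edge", "building-a", {10, 20}),
    "bldg-a-sw2": ("edge", "building-a", {10}),
    "bldg-b-sw1": ("edge", "building-b", {30, 10}),  # VLAN 10 again: a violation
}
for problem in find_l2_violations(campus):
    print(problem)
```

VLAN 10 spanning two switches inside building A is fine; its reappearance in building B is what gets reported.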
Now consider where you need redundancy. This will vary with the size and criticality of the various points in your network and systems. Adding redundancy can be easy (e.g., dual power supplies in various equipment), but it can also complicate things (e.g., redundant network components in a load-balancing or failover configuration). Keep the "simple" rule in mind and add redundancy where it makes the most sense. Where you put redundant network systems in place, investigate the redundancy protocols being implemented (GLBP, HSRP, VRRP, etc.) and tune their timers so that a backup takes over quickly, keeping uptime as high as possible.
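To see what timer tuning buys you, it helps to compute how long a backup router waits before taking over. The sketch below uses the VRRP version 2 rules from RFC 3768: a backup declares the master down after Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds. The 1-second advertisement interval and priority 100 are the VRRPv2 defaults; the 10-second comparison figure is Cisco HSRP's default hold time. VRRPv3 (RFC 5798) scales the skew by the advertisement interval, and vendors may offer sub-second timers, so check your own equipment's documentation before tuning.

```python
# Cisco HSRP's default hold time, used here only as a point of comparison.
HSRP_DEFAULT_HOLD_S = 10.0


def vrrp_master_down_interval(advert_interval_s: float = 1.0,
                              priority: int = 100) -> float:
    """Seconds a VRRPv2 backup waits before declaring the master down.

    RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    with Skew_Time = (256 - Priority) / 256 seconds. Backup priorities run 1-254.
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256  # higher priority -> shorter skew
    return 3 * advert_interval_s + skew_time


for prio in (100, 200):
    t = vrrp_master_down_interval(priority=prio)
    print(f"VRRP backup at priority {prio}: takes over after about {t:.2f} s "
          f"(vs. {HSRP_DEFAULT_HOLD_S:.0f} s HSRP default hold time)")
```

The skew term means the highest-priority backup notices the failure first, which avoids several backups trying to become master at once.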
Finally, plan continuity of operations scenarios. Think through the most likely scenarios, prioritize what will need to be done, and write down the information you and your staff will need (power company phone number, hardware manufacturers, warranty information, etc.) as well as the procedures (e.g., system recovery procedures). Then schedule practice sessions and tests of these scenarios. You would not want to try swimming for the first time having only read about it, right? In real downtime situations stress will be high, so practiced checklist responses are the best way to get your systems back to where they need to be.
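The planning steps above amount to a small, structured runbook. The sketch below is one hypothetical way to keep it: each scenario has a priority, a contact list, an ordered checklist, and the date it was last practiced, and a helper lists the scenarios that have not been rehearsed within a chosen interval, most important first. The scenario names, steps, placeholder contacts, and the 90-day interval are assumptions for illustration; fill in your own numbers and procedures.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class Scenario:
    name: str
    priority: int                      # 1 = handle first
    contacts: Dict[str, str]           # role -> phone number (fill in your own)
    steps: List[str]                   # ordered checklist to follow under stress
    last_drill: Optional[date] = None  # None means never practiced


def drills_due(scenarios: List[Scenario], today: date,
               every: timedelta = timedelta(days=90)) -> List[Scenario]:
    """Scenarios not practiced within `every`, most important first."""
    due = [s for s in scenarios
           if s.last_drill is None or today - s.last_drill >= every]
    return sorted(due, key=lambda s: s.priority)


scenarios = [
    Scenario("Extended power outage", 1,
             {"Power company": "<number>", "Generator service": "<number>"},
             ["Confirm critical systems are running on UPS power",
              "Confirm the generator has started and is carrying the load"],
             last_drill=date(2008, 1, 15)),
    Scenario("Core switch failure", 2,
             {"Hardware manufacturer support": "<number>"},
             ["Verify the redundant switch has taken over",
              "Open a warranty case using the recorded warranty information"],
             last_drill=date(2008, 5, 1)),
    Scenario("Server hardware failure", 3,
             {"Hardware manufacturer support": "<number>"},
             ["Follow the documented system recovery procedure"]),
]

for s in drills_due(scenarios, today=date(2008, 6, 20)):
    print(f"Schedule a drill: {s.name} (priority {s.priority})")
```

Here the power-outage drill is overdue and the server scenario has never been practiced, so both are listed; the switch drill from May is still current.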
Lee Le Clair is the CTO at Ephibian. His Tech Talk column appears the third week of each month in Inside Tucson Business.