Complex technology projects: learn from Apple’s System 7 Blue Meanies

One of the most enjoyable projects I worked on was Apple’s System 7.  There were many lessons I learned working on that project, one of which is “don’t tell the whole development team to innovate.”  Because if everyone innovates, the system doesn’t work.

In the years I spent working on Windows operating systems, from 1992 to 2006, the last client OS I worked on was Windows XP, where I ran the Technical Evangelism team.

When Windows Vista (aka Longhorn) came after Windows XP, I recognized the pattern from System 7 pushed too far, as Jim Allchin and the rest of the executives ordered innovation in all parts of the OS. We saw PowerPoints for features that had little hope of seeing the light of day.

One big lesson that worked well to ship System 7 was the “Blue Meanies.” Who are the Blue Meanies? Here is the secret About box that lists the people.

System 7.0.1:

Help! Help! We're being held prisoner in a system software factory!

The Blue Meanies

Darin Adler
Scott Boyd
Chris Derossi
Cynthia Jasper
Brian McGhie
Greg Marriott
Beatrice Sochor
Dean Yu

What did they do?

While the Meanies have sometimes been characterized as the "coders of System 7", the Mac OS was by then sufficiently large that major subsystems such as QuickDraw and QuickTime were developed and maintained by specialized groups, and the Meanies primarily focused on getting the pieces to work together.

If you have a complex project where a lot of innovation causes conflicts between groups when the pieces don’t work together, think about creating a group of people whose job is to get the pieces to work together.

Some may call this architecture, but getting systems to work together often requires the skills of implementation, not just architecture. The Apple System 7 Blue Meanies did it all.

In your complex data center projects, who are the Blue Meanies?


A 4-category approach to greening the data center

WSJ has a guest article by Robert Plant.

— Dr. Plant is an associate professor in the department of computer information systems at the University of Miami's School of Business Administration

How Green Should My Tech Be?

To decide whether an eco-friendly IT idea makes sense, first place it in one of four categories

By ROBERT PLANT

In these tough economic times, green initiatives can be a hard sell. Companies don't want to take a gamble on pricey projects that lie outside their core mission. Yet lots of eco-friendly ideas promise to pay for themselves—and then some—by slashing costs and boosting efficiency.


How should companies approach the problem? To find out, we looked at green initiatives in one critical section of businesses, the corporate data center, and placed potential projects into four categories. At one end of the spectrum are obviously useful ideas that are simple and inexpensive. At the other end are expensive distractions that should be avoided at all costs. By figuring out which category an idea fits into, companies can better weigh the risk and potential return.

The article opens with a caveat: the system depends on the judgment of the CIO.

One caveat. This system—based on an earlier model developed in collaboration with Prof. Leslie Willcocks from the London School of Economics—relies heavily on the judgment of a company's chief information officer. We assume the CIO is closely monitoring promising technologies and can evaluate their possible impact on the business.

The four categories are:

Here are the four categories.

No-Brainers. In these cases, the green technology is a commodity. It not only cuts power use and emissions—thereby fulfilling its green mission—it's easy and cheap to obtain and implement. The bottom line: Companies should pursue these projects as soon as possible.

Promising but Pricey. Here, the green technology is clearly useful but isn't yet popular enough to be a commodity.

Business Opportunities. In some cases, green tech initiatives have the potential to win new business. One

Distractions. When evaluating green projects, the vast majority of companies shouldn't try to keep up with industry titans.


Data Center Myth – Thermal/Temperature Shock

Mike Manos has a post pointing out what he calls “data center junk science” and the data center thermal shock requirement. 

Mike’s post piqued my curiosity, and I spent time researching to build on it. This is my 956th post in less than 2 years, and people often think I have a journalism background. Well, fooled you: I am an Industrial Engineering and Operations Research graduate from Cal Berkeley. So, even though I write a lot, you are reading my notebook of stuff I discover and want to share with others. For those of you who don’t know what industrial engineers do:

Industrial engineering is a branch of engineering that concerns with the development, improvement, implementation and evaluation of integrated systems of people, money, knowledge, information, equipment, energy, material and process. It also deals with designing new prototypes to help save money and make the prototype better. Industrial engineering draws upon the principles and methods of engineering analysis and synthesis, as well as mathematical, physical and social sciences together with the principles and methods of engineering analysis and design to specify, predict and evaluate the results to be obtained from such systems. In lean manufacturing systems, Industrial engineers work to eliminate wastes of time, money, materials, energy, and other resources.

All of this background helps me think about how to green the data center.

And Operations Research helps me think about the technical methods and software to do this.

interdisciplinary branch of applied mathematics that uses methods such as mathematical modeling, statistics, and algorithms to arrive at optimal or near optimal solutions to complex problems. It is typically concerned with determining the maxima (of profit, assembly line performance, crop yield, bandwidth, etc.) or minima (of loss, risk, etc.) of some objective function. Operations research helps management achieve its goals using scientific methods.

Mike’s post got me thinking because one of my summer internships was at HP, where I worked as a reliability/quality engineer figuring out how to build better quality HP products. The team I worked on was an early innovator in thermal cycling and component stress testing back in the early 1980s.

Data Center Junk Science: Thermal Shock \ Cooling Shock

October 1, 2009 by mmanos

I recently performed an interesting exercise where I reviewed typical co-location/hosting/ data center contracts from a variety of firms around the world.    If you ever have a few long plane rides to take and would like an incredible amount of boring legalese documents to review, I still wouldn’t recommend it.  :)

I did learn quite a bit from going through the exercise but there was one condition that I came across more than a few times.   It is one of those things that I put into my personal category of Data Center Junk Science.   I have a bunch of these things filed away in my brain, but this one is something that not only raises my stupidity meter from a technological perspective it makes me wonder if those that require it have masochistic tendencies.

I am of course referring to a clause for Data Center Thermal Shock and as I discovered its evil, lesser known counterpart “Cooling” Shock.    For those of you who have not encountered this before its a provision between hosting customer and hosting provider (most often required by the customer)  that usually looks something like this:

If the ambient temperature in the data center raises 3 degrees over the course of 10 (sometimes 12, sometimes 15) minutes, the hosting provider will need to remunerate (reimburse) the customer for thermal shock damages experienced by the computer and electronics equipment.  The damages range from flat fees penalties to graduated penalties based on the value of the equipment.

Mike then raises the issue of duration.

Which brings up the next component which is duration.   Whether you are speaking to 10 minutes or 15 minutes intervals these are nice long leisurely periods of time which could hardly cause a “Shock” to equipment.   Also keep in mind the previous point which is the environment has not even violated the ASHRAE temperature range.   In addition, I would encourage people to actually read the allowed and tested temperatures in which the manufacturers recommend for server operation.   A 3-5 degree swing  in temperature would rarely push a server into an operating temperature range that would violate the range the server has been rated to work in or worse — void the warranty.

Here is MIL-STD-810G, the military standard that vendors typically use to define temperature/thermal shock.

MIL-STD-810G
METHOD 503.5
TEMPERATURE SHOCK

1. SCOPE.

1.1 Purpose.
Use the temperature shock test to determine if materiel can withstand sudden changes in the temperature of the surrounding atmosphere without experiencing physical damage or deterioration in performance. For the purpose of this document, "sudden changes" is defined as "an air temperature change greater than 10°C (18°F) within one minute."

1.2 Application.

1.2.1 Normal environment.
Use this method when the requirements documents specify the materiel is likely to be deployed where it may experience sudden changes of air temperature. This method is intended to evaluate the effects of sudden temperature changes of the outer surfaces of materiel, items mounted on the outer surfaces, or internal items situated near the external surfaces. This method is, essentially, surface-level tests. Typically, this addresses:
a. The transfer of materiel between climate-controlled environment areas and extreme external ambient conditions or vice versa, e.g., between an air conditioned enclosure and desert high temperatures, or from a heated enclosure in the cold regions to outside cold temperatures.
b. Ascent from a high temperature ground environment to high altitude via a high performance vehicle (hot to cold only).
c. Air delivery/air drop at high altitude/low temperature from aircraft enclosures when only the external material (packaging or materiel surface) is to be tested.

As Mike says, the surprising part is that the thermal shock requirement is coming from technical people, most likely ones with military backgrounds.

Even more surprising to me was that these were typically folks on the technical side of the house more then the lawyers or business people.  I mean, these are the folks that should be more in tune with logic than say business or legal people who can get bogged down in the letter of the law or dogmatic adherence to how things have been done.  Right?  I guess not.

I can’t imagine any business person or attorney thinking a thermal shock is a 3 degree change in 15 minutes. If an attorney were involved, they would go to the MIL-STD-810G definition of temperature shock: a change greater than 10°C (18°F) within one minute.

So where does this myth come from? Most likely there is a social network effect among people who consider themselves smarter than others and have added thermal shock to the requirements. One of the comments on Mike’s blog documents the possible social network source.

Dave Kelley, Liebert Precision Cooling

The only place where something like this is “documented” in any way is in the ASHRAE THermal Guidelines book. Since the group that wrote this book included all of the major server vendors, it must have been created with some type of justifiable reason. It states that the “maximum rate of temperature change is 5 degress C (9 degrees F) per hour.
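To put the figures quoted so far side by side, here is a minimal sketch in Python that converts each one to degrees per minute and checks it against the MIL-STD-810G definition of a "sudden change". It assumes the contract clause means degrees Celsius (the clause as quoted does not state a unit), and the function and variable names are my own, not taken from Mike's post or from either standard.

```python
# Rates of temperature change quoted in this post, normalized to degrees C per minute.
# Assumption: the contract clause does not state its unit; Celsius is assumed here,
# which gives the clause its largest possible rate.

# MIL-STD-810G Method 503.5: "greater than 10 C (18 F) within one minute"
MIL_STD_SHOCK_C_PER_MIN = 10.0


def rate_c_per_min(delta_c: float, minutes: float) -> float:
    """Average rate of change in degrees C per minute."""
    return delta_c / minutes


def is_temperature_shock(delta_c: float, minutes: float) -> bool:
    """True only if the average rate exceeds the MIL-STD-810G threshold."""
    return rate_c_per_min(delta_c, minutes) > MIL_STD_SHOCK_C_PER_MIN


# (temperature change in degrees C, duration in minutes)
scenarios = {
    "contract clause, 3 degrees in 10 min": (3.0, 10.0),
    "contract clause, 3 degrees in 15 min": (3.0, 15.0),
    "ASHRAE figure quoted in the comment, 5 C per hour": (5.0, 60.0),
}

for name, (delta_c, minutes) in scenarios.items():
    rate = rate_c_per_min(delta_c, minutes)
    shock = is_temperature_shock(delta_c, minutes)
    print(f"{name}: {rate:.3f} C/min, shock per MIL-STD-810G: {shock}")
```

Even under the Celsius reading, the fastest clause (3 degrees in 10 minutes) works out to 0.3 degrees per minute, more than 30 times slower than the MIL-STD-810G threshold. If the clause means Fahrenheit, the gap is wider still.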

And as Mike closes, this has unintended consequences.

But this brings up another important point.  Many facilities might experience a chiller failure, or a CRAH failure or some other event which might temporarily have this effect within the facility.    Lets say it happens twice in one year that you would potentially trigger this event for the whole or a portion of your facility (your probably not doing preventative maintenance  – bad you!).  So the contract language around Thermal shock now claims monetary damages.   Based on what?   How are these sums defined?  The contracts I read through had some wild oscillations on damages with different means of calculation, and a whole lot more.   So what is the basis of this damage assessment?   Again there are no studies that says each event takes off .005 minutes of a servers overall life, or anything like that.   So the cost calculations are completely arbitrary and negotiated between provider and customer. 

This is where the true foolishness then comes in.   The providers know that these events, while rare, might happen occasionally.   While the event may be within all other service level agreements, they still might have to award damages.   So what might they do in response?   They increase the costs of course to potentially cover their risk.   It might be in the form of cost per kw, or cost per square foot, and it might even be pretty small or minimal compared to your overall costs.  But in the end, the customer ends up paying more for something that might not happen, and if it does there is no concrete proof it has any real impact on the life of the server or equipment, and really only salves the whim of someone who really failed to do their homework.  If it never happens the hosting provider is happy to take the additional money.

Temperature/thermal shock is a term that doesn’t apply to data centers.  Hopefully you’ll know when to call temperature/thermal shock requirements in data center operations a myth.

Thanks, Mike, for taking the time to write on this.


Adding Twitter to the blogging process

I’ve resisted tweeting on Twitter, given that I already write on average two blog entries a day. But this week I connected my blog www.greenm3.com to my Twitter feed www.twitter.com/greenm3.

Here is what I did.

1) In TypePad I added my Twitter account.

image

2) I could now publish my posts to Twitter, but it was an extra step to go to TypePad posts to enable a Twitter feed. I use Windows Live Writer to write blog entries, and there is a Twitter notify plug-in.

image

3) You can check whether the Twitter plug-in is installed in the Windows Live Writer edit blog settings.

image

4) When you publish, a dialog box comes up to confirm publishing to Twitter. Here is the dialog from this post.

image

Overall, this was easier than I anticipated, and it is just one more dialog box now that I have Windows Live Writer configured.

You can read my tweets at www.twitter.com/greenm3, as an alternative to subscribing to my RSS feeds.


Intel Developer Forum presentation Social Networks and Innovation, a new method for data centers

Intel Developer Forum is a big technical media event. There is a lot to see and the media coverage is huge. Here is a partial picture of the media room as people are busy writing about Intel’s latest announcements. This room can hold over 200 people and it is full.

image

While I was in the media room I missed the most useful presentation of the day. Below is a picture of Eleanor Wynn, Social Technology Architect and Principal Engineer, Intel Corporation, staffing the booth for IT-CMF.

image 

I caught Eleanor moving, and here is a better picture.

image

What is IT-CMF?

IT Capability Maturity Framework (IT-CMF)

Ran across an interesting piece of work out of Intel Corp. The IT Capability Maturity Framework trys to take a stab at a common problem. What attracted me to this framework was the business oriented approach this framework takes. But after digging through their site I was hungry for more information. I couldn't find much more information besides a high level explanation. They do have a sample assessment out there that give you a better idea of the framework.

From IT-CMF Website:

From the synthesis of leading academic research, proven industry best practices and Intel's own experience in transforming the Intel IT organisation, Intel developed the IT Capability Maturity Framework (IT-CMF). Based on the lack of existing frameworks and the huge appetite from other top Business and IT executives for such an approach, Intel has decided that the best way to further develop and disseminate the IT-CMF, its associated tools and practices is to have it included as part of IVI’s research and education agenda.

The IT-CMF consists of four integrated strategies:

So, this is cool, but then I realized I missed Eleanor’s presentation on Social Networks.

What do social networks have to do with data centers? Social networks characterize the behavior in the data center system of the companies and people who are doing the most innovative work.

image

Eleanor and I had a chance to talk for 3 hours at IDF, so I learned a lot even though I missed her presentation. One big concept that helped describe the issues is “tribal knowledge” in IT vs. the meme.

meme: an idea, behavior, style, or usage that spreads from person to person within a culture

Which gets to my point: social networks and memes are important characteristics of innovative green data centers.

image

The most innovative people in data centers are networked to share and receive ideas.

image

Organizations enabled by social networks know this.

image 

Your head may hurt with these concepts, but here is a summary to help you. My head hurts a little bit too, but I’ve been playing with these ideas for a while, and luckily I can follow up with Eleanor.

image 

Some of the most innovative data center people are figuring out how to build their data center social networks as a competitive advantage.
