A few days ago, Donald Sharp of Cumulus Networks asked an interesting question in the Network Collective Slack. That question, which sparked the thought exercise that resulted in this blog post, was this: Why is it that in networking we need fully functional labs to test every change we make to the network?
It’s a good question, and the debate went on for some time as to why we feel the need to lab every change that we make. Donald used the example that if we were installing new pipes into a building, there would be no need for the plumber to lab it up, and he’s correct. I think it’s fair to say that his analogy is a bit simplistic, but it does get the point across (in Donald’s defense, his illustration was simply meant to demonstrate the point). The example starts to fall apart when we recognize that data isn’t water: it changes quantity, shape, direction, and importance in ways we, the “data plumbers”, don’t get to decide and often aren’t even aware of.
Putting the simplistic analogy aside, there was another theme that arose from the conversation which got me thinking. Plumbers know things will work because their industry is broadly standardized, builds compatible and interoperable parts, and is highly predictable. Networking is often not this predictable. Networkers regularly have no reliable way to predict what a particular feature or function is going to do in the real world. In this case we aren’t talking about reacting to a changing network traffic profile; we’re talking about simply not knowing whether a particular feature is going to work as advertised. This isn’t a planning or capacity problem, it’s a poor product quality problem.
This challenge doesn’t exist just in networking either. Nearly all of the infrastructure professionals that I know expect the products that they are tasked with evaluating, designing, deploying, and operating to be wildly buggy until years after their initial release. And even after the product reaches a decent level of reliability, we expect to see regressions in code with bugs that were once fixed being reintroduced as code branches get merged in our vendor’s pipelines. It’s ubiquitous at this point and while many bemoan the current state of affairs, few seem to be talking about the consequences.
It will probably help to take a look at some of the reasons why things are the way they are…
There are a number of motivators that appear to influence why companies release code and products before they are ready. In my experience, the list below covers some of the more common justifications.
Time to Market: Being first to market often has positive effects on overall product sales and on the perception of your capabilities compared to others. When facing a market that risks saturation, getting in ahead of others can be the difference between your product succeeding and your company falling far behind. The rewards of being first or early to market are significant.
The New and Shiny Effect: New features garner far better headlines than tried and true reliability. Features are immediate; reliability is proven over time. Even Gartner, whose mission is to educate technology consumers on which products they should be considering, doesn’t factor stability and reliability directly into its famous magic quadrants. If stability is rarely presented as a purchase requirement, why should companies focus on proving stability up front?
QA Isn’t Free: QA costs money. If customers aren’t demanding it, QA is an expense that either cuts into your margins or makes your product more expensive than the competition’s. Besides, companies have customers who QA the product for them and report back their findings in the form of support tickets.
We’re Human After All: The least infuriating of all the potential reasons why we have poor performing products is that they are made by humans and humans are prone to error from time to time. It’s bound to happen, and it would happen even if the best controls were put into place.
Will It Ever Change?
The simple answer to the question is: probably not any time soon. Unless consumers of these products start demanding more stable and reliable releases, what motivation will companies have to provide them? It’s an economics question, and the companies that seem to have the most challenges with quality sit at the top of the profitability charts year after year. Until technology consumers vote with their wallets and are willing to pay a higher price for a better product, I don’t anticipate much will change.
Yes, I’m pessimistic about this, and I think it’s an awfully sad state of affairs in our industry.
So, What Are The Consequences?
There are a number of obvious consequences when you can’t rely on the products that you’re buying. As stated above, customers end up doing the QA for the vendors. In an ideal world (well, an ideal world that accepts poor product quality as fact) this happens in a lab, where problems and bugs can be identified before the products make it into real-world use. The more common story is that these bugs are discovered in production. End users must then not only go through the proper channels to report the bug, they must also endure the impact to their network. This all equates to time, money, and pain for the organizations that use poorly implemented products.
What is spoken about far less is the personal toll that buggy releases take on practitioners. Today’s infrastructure engineers are expected to be the technical experts for the organizations that employ them. They are asked to provide guidance, execute prudent decision making, understand consequences (intentional and unintentional) of technological decisions, and are ultimately held responsible for making the technology meet the business needs.
This puts them in the middle, between the employers that pay them for this expertise and the vendors who are promising the world and delivering far less. Poor product quality adds stress, distrust, and immense additional work. This leads to the industry we see today: experts who self-identify as strong cynics, having earned the right to be both wary of and pessimistic toward vendors that continually abuse the practitioner in pursuit of their own economic benefit. Burnout is also prevalent among those in this field, and I think buggy products are a significant contributing factor (at least they were in my own personal burnout story).
I think it’s time we reshape the conversation on poor product quality and buggy code. The economic motivators are understandable, but where do ethics fit into the conversation? Is it ethical for large corporations to put this much work, pressure, and stress on the practitioners who use their products? Does this course of action work to the long-term benefit of the vendors releasing these products? I don’t think so. I firmly believe this state of affairs is creating a deeply cynical industry that is far less enjoyable than the one I started in. Burnout seems to be happening with greater regularity than it has in the past. It is likely that some day the piper will have to be paid, and the companies that have so long caused pain for their users, whether intentionally or unintentionally, will be disrupted by those who have empathy and understand that there are real people implementing these products. Those people can only take the pain for so long.
Maybe I’m a dreamer though… we’ve tolerated things being like this for quite a while, and while the biggest offenders are seeing some erosion of market share, it isn’t a mass exodus. I don’t think we’ve yet seen a company that is truly empathetic to the practitioner, so I hold out hope. Infrastructure vendors should consider the personal impact of rushing a product to market just as much as they consider the economic one. If I’m right and practitioner empathy is a path to disruption in the infrastructure market, winning the short-term economic gain at the price of your customers’ pain, time, and money won’t be a great long-term strategy. Trust is one of the hardest things to repair, and I don’t see much trust anymore.
Jordan Martin, CCIE #43772, is a Technical Solutions Architect at World Wide Technology focusing on SD-WAN and enterprise networking. Jordan also co-founded and hosts the popular Network Collective podcast.