5 lessons from the Lighthouse Roadshow in 2019

Thursday, 5 December, 2019

Having completed a series of twelve Lighthouse Roadshow events across Europe and North America over the past six months, I’ve had time to reflect on what I’ve learnt about the rapid growth of the Kubernetes ecosystem, the importance of community and my personal development.

For those of you who haven’t heard of the Lighthouse series before, Rancher Labs first ran this roadshow in 2018 with Google, GitLab and Aqua Security. The theme was ‘Building an Enterprise DevOps Strategy with Kubernetes’. After selling out six venues across North America, I felt that its success could be repeated in Europe. We tested this theory in May by running the first 2019 Lighthouse in Amsterdam with Microsoft, GitHub and Aqua Security. The event sold out in just two weeks, and we had to move to a larger venue downtown to accommodate a growing waiting list.

Bas Peters from GitHub at Lighthouse Amsterdam – 16th May 2019

After the summer vacation period, the European leg of the Lighthouse re-started in earnest with events in Munich and Paris on consecutive days. The Paris event turned out to be the largest of the roadshow. Held at Microsoft’s magnificent Paris HQ, we packed their main auditorium with almost 300 delegates. In the weeks that followed the Lighthouse team also visited Copenhagen, London, Oslo, Helsinki, Stockholm and, finally, Dublin. Not to be outdone, Rancher’s US team organised a further three Lighthouse events with partners Amazon, GitLab, and Portworx during November.

Now home, sitting at my desk and reflecting on the lessons learnt, I’ve distilled them down to the following:

Focus on context not product pitches

Organizing the content for so many consecutive events with many different speakers was a significant challenge. We had a mix of salespeople, tech evangelists, consultants and field sales engineers presenting. The speakers who received the best response (and exchanged the most business cards during the coffee breaks) were always those who delivered insight into the context in which their products exist. I share this lesson because I want to encourage those running similar events in this space to understand the value of insight. This is particularly true if you work for a company that doesn’t charge anything for its technology. In a market where there are no barriers to adopting software, the only way you can genuinely differentiate is through the quality of the story you tell and the expert insight that you deliver.

Alain Helaili from GitHub at Lighthouse Paris – 11th Oct 2019

Interest in Kubernetes is exploding

Of the almost 3000 IT professionals who registered for the roadshow globally, more than half are already using Kubernetes in production. So, what makes the excitement around Kubernetes different from previous hype-cycles? I would contend there are two principal differences:

  1. Low barrier to entry – Kubernetes takes minutes to install on-prem or in the cloud. I regularly see enthusiastic sales and marketing people launching their first cluster in the public cloud. Compare that to something like OpenStack which, despite the existence of a variety of installers on the market, is hellish to get up and running. Unless you have access to skilled consultants from the beginning, the technical bar is set so high that only the most sophisticated teams can be successful.
  2. Mature and proven – Kubernetes has, in one form or another, been around for over ten years orchestrating containers in the world’s largest IT infrastructures. Google introduced Borg around 2004. Borg was a large-scale internal cluster management system which ran many thousands of different applications across many clusters, each with up to tens of thousands of machines. In 2014 the company released Kubernetes, an open-source orchestrator built on the lessons of Borg. Since then, hundreds of thousands of enterprises have deployed Kubernetes into production, with all the public clouds now offering managed varieties of their own. Google rightly concluded that a rising tide would float all ships (and use more cloud compute!). Today Kubernetes is mature, proven and used everywhere. Sadly, you can’t say the same about OpenStack.

Yours truly opening proceedings at Lighthouse Munich – 10th Oct 2019

Enterprises are still asking the same questions

While the adoption of Kubernetes is undeniably the most significant phenomenon in IT operations since virtualization, those enterprises that are considering it are asking the same questions as before:
1. Who should be responsible for it?
2. How does it fit into our cloud strategy?
3. How do we tie it into our existing services?
4. How do we address security?
5. How do we encourage broader adoption?

In what is still a relatively nascent market, it’s challenging questions like these that Kubernetes advocates need to answer transparently and in person if they are to be taken seriously. The stakes are high for early adopters, and they need assurance that the advice you offer is real, tangible and trusted by others. That’s why we created the Lighthouse Roadshow.

Olivier Maes from Rancher Labs at Lighthouse Copenhagen – 31st Oct 2019

Community matters

Unless the ecosystem around new technology is open and well-governed, it will die. Companies or individuals that reject community members as freeloaders are consigning themselves to irrelevance. You can always find some people who are willing to jump through the hoops of licensing management or lock themselves into a single vendor. Still, most of today’s B2B tech consumers are looking to make their choices based on third-party validation. Community members may not pay for your software, but they contribute to your growth by endorsing your brand and sharing their own success stories.

The Lighthouse Roadshow is 100% community driven. We’re not interested in making a profit from ticket sales, preferring instead to see how well our stories resonate with delegates. The more insight delivered, the more successful the event. The feedback from each of the Lighthouse venues has been hugely rewarding and the opportunities for growth have been incalculable. We couldn’t have achieved this if we had just measured our success by tracking the conversion rate of delegate numbers to MQLs and closed-won opportunities.

Steve Giguere from Aqua Security at Lighthouse London – 8th Nov 2019

Surrounding yourself with talent makes you better

It’s widely known that one of the best ways to improve a skill is to practice it with someone better than you. During the Lighthouse Roadshow I had the unique privilege of attending every European event and listening to every talk, sometimes multiple times. The skills and knowledge of the speakers, and the professionalism of the event staff who helped us, were simply amazing.

I’m particularly grateful to my fantastic colleagues at Rancher Labs – Lujan Fernandez, Abbie Lightowlers, Olivier Maes, Tolga Fatih Erdem, Jeroen Overmaat, Elimane Prud’hom, Nick Somasundram, Simon Robinson, Chris Urwin, Sheldon Lo-A-Njoe, Jason Van Brackel, Kyle Rome and Peter Smails. I’ve also been fortunate to work alongside rockstars from partner companies like Steve Giguere, Grace Cheung and Jeff Thorne at Aqua Security; Bas Peters, Richard Erwin and Anne-Christa Strik at GitHub; Bozena Crnomarkovic Verovic, Dennis Gassen, Shirin Mohammadi, Maxim Salnikov, Sherry List, Drazen Dodik, Tugce Coskun, Anna-Victoria Fear, Juarez Junior and many others from Microsoft; Alex Diaz and Patrick Brennan from Portworx; Carmen Puccio from Amazon; and Dan Gordon from GitLab. I can’t help but feel inspired by all these fantastic people.

By the time we finished in Dublin, I felt invigorated and filled with new ideas. Looking back, I know that listening and sharing with these brilliant folks has encouraged me to step up my own game.

More Resources

Want to know more about how to build an enterprise Kubernetes strategy? Download our eBook.


Windows Containers and Rancher 2.3

Tuesday, 8 October, 2019

Container technology is transforming the face of business and application development. 70% of on-premises workloads today are running on the Windows Server operating system, and enterprise customers are looking to modernize these workloads and make use of containers.

We have introduced support for Windows Containers in Windows Server 2016 and graduated support for Windows Server worker nodes in Kubernetes 1.14 clusters. With Windows Server 2019 we have expanded support in Kubernetes 1.16.

For our customers, one of the preferred ways to increase the adoption of containers and Kubernetes is to make them easier for operators to deploy and for developers to use.

Towards that end, Microsoft has invested in AKS and Windows container support while working with partners such as Rancher Labs, which has built its organization on the principle of “Run Kubernetes Everywhere”.

With the release of Rancher 2.3, Rancher is the first Kubernetes management platform to graduate Windows support to GA, and users can now deploy Kubernetes clusters with Windows support from within the Rancher user interface.

Using Rancher 2.3, users can deploy Windows Kubernetes clusters on AKS, Azure, any other cloud provider, or on-premises, using the supported and proven networking components in Windows Server and Kubernetes.

Rancher 2.3 supports Flannel as the CNI plugin, with VXLAN overlay networking to enable communication between Windows and Linux containers, services, and applications.
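To make this concrete, here is a rough sketch of what such a cluster definition might look like, expressed as a small Python script that emits an RKE-style cluster.yml fragment. The node addresses are placeholders, and the exact option names (such as flannel_backend_type) should be checked against the Rancher/RKE documentation for your version; treat this as an illustration of the Flannel-plus-VXLAN idea rather than a verified configuration.

    # Sketch: emit an RKE-style cluster config that selects Flannel with a
    # VXLAN overlay so Linux and Windows worker nodes can share one pod network.
    # Option names are illustrative; confirm them against your Rancher/RKE version.
    import yaml  # requires PyYAML (pip install pyyaml)

    cluster = {
        "nodes": [
            {"address": "10.0.0.10", "user": "root", "role": ["controlplane", "etcd"]},
            {"address": "10.0.0.11", "user": "root", "role": ["worker"]},           # Linux worker
            {"address": "10.0.0.12", "user": "Administrator", "role": ["worker"]},  # Windows Server 2019 worker
        ],
        "network": {
            "plugin": "flannel",
            "options": {
                # VXLAN overlay networking lets Windows and Linux pods reach each other.
                "flannel_backend_type": "vxlan",
            },
        },
    }

    print(yaml.safe_dump(cluster, sort_keys=False))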

Learn more about Rancher 2.3 and its functionality.


Introducing Rancher 2.3: The Best Gets Better

Tuesday, 8 October, 2019

Today we are excited to announce the general availability of Rancher 2.3,
the latest version of our flagship product. Rancher, already the
industry’s most widely adopted Kubernetes management platform, adds
major new features with v2.3, including:

  • Industry’s first generally available support for Windows containers, bringing the benefits of Kubernetes to Windows Server applications.
  • Introduction of cluster templates for secure, consistent deployment of clusters at scale
  • Simplified installation and configuration of Istio service mesh

These new capabilities strengthen our Run Kubernetes Everywhere strategy
by enabling an even broader range of enterprises to leverage the
transformative power of Kubernetes.

Bringing the Benefits of Kubernetes to Windows Server Applications

Today, 70% of on-premises workloads are running on the Windows Server
operating system, and in March of this year, Windows Server Container
support was built into the release of Kubernetes v1.14.

Not surprisingly, Windows containers have been one of the most desired technologies within the Kubernetes ecosystem in recent years. We are proud to be partnering with Microsoft on this launch and are excited to be the first Kubernetes management platform to deliver GA support for Windows Containers and Kubernetes with Windows worker nodes! To get Microsoft’s perspective on Rancher 2.3, check out this blog from Mike Kostersitz, Principal Program Manager at Microsoft.

By bringing all the benefits of Kubernetes to Windows, Rancher 2.3 eases
complexity and provides a fast and straightforward path for modernizing
legacy Windows-based applications, regardless of whether they will run
on-premises or in a multi-cloud environment. Alternatively, Rancher 2.3
can eliminate the need to go through the process of rewriting
applications by containerizing and transforming them into efficient,
secure and portable multi-cloud applications.

Windows Workloads

Secure, Consistent Deployment of Kubernetes Clusters with Cluster Templates

With most businesses managing multiple clusters at any one time,
security is a key priority for all organizations. Cluster templates help
organizations reduce risk by enabling them to enforce consistent cluster
configurations across their entire infrastructure. Specifically, with
cluster templates:

  • Operators can create, save, and confidently reuse well-tested Kubernetes configurations across all their cluster deployments.
  • Administrators can enable configuration enforcement, thereby eliminating configuration drift and misconfigurations which, left unchecked, can introduce security risks as more clusters are created.

Cluster Templates

Additionally, admins can scan existing Kubernetes clusters against industry benchmarks like CIS and NIST to identify and report on insecure cluster settings and facilitate a plan for remediation.

Tighter Integration with the Leading Service Mesh Solution

A big part of Rancher’s value is its rich ecosystem catalogue of
Kubernetes services, including service mesh. Istio, the leading service
mesh, eliminates the need for developers to write specific code to enable
key Kubernetes capabilities like fault tolerance, canary rollouts,
A/B testing, monitoring and metrics, tracing and observability, and
authentication and authorization.

Rancher 2.3 delivers simplified installation and configuration of
Istio including:

  • Kiali dashboards for traffic and telemetry visualization
  • Jaeger for tracing
  • Prometheus and Grafana for observability

Istio

Rancher 2.3 also introduces support for Kubernetes v1.15.x and Docker
19.03. Getting started with Rancher v2.3 is easy. See our documentation for instructions on how to be up and running in a flash.

Our Momentum Continues

Rancher 2.3 is just the latest proof point of our momentum in 2019.
Other highlights include:

  • 161 percent year-on-year revenue growth, community growth to more than 30,000 active users, and software downloads surpassing 100 million.
  • Rancher was named a leader in the Forrester New Wave™: Enterprise Container Platform Software Suites
  • Rancher is included in five Gartner Hype Cycles in 2019
  • Rancher was recognized by 451 Research as a Firestarter in Q3’19

And, maybe the best part of the story is that we have more exciting news coming very soon! Stay tuned to our blog to learn more.

We also look forward to seeing everyone at KubeCon 2019 in San Diego, California. Come to booth P19 to talk with us or get a personalized demo.


Code Commits: only half the story

Monday, 5 August, 2019

It’s not the first time I’ve been asked by a sales rep the following question: “The customer has looked at Stackalytics and is wondering why Rancher doesn’t have as many code commits as the competition. What do I say?”

For those of you unfamiliar with Stackalytics, it provides an activity snapshot, a developer selfie if you will, of commits and lines of code changed in different open source projects. Although a very worthwhile service, some vendors like to use it as proof of their technical prowess and commitment to an open-source project’s ecosystem.

But does the number of code commits by a vendor tell the full story?

Certainly, some would argue that it does. For example, whilst working at Canonical, I regularly came across customers who’d ask us why we made relatively few commits to upstream OpenStack when compared to other vendors. This was despite the Ubuntu OpenStack distribution being used by just about everybody within the community. It seems that now, at Rancher, we’re being asked to justify our Kubernetes credentials by a similar measure despite the fact that our eponymous Kubernetes management platform has been downloaded over 100,000,000 times.

Perhaps those evaluating vendors should be asking different questions like:

  • Is it possible that some vendors hire teams of engineers to focus solely on developing code for upstream Kubernetes?

  • As a customer, will you get access to the engineering expertise needed to make those code commits?

  • Do more upstream code commits mean that the vendor’s Kubernetes management platform is better than competitive products?

  • Is the vendor with the most code commits more engaged with the Kubernetes community than everyone else?

At every tradeshow I’ve been to this year, community members have come to the booth to thank me for the Rancher platform and what Rancher Labs does for the Kubernetes ecosystem. They don’t care about code commits; they care about the business value we deliver.

Rancher helps tens of thousands of teams be successful with Kubernetes. Without it they couldn’t easily realise advanced DevOps capabilities like continuous delivery, canary/blue/green deployments, service autoscaling, automated DNS & load balancing, SSL and certificate management, secret management, and more. It’s these capabilities (plus not being locked into a single vendor ecosystem) that deliver extraordinary value to end users, their employers and to the wider Kubernetes community. Best of all – they don’t have to pay for it!

It’s also worth remembering that contributing to a large open source community like Kubernetes isn’t a single-threaded experience. k3s was launched by Rancher in March 2019 to huge excitement. k3s is a Kubernetes distribution designed to run production workloads in remote, resource-constrained locations such as IoT devices or the network edge. Although the project isn’t measured by Stackalytics’ code commit counter, k3s amply demonstrates Rancher’s technical leadership and commitment to helping enterprises deploy Kubernetes from their core infrastructure to the network edge.

Building an Enterprise Kubernetes Strategy

For more information on how Rancher can help you build an enterprise Kubernetes strategy, download our recent whitepaper.

The Road to Agile IT is Paved with Containers

Tuesday, 30 July, 2019

The holy grail for any CMO looking for their next gig is to find the
perfect combination of addressable market, market timing, company, and
product. That’s why I am so excited to be joining the team at Rancher
Labs, the leader in container management software. Let’s look at all the
variables.

Market Opportunity & Timing

The market for containers is conservatively HUGE! What’s a
container? A container is a standard unit of software that packages up
code and all associated dependencies enabling an application to run
quickly and reliably from one computing environment to another. For
example, development teams are using containers to package entire
applications and move them to the cloud without the need to make any
code changes. As another example, containers make it easier to build
workflows for modern applications that run between on-premises and cloud
environments.

While containers are a good way to bundle and run your applications, you
also need to manage the containers that run the applications. That’s
where Kubernetes comes in. Kubernetes is an open source container
orchestration engine for automating deployment, scaling, and management
of containerized applications. Recent research indicates that
approximately 40% of enterprises are running Kubernetes in production
today, but in less than three years that number will increase to more
than 84%!

As infrastructure increasingly moves to multi-cloud (e.g. on-premises,
AWS, GCP, Azure) and enterprise applications become more complex,
development and IT operations teams need an effective way to manage
Kubernetes at scale.

Therein lies the opportunity!

Company and Product

If you don’t know already, Rancher Labs builds innovative, open source
software for enterprises leveraging containers to deliver
Kubernetes-as-a-Service. Rancher was founded by a group of cloud and
open source thought leaders who have already
made their mark at places like Cloud.com, Citrix, and GoDaddy. They
foresaw the need and created our flagship Rancher platform, which allows
users to easily manage all aspects of running Kubernetes in production,
on any infrastructure across the data center, cloud, branch offices and
the network edge.

Unlike solutions from competitors like Red Hat and Pivotal, our solution
delivers the ideal balance of flexibility and control, including:

  • Multi-Cluster Application Support: Kubernetes users can deploy and maintain their applications on multiple clusters from a single action, reducing the load on operations teams and increasing productivity and reliability for businesses running in hybrid-cloud, multi-cloud, or multi-cluster Kubernetes environments.
  • Support for Cloud Native Kubernetes Services: In addition to offering two certified Kubernetes distributions (RKE and k3s), Rancher provides complete flexibility by enabling enterprise customers to manage any Kubernetes distribution and any cloud-native Kubernetes service such as GKE, EKS, and AKS. For users, every Kubernetes cluster behaves the same way and has access to all of Rancher’s integrated workload management capabilities.
  • No Vendor Lock-In: As free and open source software, Rancher costs much less to own and operate than PKS and OpenShift while providing a more capable product that doesn’t lock you into any single vendor’s ecosystem.

Addressable market? Check! Market timing? Check! Company? Check!
Product? Check!

It doesn’t get any better than that!

While I am privileged to join Rancher, I am merely one small cog in the
big wheel of their momentum. Check out what’s happened since the start
of 2019 alone:

  • Customer Growth: We grew our customer base by 52% while YoY revenue grew 161%.
  • Product Innovation: We introduced major enhancements to Rancher with the release of version 2.2 and also launched new open source projects.
  • Funding: We raised another $25M in Series C funding, bringing the total amount raised to $55M. That means we’ve got loads of cash to invest in continuing our rapid growth.

You can read all about our momentum here, or to learn more, jump to
www.rancher.com.

#RunKubernetesEverywhere!


Kubernetes Adoption Driving Rancher Labs Momentum

Tuesday, 23 July, 2019

This week Rancher Labs announced a record 161% year-on-year revenue growth, along with a 52% increase in the number of customers in the first half of 2019. Other highlights from H1’19 included:

  • Closure of a $25M series C funding round
  • Doubling of international headcount as we continue our expansion into 12 countries
  • Software downloads surpassed 100 million, making Rancher the industry’s most widely adopted Kubernetes software platform
  • General availability of Rancher 2.2
  • Continued investment in open source projects including Rio, Longhorn, k3s, and k3OS

You can find the complete release here.

We are grateful to our community of customers, partners, and users for the growth we achieved in the first half of 2019, and we will continue to gauge Rancher’s success in the larger context of enterprise adoption of Kubernetes. Rancher will continue to deliver value by enabling organizations to deploy and manage Kubernetes across their entire infrastructure.

Kubernetes Everywhere

Recent research reports that approximately 40% of enterprises are running Kubernetes in production today, but in less than three years that number will increase to more than 80%. What will drive that growth? Kubernetes helps organizations significantly increase the agility and efficiency of their software development teams, while also helping IT teams boost productivity, reduce costs and risks, and move closer to achieving their hybrid-cloud goals.

As container usage becomes more widespread across an organization, balancing the needs of developers who want autonomy and agility with the needs of IT teams who want consistency and control can prove challenging. Whether your organization builds large clusters of infrastructure and then offers development teams shared access to them, or leaves individual departments or DevOps teams to decide for themselves how and where to use Kubernetes, it is not uncommon for tension to develop between those wanting to run Kubernetes in exactly the way they need it and IT teams that want to maintain security and control over how Kubernetes is implemented.

Rancher’s Role in Enabling Everywhere

Only Rancher is purpose-built to address the requirements of both developer teams and IT operations teams, thereby enabling organizations to deploy and manage Kubernetes at scale.

Here’s how:

  • Simplified Cluster Operations – In addition to offering two certified Kubernetes distros (RKE and k3s), Rancher enables enterprise customers to utilize any Kubernetes distribution or hosted Kubernetes service. Customers can use cloud-native Kubernetes services such as GKE, EKS, and AKS. By supporting any Kubernetes distribution or service, Rancher enables customers to implement Kubernetes in the most cost-effective way and operate Kubernetes clusters in the simplest way possible, while still leveraging the consistency of Kubernetes across all types of infrastructure.

  • Security & Policy Management – Rancher provides IT organizations with centralized management and control over all Kubernetes clusters, regardless of how they are implemented or operated. By managing security policies for all of your Kubernetes clusters in one place, Rancher minimizes human error and wasted energy. Rancher’s unified web UI replicates all functionality available within Kubernetes and includes tooling for Day Two operations. Full control via CLI and API is also available. Rancher is simple to install in any environment, integrates with user authentication platforms, and quickly starts to address many of the workflow challenges experienced by developer and operations teams who work with Kubernetes. A single Rancher installation can manage hundreds of Kubernetes clusters running on-premise or in any cloud. This provides technical teams with a seamless development experience and helps business leaders adopt a multi-cloud or hybrid-cloud strategy.

  • Shared Tools & Services – Rancher provides a rich set of shared tools and services on top of any Kubernetes cluster. Rancher ships with CI/CD, monitoring, alerting, logging, and all the tools needed to make your Kubernetes clusters immediately useful. Less time spent worrying about your infrastructure means more resources to invest in the accelerated delivery of innovative cloud-native applications.

So, while we are proud of our success in the first half of 2019, we are even more excited about the future! As Kubernetes continues to proliferate and grow in complexity, organizations will increasingly rely upon solutions like Rancher that enable them to run Kubernetes EVERYWHERE!

To learn more about Rancher, check us out at www.rancher.com.

For an introduction to Kubernetes, join an upcoming online training session.


Announcing Preview Support for Istio

Thursday, 20 June, 2019

Today we are announcing support for Istio with Rancher 2.3 in Preview mode.

Why Istio?

Istio, and service mesh generally, has generated a huge amount of excitement
in the Kubernetes ecosystem. Istio promises to add fault tolerance, canary rollouts, A/B testing, monitoring
and metrics, tracing and observability, and authentication and authorization, eliminating the need for
developers to instrument or write specific code to enable these capabilities. In effect, developers can just
focus on their business logic and leave the rest to Kubernetes and Istio.

The claims above aren’t new. About 10 years ago, PaaS vendors made exactly the same claim and even delivered
on it to an extent. The problem was that their offerings required specific languages, frameworks, and, for
the most part, only worked with very simple applications. The workloads were also tied to the vendor’s
unique implementation, which meant that if you wanted your applications to use the PaaS services, you were
potentially locked-in for a very long time.

With containers and Kubernetes, these limitations are virtually nonexistent. As long as you can containerize
your application, Kubernetes can run it for you.

How Istio Works in Rancher 2.3 Preview 2

Our users count on us to make managing and operating Kubernetes and related tools and technologies easy,
without locking them into a specific cloud vendor. With Istio, we take the same approach.

In this Preview mode, we provide users with a simple UI to enable Istio under the Tools menu. Reasonable
default configurations are provided but can be changed as required:

Announcing Istio

In order to monitor your traffic, Istio needs to inject an Envoy sidecar. In Rancher 2.3 Preview, users can
enable automatic sidecar injection for each namespace. Once this option is selected, Rancher will inject the
sidecar container into each workload:

Announcing Istio
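In upstream Istio, automatic injection is driven by a label on the namespace, and Rancher’s per-namespace toggle is assumed here to map to that same mechanism. The sketch below applies the standard istio-injection=enabled label with the Kubernetes Python client; the namespace name is a placeholder, and existing workloads only pick up the sidecar once their pods are recreated.

    # Sketch: label a namespace so Istio's admission webhook injects the Envoy
    # sidecar into newly created pods. Assumes a working kubeconfig and that the
    # cluster honours the standard istio-injection label (Rancher's UI toggle is
    # assumed to be equivalent to this).
    from kubernetes import client, config

    config.load_kube_config()                  # or config.load_incluster_config() inside a pod
    core_v1 = client.CoreV1Api()

    namespace = "demo"                         # hypothetical namespace name
    patch = {"metadata": {"labels": {"istio-injection": "enabled"}}}
    core_v1.patch_namespace(name=namespace, body=patch)

    print(f"Automatic sidecar injection requested for namespace '{namespace}'")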

Rancher’s simplified installation and configuration of Istio comes with a built-in, supported Kiali dashboard for traffic and telemetry visualization, Jaeger for tracing, and even its own Prometheus and Grafana (separate
instances from the ones used for Advanced Monitoring).

After you deploy workloads in the namespaces with automatic sidecar injection enabled, head over to the Istio
menu entry and observe the traffic as it flows across your microservice applications:

Announcing Istio

Clicking on Kiali, Jaeger, Prometheus, or Grafana will take you to the respective UI of each tool, where you
can find more details and options:

Announcing Istio

As mentioned earlier, the power of Istio is its ability to bring features like fault tolerance, circuit
breaking, canary deployment, and more to your services. To enable these, you will need to develop and apply
the appropriate YAML files. Istio is not supported for Windows workloads yet, so it should not be enabled in
Windows clusters.
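As an illustration of the kind of YAML file mentioned above, the sketch below generates a minimal Istio VirtualService that splits traffic 90/10 between two subsets of a service, the shape of a basic canary rollout. The service and subset names are placeholders, and a matching DestinationRule defining the subsets would also be required.

    # Sketch: generate a minimal Istio VirtualService for a 90/10 canary split.
    # Apply the printed manifest with kubectl; "reviews", "v1" and "v2" are
    # placeholder names, and a DestinationRule defining the subsets is assumed.
    import yaml  # requires PyYAML

    virtual_service = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": "reviews"},
        "spec": {
            "hosts": ["reviews"],
            "http": [{
                "route": [
                    {"destination": {"host": "reviews", "subset": "v1"}, "weight": 90},
                    {"destination": {"host": "reviews", "subset": "v2"}, "weight": 10},
                ],
            }],
        },
    }

    print(yaml.safe_dump(virtual_service, sort_keys=False))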

Conclusion

Istio is one of the most talked about and requested features in the Rancher and Kubernetes communities today.
However, there are also a lot of questions around the best way to deploy and manage it. With Rancher 2.3.0
Preview 2, our goal is to make this journey quick and easy.

For release notes and installation steps, please visit
https://github.com/rancher/rancher/releases/tag/v2.3.0-alpha5

An Introduction to Big Data Concepts

Wednesday, 27 March, 2019

Gigantic amounts of data are being generated at high speeds by a variety of sources such as mobile devices, social media, machine logs, and the multitude of sensors surrounding us. All around the world, we produce vast amounts of data, and the volume of generated data is growing exponentially at an unprecedented rate. The pace of data generation is being accelerated even further by the growth of new technologies and paradigms such as the Internet of Things (IoT).

What is Big Data and How Is It Changing?

The definition of big data is hidden in the dimensions of the data. Data sets are considered “big data” if they have a high degree of the following three distinct dimensions: volume, velocity, and variety. Value and veracity are two other “V” dimensions that have been added to the big data literature in recent years. Additional Vs are frequently proposed, but these five Vs are widely accepted by the community and can be described as follows:

  • Velocity: the speed at which the data is being generated
  • Volume: the amount of data being generated
  • Variety: the diversity or different types of data
  • Value: the worth or usefulness of the data
  • Veracity: the quality, accuracy, or trustworthiness of the data

Large volumes of data are generally available in either structured or unstructured formats. Structured data can be generated by machines or humans, has a specific schema or model, and is usually stored in databases. Structured data is organized around schemas with clearly defined data types. Numbers, date time, and strings are a few examples of structured data that may be stored in database columns. Alternatively, unstructured data does not have a predefined schema or model. Text files, log files, social media posts, mobile data, and media are all examples of unstructured data.

Based on a report provided by Gartner, an international research and consulting organization, the application of advanced big data analytics is part of the Gartner Top 10 Strategic Technology Trends for 2019, and is expected to drive new business opportunities. The same report also predicts that more than 40% of data science tasks will be automated by 2020, which will likely require new big data tools and paradigms.

By 2017, global internet usage reached 47% of the world’s population based on an infographic provided by DOMO. This indicates that an increasing number of people are starting to use mobile phones and that more and more devices are being connected to each other via smart cities, wearable devices, Internet of Things (IoT), fog computing, and edge computing paradigms. As internet usage spikes and other technologies such as social media, IoT devices, mobile phones, autonomous devices (e.g. robotics, drones, vehicles, appliances, etc) continue to grow, our lives will become more connected than ever and generate unprecedented amounts of data, all of which will require new technologies for processing.

The Scale of Data Generated by Everyday Interactions

At a large scale, the data generated by everyday interactions is staggering. Based on research conducted by DOMO, for every minute in 2018, Google conducted 3,877,140 searches, YouTube users watched 4,333,560 videos, Twitter users sent 473,400 tweets, Instagram users posted 49,380 photos, Netflix users streamed 97,222 hours of video, and Amazon shipped 1,111 packages. This is just a small glimpse of a much larger picture involving other sources of big data. It seems like the internet is pretty busy, doesn’t it? Moreover, it is expected that mobile traffic will experience tremendous growth past its present numbers and that the world’s internet population is growing significantly year-over-year. By 2020, the report anticipates that 1.7MB of data will be created per person per second. Big data is getting even bigger.

At a small scale, the data generated on a daily basis by a small business, a startup, or a single sensor such as a surveillance camera is also huge. For example, a typical IP camera in a surveillance system at a shopping mall or a university campus generates 15 frames per second and requires roughly 100 GB of storage per day. Consider the storage and computing requirements if those camera numbers are scaled to tens or hundreds.
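The arithmetic behind an estimate like that is simple to check. Here is a back-of-the-envelope calculation, assuming an average compressed frame size of roughly 80 KB (the real figure depends on resolution, codec, and scene complexity):

    # Back-of-the-envelope storage estimate for a single IP camera.
    # The ~80 KB average compressed frame size is an assumption; actual values
    # depend on resolution, codec, and how much the scene changes.
    frames_per_second = 15
    avg_frame_size_kb = 80
    seconds_per_day = 24 * 60 * 60

    gb_per_day = frames_per_second * avg_frame_size_kb * seconds_per_day / 1_000_000
    print(f"~{gb_per_day:.0f} GB per camera per day")              # roughly 100 GB

    # Scale it out: a modest deployment of 100 cameras.
    print(f"~{gb_per_day * 100 / 1000:.1f} TB per day for 100 cameras")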

Big Data in the Scientific Community

Scientific projects such as CERN, which conducts research on what the universe is made of, also generate massive amounts of data. The Large Hadron Collider (LHC) at CERN is the world’s largest and most powerful particle accelerator. It consists of a 27-kilometer ring of superconducting magnets along with some additional structures to accelerate and boost the energy of particles along the way.

As the particle beams circulate, collisions occur within the LHC detectors roughly 1 billion times per second, generating around 1 petabyte of raw digital “collision event” data per second. This unprecedented volume of data is a great challenge that cannot be resolved with CERN’s current infrastructure. To work around this, the generated raw data is filtered and only the “important” events are processed to reduce the volume of data. Consider the challenging processing requirements for this task.

The four big LHC experiments, named ALICE, ATLAS, CMS, and LHCb, are among the biggest generators of data at CERN, and the rate of data processed and stored on servers by these experiments is expected to reach about 25 GB/s (gigabytes per second). As of June 29, 2017, the CERN Data Center announced that it had passed the milestone of 200 petabytes of data archived permanently in its storage units.

Why Big Data Tools are Required

The scale of the data generated by well-known corporations, small organizations, and scientific projects alike is growing at an unprecedented level. The scenarios above make this clear, and it is worth remembering that the scale of this data is only getting bigger.

On the one hand, the mountain of data generated presents tremendous processing, storage, and analytics challenges that need to be carefully considered and handled. On the other hand, traditional Relational Database Management Systems (RDBMS) and data processing tools are not sufficient to manage this massive amount of data efficiently when the scale of data reaches terabytes or petabytes; they lack the ability to handle large volumes of data efficiently at scale. Fortunately, big data tools and paradigms such as Hadoop and MapReduce are available to resolve these big data challenges.

Analyzing big data and gaining insights from it can help organizations make smart business decisions and improve their operations. This can be done by uncovering hidden patterns in the data and using them to reduce operational costs and increase profits. Because of this, big data analytics plays a crucial role for many domains such as healthcare, manufacturing, and banking by resolving data challenges and enabling them to move faster.

Big Data Analytics Tools

Since the compute, storage, and network requirements for working with large data sets are beyond the limits of a single computer, there is a need for paradigms and tools to crunch and process data through clusters of computers in a distributed fashion. More and more computing power and massive storage infrastructure are required for processing this massive data either on-premise or, more typically, at the data centers of cloud service providers.

In addition to the required infrastructure, various tools and components must be brought together to solve big data problems. The Hadoop ecosystem is just one of the platforms helping us work with massive amounts of data and discover useful patterns for businesses.

Below is a list of some of the tools available and a description of their roles in processing big data:

  • MapReduce: MapReduce is a distributed computing paradigm developed to process vast amounts of data in parallel by splitting a big task into smaller map and reduce tasks (see the sketch after this list).
  • HDFS: The Hadoop Distributed File System is a distributed storage and file system used by Hadoop applications.
  • YARN: The resource management and job scheduling component in the Hadoop ecosystem.
  • Spark: A real-time in-memory data processing framework.
  • PIG/HIVE: SQL-like scripting and querying tools for data processing and simplifying the complexity of MapReduce programs.
  • HBase, MongoDB, Elasticsearch: Examples of a few NoSQL databases.
  • Mahout, Spark ML: Tools for running scalable machine learning algorithms in a distributed fashion.
  • Flume, Sqoop, Logstash: Data integration and ingestion of structured and unstructured data.
  • Kibana: A tool to visualize Elasticsearch data.
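To make the map/reduce split concrete, here is a toy, single-machine illustration of the paradigm using only the Python standard library. Real MapReduce or Spark jobs run the same two phases, but distribute them across a cluster of machines.

    # Toy illustration of the MapReduce pattern: a word count in which the "map"
    # phase emits (word, 1) pairs and the "reduce" phase sums the counts per key.
    # Frameworks like Hadoop MapReduce or Spark run these phases in parallel
    # across many machines; this sketch runs them in a single process.
    from collections import defaultdict

    documents = [
        "big data needs big tools",
        "data tools process big data",
    ]

    # Map phase: every document produces a list of (word, 1) pairs.
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle/reduce phase: group by key and sum the counts.
    counts = defaultdict(int)
    for word, n in mapped:
        counts[word] += n

    print(dict(counts))  # {'big': 3, 'data': 3, 'needs': 1, 'tools': 2, 'process': 1}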

Conclusion

To summarize, we are generating a massive amount of data in our everyday lives, and that volume is continuing to rise. Having the data alone does not improve an organization; its value emerges only when the data is analyzed and turned into business intelligence. It is not possible to mine and process this mountain of data with traditional tools, so we use big data pipelines to help us ingest, process, analyze, and visualize these tremendous amounts of data.

Learn to deploy databases in production on Kubernetes

For more training in big data and database management, watch our free online training on successfully running a database in production on Kubernetes.


Considerations When Designing Distributed Systems

Monday, 11 March, 2019

Introduction

Today’s applications are marvels of distributed systems development. Each function or service that makes up
an application may be executing on a different system, based upon a different system architecture, housed
in a different geographical location, and written in a different computer language. Components of
today’s applications might be hosted on a powerful system carried in the owner’s pocket and communicating
with application components or services that are replicated in data centers all over the world.

What’s amazing about this is that individuals using these applications typically are not aware of the
complex environment that responds to their request for the local time, local weather, or for directions to
their hotel.

Let’s pull back the curtain and look at the industrial sorcery that makes this all possible and contemplate
the thoughts and guidelines developers should keep in mind when working with this complexity.

The Evolution of System Design

Designing Distributed

Figure 1: Evolution of system design over time

Source: Interaction Design Foundation, The
Social Design of Technical Systems: Building technologies for communities

Application development has come a long way from the time that programmers wrote out applications, hand
compiled them into the language of the machine they were using, and then entered individual machine
instructions and data directly into the computer’s memory using toggle switches.

As processors became more and more powerful, system memory and online storage capacity increased, and
computer networking capability dramatically increased, approaches to development also changed. Data can now
be transmitted from one side of the planet to the other faster than it used to be possible for early
machines to move data from system memory into the processor itself!

Let’s look at a few highlights of this amazing transformation.

Monolithic Design

Early computer programs were based upon a monolithic design, with all of the application components
architected to execute on a single machine. This meant that functions such as the user interface (if users
were actually able to interact with the program), application rules processing, data management, storage
management, and network management (if the computer was connected to a computer network) were all contained
within the program.

While simpler to write, these programs became increasingly complex, difficult to document, and hard to update
or change. At this time, the machines themselves represented the biggest cost to the enterprise and so
applications were designed to make the best possible use of the machines.

Client/Server Architecture

As processors became more powerful, system and online storage capacity increased, and data communications
became faster and more cost-efficient, application design evolved to match pace. Application logic was
refactored or decomposed, allowing each component to execute on a different machine, and the ever-improving
network was inserted between the components. This allowed some functions to migrate to the lowest cost
computing environment available at the time. The evolution flowed through the following stages:

Terminals and Terminal Emulation

Early distributed computing relied on special-purpose user access devices called terminals. Applications had
to understand the communications protocols they used and issue commands directly to the devices. When
inexpensive personal computing (PC) devices emerged, the terminals were replaced by PCs running a terminal
emulation program.

At this point, all of the components of the application were still hosted on a single mainframe or
minicomputer.

Light Client

As PCs became more powerful, supported larger internal and online storage, and network performance increased,
enterprises segmented or factored their applications so that the user interface was extracted and executed
on a local PC. The rest of the application continued to execute on a system in the data center.

Often these PCs were less costly than the terminals that they replaced. They also offered additional
benefits. These PCs were multi-functional devices. They could run office productivity applications that
weren’t available on the terminals they replaced. This combination drove enterprises to move to
client/server application architectures when they updated or refreshed their applications.

Midrange Client

PC evolution continued at a rapid pace. Once more powerful systems with larger storage capacities were
available, enterprises took advantage of them by moving even more processing away from the expensive systems
in the data center out to the inexpensive systems on users’ desks. At this point, the user interface and
some of the computing tasks were migrated to the local PC.

This allowed the mainframes and minicomputers (now called servers) to have a longer useful life, thus
lowering the overall cost of computing for the enterprise.

Heavy Client

As PCs become more and more powerful, more application functions were migrated from the backend servers. At
this point, everything but data and storage management functions had been migrated.

Enter the Internet and the World Wide Web

The public internet and the World Wide Web emerged at this time. Client/server computing continued to be
used. In an attempt to lower overall costs, some enterprises began to re-architect their distributed
applications so they could use standard internet protocols to communicate and substituted a web browser for
the custom user interface function. Later, some of the application functions were rewritten in JavaScript so
that they could execute locally on the client’s computer.

Server Improvements

Industry innovation wasn’t focused solely on the user side of the communications link. A great deal of
improvement was made to the servers as well. Enterprises began to harness together the power of many
smaller, less expensive industry standard servers to support some or all of their mainframe-based functions.
This allowed them to reduce the number of expensive mainframe systems they deployed.

Soon, remote PCs were communicating with a number of servers, each supporting their own component of the
application. Special-purpose database and file servers were adopted into the environment. Later, other
application functions were migrated into application servers.

Networking was another area of intense industry focus. Enterprises began using special-purpose networking
servers that provided firewalls and other security functions, file caching to accelerate data access for
their applications, email, web serving, web application serving, and distributed name services that kept
track of and controlled user credentials for data and application access. The list of networking services
that has been encapsulated in an appliance server grows all the time.

Object-Oriented Development

The rapid change in PC and server capabilities combined with the dramatic price reduction for processing
power, memory and networking had a significant impact on application development. No longer were hardware
and software the biggest IT costs. The largest costs were communications, IT services (the staff), power,
and cooling.

Software development, maintenance, and IT operations took on a new importance and the development process was
changed to reflect the new reality that systems were cheap and people, communications, and power were
increasingly expensive.

Designing Distributed

Figure 2: Worldwide IT spending forecast

Source: Gartner Worldwide IT
Spending Forecast, Q1 2018

Enterprises looked to improved data and application architectures as a way to make the best use of their
staff. Object-oriented applications and development approaches were the result. Many programming languages
such as the following supported this approach:

  • C++
  • C#
  • COBOL
  • Java
  • PHP
  • Python
  • Ruby

Application developers were forced to adapt by becoming more systematic when defining and documenting data
structures. This approach also made maintaining and enhancing applications easier.

Open-Source Software

Opensource.com offers the following definition for open-source
software: “Open source software is software with source code that anyone can inspect, modify, and enhance.”
It goes on to say that, “some software has source code that only the person, team, or organization who
created it — and maintains exclusive control over it — can modify. People call this kind of software
‘proprietary’ or ‘closed source’ software.”

Only the original authors of proprietary software can legally copy, inspect, and alter that software. And in
order to use proprietary software, computer users must agree (often by accepting a license displayed the
first time they run this software) that they will not do anything with the software that the software’s
authors have not expressly permitted. Microsoft Office and Adobe Photoshop are examples of proprietary
software.

Although open-source software has been around since the very early days of computing, it came to the
forefront in the 1990s when complete open-source operating systems, virtualization technology, development
tools, database engines, and other important functions became available. Open-source technology is often a
critical component of web-based and distributed computing. Among others, the open-source offerings in the
following categories are popular today:

  • Development tools
  • Application support
  • Databases (flat file, SQL, No-SQL, and in-memory)
  • Distributed file systems
  • Message passing/queueing
  • Operating systems
  • Clustering

Distributed Computing

The combination of powerful systems, fast networks, and the availability of sophisticated software has driven
major application development away from monolithic towards more highly distributed approaches. Enterprises
have learned, however, that sometimes it is better to start over than to try to refactor or decompose an
older application.

When enterprises undertake the effort to create distributed applications, they often discover a few pleasant
side effects. A properly designed application, that has been decomposed into separate functions or services,
can be developed by separate teams in parallel.

Rapid application development and deployment, also known as DevOps, emerged as a way to take advantage of the
new environment.

Service-Oriented Architectures

As the industry evolved beyond client/server computing models to an even more distributed approach, the
phrase “service-oriented architecture” emerged. This approach was built on distributed systems concepts,
standards in message queuing and delivery, and XML messaging as a standard approach to sharing data and data
definitions.

Individual application functions are repackaged as network-oriented services: each service receives a message
requesting that it perform a specific piece of work, performs that work, and then sends the response back to
the function that requested the service.

This approach offers another benefit: the ability for a given service to be hosted in multiple places around
the network. This offers both improved overall performance and improved reliability.

Workload management tools were developed that receive requests for a service, review the available capacity,
forward the request to the service with the most available capacity, and then send the response back to the
requester. If a specific service doesn’t respond in a timely fashion, the workload manager simply forwards
the request to another instance of the service. It would also mark the service that didn’t respond as failed
and wouldn’t send additional requests to it until it received a message indicating that it was still alive
and healthy.
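A minimal sketch of that dispatch-and-failover logic might look like the following; the service instances and their capacity figures are hypothetical stand-ins for real endpoints.

    # Sketch of a workload manager: route each request to the healthy instance
    # with the most spare capacity, and mark an instance as failed when it does
    # not respond in time so no further requests are sent to it.
    import random

    class Instance:
        def __init__(self, name, capacity):
            self.name = name
            self.capacity = capacity      # free request slots
            self.healthy = True

        def handle(self, request):
            if random.random() < 0.2:     # simulate an occasional unresponsive instance
                raise TimeoutError(f"{self.name} did not respond in time")
            self.capacity -= 1
            return f"{self.name} handled {request}"

    def dispatch(request, instances):
        # Try instances in order of available capacity, skipping unhealthy ones.
        for inst in sorted(instances, key=lambda i: i.capacity, reverse=True):
            if not inst.healthy:
                continue
            try:
                return inst.handle(request)
            except TimeoutError:
                inst.healthy = False      # stop routing to it until it reports healthy again
        raise RuntimeError("no healthy instance could serve the request")

    instances = [Instance("svc-a", 5), Instance("svc-b", 3), Instance("svc-c", 8)]
    for i in range(5):
        print(dispatch(f"request-{i}", instances))

A production workload manager would also re-admit an instance once a health check succeeds, which is omitted here for brevity.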

What Are the Considerations for Distributed Systems?

Now that we’ve walked through over 50 years of computing history, let’s consider some rules of thumb for
developers of distributed systems. There’s a lot to think about because a distributed solution is likely to
have components or services executing in many places, on different types of systems, and messages must be
passed back and forth to perform work. Care and consideration are absolute requirements for successfully
creating these solutions. Expertise must also be available for each type of host system, development tool,
and messaging system in use.

Nailing Down What Needs to Be Done

One of the first things to consider is what needs to be accomplished! While this sounds simple, it’s
incredibly important.

It’s amazing how many developers start building things before they know, in detail, what is needed. Often,
this means that they build unnecessary functions and waste their time. To quote Yogi Berra, “if you don’t
know where you are going, you’ll end up someplace else.”

A good place to start is knowing what needs to be done, what tools and services are already available, and
what people using the final solution should see.

Interactive Versus Batch

Since fast responses and low latency are often requirements, it would be wise to consider what should be done
while the user is waiting and what can be put into a batch process that executes on an event-driven or
time-driven schedule.

After the initial segmentation of functions has been considered, it is wise to plan when background batch
processes need to execute, what data those functions manipulate, and how to make sure those functions are
reliable, available when needed, and protected against data loss.

Where Should Functions Be Hosted?

Only after the “what” has been planned in fine detail, should the “where” and “how” be considered. Developers
have their favorite tools and approaches and often will invoke them even if they might not be the best
choice. As Bernard Baruch was reported to say, “if all you have is a hammer, everything looks like a nail.”

It is also important to be aware of corporate standards for enterprise development. It isn’t wise to select a
tool simply because it is popular at the moment. That tool just might do the job, but remember that
everything that is built must be maintained. If you build something that only you can understand or
maintain, you may just have tied yourself to that function for the rest of your career. I have personally
created functions that worked properly and were small and reliable. I received telephone calls regarding
these for ten years after I left that company because later developers could not understand how the
functions were implemented. The documentation I wrote had been lost long earlier.

Each function or service should be considered separately in a distributed solution. Should the function be
executed in an enterprise data center, in the data center of a cloud services provider or, perhaps, in both?
Consider that there are regulatory requirements in some industries that direct the selection of where and
how data must be maintained and stored.

Other considerations include:

  • What type of system should host that function? Is one system architecture better for that
    function? Should the system be based upon ARM, X86, SPARC, Precision, Power, or even a mainframe?
  • Does a specific operating system provide a better computing environment for this function? Would Linux,
    Windows, UNIX, System I, or even System Z be a better platform?
  • Is a specific development language better for that function? Is a specific type of data management tool?
    Is a Flat File, SQL database, No-SQL database, or a non-structured storage mechanism better?
  • Should the function be hosted in a virtual machine or a container to facilitate function mobility,
    automation and orchestration?

Virtual machines executing Windows or Linux were frequently the choice in the early 2000s. While they offered
significant isolation for functions and made it easily possible to restart or move them when necessary,
their processing, memory and storage requirements were rather high. Containers, another approach to
processing virtualization, are the emerging choice today because they offer similar levels of isolation, the
ability to restart and migrate functions and consume far less processing power, memory or storage.

Performance

Performance is another critical consideration. While defining the functions or services that make up a
solution, the developers should be aware of any significant processing, memory or storage
requirements. It might be wise to look at these functions closely to learn if they can be further subdivided
or decomposed.

Further segmentation would allow an increase in parallelization, which would potentially offer performance
improvements. The trade-off, of course, is that this approach also increases complexity and, potentially,
makes the solution harder to manage and to secure.
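As a small illustration of that idea, once a task has been decomposed into independent units those units can be fanned out across workers; the chunks below are placeholder work items.

    # Sketch: after decomposing a task into independent units, process the units
    # in parallel. The per-chunk "work" here is a placeholder computation.
    from concurrent.futures import ThreadPoolExecutor

    def process_chunk(chunk):
        # Stand-in for one independent sub-function of the larger task.
        return sum(x * x for x in chunk)

    chunks = [range(0, 1000), range(1000, 2000), range(2000, 3000)]

    with ThreadPoolExecutor(max_workers=3) as pool:
        partial_results = list(pool.map(process_chunk, chunks))

    print(sum(partial_results))  # same result as processing the chunks sequentially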

Reliability

In high-stakes enterprise environments, solution reliability is essential. The developer must consider
whether it is acceptable to force people to re-enter data or re-run a function, and when a function can be
unavailable.

Database developers ran into this issue in the 1960s and developed the concept of an atomic function: the
function must complete, or any partial updates must be rolled back, leaving the data in the state it was in
before the function began. The same mindset must be applied to distributed systems to ensure that data
integrity is maintained even in the event of service failures and transaction disruptions.
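A minimal sketch of an atomic function using Go’s database/sql package (the accounts table, column names and PostgreSQL-style placeholders are assumptions made for illustration):

```go
package payments

import (
	"context"
	"database/sql"
)

// TransferFunds either applies both updates or rolls the transaction back,
// leaving the data exactly as it was before the function began.
func TransferFunds(ctx context.Context, db *sql.DB, from, to string, amount int64) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// Rollback is a no-op once Commit has succeeded.
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx,
		"UPDATE accounts SET balance = balance - $1 WHERE id = $2", amount, from); err != nil {
		return err // partial update is rolled back
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE accounts SET balance = balance + $1 WHERE id = $2", amount, to); err != nil {
		return err
	}
	return tx.Commit()
}
```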

Functions must be designed either to complete fully or to roll back any intermediate updates. In critical
message-passing systems, messages must be stored until an acknowledgement of receipt arrives. If no
acknowledgement is received, the original message must be resent and the failure reported to the management
system.
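A hedged sketch of that store-and-forward pattern (the Transport interface and DeliverWithRetry function are invented here, not taken from any specific messaging product):

```go
package messaging

import (
	"errors"
	"time"
)

// Transport is a hypothetical hook into whatever carries the message.
type Transport interface {
	Send(msg []byte) error
	WaitForAck(timeout time.Duration) error
}

// DeliverWithRetry keeps the message until it is acknowledged, resending as
// needed, and surfaces a failure when retries are exhausted.
func DeliverWithRetry(t Transport, msg []byte, attempts int, ackTimeout time.Duration) error {
	for i := 0; i < attempts; i++ {
		if err := t.Send(msg); err != nil {
			continue // try again on the next attempt
		}
		if err := t.WaitForAck(ackTimeout); err == nil {
			return nil // acknowledged; the stored copy can now be discarded
		}
	}
	// The caller should report this to the management system.
	return errors.New("message not acknowledged after retries")
}
```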

Manageability

Although not as much fun to consider as the core application functionality, manageability is a key factor in
the ongoing success of the application. All distributed functions must be fully instrumented so that
administrators can both understand the current state of each function and change its parameters when needed.
Distributed systems, after all, are constructed of many more moving parts than the monolithic systems they
replace, so developers must constantly work to keep this environment easy to operate and maintain.
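As one minimal, illustrative approach (the endpoint paths and counter name are assumptions, not a prescribed convention), a function can expose a health endpoint and a runtime counter using only Go’s standard library:

```go
package main

import (
	"encoding/json"
	"expvar"
	"net/http"
)

// requestCount is a counter administrators can inspect at runtime.
var requestCount = expvar.NewInt("requests_total")

func handle(w http.ResponseWriter, r *http.Request) {
	requestCount.Add(1)
	w.Write([]byte("ok"))
}

// healthz reports the current state of this function instance.
func healthz(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string]string{"status": "healthy"})
}

func main() {
	http.HandleFunc("/work", handle)
	http.HandleFunc("/healthz", healthz)
	// Importing expvar also exposes counters at /debug/vars on the default mux.
	http.ListenAndServe(":8080", nil)
}
```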

Security

Distributed system security is an order of magnitude more difficult than security in a monolithic
environment. Each function must be secured separately, and the communication links between and among the
functions must also be secured. As the network grows in size and complexity, developers must consider how to
control access to functions, how to make sure that only authorized users can access them, and how to isolate
services from one another.

Security is a critical element that must be built into every function, not added on later. Unauthorized
access to functions and data must be prevented and reported.
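A minimal sketch of such an access check, assuming a shared bearer token held in a hypothetical SERVICE_TOKEN environment variable (a production system would more likely rely on mTLS or an identity provider):

```go
package main

import (
	"net/http"
	"os"
)

// requireToken only lets callers with the expected bearer token reach the
// wrapped function; everyone else is rejected before any work is done.
func requireToken(next http.Handler) http.Handler {
	expected := "Bearer " + os.Getenv("SERVICE_TOKEN") // assumed env var
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != expected {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return // unauthorized access is prevented (and could be reported here)
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("order data"))
	})
	http.ListenAndServe(":8443", requireToken(mux))
}
```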

Privacy

Privacy is the subject of an increasing number of regulations around the world. Examples such as the European
Union’s GDPR and the U.S. HIPAA rules are important considerations for any developer of customer-facing
systems.

Mastering Complexity

Developers must take the time to consider how all of the pieces of a complex computing environment fit
together. It is hard to maintain the discipline that a service should encapsulate a single function or,
perhaps, a small number of tightly interrelated functions. If a given function is implemented in multiple
places, maintaining and updating it becomes difficult. What happens if one instance of the function doesn’t
get updated? Finding that error can be very challenging.

This means it is wise for developers of complex applications to maintain a visual model that shows where each
function lives so it can be updated if regulations or business requirements change.

Often this means that developers must take the time to document what they did, when changes were made, and
what those changes were meant to accomplish, so that other developers aren’t forced to dig through mounds of
code to learn where a function lives or how it works.

To be successful as an architect of distributed systems, a developer must be able to master complexity.

Approaches Developers Must Master

Developers must master decomposing and refactoring application architectures, thinking in terms of teams, and
growing their skills in rapid application development and deployment (DevOps). After all, they must be able
to think systematically about which functions are independent of one another and which rely on the output of
other functions to work. Functions that rely upon one another may be best implemented as a single service;
implementing them as independent functions might create unnecessary complexity, result in poor application
performance and impose an unnecessary burden on the network.

Virtualization Technology Covers Many Bases

Virtualization is a far bigger category than just virtual machine software or containers; both of those are
forms of processing virtualization. There are at least seven different types of virtualization technology in
use in modern applications today. Virtualization technology is available to enhance how users access
applications, where and how applications execute, where and how processing happens, how networking functions,
where and how data is stored, how security is implemented, and how management functions are accomplished.
The following model of virtualization technology may help developers get their arms around the concept:

Figure 3: Architecture of virtualized systems (Source: 7 Layer Virtualization Model, VirtualizationReview.com)

Think of Software-Defined Solutions

It is also important for developers to think in terms of “software-defined” solutions, that is, to separate
control from the actual processing so that functions can be automated and orchestrated.
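As a hedged illustration (the DesiredState and Cluster types below are invented for this sketch, not drawn from any particular product), a software-defined design keeps the desired state in a control layer and lets a simple reconcile loop drive the processing layer toward it:

```go
package main

import (
	"fmt"
	"time"
)

// DesiredState lives in the control layer; it says what should be running.
type DesiredState struct {
	Replicas int
}

// Cluster stands in for the processing layer that does the actual work.
type Cluster struct {
	running int
}

// Reconcile moves the actual state toward the desired state.
func (c *Cluster) Reconcile(d DesiredState) {
	for c.running < d.Replicas {
		c.running++
		fmt.Println("started instance, now running:", c.running)
	}
	for c.running > d.Replicas {
		c.running--
		fmt.Println("stopped instance, now running:", c.running)
	}
}

func main() {
	desired := DesiredState{Replicas: 3}
	cluster := &Cluster{}
	for i := 0; i < 3; i++ { // a real control loop would run continuously
		cluster.Reconcile(desired)
		time.Sleep(100 * time.Millisecond)
	}
}
```

Because the control layer is just data, changing the desired state (or automating changes to it) is all that is needed to orchestrate the processing layer.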

Tools and Strategies That Can Help

Developers shouldn’t feel like they are on their own when wading into this complex world. Suppliers and
open-source communities offer a number of powerful tools. Various forms of virtualization technology can be
a developer’s best friend.

Virtualization Technology Can Be Your Best Friend

  • Containers make it possible to develop functions that execute without interfering with one another and
    that can be migrated from system to system based upon workload demands.
  • Orchestration technology makes it possible to control many functions, ensure they are performing well
    and are reliable, and restart or move them in a failure scenario.
  • Virtualization supports incremental development: functions can be developed in parallel and deployed as
    they are ready, and they can be updated with new features without requiring changes elsewhere.
  • Virtualization supports highly distributed systems: functions can be deployed locally in the enterprise
    data center or remotely in the data center of a cloud services provider.

Think In Terms of Services

Developers must think in terms of discrete services and how those services communicate with one another.

Well-Defined APIs

Well-defined APIs mean that multiple teams can work simultaneously and still know that everything will fit
together as planned. This typically requires a bit more work up front, but it pays off in the end: overall
development is faster and documentation is easier.
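As a minimal sketch (the PricingService, PriceRequest and PriceResponse names are hypothetical), an API contract can be agreed up front so that the team building the service and the teams consuming it can work in parallel:

```go
package catalog

import "context"

// PriceRequest and PriceResponse define the contract between teams.
type PriceRequest struct {
	SKU      string
	Quantity int
}

type PriceResponse struct {
	TotalCents int64
	Currency   string
}

// PricingService is a hypothetical service boundary; its implementation can
// change, or move between data centers, without breaking its callers.
type PricingService interface {
	Quote(ctx context.Context, req PriceRequest) (PriceResponse, error)
}
```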

Support Rapid Application Development

This approach is also well suited to rapid application development and rapid prototyping, and it fits
naturally with DevOps practices. Properly executed, DevOps also shortens the time to deployment.

Think In Terms of Standards

Rather than relying on a single vendor, the developer of distributed systems would be wise to think in terms
of multi-vendor, international standards. This approach avoids vendor lock-in and makes finding expertise
much easier.

Summary

It’s interesting that guidelines for rapid application development and deployment of distributed systems
start with “take your time.” It is wise to plan out where you are going and what you are going to do;
otherwise you are likely to end up somewhere else, having burned through your development budget with little
to show for it.

Sign up for Online Training

To continue to learn about the tools, technologies, and practices in the modern development landscape, sign up for free online training sessions. Our engineers host
weekly classes on Kubernetes, containers, CI/CD, security, and more.