An Introduction to Big Data Concepts

Wednesday, 27 March, 2019

Gigantic amounts of data are being generated at high speeds by a variety of sources such as mobile devices, social media, machine logs, and the many sensors surrounding us. All around the world, we produce vast amounts of data, and the volume of generated data is growing exponentially at an unprecedented rate. The pace of data generation is being accelerated further by the growth of new technologies and paradigms such as the Internet of Things (IoT).

What is Big Data and How Is It Changing?

The definition of big data is hidden in the dimensions of the data. Data sets are considered “big data” if they rank high on three distinct dimensions: volume, velocity, and variety. Value and veracity are two other “V” dimensions that have been added to the big data literature in recent years. Additional Vs are frequently proposed, but these five Vs are widely accepted by the community and can be described as follows:

  • Velocity: the speed at which the data is being generated
  • Volume: the amount of data that is being generated
  • Variety: the diversity or different types of the data
  • Value: the worth of the data or the value it has
  • Veracity: the quality, accuracy, or trustworthiness of the data

Large volumes of data are generally available in either structured or unstructured formats. Structured data can be generated by machines or humans, is organized around a specific schema or model with clearly defined data types, and is usually stored in databases. Numbers, dates and times, and strings are a few examples of structured data that may be stored in database columns. In contrast, unstructured data does not have a predefined schema or model. Text files, log files, social media posts, mobile data, and media are all examples of unstructured data.
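
To make the distinction concrete, here is a minimal Python sketch. The Order record, the sample log line, and the regular expression are all hypothetical and only illustrate the idea: a structured record carries its schema with it, while a log line is free text until we impose a structure on it ourselves.

```python
import re
from dataclasses import dataclass
from datetime import datetime


# Structured: every field has a name and a declared type, like database columns.
@dataclass
class Order:
    order_id: int
    placed_at: datetime
    customer: str


# Unstructured: a free-text log line (hypothetical sample) with no predefined schema.
LOG_LINE = "2019-03-27 10:15:02 WARN disk usage at 91% on node-7"

# An assumed pattern; real logs vary and usually need several patterns.
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<message>.*)"
)


def parse_log_line(line: str):
    """Impose a structure on a log line; return None if the line does not fit."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "level": match["level"],
        "message": match["message"],
    }


if __name__ == "__main__":
    print(Order(order_id=1, placed_at=datetime(2019, 3, 27), customer="alice"))
    print(parse_log_line(LOG_LINE))
```

Any line that does not match the assumed pattern simply comes back as None, which is the kind of case a real pipeline would have to handle.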

Based on a report provided by Gartner, an international research and consulting organization, the application of advanced big data analytics is part of the Gartner Top 10 Strategic Technology Trends for 2019, and is expected to drive new business opportunities. The same report also predicts that more than 40% of data science tasks will be automated by 2020, which will likely require new big data tools and paradigms.

By 2017, global internet usage had reached 47% of the world’s population, based on an infographic provided by DOMO. This indicates that an increasing number of people are starting to use mobile phones and that more and more devices are being connected to each other via smart cities, wearable devices, Internet of Things (IoT), fog computing, and edge computing paradigms. As internet usage spikes and other technologies such as social media, IoT devices, mobile phones, and autonomous devices (e.g. robots, drones, vehicles, and appliances) continue to grow, our lives will become more connected than ever and generate unprecedented amounts of data, all of which will require new technologies for processing.

The Scale of Data Generated by Everyday Interactions

At a large scale, the data generated by everyday interactions is staggering. Based on research conducted by DOMO, for every minute in 2018, Google conducted 3,877,140 searches, YouTube users watched 4,333,560 videos, Twitter users sent 473,400 tweets, Instagram users posted 49,380 photos, Netflix users streamed 97,222 hours of video, and Amazon shipped 1,111 packages. This is just a small glimpse of a much larger picture involving other sources of big data. It seems like the internet is pretty busy, doesn’t it? Moreover, it is expected that mobile traffic will experience tremendous growth past its present numbers and that the world’s internet population is growing significantly year-over-year. By 2020, the report anticipates that 1.7 MB of data will be created per person per second. Big data is getting even bigger.

At a small scale, the data generated on a daily basis by a small business, a startup company, or a single sensor such as a surveillance camera is also huge. For example, a typical IP camera in a surveillance system at a shopping mall or a university campus generates 15 frames per second and requires roughly 100 GB of storage per day. Consider the storage amount and computing requirements if those camera numbers are scaled to tens or hundreds.
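
As a rough back-of-the-envelope sketch, the Python below scales the camera figures quoted above (15 frames per second, roughly 100 GB per day). The 30-day retention window is an assumption made for illustration, and decimal units are used (1 TB = 1,000 GB).

```python
# Figures from the text: one IP camera at 15 frames per second needs
# roughly 100 GB of storage per day.
GB_PER_CAMERA_PER_DAY = 100
FRAMES_PER_SECOND = 15
SECONDS_PER_DAY = 24 * 60 * 60


def storage_gb(cameras: int, days: int = 1) -> int:
    """Total storage in GB, assuming every camera matches the figure above."""
    return cameras * days * GB_PER_CAMERA_PER_DAY


def average_frame_kb() -> float:
    """Average size of one stored frame in kilobytes (1 GB = 10**6 KB)."""
    frames_per_day = FRAMES_PER_SECOND * SECONDS_PER_DAY
    return GB_PER_CAMERA_PER_DAY * 10**6 / frames_per_day


if __name__ == "__main__":
    for cameras in (1, 10, 100):
        per_day = storage_gb(cameras)
        # The 30-day retention period is an assumption, not a figure from the text.
        per_month_tb = storage_gb(cameras, days=30) / 1000
        print(f"{cameras:>3} cameras: {per_day:>6} GB/day, {per_month_tb:>5.0f} TB per 30 days")
    print(f"average frame size: about {average_frame_kb():.0f} KB")
```

Under these assumptions, a hundred cameras need about 10 TB per day, which is already beyond what a single commodity disk can absorb for long.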

Big Data in the Scientific Community

Scientific projects such as CERN, which conducts research on what the universe is made of, also generate massive amounts of data. The Large Hadron Collider (LHC) at CERN is the world’s largest and most powerful particle accelerator. It consists of a 27-kilometer ring of superconducting magnets along with some additional structures to accelerate and boost the energy of particles along the way.

During the spin, particles collide with LHC detectors roughly 1 billion times per second, which generates around 1 petabyte of raw digital “collision event” data per second. This unprecedented volume of data is a great challenge that cannot be resolved with CERN’s current infrastructure. To work around this, the generated raw data is filtered and only the “important” events are processed to reduce the volume of data. Consider the challenging processing requirements for this task.

The four big LHC experiments, named ALICE, ATLAS, CMS, and LHCb, are among the biggest generators of data at CERN, and the rate of the data processed and stored on servers by these experiments is expected to reach about 25 GB/s (gigabytes per second). As of June 29, 2017, the CERN Data Center announced that they had passed the 200 petabytes milestone of data archived permanently in their storage units.
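
To get a feel for how aggressive that filtering has to be, here is a small Python sketch that compares the two rates quoted above. It assumes decimal units (1 PB = 1,000,000 GB) and, for the daily figure, a detector that runs around the clock, which is a simplification.

```python
# Figures quoted in the text (decimal units: 1 PB = 10**6 GB).
RAW_RATE_GB_PER_S = 1_000_000  # about 1 petabyte of raw collision data per second
STORED_RATE_GB_PER_S = 25      # expected rate processed and stored by the four experiments
SECONDS_PER_DAY = 24 * 60 * 60


def reduction_factor() -> float:
    """How many times smaller the stored stream is than the raw stream."""
    return RAW_RATE_GB_PER_S / STORED_RATE_GB_PER_S


def stored_pb_per_day() -> float:
    """Petabytes stored per day, assuming the stored rate is sustained for 24 hours."""
    return STORED_RATE_GB_PER_S * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    print(f"reduction factor: {reduction_factor():,.0f}x")
    print(f"stored per day at 25 GB/s: {stored_pb_per_day():.2f} PB")
```

In other words, only about one part in 40,000 of the raw stream would survive the filter at these rates.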

Why Big Data Tools are Required

The scale of the data generated by well-known corporations, small-scale organizations, and scientific projects is growing at an unprecedented level. This can be clearly seen in the above scenarios, and the scale of this data is only getting bigger.

On the one hand, the mountain of data generated presents tremendous processing, storage, and analytics challenges that need to be carefully considered and handled. On the other hand, traditional Relational Database Management Systems (RDBMS) and data processing tools are not sufficient to manage this massive amount of data efficiently once its scale reaches terabytes or petabytes. Fortunately, big data tools and paradigms such as Hadoop and MapReduce are available to resolve these big data challenges.

Analyzing big data and gaining insights from it can help organizations make smart business decisions and improve their operations. This can be done by uncovering hidden patterns in the data and using them to reduce operational costs and increase profits. Because of this, big data analytics plays a crucial role for many domains such as healthcare, manufacturing, and banking by resolving data challenges and enabling them to move faster.

Big Data Analytics Tools

Since the compute, storage, and network requirements for working with large data sets are beyond the limits of a single computer, there is a need for paradigms and tools to crunch and process data through clusters of computers in a distributed fashion. More and more computing power and massive storage infrastructure are required for processing this massive data either on-premise or, more typically, at the data centers of cloud service providers.

In addition to the required infrastructure, various tools and components must be brought together to solve big data problems. The Hadoop ecosystem is just one of the platforms helping us work with massive amounts of data and discover useful patterns for businesses.

Below is a list of some of the tools available and a description of their roles in processing big data:

  • MapReduce: A distributed computing paradigm developed to process vast amounts of data in parallel by splitting a big task into smaller map and reduce oriented tasks.
  • HDFS: The Hadoop Distributed File System is a distributed storage and file system used by Hadoop applications.
  • YARN: The resource management and job scheduling component in the Hadoop ecosystem.
  • Spark: A real-time in-memory data processing framework.
  • PIG/HIVE: SQL-like scripting and querying tools for data processing and simplifying the complexity of MapReduce programs.
  • HBase, MongoDB, Elasticsearch: Examples of a few NoSQL databases.
  • Mahout, Spark ML: Tools for running scalable machine learning algorithms in a distributed fashion.
  • Flume, Sqoop, Logstash: Data integration and ingestion of structured and unstructured data.
  • Kibana: A tool to visualize Elasticsearch data.
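
To illustrate the MapReduce idea from the list above, here is a minimal single-process word count in Python. It is only a sketch of the paradigm: the function names are made up for this example, nothing here uses the Hadoop API, and in a real cluster the map and reduce tasks would run in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data tools", "big data is getting bigger"]))
```

Because each map call only sees its own document and each reduce call only sees one key, both phases can be spread across many workers without coordination, which is what makes the paradigm scale.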

Conclusion

To summarize, we are generating a massive amount of data in our everyday life, and that number is continuing to rise. Having the data alone does not improve an organization without analyzing and discovering its value for business intelligence. It is not possible to mine and process this mountain of data with traditional tools, so we use big data pipelines to help us ingest, process, analyze, and visualize these tremendous amounts of data.

Learn to deploy databases in production on Kubernetes

For more training in big data and database management, watch our free online training on successfully running a database in production on Kubernetes.

5 Reasons To Attend SUSECON 2019

Wednesday, 27 March, 2019

SUSECON 2019 is just days away but there’s still time to register.

Attendees will learn how they can use open software-defined infrastructure and application delivery solutions to reduce costs and complexity, anticipate and quickly leverage the latest advancements, and move the business forward while reducing unnecessary risk.

5 Reasons to register today:
  • Education – Learn the latest developments in enterprise-class Linux, OpenStack, Ceph storage, Kubernetes, Cloud Foundry, and more.
  • Training – 100+ hours of hands-on technology training from the people who are creating it, so you can gain new insights to solve your current business problems.
  • Live sessions – 150+ sessions to choose from, presented by SUSE Engineers, Product Managers, customers, partners, and community enthusiasts.
  • Certifications – Complimentary as part of your SUSECON certification.
  • Networking – SUSECON is a great networking opportunity where you can make tons of connections, old and new.

Whether you’ve already signed up or are still deciding, check out the agenda for this year’s event and start planning your time in Nashville, TN. See you there.

Digital Transformation is Hard, Let Global Services Make IT Easy!

Tuesday, 26 March, 2019

We’ve all heard the mantra by now: It’s time for every business to transform to become a digital business. You know this is the truth – your customers demand it in this “always-on, I want it now” world that we live in. And, if you are not even thinking about digital transformation, you’re being surpassed by your competitors.

Consider this, from a recent IDG paper:

  • 89% of organizations have adopted or have plans to adopt a digital strategy
  • More than 1/3 of businesses have started their transformation journey
  • Almost 50% of businesses are in the very early stages (gathering information or starting to formulate a plan)

If you fall into the last category, what is holding you back?

Maybe it’s the need to maintain your day-to-day operations AND the need to transform. And, you are expected to do this all with the SAME staff that is already overworked and overwhelmed. Maybe you don’t have the correct skill set. Or maybe you are facing culture issues. (Read Ryan’s blog on adopting a software-defined infrastructure.)

How Do I Start My Transformation?

Great question!  And I’ve got three answers!

  • Come to SUSECON ’19 next week in Nashville! There’s still time to register and if you don’t attend you’ll have to wait a full year for SUSECON ’20!   We’re rolling out the green carpet for you with:
    • 150+ sessions
    • 100+ hours of hands-on technology
    • 35+ partners in the technology showcase
    • 20+ expert led demo stations
    • 10+ complimentary certification exams
  • Stop by the Services Kiosk. Talk to one of the many SUSE technical and product experts that will be staffing the kiosk.  While you’re there, enter to win one of the daily drawings for up to one full week of complimentary consulting. Use those consulting hours to solve some of your biggest headaches… from defining your business outcomes to integration with your legacy solutions to migration issues and knowledge transfer.
  • Check out our SUSE Global Services offerings. Engage with the trusted partner that built your solutions.  From discovery and design workshops to premium support services offering and more, SUSE Global Services is here and dedicated to your business success. We’ve got the answers you’re looking for.

 

SUSECON:  The Place to Be!

Yes, transformation is hard.  But it doesn’t have to be.  Come to SUSECON ’19 and learn how SUSE, the open, open source company is all about helping you to build “Your Kind of Open.”

Container Segmentation Strategies and Patterns

Tuesday, 12 March, 2019

At a recent container security conference the topic of ‘container segmentation patterns’ came up, and it became clear that many security architects are wrestling with how to best segment workload communication in the dynamic environment of containers. The question was also raised “Is the DMZ dead?”

The concept of network segmentation has been around for a while and is considered a best practice to achieve ‘defense in depth’ for business critical applications. Proper segmentation can protect applications from hackers as well as limit the ‘blast radius’ in the case of a breach. So it makes sense that devops and security professionals would wonder if container segmentation would provide similar protections for a container network, and how this could be possible with network plug-ins, CNIs, and SDNs.

Old Ways of Segmentation – Patterns

Before we get into how container segmentation works let’s review some of the common traditional patterns for achieving network segmentation:

  • The DMZ. All external access including internet application front-ends such as web servers are placed in the DMZ, which uses perimeter firewalls to restrict inbound and outbound traffic.
  • Physical Network Segments.  Even behind the DMZ, different network segments are used for applications with different trust levels and to further segment communication to sensitive data such as databases. Perimeter firewalls are used to contain traffic in each segment.
  • Data Center Segmentation. In extreme cases, segmentation is achieved by placing applications and infrastructure in separate data centers, each with its own security protections.
  • VPC and Security Groups. More recently, in public cloud services VPCs and security groups are used to segment traffic with network segmentation policies and ingress/egress firewall rules easily applied to different VPCs. However, this is still a tedious manual configuration of L3/L4 policies that can’t protect containers.
  • Separate Data and Control Networks. Less common but with the same goal of separating traffic to control attacks, control plane and monitoring traffic is segmented on each server to separate them from data transmissions, minimizing the possibility of data breaches from monitoring and system tools.

The patterns above all attempt to segment network communications according to varying trust levels of the applications running in each segment. One commonality between them is the use of physical network controls and traditional firewalls to separate traffic. Even VPCs are based on traditional notions of a physical network segment. As we’ll discover later, in a truly cloud native environment, these segmentation techniques become increasingly ineffective as workloads become dynamically deployed across traditional network boundaries.

The DMZ is Dead, Or Is It?

The general consensus around the room at this gathering of container security and operations people was that ‘the DMZ is dead.’ In this world of overlay networks, Kubernetes and public cloud providers, the old way of thinking of a DMZ to segment all internet facing applications is no longer relevant. In reality, DMZs will still exist, but they will be almost invisible, or irrelevant, to the security discussion, because they are not the primary way to protect access to applications and databases.

Given this realization, how do security architects provide network visibility and protection in an environment where external access frequently comes through an ingress into a container cluster directly from the internet? In addition, ingress/egress connections to container based api services must be allowed in what was traditionally considered an ‘east-west’ flow of traffic. It seems in cloud environments that the definitions of north-south and east-west traffic are becoming blurred.

What is Segmentation in a Cloud-native, Container-based World?

Container segmentation is the practice of segmenting container communications so only authorized connections between containers are allowed. In practice, because containers are typically created from a service concept by orchestration tools such as Kubernetes, container segmentation can be enforced at the service level. Multiple containers scaling up from the same image/service should not require different network segmentation policies in most cases.

For example, these are layer 7 segmentation rules from the NeuVector console.

These whitelist rules allow connections from one Kubernetes service (all pods) to another and require a certain protocol to be used. For example, rule 10002 requires the redis application protocol between the nodejs demo pods and redis demo pods.
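
The idea of a service-level whitelist can be sketched in a few lines of Python. This is not NeuVector’s implementation; the Rule and Connection types and the service names nodejs-demo and redis-demo are hypothetical, modeled loosely on rule 10002 described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    rule_id: int
    source: str       # source service name
    destination: str  # destination service name
    protocol: str     # application (Layer 7) protocol that must be used


@dataclass(frozen=True)
class Connection:
    source_pod: str
    source_service: str
    destination_service: str
    protocol: str


# Hypothetical rule modeled on "rule 10002" in the text.
RULES = [Rule(10002, "nodejs-demo", "redis-demo", "redis")]


def matching_rule(conn: Connection, rules: List[Rule] = RULES) -> Optional[Rule]:
    """Return the first whitelist rule that covers this connection, if any."""
    for rule in rules:
        if (rule.source, rule.destination, rule.protocol) == (
            conn.source_service,
            conn.destination_service,
            conn.protocol,
        ):
            return rule
    return None


def is_allowed(conn: Connection, rules: List[Rule] = RULES) -> bool:
    """Whitelist semantics: anything without a matching rule is denied."""
    return matching_rule(conn, rules) is not None


if __name__ == "__main__":
    for pod in ("nodejs-demo-1", "nodejs-demo-2"):
        conn = Connection(pod, "nodejs-demo", "redis-demo", "redis")
        print(pod, "allowed" if is_allowed(conn) else "denied")
```

Note that the rule is keyed on the service, not the pod, so scaling the nodejs service from one pod to many does not require any new rules.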

Container segmentation is often called micro-segmentation or nano-segmentation because containers are often deployed as microservices which can be dynamically deployed and scaled across a Kubernetes cluster. Because different services can be deployed across a shared network and servers (or VMs, hosts), and each workload or pod has its own network addressable IP address, container segmentation policies can be difficult to create and enforce.

However, without the ability to segment container connections and enforce network restrictions the blast radius of an attack can be the entire cluster, or worse yet, the entire container deployment across clouds.

What’s needed is more of a virtualized network segmentation capability that is aligned more tightly with how cloud-native container services are deployed, as shown below.

Container segmentation can provide the required protection regardless of where the workload is deployed and give confidence to the security and devops teams that unauthorized connections between segments can be prevented, or at least detected and alerted.

What About Namespaces? Network Policy?

Going back to the container security gathering of experts, the general agreement was that namespaces can NOT be trusted to enforce container segmentation policies. While namespaces do provide some level of segmentation between containerized services, security teams should not rely on them for defense in depth. The built-in Network Policy features of Kubernetes were also deemed impractical for most business critical deployments. These opinions were based on a number of cited reasons:

  1. Recent demonstrations of breaching namespace boundaries.
  2. Cumbersome granularity of segmentation policies.
  3. Lack of policy management framework.
  4. Lack of visibility and monitoring.
  5. Inability to detect network attacks within trusted connections.

Namespaces were found to be useful for organizing services and to ease the management of such services where each service in a namespace has some attribute in common with others to make them manageable as a group. But, don’t use namespaces for your container segmentation strategy.

Will a Service Mesh Do Segmentation?

The excitement about service mesh technologies like Istio and Linkerd2 is driven by the promise of an application discovery and routing layer for containers which has some security features built-in. But there is a difference between a Layer 7 load balancer with security features and a true security product like a Layer 7 container firewall. Security features in service meshes include the ability to do authentication, authorization, and encryption of connections. By authorizing connections between containers based on defined policies, a service mesh has the ability to do segmentation for certain HTTP protocols.

Limitations to keep in mind about using a service mesh for segmentation include:

  • Does not support all HTTP protocols, nor ICMP or UDP. If you have applications requiring other protocol support you will need a multi-protocol container firewall.
  • Has no visibility into policy violations. If a connection attempt is blocked, is there logging of the event which makes it easy to drill down into the source and destination service names and IP addresses, network payload used, and other forensic details?
  • Can’t detect embedded network based attacks within trusted connections. A container firewall should be able to inspect network payloads for embedded application attacks such as SQL injection, even in authorized trusted connections.
  • Can’t perform DLP functions. Can’t inspect connections for sensitive data such as credit cards and PII.
  • Does not provide other alerting and network forensic capabilities. There are many required features in a true container security product such as alerting, response rules, packet captures and enterprise integration hooks.
  • Lacks management and automation.  Authorization policies must be defined manually. Although these can be automated during deployment, the creation and management of these policies is difficult to centrally administer and review.

It is therefore important to consider the protocol requirements for containerized applications, both today and for the next few years, along with the level of security your applications require. Modern container security tools should be integrated with service mesh technology to provide defense in depth and enhance those built-in security features.

Layer 7 Container Segmentation

Segmentation of network traffic can be done at Layer 3/4 based on IP addresses and ports, but in cloud-native environments this is best done at Layer 7 to detect and verify the application protocol used. This provides better scalability, manageability, and flexibility for deployments to change without needing to change security rules. An added benefit of Layer 7 deep packet inspection is the ability for the container firewall to inspect network traffic for hidden, or embedded attacks, even within trusted connections between workloads.

Multi-protocol Layer 7 segmentation provides detection and enforcement of connections across multiple application protocols and should also support non-HTTP protocols such as ICMP and UDP.
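
The difference between a Layer 3/4 check and a Layer 7 check can be shown with a toy Python example. This is only a sketch: real deep packet inspection is far more involved, and the detect_protocol heuristic below (HTTP request methods, and the RESP array framing that Redis clients use for commands) is an illustrative assumption, not how any particular firewall works.

```python
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"PATCH ", b"OPTIONS ")


def detect_protocol(payload: bytes) -> str:
    """Guess the application protocol from the first bytes of a payload."""
    if payload.startswith(HTTP_METHODS):
        return "http"
    # Redis clients send commands as RESP arrays: "*<count>\r\n$<len>\r\n..."
    if payload.startswith(b"*") and b"\r\n" in payload:
        header = payload[1:payload.index(b"\r\n")]
        if header.isdigit():
            return "redis"
    return "unknown"


def layer34_allows(dst_port: int, allowed_port: int = 6379) -> bool:
    """Layer 3/4 style check: only the destination port is examined."""
    return dst_port == allowed_port


def layer7_allows(payload: bytes, required_protocol: str = "redis") -> bool:
    """Layer 7 style check: the payload must actually speak the required protocol."""
    return detect_protocol(payload) == required_protocol


if __name__ == "__main__":
    redis_ping = b"*1\r\n$4\r\nPING\r\n"
    http_on_redis_port = b"GET /admin HTTP/1.1\r\nHost: example\r\n\r\n"
    for payload in (redis_ping, http_on_redis_port):
        print(detect_protocol(payload), layer34_allows(6379), layer7_allows(payload))
```

Both payloads pass the port-only check on 6379, the default Redis port, but only the first one passes the protocol check. That gap is what the Layer 7 approach is meant to close.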

True Workload Segmentation Across Clusters, Clouds

With a cloud-native, Layer 7 container segmentation solution, workloads can be segmented even if they are running on the same host, network, or cluster. The ability to mix workloads of different required trust levels on the same infrastructure provides the ultimate flexibility for architects and devops teams to maximize performance, resource utilization, and speed up the pipeline. It also limits the blast radius: if one set of services is hacked, the attack is prevented from spreading laterally onto other workloads, even those running on the same host.

Container Segmentation Patterns

Containers and orchestration tools like Kubernetes are relatively new, so there will be many experiments using combinations of old and new technologies to achieve container segmentation. Use of traditional segmentation patterns based on physical networks as described above may provide temporary protections for containers while sacrificing many of the main benefits such as scalability and resource optimization. Here are a few example patterns, some a mix of old and new, and some which can only be achieved with cloud-native container firewalls.

  • Separate Clusters. It is probably most common to see multiple clusters being deployed. This is due to different reasons, with security focused network segmentation being only one of them.
    • Security focused. Application workloads with different security protection levels can be separated by Kubernetes clusters. This makes isolating traffic easier by using traditional firewalls or VPCs to prevent cross-cluster communication. If connections between clusters are required then it can be manually allowed but management can become cumbersome and error prone.
      • For example, one cluster runs the application workloads and a separate one runs databases, file storage (such as S3/minio), and other persistent storage for the same project, because different security profiles are required for each cluster.
    • Cluster Manageability. More often, separate clusters are deployed primarily for manageability reasons, with security being a secondary consideration. Separation can be based on:
      • Application Characteristics. For example, separate stateless and stateful clusters where management of services and workloads follows different processes. In the example above of separating application workloads from databases, part or all of the reason may be that each cluster requires different workload management approaches for rolling updates, backups, persistent data, etc.
      • Platform Management. Separating update and maintenance of the orchestration platforms and tools. For example, updating the Kubernetes version with all system containers and integrations may require a different process depending on the application workload requirements in the cluster.
      • Organizational. Separate clusters for divisions, departments, development teams or other reasons tied to how teams are organized.
      • Other deployment patterns we’ve seen could be based on availability of cloud resources in specific regions for public cloud providers, for example applications requiring GPU instances.
  • Container Zones. Many companies think of clusters as zones, with each zone representing a collection of related services and/or services with similar security requirements. Although typically one cluster is deployed per zone, a container cluster could span multiple zones. The segmentation policies are based on the connection requirements in each zone, but typically focus on ingress and egress policies between zones and to the internet.
  • One Large Cluster. Multiple application stacks, services, and workloads can be dynamically deployed in a large shared cluster. While this may present the manageability issues described above, it may simplify maintenance of the orchestration platform and optimize resource utilization. Security issues, especially network segmentation policies for each service running in the cluster, must be carefully managed and monitored to protect against lateral movement of attacks between workloads.
  • Cross cluster routing with Service Mesh. Cross cluster connections are made more dynamic with service mesh technologies like Istio. While cross cluster routing can be secured to some extent by using the authorization features of the service mesh, business critical applications will need true Layer 7 container firewalling described above to protect against embedded attacks, detect multiple protocols, and make container segmentation policies manageable and scalable.

 

Segmentation for Compliance – PCI-DSS

One segmentation pattern of particular interest is for PCI-DSS compliance. Sections 1.2 and 1.3 of PCI-DSS require in-scope CDE traffic to be firewalled and segmented from all other connections. Traditionally, this was accomplished by using separate networks separated by traditional firewalls.

While it is certainly possible to repeat this pattern for cloud native applications, doing so will ultimately add more friction to modern CI/CD and deployment pipelines, increase costs, and reduce resource utilization because of the separate clusters. This means that the potential benefits of cloud-native applications cannot be fully realized.

The better solution is to achieve network segmentation automatically between CDE and non-CDE workloads, even if they are running on the same host, network, cluster, or cloud, as shown below.

In the diagram above, the nodes are containers (not hosts) which can run dynamically across any host within the cluster. They can be segmented virtually by service names, labels, application protocols or other application metadata.

NeuVector Container Segmentation

NeuVector provides a true cloud-native Layer 7 container firewall which does network segmentation automatically. By using behavioral learning, connections and the application protocols used between services are discovered and whitelist rules to isolate them are automatically created. This means that container segmentation is easy and automated, without requiring knowledge of connections beforehand or the manual creation and maintenance of segmentation rules.

In the screenshot below, NeuVector provides a virtual view of container segmentation rules, violations, attacks and vulnerabilities regardless of the physical hosts in use. This also shows service mesh enabled pods where an Istio sidecar container is used for encryption.

For more advanced users, NeuVector supports a declarative security policy where application level (e.g. Layer 7) policies can be specified during the CI/CD process by devops teams in order to fully automate new releases or updates of application services. For example, the following shows how DevOps can declare the security rules in NeuVector in a YAML file as part of the application deployment process.

apiVersion: v1
prefix: new-app
suffix: auto
groups:
  redis:
    selectors:
      - app=redis-pod
  nodejsapp:
    selectors:
      - app=node-pod
    rules:
      rule01:
        applications:
          - Redis
        toTarget: redis
        action: allow

The example above creates the simple whitelist rule to allow the nodejs pods to connect to the redis pods only using the redis application protocol. The simplicity of such a layer 7 rule makes it scalable, flexible, and easy to manage. It also supports the ‘shift-left’ movement to push security further into the DevOps part of the pipeline, supporting faster deployments with automation.
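
To show how such a declarative rule could be evaluated, here is a minimal Python sketch. It is not NeuVector’s evaluation logic. The POLICY dictionary mirrors the groups from the YAML example by hand so the sketch needs no YAML library, and the meaning given to the fields (selectors as label matches, toTarget as the destination group, deny by default) is an assumption inferred from the description above.

```python
from typing import Dict, Optional

# The groups from the YAML example, written out as a Python dict.
POLICY = {
    "groups": {
        "redis": {"selectors": ["app=redis-pod"]},
        "nodejsapp": {
            "selectors": ["app=node-pod"],
            "rules": {
                "rule01": {"applications": ["Redis"], "toTarget": "redis", "action": "allow"},
            },
        },
    },
}


def group_of(labels: Dict[str, str], policy: dict = POLICY) -> Optional[str]:
    """Return the first group whose selectors all match the pod's labels."""
    for name, group in policy["groups"].items():
        wanted = dict(selector.split("=", 1) for selector in group["selectors"])
        if all(labels.get(key) == value for key, value in wanted.items()):
            return name
    return None


def decide(src_labels: Dict[str, str], dst_labels: Dict[str, str],
           application: str, policy: dict = POLICY) -> str:
    """Return the action of the first matching rule; deny by default (assumed)."""
    src = group_of(src_labels, policy)
    dst = group_of(dst_labels, policy)
    if src is None or dst is None:
        return "deny"
    for rule in policy["groups"][src].get("rules", {}).values():
        if rule["toTarget"] == dst and application in rule["applications"]:
            return rule["action"]
    return "deny"


if __name__ == "__main__":
    node, redis = {"app": "node-pod"}, {"app": "redis-pod"}
    print(decide(node, redis, "Redis"))
    print(decide(node, redis, "HTTP"))
    print(decide(redis, node, "Redis"))
```

Because the rule is expressed in terms of labels and the application protocol, it keeps working as pods are rescheduled or scaled, with no IP addresses to update.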

All segmentation policies are centrally viewed, managed, and monitored so that conflicting rules are not created and connections do not start failing due to a forgotten deployment manifest.

Beyond container segmentation, NeuVector provides a complete Kubernetes security platform to secure the CI/CD pipeline from build to ship to run. Image vulnerability scanning starts during the build process, and images continue to be monitored for new vulnerabilities once they’re deployed.

The run-time container security is provided by the Layer 7 container firewall together with container process and file system security, as well as host security. The container firewall detects threats such as SQL injection, DDoS, DNS attacks, and other application layer attacks by inspecting the payload even for trusted connections. It is integrated with new service mesh technologies to provide threat detection and segmentation even if the connection between two pods is encrypted.

In this way, NeuVector can provide multi-vector threat protection with the combination of network security, application security, endpoint security, and host security.

The Ultimate Cloud Security Pattern – Container Segmentation by Workload

Ultimately, to give the business the most flexibility for rapid release and optimal resource utilization, container segmentation must be enforced on each pod and follow application workloads as they scale and move dynamically. In this micro-perimeter vision article, NeuVector CTO Gary Duan outlines a vision for cloud security where the protection perimeter surrounds the workload even as it moves across hybrid clouds.

 

Considerations When Designing Distributed Systems

Monday, 11 March, 2019

Introduction

Today’s applications are marvels of distributed systems development. Each function or service that makes up
an application may be executing on a different system, based upon a different system architecture, that is
housed in a different geographical location, and written in a different computer language. Components of
today’s applications might be hosted on a powerful system carried in the owner’s pocket and communicating
with application components or services that are replicated in data centers all over the world.

What’s amazing about this is that individuals using these applications typically are not aware of the
complex environment that responds to their request for the local time, local weather, or for directions to
their hotel.

Let’s pull back the curtain and look at the industrial sorcery that makes this all possible and contemplate
the thoughts and guidelines developers should keep in mind when working with this complexity.

The Evolution of System Design

Figure 1: Evolution of system design over time

Source: Interaction Design Foundation, The Social Design of Technical Systems: Building technologies for
communities

Application development has come a long way from the time that programmers wrote out applications, hand
compiled them into the language of the machine they were using, and then entered individual machine
instructions and data directly into the computer’s memory using toggle switches.

As processors became more and more powerful, system memory and online storage capacity increased, and
computer networking capability dramatically increased, approaches to development also changed. Data can now
be transmitted from one side of the planet to the other faster than it used to be possible for early
machines to move data from system memory into the processor itself!

Let’s look at a few highlights of this amazing transformation.

Monolithic Design

Early computer programs were based upon a monolithic design in which all of the application components were
architected to execute on a single machine. This meant that functions such as the user interface (if users
were actually able to interact with the program), application rules processing, data management, storage
management, and network management (if the computer was connected to a computer network) were all contained
within the program.

While simpler to write, these programs became increasingly complex, difficult to document, and hard to update
or change. At this time, the machines themselves represented the biggest cost to the enterprise and so
applications were designed to make the best possible use of the machines.

Client/Server Architecture

As processors became more powerful, system and online storage capacity increased, and data communications
became faster and more cost-efficient, application design evolved to keep pace. Application logic was
refactored or decomposed into components that could each execute on a different machine, with the
ever-improving network connecting them. This allowed some functions to migrate to the lowest cost computing
environment available at the time. The evolution flowed through the following stages:

Terminals and Terminal Emulation

Early distributed computing relied on special-purpose user access devices called terminals. Applications had
to understand the communications protocols they used and issue commands directly to the devices. When
inexpensive personal computing (PC) devices emerged, the terminals were replaced by PCs running a terminal
emulation program.

At this point, all of the components of the application were still hosted on a single mainframe or
minicomputer.

Light Client

As PCs became more powerful, supported larger internal and online storage, and network performance increased,
enterprises segmented or factored their applications so that the user interface was extracted and executed
on a local PC. The rest of the application continued to execute on a system in the data center.

Often these PCs were less costly than the terminals that they replaced. They also offered additional
benefits. These PCs were multi-functional devices. They could run office productivity applications that
weren’t available on the terminals they replaced. This combination drove enterprises to move to
client/server application architectures when they updated or refreshed their applications.

Midrange Client

PC evolution continued at a rapid pace. Once more powerful systems with larger storage capacities were
available, enterprises took advantage of them by moving even more processing away from the expensive systems
in the data center out to the inexpensive systems on users’ desks. At this point, the user interface and
some of the computing tasks were migrated to the local PC.

This allowed the mainframes and minicomputers (now called servers) to have a longer useful life, thus
lowering the overall cost of computing for the enterprise.

Heavy Client

As PCs became more and more powerful, more application functions were migrated from the backend servers. At
this point, everything but data and storage management functions had been migrated.

Enter the Internet and the World Wide Web

The public internet and the World Wide Web emerged at this time. Client/server computing continued to be
used. In an attempt to lower overall costs, some enterprises began to re-architect their distributed
applications so they could use standard internet protocols to communicate and substituted a web browser for
the custom user interface function. Later, some of the application functions were rewritten in JavaScript so
that they could execute locally on the client’s computer.

Server Improvements

Industry innovation wasn’t focused solely on the user side of the communications link. A great deal of
improvement was made to the servers as well. Enterprises began to harness together the power of many
smaller, less expensive industry standard servers to support some or all of their mainframe-based functions.
This allowed them to reduce the number of expensive mainframe systems they deployed.

Soon, remote PCs were communicating with a number of servers, each supporting their own component of the
application. Special-purpose database and file servers were adopted into the environment. Later, other
application functions were migrated into application servers.

Networking was another area of intense industry focus. Enterprises began using special-purpose networking
servers that provided firewalls and other security functions, file caching functions to accelerate data
access for their applications, email servers, web servers, web application servers, and distributed name
servers that kept track of and controlled user credentials for data and application access. The list of
networking services that have been encapsulated in an appliance server grows all the time.

Object-Oriented Development

The rapid change in PC and server capabilities combined with the dramatic price reduction for processing
power, memory and networking had a significant impact on application development. No longer were hardware
and software the biggest IT costs. The largest costs were communications, IT services (the staff), power,
and cooling.

Software development, maintenance, and IT operations took on a new importance and the development process was
changed to reflect the new reality that systems were cheap and people, communications, and power were
increasingly expensive.

Figure 2: Worldwide IT spending forecast

Source: Gartner Worldwide IT Spending Forecast, Q1 2018

Enterprises looked to improved data and application architectures as a way to make the best use of their
staff. Object-oriented applications and development approaches were the result. Many programming languages
such as the following supported this approach:

  • C++
  • C#
  • COBOL
  • Java
  • PHP
  • Python
  • Ruby

Application developers were forced to adapt by becoming more systematic when defining and documenting data
structures. This approach also made maintaining and enhancing applications easier.

Open-Source Software

Opensource.com offers the following definition for open-source
software: “Open source software is software with source code that anyone can inspect, modify, and enhance.”
It goes on to say that, “some software has source code that only the person, team, or organization who
created it — and maintains exclusive control over it — can modify. People call this kind of software
‘proprietary’ or ‘closed source’ software.”

Only the original authors of proprietary software can legally copy, inspect, and alter that software. And in
order to use proprietary software, computer users must agree (often by accepting a license displayed the
first time they run this software) that they will not do anything with the software that the software’s
authors have not expressly permitted. Microsoft Office and Adobe Photoshop are examples of proprietary
software.

Although open-source software has been around since the very early days of computing, it came to the
forefront in the 1990s when complete open-source operating systems, virtualization technology, development
tools, database engines, and other important functions became available. Open-source technology is often a
critical component of web-based and distributed computing. Among others, the open-source offerings in the
following categories are popular today:

  • Development tools
  • Application support
  • Databases (flat file, SQL, No-SQL, and in-memory)
  • Distributed file systems
  • Message passing/queueing
  • Operating systems
  • Clustering

Distributed Computing

The combination of powerful systems, fast networks, and the availability of sophisticated software has driven
major application development away from monolithic towards more highly distributed approaches. Enterprises
have learned, however, that sometimes it is better to start over than to try to refactor or decompose an
older application.

When enterprises undertake the effort to create distributed applications, they often discover a few pleasant
side effects. A properly designed application, that has been decomposed into separate functions or services,
can be developed by separate teams in parallel.

Rapid application development and deployment, also known as DevOps, emerged as a way to take advantage of the
new environment.

Service-Oriented Architectures

As the industry evolved beyond client/server computing models to an even more distributed approach, the
phrase “service-oriented architecture” emerged. This approach was built on distributed systems concepts,
standards in message queuing and delivery, and XML messaging as a standard approach to sharing data and data
definitions.

Individual application functions are repackaged as network-oriented services. Each service receives a message
requesting that it perform a specific task, performs that task, and then sends the response back to the
function that made the request.

This approach offers another benefit, the ability for a given service to be hosted in multiple places around
the network. This offers both improved overall performance and improved reliability.

Workload management tools were developed that receive requests for a service, review the available capacity,
forward the request to the service with the most available capacity, and then send the response back to the
requester. If a specific service doesn’t respond in a timely fashion, the workload manager simply forwards
the request to another instance of the service. It also marks the service that didn’t respond as failed
and doesn’t send additional requests to it until it receives a message indicating that the service is alive
and healthy.
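
The dispatch-and-failover behavior described above can be sketched in a few lines of Python. This is a minimal,
in-process illustration rather than a real workload manager: the ServiceInstance and WorkloadManager classes,
the capacity numbers, and the heartbeat method are all hypothetical, and a timeout is simulated with a flag
instead of a network call.

```python
class ServiceInstance:
    """Hypothetical stand-in for one hosted copy of a service."""

    def __init__(self, name, capacity, responds=True):
        self.name = name
        self.capacity = capacity      # available capacity, arbitrary units
        self.responds = responds      # False simulates a timeout
        self.healthy = True

    def handle(self, request):
        if not self.responds:
            raise TimeoutError(f"{self.name} did not respond")
        return f"{self.name} handled {request}"


class WorkloadManager:
    def __init__(self, instances):
        self.instances = list(instances)

    def dispatch(self, request):
        """Try healthy instances in order of most available capacity."""
        candidates = sorted(
            (i for i in self.instances if i.healthy),
            key=lambda i: i.capacity,
            reverse=True,
        )
        for instance in candidates:
            try:
                return instance.handle(request)
            except TimeoutError:
                # Mark the instance as failed and fall through to the next one.
                instance.healthy = False
        raise RuntimeError("no healthy instance available")

    def heartbeat(self, name):
        """A health message puts a failed instance back into rotation."""
        for instance in self.instances:
            if instance.name == name:
                instance.healthy = True


svc_a = ServiceInstance("svc-a", capacity=10, responds=False)
svc_b = ServiceInstance("svc-b", capacity=5)
manager = WorkloadManager([svc_a, svc_b])
result = manager.dispatch("get-weather")
print(result)
```

Here svc-a has the most capacity but times out, so the request is forwarded to svc-b and svc-a receives no
further requests until heartbeat("svc-a") is called.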

What Are the Considerations for Distributed Systems?

Now that we’ve walked through over 50 years of computing history, let’s consider some rules of thumb for
developers of distributed systems. There’s a lot to think about because a distributed solution is likely to
have components or services executing in many places, on different types of systems, and messages must be
passed back and forth to perform work. Care and consideration are absolute requirements to be successful
creating these solutions. Expertise must also be available for each type of host system, development tool,
and messaging system in use.

Nailing Down What Needs to Be Done

One of the first things to consider is what needs to be accomplished! While this sounds simple, it’s
incredibly important.

It’s amazing how many developers start building things before they know, in detail, what is needed. Often,
this means that they build unnecessary functions and waste their time. To quote Yogi Berra, “if you don’t
know where you are going, you’ll end up someplace else.”

A good place to start is knowing what needs to be done, what tools and services are already available, and
what people using the final solution should see.

Interactive Versus Batch

Since fast responses and low latency are often requirements, it would be wise to consider what should be done
while the user is waiting and what can be put into a batch process that executes on an event-driven or
time-driven schedule.

After the initial segmentation of functions has been considered, it is wise to plan when background, batch
processes need to execute, what data these functions manipulate, and how to make sure these functions are
reliable, are available when needed, and do not lose data.
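
As a rough illustration of this split, the Python sketch below answers the user immediately and defers slower
work to a queue that a batch job drains later. The place_order and run_batch functions and the task names are
hypothetical, and the in-memory deque is an assumption made for brevity; a real system would use a durable
queue so that deferred work survives a restart.

```python
from collections import deque

# Hypothetical in-process queue; a production system would persist this.
batch_queue = deque()


def place_order(order_id, items):
    """Interactive path: do the minimum while the user waits."""
    confirmation = {"order_id": order_id, "status": "accepted", "items": len(items)}
    # Defer slow work (receipts, analytics) to the batch path.
    batch_queue.append(("send_receipt", order_id))
    batch_queue.append(("update_analytics", order_id))
    return confirmation


def run_batch():
    """Batch path: drain deferred work on a schedule or in response to an event."""
    completed = []
    while batch_queue:
        task, order_id = batch_queue.popleft()
        completed.append(f"{task}:{order_id}")
    return completed


print(place_order(42, ["book", "pen"]))
print(run_batch())
```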

Where Should Functions Be Hosted?

Only after the “what” has been planned in fine detail should the “where” and “how” be considered. Developers
have their favorite tools and approaches and often will invoke them even if they might not be the best
choice. As Bernard Baruch was reported to say, “if all you have is a hammer, everything looks like a nail.”

It is also important to be aware of corporate standards for enterprise development. It isn’t wise to select a
tool simply because it is popular at the moment. That tool just might do the job, but remember that
everything that is built must be maintained. If you build something that only you can understand or
maintain, you may just have tied yourself to that function for the rest of your career. I have personally
created functions that worked properly and were small and reliable. I received telephone calls regarding
these for ten years after I left that company because later developers could not understand how the
functions were implemented. The documentation I wrote had been lost long earlier.

Each function or service should be considered separately in a distributed solution. Should the function be
executed in an enterprise data center, in the data center of a cloud services provider or, perhaps, in both?
Consider that there are regulatory requirements in some industries that direct the selection of where and
how data must be maintained and stored.

Other considerations include:

  • What type of system should host the function? Is one system architecture better for that
    function? Should the system be based upon ARM, x86, SPARC, Precision, Power, or even be a mainframe?
  • Does a specific operating system provide a better computing environment for this function? Would Linux,
    Windows, UNIX, System i, or even System z be a better platform?
  • Is a specific development language better for that function? Is a specific type of data management tool
    better? Is a flat file, SQL database, No-SQL database, or a non-structured storage mechanism the best fit?
  • Should the function be hosted in a virtual machine or a container to facilitate function mobility,
    automation and orchestration?

Virtual machines executing Windows or Linux were frequently the choice in the early 2000s. While they offered
significant isolation for functions and made it easy to restart or move them when necessary, their
processing, memory, and storage requirements were rather high. Containers, another approach to processing
virtualization, are the emerging choice today because they offer similar levels of isolation and the same
ability to restart and migrate functions while consuming far less processing power, memory, and storage.

Performance

Performance is another critical consideration. While defining the functions or services that make up a
solution, developers should note which of them have significant processing, memory, or storage
requirements. It might be wise to look at these functions closely to learn whether they can be further
subdivided or decomposed.

Further segmentation would allow an increase in parallelization, which could offer performance
improvements. The trade-off, of course, is that this approach also increases complexity and, potentially,
makes the functions harder to manage and to secure.
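
The sketch below shows one way such a decomposition might look in Python: a single large computation is split
into independent chunks that are processed concurrently and then combined. The summarize and summarize_chunk
functions are hypothetical, and the thread pool is chosen only for brevity. In CPython, threads mainly help
I/O-bound work, so CPU-bound functions would more likely be spread across processes or separate services.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_chunk(chunk):
    """Hypothetical sub-function: works on one independent slice of the data."""
    return sum(chunk)


def summarize(data, chunk_size=1000, workers=4):
    """Split the work into chunks, process them concurrently, then combine."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(summarize_chunk, chunks))
    return sum(partial_sums)


total = summarize(list(range(10_000)))
print(total)
```

Note the added moving parts even in this small example: a chunking step, a pool to size and manage, and a
combining step, each of which is a new place for things to go wrong.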

Reliability

In high stakes enterprise environments, solution reliability is essential. The developer must consider when
it is acceptable to force people to re-enter data, re-run a function, or when a function can be unavailable.

Database developers ran into this issue in the 1960s and developed the concept of an atomic function. That
is, the function must complete or the partial updates must be rolled back leaving the data in the state it
was in before the function began. This same mindset must be applied to distributed systems to ensure that
data integrity is maintained even in the event of service failures and transaction disruptions.
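
A minimal sketch of an atomic function, using Python's built-in sqlite3 module, is shown below. The accounts
table, the names, and the transfer function are hypothetical. The example relies on the documented behavior
that using a sqlite3 connection as a context manager commits when the block succeeds and rolls back when an
exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer fails after the credit to bob has already been applied inside the transaction, yet the
balances are unchanged afterwards because the partial update is rolled back.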

Functions must be designed to totally complete or roll back intermediate updates. In critical message passing
systems, messages must be stored until an acknowledgement that a message has been received comes in. If such
an acknowledgement isn’t received, the original message must be resent and a failure must be reported to the
management system.
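
The store-until-acknowledged pattern can be sketched as follows. FlakyChannel is a hypothetical transport that
drops a configurable number of delivery attempts, and send_reliably is an illustrative helper, not part of any
particular messaging product. The report_failure callback stands in for the management system.

```python
class FlakyChannel:
    """Hypothetical transport that drops the first `drops` delivery attempts."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def send(self, message):
        if self.drops > 0:
            self.drops -= 1
            return False          # no acknowledgement came back
        self.delivered.append(message)
        return True               # receiver acknowledged


def send_reliably(channel, message, max_attempts=3, report_failure=print):
    """Keep the message until it is acknowledged; resend and report otherwise."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
        report_failure(f"attempt {attempt} for {message!r} was not acknowledged")
    raise RuntimeError(f"{message!r} undelivered after {max_attempts} attempts")


channel = FlakyChannel(drops=2)
attempts_used = send_reliably(channel, "order-42")
print(attempts_used, channel.delivered)
```

One design consequence worth noting: if an acknowledgement is lost rather than the message itself, the
receiver sees the same message twice, so receivers in such a scheme generally need to tolerate duplicates.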

Manageability

Although not as much fun to consider as the core application functionality, manageability is a key factor in
the ongoing success of the application. Distributed systems, after all, are inherently more complex and are
constructed of many more moving parts than the monolithic systems they replace. It is therefore an absolute
requirement that all distributed functions be fully instrumented so that administrators can both understand
the current state of each function and change function parameters if needed. Developers must constantly
keep in mind the goal of making this distributed computing environment easy to use and maintain.
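
As a small sketch of what "fully instrumented" might mean in practice, the wrapper below records call counts,
errors, and timing for a function and exposes its tunable parameters. The InstrumentedFunction class, the
lookup function, and the max_results parameter are hypothetical names chosen for illustration; real systems
would typically publish this information through a metrics or management interface.

```python
import time


class InstrumentedFunction:
    """Wraps a callable so administrators can read its state and tune it."""

    def __init__(self, func, **params):
        self.func = func
        self.params = dict(params)    # adjustable at run time
        self.calls = 0
        self.errors = 0
        self.last_duration = None

    def __call__(self, *args, **kwargs):
        self.calls += 1
        start = time.perf_counter()
        try:
            # The wrapped function receives the current parameters first.
            return self.func(self.params, *args, **kwargs)
        except Exception:
            self.errors += 1
            raise
        finally:
            self.last_duration = time.perf_counter() - start

    def status(self):
        """Snapshot that an administrator or monitoring tool could poll."""
        return {"calls": self.calls, "errors": self.errors,
                "last_duration": self.last_duration, "params": dict(self.params)}

    def set_param(self, name, value):
        if name not in self.params:
            raise KeyError(name)
        self.params[name] = value


def lookup(params, names):
    """Hypothetical business function that honors a tunable limit."""
    return sorted(names)[: params["max_results"]]


lookup_service = InstrumentedFunction(lookup, max_results=2)
print(lookup_service(["carol", "alice", "bob"]))
lookup_service.set_param("max_results", 3)
print(lookup_service(["carol", "alice", "bob"]))
print(lookup_service.status()["calls"])
```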

Security

Distributed system security is an order of magnitude more difficult than security in a monolithic
environment. Each function must be made secure separately and the communication links between and among the
functions must also be made secure. As the network grows in size and complexity, developers must consider
how to control access to functions, how to make sure that only authorized users can access these functions,
and how to isolate services from one another.

Security is a critical element that must be built into every function, not added on later. Unauthorized
access to functions and data must be prevented and reported.

Privacy

Privacy is the subject of an increasing number of regulations around the world. Regulations such as the
European Union’s GDPR and the U.S. HIPAA are important considerations for any developer of
customer-facing systems.

Mastering Complexity

Developers must take the time to consider how all of the pieces of a complex computing environment fit
together. It is hard to maintain the discipline that a service should encapsulate a single function or,
perhaps, a small number of tightly interrelated functions. If a given function is implemented in multiple
places, maintaining and updating that function can be hard. What would happen when one instance of a
function doesn’t get updated? Finding that error can be very challenging.

This means it is wise for developers of complex applications to maintain a visual model that shows where each
function lives so it can be updated if regulations or business requirements change.

Often this means that developers must take the time to document what they did, when changes were made, as
well as what the changes were meant to accomplish so that other developers aren’t forced to decipher mounds
of text to learn where a function is or how it works.

To be successful as an architect of distributed systems, a developer must be able to master complexity.

Approaches Developers Must Master

Developers must master decomposing and refactoring application architectures, thinking in terms of teams, and
growing their skill in approaches to rapid application development and deployment (DevOps). After all, they
must be able to think systematically about what functions are independent of one another and what functions
rely on the output of other functions to work. Functions that rely upon one another may be best implemented as
a single service. Implementing them as independent functions might create unnecessary complexity and result
in poor application performance and impose an unnecessary burden on the network.

Virtualization Technology Covers Many Bases

Virtualization is a far bigger category than just virtual machine software or containers. Both of these
functions are considered processing virtualization technology. There are at least seven different types of
virtualization technology in use in modern applications today. Virtualization technology is available to
enhance how users access applications, where and how applications execute, where and how processing happens,
how networking functions, where and how data is stored, how security is implemented, and how management
functions are accomplished. The following model of virtualization technology might be helpful to developers
when they are trying to get their arms around the concept of virtualization:

Figure 3: Architecture of virtualized systems

Source: 7 Layer Virtualization Model, VirtualizationReview.com

Think of Software-Defined Solutions

It is also important for developers to think in terms of “software defined” solutions. That is, to segment
the control from the actual processing so that functions can be automated and orchestrated.

Tools and Strategies That Can Help

Developers shouldn’t feel like they are on their own when wading into this complex world. Suppliers and
open-source communities offer a number of powerful tools. Various forms of virtualization technology can be
a developer’s best friend.

Virtualization Technology Can Be Your Best Friend

  • Containers make it possible to easily develop functions that can execute without
    interfering with one another and can be migrated from system to system based upon workload demands.
  • Orchestration technology makes it possible to control many functions to ensure they are
    performing well and are reliable. It can also restart or move them in a failure scenario.
  • Supports incremental development: functions can be developed in parallel and deployed
    as they are ready. They also can be updated with new features without requiring changes elsewhere.
  • Supports highly distributed systems: functions can be deployed locally in the
    enterprise data center or remotely in the data center of a cloud services provider.

Think In Terms of Services

This means that developers must think in terms of services and how services can communicate with one another.

Well-Defined APIs

Well-defined APIs mean that multiple teams can work simultaneously and still know that everything will fit
together as planned. This typically means a bit more work up front, but it is well worth it in the end. Why?
Because overall development can be faster. It also makes documentation easier.
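
One way to picture this is a contract that both teams code against. In the Python sketch below, the
WeatherService protocol and WeatherReport type play the role of the agreed API. All of the names are
hypothetical, and typing.Protocol is just one of several ways to express such a contract.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class WeatherReport:
    city: str
    temperature_c: float


class WeatherService(Protocol):
    """The agreed contract: both teams code against this, not each other."""

    def current(self, city: str) -> WeatherReport: ...


class StubWeatherService:
    """One team's placeholder until the real service is ready."""

    def current(self, city: str) -> WeatherReport:
        return WeatherReport(city=city, temperature_c=20.0)


def describe(service: WeatherService, city: str) -> str:
    """The other team's code depends only on the contract."""
    report = service.current(city)
    return f"{report.city}: {report.temperature_c:.1f} C"


print(describe(StubWeatherService(), "Prague"))
```

The consuming code can be written and tested against the stub long before the real service exists, and the
contract itself doubles as documentation.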

Support Rapid Application Development

This approach is also perfect for rapid application development and rapid prototyping, also known as DevOps.
Properly executed, DevOps also produces rapid time to deployment.

Think In Terms of Standards

Rather than relying on a single vendor, the developer of distributed systems would be wise to think in terms
of multi-vendor, international standards. This approach avoids vendor lock-in and makes finding expertise
much easier.

Summary

It’s interesting to note how guidelines for rapid application development and deployment of distributed
systems start with “take your time.” It is wise to plan out where you are going and what you are going to
do; otherwise, you are likely to end up somewhere else, having burned through your development budget with
little to show for it.

Sign up for Online Training

To continue to learn about the tools, technologies, and practices in the modern development landscape, sign up for free online training sessions. Our engineers host
weekly classes on Kubernetes, containers, CI/CD, security, and more.

We’re Rolling Out The Green Carpet – Only 4 Weeks To Go!!

Monday, 4 March, 2019

Show Your SUSE Style at SUSECON

Anyone who’s anyone will be seen at Open Source’s premier event, and the best dressed will be wearing green. Only four weeks to go for you to secure your spot and rub shoulders with the best-of-the-best in Open Source technology. 

Which Sessions Will You Choose?

Sessions abound about the topics that matter to you and your data center. Get ready to build your kind of open with insights into:

  • Application Delivery
  • Artificial Intelligence
  • Big Data
  • Business Applications & Middleware
  • Container Technologies
  • DevOps
  • Internet of Things
  • Interoperability in Heterogeneous Environments
  • Machine Learning
  • Private/Public/Hybrid Cloud Technologies
  • Virtualization Technologies
  • And more

Join SUSE Doc Day on Friday, April 5th!

Immortalize yourself by documenting your real-world experience and know-how as part of a group focused on writing the next generation of SUSE documentation. What is SUSE Doc Day?

 

If you haven’t registered, what are you waiting for?  Register today!

How to Protect Sensitive Data in Containers with Container DLP

Friday, 1 March, 2019

We recently announced the industry’s first Container DLP capability to help enterprises protect sensitive data. Let’s take a deeper look into data loss prevention (aka data leak protection) and how it applies to containers.

What is Data Loss Prevention (DLP)?

DLP solutions help detect potential sensitive data violations and prevent accidental or malicious data breaches. Sensitive data is anything that a company considers private or confidential. Common types of sensitive data protected by regulatory compliance include credit cards (PCI), personally identifiable information (PII, GDPR), patient information (HIPAA), social security and other government identities. Financial account and bank routing information is also considered sensitive data. Sensitive data can even include documents, drawings, contracts or other business and technical documents.

DLP solutions can involve a broad spectrum of potential systems including servers, databases, networks, laptops, desktops and email systems. For container DLP most of the current concern would be for applications that process, transmit, or store sensitive data because containers are primarily deployed for enterprise applications. In addition, containers themselves may store sensitive data such as secrets used to access applications, even though it is recommended to use a secure secrets management solution for this purpose.

Ways to Protect Data

Like all of security, a layered approach to DLP is recommended, and the extent of monitoring and controls required depends on the type of sensitive data involved as well as regulatory compliance requirements.

Encryption

The most basic preventative measure is encryption, which ensures that all sensitive data is protected, whether at rest in file systems or databases or in motion while being transmitted. Most regulatory compliance standards require encryption.

Container DLP requires that connections between pods (i.e., east-west traffic between containers) be encrypted when transmitting sensitive data. Connections into and out of (ingress and egress) the container cluster should also be encrypted if they contain sensitive data. For pod-to-pod connections, new service mesh technologies like Istio offer a simple and scalable encryption solution. In addition to network connections, any storage accessed or written to by the container must encrypt the data at rest.

Detection

The encryption requirements are fairly straightforward, but how do you make sure there’s not a data breach or unintentional transmission of unencrypted sensitive data? The most effective way is to monitor the network for sensitive data. Additionally, periodic file and database scanning solutions can provide an added layer of security for data at rest.

Network-based Container DLP

Detecting sensitive data in network connections is the most effective way to prevent data breaches and ensure regulatory compliance. Network-based container DLP requires layer 7 (application layer) deep packet inspection to be able to inspect the network payloads for sensitive data.

The NeuVector container network security platform is an end-to-end solution for securing the entire container pipeline from build to ship to run-time. The industry’s first container firewall provides the critical function to perform container DLP by inspecting all container connections for sensitive data such as credit cards, PII, and financial data. The screen shot below shows an example of unencrypted credit card data being transmitted between pods, as well as to an external destination.

DETECTING CREDIT CARD DATA VIOLATIONS IN CONTAINER CONNECTIONS

The NeuVector container firewall detects this violation and can block this transmission in Protect mode, or alert on it in Monitor mode.

The NeuVector container DLP engine is flexible and extensible to be able to detect multiple types of sensitive data patterns and can be customized for customer specific detection.
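
To give a sense of what pattern-based detection of sensitive data involves, here is a minimal Python sketch that scans a text payload for credit card numbers. It is not NeuVector's DLP engine or API. The regular expression, the function names, and the choice to confirm candidates with the Luhn checksum are illustrative assumptions, and the card number shown is a widely used test number.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to weed out digit runs that are not card numbers."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:        # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(payload: str) -> list:
    """Return digit strings in the payload that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(payload):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("POST /pay card=4111 1111 1111 1111 exp=12/25"))
print(find_card_numbers("order id 4111 1111 1111 1112"))
```

The checksum step matters because long digit runs such as order or tracking numbers are common in application traffic. Matching on the pattern alone would produce many false alerts.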

Regulatory Compliance – PCI, GDPR, HIPAA etc

A container DLP solution can be used to meet regulatory compliance requirements by providing network segmentation, network monitoring, and encryption verification. PCI-DSS requires network segmentation as well as encryption for in-scope CDE environments. The NeuVector container firewall with DLP can provide the required network segmentation of CDE workloads while at the same time monitoring for unencrypted cardholder data which would violate the compliance requirements. The violations can be the first indications of a data breach, a misconfiguration of an application container, or an innocent mistake made by a customer support person pasting in credit card data into a case.

NeuVector can help maintain compliance with regulations such as PCI-DSS, GDPR, HIPAA and others for container deployments by:

  • Enforcing network segmentation based on layer 7 application protocols, so that no unauthorized connections are allowed in or out of containers
  • Enforcing that encrypted SSL connections are used for transmitting sensitive data between containers and for ingress/egress connections
  • Monitoring all unencrypted connections for sensitive data and either alerting or blocking when detected

In addition, through a service mesh integration, NeuVector can even alert if any sensitive data is included even in encrypted connections between service mesh pods. NeuVector is able to inspect the network traffic before it is encrypted by the service mesh sidecar proxy and detect threats as well as sensitive data in the payload. This is useful for detecting cases where sensitive data should not exist at all, even if it is encrypted by the service mesh.

Microservices vs. Monolithic Architectures

Friday, 1 March, 2019

Enterprises are increasingly pressured by competitors and their own customers to get applications working and online quicker while also minimizing development costs. These divergent goals have forced enterprise IT organizations to evolve rapidly. After undergoing one forced evolution after another since the 1960s, many are prepared to take the step away from monolithic application architectures to embrace the microservices approach.

Figure 1: Architecture differences between traditional monolithic applications and microservices


Image courtesy of BMC

Higher Expectations and More Empowered Customers

Customers that are used to having worldwide access to products and services now expect enterprises to quickly respond to whatever other suppliers are doing.

CIO magazine, in reporting upon Ovum’s research, pointed out:

“Customers now have the upper hand in the customer journey. With more ways to shop and less time to do it, they don’t just gather information and complete transactions quickly. They often want to get it done on the go, preferably on a mobile device, without having to engage in drawn-out conversations.”

IT Under Pressure

This intense worldwide competition also forces enterprises to find new ways to cut costs or to be more efficient. Developers have seen this all before. This is just the newest iteration of the perennial call to “do more with less” that enterprise IT has faced for more than a decade. They have learned that even when IT budgets grow, the investments often go to new IT services or better communications.

Figure 2: Forecasted 2018 worldwide IT spending growth


Source: Gartner Market Databook, 4Q17

As enterprise IT organizations face pressure to respond, they have had to revisit their development processes. The traditional two-year development cycle, previously acceptable, is no longer satisfactory. There is simply no time for that now.

Enterprise IT has also been forced to respond to a confluence of trends that are divergent and contradictory.

  • The introduction of inexpensive but high-performance network connectivity that allows distributed functions to communicate with one another across the network as fast as processes previously could communicate with one another inside of a single system.
  • The introduction of powerful microprocessors that offer mainframe-class performance in inexpensive and small packages. After standardizing on the X86 microprocessor architecture, enterprises are now being forced to consider other architectures to address their need for higher performance, lower cost, and both lower power consumption and heat production.
  • Internal system memory capacity continues to increase, making it possible to deploy large-scale applications or application components in small systems.
  • External storage use is evolving away from the use of rotating media to solid state devices to increase capability, reduce latency, decrease overall cost, and deliver enormous capacity.
  • The evolution of open-source software and distributed computing functions makes it possible for the enterprise to inexpensively add a herd of systems when new capabilities are needed rather than facing an expensive and time-consuming forklift upgrade to expand a central host system.
  • Customers demand instant and easy access to applications and data.

As enterprises address these trends, they soon discover that the approach that they had been relying on — focusing on making the best use of expensive systems and networks — needs to change. The most significant costs are now staffing, power, and cooling. This is in addition to the evolution they made nearly two decades ago when their focus shifted from monolithic mainframe computing to distributed, X86-based midrange systems.

The Next Steps in a Continuing Saga

Here’s what enterprise IT has done to respond to all of these trends.

They are choosing to move from using the traditional waterfall development approach to various forms of rapid application development. They also are moving away from compiled languages to interpreted or incrementally compiled languages such as Java, Python, or Ruby to improve developer productivity.

IDC, for example, predicts that:

“By 2021 65% of CIOs will expand agile/DevOps practices into the wider business to achieve the velocity necessary for innovation, execution, and change.”

Complex applications are increasingly designed as independent functions or “services” that can be hosted in several places on the network to improve both performance and application reliability. This approach means that it is possible to address changing business requirements as well as to add new features in one function without having to change anything else in parallel. NetworkWorld’s Andy Patrizio pointed out in his predictions for 2019 that he expects “Microservices and serverless computing take off.”

Another important change is that these services are being hosted in geographically distributed enterprise data centers, in the cloud, or both. Furthermore, functions can now reside in a customer’s pocket or in some combination of cloud-based or corporate systems.

What Does This Mean for You?

Addressing these trends means that enterprise developers and operations staff have to make some serious changes to their traditional approach including the following:

  • Developers must be willing to learn technologies that better fit today’s rapid application development methodology. An experienced “student” can learn quickly through online schools. For example, Learnpython.org offers free courses in Python, while Codecademy offers free courses in Ruby, Java, and other languages.
  • They must also be willing to learn how to decompose application logic from a monolithic, static design to a collection of independent, but cooperating, microservices. Online courses are available for this too. One example of a course designed to help developers learn to “think in microservices” comes from IBM. Other courses are available from Lynda.com.
  • Developers must adopt new tools for creating and maintaining microservices that support quick and reliable communication between them. The use of various commercial and open-source messaging and management tools can help in this process. Rancher Labs, for example, offers open-source software for delivering Kubernetes-as-a-service.
  • Operations professionals need to learn orchestration tools for containers and Kubernetes to understand how they allow teams to quickly develop and improve applications and services without losing control over data and security. Operations staff have long been the gatekeepers for enterprise data centers. After all, they may find their positions on the line if applications slow down or fail.
  • Operations staff must allow these functions to be hosted outside of the data centers they directly control. To make that point, analysts at Market Research Future recently published a report saying that, “the global cloud microservices market was valued at USD 584.4 million in 2017 and is expected to reach USD 2,146.7 million by the end of the forecast period with a CAGR of 25.0%”.
  • Application management and security issues must now be part of developers’ thinking. Once again, online courses are available to help individuals to develop expertise in this area. LinkedIn, for example, offers a course in how to become an IT Security Specialist.

It is important for both IT and operations staff to understand that the world of IT is moving rapidly and everyone must be focused on upgrading their skills and enhancing their expertise.

How Do Microservices Benefit the Enterprise?

This latest move to distributed computing offers a number of real and measurable benefits to the enterprise. Development time and cost can be sharply reduced after the IT organization incorporates this form of distributed computing. Afterwards, each service can be developed in parallel and refined as needed without requiring an entire application to be stopped or redesigned.

The development organization can focus on developer productivity and still bring new application functions or applications online quickly. The operations organization can focus on defining acceptable rules for application execution and allowing the orchestration and management tools to enforce them.

What New Challenges Do Enterprises Face?

Like any approach to IT, the adoption of a microservices architecture will include challenges as well as benefits.

Monitoring and managing many “moving parts” can be more challenging than dealing with a few monolithic applications. The adoption of an enterprise management framework can help address these challenges. Security in this type of distributed computing needs to be top of mind as well. As the number of independent functions grows on the network, each must be analyzed and protected.

Should All Monolithic Applications Migrate to Microservices?

Some monolithic applications can be difficult to change. This may be due to technological challenges or may be due to regulatory constraints. Some components in use today may have come from defunct suppliers, making changes difficult or impossible.

It can be both time consuming and costly for the organization to go through a complete audit process. Often, organizations continue investing in older applications much longer than is appropriate in the belief that they’re saving money.

It is possible to evaluate what a monolithic application does to learn if some individual functions can be separated and run as smaller, independent services. These can be implemented either as cloud-based services or as container-based microservices.
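One way to picture separating a single function is to put it behind an interface first, so the rest of the application no longer cares where the logic runs. The sketch below is hypothetical throughout: the tax example, the class names, and the `transport` callable are invented for illustration and do not come from any particular product.

```python
from typing import Callable, Protocol


class TaxCalculator(Protocol):
    def tax_for(self, amount_cents: int) -> int: ...


class InProcessTax:
    """The original logic, still living inside the monolith."""

    def tax_for(self, amount_cents: int) -> int:
        # Flat 8% rate, an assumption made only for this example.
        return amount_cents * 8 // 100


class RemoteTax:
    """Stand-in for a client that would call a separately deployed service."""

    def __init__(self, transport: Callable[[dict], dict]):
        # transport is any callable that takes a request dict and returns a
        # reply dict, for example a thin wrapper around an HTTP call.
        self._transport = transport

    def tax_for(self, amount_cents: int) -> int:
        reply = self._transport({"amount_cents": amount_cents})
        return reply["tax_cents"]


class OrderSystem:
    """The monolith, now depending only on the TaxCalculator interface."""

    def __init__(self, tax: TaxCalculator):
        self._tax = tax

    def total(self, amount_cents: int) -> int:
        return amount_cents + self._tax.tax_for(amount_cents)
```

With this shape, `OrderSystem(InProcessTax())` and `OrderSystem(RemoteTax(some_transport))` behave the same from the caller's point of view, so the function can move to a container or cloud service without touching the rest of the application.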

Rather than waiting and attempting to address older technology as a whole, it may be wise to undertake a series of incremental changes to make enhancing or replacing an established system more acceptable. This is very much like the old proverb, “the best time to plant a tree was 20 years ago. The second best time is now.”

Is the Change Worth It?

Enterprises that have made the move towards the adoption of microservices-based application architectures have commented that their IT costs are often reduced. They also often point out that once their team mastered this approach, it was far easier and quicker to add new features and functions when market demands changed.

If your enterprise hasn’t adopted this approach, it would be wise to learn more about it. Suppliers like Rancher Labs have helped their clients safely make this journey and they may be able to help your organization.

Go Deeper with Online Training

Get free online training on our container management software, Rancher, or continue your education with more advanced topics in the Kubernetes master classes.


What Do People Love About Rancher?

Thursday, 28 February, 2019
Read the Guide to Kubernetes with Rancher
This guide shows the challenges in running Kubernetes in production and how Rancher helps.

More than 20,000 environments have chosen Rancher as the solution to make the Kubernetes adventure painless in as many ways as possible. More than 200 businesses across finance, health care, military, government, retail, manufacturing, and entertainment verticals engage with Rancher commercially because they recognize that Rancher simply works better than other solutions.

Why is this? Is it really about one feature set versus another feature set, or is it about the freedom and breathing room that come from having a better way?

A Tale of Two Houses

Imagine that you’re walking down a street, and each side of the street is lined with houses. The houses on one side were constructed over time by different builders, and you can see that although every house contains walls, a floor, a roof, doors, and windows, they’re all completely different. Some were built from custom plans, while others were modified over time by the owner to fit a personal need.

You see a person working on his house, and you stop to ask him about the construction. You learn that the company that built his house did so with special red bricks that only come from one place. He paid a great deal of money to import the bricks and have the house built, and he beams with pride as he tells you about it.

“It’s artisanal,” he tells you. “The company that built my house is one of the biggest companies in the world. They’ve been building houses for years, so they know what they’re doing. My house only took a month to build!”

“What if you want to expand?” You point to other houses on his side of the street. “Does the builder come out and do the work?”

“Nope! I decide what I want to build, and then I build it. I like doing it this way. Being hands-on makes me feel like I’m in control.”

Your gaze moves to the other side of the street, where the houses were built by following a different strategy. Each house has an identical core, and where an owner made customizations, each house with that customization has it constructed the same.

You see a man outside of one of the houses, relaxing on his porch and drinking tea. He waves at you, so you walk over and strike up a conversation with him.

“Can you tell me about your house?”

“My house?” He smiles at you. “Sure thing! All of the houses on this side were built by one company. They use pre-fabricated components that are built off-site, brought in and assembled. It only takes a day to build one!”

“What about adding rooms and other features?”

“It’s easy,” he replies. “The company has a standard interface for rooms, terraces, and any other add-on. When I want to expand, I just call them, and they come out and connect the room. Everything is pre-wired, so it goes in and comes online almost as fast as I can think of it.”

You ask if he had to do any extra work to connect to public utilities.

“Not at all!” he exclaims. “There’s a panel inside where I can choose which provider I want to connect to. I just had to pick one. If I want to change it in the future, I make a different selection. The house lets me choose everything – lawn care service provider, window cleaner, painter, everything I need to make the house liveable and keep it running. I just go to the panel, make my choice, and then go back to living.

“And best of all, my house was free.”

Rancher Always Works For You

Rancher Labs has designed Rancher to do the heaviest tasks around building and maintaining Kubernetes clusters.

Easily Launch Secure Clusters

Let’s start with the installation. Are you installing on bare metal? Cloud instances? Hosted provider? A mix? Do you want to give others the ability to deploy their own clusters, or do you want the flexibility to use multiple providers?

Maybe you just want to use AWS or GCP, so support for multiple providers isn’t a big deal. Flexibility is still important. Your requirements today might be different in a month or a year.

With Rancher you can simply fire up a new cluster in another provider and begin migrating workloads, all from within the same interface.

Global Identity and RBAC

Whether you’re using multiple providers or not, the normal way of configuring access to a single cluster in one provider requires work. Access control policies take time to configure and maintain, and generally, once provisioned, are forgotten. If using multiple providers, it’s like learning multiple languages. Russian for AWS, Swahili for Google, Flemish for Azure, Uzbek for DigitalOcean or Rackspace…and if someone leaves the organization, who knows what they had access to? Who remembers how to speak Latin?

Rancher connects to backend identity providers, and from a global configuration it applies roles and policies to all of the clusters that it manages.

When you can deploy and manage multiple clusters as easily as you can a single one, and when you can do so securely, then it’s no big deal to spin up a cluster for UAT as part of the CI/CD test suite. It’s trivial to let developers have their own cluster to work on. You could even let multiple teams share one cluster.

Solutions for Cluster Multi-Tenancy

How do you keep people from stepping on each other?

You can use Kubernetes Namespaces, but provisioning Roles across multiple Namespaces is tedious. Rancher collects Namespaces into Projects and lets you map Roles to the Project. This creates single-cluster multi-tenancy, so now you can have multiple teams, each only able to interact with their own Namespaces, all on the same cluster. You can have a dev/staging environment built exactly like production, and then you can easily get into the CD part of CI/CD.
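The grouping idea can be sketched as a small data model: a role assigned once at the Project level expands into one binding per Namespace. This is a conceptual illustration, not Rancher's API or data model; `Project`, `expand_bindings`, `can_access`, and the team and Namespace names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    namespaces: set[str] = field(default_factory=set)
    # Maps a user or group name to a role name for the whole project.
    members: dict[str, str] = field(default_factory=dict)


def expand_bindings(projects: list[Project]) -> set[tuple[str, str, str]]:
    """Expand project-level roles into (namespace, subject, role) bindings."""
    bindings = set()
    for project in projects:
        for namespace in project.namespaces:
            for subject, role in project.members.items():
                bindings.add((namespace, subject, role))
    return bindings


def can_access(bindings: set[tuple[str, str, str]], subject: str, namespace: str) -> bool:
    """Return True if the subject holds any role in the namespace."""
    return any(ns == namespace and who == subject for ns, who, _ in bindings)


# Two teams share one cluster but see only their own namespaces.
payments = Project("payments", {"pay-dev", "pay-staging"}, {"team-pay": "edit"})
search = Project("search", {"search-dev"}, {"team-search": "edit"})
bindings = expand_bindings([payments, search])
```

Adding a Namespace to a Project then requires no new role assignments: the next expansion picks it up for every member, which is the tedium the paragraph above describes avoiding.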

Tools for Day Two Operations

What about all of the add-on tools? Monitoring. Alerts. Log shipping. Pipelines. You could provision and configure all of this yourself for every cluster, but it takes time. It’s easy to do wrong. It requires skills that internal staff may not have – do you want your staff learning all of the tools above, or do you want them focusing on business initiatives that generate revenue? To put it another way, do you want to spend your day spinning copper wire to connect to the phone system, or would you rather press a button and be done with it?

Rancher ships with tools for monitoring your clusters, dashboards for visualizing metrics, an engine for generating alerts and sending notifications, and a pipeline system to enable CI/CD for those not already using an external system. With a click it ships logs off to Elasticsearch, Kafka, Fluentd, Splunk, or syslog.

Designed to Grow With You

The more a Kubernetes solution scales (the bigger or more complicated that it gets), the more important it is to have fast, repeatable ways to do things. What about using tools like Ansible, Terraform, kops, or kubespray to launch clusters? They stop once the cluster is launched. If you want more, you have to script it yourself, and this adds a dependency on an internal asset to maintain and support those scripts. We’ve all been at companies where the person with the special powers left, and everyone who stayed had to scramble to figure out how to keep everything running. Why go down that path when there’s a better way?

Rancher makes everything related to launching and managing clusters easy, repeatable, fast, and within the skill set of everyone on the team. It spins up clusters reliably in any provider in minutes, and then it gives you a standard, unified interface for interacting with those clusters via UI or CLI. You don’t need to learn each provider’s nuances. You don’t need to manage credentials in each provider. You don’t need to create configuration files to add the clusters to monitoring systems. You don’t need to do a bunch of work on the hosts before installing Kubernetes. You don’t need to go to multiple places to do different things – everything is in one place, easy to find, and easy to use.

No Vendor Lock-In

This is significant. Companies that sell you a Kubernetes solution have a vested interest in keeping you locked to their platform. You have to run their operating system or use their facilities. You can only run certain software versions or use certain components. You can only buy complementary services from vendors they partner with.

Rancher Labs believes in something different. They believe that your success comes from the freedom to choose what’s best for you. Choose the operating system that you want to use. Build your systems in the provider you like best. If you want to build in multiple providers, Rancher gives you the tools to manage them as easily as you manage one. Use any provisioner.

What Rancher accelerates is the time between your decision to do something and when that thing is up and running. Rancher gets you out the gate and onto the track faster than any other solution.

The Wolf in a DIY Costume

Those who say that they want to “go vanilla” or “DIY” are usually looking at the cost of an alternative solution. Rancher is open source and free to use, so there’s no risk in trying it out and seeing what it does. It will even uninstall cleanly if you decide not to continue with it.

If you’re new to Kubernetes or if you’re not in a hands-on, in-the-trenches role, you might not know just how much work goes into correctly building and maintaining a single Kubernetes cluster, let alone multiple clusters. If you go the “vanilla Kubernetes” route with the hope that you’ll get a better ROI, it won’t work out. You’ll pay for it somewhere else, either in staff time, additional headcount, lost opportunity, downtime, or other places where time constraints interfere with progress.

Rancher takes all of the maintenance tasks for clusters and turns them into a workflow that saves time and money while keeping everything truly enterprise-grade. It will do this for single and multi-cluster Kubernetes environments, on-premises or in the cloud, for direct use or for business units offering Kubernetes-as-a-service, all from the same installation. It will even import the Kubernetes clusters you’ve already deployed and start managing them.

Having more than 20,000 deployments in production is something that we’re proud of. Being the container management platform for mission-critical applications in over 200 companies across so many verticals also makes us proud.

What we would really like is to have you be part of our community.

Join us in showing the world that there’s a better way. Download Rancher and start living in the house you deserve.
