Add-Ons In the Public Cloud

Wednesday, 26 September, 2018

SUSE offers a number of services and products around SUSE Linux Enterprise Server, such as LTSS (Long Term Service Pack Support), SUSE Linux Enterprise Live Patching, and the SUSE Linux Enterprise High Availability Extension. While these add-on products and services are agnostic to instance type and flavor (BYOS or on-demand), cost considerations favor BYOS instances when you want to use any of them. Before answering why BYOS is in most cases favorable, I need to cover some technical details about repositories.

Add-on services and products generally come in the form of extra repository streams that get added to a running system (VM). Repositories in turn are managed by a repository service. For BYOS instances this repository service is provided by SCC (SUSE Customer Center), your own SMT (Subscription Management Tool) server, RMT (Repository Management Tool) as of SUSE Linux Enterprise Server 15, or SUSE Manager.
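On a running SLES system you can see which repository service manages your repositories. A quick sketch (output will of course vary per system, and these commands require a SLES 12 or later instance):

```shell
# List the configured repository services; SCC, SMT/RMT, or the
# cloud update infrastructure shows up here as a service entry.
zypper services

# List the repositories those services currently provide.
zypper repos

# Show the registration status against SCC/SMT (SLES 12 and later).
SUSEConnect --status-text
```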

For on-demand instances the repository service is provided by the SUSE-operated update infrastructure in the Public Cloud framework: Amazon EC2, Google Compute Engine, or Microsoft Azure. Repository services are great in that they manage the system repositories based on what the service offers. If a repository is removed from the server that provides it, that repository is also removed from the connected client system, rather than triggering an error on the next refresh of the repos. However, all of this was designed and implemented when things like on-demand instances didn't exist or were in their infancy. Therefore, functionality that would let a client interact with more than one repository service provider was never needed. Times have changed, of course, and we are trying to find a solution to this problem. It is a hard problem to solve and will take time. This is the technical background to keep in mind.

The business background is a data privacy issue, and data privacy is very important. For on-demand usage of SUSE Linux Enterprise Server in the Public Cloud, SUSE receives revenue from the Cloud Service Providers (CSP) but no customer information. After all, on-demand users are customers of the framework provider and only indirect customers of SUSE, and thus SUSE should not receive customer data from the service providers. This brings with it a disconnect of information. Since we cannot tell whether a given user is running an on-demand instance, as this is not trackable, we also cannot correlate such usage to an entitlement for an add-on product. Since all add-on products also require a SLES subscription, there is no way for SUSE to connect the dots.

But what does this all mean for using add-on products with Public Cloud instances?

Because we are unable to connect the dots on the business end between on-demand instances and direct subscriptions for add-on products, you would have to get a SLES subscription from SUSE directly before you can get a subscription to the add-on product. This implies that you are effectively paying a premium: for on-demand instances SUSE is already being paid through the cloud service provider. This is not a good situation for your budget, which is why BYOS instances are the favorable solution for use with add-on products. When you use BYOS instances you are using a known SLES subscription, to which you can easily add any of the available add-ons.

But this is not the end of the story. If you started with on-demand and want to use add-ons, you have three options:

1.) Rebuild your system starting from a BYOS image. A conversion from on-demand to BYOS is not possible, sorry. This is controlled by the framework providers, and there is no going back and forth between BYOS and on-demand.
2.) Pay the above-mentioned premium; the payment is somewhat implicit.
3.) Use SUSE Manager.

When you use SUSE Manager you can manage on-demand instances, as described in “SUSE Manager 3.0 arrives in the Public Cloud”. Since SUSE Manager is only available as BYOS, you have a direct relationship with SUSE. This allows us to connect the dots, and thus you can manage add-on products with SUSE Manager and push them to on-demand instances without incurring the premium described earlier.

None of this is really as bad as it may sound. Before firing up a new instance you have already thought about the workload that will run on it, and this provides the basic parameters for the decision between BYOS and on-demand. Can the workload tolerate a system upgrade roughly once every 18 months (yearly service pack releases plus a 6-month overlap period), and can the workload and its users withstand reboots for kernel updates on a more or less regular basis? If yes, then on-demand is a great way to go. On the other hand, maybe you have a workload that should never go down, and thus Live Patching is what you want. In this case BYOS is the more cost-effective choice to start out with.

As you can see, the thought process about the workload to be deployed also provides the information that helps you decide between BYOS and on-demand. And when in doubt, there is always the “Magic 8 Ball“.

Achieving PCI Compliance for Containers

Wednesday, 15 August, 2018

Although microservices and containers are not explicitly mentioned in PCI-DSS, organizations implementing these technologies must focus carefully on monitoring, security, and governance to achieve PCI compliance.

Microservices and containers offer some unique characteristics that support PCI compliance. For example, microservices emphasize an architecture with one function per service/container. This aligns well with PCI-DSS 2.2.1, which calls for implementing only one primary function per server. Similarly, containers by design offer reduced functionality, aligning with PCI-DSS 2.2.2, enabling only necessary protocols and services.

At the same time, other aspects of microservices and containers make PCI compliance a significant challenge. For example, the ephemeral nature of containers – potentially only “living” for a few minutes – means monitoring must be real-time and embedded in order to monitor and enforce all container activity. Plus, most container traffic is east-west in nature – versus north-south – meaning traditional security controls never see most container activity.

Finally, as containers come and go, so too does the scope of the Cardholder Data Environment (CDE). A continually changing CDE scope may be one of the most significant impacts of containers on monitoring and maintaining PCI-DSS compliance. As shown in the figure below, organizations must have visibility and control to define the in-scope CDE tightly. Without an advanced deep packet inspection (DPI) container firewall like NeuVector’s MultiVector container firewall, organizations implementing containers may have to consider the entire microservices environment in-scope! With Container DLP to detect unauthorized transmission of credit card PAN data, NeuVector helps ensure PCI compliance.

 

Watch the PCI-DSS for Containers Webinar

CyberEdge Senior Consultant Ted Ritter covers the recent PCI requirements as they relate to containers.

 

 

Download the PCI Guide

Download the complete guide to PCI Compliance with NeuVector. This report describes how NeuVector helps organizations comply with the Payment Card Industry Data Security Standard (PCI-DSS) version 3.2.1, issued in May 2018.

This guide covers the following critical PCI-DSS requirements which are affected by Docker and Kubernetes containers:

1.0 – Install and maintain a firewall configuration to protect cardholder data
2.0 – Do not use vendor-supplied defaults for system passwords and other security parameters
3.0 – Protect stored cardholder data
4.0 – Encrypt transmission of cardholder data across open, public networks
5.0 – Protect all systems against malware and regularly update anti-virus software or programs
6.0 – Develop and maintain secure systems and applications
7.0 – Restrict access to cardholder data by business need to know
8.0 – Identify and authenticate access to system components
9.0 – (Does not apply)
10.0 – Track and monitor all access to network resources and cardholder data
11.0 – Regularly test security systems and processes
12.0 – (Does not apply)


SUSE Linux Enterprise 15 is Generally Available

Tuesday, 24 July, 2018

SUSE Linux Enterprise 15, a multimodal operating system, is generally available. You can download SUSE Linux Enterprise 15 from here.

SUSE Linux Enterprise 15 marks a milestone: it is the first major release since SUSE Linux Enterprise 12 in 2014. Existing customers can use this as an opportunity to baseline their systems on SUSE Linux Enterprise 15 from older releases, so they are well positioned for many years to come.

Simplify Multimodal IT

SUSE Linux Enterprise 15 helps organizations bridge next-generation software-defined infrastructure and traditional infrastructure technologies. The modern, modular operating system helps simplify multimodal IT, makes traditional IT infrastructure more efficient and provides an engaging platform for developers. As a result, organizations can easily deploy and transition business-critical workloads across on-premise and public cloud environments.

Key features include:

  • Common Code Base. All SUSE Linux Enterprise 15 products share the same code base across all architectures. Packages are built using the same source code to ensure consistency and improve application portability across a multimodal IT environment.
  • Modular+. With Modular+ architecture, everything is a module. Delivery of new features is easy, and you can get product updates and patches more frequently.
  • Unified Installer. The new Unified Installer simplifies IT operations. You can download just one Unified Installer to install all SUSE Linux Enterprise 15 products.
  • SUSE Linux Enterprise High Performance Computing 15 is launched as a separate product to address the growing compute and scale needs of high performance workloads for modeling, simulation and advanced analytics.

 


Stay tuned @RajMeel7

Deploying SLURM using SLE HPC patterns

Monday, 16 July, 2018

The expansion of High Performance Computing (HPC) beyond the niches of higher education and government into corporate and business computing use cases has been on the rise. One catalyst of this trend is increasing innovation in hardware platforms and software development, each of which drives down the cost of deploying supercomputing services with every iteration of advancement. Consider the commonality in science- and business-based innovation spaces like Artificial Intelligence (AI) and machine learning: access to affordable supercomputing services benefits higher education and business-based stakeholders alike. Thank you, Alexa?

Hardware overview

Making half of this point requires an introduction to the hardware platform used for this example: a collection of six ARMv8 CPU based Raspberry Pi 3 systems. Before scoffing at the example, understand that these Pis are really being used to demonstrate the latter, yet-to-be-made point of simplified HPC cluster software deployment. But the hardware platform is still an important aspect.

ARM based Raspberry Pi cluster

Advanced RISC Machine (ARM) began working with Cray, a dominant force in the supercomputing space, in 2014, initially collaborating with U.S. DoE and European Union based research projects interested in assessing the ARM CPU platform for scientific use. One motivator, aside from the possibility of a lower cost hardware platform, was lessening developer angst when porting scientific software to ARM based systems. Most community-authored scientific software does not fare well when ported to different hardware platforms (think x86 to GPU), but the move from x86 to ARMv8 is a far less troubled path. Often the original programming languages can be maintained, and existing code requires little change (and sometimes none).

Cray unveiled the first ARMv8 based supercomputer, named “Isambard” and sporting 10,000 high performance cores, in 2017. The debut comparison involved performance tests using common HPC code running on the most heavily utilised supercomputer in the U.K., named “ARCHER”, at the University of Edinburgh. The results demonstrated that the performance of the ARM based Isambard was comparable to that of the x86 Skylake processors used in ARCHER, but at a remarkably lower cost point.

Software overview

The Simple Linux Utility for Resource Management (SLURM), now known as the SLURM Workload Manager, is becoming the standard in many environments for HPC cluster use. SLURM is free to use, actively developed, and unifies tasks previously distributed across discrete HPC software stacks.

  • Cluster Manager: Organising management and compute nodes into clusters that distribute computational work.
  • Job Scheduler: Computational work is submitted as jobs that utilise system resources such as CPU cores, memory, and time.
  • Cluster Workload Manager: A service that manages access to resources, starts, executes, and monitors work, and manages a pending queue of work.
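To make the job-scheduler role concrete, here is a minimal batch job sketch. The resource values are illustrative; the partition name matches the normal_q partition configured later in this article:

```shell
# Write a minimal SLURM batch script; resource values are examples.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=normal_q
#SBATCH --ntasks=4
#SBATCH --time=00:05:00
srun hostname
EOF

# On a running cluster, submit the job and watch the queue:
# sbatch job.sh
# squeue -u $USER
```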

 

Software packages

SLURM makes use of several software packages to provide the described facilities.

On workload manager server(s)

  • slurm: Provides the “slurmctld” service and is the SLURM central management daemon. It monitors all other SLURM daemons and resources, accepts work (jobs), and allocates resources to those jobs.
  • slurm-slurmdbd: Provides the “slurmdbd” service and provides an enterprise-wide interface to a database for SLURM. The slurmdbd service uses a database to record job, user, and group accounting information. The daemon can do so for multiple clusters using a single database.
  • mariadb: A MySQL compatible database that can be used for SLURM, locally or remotely.
  • munge: A program that obfuscates credentials containing the UID and GID of calling processes. Returned credentials can be passed to another process which can validate them using the unmunge program. This allows an unrelated and potentially remote process to ascertain the identity of the calling process. Munge is used to encode all inter-daemon authentications amongst SLURM daemons.

 

Recommendations:

  • Install multiple slurmctld instances for resiliency.
  • Install the database used by slurmdbd on a very fast disk/partition (SSD is recommended) and a very fast network link if a remote server is used.

 

On compute node servers

  • slurm-node: Provides the “slurmd” service and is the compute node daemon for SLURM. It monitors all tasks running on the compute node, accepts work (tasks), launches tasks, and kills running tasks upon request.
  • munge: A program that obfuscates credentials containing the UID and GID of calling processes. Returned credentials can be passed to another process which can validate them using the unmunge program. This allows an unrelated and potentially remote process to ascertain the identity of the calling process. Munge is used to encode all inter-daemon authentications amongst SLURM daemons.

 

Recommendations:

  • Install and configure the slurm-pam_slurm package to prevent users from logging into compute nodes not assigned to them, or where they do not have active jobs running.

 

Deployment

Identify the systems that will serve as workload manager hosts, database hosts, and compute nodes and install the minimal operating system components required. This example uses the openSUSE Leap 15 distribution. Because Leap 15 is based on the same code base as SLES 15, it is hoped this tutorial can be used interchangeably between them.

Fortunately, installing the packages required by the workload manager and compute node systems can be performed using existing installation patterns. Specifically, using the “HPC Workload Manager” and “HPC Basic Compute Node” patterns.
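If you prefer the command line over YaST, the same patterns can be installed with zypper. Exact pattern identifiers vary by distribution and version, so list them first; the install commands below use placeholders rather than real pattern names:

```shell
# Discover the HPC pattern identifiers available on this distribution.
zypper search -t pattern hpc

# Install the workload manager pattern on the management host and the
# compute node pattern on the nodes, using the names the search reported.
zypper install -t pattern <workload_manager_pattern>
zypper install -t pattern <compute_node_pattern>
```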

YaST HPC installation patterns


Note: The mariadb and slurm-pam_slurm packages are optional installations that can be selected when their respective patterns are selected.

Configuration

Following the software installations, some base configuration should be completed before implementing the SLURM control, database, or compute node daemons.

Workload manager and compute node systems

  • NTP services must be configured across all systems, ensuring all participate in the same time service and time zone.
  • DNS services are configured, and all cluster systems can resolve each other.
  • SLURM users and groups in the /etc/passwd and /etc/group files should have the same UID and GID values across systems. Adjust ownership of file system components if necessary.
    • /etc/slurm
    • /run/slurm
    • /var/spool/slurm
    • /var/log/slurm
  • Munge users and groups in the /etc/passwd and /etc/group files should have the same UID and GID values across systems. Adjust ownership of file system components if necessary.
    • /etc/munge
    • /run/munge
    • /var/lib/munge
    • /var/log/munge
  • The same munge secret key must be used across all systems.

 

By default, the munge secret key resides in /etc/munge/munge.key.

The munge.key file is created using /dev/urandom at installation time via the command:

~# dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

Subsequently it will differ from host to host. One option to ensure consistency across hosts is to pick the key from any one host and copy it to all the others.
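One way to distribute the chosen key, sketched with hypothetical host names; after the copy, make sure ownership and permissions are correct and restart munge:

```shell
# Copy the chosen munge key from this host to every other cluster host.
# Host names are examples; replace with your own node names.
for host in node1 node2 node3 node4; do
  scp -p /etc/munge/munge.key root@${host}:/etc/munge/munge.key
  ssh root@${host} \
    'chown munge:munge /etc/munge/munge.key &&
     chmod 0400 /etc/munge/munge.key &&
     systemctl restart munge'
done
```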

You can also create a new, arguably more secure, secret key using the following method:

~# dd if=/dev/random bs=1 count=1024 >/etc/munge/munge.key

The following tasks verify that the munge software has been properly configured.

Generate a credential package for the current user on stdout:

~# munge -n

Check if a credential package for the current user can be locally decoded:

~# munge -n | unmunge

Check if a credential package for the current user can be remotely decoded:

~# munge -n | ssh <somehost> unmunge

Workload manager and database systems

  • Open any required ports for the local firewall(s) as determined by daemon placement.

 

slurmctld port: 6817
slurmdbd port: 6819
scheduler port: 7321
mariadb port:    3306

Compute nodes must be able to communicate with the hosts running slurmctld.

For example, if the slurmctld, slurmdbd, and database are running on the same host:

~# firewall-cmd --permanent --zone=<cluster_network_zone> --add-port=6817/tcp
~# firewall-cmd --permanent --zone=<cluster_network_zone> --add-port=7321/tcp
~# firewall-cmd --reload

  • Configure the default database used by SLURM, “slurm_acct_db”, and the database user and password.

Assuming the local database was not configured during the pattern-based installation, use the following commands to configure the “slurm_acct_db” database and the “slurmdb” user post installation.

Ensure the database is running.

~# systemctl start mariadb

~# mysql_secure_installation
~# mysql -u root -p

Provide the root password.

At the “MariaDB [(none)]>” prompt, issue the following commands:

Create the database access user and set the user password.

create user 'slurmdb'@'localhost' identified by '<user_password>';

Grant rights for the user to the target database.

grant all on slurm_acct_db.* TO 'slurmdb'@'localhost';

Note: Replace 'localhost' with an actual FQDN if required.

Create the SLURM database.

create database slurm_acct_db;

Validate the user and database exist.

SELECT User,Host FROM mysql.user;
SHOW DATABASES;
exit

Ensure the database is enabled at system startup.

~# systemctl enable mariadb

  • Configure the database for real world use.

 

The default buffer size, log size, and lock wait timeout for the database should be adjusted before slurmdbd is started for the first time. Doing so prevents potential issues with database table and schema updates, and with record purging operations.

Consider setting the buffer and log sizes to 50% to 75% of the host system memory, and doubling the default timeout settings.
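A quick way to compute a candidate buffer size on the database host, sketched here at 50% of total memory:

```shell
# Read total memory in kB from /proc/meminfo and print a candidate
# innodb_buffer_pool_size at 50% of it, expressed in megabytes.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "innodb_buffer_pool_size=$((mem_kb / 2 / 1024))M"
```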

Modify the settings in the /etc/my.cnf.d/innodb.cnf file:

[mysqld]
innodb_buffer_pool_size=256M
innodb_log_file_size=256M
innodb_lock_wait_timeout=1800

Note: The default buffer size is 128M.

To implement this change you must shut down the database and move/remove the log files:

~# systemctl stop mariadb
~# rm /var/lib/mysql/ib_logfile?
~# systemctl start mariadb

Verify the new buffer setting using the following command in the MariaDB shell:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

  • Configure the slurmdbd.conf file.

 

Ensure the /etc/slurm/slurmdbd.conf file contains the following directives with valid values:

AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurm/slurmdbd.pid
PluginDir=/usr/lib64/slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=<user_password>
StorageUser=slurmdb
StorageLoc=slurm_acct_db

Consider adding directives and values to enforce life-cycles across job related database records:

PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=2months
PurgeStepAfter=2months
PurgeSuspendAfter=1month
PurgeTXNAfter=12months
PurgeUsageAfter=12months

  • Configure the slurm.conf file.

 

The /etc/slurm/slurm.conf file is used by the slurmctld and slurmd daemons. There are configuration file forms available online at the slurm.schedmd.com site for the latest SLURM version to assist you in generating a slurm.conf file. Additionally, if the workload manager server also runs a web server, the /usr/share/doc/slurm-<version>/html directory can be served locally to provide the SLURM documentation and configuration forms specific to the installed SLURM version.

For a feature complete configuration file:

https://slurm.schedmd.com/configurator.html

For a feature minimal configuration file:

https://slurm.schedmd.com/configurator.easy.html

Using the configurator.easy.html form, the following initial slurm.conf file was created:

ControlMachine=darkvixen102
AuthType=auth/munge
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmUser=slurm
SwitchType=switch/none
TaskPlugin=task/none
SlurmctldPidFile=/run/slurm/slurmctld.pid
SlurmdPidFile=/run/slurm/slurmd.pid
SlurmdSpoolDir=/var/spool/slurm
StateSaveLocation=/var/spool/slurm
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
JobAcctGatherFrequency=30
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=node[1-4] CPUs=4 RealMemory=950 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=normal_q Nodes=node[1-4] Default=YES MaxTime=480 State=UP

Add the following directives and values to the slurm.conf file to complete the database configuration and name the cluster. The cluster name will also be added to the database when all services are running.

ClusterName=hangar_hpc
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
JobAcctGatherType=jobacct_gather/linux

Copy the completed /etc/slurm/slurm.conf file to all compute nodes.

Note: The “scontrol” utility is used to view and modify the running SLURM configuration and state across a cluster. Most changes in modified slurm.conf files distributed to cluster nodes can be implemented using the scontrol utility. Using the “reconfigure” argument the utility can force all daemons to re-read updated configuration files and modify runtime settings without requiring daemon restarts. Some configuration file changes, such as authentication, system roles, or ports, will require all daemons to be restarted.

Issue the following command on a system running slurmctld to reconfigure a cluster:

~# scontrol reconfigure

  • Modify service systemd configuration files to honour daemon dependencies.

 

SLURM requires munge to be running before any SLURM daemon loads, the database to be up before slurmdbd loads, and slurmctld requires slurmdbd to be running before it loads. Modify the systemd service files for SLURM daemons to ensure these dependencies are met.

Locally customized systemd files must be placed in the /etc/systemd/system directory.

~# cp /usr/lib/systemd/system/slurmctld.service /usr/lib/systemd/system/slurmdbd.service /etc/systemd/system/

Add the prerequisite services to the “After=” directive in the file /etc/systemd/system/slurmdbd.service:

[Unit]
Description=Slurm DBD accounting daemon
After=network.target mariadb.service munge.service
ConditionPathExists=/etc/slurm/slurm.conf

Add the prerequisite services to the “After=” directive in the file /etc/systemd/system/slurmctld.service:

[Unit]
Description=Slurm controller daemon
After=network.target slurmdbd.service munge.service
ConditionPathExists=/etc/slurm/slurm.conf

  • Enable the slurmdbd and slurmctld daemons to load at system start up, and then start them.

 

~# systemctl enable slurmdbd
~# systemctl enable slurmctld
~# systemctl start slurmdbd
~# systemctl start slurmctld

  • Name the cluster within the SLURM account database.

 

Use the SLURM account information utility, “sacctmgr”, to write to and read from the database.

~# sacctmgr add cluster hangar_hpc
~# sacctmgr list cluster
~# sacctmgr list configuration
~# sacctmgr list stats

Compute node systems

SLURM compute nodes are assigned to a job queue, in SLURM parlance called a partition, enabling them to receive work. Compute nodes ideally belong to partitions that align hardware with the type of compute work to be performed. The software required by a compute job can also dictate which partition in the cluster should be used for work.

Basic compute node deployment is, from a SLURM perspective, a straightforward task. Once the OS (a minimal pattern is again recommended) and the “HPC Basic Compute Node” pattern are deployed, it becomes a matter of completing the following tasks.

  • Open any required ports for the local firewall(s) as determined by daemon placement.

 

slurmd port:     6818

 

Note: It is recommended that local firewalls not be implemented on compute nodes. Compute nodes should rely on the host infrastructure to provide the security required.

  • Distribute the cluster specific /etc/munge/munge.key file to the node.
  • Distribute the cluster specific /etc/slurm/slurm.conf file to the node.

 

Note: The slurm.conf file specifies the partition that compute nodes belong to.

  • Modify service systemd configuration files to honour daemon dependencies.

 

Again, SLURM requires munge to be running before any daemon loads; specifically, munge needs to be running before slurmd loads. Modify the systemd service file for slurmd to ensure this dependency is met.

Locally customized systemd files must be placed in the /etc/systemd/system directory.

~# cp /usr/lib/systemd/system/slurmd.service /etc/systemd/system/

Add the prerequisite services to the “After=” directive in the file /etc/systemd/system/slurmd.service:

[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm/slurm.conf

  • Enable the slurmd daemon to load at system start up, and then start it.

 

~# systemctl enable slurmd
~# systemctl start slurmd

Taking the new cluster for a walk

A basic assessment of the state of the cluster is now possible because all daemons are configured and running. The “sinfo” utility is used to view information about SLURM nodes and partitions, and again the “scontrol” command is used to view and modify the SLURM configuration and state across a cluster.

The following commands are issued from the management node running slurmctld:

SLURM node info: assessing node states and information.

SLURM partition info: assessing partition states and information.

SLURM configuration info: assessing cluster configuration information.

SLURM maintenance commands: changing compute node states.
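These assessments map to commands along the following lines; the node name is the example used in the slurm.conf above:

```shell
# Node states and details, one line per node.
sinfo -Nl

# Partition states and node membership.
sinfo

# Dump the running cluster configuration.
scontrol show config

# Maintenance example: drain a node for service, then return it to use.
scontrol update NodeName=node1 State=DRAIN Reason="maintenance"
scontrol update NodeName=node1 State=RESUME
```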


Summary

What is detailed here could easily be applied to other open source distributions of both Linux and SLURM. It should also be said that this example is not intended to oversimplify what constitutes a proper production HPC cluster. Without even mentioning data and workflow design considerations, many standard HPC cluster system roles are not discussed. The short list would include high performance parallel file systems used for compute work operating over high speed interconnects, high capacity storage used as longer-term storage for completed compute work, applications (delivered traditionally or using containers), job submission nodes, and data transfer nodes, also using high speed interconnects. Hopefully this example serves as a basic SLURM tutorial and demonstrates how the SLE 15 based openSUSE distribution unifies software components into an easily deployable HPC cluster stack that will scale and run on existing x86 and emerging ARM based hardware platforms.

Consumability – What Does It Mean?

Friday, 13 July, 2018

Consumability, it’s an odd word that I like to toss around in regard to SUSE offerings in the market.  Given that it’s not well defined, I thought I’d try to help you understand what I mean when I say it.

Rather than being a narrowly defined term, I believe consumability encompasses a number of ideas and themes, especially when used in the context of enterprise customers. These themes include the ability to easily implement, manage, and receive support. I'll dive into each of these a bit and give some examples of how SUSE does it today.

When it comes to implementation, it's important to understand the genesis of many open source projects. They are started as an idea to scratch an itch the developer has. Consequently, when the project is good enough for their purpose, many developers will set it aside and go on to other things. I know that I am also guilty of doing this with scripts that I use regularly. The problem is that this represents only about 80% of the distance to being a tool that is easy to deploy, use, and support. Now, I know I'll probably warm up some haters here, but simple implementation != git clone and then fiddling around with config files. Enterprise businesses expect to be able to insert the media, click setup, answer a few questions, and have the application installed and working for their environment.

This is where packages come in and help with the Linux crowd. However, zypper in mypackagename may not be intuitive enough for some customers, and it still doesn't address the configuration of the packages. That's where a single interface for system work comes in.

For SUSE, this single management interface is YaST. With labels that aren’t cryptic for the functions provided and also being the one place that most all of your system administration can happen, this is about as easy as things get for managing a single system’s software load.  There are both text and graphical versions, thus meeting the needs of different sets of users and making it easy to implement new functionality.

While YaST is great for a single server environment, it’s definitely not the cat’s meow for doing things at a large scale.  When I think large scale, that’s where projects like OpenStack and Ceph come to mind.  These distributed environments can have dozens, hundreds, or even thousands of nodes.  Now, while YaST is a great tool, it’s not so great when you have 1000 systems to deploy.

This is where SUSE strikes again for the win.  Whether we’re talking about your data center, your retail environment, your private cloud, or distributed storage, SUSE has engineered a solution that makes it easy to roll it out.  For the data center and retail environments, we’ve got SUSE Manager, and it does quite a bit more than just deployment.  For distributed environments, we focus on how easy can we make it by balancing flexibility with, you guessed it, consumability.

This is also where SUSE brings the heat for OpenStack and Ceph customers. If I told you that it's possible to deploy a 5 PB, 100+ node open-source distributed storage cluster in under two hours, would you believe me? You should! SUSE has done just this for a large customer in Europe. And for OpenStack, just search for the results of the Rule The Stack competition. SUSE employees dominated it because we have a heavy focus on making our products easy to implement.

What about operational manageability? Well, I’ve already talked a bit about our management product for Linux datacenters, SUSE Manager. But outside of SUSE Manager, we have also built manageability into our individual products.  Again, YaST plays a role here, but we have done other things as well.  Take SUSE Enterprise Storage, for example.  We brought the OpenATTIC project and people to SUSE over a year ago and kicked in the afterburners.  Since then, OpenATTIC has morphed into the best-of-breed open-source management interface for Ceph. You can see some screenshots here.

The last aspect is supportability.  SUSE has always excelled here, with the highest customer satisfaction among commercial Linux vendors.  But supportability goes beyond our support people: it encompasses an enterprise’s ability to roll SUSE into its own support process and to provide an environment that is easy to add users to, easy to modify and administer day to day, and able to provide information to SUSE support when something goes wrong.  The SUSE Customer Center and our supportconfig tool help round out this part of the equation.

SUSE partners are also part of the equation here.  The SUSE partner ecosystem is one where we work hard to educate our partners on how to implement and support SUSE products.  We look for opportunities to integrate our products tightly and test them together.  And most importantly, we provide escalation paths appropriate to the partner business model.  This helps create the most supportable environments possible for our joint customers.

Supportability also includes thinking about the future and scaling.  This requires looking at the software architecturally to make sure it is solid.  It requires evaluating the community around the software and the maturity and leadership offered there.  It requires gazing into the crystal ball and thinking about the future and if the software solves a long-term problem or not.

So, in summary, consumability is really looking at the big picture and trying to cover all the bases.  It’s about ensuring a positive experience and the goal of taking open-source to that 100% complete mark.  It’s a way of thinking that puts the customer first in product development and support.  So the net is: it’s the SUSE way.


Microsoft Azure Site Recovery supports SUSE Linux Enterprise Server

Wednesday, 11 July, 2018

Microsoft Azure Site Recovery (ASR) now supports SUSE Linux Enterprise Server 11 SP3/SP4 and SUSE Linux Enterprise Server 12 SP1/SP2/SP3. This is great for customers that are planning to migrate systems to Microsoft Azure or customers who need to have a business continuity strategy for their Azure deployments.

Migration

Azure Site Recovery enables SUSE customers to migrate their non-Azure virtual machines or physical servers to Microsoft Azure virtual machines. ASR requires a Process Server, a Configuration Server and the installation of the Mobility Service on each of the source machines. The Mobility Service captures all data writes from memory and sends them to the Management Server, which caches, compresses and encrypts the data before transmitting it to Azure. The data is securely replicated and validated by the ASR services. Microsoft Azure Site Recovery supports many replication scenarios where the source system is not on Azure. For the supported scenarios, with documentation and tutorials, see Microsoft Azure Site Recovery Scenarios. Below is an image from Microsoft’s Physical to Azure Architecture scenario.
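To make that capture-compress-encrypt flow concrete, here is a toy sketch of the same pipeline shape in Python. This is purely illustrative and not the Mobility Service protocol: zlib stands in for the real compression, and a trivial XOR transform stands in for real encryption.

```python
import zlib

def process_write(block: bytes, key: int = 0x5A) -> bytes:
    """Toy stand-in for the pipeline: compress a captured data write,
    then obscure it with an XOR transform (NOT real encryption)."""
    compressed = zlib.compress(block)
    return bytes(b ^ key for b in compressed)

def restore_write(payload: bytes, key: int = 0x5A) -> bytes:
    """Reverse the toy pipeline on the receiving side."""
    return zlib.decompress(bytes(b ^ key for b in payload))

write = b"captured disk write " * 100   # highly compressible sample data
payload = process_write(write)
assert restore_write(payload) == write  # round-trip is lossless
print(len(write), "->", len(payload))   # payload is much smaller than the source
```

The point of the shape is that the source machine ships a small, protected payload rather than raw writes; the real service adds caching, batching and validation on top.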

Business Continuity

Another challenge customers face is how to provide disaster recovery for workloads on Microsoft Azure. ASR makes it easy for SUSE virtual machines running in one Azure region to fail over to a different Azure region for business continuity. Using native Azure services such as resource groups and storage accounts, ASR provides flexible failover plans and configurable RPOs. Additionally, ASR customers can use a cold-site disaster recovery strategy to save money. In other words, the replication target is not a set of running stand-by servers incurring charges; it is a Recovery Services vault that contains copies of the data and configuration information of the customer’s protected virtual machines. This means you only have to bring up the virtual machines to test your disaster recovery plan or in the event of an actual disaster.
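The cost advantage of a cold site can be sketched with back-of-the-envelope arithmetic. All prices and sizes below are made-up placeholders, not Azure rates:

```python
# Hypothetical monthly prices -- placeholders, not actual Azure pricing.
VM_PRICE = 150.00        # per warm stand-by VM, per month
STORAGE_PRICE = 0.05     # per GiB kept in the recovery vault, per month

def warm_standby_cost(vms: int) -> float:
    """Warm site: stand-by VMs run (and bill) continuously."""
    return vms * VM_PRICE

def cold_site_cost(replicated_gib: float) -> float:
    """Cold site: only the replicated data accrues charges; VMs are
    created (and billed) only during a test or a real failover."""
    return replicated_gib * STORAGE_PRICE

# 10 protected VMs with ~100 GiB of replicated data each:
print(warm_standby_cost(10))       # 1500.0
print(cold_site_cost(10 * 100))    # 50.0
```

Even with these rough placeholder numbers, the standing cost of replicated data is a small fraction of keeping equivalent stand-by servers running.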

Conclusion

As you read this post, you might be asking yourself: is this production ready, and who is using it? Luckily, we have an answer. Daimler has entrusted its global procurement system to the clustering capability of SUSE Linux Enterprise Server for SAP Applications and to Azure Site Recovery technologies to meet its business continuity SLAs. To read more about Daimler’s story, see the link: Daimler – Microsoft Customer Story.

If you are a SUSE customer and are interested in how you can benefit from Site Recovery, reach out to your SUSE account team or email azure@suse.com!

Making the Most of SAP S/4HANA: Committing to a Platform

Thursday, 28 June, 2018

You’ve explored the range of possible solutions for SAP S/4HANA and have made some important decisions: getting a good understanding of your enterprise ecosystem and what data needs to be moved where; opting to work with an SAP migration consultant (or not); and selecting your preferred deployment model.

Because SAP S/4HANA only runs on the Linux operating system, your most critical choice in this phase is which Linux distribution to use—such as SUSE® Linux Enterprise Server for SAP Applications. Not all Linux distributions are created equal. They can vary widely in terms of product quality, governance model, availability of added tools and features, and the level and quality of support.

To ensure you make the right choice for your business, keep the objectives below in mind when considering a platform for SAP S/4HANA.

Reducing Infrastructure Complexity

You chose SAP S/4HANA, SAP HANA and other SAP applications because they can greatly simplify operations and work together seamlessly. Your operating system should be just as easy to manage and run.

SUSE Linux Enterprise Server for SAP Applications works so smoothly with SAP S/4HANA and other SAP solutions because it was developed specifically for SAP. In fact, SUSE Linux Enterprise Server for SAP Applications was the first Linux distribution for SAP HANA and SAP S/4HANA—and SAP itself uses it as its development platform.

SUSE helps reduce complexity by automating routine tasks such as adding servers and deploying upgrades. It also includes an installation wizard with tuning parameters and an installation configuration package that allows you to quickly and accurately configure the operating system to support SAP applications. It simplifies integration with Microsoft environments by integrating with Active Directory and Remote Desktop Protocol, so IT can use existing Microsoft credentials to access SUSE systems.

SUSE Manager is an additional tool that allows you to manage all of your Linux systems in one place, and a SUSE Manager add-on is even available that integrates it with Microsoft System Center Operations Manager.

Getting the Tools, Services and Support You Need

A big differentiator between Linux distributions is the built-in features and added tool options that come with them. With decades of enterprise support and expertise behind it, SUSE Linux Enterprise Server for SAP Applications includes a wealth of tools that help you achieve your goals.

For instance, SUSE helps increase service availability with built-in business continuity features that include automated data recovery, resource agents that automate takeover in SAP HANA system replication setups, and full operating system rollback. A high-availability extension, certified for SAP NetWeaver, provides an integrated clustering solution for physical and virtual Linux deployments, allowing you to implement highly available Linux clusters and eliminate single points of failure.

When you choose SUSE Linux Enterprise Server for SAP Applications, you get a dedicated update channel that delivers operating system enhancements that correspond to SAP application updates. You also get priority support and maintenance 24 hours a day, seven days a week.

Finding a Partner That Delivers It All

Your Linux provider should have longtime experience in enterprise Linux and helping organizations overcome their challenges, especially when it comes to SAP migration. It should have a strong working relationship with SAP and should be continually evolving the operating system to keep up with SAP solution changes.

SUSE has worked with SAP for more than 20 years and our close relationship is evident in the fact that 95 percent of all SAP HANA installations run on SUSE Linux Enterprise Server for SAP Applications—including SAP’s own HANA installation. All of this is because SUSE Linux Enterprise Server for SAP Applications is designed to deliver the enterprise-level reliability, availability and scalability (RAS) capabilities today’s organizations require.

The Metrics that Matter: Horizontal Pod Autoscaling with Metrics Server

Tuesday, 26 June, 2018

Take a deep dive into Best Practices in Kubernetes Networking
From overlay networking and SSL to ingress controllers and network security policies, we’ve seen many users get hung up on Kubernetes networking challenges. In this video recording, we dive into Kubernetes networking, and discuss best practices for a wide variety of deployment options.

Sometimes I feel that those of us with a bent toward distributed systems engineering like pain. Building distributed systems is hard. Every organization, regardless of industry, is not only looking to solve its business problems, but to do so at potentially massive scale. On top of the challenges that come with scale, they are also concerned with creating new features and avoiding regression. And even if they achieve all of those objectives with excellence, there are still concerns about information security, regulatory compliance, and delivering value on all of the investment the business has made.

If that picture sounds like your team and your system is now in production – congratulations! You’ve survived round 1.

Regardless of your best attempts to build a great system, sometimes life happens. There are lots of examples of this. A great product, or viral adoption, may bring unprecedented success, and with it the end of your assumptions about how your system would handle scale.

Pokémon GO Cloud Datastore transactions per second, expected vs. actual (source: Bringing Pokémon GO to life on Google Cloud, pulled 30 May 2018)

You know this may happen, and you should be prepared. That’s what this series of posts is about. Over the course of this series we’re going to cover the things you should be tracking, why you should track them, and possible mitigations for their root causes.

We’ll walk through each metric, methods for tracking it and things you can do about it. We’ll be using different tools for gathering and analyzing this data. We won’t be diving into too many details, but we’ll have links so you can learn more. Without further ado, let’s get started.

Metrics are for Monitoring, and More

These posts are focused on monitoring and running Kubernetes clusters. Logs are great, but at scale they are more useful for post-mortem analysis than for alerting operators to a growing problem. Metrics Server allows for the monitoring of container CPU and memory usage, as well as that of the nodes the containers run on.

This allows operators to set and monitor KPIs (Key Performance Indicators). These operator-defined levels give operations teams a way to determine when an application or node is unhealthy. This gives them all the data they need to see problems as they manifest.
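As a sketch of how such operator-defined KPIs might be evaluated against usage data collected from the Metrics API, here is a small Python example (the threshold names and pod figures are hypothetical):

```python
# Hypothetical KPI thresholds an operations team might define.
KPI_LIMITS = {"cpu_millicores": 800, "memory_mib": 1024}

def unhealthy_pods(usage_by_pod):
    """Return the pods whose reported usage exceeds any KPI limit.
    `usage_by_pod` maps pod name -> {"cpu_millicores": ..., "memory_mib": ...},
    e.g. as collected from the Metrics API."""
    return sorted(
        pod for pod, usage in usage_by_pod.items()
        if any(usage.get(metric, 0) > limit for metric, limit in KPI_LIMITS.items())
    )

sample = {
    "web-1": {"cpu_millicores": 950, "memory_mib": 512},   # CPU over limit
    "web-2": {"cpu_millicores": 300, "memory_mib": 2048},  # memory over limit
    "web-3": {"cpu_millicores": 100, "memory_mib": 256},   # healthy
}
print(unhealthy_pods(sample))  # ['web-1', 'web-2']
```

In practice these checks live in a monitoring or alerting system rather than a script, but the logic is the same: compare reported usage to the levels the operations team has defined.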

In addition, Metrics Server enables Kubernetes to offer Horizontal Pod Autoscaling. This capability lets Kubernetes autoscaling adjust the pod instance count for a number of API objects based on the metrics that Metrics Server reports through the Kubernetes Metrics API.
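The scaling rule Horizontal Pod Autoscaling applies is documented as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured bounds. A minimal Python illustration (the function and parameter names are ours, not a Kubernetes API):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Sketch of the HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to the configured min/max replica bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# Pods averaging 200m CPU against a 100m target: double the replica count.
print(desired_replicas(4, current_metric=200, target_metric=100))  # 8
# Usage at target: no change.
print(desired_replicas(4, current_metric=100, target_metric=100))  # 4
```

The real controller adds damping, tolerance windows and cooldowns around this rule, but the proportional core is exactly this calculation.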

If you’re just getting underway with Kubernetes, read the Introduction to Kubernetes Monitoring, which will help you get the most out of the rest of this article.

Setting up Metrics Server in Rancher-Managed Kubernetes Clusters

Metrics Server became the standard for pulling container metrics starting with Kubernetes 1.8 by plugging into the Kubernetes Monitoring Architecture. Prior to this standardization, the default was Heapster, which has been deprecated in favor of Metrics Server.

Today, under normal circumstances, Metrics Server won’t run on a Kubernetes cluster provisioned by Rancher 2.0.2. This will be fixed in a later version of Rancher 2.0. Check our GitHub repo for the latest version of Rancher.

In order to make this work, you’ll have to modify the cluster definition via the Rancher Server API. Doing so will allow the Rancher Server to modify the Kubelet and KubeAPI arguments to include the flags required for Metrics Server to function properly.

Instructions for doing this on a Rancher-provisioned cluster, as well as instructions for modifying other hyperkube-based clusters, are available on GitHub here.


SUSE Introduces Multimodal OS to Bridge Traditional and Software-Defined Infrastructure

Monday, 25 June, 2018

SUSE today launched SUSE Linux Enterprise 15, the latest version of its flagship operating platform that bridges next-generation software-defined infrastructure with traditional infrastructure technologies. The modern, modular operating system helps simplify multimodal IT, makes traditional IT infrastructure more efficient and provides an engaging platform for developers. As a result, organizations can easily deploy and transition business-critical workloads across on-premise and public cloud environments.

#MultimodalOS - SUSE Linux Enterprise 15


Multimodal IT

As organizations around the world transform their enterprise systems to embrace modern and agile technologies, multiple infrastructures for different workloads and applications are needed. This often means integrating cloud-based platforms into enterprise systems, merging containerized development with traditional development, or combining legacy applications with microservices. To bridge traditional and software-defined infrastructure, SUSE has built a multimodal operating system – SUSE Linux Enterprise 15.


The platform uses a “common code base” to ensure application mobility across multimodal IT environments. Whether organizations build microservices using SUSE CaaS Platform, deploy the latest SAP applications on SUSE Linux Enterprise Server or use SUSE OpenStack Cloud to manage system resources, the common code base ensures consistency and helps them move application workloads transparently across traditional and software-defined infrastructure.

You can easily transition to or leverage public cloud – Amazon Web Services, Google Cloud Platform or Microsoft Azure – through SUSE Linux Enterprise Bring-Your-Own-Subscription (BYOS) programs. Additionally, SUSE Linux Enterprise 15 now provides a custom-tuned kernel for workloads on Microsoft Azure to enable faster boot speeds with a decreased memory footprint. The Azure-tuned kernel will also enable faster access to new and upcoming Azure features.

Multimodal OS Architecture

Modular+

With an architectural emphasis on building bridges, SUSE recognizes the need for organizations to protect current IT investments while transforming and modernizing their IT infrastructure. The SUSE Linux Enterprise 15 “Modular+” architecture addresses the challenges customers are facing when trying to innovate within existing, traditional IT infrastructure and make it more efficient. In Modular+ architecture, everything is a module, meaning SUSE can deliver product updates and patches more frequently. The modular approach lets customers install only the features they need, which simplifies planning and reduces risk.


Developer Friendly

SUSE Linux Enterprise 15 accelerates an enterprise’s transition from free developer subscription or community Linux (openSUSE Leap) setups to production deployments of fully supported enterprise Linux. Designed to be integrated into commonly used modern development methodologies such as DevOps and CI/CD, it gives users a faster time to market by leveraging open source technology, methods and expertise.

Some of the key enhancements for the SUSE Linux Enterprise 15 product family include:

  • SUSE Linux Enterprise High Performance Computing 15 is launched as a separate product to address the growing needs of the HPC market, with a comprehensive set of supported tools specifically designed for parallel computing environments, including workload and cluster management.
  • SUSE Linux Enterprise Server for SAP Applications. New version 15 capabilities include non-volatile dual in-line memory module (NVDIMM) support for disk-less databases and enhanced high availability features for IBM Power Systems. A new feature, “workload memory protection,” provides an open source-based, more-scalable solution to sustain high performance levels for SAP applications.
  • Starting with release 15, the High Availability Extension comes integrated with the Geo Clustering solution, so organizations can easily connect data centers across the world while maintaining a resilient and highly available infrastructure.

Note: SUSE Linux Enterprise 15 products will be generally available in mid-July, 2018.

Stay tuned @RajMeel7