Elasticity as a Service

This page explains how Elasticity as a Service (EaaS) has been implemented in the BonFIRE stack.

General concept

Elasticity as a Service is a feature that allows experimenters to run an elastic experiment. Elasticity means that it is possible to dynamically increase or decrease the number of computing resources to cope with different load situations.

The experimenter only needs to prepare an image that contains their service and configure the Elasticity Engine to use this disk during an elastic experiment. The Elasticity Engine uses the disk to create the virtual machines that serve user requests. To distribute the load among those virtual machines, two different kinds of load balancer are available: HAProxy and Kamailio (using the dispatcher module). These are sufficient for stateless load balancing of HTTP and SIP applications.

The EaaS in BonFIRE uses the monitoring aggregator to retrieve information about the load of the virtual machines. It interoperates with the broker to dynamically add and remove compute resources based on “rules” specified by the user. For more information about how to do this, please see the How To Use the BonFIRE EaaS page.

Figure 1: Elasticity as a Service architecture (elenArchitecture.png)

XML-RPC API

The Elasticity Engine offers an XML-RPC API which can be used by external clients to create and destroy an elastic group, to retrieve the list of running VMs, and to retrieve the load balancer.

public Service addAndInitGroup(Map<String, Object> vmgroup, Template template, ServiceType serviceType, List<Trigger> triggers) throws CloudException, ElenException, NotFoundException;
public int stopGroup(int id);
public List<VirtualMachineModel> listRunningVMs();
public LoadBalancer getLoadBalancer();
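
As a rough illustration, the sketch below shows how an external Java client could invoke two of these methods over XML-RPC using the Apache XML-RPC client library. It is only a sketch: the endpoint URL and the "ElasticityEngine" handler prefix are assumptions made for illustration and are not specified on this page.

import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class ElenClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and handler prefix; adjust to the actual deployment.
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://elasticity-engine.example:8080/xmlrpc"));

        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // Retrieve the list of running VMs (returned as generic XML-RPC structures).
        Object vms = client.execute("ElasticityEngine.listRunningVMs", new Object[] {});
        System.out.println("Running VMs: " + vms);

        // Stop the elastic group with a given id.
        Object result = client.execute("ElasticityEngine.stopGroup", new Object[] { 42 });
        System.out.println("stopGroup returned: " + result);
    }
}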

Resource Manager API Used

The Elasticity Engine interoperates with the resource manager using the Java client libraries. In particular, it needs the following APIs for the storage resources:

client.getStorage(storage_id);
client.listStorages(location_id);

For the network resources:

client.getNetwork(location_id);
client.listNetworks(location_id);

And for the compute resources:

c = e.createCompute(vm.getName(), s, n, cpu, vcpu, memory, host, cluster); // prepare the compute resource on a specific host and cluster
c = e.createCompute(vm.getName(), s, n, cpu, vcpu, memory); // or prepare it without specifying host and cluster
c.addUsage("zabbix-agent-elasticity"); // add usage parameter
client.storeObject(c); // for creating the compute resource
c.isActive(); // for checking if the compute resource is active
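
Put together, a single provisioning step could look roughly like the sketch below. The page does not name the concrete client classes, so BonfireClient, Experiment, Storage, Network and Compute are hypothetical stand-ins; only the calls listed above are taken from this documentation, and the return type of listNetworks is assumed to be a list.

// BonfireClient, Experiment, Storage, Network and Compute are hypothetical type names.
Compute provisionVm(BonfireClient client, Experiment e, String vmName,
                    String storageId, String locationId,
                    int cpu, int vcpu, int memory) throws Exception {
    Storage s = client.getStorage(storageId);             // disk image containing the experimenter's service
    Network n = client.listNetworks(locationId).get(0);   // e.g. pick the BonFIRE WAN network at this location
    Compute c = e.createCompute(vmName, s, n, cpu, vcpu, memory); // prepare the compute resource
    c.addUsage("zabbix-agent-elasticity");                // let the new VM report to the monitoring aggregator
    client.storeObject(c);                                // submit the request, i.e. create the compute resource
    while (!c.isActive()) {                               // poll until the new VM is running
        Thread.sleep(5000);
    }
    return c;
}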

ZABBIX API Used

For interoperating with Zabbix, the Elasticity Engine uses APIs which are XML-RPC based. In particular, it uses the following APIs for creating/removing host groups and retrieving information about hosts:

createHostGroup(String groupName);
deleteHostGroup(String groupName);
getTemplateIdByName(String template);
getHostIdByName(String name);

Additionally, it uses the APIs for retrieving items, their values, and the status of the triggers:

createItem(String key, String hostid, String period);
removeItem(String key);
getLastItemValueBean(String metricId, String valueType);
getItemIdByKey(String hostId, String key);
getValueTypeByItemId(String itemId);
createTrigger(Trigger trig, String vmName);
getTriggerId(String templateTriggerId, String vmName);
getTriggerId(String description);
getTriggerID(String description,String vmName);
checkTriggerStatus(String triggerId);
getGroupIdByName(String group);
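
For illustration, the sketch below wires a few of these calls into the flow described above: a host group is created for the elasticity group, a CPU item and a trigger are registered for each VM, and the trigger status is later polled. The zabbix object stands for a hypothetical facade exposing the methods listed above; return types and exact semantics beyond the signatures are assumptions.

// 'ZabbixApi' is a hypothetical name for the facade exposing the wrapper methods listed above.
void registerVm(ZabbixApi zabbix, String groupName, String vmName, Trigger trig) {
    // One host group per elasticity group (assumed to be created only once per group).
    if (zabbix.getGroupIdByName(groupName) == null) {
        zabbix.createHostGroup(groupName);
    }

    // Register a CPU item for the new VM and a trigger on top of it.
    String hostId = zabbix.getHostIdByName(vmName);
    zabbix.createItem("system.cpu.usage", hostId, "30");  // key, host id, polling period (assumed semantics)
    zabbix.createTrigger(trig, vmName);                   // e.g. an expression like {system.cpu.usage.last(0)}>70
}

boolean vmInProblemState(ZabbixApi zabbix, String description, String vmName) {
    String triggerId = zabbix.getTriggerID(description, vmName);
    return zabbix.checkTriggerStatus(triggerId);          // assumed to return true while the trigger is firing
}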

ZABBIX Message Queue

The trigger mechanism provided by Zabbix is used to scale resources up and down. During deployment, the Elasticity Engine configures a trigger for the resources that are part of the BonFIRE elasticity group. Zabbix evaluates the trigger expression periodically and, whenever it is true, executes certain actions. These actions can be configured by the administrator; in this case the message queue has been employed as the delivery mechanism. Zabbix executes a script which sends the following message into the message queue:

        description : {TRIGGER.NAME} - {STATUS}
        hostname : {HOSTNAME}
        lastValue : {{HOSTNAME}:{TRIGGER.KEY}.last(0)}
        triggerId : {TRIGGER.ID}
        triggerKey1 : {TRIGGER.KEY1}
        nodeId : {NODE.ID1}
        triggerComment : {TRIGGER.COMMENT}

This is the information sent by Zabbix when a trigger fires.
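
On the receiving side, the Elasticity Engine consumes these messages and parses the key : value lines. The sketch below shows what such a consumer could look like, assuming the BonFIRE message queue is an AMQP broker such as RabbitMQ; the broker host and queue name are placeholders, not values taken from this page.

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class TriggerMessageConsumer {

    // Split the "key : value" lines of the Zabbix action message into a map.
    static Map<String, String> parse(String message) {
        Map<String, String> fields = new HashMap<>();
        for (String line : message.split("\n")) {
            int sep = line.indexOf(':');
            if (sep > 0) {
                fields.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq.example.org");                 // placeholder broker host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        channel.basicConsume("elasticity.triggers", true,  // placeholder queue name, auto-ack
                new DefaultConsumer(channel) {
                    @Override
                    public void handleDelivery(String consumerTag, Envelope envelope,
                                               AMQP.BasicProperties properties, byte[] body) {
                        Map<String, String> fields = parse(new String(body, StandardCharsets.UTF_8));
                        System.out.println("Trigger " + fields.get("triggerId")
                                + " fired on host " + fields.get("hostname"));
                    }
                });
    }
}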

Implementation details

To create an elastic group it is basically necessary to configure some parameters using the contextualization of the Elasticity Engine virtual machine. These parameters provide the basic information that the Elasticity Engine will use during execution. This can be done in two different ways: using the portal or using the API.

Contextualization

To deploy an Elasticity Engine, it is just necessary to deploy a standard Debian compute resource and specify additional tags in the usage and context sections. In particular, the usage tag is elasticity-engine, while the content of the context section is:

<usage>zabbix-agent</usage>
<aggregator_ip>aggregator_ip</aggregator_ip>
<ELASTICITY_TRIGGER_UPSCALE_EXPRESSION>{system.cpu.usage.last(0)}>70</ELASTICITY_TRIGGER_UPSCALE_EXPRESSION>
<ELASTICITY_VMGROUP_MAX>5</ELASTICITY_VMGROUP_MAX>
<AGGREGATOR_USER>Admin</AGGREGATOR_USER>
<AGGREGATOR_PASSWD>zabbix</AGGREGATOR_PASSWD>
<ELASTICITY_INSTANCETYPE>lite</ELASTICITY_INSTANCETYPE>
<ELASTICITY_DISKSOURCE>disk_name</ELASTICITY_DISKSOURCE>
<ELASTICITY_LOCATIONS>fr-inria;uk-epcc;de-hlrs</ELASTICITY_LOCATIONS>
<ELASTICITY_NETWORK>BonFIRE WAN</ELASTICITY_NETWORK>
<ELASTICITY_LB_SCHEME>HAProxy</ELASTICITY_LB_SCHEME>
<ELASTICITY_VMGROUP_MIN>1</ELASTICITY_VMGROUP_MIN>
<ELASTICITY_LB_PORT>80</ELASTICITY_LB_PORT>
<ELASTICITY_LB_LOCATION>/locations/fr-inria</ELASTICITY_LB_LOCATION>
<ELASTICITY_VMGROUP_NAME>eaas</ELASTICITY_VMGROUP_NAME>
<usage>elasticity-engine</usage>
<ELASTICITY_CLUSTER>uuid</ELASTICITY_CLUSTER>

The information inserted in the context section of the request is parsed by the elasticity-engine script, which is executed at boot time. The script uses this information to create the deploymentModel.xml file, which is used directly by the Elasticity Engine.
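
As a rough illustration of that parsing step, the sketch below reads the contextualization variables and prints a few of them. It assumes the values end up as KEY=value lines in a file on the VM; the path /etc/default/bonfire and the exact format used here are assumptions. The real elasticity-engine boot script then turns these values into deploymentModel.xml.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class ContextReaderSketch {
    public static void main(String[] args) throws IOException {
        // Assumed location and KEY=value format of the contextualization variables on the VM.
        Map<String, String> ctx = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("/etc/default/bonfire"))) {
            int sep = line.indexOf('=');
            if (sep > 0) {
                ctx.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
            }
        }

        // Values that the boot script would copy into deploymentModel.xml.
        System.out.println("Group name : " + ctx.get("ELASTICITY_VMGROUP_NAME"));
        System.out.println("Disk       : " + ctx.get("ELASTICITY_DISKSOURCE"));
        System.out.println("Min/Max    : " + ctx.get("ELASTICITY_VMGROUP_MIN")
                + "/" + ctx.get("ELASTICITY_VMGROUP_MAX"));
    }
}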

Introducing new strategies

To change the actions taken during the scale-up and scale-down operations, it is necessary to create a new strategy class and substitute its name in the deploymentModel.xml file:

<trigger>
        <!-- Special unique name -->
        <name>upscale</name>
        <type>CPU</type>
        <expression>{system.cpu.usage.last(0)}>80</expression>
        <action>addVm</action>
        <strategy>
                <solver>de.fhg.fokus.elen.strategies.CPUupscaleTriggersProcessor</solver>
        </strategy>
</trigger>

In particular, it is necessary to replace de.fhg.fokus.elen.strategies.CPUupscaleTriggersProcessor with the new class name. This allows experimenters to change the algorithm executed when a specific alarm is triggered. For instance, in the previous case de.fhg.fokus.elen.strategies.CPUupscaleTriggersProcessor is executed when the CPU usage of a virtual machine exceeds 80%. Please note that, for a group of virtual machines, this action is executed only once, when the first virtual machine of the group crosses the threshold. The strategy has to be a Thread extending the TriggersResolver class. The following is an example of the run method, which is executed when Zabbix sends an alarm.

@Override
public void run() {

        parentTrigger.setInProcess(true); // ensures this action is executed only once when triggers for the same problem are received from different virtual machines of the same group
        do {
                try {
                        if (vmGroup.getLock().tryLock(2000, TimeUnit.MILLISECONDS)) {
                                try {
                                        try {
                                                log.debug("First attempt to resolve the PROBLEM for trigger with parent ID " + parentTrigger.getId());
                                                vmMesMan = MeasurementsManager.getVmMesMan();
                                                List<Trigger> triggers = TriggersContainer.getTriggersByParent(parentTrigger); // get the list of triggers which are part of the same group, identified by a parent id
                                                updateTriggersStatus(triggers); // contact zabbix for updating the trigger status. The status can be true or false.

                                                while (isResolved(triggers) != true) {
                                                        int recheckDelay = 5000;
                                                        int recheck = 60000;
                                                        if(groupInTriggerState()) // internal function for checking if the average of the group is over the threshold or not
                                                        {
                                                                log.debug("The Group is in trigger state, adding a new VM");
                                                                vmGroup.addVmFromPool();
                                                                log.debug("Wait for "+recheck/1000+" seconds and check again.");

                                                                try {
                                                                        Thread.sleep(recheck);
                                                                } catch (InterruptedException e) {
                                                                        log.debug("CPU check received an interrupt");
                                                                }
                                                        }
                                                        else
                                                        {
                                                                log.debug("The received trigger was only an alarm from a single VM, is not necessary to add a new VM in the group");
                                                                log.debug("Wait for " + recheckDelay / 1000+ " seconds and check again.");
                                                                try {
                                                                        Thread.sleep(recheckDelay);
                                                                } catch (InterruptedException e) {
                                                                        log.debug("CPU check received an interrupt");
                                                                }
                                                        }

                                                        triggers = TriggersContainer.getTriggersByParent(parentTrigger); // Done in case a new VM was added and a new trigger created
                                                        updateTriggersStatus(triggers);
                                                }
                                                log.debug("CPU problem solved!");
                                                Thread.sleep(60000);
                                                parentTrigger.setInProblemState(false);
                                                parentTrigger.setInProcess(false);

                                        } catch (Exception e) {
                                                log.error("CPU upscaling thread was interrupted due to an Exception", e);
                                        }
                                } finally {
                                        vmGroup.getLock().unlock();
                                }
                                break;
                        } else {
                                log.debug("Unable to lock the resource vmGroup!!!");
                        }
                } catch (InterruptedException e) {
                        log.error("The thread was interrupted while was getting the lock...");
                }

        }while (true);
}
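
For orientation, a new strategy class would therefore look roughly like the skeleton below. TriggersResolver itself is not documented on this page, so the constructor signature and the types of the parentTrigger and vmGroup fields (Trigger, VmGroup) are assumptions inferred from the run method above.

package de.fhg.fokus.elen.strategies; // package chosen to mirror the example solver above

// Skeleton of a custom strategy; the constructor and field types are assumptions.
public class MyUpscaleProcessor extends TriggersResolver {

    public MyUpscaleProcessor(Trigger parentTrigger, VmGroup vmGroup) {
        super(parentTrigger, vmGroup); // assumed to hand the parent trigger and VM group to the base class
    }

    @Override
    public void run() {
        // Custom scaling logic goes here, following the pattern of the run() method above:
        // mark the parent trigger as in process, lock the group, decide whether to add or
        // remove a VM, and release the lock once the problem is resolved.
    }
}

The fully qualified name of this class, here de.fhg.fokus.elen.strategies.MyUpscaleProcessor, is then placed in the <solver> element of the corresponding trigger in deploymentModel.xml.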