An overview of reservation

In R4.1, BonFIRE introduced a new system allowing users to reserve resources in advance to better plan and execute their experiments. This reservation system also changed how compute resource usage is accounted for, moving away from policing resource usage based on a maximum number of cores allowed at any point in time. Compute usage is instead accounted for in terms of how much is used over time.

Cloud providers commonly use core/hours to account for how much compute resource is used. Because BonFIRE comprises several heterogeneous sites, our system operates with a slightly different measurement: unit/hours. In principle this is the same kind of measurement, where a unit is the smallest reservation possible at a given site (for deploying a VM), which can be as low as 1/4 core. The resource accounting section below explains what this means, with examples.

BonFIRE remains flexible with the reservation system in place: users can choose to reserve resources in advance or, to use the system as before, have a reservation made automatically when creating an experiment on the fly. The sections below explain how resource accounting works and give details of using the reservation system.

Resource accounting

As a user of BonFIRE, it is important to understand how your resource quota works and how it relates to the reservation system. Each BonFIRE experiment belongs to a group with a quota for how much storage and compute resource it is allowed to use, as agreed in the application process. What’s important to note is that the accounting for storage and compute resources is different!

Storage accounting: storage resources are accounted based on the number of MB used. Both the OS storage of VMs (based on the VM image used) and additional storage resources are accounted for. Each experiment group will have a quota of maximum usage at any point in time (no temporal accounting). That is, if you have 10 VMs of 2GB each, this is 20GB of the quota used when they are running. However, when the VMs are undeployed, 0GB of the quota is used.

Compute accounting: compute resources are accounted based on the number of unit/hours used. A unit is the smallest reservation possible at a given site (for deploying a VM), which is as low as 1/4 core on most sites. Each experiment group will have a quota of a number of unit/hours allowed, and over time as VMs are running, the unit/hours will accumulate.

Although the reservation system in BonFIRE operates with unit/hours, it is not very different from core/hours. There is a simple conversion, as a unit is defined as 1/4 of a core in BonFIRE (the smallest amount that is possible to reserve). The amount of RAM per unit varies depending on the physical host, as seen below.


Definition of a unit per site:

Site              Host       Unit (CPU)  Unit (RAM)  Minimum reservable units
EPCC (uk-epcc)    vmhost0-1  1/4 core    475MB RAM   1
EPCC (uk-epcc)    vmhost2-6  1/4 core    300MB RAM   1
Inria (fr-inria)  all        1/4 core    672MB RAM   1
iMinds (be-ibbt)  all        1/4 core    512MB RAM   48
PSNC (pl-psnc)    all        1/4 core    256MB RAM   1

Note that at most sites you can reserve as little as 1 unit; the exception is iMinds (be-ibbt), where each VM is mapped to a physical host. Reservations at iMinds therefore have to be made in blocks of 48 units.

Let’s consider a few examples to understand how the unit system works!

Example 1: a compute with 1 core and 1GB RAM on PSNC takes up 4 units

Example 2: a compute with 1 core and 2GB RAM on PSNC takes up 8 units because of the increased RAM (1 unit is 256MB RAM)

Example 3: a compute with 2 cores and 2GB RAM on PSNC takes up 8 units as well

Example 4: a compute with 1 core and 1.5GB RAM on host vmhost3 at EPCC takes up 5 units (due to the low RAM per unit)

Example 5: a compute with 1 core and 1.5GB RAM on host vmhost0 at EPCC takes up 4 units

Example 6: a compute at iMinds takes 48 units in any case because of the mapping to physical hosts. You may still deploy a VM that would consume less virtual CPU and RAM, but the reservation system works on the host level.
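The examples above follow a simple rule: a VM needs the larger of its CPU-based units (4 units per core) and its RAM-based units (RAM divided by the host's RAM per unit, rounded up), and iMinds rounds up to whole hosts. A minimal Ruby sketch of this arithmetic (illustrative only, not BonFIRE code; `units_needed` is a made-up helper, and 1.5GB is treated as 1500MB, as Examples 4 and 5 imply):

```ruby
# Illustrative sketch of BonFIRE unit arithmetic -- not actual broker code.
# A unit is 1/4 core; RAM per unit depends on the host (see table above).
def units_needed(cores, ram_mb, unit_ram_mb, min_block: 1)
  cpu_units = cores * 4                        # 4 units per core
  ram_units = (ram_mb.to_f / unit_ram_mb).ceil # round RAM up to whole units
  units     = [cpu_units, ram_units].max
  # iMinds (be-ibbt) maps VMs to hosts: round up to blocks of 48 units.
  (units.to_f / min_block).ceil * min_block
end

units_needed(1, 1024, 256)               # Example 1: PSNC -> 4 units
units_needed(1, 2048, 256)               # Example 2: PSNC -> 8 units
units_needed(2, 2048, 256)               # Example 3: PSNC -> 8 units
units_needed(1, 1500, 300)               # Example 4: EPCC vmhost3 -> 5 units
units_needed(1, 1500, 475)               # Example 5: EPCC vmhost0 -> 4 units
units_needed(1, 512, 512, min_block: 48) # Example 6: iMinds -> 48 units
```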

As mentioned above, since R4.1 BonFIRE does not enforce any cap on the maximum number of cores you can use. It is up to the experimenter to manage their quota, and it is possible to reserve large experiments in advance provided there is free capacity. Many small-scale experiments can be run on a quota, for example, but with larger experiments one must be careful not to waste it. Take the following simple calculations as an example:

If you run 4 VMs of 1 core each for 8 hours a day for 3 days a week for 4 weeks, this is 384 core/hours – equivalent to 1,536 unit/hours.

If you run 50 VMs of 4 cores each for 24 hours a day for 3 days, this is 14,400 core/hours – equivalent to 57,600 unit/hours.
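The arithmetic behind these figures is simply VMs × cores × hours, with 4 units per core:

```ruby
# Worked arithmetic for the two quota examples above.
small = 4 * 1 * 8 * 3 * 4   # 4 VMs x 1 core x 8 h/day x 3 days/week x 4 weeks
large = 50 * 4 * 24 * 3     # 50 VMs x 4 cores x 24 h/day x 3 days
small                       # => 384 core/hours
small * 4                   # => 1536 unit/hours (4 units per core)
large                       # => 14400 core/hours
large * 4                   # => 57600 unit/hours
```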

Note that once a reservation is made and its start time has passed, it cannot be cancelled to reduce the impact on the quota, even if the actual experiment finishes early. This is to prevent the reservation system from being exploited by creating many advance reservations “just in case” without using them, which would inconvenience other experimenters.

Reservation system introduction

The reservation system is based on OAR.

Compute resources are modelled using the following hierarchy:

  • location (e.g. /locations/uk-epcc)
  • cluster (e.g. CoreBonfireCluster)
  • host (e.g. vmhost1)
  • allocation units of fixed chunks of core and memory

This hierarchy is visible from the OAR API, but it is not directly exposed through a BonFIRE API call. The principle in OAR is that you can query the list of resources (here, allocation units) and, with each resource, you get the ids of all elements higher up in the hierarchy as well as each property associated with that resource.

To create a reservation, a POST request must be sent to:

https://api.bonfire-project.eu/reservations

The body of the request has XML like:

<?xml version="1.0" encoding="UTF-8"?>
<reservation xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
  <group>myGroup</group>
  <name>myReservation</name>
  <description>optional description</description>
  <walltime>PT15H</walltime>
  <starttime>2012-11-30T14:00:00Z</starttime>
  <resources>
    <!-- Resources to be filled in here - see later -->
  </resources>
</reservation>

The <group> element specifies the group that is permitted to use the reservation. The user must belong to this group.

The <name> element specifies the name of the reservation. This is simply a label that can be used to look up the reservation if need be, or used to display the reservation in tools. It does not have to be a unique name.

The <description> element specifies a description of the reservation. This is optional. It is simply some text that tools can display to remind users of the aim of the reservation.

The <walltime> element specifies the duration of the reservation. It must either be an integer specifying the duration in seconds, or it can be specified using the XSD duration format (ISO8601). If the XSD format is used it can easily be detected, as the string will start with “P”.
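Since an integer walltime means seconds and an XSD duration starts with “P”, a client can normalise the two forms in a few lines of Ruby (a minimal sketch handling only PT…H/M/S durations; `walltime_seconds` is a made-up helper name, not part of any BonFIRE library):

```ruby
# Normalise a <walltime> value to seconds: an integer means seconds,
# a string starting with "P" is an XSD (ISO8601) duration.
def walltime_seconds(value)
  s = value.to_s
  return Integer(s) unless s.start_with?("P")
  m = s.match(/\APT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?\z/) or
    raise ArgumentError, "unsupported duration: #{s}"
  (m[1].to_i * 3600) + (m[2].to_i * 60) + m[3].to_i
end

walltime_seconds("PT15H")  # => 54000, the value echoed in the responses below
walltime_seconds(3600)     # => 3600
```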

The <starttime> element specifies a desired start time for the reservation. This is optional. If the reservation cannot start at this time then the attempt to create the reservation will fail. If no start time is specified then the reservation will start as soon as possible. The start time must be specified using the XSD dateTime format. If no timezone is specified, UTC is assumed.

The <resources> element describes the resources the user wishes to reserve. Compute resources can be described in the following ways:

  • BonFIRE instance types, including custom types
  • hosts, i.e. physical machines
  • cores

These resource types are described in more detail below. The initial response from the server will be:

201 CREATED:

<?xml version="1.0" encoding="UTF-8"?>
<reservation href="/reservations/1234" xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
  <id>1234</id>
  <status>submitted</status>
  <user>fred</user>
  <group>myGroup</group>
  <name>myReservation</name>
  <description>optional description</description>
  <walltime>54000</walltime>
  <starttime>2012-11-30T14:00:00Z</starttime>
  <resources>
    <!-- Resources to be filled in here - see later -->
  </resources>
</reservation>

Note that when the reservation is in the submitted state the <starttime> element is only returned if it was specified by the user at creation time.

Later, if the resources have been reserved successfully but the start time has not yet been reached, the reservation goes into the waiting state. In the waiting state the start time will be returned even if it was not specified by the user. Note that in the waiting state the start time is an estimate and may change.

For example:

<?xml version="1.0" encoding="UTF-8"?>
<reservation href="/reservations/1234" xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
  <id>1234</id>
  <status>waiting</status>
  <user>fred</user>
  <group>myGroup</group>
  <name>myReservation</name>
  <description>optional description</description>
  <walltime>54000</walltime>
  <starttime>2012-11-30T14:00:00Z</starttime>
  <resources>
    <!-- Resources to be filled in here - see later -->
  </resources>
</reservation>

When the reservation starts (i.e. the resources are now available for use) the status of the reservation changes to running:

<?xml version="1.0" encoding="UTF-8"?>
<reservation href="/reservations/1234" xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
  <id>1234</id>
  <status>running</status>
  <user>fred</user>
  <group>myGroup</group>
  <name>myReservation</name>
  <description>optional description</description>
  <walltime>54000</walltime>
  <starttime>2012-11-30T14:00:00Z</starttime>
  <resources>
    <!-- Resources to be filled in here - see later -->
  </resources>
</reservation>

The full state model of the reservation resource has the following states:

submitted, waiting, running, terminated, error

Apart from submitted, these correspond to the OAR states with the same names. The submitted state exists for reservations in the process of being posted to OAR. To cancel a reservation, send a DELETE request to the resource. BonFIRE will deny DELETE requests for reservations being used by any running experiment.

Reservations will not accept any UPDATE requests.

Instances

Instances are BonFIRE instance types. When reserving instances the caller MUST specify the location, the instance type name and the number of instances required:

<resources>
  <instance>
    <instance_type>lite</instance_type>
    <location href="/locations/uk-epcc"/>
    <count>2</count>
  </instance>
  <instance>
    <instance_type>small</instance_type>
    <location href="/locations/fr-inria"/>
    <count>3</count>
  </instance>
</resources>

This example requests a reservation of two lite instance types at EPCC and three small instance types at Inria.

When the reservation is in the waiting or running state details are returned:

<resources>
  <instance>
    <instance_type>lite</instance_type>
    <location href="/locations/uk-epcc" />
    <count>2</count>
  </instance>
  <instance>
    <instance_type>small</instance_type>
    <location href="/locations/fr-inria" />
    <count>3</count>
  </instance>
</resources>

When reserving instances, users may optionally specify the cluster and/or host that the instances are to be placed on. For example:

<instance>
  <instance_type>small</instance_type>
  <location href="/locations/fr-inria" />
  <cluster>cluster1</cluster>
  <count>3</count>
</instance>
<instance>
  <instance_type>small</instance_type>
  <location href="/locations/uk-epcc" />
  <host>vmhost1</host>
  <count>3</count>
</instance>
<instance>
  <instance_type>small</instance_type>
  <location href="/locations/uk-epcc" />
  <cluster>default</cluster>
  <host>vmhost1</host>
  <count>3</count>
</instance>

Hosts can be specified without specifying the cluster, because within a single location host names are unique.

The <cluster> and <host> elements can specify numbers rather than names. When numbers are used they specify the number of hosts or clusters that should be reserved for that instance type, i.e. the number of occurrences to create at each level of the hierarchy. Thus if the user specifies x clusters, y hosts and a count of z, they are specifying x*y*z instances: z instances on each host and y hosts on each of x clusters.

For example, to create two instances on different hosts:

<instance>
  <instance_type>small</instance_type>
  <location href="/locations/uk-epcc" />
  <host>2</host>
  <count>1</count>
</instance>

To create two instances on the same host:

<instance>
  <instance_type>small</instance_type>
  <location href="/locations/uk-epcc" />
  <host>1</host>
  <count>2</count>
</instance>

To create 6 instances over 2 clusters and 3 hosts on each cluster:

<instance>
  <instance_type>small</instance_type>
  <location href="/locations/uk-epcc" />
  <cluster>2</cluster>
  <host>3</host>
  <count>1</count>
</instance>
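The x*y*z rule behind these examples can be checked quickly (an illustrative helper, not part of the API):

```ruby
# Total instances implied by numeric <cluster>, <host> and <count> values:
# z instances on each of y hosts on each of x clusters.
def total_instances(clusters: 1, hosts: 1, count: 1)
  clusters * hosts * count
end

total_instances(hosts: 2, count: 1)               # => 2 (different hosts)
total_instances(hosts: 1, count: 2)               # => 2 (same host)
total_instances(clusters: 2, hosts: 3, count: 1)  # => 6 (last example)
```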

Custom instances

Custom instances are handled in much the same way as non-custom instances, except that the user must also specify the cpu and memory. For example:

<resources>
  <instance>
    <instance_type>custom</instance_type>
    <cpu>1</cpu>
    <memory>1024</memory>
    <location href="/locations/uk-epcc" />
    <count>2</count>
  </instance>
</resources>

The placement of custom instances on specific clusters or hosts or specifying that instances should be on the same or different hosts is handled in the same way as for other instances.

Hosts

To request the reservation of a complete physical host the user must specify the location and can optionally specify the cluster or the host. Examples follow. To reserve 3 hosts at Inria:

<resources>
  <host>
    <location href="/locations/fr-inria" />
    <count>3</count>
  </host>
</resources>

To reserve 3 hosts at Inria on a specific cluster:

<resources>
  <host>
    <location href="/locations/fr-inria" />
    <cluster>foo</cluster>
    <count>3</count>
  </host>
</resources>

To reserve 3 specific hosts at Inria:

<resources>
  <host>
    <location href="/locations/fr-inria" />
    <name>foo</name>
    <count>1</count>
  </host>
  <host>
    <location href="/locations/fr-inria" />
    <name>bar</name>
    <count>1</count>
  </host>
  <host>
    <location href="/locations/fr-inria" />
    <name>baz</name>
    <count>1</count>
  </host>
</resources>

Specifying that hosts should belong to the same or different clusters is handled in the same way as for instances. For example, to request 6 hosts, 3 on each of 2 clusters, the request would be:

<resources>
  <host>
    <location href="/locations/fr-inria" />
    <cluster>2</cluster>
    <count>3</count>
  </host>
</resources>

Cores

To request the reservation of cores, the user must specify the location and can optionally specify the cluster and the host. Examples follow.

To reserve 3 cores at Inria:

<resources>
  <core>
    <location href="/locations/fr-inria" />
    <count>3</count>
  </core>
</resources>

To reserve 3 cores at Inria on a specific cluster:

<resources>
  <core>
    <location href="/locations/fr-inria" />
    <cluster>foo</cluster>
    <count>3</count>
  </core>
</resources>

To reserve 1 core on each of specific hosts at Inria:

<resources>
  <core>
    <location href="/locations/fr-inria" />
    <host>foo</host>
    <count>1</count>
  </core>
  <core>
    <location href="/locations/fr-inria" />
    <host>bar</host>
    <count>1</count>
  </core>
  <core>
    <location href="/locations/fr-inria" />
    <host>baz</host>
    <count>1</count>
  </core>
</resources>

Specifying that cores should belong to the same or different clusters is handled in the same way as for instances. For example, to request 6 cores, 3 on each of 2 clusters, the request would be:

<resources>
  <core>
    <location href="/locations/fr-inria" />
    <cluster>2</cluster>
    <count>3</count>
  </core>
</resources>

Using a reservation

Creating an experiment

The first thing one can do with a reservation is create an experiment that will use the resources in the reservation. To do this users must specify the reservation when creating the experiment:

<experiment xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
  <name>My First Reserved Experiment </name>
  <reservation href="/reservations/524" />
  <description>Experiment description</description>
  <walltime>3600</walltime>
</experiment>

If the reservation is in the running state and the experiment walltime is not longer than the remaining reservation time, the experiment goes into the ready state (the current first state of an experiment). If the experiment walltime is longer than the remaining reservation time, the experiment creation MUST fail. If the reservation is in the waiting state, the experiment is created but itself enters the waiting state. If the reservation state is terminated or error, the creation of the experiment fails. Experiments may still be created without any reservation, in which case the experiment goes directly to the ready state, as is currently the case.
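These rules amount to a small decision table, sketched here in Ruby (illustrative pseudologic with made-up names, not actual broker code):

```ruby
# Initial experiment state as a function of its reservation, per the rules above.
# A reservation_status of nil means the experiment was created without a reservation.
def initial_experiment_state(reservation_status, remaining_secs, exp_walltime)
  case reservation_status
  when nil
    "ready"      # no reservation: straight to ready, as before
  when "running"
    raise "experiment walltime exceeds remaining reservation time" if exp_walltime > remaining_secs
    "ready"
  when "waiting"
    "waiting"    # experiment waits along with its reservation
  else           # terminated or error
    raise "cannot create experiment: reservation is #{reservation_status}"
  end
end

initial_experiment_state("running", 7200, 3600)  # => "ready"
initial_experiment_state("waiting", 0, 3600)     # => "waiting"
```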

Examples

Example using OCCI/curl

(change login:password and mygroup, adjust the date, and then check the reservation information):

curl -u "login:password" -kni https://api.bonfire-project.eu/reservations -H 'Content-Type: application/vnd.bonfire+xml' -X POST -d'<?xml version="1.0" encoding="UTF-8"?>
  <reservation xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
    <group>mygroup</group>
    <name>myReservation</name>
    <description>optional description</description>
    <walltime>3600</walltime>
    <starttime>2014-04-22T15:00:00Z</starttime>
    <resources>
      <instance>
        <instance_type>small</instance_type>
        <location href="/locations/fr-inria" />
        <count>1</count>
      </instance>
    </resources>
  </reservation>'

This will return the reservation ID number; we will assume it is 8210 in this example.

Check the reservation information:

curl -kni -u "login:password"  https://api.bonfire-project.eu/reservations/8210

Create an experiment with a reservation:

Please note that, due to the experiment lifecycle in BonFIRE, experiments require 10 minutes for clean-up at the end of the experiment, so an experiment walltime has to be 11 minutes shorter than the reservation walltime (1 minute for launching delay and 10 minutes for clean-up).

Delete a reservation:

curl -kni -u "login:password" https://api.bonfire-project.eu/reservations/8210 -X DELETE

Example using Restfully

This is an example of how to reserve and deploy a simple experiment using Restfully.

The first script, “create_reservation.rb”, creates a reservation starting 10 minutes after the script is launched and displays the bonfire-api answer.

The second script, “use_reservation.rb”, uses the reservation name given in the first script to create a simple experiment.

create_reservation.rb:

#!/usr/bin/env ruby

require 'restfully'
require 'restfully/addons/bonfire'
require 'pp'
require 'time'

RES_NAME        = "TestRes"
DESCRIPTION     = "Reservation - #{Time.now.to_s}"
GROUP           = "mygroup"
VM_INSTANCE     = "small"
VM_LOCATION     = "fr-inria"
RES_WALLTIME    = 3600   # ISO8601 time intervals are also accepted, example: "PT1H"
RES_STARTTIME   = (Time.now + 10*60).to_datetime.rfc3339   # Timezones are important

logger = Logger.new(STDOUT)
logger.level = Logger::WARN

session = Restfully::Session.new(
  :configuration_file => "~/.restfully/api.production.bonfire.grid5000.fr.yml",
  :gateway            => "ssh.bonfire.grid5000.fr",
  :keys               => ["~/.ssh/id_rsa"],
  :cache              => false,
  :logger             => logger
)

begin
  puts "\n--> Restfully script started at: #{Time.now}"
  puts "\n--> Create reservation starting at: #{RES_STARTTIME} for #{RES_WALLTIME}"
  # Create shortcuts for location resources:
  vm_location = session.root.locations[:"#{VM_LOCATION}"]
  fail "Can't select the location" if vm_location.nil?

  res = session.root.reservations.submit(
    :group        => GROUP,
    :name         => RES_NAME,
    :description  => DESCRIPTION,
    :walltime     => RES_WALLTIME,
    :starttime    => RES_STARTTIME,
    :resources    => [{
      :instance => {
        :instance_type => VM_INSTANCE,
        :location      => vm_location,
        :count         => 1,
      },
    }],
  )

  puts "\n--> Reservation object:"
  pp res
  puts
end

The second script collects the starttime and walltime from the reservation using the provided reservation name, waits for the start time, and launches an experiment with a walltime reduced by 11 minutes (see the documentation above for details).

use_reservation.rb:

#!/usr/bin/env ruby

require 'restfully'
require 'restfully/addons/bonfire'
require 'pp'
require 'time'

RES_NAME        = "TestRes"
EXPERIMENT_NAME = "ExpWithRes_#{Time.now.to_i}"
DESCRIPTION     = "Experiment using Reservation - #{Time.now.to_s}"
GROUP           = "mygroup"
VM_IMAGE        = "BonFIRE Debian Squeeze v6"
VM_INSTANCE     = "small"
VM_WAN          = "BonFIRE WAN"
VM_LOCATION     = "fr-inria"
VM_NAME         = "VMTestRes"

logger = Logger.new(STDOUT)
logger.level = Logger::WARN

session = Restfully::Session.new(
  :configuration_file => "~/.restfully/api.production.bonfire.grid5000.fr.yml",
  :gateway            => "ssh.bonfire.grid5000.fr",
  :keys               => ["~/.ssh/id_rsa"],
  :cache              => false,
  :logger             => logger
)
exp = nil

begin
  # Create shortcuts for location resources:
  vm_location = session.root.locations[:"#{VM_LOCATION}"]
  fail "Can't select the location" if vm_location.nil?

  res = session.root.reservations[:"#{RES_NAME}"]

  puts "\n--> Reservation object:"
  pp res
  puts

  time_res = Time.parse(res["starttime"])

  puts "\n --> Waiting for scheduled time"
  sleep(time_res - Time.now)

  until ['Running', 'running'].include?(res.reload['status'])
    fail "the Reservation has failed" if res['status'] == 'Error'
    puts "\n Reservation is not ready yet ..."
    sleep 5
  end

  exp = session.root.experiments.submit(
    :name        => EXPERIMENT_NAME,
    :groups      => GROUP,
    :description => DESCRIPTION,
    :walltime    => res["walltime"].to_i - 11*60,  # removing 11 minutes to comply with BonFIRE lifecycle
    :reservation => res['id']
  )

  puts "\n--> Creating the compute at #{VM_LOCATION} ..."

  vm = exp.computes.submit(
    :name          => VM_NAME,
    :instance_type => VM_INSTANCE,
    :groups        => GROUP,
    :disk          => [{
      :storage => vm_location.storages.find{|s|
        s['name'] == VM_IMAGE
      },
      :type    => "OS"
    }],
    :nic           => [
      {:network => vm_location.networks.find{|n| n['name'] == VM_WAN}}
    ],
    :location      => vm_location
  )

  # Pass the experiment status to running.
  exp.update(:status => "running")

  # Wait until all VMs are ACTIVE or RUNNING and ssh-able.
  # Fail if one of them has FAILED.
  until ( ['RUNNING', 'ACTIVE'].include?(vm.reload['state']) && vm.ssh.accessible? )
    fail "the VM has failed" if vm['state'] == 'FAILED'
    puts "\n At least one of the computes is not ready. Waiting ..."
    sleep 20
  end

  puts "\n--> The computes are now READY!"
  puts "  #{vm['name']} IP: #{vm['nic'][0]['ip']}"
  puts "testing ssh"
  vm.ssh do |ssh|
    puts "\n Launching commands on #{vm['name']} ..."
    commands = [
      "echo Hello World",
    ]
    command_line = commands * " && "
    output = ssh.exec!(command_line)
    session.logger.warn output
  end
  puts "\n--> Done!"

  puts "\n--> Experiment object:"
  pp exp
  puts

  puts "\n--> VM object:"
  pp vm
  puts

  puts "\n--> Press 'Enter' to delete the experiment and exit ..."
  gets
  exp.delete unless exp.nil?
  puts "\n--> Press 'Enter' to delete the reservation and exit ..."
  gets
  res.delete unless res.nil?
end