Accounting Service

The aim of the Accounting Service is to gather all events for Experiments and for specific resource types (Computes, Storages, Networks, SiteLinks, and Federica Networks). Through a simple API this information can be requested and aggregated, allowing experimenters to see which resources were used by a particular experiment, user, or group. The information can be requested in JSON format or as XML.

The raw data, available in the Accounting DB, forms the basis of the weekly AC6 usage reports.

APIs provided

  • The Accounting Service (AS) provides the following REST API.

    This API is accessible via the Portal:

    • /accounting?path=/experiments/{experiment-id}

      Retrieves the used resources for the Experiment with ID {experiment-id}. It returns the following XML:

      <?xml version="1.0" encoding="UTF-8"?>
      <experiment id="/experiments/{id}">
        <sumCpu>{total number of CPU*seconds used in this Experiment}</sumCpu>
        <sumVcpu>{idem for VCPU*seconds}</sumVcpu>
        <sumMemory>{idem for memory (in MB)*seconds}</sumMemory>
        <sumStorage>
               {sum of all the storage sizes (in MB), both osimages and disks, used by the Computes
               in this Experiment, multiplied by the time each Compute was in use, or by the time
               up until now if the Compute is still in use}
        </sumStorage>
        <computes>
              <compute id="{full compute id}" inUse="true|false">
                <cpu>{number of CPU*seconds as defined for this compute}</cpu>
                <vcpu>{idem for VCPU*seconds}</vcpu>
                <memory>{idem for memory (in MB)*seconds}</memory>
                <sumStorage>{sum of all the storage sizes (in MB), both osimages and disks, used by this compute}</sumStorage>
                <createdTime>{epoch timestamp}</createdTime>
                <endTime>{epoch timestamp}</endTime>
                <storages>
                      <storage id="{full storage id}">
                        <size>{size of this storage (in MB)}</size>
                      </storage>
                      ...
                </storages>
              </compute>
              ...
        </computes>
      </experiment>
      
    • /accounting?path=/groups/{group_name}

      Get the used resources for a specific group ({group_name}) based on the experiments created by that group. It returns a collection of experiment elements:

      <collection>
        <experiment id="..."> ... </experiment>
        ...
      </collection>
      
    • /accounting?path=/users/{user_name}

      Get the used resources for a specific user ({user_name}) based on the experiments created by that user. Updates made by other users are also taken into account. It also returns a collection of experiment elements:

      <collection>
        <experiment id="..."> ... </experiment>
        ...
      </collection>
      
    • /accgraph?id={id_list}&start={start_time}&end={end_time}&period={period}&path={path}

      Get the used resources in JSON format. The resource types to include are selected with {id_list}, a comma-separated list with possible values C (include CPU), M (include Memory), and S (include Storage); e.g. id=S,M includes the values for Storage usage and Memory usage, but not CPU. The id parameter is optional; when it is omitted, all resource types are included. start and end are epoch timestamps that bound the returned values. The period, formatted as an XML Schema duration, sets the interval at which data samples are provided in the JSON response. The path is either /experiments/{experiment_id}, /user/{user_name}, or /group/{group_name}. As mentioned, the response is JSON-formatted text that follows Google’s Chart Tools Datasource Protocol, e.g.

      {"cols":[
              {"id":"t","label":"Time","type":"datetime","pattern":""},
              {"id":"S","label":"Storage","type":"number","pattern":""},
              {"id":"C","label":"CPU","type":"number","pattern":""},
              {"id":"M","label":"Memory","type":"number","pattern":""}],
       "rows":[
              {"c":[{"v":new Date(2013,8,13,7,37,37)},{"v":0.0},{"v":0.4845833333333333},{"v":248.10666666666665}]},
              {"c":[{"v":new Date(2013,8,13,8,37,37)},{"v":0.0},{"v":0.5},{"v":256.0}]}],
       "p":{"foo":"Bon","bar":"FIRE"}
      }
      
  • Not strictly speaking an API, but the Accounting Service regularly takes an SQL dump (see MySQL dump) of the Accounting DB. The resulting file is sent to a configurable list of e-mail addresses. Both the address list and the dump interval are configured via Puppet.
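As an illustration, the per-experiment XML returned by /accounting can be processed with a short script. The sample document below is a filled-in instance of the schema documented above; all identifiers and values in it are made up for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative response for /accounting?path=/experiments/{experiment-id}.
# All ids and numbers are hypothetical; the structure follows the schema above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<experiment id="/experiments/1234">
  <sumCpu>3600</sumCpu>
  <sumVcpu>7200</sumVcpu>
  <sumMemory>921600</sumMemory>
  <sumStorage>14400</sumStorage>
  <computes>
    <compute id="/locations/site-a/computes/42" inUse="false">
      <cpu>3600</cpu>
      <vcpu>7200</vcpu>
      <memory>921600</memory>
      <sumStorage>14400</sumStorage>
      <createdTime>1379050657</createdTime>
      <endTime>1379054257</endTime>
      <storages>
        <storage id="/locations/site-a/storages/7">
          <size>4</size>
        </storage>
      </storages>
    </compute>
  </computes>
</experiment>"""

def summarize(xml_text):
    """Return (experiment id, total CPU*seconds, list of compute ids)."""
    root = ET.fromstring(xml_text)
    sum_cpu = float(root.findtext("sumCpu"))
    computes = [c.get("id") for c in root.iter("compute")]
    return root.get("id"), sum_cpu, computes

exp_id, cpu, computes = summarize(SAMPLE)
```

In a real client the XML text would be fetched from the Portal endpoint instead of being inlined; the parsing stays the same.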
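Note that the /accgraph response is not strict JSON: per Google’s Chart Tools Datasource Protocol, datetime cells are emitted as JavaScript `new Date(...)` literals. A non-JavaScript client therefore has to pre-process the text before parsing it, as this sketch shows (the response text is the example given above):

```python
import json
import re

# The example /accgraph response from the documentation above. The
# 'new Date(...)' literals make this invalid as strict JSON.
RESPONSE = '''{"cols":[
        {"id":"t","label":"Time","type":"datetime","pattern":""},
        {"id":"S","label":"Storage","type":"number","pattern":""},
        {"id":"C","label":"CPU","type":"number","pattern":""},
        {"id":"M","label":"Memory","type":"number","pattern":""}],
 "rows":[
        {"c":[{"v":new Date(2013,8,13,7,37,37)},{"v":0.0},{"v":0.4845833333333333},{"v":248.10666666666665}]},
        {"c":[{"v":new Date(2013,8,13,8,37,37)},{"v":0.0},{"v":0.5},{"v":256.0}]}],
 "p":{"foo":"Bon","bar":"FIRE"}
}'''

def parse_datasource(text):
    """Parse a Chart Tools Datasource response into a Python dict,
    turning each 'new Date(a,b,...)' literal into a plain list [a, b, ...]."""
    text = re.sub(r"new Date\(([^)]*)\)", r"[\1]", text)
    return json.loads(text)

table = parse_datasource(RESPONSE)
labels = [col["label"] for col in table["cols"]]
```

The date arguments follow the JavaScript Date constructor, so the month is zero-based (8 means September).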

Source code location

All code is maintained in the BonFIRE SVN repository.

APIs used

The Accounting Service only uses the Resource Manager (RM)’s BonFIRE API to check for potential Zombie Experiments and Zombie Computes: Experiments (or Computes) that, according to the RM’s internal state, are finished or should have been finished.

Message queue use

The Accounting Service only reads from the MMQ, using its own credentials to connect to the ResourceUsage Exchange. Currently it listens to the following topics:
  • res-mng.experiment.#: All the Experiment events coming from the Resource Manager.
  • {site}.#: All the resource-specific events coming from site {site}.
  • sfa-adapt.#, abahn-adapt.#, aws-adapt.#: All the events coming from the Enactor adapters.
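To illustrate how these bindings select events, the sketch below implements AMQP topic matching (`#` matches zero or more dot-separated words, `*` matches exactly one) and checks hypothetical routing keys against binding patterns like the ones above; the site name `site-a` and the routing keys are made up.

```python
def topic_matches(binding, routing_key):
    """True if routing_key matches the AMQP topic binding pattern.
    '#' matches zero or more words, '*' exactly one word."""
    def match(bw, kw):
        if not bw:
            return not kw                      # both exhausted -> match
        if bw[0] == "#":
            # '#' may absorb any number of remaining words (including none).
            return any(match(bw[1:], kw[i:]) for i in range(len(kw) + 1))
        if not kw:
            return False                       # pattern left but key exhausted
        return (bw[0] == "*" or bw[0] == kw[0]) and match(bw[1:], kw[1:])
    return match(binding.split("."), routing_key.split("."))

# Hypothetical routing keys checked against the documented bindings.
hit_experiment = topic_matches("res-mng.experiment.#", "res-mng.experiment.create")
hit_site = topic_matches("site-a.#", "site-a.compute.state")
miss = topic_matches("res-mng.experiment.#", "site-a.compute.create")
```

This is the same matching a topic exchange performs server-side when the Accounting Service declares its bindings.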

Implementation details

The following picture shows how the Accounting Service is connected to the BonFIRE central entities it depends on.

[Figure: Accounting Service architecture (../_images/AS_arch.png)]
[Figure: Accounting Service data model (../_images/AS_data.png)]
When the MMQ consumer receives an event, it forwards the event to the appropriate handler, depending on whether it is a create, delete, or state event. These handlers are implemented using the Visitor pattern.
  • For a create event, a new resource or experiment record is created in the DB; all attributes are filled in except for the endTime. In addition, a resourceLog entry is written to the DB, recording the resource’s attributes and marking it as a create log.
  • Receiving a delete event only sets the endTime attribute in the corresponding experiment or resource record.
  • The state event handler maintains its own table (statelog). For each entity and for each state change, a log entry is kept. The state provided in the event is divided into a generic part (generic), which can be found in the MMQ section, and a site-specific part (site).
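The dispatch described in the three bullets above can be sketched as follows; the in-memory dictionaries stand in for the Accounting DB tables, and the record fields and event shapes are illustrative, not the service’s actual schema.

```python
# Simplified in-memory stand-ins for the resource, resourceLog and
# statelog tables of the Accounting DB.
resources, resource_log, state_log = {}, [], []

class EventHandler:
    """Visitor-style dispatch: one visit_* method per event kind."""

    def handle(self, event):
        # Route the event to the matching visit_* method by its kind.
        return getattr(self, "visit_" + event["kind"])(event)

    def visit_create(self, event):
        # All attributes are filled in except endTime, plus a create-log entry.
        resources[event["id"]] = {"attrs": event["attrs"],
                                  "createdTime": event["time"],
                                  "endTime": None}
        resource_log.append((event["id"], "create", event["attrs"]))

    def visit_delete(self, event):
        # A delete event only sets the endTime of the existing record.
        resources[event["id"]]["endTime"] = event["time"]

    def visit_state(self, event):
        # Each state change is logged, split into a generic part and a
        # site-specific part.
        state_log.append((event["id"], event["generic"], event["site"]))

handler = EventHandler()
handler.handle({"kind": "create", "id": "/computes/42",
                "attrs": {"cpu": 1}, "time": 1379050657})
handler.handle({"kind": "state", "id": "/computes/42",
                "generic": "RUNNING", "site": "ACTIVE"})
handler.handle({"kind": "delete", "id": "/computes/42", "time": 1379054257})
```

The single dispatch point keeps the consumer loop free of event-type conditionals, which is the motivation for the Visitor pattern here.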
As already mentioned, the Zombie Detection mechanism detects Experiments or Computes that should already have been deleted according to the Resource Manager’s internal state. It is unfortunately possible that at some point the termination events for an Experiment or Resource were not picked up by the Accounting Service. This can be caused by a malfunction of:
  • the Accounting Service (e.g. a disconnect from the MMQ, or a restart or crash of the service)
  • the sites not putting the event on the MMQ (e.g. caused by a manual overhaul of the system)
  • the MMQ itself (e.g. network problems, a restart of the service, etc.)

To at least flag these instances, we added the Zombie detection mechanism for both Experiments and Computes. It is scheduled to run at 2 AM for Experiments and at 3 AM for Computes; currently this schedule is hard-coded via Java annotations. If, for some reason, the service is down at these scheduled times, the mechanism is still executed once the service is restarted.

Generally speaking, Zombie detection simply compares the Accounting Service’s state of an entity against the state kept by the RM. For Experiment detection, for example, we do a GET on the RM for each Experiment that has no endTime. If, according to the RM, the Experiment is terminated, we have a Zombie Experiment, and we then fill in the endTime for that Experiment as accurately as possible. We do not change the Accounting DB Experiment’s state, because otherwise we would lose the knowledge that we are dealing with a Zombie Experiment; this information is stored in separate tables. Determining the endTime is easy, because the RM provides it in its GET response (unless the Experiment was not found in the RM’s DB, in which case we use the time the Zombie was detected as the endTime). For Computes this information is not stored in the RM’s history, so we always use the time of detection. We also record in the Accounting Service’s DB why a particular entity was flagged as a Zombie, since this can help in tracing back the origin of the Zombies.
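The detection logic described above can be sketched as follows. The RM lookup is stubbed out with a plain dictionary, and the record and response shapes are assumptions for illustration only.

```python
import time

def detect_zombie_experiments(accounting_db, get_from_rm, now=None):
    """Flag experiments that still have no endTime in the Accounting DB
    but that the Resource Manager considers terminated (or unknown)."""
    now = now or int(time.time())
    zombies = []
    for exp_id, record in accounting_db.items():
        if record["endTime"] is not None:
            continue  # closed normally; nothing to do
        rm_view = get_from_rm(exp_id)  # stands in for a GET on the RM
        if rm_view is None:
            # Not found in the RM's DB: use the detection time as endTime.
            zombies.append({"id": exp_id, "endTime": now,
                            "reason": "not found in RM"})
        elif rm_view["state"] == "terminated":
            # The RM's GET response carries the real end time.
            zombies.append({"id": exp_id, "endTime": rm_view["endTime"],
                            "reason": "terminated according to RM"})
    return zombies

# Hypothetical data: experiment 1 is still open in the Accounting DB,
# but the RM already terminated it; experiment 2 ended normally.
db = {"/experiments/1": {"endTime": None},
      "/experiments/2": {"endTime": 1379054257}}
rm = {"/experiments/1": {"state": "terminated", "endTime": 1379050000}}
found = detect_zombie_experiments(db, rm.get, now=1379060000)
```

As in the text, the zombie record carries both the recovered endTime and the reason it was flagged, while the original experiment record is left untouched.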
