AMiCO

HELP: amico-troubles@fisica.unimi.it
MANAGEMENT: amico-resp@fisica.unimi.it
HTCONDOR: condor-troubles@mi.infn.it
ACCOUNT
JOBS SUBMISSION
CLUSTERS OVERVIEW
HOWTO
USE CASE
CLUSTERS DETAILS
Acknowledge the use of AMiCO

AMiCO is “Apparato MIlanese per il Calcolo Opportunistico”
Implemented by Dipartimento di Fisica and INFN, AMiCO is a project aiming to federate heterogeneous computing clusters at Physics Department of Università degli Studi and INFN Milano.
Clusters can

  • spill out their excess jobs to unused resources when they are overloaded
  • preserving the possibility for cluster owners to preempt alien jobs.

We use the starting and flocking features of HTCondor. We added support for:

  • dynamic slots
  • parallel scheduling
  • Docker jobs

For jobs needing data access while running outside their home cluster we provide:

  • a CEPH readable/writeable storage accessible via S3 on RADOS gateway
  • CVMFS, mounted on worker nodes and inside Docker containers.

ACCOUNT: requirement and howto request access

You need to have an “idefix” (INFN – Sezione di Milano) or an UNIMI account (@ unimi.it or @ studenti.unimi.it). An “idefix” account request must be forwarded to the INFN Administration: (on this web site) follow menu ACCOUNTING E POSTA, menu item RICHIESTA DI ACCOUNT.
To request access to submit jobs with HTCondor you can

  • contact one of the following Cluster managers:
    Leonardo Carminati
    Particle Physics –> David Rebatto
    Nuclear Physics –> Giovanna Benzoni
    Cosmology-LSS (Large-Scale Structure) –> Luigi Guzzo
    Theoretical Physics –> Alessandro Vicini / Marco Zaro
    Department’s cluster (“MAGI”) –> Franco Leveraro
  • Undergraduates and graduate students and guests without a research group manager can ask to amico-troubles@mi.infn.it (specifying email, role and research activities) an access authorization to the clusters “MAGI” (network point of entry: “gaspare”) with a user-quota of 50 GB and a storage quota on CEPH.

SUBMISSION of the JOBS:

TUTORIAL:
AMiCO: practical introduction di Francesco Prelz (INFN – MI)
AMiCO: special cases di Francesco Prelz (INFN – MI)
Shortly…
The computing resources are organised in a number of privately owned and operated computing clusters.
Inside each cluster, one “head node” is usually charged with co-ordinating the cluster, and sometimes also acts as a single network point of entry.
Other nodes in the cluster execute jobs (“execution nodes”). In the AMiCO’s infrastructure, all executing nodes in any cluster can communicate directly over the local area network.
In the TABLE below we’ll shortly take an overall look at the list of the available clusters. Just an idea about the computer power.
Nodes where jobs are submitted and queued are called “submit nodes”.
Typically, users who need to submit jobs share some interests with cluster owners, so they have priority access to some cluster.
Interactive execution and (possibly) various batch systems are used to organise the workload in each cluster.
Typically with less than 100% resource occupancy.
AMiCO wants to be friendly to local cluster owners, and will suspend, then migrate jobs, when local workload appears.
Current default policy:
Suspend after 2 minutes of local activity.
Vacate and migrate if the job cannot be restarted within 10 minutes.
An upper-tier service (or “Central Manager“, codename: superpool-cm) matching available computing resources with pending job requests can compensate load peaks across clusters and increase goodput.
The semantics of this resource sharing service is opportunistic: HTCondor is a specialised solution for this scenario.
If HTCondor is also used as a local cluster ‘batch system’, then local and AMiCO’s jobs can be handled in a uniform way.
This scenario cannot be serviced with any number of FIFO (first-in-first-out) queues.

Main characteristics of the AMICO’s federate CLUSTER

Pool nameNodesTotal memoryTotal cpusTotal diskMax memoryMax cpusMax disk
doraemon4201G602262G62G241074G
proof-pool91171G2401054G755G40323G
magi-pool3312G160406G125G64141G
teor-pool542989G308842982G125G1281815G
gamma-pool1126G322G126G322G

Pool nameResearch groupHead node name
(central manager)
doraemonCosmology-LSS
proof-poolHEP - ATLAS
magi-poolDepartment of Physicsgaspare
teor-poolTheoretical Physics
gamma-poolNuclear Physics

HOWTO:

  • 1) HOWTO AMICO
    SLIDE di Francesco Prelz (INFN – MI)
    1a) Conceptual introduction: Distributed computing and storage, Opportunistic computing Practical introduction: available tools, AMICO distributed storage (CEPH), AMICO distributed computing (HTCondor).
    Job examples: “Hello world”, File transfer via sandbox, Multiple/parametric job submission and control, File access via Object Storage, Script submission, Object Storage file staging, Interactive Jobs
    AMiCO: practical introduction
    1b) More complex cases: Common dependences and how to require them, Docker and HTCondor, Parallel jobs (MPI)
    AMiCO: special cases
  • 2) HOWTO HTCondor
    HTCondor – web site:
    http://research.cs.wisc.edu/htcondor/
    HTCondor – readthedocs:
    https//htcondor.readthedocs.io/en/latest/
    Howto submit, monitor and manage a job by Miguel Villaplana (Dip. di Fisica e INFN – MI): howto_condor.pdf (Sept. 2017 – download PDF)
  • 3) An overview: POSTER di David Rebatto (INFN – MI)
  •  

Use Case:

Clusters details:

  • CLUSTER GALILEO: page with some info HW / SW by LCM’s admins (LCM: “Laboratorio di Calcolo & Multimedia”, Department of Physics, UNIMI).

Acknowledge the use of AMiCO:

If you wish to acknowledge the use of AMiCO in a paper or report we suggest the following text:
We gratefully acknowledge the computing resources provided by AMiCO (http://amico.fisica.unimi.it, http://amico.mi.infn.it), an opportunistic resource cluster operated by the IT service of the Physics Department of Università degli Studi and INFN Milano – Italy.

AMiCO
PROJECT MANAGEMENT: Leonardo Carminati
TEAM: David Rebatto, Franco Leveraro, Francesco Prelz
Collaborators: Francesca Milanini
Previous collaborators: Paolo Salvestrini, Miguel Villaplana