AMiCO is “Apparato MIlanese per il Calcolo Opportunistico” (Milan Apparatus for Opportunistic Computing).
Implemented by the Dipartimento di Fisica and INFN, AMiCO is a project that aims to federate heterogeneous computing clusters at the Physics Department of the Università degli Studi di Milano and INFN Milano. The federation is designed to:
- let clusters spill their excess jobs over to unused resources when they are overloaded;
- preserve the possibility for cluster owners to preempt alien jobs.
- dynamic slots
- parallel scheduling
- Docker jobs
- CEPH read/write storage, accessible via S3 through the RADOS gateway
- CVMFS, mounted on worker nodes and inside Docker containers.
ACCOUNT: requirements and how to request access
You need an “idefix” account (INFN – Sezione di Milano) or a UNIMI account (@unimi.it or @studenti.unimi.it). A request for an “idefix” account must be forwarded to the INFN Administration: on this web site, follow the menu ACCOUNTING E POSTA, menu item RICHIESTA DI ACCOUNT.
To request access to submit jobs with HTCondor you can
- contact one of the following Cluster managers:
Physics of Matter –> Nicola Manini / Davide Galli
Cosmology-CMB (Cosmic Microwave Background radiation) –> Davide Maino
Cosmology-LSS (Large-Scale Structure) –> Luigi Guzzo
Theoretical Physics –> Alessandro Vicini
Department’s cluster (“MAGI”) –> Giovanni Onida
- Undergraduate and graduate students, and guests without a research group manager, can write to email@example.com (specifying email, role and research activities) to request authorization to access the “MAGI” cluster (network point of entry: “gaspare”), with a user quota of 50 GB and a storage quota on CEPH.
SUBMISSION of the JOBS: TUTORIAL:
AMiCO: practical introduction by Francesco Prelz (INFN – MI)
AMiCO: special cases by Francesco Prelz (INFN – MI)
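As a first taste of what a submission involves, the following is a minimal sketch of how an HTCondor submit description for a “Hello world” job, optionally queued several times, could be generated. It is not taken from the tutorials: the helper `make_submit_description`, the `tag` parameter and the file names are hypothetical, chosen only for illustration. The submit commands themselves (`executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Cluster)` and `$(Process)` macros are standard HTCondor.

```python
def make_submit_description(executable, arguments, n_jobs=1, tag="hello"):
    """Build the text of a simple HTCondor submit description.

    The helper and the output file naming scheme are illustrative
    assumptions, not an AMiCO convention.
    """
    lines = [
        f"executable = {executable}",
        f"arguments  = {arguments}",
        # $(Cluster) and $(Process) are expanded by HTCondor at submit time,
        # so each queued job gets its own output and error files.
        f"output     = {tag}.$(Cluster).$(Process).out",
        f"error      = {tag}.$(Cluster).$(Process).err",
        f"log        = {tag}.$(Cluster).log",
        # "queue N" submits N jobs, with $(Process) running from 0 to N-1.
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Three parametric jobs, each echoing its own process number.
    text = make_submit_description("/bin/echo", "Hello world $(Process)", n_jobs=3)
    with open("hello.sub", "w") as handle:
        handle.write(text)
    print(text)
```

On a submit node, the resulting file would then be handed to `condor_submit hello.sub`; the tutorials remain the reference for the options actually recommended on AMiCO.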
The computing resources are organised in a number of privately owned and operated computing clusters.
Inside each cluster, one “head node” is usually charged with co-ordinating the cluster, and sometimes also acts as a single network point of entry.
Other nodes in the cluster execute jobs (“execution nodes”). In AMiCO’s infrastructure, all execution nodes in any cluster can communicate directly over the local area network.
The TABLE below gives a brief overview of the available clusters, to give an idea of the available computing power.
Nodes where jobs are submitted and queued are called “submit nodes”.
Typically, users who need to submit jobs share some interests with cluster owners, so they have priority access to some cluster.
Interactive execution and (possibly) various batch systems are used to organise the workload in each cluster, typically with less than 100% resource occupancy.
AMiCO wants to be friendly to local cluster owners, and will suspend, then migrate jobs, when local workload appears.
Current default policy:
Suspend after 2 minutes of local activity.
Vacate and migrate if the job cannot be restarted within 10 minutes.
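The default policy above can be read as a small decision rule. The sketch below is only an illustration of that rule, not the HTCondor configuration actually deployed on AMiCO: the function `next_action`, its state names and its arguments are hypothetical, and it assumes that “cannot be restarted within 10 minutes” means the job has remained suspended for 10 minutes while local activity continues.

```python
SUSPEND_AFTER_S = 2 * 60   # local activity tolerated before suspending (2 minutes)
VACATE_AFTER_S = 10 * 60   # time a job may stay suspended before migrating (10 minutes)


def next_action(state, local_activity_s, suspended_s):
    """Decide what to do with an opportunistic job on a node.

    state            -- "running" or "suspended" (illustrative state names)
    local_activity_s -- seconds of continuous local (owner) activity, 0 if idle
    suspended_s      -- seconds the job has spent suspended so far
    """
    if state == "running":
        if local_activity_s >= SUSPEND_AFTER_S:
            return "suspend"
        return "keep_running"
    if state == "suspended":
        if local_activity_s == 0:
            # The owner's workload is gone: the job can restart in place.
            return "resume"
        if suspended_s >= VACATE_AFTER_S:
            # Still blocked after the grace period: free the node and
            # let the job be rescheduled elsewhere.
            return "vacate_and_migrate"
        return "stay_suspended"
    raise ValueError(f"unknown state: {state!r}")
```

For example, `next_action("running", 150, 0)` yields `"suspend"`, while `next_action("suspended", 300, 600)` yields `"vacate_and_migrate"`.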
An upper-tier service (or “Central Manager”, codename: superpool-cm) matching available computing resources with pending job requests can compensate load peaks across clusters and increase goodput.
The semantics of this resource sharing service is opportunistic: HTCondor is a specialised solution for this scenario.
If HTCondor is also used as a local cluster ‘batch system’, then local and AMiCO’s jobs can be handled in a uniform way.
This scenario cannot be serviced with any number of FIFO (first-in-first-out) queues.
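To see why independent per-cluster FIFO queues fall short, consider a toy version of what a Central Manager does: it looks at all pending jobs and all idle slots at once, serves each job on its home cluster when possible, and otherwise spills it over to any cluster with spare capacity. The sketch below is a deliberately simplified illustration: the function `match_jobs` and the cluster names are hypothetical, and real HTCondor matchmaking relies on ClassAds, priorities and preemption rather than on this two-pass loop.

```python
def match_jobs(pending_jobs, idle_slots):
    """Assign pending jobs to idle slots, preferring each job's home cluster.

    pending_jobs -- list of (job_id, home_cluster) tuples, in arrival order
    idle_slots   -- dict mapping cluster name to its number of idle slots
    Returns a dict job_id -> cluster; jobs that find no slot stay pending
    and are simply absent from the result.
    """
    free = dict(idle_slots)  # work on a copy, leave the caller's dict intact
    placement = {}

    # First pass: owners keep priority on their own cluster.
    for job_id, home in pending_jobs:
        if free.get(home, 0) > 0:
            free[home] -= 1
            placement[job_id] = home

    # Second pass: excess jobs spill over to any cluster with idle slots.
    for job_id, _home in pending_jobs:
        if job_id in placement:
            continue
        for cluster, count in free.items():
            if count > 0:
                free[cluster] -= 1
                placement[job_id] = cluster
                break

    return placement


if __name__ == "__main__":
    jobs = [("j1", "cluster_a"), ("j2", "cluster_a"), ("j3", "cluster_b")]
    slots = {"cluster_a": 1, "cluster_b": 2}
    print(match_jobs(jobs, slots))
```

In this example a FIFO queue local to `cluster_a` would leave `j2` waiting, whereas the cross-cluster match places it on the idle slot of `cluster_b`.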
Main characteristics of AMiCO’s federated CLUSTERS
|Pool name||Nodes||Total memory||Total cpus||Total disk||Max memory||Max cpus||Max disk|
|Pool name||Research group||Head node name|
|proof-pool||HEP - ATLAS|| |
|magi-pool||Department of Physics||gaspare|
- 1) HOWTO AMICO
SLIDES by Francesco Prelz (INFN – MI)
1a) Conceptual introduction: Distributed computing and storage, Opportunistic computing. Practical introduction: available tools, AMICO distributed storage (CEPH), AMICO distributed computing (HTCondor).
Job examples: “Hello world”, File transfer via sandbox, Multiple/parametric job submission and control, File access via Object Storage, Script submission, Object Storage file staging, Interactive Jobs
AMiCO: practical introduction
1b) More complex cases: Common dependencies and how to require them, Docker and HTCondor, Parallel jobs (MPI)
AMiCO: special cases
- 2) HOWTO HTCondor
HTCondor – web site:
HTCondor – readthedocs:
How to submit, monitor and manage a job, by Miguel Villaplana (Dip. di Fisica e INFN – MI): howto_condor.pdf (Sept. 2017 – download PDF)
- 3) An overview: POSTER by David Rebatto (INFN – MI)
- Guide for future users of the Milan computational cluster
(version 2, October 2019) by courtesy of Enrico Ragusa and Benedetta Veronesi
- CLUSTER ETSFMI: page about ETSFMI by Nicola Manini (Department of Physics – UNIMI)
- CLUSTER LAGRANGE: page about LAGRANGE by Paolo Salvestrini (CNR – MI)
- CLUSTER HEISENBERG: page with some HW/SW information about HEISENBERG, by the LCM admins (LCM: “Laboratorio di Calcolo & Multimedia”, Department of Physics, UNIMI).
PROJECT MANAGEMENT: Leonardo Carminati, Laura Perini
TEAM: Franco Leveraro, Francesco Prelz, David Rebatto, Paolo Salvestrini
Collaborators: Miguel Villaplana, Francesca Milanini