HELP: amico-troubles@fisica.unimi.it
MANAGEMENT: amico-resp@fisica.unimi.it
HTCONDOR: condor-troubles@mi.infn.it
AMiCO stands for “Apparato MIlanese per il Calcolo Opportunistico” (Milanese Apparatus for Opportunistic Computing).
Implemented by the Dipartimento di Fisica and INFN, AMiCO is a project that federates heterogeneous computing clusters at the Physics Department of the Università degli Studi di Milano and at INFN Milano.
Clusters can
- spill their excess jobs over to unused resources when they are overloaded
- keep the ability for their owners to preempt alien jobs.
We use the starting and flocking features of HTCondor. We added support for:
- dynamic slots
- parallel scheduling
- Docker jobs
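As an illustration of the Docker job support, here is a minimal sketch of a Python helper that renders an HTCondor submit description for the docker universe. The submit commands are standard HTCondor ones, but the helper name, the image, the executable and the resource requests are placeholders chosen for the example, not AMiCO defaults.

```python
def docker_submit_description(image, executable, arguments="", cpus=1, memory_mib=1024):
    """Render a minimal HTCondor submit description for a Docker-universe job.

    All values are illustrative; adapt them to the job and to the cluster.
    """
    lines = [
        "universe = docker",
        f"docker_image = {image}",          # image pulled on the execution node
        f"executable = {executable}",       # path inside the container
        f"arguments = {arguments}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
        f"request_cpus = {cpus}",
        f"request_memory = {memory_mib}",   # a bare number is read as MiB
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the result to a file and pass it to condor_submit.
    print(docker_submit_description("debian", "/bin/cat", "/etc/hosts"))
```

With dynamic slots, the `request_cpus` and `request_memory` values determine how large a slot is carved out for the job, so it is worth requesting only what the job needs.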
For jobs that need data access while running outside their home cluster we provide:
- read/write CEPH storage, accessible via S3 through the RADOS gateway
- CVMFS, mounted on worker nodes and inside Docker containers.
ACCOUNT: requirements and how to request access
You need an “idefix” account (INFN – Sezione di Milano) or a UNIMI account (@unimi.it or @studenti.unimi.it). Requests for an “idefix” account must be sent to the INFN Administration: on this web site, follow the menu ACCOUNTING E POSTA (Accounts and Mail), then the item RICHIESTA DI ACCOUNT (Account Request).
To request access to submit jobs with HTCondor you can
- contact one of the following Cluster managers:
Leonardo Carminati
Particle Physics –> David Rebatto
Nuclear Physics –> Giovanna Benzoni
Cosmology-LSS (Large-Scale Structure) –> Luigi Guzzo
Theoretical Physics –> Alessandro Vicini / Marco Zaro
Department’s cluster (“MAGI”) –> Franco Leveraro
- Undergraduate and graduate students, and guests without a research group manager, can write to amico-troubles@mi.infn.it (specifying email, role and research activities) to request authorization to access the “MAGI” cluster (network point of entry: “gaspare”), with a user quota of 50 GB and a storage quota on CEPH.
JOB SUBMISSION:
TUTORIAL:
AMiCO: practical introduction, by Francesco Prelz (INFN – MI)
AMiCO: special cases, by Francesco Prelz (INFN – MI)
In short…
The computing resources are organised in a number of privately owned and operated computing clusters.
Inside each cluster, one “head node” is usually charged with co-ordinating the cluster, and sometimes also acts as a single network point of entry.
Other nodes in the cluster execute jobs (“execution nodes”). In the AMiCO infrastructure, all execution nodes in any cluster can communicate directly over the local area network.
The TABLE below gives an overview of the available clusters and a rough idea of their computing power.
Nodes where jobs are submitted and queued are called “submit nodes”.
Typically, users who need to submit jobs share some interests with cluster owners, so they have priority access to one of the clusters.
Interactive execution and (possibly) various batch systems are used to organise the workload in each cluster, typically with less than 100% resource occupancy.
AMiCO wants to be friendly to local cluster owners, and will suspend, then migrate jobs, when local workload appears.
Current default policy:
Suspend after 2 minutes of local activity.
Vacate and migrate if the job cannot be restarted within 10 minutes.
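The default policy above can be sketched as a small decision function. This is only an illustration in plain Python, not the HTCondor configuration actually deployed; the function and constant names are hypothetical, and it assumes that the 10 minutes are counted from the moment the job is suspended.

```python
from enum import Enum

SUSPEND_AFTER_S = 2 * 60    # local activity tolerated before suspending
VACATE_AFTER_S = 10 * 60    # time a job may stay suspended before vacating


class Action(Enum):
    RUN = "run"
    SUSPEND = "suspend"
    VACATE = "vacate and migrate"


def next_action(local_activity_s, suspended_s):
    """Return what happens to an AMiCO job on a node with local activity.

    local_activity_s: seconds of continuous local activity on the node.
    suspended_s: seconds the job has already spent suspended (0 if running).
    """
    if local_activity_s < SUSPEND_AFTER_S:
        return Action.RUN        # short bursts of local work are tolerated
    if suspended_s < VACATE_AFTER_S:
        return Action.SUSPEND    # wait for the local workload to go away
    return Action.VACATE         # give up on this node and migrate the job


if __name__ == "__main__":
    for activity, suspended in [(30, 0), (120, 0), (900, 780)]:
        print(activity, suspended, next_action(activity, suspended).value)
```

Jobs that may be vacated should be able to restart cleanly, for example by writing intermediate results at regular intervals.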
An upper-tier service (or “Central Manager”, codename: superpool-cm) matching available computing resources with pending job requests can compensate load peaks across clusters and increase goodput.
The semantics of this resource sharing service is opportunistic: HTCondor is a specialised solution for this scenario.
If HTCondor is also used as the local cluster ‘batch system’, then local and AMiCO jobs can be handled in a uniform way.
This scenario cannot be serviced with any number of FIFO (first-in-first-out) queues.
Main characteristics of the AMiCO federated clusters
Pool name | Nodes | Total memory | Total cpus | Total disk | Max memory | Max cpus | Max disk |
---|---|---|---|---|---|---|---|
doraemon | 4 | 201G | 60 | 2262G | 62G | 24 | 1074G |
proof-pool | 9 | 1171G | 240 | 1054G | 755G | 40 | 323G |
magi-pool | 3 | 312G | 160 | 406G | 125G | 64 | 141G |
teor-pool | 54 | 2989G | 3088 | 42982G | 125G | 128 | 1815G |
gamma-pool | 1 | 126G | 32 | 2G | 126G | 32 | 2G |
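To get a single figure for the overall computing power, the per-pool values in the table above can be summed. The short sketch below does this for nodes, memory and CPUs; the dictionary layout and function name are chosen for the example only, and the figures are copied from the table.

```python
# Figures copied from the table above (memory in G, as listed there).
POOLS = {
    "doraemon": {"nodes": 4, "memory_g": 201, "cpus": 60},
    "proof-pool": {"nodes": 9, "memory_g": 1171, "cpus": 240},
    "magi-pool": {"nodes": 3, "memory_g": 312, "cpus": 160},
    "teor-pool": {"nodes": 54, "memory_g": 2989, "cpus": 3088},
    "gamma-pool": {"nodes": 1, "memory_g": 126, "cpus": 32},
}


def federation_totals(pools):
    """Sum nodes, memory and CPUs over all pools."""
    keys = ("nodes", "memory_g", "cpus")
    return {key: sum(pool[key] for pool in pools.values()) for key in keys}


if __name__ == "__main__":
    print(federation_totals(POOLS))
    # {'nodes': 71, 'memory_g': 4799, 'cpus': 3580}
```

These totals are an upper bound for opportunistic use: how much is actually available at any time depends on the local workload of each cluster.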
Pool name | Research group | Head node name (central manager) |
---|---|---|
doraemon | Cosmology-LSS | |
proof-pool | HEP - ATLAS | |
magi-pool | Department of Physics | gaspare |
teor-pool | Theoretical Physics | |
gamma-pool | Nuclear Physics | |
HOWTO:
- 1) HOWTO AMiCO
SLIDES by Francesco Prelz (INFN – MI)
1a) Conceptual introduction: distributed computing and storage, opportunistic computing. Practical introduction: available tools, AMiCO distributed storage (CEPH), AMiCO distributed computing (HTCondor).
Job examples: “Hello world”, file transfer via sandbox, multiple/parametric job submission and control, file access via Object Storage, script submission, Object Storage file staging, interactive jobs.
AMiCO: practical introduction
1b) More complex cases: common dependencies and how to require them, Docker and HTCondor, parallel jobs (MPI).
AMiCO: special cases
- 2) HOWTO HTCondor
HTCondor – web site:
http://research.cs.wisc.edu/htcondor/
HTCondor – readthedocs:
https://htcondor.readthedocs.io/en/latest/
How to submit, monitor and manage a job, by Miguel Villaplana (Dip. di Fisica e INFN – MI): howto_condor.pdf (Sept. 2017 – download PDF)
- 3) An overview: POSTER by David Rebatto (INFN – MI)
Use Case:
- Guide for future users of the Milan computational cluster
(version 2, October 2019), courtesy of Enrico Ragusa and Benedetta Veronesi
Clusters details:
- CLUSTER GALILEO: page with some hardware and software information, maintained by the LCM admins (LCM: “Laboratorio di Calcolo & Multimedia”, Department of Physics, UNIMI).
Acknowledge the use of AMiCO:
If you wish to acknowledge the use of AMiCO in a paper or report we suggest the following text:
We gratefully acknowledge the computing resources provided by AMiCO (http://amico.fisica.unimi.it, http://amico.mi.infn.it), an opportunistic resource cluster operated by the IT service of the Physics Department of Università degli Studi and INFN Milano – Italy.
AMiCO PROJECT MANAGEMENT: Leonardo Carminati
TEAM: David Rebatto, Franco Leveraro, Francesco Prelz
Collaborators: Francesca Milanini
Previous collaborators: Paolo Salvestrini, Miguel Villaplana