Distributed Computation System SISIFO
What is SISIFO?
SISIFO is a client-server system designed to solve a problem using distributed computation. Conceptually similar to BOINC, Sisifo assigns tasks to a set of PCs, waits for the tasks to complete, and collects the results for further analysis. In contrast to BOINC, however (which requires a team of specialists working for weeks just to install and configure it), Sisifo is designed with simplicity as its main aim. The result is a system that requires almost no maintenance, needs very little configuration time, and can be deployed in just a couple of hours.
- Near-zero maintenance effort (no database, no web server)
- Near-zero installation effort (copy the files into a folder, configure some paths and run)
- Runs on multiple operating systems (currently available for Windows and Linux)
- Very low resource requirements (a single program that can run on a low-spec PC with an Ethernet connection)
- Not suitable for use with non-trusted clients
- Does not handle separate users, individual statistics, forums and other Web features
- Does not handle solution redundancy and matching
- Lacks a fancy interface
The "Client" is a program installed on one or several PCs. The Client connects to the Server and asks for a work packet. A work packet is composed of two elements: a "Problem" (a file containing the data that defines the problem to be solved) and an "Executor" (the executable file able to solve that problem). Once the work packet is received, the Client extracts the Executor and the Problem and calls the Executor, which reads the Problem data, solves the problem and stores a Solution. When the Executor finishes, the Client gathers the Solution and sends it back to the Server.
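The Client cycle described above can be sketched as follows. This is only an illustration, not Sisifo's actual code: the network exchange with the Server is left out, and the names `run_work_packet` and `FAKE_EXECUTOR`, the file name `000001.pro` and the one-hour timeout are assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in Executor (hypothetical): copies the first .pro file it finds,
# upper-cased, into a .sol file with the same name.
FAKE_EXECUTOR = [
    sys.executable, "-c",
    "import glob; p = glob.glob('*.pro')[0]; "
    "open(p[:-4] + '.sol', 'w').write(open(p).read().upper())",
]


def run_work_packet(problem_name, problem_data, executor_cmd, timeout=3600):
    """Unpack a Problem, run the Executor on it and return the Solution text.

    All parameters are hypothetical; the real Client receives the Problem
    and the Executor from the Server.
    """
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        # 1. Extract the Problem into the working folder.
        (work / problem_name).write_text(problem_data)
        # 2. Call the Executor and wait until it ends.
        subprocess.run(executor_cmd, cwd=work, timeout=timeout, check=False)
        # 3. Gather the Solution (same name, ".sol" extension), if any.
        solution = (work / problem_name).with_suffix(".sol")
        return solution.read_text() if solution.exists() else None
```

Calling `run_work_packet("000001.pro", "find x", FAKE_EXECUTOR)` runs the stand-in Executor in a temporary folder and returns the text it wrote to the solution file. A real Client would send that text back to the Server instead of returning it.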
The "Server" runs on a computer with a fixed IP address and listens for Client requests. The Server stores one or more Executors, a set of problems to be solved, and the solutions received. Each Server listens on one specific, configured TCP/IP port and can run concurrently with other Servers, so several Servers can be set up, each one listening on its own port and serving different problems. The typical configuration of the Server contains, among others, the following files:
The config.cfg file looks like this:
- PUERTO_DE_ESCUCHA (Listening Port)
The TCP/IP port on which the Server listens for Client requests.
- CADUCIDAD (Expiry time)
The time, in seconds, that the Server waits for a given Client to solve a problem and return the solution. If the Client fails to return the solution in that time, the Server re-sends the problem to a different Client. The right value depends on the problem to be solved.
- CARPETA_EJECUTOR_WIN32 (Folder of Executor for Win32)
This is the path to the Executor compiled for Win32 (or Win64) computers.
- CARPETA_EJECUTOR_LIN32 (Folder of Executor for Linux32)
Same as previous point, for Linux computers.
- RUTA_PROBLEMAS (Path to problems)
Path to the folder where the problems are stored (one problem per file). Problems still to be solved have the extension ".pro", problems sent to Clients whose solution is pending have the extension ".pen", and resolved problems whose solution has already been received have the extension ".res".
- RUTA_SOLUCIONES (Path to solutions)
Path to the folder where the results are stored. The solution file has the same name as the problem file but with the extension ".sol".
- ARCHIVO_REGISTRO (Log file)
Name of the file where the Server logs execution events (Client connections, time, date, data sent and received, etc.), usually for debugging purposes.
- ARCHIVO_TIEMPOS (Time file)
Name of the file where the Server stores the CPU time used by each Client to solve each problem, for statistical and debugging purposes.
- ARCHIVO_INFORME (Report file)
Name of the file where the Server stores historical data (total CPU time and processed units). This file is parsed by an auxiliary Monitor program described later.
- CRC_INCLUIDO_EN_ARCHIVO_DE_SOLUCION (Inclusion of CRC data in Solution file)
Indicates the type of checksum used in the validation of the received data.
0 -> No checksum verification of solution file
1 -> Binary checksum in solution file (deprecated)
2 -> ASCII checksum in solution file
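The way RUTA_PROBLEMAS, RUTA_SOLUCIONES and CADUCIDAD work together can be illustrated with a short sketch of the problem life cycle (".pro" to ".pen" to ".res"). It is not the Server's real code: the function names are hypothetical, and using the file modification time to record when a problem was dispatched is an assumption.

```python
import os
import tempfile
import time
from pathlib import Path


def dispatch_next(problems_dir):
    """Pick the next unsolved problem and mark it as pending (.pro -> .pen)."""
    for pro in sorted(Path(problems_dir).glob("*.pro")):
        pen = pro.with_suffix(".pen")
        pro.rename(pen)
        os.utime(pen)  # assumption: dispatch time kept as modification time
        return pen
    return None


def store_solution(pen, solutions_dir, solution_text):
    """Save the solution as .sol and mark the problem as resolved (.pen -> .res)."""
    sol = Path(solutions_dir) / (pen.stem + ".sol")
    sol.write_text(solution_text)
    pen.rename(pen.with_suffix(".res"))
    return sol


def expire_pending(problems_dir, caducidad, now=None):
    """Return pending problems older than `caducidad` seconds to the queue."""
    now = time.time() if now is None else now
    expired = []
    for pen in Path(problems_dir).glob("*.pen"):
        if now - pen.stat().st_mtime > caducidad:
            pro = pen.with_suffix(".pro")
            pen.rename(pro)  # it will be sent again to another Client
            expired.append(pro.name)
    return sorted(expired)
```

With this scheme the state of every problem is visible from its file extension alone, which fits the stated goal of running without a database.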
This optional file contains a list of IP addresses used to grant or ban access to specific computers. Its format is as follows:
The first parameter indicates whether the list grants or bans access. If yes, only the IPs listed are served and all others are rejected. If no, the IPs listed are rejected and all others are accepted.
The second parameter works in the same way, but uses the Client's ID instead of its IP.
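The access rule can be written down as a small sketch. It assumes that a granting list accepts only its entries, that a banning list rejects only its entries, and that a Client must pass both the IP check and the ID check. The last point is not stated in the text, and the function names are hypothetical.

```python
def is_accepted(value, listed, grant_mode):
    """Apply one access list to one value (an IP or a Client ID).

    grant_mode=True  -> only listed values are accepted.
    grant_mode=False -> listed values are rejected, all others accepted.
    """
    return (value in listed) if grant_mode else (value not in listed)


def client_accepted(ip, client_id, ip_list, ip_grant, id_list, id_grant):
    """Assumption: a Client is served only if both checks accept it."""
    return (is_accepted(ip, ip_list, ip_grant)
            and is_accepted(client_id, id_list, id_grant))
```

For example, with a granting IP list `{"192.0.2.10"}` and an empty banning ID list, only the Client at 192.0.2.10 is served, whatever its ID.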
This file is the Windows executable required to solve the problem.
The same for Linux.
This is the Sisifo Server executable. It runs in console mode; once started, it reads the config.cfg file to initialise itself and shows its status like this:
From left to right, the columns show the Client's IP, the Client's ID, problems sent, solutions received, expired problems, transmission errors, and other errors. The rightmost column is the time elapsed since the Client last communicated with the Server.
This is an auxiliary program used to show information about pending and solved tasks, as well as other maintenance operations.
CPU and RAM allocation considerations
Each Client uses one CPU core, so the more cores a PC has, the more Clients it can run at the same time. However, the RAM each problem requires must also be taken into account when deciding how many Clients to run on a specific PC.
In our own four-core PC cluster we have the following configuration:
- 4 Clients are configured to work in port A
- 2 Clients are configured to work in port B
- 1 Client is configured to work in port C
If the problems need little RAM and a PC can accommodate four at the same time, the Server at port A is used. If only two problems fit in a PC at the same time, the Server at port B is used. Finally, for problems so RAM-demanding that only one can run on a PC at a time, the Server at port C is used.
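The sizing rule behind this configuration can be expressed as a short calculation. The sketch below is only illustrative: the function names and the RAM figures are assumptions, and "A", "B" and "C" stand for the three ports of the example.

```python
def clients_per_pc(cores, ram_mb, ram_per_problem_mb):
    """Number of Clients a PC can run at once: limited by cores and by RAM."""
    if ram_per_problem_mb <= 0:
        raise ValueError("ram_per_problem_mb must be positive")
    return max(0, min(cores, ram_mb // ram_per_problem_mb))


def choose_port(cores, ram_mb, ram_per_problem_mb):
    """Map the result to the three Servers of the four-core cluster example."""
    n = clients_per_pc(cores, ram_mb, ram_per_problem_mb)
    if n >= 4:
        return "A"  # four Clients per PC
    if n >= 2:
        return "B"  # two Clients per PC
    if n >= 1:
        return "C"  # one Client per PC
    return None     # the problem does not fit in this PC at all
```

On a hypothetical four-core PC with 8192 MB of RAM, problems needing 1024 MB each would go to the Server at port A, problems needing 3000 MB to port B, and problems needing 6000 MB to port C.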
In order to allow Sisifo to handle an Executor application, that application has to use some conventions with respect to data input and output format.
Any project to be solved by Sisifo requires at least one executable file that accepts a problem file and writes the solution to that problem into another file. Up to two different executable files are allowed, so Windows and Linux Clients can be used at the same time.
Problems to solve. They have to be defined in a text file with name
where XXXXXX is a number from 000000 to 999999.
The content of the file has to be as follows:
Upon start, the Executor should look for any file with the .pro extension, open it, read it, solve the problem, generate the solution and store it in another file. The solution file usually has the same name as the problem file but with the ".sol" extension. The Executor should create this file at the very beginning of its run with the text "ERROR" inside, and overwrite it with the solution once it has been found. This way, if the Executor fails before finishing, the Server can detect that the solution is invalid and reject it. A typical solution file has this format:
- 'problema' is a variable containing the problem ID and its associated parameters. This is done to have together with a solution exactly the problem that produced it.
- 'solucion' is a variable containing the solution to the problem.
- 'auxiliaries' is a debug variable containing statistical information, such as executed rounds, RAM used, CPU time used, and so on.
- 'CHECKSUM' is an 8-character string containing the 32-bit checksum of the file contents.
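The Executor conventions above (the "ERROR" placeholder and the checksum) can be sketched as follows. Several details are assumptions rather than Sisifo's real format: the checksum is taken to be CRC-32 written as 8 hexadecimal digits on the last line, the `name = value` layout of the file is invented for the example, and the function names are hypothetical.

```python
import zlib
from pathlib import Path


def checksum_ascii(body):
    """32-bit checksum as an 8-character ASCII string (CRC-32 is an assumption)."""
    return format(zlib.crc32(body.encode("utf-8")) & 0xFFFFFFFF, "08X")


def solve_in_folder(folder, solve):
    """Executor side: find a .pro file, solve it and write the .sol file."""
    pro = sorted(Path(folder).glob("*.pro"))[0]
    sol = pro.with_suffix(".sol")
    sol.write_text("ERROR")  # placeholder written first, in case of a crash
    problem = pro.read_text()
    # Hypothetical layout: the problem, then the solution, then the checksum.
    body = f"problema = {problem}\nsolucion = {solve(problem)}\n"
    sol.write_text(body + checksum_ascii(body))  # overwrite once solved
    return sol


def solution_is_valid(text):
    """Server side: reject the placeholder and any file whose checksum fails."""
    if text.strip() == "ERROR":
        return False
    body, _, received = text.rpartition("\n")
    return received == checksum_ascii(body + "\n")
```

Storing the problem next to its solution, as the 'problema' variable does, means a solution file can be checked later without looking up the original problem file.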
Client watchdog feature
The Client has a watchdog to determine whether the Executor is alive or has crashed and must be killed. It relies on a file called "watchdog.txt". The Executor has to create this file periodically (its content does not matter), and the Client checks for it and deletes it. If the Client finds no watchdog file, it assumes that the Executor is not working and kills it.
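Both sides of the watchdog handshake fit in a few lines. This is a sketch of the mechanism described above, not the actual Client code; the function names are hypothetical, and the check interval is left to the caller.

```python
import tempfile
from pathlib import Path

WATCHDOG = "watchdog.txt"


def executor_heartbeat(workdir):
    """Executor side: (re)create the watchdog file; its content is irrelevant."""
    (Path(workdir) / WATCHDOG).write_text("alive")


def client_check(workdir):
    """Client side: True if a watchdog file appeared since the last check.

    The file is deleted so that the next check needs a fresh heartbeat.
    """
    try:
        (Path(workdir) / WATCHDOG).unlink()
        return True
    except FileNotFoundError:
        return False
```

For this to work, the Client has to check less often than the Executor writes the file. Otherwise a healthy Executor could be mistaken for a crashed one. When `client_check` returns False, the Client would kill the Executor process.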
Where can you get Sisifo?
Sisifo was designed and created by a team of researchers from the IMM and the UCM. The system, together with support for its deployment, is available on request from the following person:
Dr. Javier Villanueva
Email jvillanueva AT pdi DOT ucm DOT es
Phone number +34 91 809 92 00 extension 210
Universidad Complutense de Madrid - CES Felipe II
Aranjuez - Madrid - Spain