Package titan.network

Class SchedulerServer

java.lang.Object
titan.network.SchedulerServer

public class SchedulerServer extends Object
The SchedulerServer class acts as the central network endpoint for the Titan distributed scheduling system. It listens for incoming client connections, processes various commands from workers and clients (e.g., registration, job submission, asset management, statistics requests), and delegates core scheduling logic to an associated Scheduler instance.

This server operates in a multithreaded manner, using an ExecutorService to handle each client connection concurrently, ensuring responsiveness and scalability.

  • Field Details

  • Constructor Details

    • SchedulerServer

      public SchedulerServer(int port, Scheduler scheduler) throws IOException
      Constructs a new SchedulerServer instance. Initializes the server to listen on the specified port and associates it with the provided Scheduler. A cached thread pool is created to manage client connections.
      Parameters:
      port - The port number on which the server will listen for incoming client connections.
      scheduler - The Scheduler instance responsible for managing workers, jobs, and system state.
      Throws:
      IOException - If an I/O error occurs when attempting to open the server socket on the specified port.
  • Method Details

    • start

      public void start()
      Starts the SchedulerServer, making it listen for incoming client connections. This method enters a continuous loop, accepting new client sockets and submitting them to the internal thread pool for asynchronous handling by the clientHandler(Socket) method. The server will continue to run until the stop() method is called or a critical error occurs. System messages are printed to indicate the server's listening status.
    • handleRegistration

      private String handleRegistration(Socket socket, String request)
      Handles a worker registration request from a client. This method parses the registration payload, extracts the worker's port, capabilities, and permanence status, and then registers the worker with the central Scheduler.
      Parameters:
      socket - The Socket object representing the client connection, used to determine the worker's host address.
      request - The payload string containing registration details, typically in the format "workerPort||capability||isPerm".
      Returns:
      A string indicating the outcome of the registration, e.g., "REGISTERED" on success or "ERROR_INVALID_REGISTRATION" on failure.
    • clientHandler

      public void clientHandler(Socket socket)
      Handles a single client connection from end-to-end. This method reads an incoming TitanProtocol.TitanPacket, dispatches it to the appropriate command processing method based on its operation code, and then sends a response TitanProtocol.TitanPacket back to the client. It uses DataInputStream and DataOutputStream for binary protocol communication. Errors during processing are caught, logged, and an error response is sent to the client.
      Parameters:
      socket - The Socket object representing the client connection to be handled.
    • handleDeployRequest

      private String handleDeployRequest(String fileName, String port, String requirement)
      Processes a request to deploy a file (e.g., a JAR or script) to a worker. It reads the specified file from the server's perm_files directory, encodes its content in Base64, and constructs a DEPLOY_PAYLOAD job. This job is then submitted to the Scheduler for execution on a suitable worker.
      Parameters:
      fileName - The name of the file to be deployed. This file must exist in the perm_files directory.
      port - The port number to associate with the deployed worker, primarily used when deploying a 'Worker.jar'. Can be null or empty.
      requirement - The capability requirement (e.g., "GPU", "GENERAL") for the worker that should execute this deployment job.
      Returns:
      A string indicating the status of the deployment job, such as "DEPLOY_QUEUED" or an error message if the file is not found or an I/O issue occurs.
    • handleRunScript

      private String handleRunScript(String fileName, String requirement)
      Handles a request to execute a script or an executable file on a worker. This method searches for the specified file recursively within the perm_files directory, encodes its content in Base64, and creates a RUN_PAYLOAD job. The job is then submitted to the Scheduler for execution on a worker matching the specified requirements.
      Parameters:
      fileName - The name of the script or executable file to be run.
      requirement - The capability requirement (e.g., "PYTHON", "GENERAL") for the worker that should execute this run job.
      Returns:
      A string indicating the status of the job, such as "JOB_QUEUED" along with the job ID, or an error message if the file is not found or an I/O issue occurs.
    • parseAndSubmitDAG

      private void parseAndSubmitDAG(String request)
      Parses a string containing multiple job definitions for a Directed Acyclic Graph (DAG) and submits each job to the Scheduler. Each job definition within the request string is expected to be separated by a semicolon (;). Jobs are parsed using Job.fromDagString(String).
      Parameters:
      request - A string containing one or more job definitions, typically in a format like "job1_def;job2_def;..." where each definition can be parsed into a Job object.
    • processCommand

      private String processCommand(TitanProtocol.TitanPacket packet)
      Dispatches incoming TitanProtocol.TitanPackets to the appropriate handler method based on their opCode. This method acts as the central command router for all operations received from clients, excluding initial worker registration. It supports a wide range of operations including deployment, script execution, system statistics, service management, job submission (single and DAG), worker control, logging, asset transfer, and key-value store interactions.
      Parameters:
      packet - The TitanProtocol.TitanPacket received from the client, containing the operation code and its payload.
      Returns:
      A response string to be sent back to the client, indicating the result of the command execution or an error message.
    • findFileRecursive

      private File findFileRecursive(String fileName)
      Searches for a specified file within the PERM_FILES_DIR and its subdirectories. This method first checks for a direct match in the root of PERM_FILES_DIR and then performs a recursive walk.
      Parameters:
      fileName - The name of the file to search for.
      Returns:
      A File object representing the found file, or null if the file is not found or an IOException occurs during the directory traversal.
    • stop

      public void stop()
      Shuts down the SchedulerServer gracefully. This method sets a flag to terminate the main server listening loop, shuts down the internal thread pool, and attempts to close the ServerSocket to release the bound port. Any IOException encountered during socket closure is caught and printed to the error stream.