Batch Processing with Spring Cloud Data Flow

1. Overview

In the first article of the series, we introduced Spring Cloud Data Flow’s architectural components and showed how to use it to create a streaming data pipeline.

In contrast to a stream pipeline, which processes an unbounded amount of data, a batch process makes it easy to create short-lived services that execute tasks on demand.

2. Local Data Flow Server and Shell

The Local Data Flow Server is the component responsible for deploying applications, while the Data Flow Shell lets us execute the DSL commands needed to interact with the server.

In the previous article, we used Spring Initializr to set up both of them as Spring Boot applications.

After we add the @EnableDataFlowServer annotation to the server’s main class and the @EnableDataFlowShell annotation to the shell’s main class, both are ready to be launched with:

mvn spring-boot:run

The server will boot up on port 9393, and the shell will be ready to interact with it from the prompt.
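Before switching to the shell, it can help to confirm that something is actually listening on port 9393. The following is a minimal sketch that uses only the JDK’s java.net.Socket; the class and method names (DataFlowServerCheck, isListening) are hypothetical, and a successful TCP connection only shows that the port is open, not that the Data Flow Server is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections
class DataFlowServerCheck {

    // Port the Local Data Flow Server boots on, as described above
    static final int DEFAULT_PORT = 9393;

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() throws if nothing is listening or the timeout expires
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", DEFAULT_PORT, 1000);
        System.out.println(up
            ? "Something is listening on port " + DEFAULT_PORT
            : "Nothing is listening on port " + DEFAULT_PORT);
    }
}
```

If the check fails, the server is either still starting or was configured with a different port.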

You can refer to the previous article for the details on how to obtain and use a Local Data Flow Server and its shell client.

3. The Batch Application

As with the server and the shell, we can use Spring Initializr to set up the skeleton of our Spring Boot batch application.

On the website, simply choose a Group and an Artifact name, then select Cloud Task from the dependencies search box.

Once this is done, click on the Generate Project button to start downloading the Maven artifact.

The artifact comes preconfigured with some basic code. Let’s see how to edit it to build our batch application.

3.1. Maven Dependencies

First of all, let’s add a couple of Maven dependencies. As this is a batch application, we need to import libraries from the Spring Batch project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

Also, because Spring Cloud Task uses a relational database to store the results of executed tasks, we need to add a dependency on an RDBMS driver:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
</dependency>

We’ve chosen the H2 in-memory database, which Spring Boot auto-configures as an embedded DataSource when it finds H2 on the classpath. This gives us a simple way to bootstrap development. However, in a production environment, you’ll want to configure your own DataSource.

Keep in mind that the artifact versions are inherited from Spring Boot’s parent pom.xml file.

3.2. Main Class

To enable the desired functionality, we add the @EnableTask and @EnableBatchProcessing annotations to the Spring Boot main class. These class-level annotations tell Spring Cloud Task and Spring Batch to bootstrap everything:

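import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;

@EnableTask
@EnableBatchProcessing
@SpringBootApplication
public class BatchJobApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchJobApplication.class, args);
    }
}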

3.3. Job Configuration

Lastly, let’s configure a job, in this case one that simply prints a String to a log file:

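import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobConfiguration {

    private static final Log logger
      = LogFactory.getLog(JobConfiguration.class);

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job job() {
        // A single-step job; the step runs one tasklet that logs a message
        return jobBuilderFactory.get("job")
          .start(stepBuilderFactory.get("jobStep1")
            .tasklet(new Tasklet() {

                @Override
                public RepeatStatus execute(StepContribution contribution,
                  ChunkContext chunkContext) throws Exception {

                    logger.info("Job was run");
                    // FINISHED tells Spring Batch not to call this tasklet again
                    return RepeatStatus.FINISHED;
                }
            }).build())
          .build();
    }
}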
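The tasklet returns RepeatStatus.FINISHED because Spring Batch keeps calling a tasklet for as long as it returns RepeatStatus.CONTINUABLE. The plain-Java model below illustrates that contract without any Spring dependency. SimpleTasklet, the local RepeatStatus enum and runStep are simplified, hypothetical stand-ins that only mimic Spring Batch’s names; the real step also manages transactions, the job repository and error handling.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for Spring Batch's RepeatStatus (not the real class)
enum RepeatStatus { CONTINUABLE, FINISHED }

// Simplified stand-in for the Tasklet interface, without contribution/context arguments
interface SimpleTasklet {
    RepeatStatus execute() throws Exception;
}

// Hypothetical model of how a tasklet step drives its tasklet
class TaskletStepModel {

    // Calls the tasklet until it reports FINISHED; returns how many times it ran
    static int runStep(SimpleTasklet tasklet) throws Exception {
        int invocations = 0;
        RepeatStatus status;
        do {
            status = tasklet.execute();
            invocations++;
        } while (status == RepeatStatus.CONTINUABLE);
        return invocations;
    }

    public static void main(String[] args) throws Exception {
        // Like the tasklet above: logs once and finishes immediately
        int once = runStep(() -> {
            System.out.println("Job was run");
            return RepeatStatus.FINISHED;
        });

        // A tasklet that asks to be called again until three calls have happened
        AtomicInteger remaining = new AtomicInteger(3);
        int thrice = runStep(() ->
            remaining.decrementAndGet() > 0 ? RepeatStatus.CONTINUABLE : RepeatStatus.FINISHED);

        System.out.println(once + " " + thrice); // prints "1 3"
    }
}
```

Returning CONTINUABLE is useful when a tasklet processes its work in slices; our job has a single unit of work, so it finishes on the first call.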

Details on how to configure and define a job are outside the scope of this article. For more information, you can see our Introduction to Spring Batch article.

Our application is now ready. Let’s install it in our local Maven repository. To do this, cd into the project’s root directory and run:

mvn clean install
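mvn clean install copies the packaged jar into the local Maven repository using Maven’s standard layout: each segment of the groupId becomes a directory, followed by directories for the artifactId and the version, and the file is named artifactId-version.jar. The sketch below computes that path; it assumes the default repository location ~/.m2/repository (a custom localRepository in settings.xml would change it), and LocalRepoPath and jarPath are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: where "mvn install" places a jar in the local repository
class LocalRepoPath {

    static Path jarPath(Path repositoryRoot, String groupId, String artifactId, String version) {
        Path path = repositoryRoot;
        // Each dot-separated segment of the groupId becomes its own directory
        for (String segment : groupId.split("\\.")) {
            path = path.resolve(segment);
        }
        return path.resolve(artifactId)
                   .resolve(version)
                   .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Assumes the default local repository location
        Path root = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(jarPath(root, "com.maixuanviet.spring.cloud", "batch-job", "0.0.1-SNAPSHOT"));
    }
}
```

Checking that this file exists is a quick way to confirm the install succeeded before registering the application.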

Now it’s time to register the application with the Data Flow Server.

4. Registering the Application

To register the application in the App Registry, we need to provide a unique name, an application type, and a URI that can be resolved to the app artifact.

Go to the Spring Cloud Data Flow Shell and issue the following command from the prompt:

app register --name batch-job --type task --uri maven://com.maixuanviet.spring.cloud:batch-job:jar:0.0.1-SNAPSHOT
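A maven:// URI follows the pattern maven://groupId:artifactId[:extension[:classifier]]:version, so the URI above names the group com.maixuanviet.spring.cloud, the artifact batch-job, the extension jar and the version 0.0.1-SNAPSHOT. The sketch below splits such a URI into its coordinates to make the format explicit; MavenUri is a hypothetical helper, not the resolver the Data Flow Server actually uses, and the default extension of jar for three-part URIs is an assumption of this sketch.

```java
// Hypothetical parser for maven://groupId:artifactId[:extension[:classifier]]:version
final class MavenUri {

    final String groupId;
    final String artifactId;
    final String extension;
    final String classifier; // null when the URI has no classifier
    final String version;

    private MavenUri(String groupId, String artifactId, String extension,
                     String classifier, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.extension = extension;
        this.classifier = classifier;
        this.version = version;
    }

    static MavenUri parse(String uri) {
        String prefix = "maven://";
        if (!uri.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a maven:// URI: " + uri);
        }
        String[] parts = uri.substring(prefix.length()).split(":");
        switch (parts.length) {
            case 3: // groupId:artifactId:version (extension assumed to default to jar)
                return new MavenUri(parts[0], parts[1], "jar", null, parts[2]);
            case 4: // groupId:artifactId:extension:version
                return new MavenUri(parts[0], parts[1], parts[2], null, parts[3]);
            case 5: // groupId:artifactId:extension:classifier:version
                return new MavenUri(parts[0], parts[1], parts[2], parts[3], parts[4]);
            default:
                throw new IllegalArgumentException("Unexpected number of coordinates: " + uri);
        }
    }

    public static void main(String[] args) {
        MavenUri uri = parse("maven://com.maixuanviet.spring.cloud:batch-job:jar:0.0.1-SNAPSHOT");
        System.out.println(uri.groupId + " / " + uri.artifactId + " / "
            + uri.extension + " / " + uri.version);
    }
}
```

The coordinates must match the groupId, artifactId and version in the project’s pom.xml; otherwise the server cannot resolve the artifact that mvn clean install placed in the local repository.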

5. Creating a Task

A task definition can be created using the command:

task create myjob --definition batch-job

This creates a new task named myjob that points to the previously registered batch-job application.

A listing of the current task definitions can be obtained using the command:

task list
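The task definition refers to the registered application by name: myjob is defined as batch-job, the app registered with type task in the previous step. The in-memory model below illustrates that relationship between registrations and definitions; AppRegistryModel and its methods are hypothetical and greatly simplified compared with the real server, which persists both, and rejecting a definition whose app is not registered as a task is the behavior of this model.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory model of "app register", "task create" and "task list"
class AppRegistryModel {

    // Registered apps: "type:name" -> artifact URI
    private final Map<String, String> apps = new LinkedHashMap<>();
    // Task definitions: task name -> definition (in this model, just an app name)
    private final Map<String, String> taskDefinitions = new LinkedHashMap<>();

    void register(String name, String type, String uri) {
        apps.put(type + ":" + name, uri);
    }

    void createTask(String taskName, String definition) {
        if (taskDefinitions.containsKey(taskName)) {
            throw new IllegalStateException("Task already exists: " + taskName);
        }
        // The definition must name an app registered with type "task"
        if (!apps.containsKey("task:" + definition)) {
            throw new IllegalArgumentException("No task app registered as: " + definition);
        }
        taskDefinitions.put(taskName, definition);
    }

    Set<String> listTasks() {
        return Collections.unmodifiableSet(taskDefinitions.keySet());
    }

    public static void main(String[] args) {
        AppRegistryModel server = new AppRegistryModel();
        server.register("batch-job", "task",
            "maven://com.maixuanviet.spring.cloud:batch-job:jar:0.0.1-SNAPSHOT");
        server.createTask("myjob", "batch-job");
        System.out.println(server.listTasks()); // prints [myjob]
    }
}
```

Keeping the app name and the task name separate is what allows one registered application to back several task definitions.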

6. Launching a Task

To launch a task, we can use the command:

task launch myjob

Once the task is launched, its state is stored in a relational database. We can check the status of our task executions with the command:

task execution list
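Each launch creates a new task execution, and Spring Cloud Task records details such as its start time, end time and exit code, which is what task execution list reports. The plain-Java model below mimics that bookkeeping around a Runnable. TaskExecutionModel and ExecutionRecord are hypothetical names, the exit-code rule (0 on success, 1 on an uncaught exception) is a simplifying assumption, and the real framework writes these records to the relational database instead of an in-memory list.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the execution bookkeeping behind "task execution list"
class TaskExecutionModel {

    static final class ExecutionRecord {
        final long id;
        final String taskName;
        final Instant startTime;
        final Instant endTime;
        final int exitCode;
        final String errorMessage; // null when the task succeeded

        ExecutionRecord(long id, String taskName, Instant startTime,
                        Instant endTime, int exitCode, String errorMessage) {
            this.id = id;
            this.taskName = taskName;
            this.startTime = startTime;
            this.endTime = endTime;
            this.exitCode = exitCode;
            this.errorMessage = errorMessage;
        }

        @Override
        public String toString() {
            return id + " " + taskName + " exitCode=" + exitCode;
        }
    }

    private final List<ExecutionRecord> executions = new ArrayList<>();
    private long nextId = 1;

    // Runs the task body and stores one record per launch
    ExecutionRecord launch(String taskName, Runnable body) {
        Instant start = Instant.now();
        int exitCode = 0;
        String error = null;
        try {
            body.run();
        } catch (RuntimeException e) {
            // Assumption of this model: any uncaught exception maps to exit code 1
            exitCode = 1;
            error = e.getMessage();
        }
        ExecutionRecord record =
            new ExecutionRecord(nextId++, taskName, start, Instant.now(), exitCode, error);
        executions.add(record);
        return record;
    }

    List<ExecutionRecord> executionList() {
        return Collections.unmodifiableList(executions);
    }

    public static void main(String[] args) {
        TaskExecutionModel model = new TaskExecutionModel();
        model.launch("myjob", () -> System.out.println("Job was run"));
        model.launch("myjob", () -> { throw new IllegalStateException("boom"); });
        model.executionList().forEach(System.out::println);
    }
}
```

Because every launch gets its own record, launching myjob twice produces two entries rather than overwriting the first one.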

7. Reviewing the Result

In this example, the job simply prints a string to a log file. The log files are located in the directory shown in the Data Flow Server’s log output.

To see the result, we can tail the log:

tail -f PATH_TO_LOG\spring-cloud-dataflow-2385233467298102321\myjob-1472827120414\myjob
[...] --- [main] o.s.batch.core.job.SimpleStepHandler: Executing step: [jobStep1]
[...] --- [main] o.b.spring.cloud.JobConfiguration: Job was run
[...] --- [main] o.s.b.c.l.support.SimpleJobLauncher:
  Job: [SimpleJob: [name=job]] completed with the following parameters: 
    [{}] and the following status: [COMPLETED]
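When the log is long, it can be easier to search it than to read it. The sketch below scans log lines for the SimpleJobLauncher summary shown above and extracts the final job status. JobStatusScanner is a hypothetical helper, and its regular expression assumes the exact "and the following status: [...]" wording in this output, which may differ between Spring Batch versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the last job status reported in a task log
class JobStatusScanner {

    // Assumes the "... and the following status: [STATUS]" wording shown above
    private static final Pattern STATUS =
        Pattern.compile("following status: \\[([A-Z_]+)\\]");

    static Optional<String> lastJobStatus(List<String> logLines) {
        String status = null;
        for (String line : logLines) {
            Matcher m = STATUS.matcher(line);
            if (m.find()) {
                status = m.group(1); // keep the latest match
            }
        }
        return Optional.ofNullable(status);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: JobStatusScanner <path-to-task-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(lastJobStatus(lines).orElse("no job status found"));
    }
}
```

Run against the myjob log above, it should report COMPLETED; any other value points to a failed or stopped job.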

8. Conclusion

In this article, we have shown how to handle batch processing with Spring Cloud Data Flow.

The example code can be found in the GitHub project.
