Arconia Docling

Arconia integrates with Docling, an AI-powered document conversion service that transforms documents into structured formats such as Markdown. The integration auto-configures a DoclingClient that Spring Boot applications can use to call a Docling server and convert PDFs, Word documents, web pages, and other document formats.

Quick Start

Let’s see how you can get started with Arconia Docling in your Spring Boot application.

Dependencies

To add Docling support to your Spring Boot application, include the Arconia Docling Spring Boot Starter dependency in your project.

  • Gradle

dependencies {
    implementation 'io.arconia:arconia-docling-spring-boot-starter'
}

  • Maven

<dependency>
    <groupId>io.arconia</groupId>
    <artifactId>arconia-docling-spring-boot-starter</artifactId>
</dependency>

Arconia publishes a BOM (Bill of Materials) for managing the versions of the Arconia libraries. Using it is optional but recommended: it keeps all Arconia dependencies on compatible versions and lets you omit the version from individual dependency declarations, as the snippets above do.

  • Gradle

dependencyManagement {
    imports {
        mavenBom "io.arconia:arconia-bom:0.15.0"
    }
}

  • Maven

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.arconia</groupId>
            <artifactId>arconia-bom</artifactId>
            <version>0.15.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Dev Services

Arconia Dev Services provide zero-code integrations for the services your application depends on, at both development and test time. They are built on Testcontainers and Spring Boot.

With the Docling Dev Service, a Docling server starts automatically during development and testing, so you can convert documents without setting up a server manually.

To enable the Docling Dev Service, add the following dependency to your project:

  • Gradle

dependencies {
    testAndDevelopmentOnly "io.arconia:arconia-dev-services-docling"
}

  • Maven

<dependency>
    <groupId>io.arconia</groupId>
    <artifactId>arconia-dev-services-docling</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>

By default, the Dev Service exposes the Docling UI. The application logs show the URL, including the assigned port, where you can access it:

... Docling UI: http://localhost:<port>/ui

Running the Application

When using the Arconia Dev Services, you can keep running your application as you normally would. The Dev Services will automatically start when you run your application.

  • CLI

arconia dev

  • Gradle

./gradlew bootRun

  • Maven

./mvnw spring-boot:run

Unlike the lower-level Testcontainers support in Spring Boot, Arconia requires neither special tasks to run your application with Dev Services (./gradlew bootTestRun or ./mvnw spring-boot:test-run) nor a separate @SpringBootApplication class for configuring Testcontainers.

The application logs will show you the URL where you can access the Docling Server UI for interactive document conversion.

Configuration

The Arconia Docling integration provides sensible defaults for connecting to a Docling server. You can customize the connection settings and timeouts via configuration properties.

Table 1. Docling Client Configuration Properties

Property                           Default                  Description
arconia.docling.url                http://localhost:5001    The base URL for the Docling server API.
arconia.docling.connect-timeout    5s                       Timeout to establish a connection to the Docling server.
arconia.docling.read-timeout       30s                      Timeout for receiving a response from the Docling server.
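
As an illustration of the properties above, the following application.properties fragment is a minimal sketch that points the client at a remote server and relaxes both timeouts. The host name is a placeholder, and the chosen values are assumptions rather than recommendations.

```properties
# Base URL of the Docling server (docling.internal.example is a placeholder host).
arconia.docling.url=http://docling.internal.example:5001

# Timeouts use Spring Boot's duration syntax, for example 500ms, 10s, or 2m.
arconia.docling.connect-timeout=10s

# Large or OCR-heavy documents may take longer than the 30s default to convert.
arconia.docling.read-timeout=2m
```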

Actuator

Health Indicator

When Spring Boot Actuator is present on the classpath, Arconia automatically configures a health indicator for the Docling integration. This health indicator checks the connectivity to the Docling server by calling its health endpoint. You can customize it via configuration properties.

Table 2. Health Configuration Properties

Property                             Default    Description
management.health.docling.enabled    true       Whether the Docling health indicator should be enabled.

When enabled, the health status will be included in the actuator /health endpoint response, showing whether the Docling server is reachable and operational.
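
The fragment below sketches two common adjustments. The first property is a standard Spring Boot Actuator setting, not specific to Arconia, that makes the health endpoint list individual components instead of only the aggregate status. The second, left commented out, disables the Docling indicator using the property from Table 2.

```properties
# Show per-component details in the health endpoint response.
management.endpoint.health.show-details=always

# Alternatively, switch the Docling health indicator off entirely,
# for example when the Docling server is optional for your application.
# management.health.docling.enabled=false
```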

Using the Docling Client

Once you have added the dependency and optionally configured the connection settings, you can autowire and use the auto-configured DoclingClient in your Spring components.

Basic Usage

@Component
public class DocumentService {

    private final DoclingClient doclingClient;

    public DocumentService(DoclingClient doclingClient) {
        this.doclingClient = doclingClient;
    }

    public String convertWebPage(String url) {
        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
                .addHttpSources(url)
                .build();

        ConvertDocumentResponse response = doclingClient.convertSource(request);
        return response.document().markdownContent();
    }
}

Converting HTTP Sources

You can convert web pages or documents accessible via HTTP/HTTPS URLs:

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .addHttpSources("https://example.com/document.pdf")
        .build();

ConvertDocumentResponse response = doclingClient.convertSource(request);
String markdownContent = response.document().markdownContent();
String filename = response.document().filename();
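
Because this source type expects HTTP or HTTPS URLs, it can help to reject anything else before building the request. The following is a minimal sketch using only the JDK. HttpSourceValidator and isHttpSource are hypothetical names introduced for illustration and are not part of Arconia or Docling.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Arconia: checks a candidate source URL.
public final class HttpSourceValidator {

    private HttpSourceValidator() {
    }

    /** Returns true only for absolute http or https URLs that name a host. */
    public static boolean isHttpSource(String candidate) {
        if (candidate == null || candidate.isBlank()) {
            return false;
        }
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException ex) {
            // Malformed input (for example, containing spaces) is not a usable source.
            return false;
        }
    }
}
```

A caller could check HttpSourceValidator.isHttpSource(url) first and only then pass the URL to addHttpSources, as in the example above.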

Converting File Sources

You can also convert local files by encoding them as Base64:

byte[] fileContent = new ClassPathResource("document.pdf").getContentAsByteArray();
String base64Content = Base64.getEncoder().encodeToString(fileContent);

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .addFileSources("document.pdf", base64Content)
        .build();

ConvertDocumentResponse response = doclingClient.convertSource(request);
String markdownContent = response.document().markdownContent();
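
The example above reads the file from the classpath. For a file on disk, the same Base64 step can be done with the JDK alone. This is a minimal sketch: FileSourceEncoder and its methods are hypothetical names for illustration, not part of Arconia.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of Arconia: prepares a file on disk as a file source.
public final class FileSourceEncoder {

    private FileSourceEncoder() {
    }

    /** Reads the whole file into memory and returns its content as a Base64 string. */
    public static String encode(Path file) throws IOException {
        byte[] content = Files.readAllBytes(file);
        return Base64.getEncoder().encodeToString(content);
    }

    /** Returns the last path segment, usable as the file name sent with the content. */
    public static String fileName(Path file) {
        return file.getFileName().toString();
    }
}
```

The two results can then be passed to addFileSources(fileName, base64Content) as shown earlier. Note that the file is loaded fully into memory and Base64 adds roughly a third to its size, so this approach suits moderately sized documents.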

Conversion Options

You can customize the conversion process using ConvertDocumentOptions:

ConvertDocumentOptions options = ConvertDocumentOptions.builder()
        .includeImages(true)
        .doOcr(true)
        .build();

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .addHttpSources("https://example.com/document.pdf")
        .options(options)
        .build();

ConvertDocumentResponse response = doclingClient.convertSource(request);

Error Handling

The DoclingClient reports failures through the runtime exceptions raised by the underlying RestClient: for example, HttpClientErrorException for 4xx responses and HttpServerErrorException for 5xx responses.

try {
    ConvertDocumentRequest request = ConvertDocumentRequest.builder()
            .addHttpSources("https://invalid-url.com/document.pdf")
            .build();
    ConvertDocumentResponse response = doclingClient.convertSource(request);
} catch (HttpClientErrorException.NotFound ex) {
    log.warn("Document not found: {}", ex.getMessage());
} catch (HttpClientErrorException ex) {
    log.error("Client error during conversion: {}", ex.getMessage());
} catch (HttpServerErrorException ex) {
    log.error("Server error during conversion: {}", ex.getMessage());
}
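
Server-side errors are sometimes transient, so one option is to retry the call a few times before giving up. The sketch below is a generic JDK-only helper under that assumption. ConversionRetry and withRetry are hypothetical names, and nothing here is provided by Arconia or Spring.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Arconia: retries a call with exponential backoff.
public final class ConversionRetry {

    private ConversionRetry() {
    }

    /**
     * Runs the call up to maxAttempts times. A failure is retried only when the
     * predicate accepts it; any other exception is rethrown immediately.
     */
    public static <T> T withRetry(Supplier<T> call, int maxAttempts, long initialDelayMillis,
                                  Predicate<RuntimeException> retryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException ex) {
                // Give up on the last attempt or on errors that retrying cannot fix.
                if (attempt >= maxAttempts || !retryable.test(ex)) {
                    throw ex;
                }
            }
            sleep(delay);
            delay *= 2; // double the wait after each failed attempt
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

With the client from the examples above, a call might look like ConversionRetry.withRetry(() -> doclingClient.convertSource(request), 3, 500, ex -> ex instanceof HttpServerErrorException). Client errors such as a 404 are deliberately not retried in that usage, since repeating the same request is unlikely to change the outcome.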

Health Check

You can also programmatically check the health of the Docling server:

HealthCheckResponse health = doclingClient.health();
if ("ok".equals(health.status())) {
    // Docling server is healthy
} else {
    // Handle unhealthy server
}
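
When the server may still be starting, a single check can be extended into a short polling loop. The following is a minimal JDK-only sketch. HealthGate and awaitHealthy are hypothetical names, and the "ok" status value is taken from the example above.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, not part of Arconia: waits until a status supplier reports "ok".
public final class HealthGate {

    private HealthGate() {
    }

    /**
     * Polls the supplier until it returns "ok" or the attempts run out.
     * An exception from the supplier (for example, connection refused) counts
     * as "not healthy yet" rather than as a fatal error.
     */
    public static boolean awaitHealthy(Supplier<String> status, int maxAttempts, Duration pause) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if ("ok".equals(status.get())) {
                    return true;
                }
            } catch (RuntimeException ex) {
                // Server not reachable yet; fall through and try again.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false; // stop waiting if the thread is interrupted
                }
            }
        }
        return false;
    }
}
```

With the client from this page, the supplier could be () -> doclingClient.health().status(), for example HealthGate.awaitHealthy(() -> doclingClient.health().status(), 10, Duration.ofSeconds(2)).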