Series/Learn Java SQL, JDBC, Transactions, Connection Management & HikariCP

Deepen PracticeOrdered learning track

Large Objects and Streaming Data

Learn Java SQL, JDBC, Transactions, Connection Management & HikariCP - Part 027

Large Objects and Streaming Data in JDBC: BLOB/CLOB, InputStream/Reader, memory safety, transaction duration, driver behavior, and production-grade anti-patterns.

[2026-06-27]19 min read3741 words

In This Lesson

1. Kaufman Lens: Deconstruct the Skill 2. JDBC Large Object API Surface 3. Storage Decision: Database vs Object Storage

PrevNext

Lesson 2732 lesson track19–27 Deepen Practice

#java#jdbc#sql#blob+5 more

Part 027 — Large Objects and Streaming Data

Di part sebelumnya kita membahas batch dan bulk operation. Sekarang kita masuk ke area yang sering disepelekan: large objects dan streaming data.

Banyak engineer menganggap BLOB/CLOB hanya "kolom besar". Itu framing yang terlalu dangkal. Dalam production system, BLOB/CLOB adalah gabungan dari:

memory pressure di JVM,
network transfer pressure,
database storage pressure,
transaction duration,
connection hold time,
backup/replication impact,
observability blind spot,
dan lifecycle ownership antara application, database, object storage, serta downstream consumer.

JDBC menyediakan API untuk menangani binary/text large data melalui Blob, Clob, stream, reader, dan method binding seperti setBinaryStream, setBlob, setCharacterStream, dan setClob. Tetapi API availability tidak berarti desainnya otomatis benar.

Target part ini: setelah selesai, kita tidak hanya tahu cara menyimpan file ke database. Kita memahami kapan itu desain yang masuk akal, kapan harus memakai object storage, bagaimana menjaga memory tetap bounded, bagaimana menghindari transaction panjang, dan bagaimana men-debug incident large object di production.

1. Kaufman Lens: Deconstruct the Skill

Skill "menguasai LOB/streaming JDBC" bisa dipecah menjadi beberapa sub-skill:

Sub-skill	Pertanyaan Inti
Data placement	Apakah data besar memang harus berada di database?
Streaming write	Bagaimana upload besar dilakukan tanpa byte array raksasa?
Streaming read	Bagaimana download besar dilakukan tanpa load seluruh payload ke heap?
Transaction boundary	Berapa lama connection dan transaction ditahan?
Failure handling	Apa yang terjadi jika stream putus di tengah?
Cleanup	Siapa yang menutup stream, result set, statement, connection?
Observability	Bagaimana membuktikan bottleneck ada di DB, network, pool, atau client?
Security	Apakah payload perlu scanning, encryption, access control, audit?

Mental model utama:

Large object bukan sekadar value. Large object adalah data movement workflow yang kebetulan bisa melewati JDBC.

2. JDBC Large Object API Surface

JDBC menyediakan beberapa cara menangani data besar.

2.1 Binary Data

Untuk binary data:

byte[]
InputStream
Blob
ResultSet#getBytes(...)
ResultSet#getBinaryStream(...)
ResultSet#getBlob(...)
PreparedStatement#setBytes(...)
PreparedStatement#setBinaryStream(...)
PreparedStatement#setBlob(...)

2.2 Character Data

Untuk text/character data besar:

String
Reader
Clob
NClob
ResultSet#getString(...)
ResultSet#getCharacterStream(...)
ResultSet#getClob(...)
PreparedStatement#setString(...)
PreparedStatement#setCharacterStream(...)
PreparedStatement#setClob(...)
PreparedStatement#setNCharacterStream(...)
PreparedStatement#setNClob(...)

2.3 Rule of Thumb

Ukuran Data	Approach Umum
kecil, bounded, sering dipakai sebagai value	`byte[]` atau `String` bisa diterima
besar, tidak boleh memenuhi heap	stream/reader
database-specific LOB semantics dibutuhkan	`Blob`/`Clob`
file besar, CDN, direct browser upload, versioning	object storage + metadata di DB

Yang penting: jangan memilih API berdasarkan kenyamanan call site saja. Pilih berdasarkan data lifecycle.

3. Storage Decision: Database vs Object Storage

Sebelum menulis satu baris JDBC LOB code, jawab pertanyaan desain ini:

Apakah database adalah tempat terbaik untuk payload besar ini?

3.1 Menyimpan LOB di Database Masuk Akal Jika

payload relatif kecil dan bounded;
payload harus ikut transaction yang sama dengan metadata;
consistency lebih penting daripada latency/throughput file delivery;
akses file jarang dan mostly internal;
backup/restore database memang harus mencakup payload;
access control harus mengikuti row-level authorization;
payload bagian integral dari aggregate, bukan asset eksternal.

Contoh:

PDF final kecil untuk dokumen compliance;
signature image kecil;
generated report archive yang harus immutable;
payload audit yang harus atomically terkait dengan event bisnis.

3.2 Object Storage Lebih Masuk Akal Jika

file besar;
file sering diunduh;
perlu CDN;
perlu multipart upload;
perlu lifecycle policy;
perlu virus scanning asynchronous;
perlu presigned URL;
perlu versioning object;
database backup tidak boleh membengkak;
database replication tidak boleh membawa payload besar.

Contoh:

video;
image original resolusi besar;
attachment user upload ukuran puluhan MB/GB;
export file besar;
data lake ingestion object.

3.3 Hybrid Pattern: Metadata in DB, Payload in Object Storage

Pattern production yang sering paling sehat:

Metadata di DB:

object_key
file_name
content_type
size_bytes
sha256
status
owner_id
created_at
scan_status
version

Payload di object storage.

Keuntungan:

DB transaction tetap pendek;
file delivery tidak melewati DB;
object lifecycle bisa dikendalikan;
scanning bisa async;
metadata tetap queryable;
authorization tetap bisa dikelola oleh aplikasi.

4. Anti-Mental Model: "BLOB = File System di Database"

BLOB bukan file system.

Database tidak otomatis menjadi tempat terbaik untuk file hanya karena ada tipe BLOB.

Masalah umum jika database dipakai sebagai file store tanpa desain:

backup membengkak;
restore lambat;
replication lag;
query cache/page cache terganggu;
table bloat;
vacuum/cleanup cost;
transaction log/WAL/binlog besar;
connection ditahan selama transfer;
pool exhaustion ketika banyak download besar;
aplikasi gagal karena heap penuh.

Rule:

Simpan large object di database hanya jika ada alasan consistency, atomicity, governance, atau operational simplicity yang lebih kuat daripada cost-nya.

5. Writing Binary Data: Bad, Better, Safer

5.1 Bad: Load Entire File into `byte[]`

byte[] bytes = Files.readAllBytes(path);

try (Connection connection = dataSource.getConnection();
     PreparedStatement ps = connection.prepareStatement("""
         insert into document_file(id, file_name, content_type, payload)
         values (?, ?, ?, ?)
         """)) {

    ps.setObject(1, documentId);
    ps.setString(2, fileName);
    ps.setString(3, contentType);
    ps.setBytes(4, bytes);
    ps.executeUpdate();
}

Ini mungkin acceptable untuk file kecil dan limit ketat. Tetapi sebagai default production pattern, ini berbahaya.

Problem:

payload masuk heap penuh;
GC pressure tinggi;
concurrent upload memperbanyak duplikasi byte array;
tidak ada backpressure natural selain memory collapse;
upload besar bisa membunuh service.

5.2 Better: `setBinaryStream` dengan Length

public void insertDocumentFile(
        DataSource dataSource,
        UUID documentId,
        String fileName,
        String contentType,
        Path filePath
) throws SQLException, IOException {

    long size = Files.size(filePath);

    try (InputStream input = Files.newInputStream(filePath);
         Connection connection = dataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement("""
             insert into document_file(id, file_name, content_type, size_bytes, payload)
             values (?, ?, ?, ?, ?)
             """)) {

        ps.setObject(1, documentId);
        ps.setString(2, fileName);
        ps.setString(3, contentType);
        ps.setLong(4, size);
        ps.setBinaryStream(5, input, size);

        int updated = ps.executeUpdate();
        if (updated != 1) {
            throw new IllegalStateException("Expected exactly one inserted row, got " + updated);
        }
    }
}

Keuntungan:

heap tidak perlu menampung seluruh payload;
API memberi driver informasi length;
lebih friendly untuk transfer besar;
lebih mudah dipasangi validation size.

Tetapi tetap ada trade-off:

connection ditahan sepanjang stream dikirim;
transaction log bisa besar;
database session sibuk;
query/statement timeout harus sesuai;
request timeout harus tidak lebih pendek dari write path tanpa desain cleanup.

5.3 Safer: Validate Before Opening Transaction

Jangan buka connection/transaction sebelum validasi murah selesai.

public void insertDocumentFile(
        DataSource dataSource,
        UUID documentId,
        String fileName,
        String contentType,
        Path filePath
) throws SQLException, IOException {

    long size = Files.size(filePath);
    validateFileName(fileName);
    validateContentType(contentType);
    validateMaxSize(size, 10 * 1024 * 1024L); // 10 MiB example limit

    String sha256 = sha256Hex(filePath);

    try (InputStream input = Files.newInputStream(filePath);
         Connection connection = dataSource.getConnection()) {

        connection.setAutoCommit(false);

        try (PreparedStatement ps = connection.prepareStatement("""
             insert into document_file(
                 id, file_name, content_type, size_bytes, sha256, payload
             ) values (?, ?, ?, ?, ?, ?)
             """)) {

            ps.setObject(1, documentId);
            ps.setString(2, fileName);
            ps.setString(3, contentType);
            ps.setLong(4, size);
            ps.setString(5, sha256);
            ps.setBinaryStream(6, input, size);
            ps.executeUpdate();

            connection.commit();
        } catch (Throwable t) {
            rollbackQuietly(connection, t);
            throw t;
        }
    }
}

Important subtlety:

hash computation sebelum transaction memperpendek transaction;
validation sebelum connection borrow mengurangi pool hold time;
InputStream dibuat tepat sebelum execute;
rollback tetap dibutuhkan karena manual transaction.

6. Writing Text Data: `String` vs `Reader` vs `Clob`

6.1 Small Text

Untuk text kecil/menengah yang bounded, setString biasanya cukup.

ps.setString(1, commentBody);

Tetapi "bounded" harus nyata:

ada column constraint;
ada validation;
ada API request size limit;
ada test untuk large input.

6.2 Large Text

Untuk text besar:

try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
     Connection connection = dataSource.getConnection();
     PreparedStatement ps = connection.prepareStatement("""
         insert into report_body(id, body)
         values (?, ?)
         """)) {

    long charCount = countCharacters(path, StandardCharsets.UTF_8);

    ps.setObject(1, reportId);
    ps.setCharacterStream(2, reader, charCount);
    ps.executeUpdate();
}

Catatan penting:

byte length dan character length tidak sama untuk UTF-8;
setCharacterStream(..., length) memakai jumlah character, bukan byte;
menghitung character bisa mahal;
beberapa driver punya behavior berbeda untuk overload tanpa length.

Untuk payload text sangat besar, pertimbangkan menyimpan sebagai object storage, bukan CLOB.

7. Reading Binary Data: Avoid `getBytes()` for Large Payload

7.1 Bad: `getBytes()` untuk File Besar

byte[] payload = rs.getBytes("payload");

Problem:

seluruh payload masuk heap;
response besar memperparah GC;
satu request bisa memakai ratusan MB;
concurrent download bisa menumbangkan service.

7.2 Better: `getBinaryStream()`

public void streamDocumentFile(
        DataSource dataSource,
        UUID documentId,
        OutputStream output
) throws SQLException, IOException {

    try (Connection connection = dataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement("""
             select payload
             from document_file
             where id = ?
             """)) {

        ps.setObject(1, documentId);

        try (ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) {
                throw new NoSuchElementException("Document file not found: " + documentId);
            }

            try (InputStream input = rs.getBinaryStream("payload")) {
                input.transferTo(output);
            }
        }
    }
}

Ini menghindari byte array raksasa, tetapi tetap ada consequence penting:

connection ditahan selama output ditulis;
jika client lambat, DB connection ikut tertahan;
jika response streaming langsung ke HTTP client, slow client bisa menyebabkan pool exhaustion.

7.3 Better for Web Download: Decouple DB Read from Slow Client Carefully

Ada trade-off.

Option A: stream langsung DB → HTTP response.

Kelebihan:

memory rendah;
simple.

Risiko:

client lambat menahan DB connection;
pool bisa habis jika banyak download;
timeout harus end-to-end.

Option B: DB → bounded temp file → HTTP response.

Kelebihan:

DB connection dilepas sebelum client download selesai;
lebih tahan slow client.

Risiko:

butuh disk space;
cleanup harus kuat;
latency awal lebih besar;
temp file security.

Option C: payload di object storage.

Kelebihan:

DB tidak menanggung transfer payload;
app tidak menjadi file proxy;
CDN/presigned URL bisa dipakai.

Risiko:

consistency antara DB dan object storage harus didesain;
cleanup orphan object;
access control harus hati-hati.

8. Transaction Duration and Connection Hold Time

LOB operation sering bermasalah bukan karena query-nya kompleks, tetapi karena durasi transfer.

Database connection ditahan sejak:

connection borrowed

sampai:

result consumed + statement closed + connection closed

Untuk streaming read, ResultSet dan InputStream biasanya bergantung pada statement/connection yang masih hidup. Jadi connection tidak boleh ditutup sebelum stream selesai dibaca.

Akibat:

10 concurrent download besar bisa memakai 10 pool connection selama puluhan detik;
OLTP request kecil bisa menunggu pool;
connectionTimeout naik;
thread request menumpuk;
cascading failure terjadi padahal database CPU rendah.

Rule:

Untuk operasi LOB besar, ukur connection hold time, bukan hanya query execution time.

9. API Design: Do Not Return `InputStream` from DAO Casually

Anti-pattern:

public InputStream findPayload(UUID id) throws SQLException {
    Connection connection = dataSource.getConnection();
    PreparedStatement ps = connection.prepareStatement("select payload from file where id = ?");
    ps.setObject(1, id);
    ResultSet rs = ps.executeQuery();
    rs.next();
    return rs.getBinaryStream("payload");
}

Ini sangat berbahaya.

Problem:

siapa yang menutup ResultSet?
siapa yang menutup PreparedStatement?
siapa yang mengembalikan Connection ke pool?
apa yang terjadi jika caller lupa close stream?
bagaimana jika caller membaca stream setelah DAO method selesai?
bagaimana mengukur connection hold time?

9.1 Safer Pattern: Consumer Callback

public void withPayloadStream(
        UUID id,
        ThrowingConsumer<InputStream> consumer
) throws SQLException, IOException {

    try (Connection connection = dataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement("""
             select payload
             from document_file
             where id = ?
             """)) {

        ps.setObject(1, id);

        try (ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) {
                throw new NoSuchElementException("Document not found: " + id);
            }

            try (InputStream input = rs.getBinaryStream("payload")) {
                consumer.accept(input);
            }
        }
    }
}

@FunctionalInterface
public interface ThrowingConsumer<T> {
    void accept(T value) throws SQLException, IOException;
}

Keuntungan:

ownership resource jelas;
connection close tetap di method yang borrow;
caller tidak bisa menyimpan stream tanpa sadar;
instrumentation bisa mengukur durasi consumer.

9.2 Alternative: Return Materialized Bounded Value

Untuk file kecil:

public byte[] findSmallPayload(UUID id) throws SQLException {
    try (Connection connection = dataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement("""
             select payload
             from small_document_file
             where id = ?
             """)) {

        ps.setObject(1, id);

        try (ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) {
                throw new NoSuchElementException("File not found: " + id);
            }
            return rs.getBytes("payload");
        }
    }
}

Syarat:

ukuran benar-benar bounded;
enforce di schema/app;
ada limit maksimal;
tidak dipakai untuk arbitrary upload.

10. Streaming Upload in HTTP Applications

Skenario umum:

HTTP multipart upload -> Java app -> JDBC BLOB column

Risiko utamanya:

request body besar;
multipart framework bisa buffer ke memory/disk;
app membuka DB connection terlalu awal;
client lambat menahan DB connection;
upload gagal di tengah;
transaction dibiarkan menggantung;
timeout antara proxy/app/DB tidak selaras.

10.1 Bad Flow

Masalah: connection dipinjam terlalu awal.

10.2 Better Flow

Kelebihan:

slow client tidak menahan DB connection;
validation bisa dilakukan sebelum transaction;
retry/cleanup lebih jelas;
size limit bisa ditegakkan.

Trade-off:

perlu temp storage;
perlu cleanup;
ada double IO.

Untuk file besar, flow object storage biasanya lebih baik.

11. LOB and Auto-Commit

Jangan gunakan auto-commit secara asal untuk LOB write besar.

Dengan auto-commit true, setiap statement menjadi transaction sendiri. Untuk single insert BLOB, ini tampak sederhana. Tetapi failure semantics tetap perlu dipahami:

jika stream gagal sebelum execute selesai, insert gagal;
jika network gagal setelah DB commit tetapi sebelum app menerima success, status commit ambiguous;
jika metadata dan payload butuh atomicity, gabungkan dalam satu transaction;
jika ada side effect lain, jangan letakkan external call di tengah transaction.

Pattern:

connection.setAutoCommit(false);
try {
    insertMetadata(...);
    insertPayloadStream(...);
    insertAuditEvent(...);
    connection.commit();
} catch (Throwable t) {
    rollbackQuietly(connection, t);
    throw t;
}

Tetapi jangan membuat transaction lebih panjang dari perlu. Precompute hash, validate file, scan preliminary metadata, dan prepare temp file sebelum borrow connection.

12. Chunking: When One LOB Row Is Not Enough

Kadang payload besar perlu dipecah menjadi chunks.

Schema contoh:

create table file_object (
    id uuid primary key,
    file_name varchar(255) not null,
    content_type varchar(255) not null,
    size_bytes bigint not null,
    sha256 varchar(64) not null,
    status varchar(32) not null,
    created_at timestamp not null
);

create table file_object_chunk (
    file_id uuid not null references file_object(id),
    chunk_index integer not null,
    payload bytea not null,
    primary key (file_id, chunk_index)
);

Keuntungan chunking:

retry partial upload lebih mungkin;
memory per chunk bounded;
bisa resume;
bisa parallelize pada storage tertentu;
transaction per chunk lebih pendek.

Risiko:

complexity lebih tinggi;
reconstruct order harus benar;
partial state harus dikelola;
cleanup incomplete upload;
integrity hash wajib.

Jangan membuat chunking custom kecuali benar-benar diperlukan. Untuk banyak kasus, object storage multipart upload sudah menyelesaikan masalah ini lebih baik.

13. Integrity: Size, Hash, and Content Validation

Large payload tanpa integrity metadata adalah bom waktu.

Minimal metadata:

size_bytes bigint not null,
sha256 char(64) not null,
content_type varchar(255) not null,
original_file_name varchar(255) not null

13.1 Validate Size

private static void validateMaxSize(long size, long maxSize) {
    if (size < 0) {
        throw new IllegalArgumentException("Invalid negative size");
    }
    if (size > maxSize) {
        throw new IllegalArgumentException("File too large: " + size);
    }
}

13.2 Compute Hash While Streaming

Untuk menghindari membaca file dua kali, bisa pakai DigestInputStream.

MessageDigest digest = MessageDigest.getInstance("SHA-256");
try (InputStream raw = Files.newInputStream(path);
     DigestInputStream input = new DigestInputStream(raw, digest)) {

    // Pass input into JDBC setBinaryStream or copy to temp destination.
}
byte[] hash = digest.digest();

Namun hati-hati: jika driver membaca stream saat executeUpdate(), digest baru final setelah execute selesai. Jika hash harus masuk row yang sama, lebih mudah precompute hash sebelum insert, atau gunakan staging state.

Pattern staging:

insert metadata dengan status UPLOADING;
stream payload;
update hash/status READY setelah sukses;
cleanup jika gagal.

14. Security: Large Payload Is an Attack Surface

LOB path sering menjadi attack surface:

decompression bomb;
oversized upload;
malicious PDF/image;
filename path traversal;
content type spoofing;
malware upload;
PII leakage;
unauthorized download;
resource exhaustion via slow upload/download.

Checklist:

Concern	Control
Size abuse	request size limit + app validation + schema max expectation
Slow upload	proxy/app read timeout
Slow download	response timeout + separate pool/object storage
Filename attack	store original safely, never use as path directly
Content type spoof	sniff/validate content when needed
Malware	scanning workflow
Unauthorized access	metadata authorization before payload access
PII	encryption/audit/data retention
DoS	rate limit + quota + bounded temp storage

15. Observability for LOB Operations

LOB operation harus punya metrics berbeda dari query OLTP biasa.

Metrics yang penting:

upload size histogram;
download size histogram;
DB connection hold time;
stream read/write duration;
bytes transferred;
temp storage usage;
failed stream count;
client abort count;
pool active connections during LOB;
pending connection count;
large-object transaction duration;
object storage orphan count jika hybrid.

Log structured minimal:

{
  "event": "document_file_downloaded",
  "document_id": "...",
  "size_bytes": 7340032,
  "duration_ms": 4210,
  "db_connection_hold_ms": 4190,
  "storage": "database_blob",
  "request_id": "..."
}

Untuk upload:

{
  "event": "document_file_uploaded",
  "document_id": "...",
  "size_bytes": 10485760,
  "sha256": "...",
  "db_write_ms": 1830,
  "validation_ms": 210,
  "request_id": "..."
}

Tanpa metrics ini, incident hanya terlihat sebagai:

HikariPool-1 - Connection is not available, request timed out

Padahal root cause bisa berupa client download lambat.

16. Pool Isolation for LOB Workload

Untuk workload campuran, pisahkan pool.

Mengapa?

download besar tidak menghabiskan pool OLTP;
timeout bisa beda;
metrics lebih jelas;
alert bisa spesifik;
blast radius mengecil.

Contoh:

app.datasource.oltp.maximumPoolSize=16
app.datasource.oltp.connectionTimeout=250

app.datasource.lob.maximumPoolSize=4
app.datasource.lob.connectionTimeout=1000

Angka di atas hanya ilustrasi. Sizing harus mengikuti workload dan DB capacity.

17. Driver-Specific Behavior

JDBC API menyediakan abstraction, tetapi LOB behavior bisa sangat driver-specific.

Yang harus diuji pada driver/database nyata:

apakah setBinaryStream benar-benar streaming atau buffering;
apakah length wajib untuk streaming optimal;
apakah getBinaryStream menahan cursor/server resource;
apakah auto-commit mempengaruhi streaming;
apakah fetch size relevan untuk LOB;
apakah BLOB locator valid setelah transaction commit;
apakah Blob#free() diperlukan/berguna;
apakah query timeout berlaku saat stream transfer;
bagaimana behavior saat client abort.

Rule:

Untuk LOB, jangan percaya H2/in-memory test sebagai bukti production behavior.

Gunakan database dan driver target via integration test/Testcontainers.

18. `Blob`/`Clob` Object Lifecycle

Blob dan Clob bukan sekadar wrapper value biasa. Mereka bisa merepresentasikan locator/resource database.

Pattern aman:

try (Connection connection = dataSource.getConnection();
     PreparedStatement ps = connection.prepareStatement("""
         select payload
         from document_file
         where id = ?
         """)) {

    ps.setObject(1, id);

    try (ResultSet rs = ps.executeQuery()) {
        if (!rs.next()) {
            throw new NoSuchElementException();
        }

        Blob blob = rs.getBlob("payload");
        try (InputStream input = blob.getBinaryStream()) {
            input.transferTo(output);
        } finally {
            blob.free();
        }
    }
}

Tetapi sering kali getBinaryStream() langsung lebih sederhana:

try (InputStream input = rs.getBinaryStream("payload")) {
    input.transferTo(output);
}

Gunakan Blob/Clob jika ada alasan spesifik:

API driver/database memerlukannya;
butuh partial access;
butuh metadata LOB;
butuh explicit free.

Jika tidak, stream API biasanya lebih clean.

19. Timeout Design for LOB

Timeout LOB harus berbeda dari OLTP query kecil.

Layer timeout:

Untuk LOB:

pool acquisition timeout sebaiknya tetap relatif pendek;
stream transfer timeout harus mempertimbangkan size dan bandwidth;
query timeout tidak selalu mencakup seluruh stream consumption secara identik di semua driver;
client/proxy timeout harus lebih besar daripada expected transfer;
jangan membuat OLTP timeout ikut longgar karena LOB.

Anti-pattern:

Karena upload file sering timeout, semua timeout aplikasi dinaikkan menjadi 5 menit.

Konsekuensi:

request stuck bertahan lama;
thread pool penuh;
DB connection tertahan;
failure detection lambat;
cascading failure lebih parah.

Solusi yang lebih sehat:

pisahkan endpoint/pool;
gunakan object storage direct upload;
gunakan async processing;
enforce max size;
desain resumable upload jika perlu.

20. Failure Modes

20.1 Stream Fails Mid-Write

Kemungkinan:

client disconnect;
temp file read error;
DB connection reset;
socket timeout;
disk issue;
payload exceeds DB limit.

Response:

rollback transaction;
mark upload failed jika metadata sudah dibuat;
cleanup temp file/object;
emit metric/log;
do not retry blindly unless idempotency exists.

20.2 Commit Ambiguous After LOB Write

Kasus:

executeUpdate sukses, commit dikirim, network putus sebelum app menerima response.

App tidak tahu apakah DB commit berhasil.

Mitigation:

use deterministic ID;
unique key/idempotency key;
hash metadata;
read-after-failure reconciliation;
avoid duplicate insert with generated random ID only on server side.

20.3 Client Abort During Download

Jika streaming DB → HTTP langsung:

output stream throws IOException;
app harus close JDBC resources;
connection harus return ke pool;
metric client abort harus dicatat;
jangan log sebagai DB corruption.

20.4 Pool Exhaustion During Download Spike

Signal:

active connections naik;
pending threads naik;
DB CPU tidak selalu naik;
query latency OLTP naik karena wait pool;
HTTP request small endpoints ikut timeout.

Mitigation:

separate LOB pool;
object storage redirect;
rate limit download;
cap concurrent downloads;
temp-file decoupling;
backpressure.

21. Implementation Pattern: Database LOB Repository

Contoh repository yang lebih production-minded.

public final class DocumentFileRepository {
    private final DataSource dataSource;
    private final long maxSizeBytes;

    public DocumentFileRepository(DataSource dataSource, long maxSizeBytes) {
        this.dataSource = Objects.requireNonNull(dataSource);
        this.maxSizeBytes = maxSizeBytes;
    }

    public void insert(DocumentFileUpload upload) throws SQLException, IOException {
        validate(upload);

        try (InputStream input = upload.openStream();
             Connection connection = dataSource.getConnection()) {

            connection.setAutoCommit(false);

            try (PreparedStatement ps = connection.prepareStatement("""
                insert into document_file(
                    id,
                    owner_id,
                    file_name,
                    content_type,
                    size_bytes,
                    sha256,
                    payload,
                    created_at
                ) values (?, ?, ?, ?, ?, ?, ?, current_timestamp)
                """)) {

                ps.setObject(1, upload.id());
                ps.setObject(2, upload.ownerId());
                ps.setString(3, upload.fileName());
                ps.setString(4, upload.contentType());
                ps.setLong(5, upload.sizeBytes());
                ps.setString(6, upload.sha256Hex());
                ps.setBinaryStream(7, input, upload.sizeBytes());

                int updated = ps.executeUpdate();
                if (updated != 1) {
                    throw new IllegalStateException("Expected one inserted document file, got " + updated);
                }

                connection.commit();
            } catch (Throwable t) {
                rollbackQuietly(connection, t);
                throw t;
            }
        }
    }

    public void streamPayload(UUID id, OutputStream output) throws SQLException, IOException {
        long startNanos = System.nanoTime();

        try (Connection connection = dataSource.getConnection();
             PreparedStatement ps = connection.prepareStatement("""
                 select payload
                 from document_file
                 where id = ?
                 """)) {

            ps.setObject(1, id);

            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new NoSuchElementException("Document file not found: " + id);
                }

                try (InputStream input = rs.getBinaryStream("payload")) {
                    input.transferTo(output);
                }
            }
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            // record metric: document_file.stream.duration
        }
    }

    private void validate(DocumentFileUpload upload) {
        if (upload.sizeBytes() < 0 || upload.sizeBytes() > maxSizeBytes) {
            throw new IllegalArgumentException("Invalid file size: " + upload.sizeBytes());
        }
        if (upload.fileName() == null || upload.fileName().isBlank()) {
            throw new IllegalArgumentException("File name is required");
        }
        if (upload.contentType() == null || upload.contentType().isBlank()) {
            throw new IllegalArgumentException("Content type is required");
        }
        if (upload.sha256Hex() == null || upload.sha256Hex().length() != 64) {
            throw new IllegalArgumentException("Invalid SHA-256");
        }
    }

    private static void rollbackQuietly(Connection connection, Throwable original) {
        try {
            connection.rollback();
        } catch (SQLException rollbackFailure) {
            original.addSuppressed(rollbackFailure);
        }
    }
}

Model input:

public record DocumentFileUpload(
        UUID id,
        UUID ownerId,
        String fileName,
        String contentType,
        long sizeBytes,
        String sha256Hex,
        ThrowingSupplier<InputStream> streamSupplier
) {
    public InputStream openStream() throws IOException {
        return streamSupplier.get();
    }
}

@FunctionalInterface
public interface ThrowingSupplier<T> {
    T get() throws IOException;
}

Design notes:

repository tidak menerima byte[] untuk arbitrary upload;
stream supplier membuat ownership lebih eksplisit;
validasi sebelum borrow connection;
connection hold time bisa diukur;
generated ID dari caller mendukung idempotency;
hash disimpan untuk integrity/reconciliation.

22. Object Storage Pattern with Transactional Metadata

Untuk file besar, pattern ini sering lebih baik.

22.1 Tables

create table document_object (
    id uuid primary key,
    owner_id uuid not null,
    object_key varchar(1024) not null unique,
    file_name varchar(255) not null,
    content_type varchar(255) not null,
    size_bytes bigint not null,
    sha256 char(64),
    status varchar(32) not null,
    created_at timestamp not null,
    completed_at timestamp
);

Status:

PENDING_UPLOAD -> UPLOADED -> SCANNED_CLEAN -> READY
                             -> SCANNED_BLOCKED
                             -> EXPIRED

22.2 Flow

22.3 Consistency Problem

Database transaction cannot atomically commit together with object storage write unless you introduce distributed transaction-like complexity. Usually we do not.

Instead, design reconciliation:

DB row pending but object missing → expire row;
object exists but DB row missing → delete orphan;
DB says uploaded but scan missing → reschedule scan;
hash mismatch → block object;
ready object but owner deleted → lifecycle cleanup.

This is a workflow, not a single transaction.

23. Testing Strategy

23.1 Unit Tests

Test:

size validation;
content type validation;
filename sanitization;
hash validation;
status transitions;
cleanup logic.

23.2 Integration Tests with Real DB

Test with target DB/driver:

insert using setBinaryStream;
read using getBinaryStream;
large payload above normal size;
rollback after failed stream;
connection close after client abort simulation;
transaction timeout behavior;
statement timeout behavior;
pool exhaustion with concurrent streaming;
memory usage does not scale with file size.

23.3 Heap Safety Test

Pseudo-test:

@Test
void streamingDownloadDoesNotMaterializeEntirePayload() {
    // Use a large payload, bounded heap, and measure memory before/after.
    // This is not a perfect unit test, but useful as regression guard.
}

Better: run performance/integration profile with heap cap:

java -Xmx256m -jar app.jar

Then test several concurrent large downloads.

24. Code Review Checklist

For every JDBC LOB change, ask:

Is the payload size bounded?
Is validation done before borrowing connection?
Are streams closed deterministically?
Does the DAO leak InputStream ownership?
Is connection hold time measured?
Is direct DB → HTTP streaming acceptable for slow clients?
Should this use object storage instead?
Does this operation need a separate pool?
Is the transaction duration bounded?
Are hash/size/content type stored?
Is there cleanup for partial failure?
Are retries idempotent?
Are timeouts configured per layer?
Is this tested with the real database driver?

25. Anti-Patterns Catalog

25.1 `Files.readAllBytes()` for User Upload

Bad because memory scales with file size and concurrency.

25.2 Returning `InputStream` from DAO Without Resource Ownership

Bad because connection/result set/statement lifecycle becomes unclear.

25.3 Streaming DB BLOB Directly to Slow HTTP Clients Using OLTP Pool

Bad because slow clients consume database connections.

25.4 Storing Large Videos in OLTP Database

Usually bad because database is not CDN/object storage.

25.5 No Size Limit

Bad because attacker/user can force resource exhaustion.

25.6 No Hash

Bad because reconciliation and integrity checks become weak.

25.7 Long Transaction Including Upload from Client

Bad because client network speed controls DB transaction duration.

25.8 Testing Only with H2

Bad because driver-specific LOB behavior matters.

25.9 Using One Pool for Everything

Bad because large transfer can starve normal OLTP requests.

25.10 Treating LOB Timeout as Query Timeout Only

Bad because stream transfer, client timeout, socket timeout, pool timeout, and lock timeout are different.

26. Production Incident Playbook

Symptom:

Connection is not available, request timed out after 30000ms

Ask:

Did active connections rise?
Did pending threads rise?
Did DB CPU rise?
Are many active requests file download/upload endpoints?
Are clients slow?
Is average connection hold time high?
Are query durations high or stream durations high?
Did a recent release switch from object storage to DB BLOB?
Did file size distribution change?
Are response aborts increasing?

Decision tree:

Immediate mitigations:

rate-limit LOB endpoints;
reduce concurrent LOB operations;
temporarily route downloads to object storage if available;
increase pool only if DB can handle it and root cause is verified;
kill pathological sessions if safe;
rollback bad release if behavior changed.

Long-term fixes:

separate LOB pool;
object storage pattern;
bounded upload size;
temp file decoupling;
better metrics;
connection hold-time alerting.

27. Practice Exercises

Exercise 1 — Refactor Unsafe Upload

Given this code:

byte[] bytes = multipartFile.getBytes();
ps.setBytes(1, bytes);

Refactor to:

enforce max size;
avoid materializing arbitrary file;
use stream binding;
store sha256;
use explicit transaction;
measure DB write duration.

Exercise 2 — Diagnose Pool Exhaustion

Scenario:

At 09:00, export/download endpoint traffic increased 4x.
OLTP endpoints started timing out.
DB CPU stayed below 35%.
Hikari active=max, pending increased.

Explain likely root cause and propose immediate/long-term fix.

Expected reasoning:

DB not CPU-bound;
active=max means connections held;
download likely holds connection during streaming;
slow clients or large payloads can starve pool;
fix with separate LOB pool/object storage/rate limit.

Exercise 3 — Design Object Storage Metadata Workflow

Design:

table schema;
statuses;
upload flow;
scanner flow;
orphan cleanup;
download authorization.

28. Key Takeaways

LOB is data movement, not just a column type.
byte[]/String are acceptable only for bounded payloads.
Use stream/reader APIs for large data, but understand connection hold time.
Streaming DB → HTTP directly can starve your pool when clients are slow.
Database BLOB is not a general-purpose file storage replacement.
Object storage + DB metadata is often the better production architecture.
Validation should happen before borrowing connection when possible.
Resource ownership must be explicit; do not casually return InputStream from DAO.
Measure connection hold time, stream duration, and payload size distribution.
Test LOB behavior with the real database and driver.

29. References

Oracle Java SE 25 API — PreparedStatement
Oracle Java SE 25 API — ResultSet
Oracle Java SE 25 API — Blob
Oracle Java SE 25 API — Clob
HikariCP official repository and configuration documentation
PostgreSQL/MySQL vendor documentation for large object and binary/text column behavior when applicable

30. Next

Part 028 akan membahas Observability: Metrics, Logs, Traces, and Database Correlation.

Kita akan mengubah JDBC dari black box menjadi sistem yang bisa diukur:

pool metrics;
query latency;
transaction duration;
connection hold time;
slow query correlation;
OpenTelemetry tracing;
structured logging;
incident dashboards;
alert design.

Lesson Recap

You just completed lesson 27 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 26

Batch Processing and Bulk Data Operations

Next Lesson

Lesson 28

Observability: Metrics, Logs, Traces, and Database Correlation

Large Objects and Streaming Data

Part 027 — Large Objects and Streaming Data

1. Kaufman Lens: Deconstruct the Skill

2. JDBC Large Object API Surface

2.1 Binary Data

2.2 Character Data

2.3 Rule of Thumb

3. Storage Decision: Database vs Object Storage

3.1 Menyimpan LOB di Database Masuk Akal Jika

3.2 Object Storage Lebih Masuk Akal Jika

3.3 Hybrid Pattern: Metadata in DB, Payload in Object Storage

4. Anti-Mental Model: "BLOB = File System di Database"

5. Writing Binary Data: Bad, Better, Safer

5.1 Bad: Load Entire File into byte[]

5.2 Better: setBinaryStream dengan Length

5.3 Safer: Validate Before Opening Transaction

6. Writing Text Data: String vs Reader vs Clob

6.1 Small Text

6.2 Large Text

7. Reading Binary Data: Avoid getBytes() for Large Payload

7.1 Bad: getBytes() untuk File Besar

7.2 Better: getBinaryStream()

7.3 Better for Web Download: Decouple DB Read from Slow Client Carefully

8. Transaction Duration and Connection Hold Time

9. API Design: Do Not Return InputStream from DAO Casually

9.1 Safer Pattern: Consumer Callback

9.2 Alternative: Return Materialized Bounded Value

10. Streaming Upload in HTTP Applications

10.1 Bad Flow

10.2 Better Flow

11. LOB and Auto-Commit

12. Chunking: When One LOB Row Is Not Enough

13. Integrity: Size, Hash, and Content Validation

13.1 Validate Size

13.2 Compute Hash While Streaming

14. Security: Large Payload Is an Attack Surface

15. Observability for LOB Operations

16. Pool Isolation for LOB Workload

17. Driver-Specific Behavior

18. Blob/Clob Object Lifecycle

19. Timeout Design for LOB

20. Failure Modes

20.1 Stream Fails Mid-Write

20.2 Commit Ambiguous After LOB Write

20.3 Client Abort During Download

20.4 Pool Exhaustion During Download Spike

21. Implementation Pattern: Database LOB Repository

22. Object Storage Pattern with Transactional Metadata

22.1 Tables

22.2 Flow

22.3 Consistency Problem

23. Testing Strategy

23.1 Unit Tests

23.2 Integration Tests with Real DB

23.3 Heap Safety Test

24. Code Review Checklist

25. Anti-Patterns Catalog

25.1 Files.readAllBytes() for User Upload

25.2 Returning InputStream from DAO Without Resource Ownership

25.3 Streaming DB BLOB Directly to Slow HTTP Clients Using OLTP Pool

25.4 Storing Large Videos in OLTP Database

25.5 No Size Limit

25.6 No Hash

25.7 Long Transaction Including Upload from Client

25.8 Testing Only with H2

25.9 Using One Pool for Everything

25.10 Treating LOB Timeout as Query Timeout Only

26. Production Incident Playbook

27. Practice Exercises

Exercise 1 — Refactor Unsafe Upload

Exercise 2 — Diagnose Pool Exhaustion

Exercise 3 — Design Object Storage Metadata Workflow

28. Key Takeaways

29. References

30. Next

5.1 Bad: Load Entire File into `byte[]`

5.2 Better: `setBinaryStream` dengan Length

6. Writing Text Data: `String` vs `Reader` vs `Clob`

7. Reading Binary Data: Avoid `getBytes()` for Large Payload

7.1 Bad: `getBytes()` untuk File Besar

7.2 Better: `getBinaryStream()`

9. API Design: Do Not Return `InputStream` from DAO Casually

18. `Blob`/`Clob` Object Lifecycle

25.1 `Files.readAllBytes()` for User Upload

25.2 Returning `InputStream` from DAO Without Resource Ownership