Implementing repository using Postgres and JSONB

October 11, 2021; repository java json versioning postgresql

Repository is an object responsible for storing entities states. It is used to find right objects and to store them for later use. Here I would like to show you how to implement a simple repository in Java using PostgreSQL database and JSONB type available there. The solution is very simple to implement. Although it has its weaknesses it is often the good enough one.

Let’s start from architecture

Before I go further I would like to introduce you how I organize my application. In short, I try to divide the application into modules and keep the implementation details (e.g. used database engine or its structure) outside the module logic. This approach usually allows me to handle the logic of the module without focusing on such details. For example, if I want to create a module for lending books, I leave the details of how and where the books will be stored to the end of the process, and it is usually done outside the module itself - somewhere at the level of plugging the module into a running application.

Application Architecture — Simplified application architecture.

When we think about some feature, the word “module” often appears. However, it is worth avoiding it during writing the code. A module makes sense within a given thought context. When you suddenly need to use a module inside another module, or even deeper, it suddenly becomes a naming problem. I try to avoid “modules” by simply calling them by a specific domain name - for example, instead of BooksModule, I just use Books.

Accessing entity from repository

To start, we need a sample entity that we want to store and obtain from our repository. Without going too deep into the implementation at this point, let’s say it will be a BookEntity.

So let’s define the interface of how we want to access the entity. To start it off, it could be:

public interface BookRepository {
    
    Optional<BookEntity> findById(BookId id);
    
    void save(BookEntity entity);
    
    void remove(BookEntity entity);
}

Think of your Entity more like a working copy than an original instance. This actual “Book” is still in the repository. You are really only retrieving information about its state, and by the time you have fetched the Entity from the database, it may already have a completely different state, meaning you will potentially be working with outdated data.

Having the entity, and the API to access the repository, we can easily start coding the business logic. If it suddenly turns out that we need to find all the books of a given author, we can do it right away - without wasting too much time developing how to store and retrieve the list of books. So, since we just faced the need to get a list of books by a particular author - let’s extend our interface with that:

public interface BookRepository {

    // ...

    List<BookEntity> findByAuthor(AuthorId authorId);
}

As you can see from the examples above - I like to provide a dedicated type as an entity identifier - instead of something like String, Long, or UUID. This way I can see right away in the code that I need a specific book identifier here, rather than some UUID. In the above example, I see that to get a copy of the book, I need its ID, and to get the list of books, I need the author ID.

An Entity is not the only allowed option for the representation of a given object taken from the repository. Remember that it is, nevertheless, some working copy of the real object - you can also think of it, for example, as some unmodifiable object, where any modification of it from the code side will create a new unmodifiable object. In that case, you could call it, for example, BookSnapshot.

Before implementing Repository

When our module’s business logic is handled, this means we know what attributes our entity should have, and we have already defined the repository API that needs to be implemented. At this stage, we can consider if we should store our entities permanently (and use a relational database) or when for some reason it is enough to store them temporarily (and use Redis for example). Or maybe they are large objects, and we want to store them on Amazon S3 for example? This is the right time to make this decision. When it comes to business objects we are usually interested in permanent storage and with the possibility of auditing how a given entity has changed over time.

In business applications, we usually want to store objects permanently, along with the ability to audit changes. There won’t be much opportunity to use the SQL DELETE statement.

Before implementation, it is worth to consider the life cycle of our entities and the number of objects stored in the repository. Will we have hundreds, thousands or millions of them?… how often a single entity will be modified (how many versions will be saved there)?. In our case, I can assume that it will be several thousand entities, where each will be modified at most a couple of times.

In the post title, I promised that I would cover the implementation of a repository that runs on PostgreSQL and stores data in JSON (JSONB) format. This data type was introduced to Postgres in version 9.4, allowing us to have more mutable data structures stored in the old relational database.

Our current repository requirements are:

Storing entity attributes in JSON form (because that is one of the purpose of this article).
Ability to get an entity by its primary identifier.
Ability to retrieve entity list based on book author.
Ability to save entity state or delete it.
Change history audit.
The repository is expected to store many thousands of objects modified several times during their life cycle.

So, an example database could take this structure:

CREATE TABLE books (
	id uuid NOT NULL,
	object_id uuid NOT NULL,
	content jsonb NOT NULL,
	valid_from timestamp NOT NULL,
	valid_to timestamp,
	CONSTRAINT books_pk PRIMARY KEY (id)
);

id - will be used as the primary key of the record row - it will be responsible for storing the identifier of a specific entity version
object_id - this will be our entity identifier
content - here we store the attributes of our entity in JSON format
valid_from - the date from which the version is valid
valid_to - the date until which the version is valid (if it is NULL, then it is the current version of the entity)

Only one active entry:

To protect the application from creating multiple active (no valid_to date set) records with the same identifier, we can create a unique index on the fields that will ensure this uniqueness:

CREATE UNIQUE INDEX books_idx_valid_to ON books (object_id, (valid_to IS NULL)) WHERE valid_to IS NULL;

We still need to create appropriate indexes to allow us to quickly get to the data we are looking for.

Searching by entity id:

CREATE INDEX books_idx_object_id ON books (object_id);

Searching by author:

PostgreSQL also allows us to create indexes on individual fields included in the JSON. Below is an example of how to index an author id:

CREATE index books_idx_author_id ON books ((content->>'authorId'));

With the data storage infrastructure in place, you can start implementing the repository itself.

public class BookJdbcRepository implements BookRepository {

    private final Datasource datasouce;
    
    public BookJdbcRepository(Datasource datasouce) {
        this.datasource = datasource;
    }
    
    public Optional<BookEntity> findById(BookId id) {
        // ....
    }
    
    public void save(BookEntity entity) {
        // ....
    }
    
    public void remove(BookEntity entity) {
        // ....
    }
    
    public List<BookEntity> findByAuthor(AuthorId authorId) {
        // ....
    }
}

The Datasource is the connection pool created earlier. I recommend here for example HikariCP.

In real production applications, the simple use of DataSource may not be sufficient, especially when there is a need to modify many different objects within a single transaction. Often a better practice will be to replace it with some kind of transaction manager object. Someday I will write how to make an example implementation of such a mechanism.

Now we can focus on the implementation of the various methods. Let’s start with saving the versions of the entities:

public void save(BookEntity entity) {
    try (Connection connection = datasource.getConnection()) {
        Instant validDate = Instant.now();
        invalidateVersion(connection, entity, validDate);
        createVersion(connection, entity, validDate);
        connection.commit();
    } catch (SqlException e) {
        throw new RuntimeException(e);
    }
}

Before we go further into the implementation of the invalidateVersion and createVersion methods, we need to consider 2 details:

How to save entity state in JSON format.
How to implement optimistic locking (with this editing frequency, it may not be a must-have, but it’s always better to protect ourselves).

Prepare entity to save

In most cases, if I store entity state in multiple attributes then I do the conversion of those attributes at the repository level - converting the entity state to a private object that exists within the repository class.

public class BookJdbcRepository implements BookRepository {
    // ....
    
    private static class PersistedState {
        private final String authorId;
        private final String someAttribute;
        private final Integer otherAttribute;
        
        PersistedState(BookEntity entity) {
            // assign PersistedState fields from entity
        }
    }
}

I can convert such an object from or to JSON using, external libraries like Gson. In some cases, I store the state inside the entity already as such a ready-made single object, which I can also easily save.

Before we go back to implementing the repository itself, we still need a reverse mechanism that will allow us to build the entity again from the saved state. To do this, we can use a private entity constructor called by a static method. We also need to decide which field to use for optimistic locking - looking at the database structure, we have 2 available fields for this - id and valid_from. After a moment of reflection - I’ll take valid_from - a less obvious solution, but it will allow us to enhance our model with the date of the last modification of the entity (as if we wanted to present it somewhere to the user).

public class BookEntity {
    
    private final Instant lastUpdate;
    // .... and other attributes....
    
    public BookEntity(AuthorId authorId) {
        this.bookId = BookId.generate();
        this.authorId = authorId;
        this.lastUpdate = null;
    } 
    
    private BookEntinty(BookId bookId, AuthorId authorId, String someAttribute, 
                        Integer otherAttribute, Instant lastUpdate) {
        this.bookId = bookId;
        this.authorId = authorId;
        this.someAttribute = someAttribute;
        this.otherAttribute = otherAttribute;
        this.lastUpdate = lastUpdate;
    }
    
    // .....
    
    protected static BookEntity persisted(BookId bookId, AuthorId authorId, 
            String someAttribute, Integer otherAttribute, Instant lastUpdate) {
        
        return new BookEntity(bookId, authorId, someAttribute, otherAttribute, lastUpdate);
    }
}

And add a mapping class to the repository itself:

public class BookJdbcRepository implements BookRepository {
   // ....
    
    private static class Mapper extends BookEntity {
        private Mapper() {
            super(null);
            throw new RuntimeException("Do not use me.");
        }

        static BookEntity map(ResultSet rs) throws SQLException {
            PersistedState state = new Gson().fromJson(rs.getString("content"), PersistedState.class);
            return persisted(
                BookId.fromUUID((UUID) rs.getObject("object_id")),
                AuthorId.fromString(state.authorId),
                state.someAttribute,
                state.otherAttribute,
                rs.getTimestamp("valid_from").toInstant();
            );
        }
    }
}

Well, and at this point we seem to have the conversion of our entity to the state stored in the database and vice versa figured out, we can go back and deal with the invalidateVersion and createVersion methods. Let’s start by invalidating the previous version along with the optimistic locking mechanism - because that’s where it should work.

private void invalidateVersion(Connection connection, AutomationEntity entity, Instant validDate) throws SQLException {
    PreparedStatement stmt = connection.prepareStatement(
        "UPDATE books SET valid_to = ? WHERE object_id = ? AND valid_from = ? AND valid_to IS NULL"
    );
    stmt.setTimestamp(1, Timestamp.from(validDate));
    stmt.setObject(2, entity.getId().asUUID());
    stmt.setTimestamp(3, Timestamp.from(entity.getLastUpdate()))
    int changes = stmt.executeUpdate();
    if (changes != 1) {
        throw new IllegalStateException("Entity was modified in the meantime."); // or OptimisticLockException
    }
}

Now let’s implement the saving of the new version of the entity:

private void createVersion(Connection connection, BookEntity entity, Instant validDate) throws SQLException {
    PreparedStatement stmt = connection.prepareStatement(
        "INSERT INTO books (id,object_id,content,valid_from) VALUES (?,?,?::jsonb,?)"
    );

    stmt.setObject(1, UUID.randomUUID()); // generate versionId
    stmt.setObject(2, entity.getId().asUUID());
    stmt.setString(3, new Gson().toJson(new PersistedState(entity)));
    stmt.setTimestamp(4, Timestamp.from(validDate));
    stmt.executeUpdate();
}

Using UUIDv4 (as in the code above) is not the optimal solution when it comes to databases - due to its randomness. UUIDv7 will work better here, but in order to generate them in Java you will need to add additional libraries, or figure out on your own how generating such identifiers looks like.

And going back to the main methods of our API. Having the modification date stored in the model, we can tell if an entity was already stored in the database and its previous version should be deleted. Regardless of whether the previous version was present, we need to save the current state of the entity as a new version. Then we’ll implement the next missing methods - to delete and get data.

public class BookJdbcRepository implements BookRepository {
	// ....
    
    public void save(BookEntity entity) {
        try (Connection connection = datasource.getConnection()) {
            Instant validDate = Instant.now();
            if (entity.getLastUpdate() != null) {
            	invalidateVersion(connection, entity, validDate);
            }
            createVersion(connection, entity, validDate);
            
            connection.commit();
        } catch (SqlException e) {
            throw new RuntimeException(e);
        }
    }
    
    public void remove(BookEntity entity) {
        try (Connection connection = datasource.getConnection()) {
            Instant validDate = Instant.now();
            invalidateVersion(connection, entity, validDate);
            
            connection.commit();
        } catch (SqlException e) {
            throw new RuntimeException(e);
        }
    }
    
    public Optional<BookEntity> findById(BookId id) {
        try (Connection connection = datasource.getConnection()) {
            PreparedStatement stmt = connection.prepareStatement(
                "SELECT * FROM books WHERE object_id = ? AND valid_to IS NULL"
            );
            stmt.setObject(1, id.asUUID());
            ResultSet rs = stmt.executeQuery();
            if (rs.next()) {
                return Optional.of(Mapper.map(rs));
            }
            return Optional.empty();
        } catch (SqlException e) {
            throw new RuntimeException(e);
        }
    }
    
    public List<BookEntity> findByAuthor(AuthorId authorId) {
        try (Connection connection = datasource.getConnection()) {
            PreparedStatement stmt = connection.prepareStatement(
                "SELECT * FROM books WHERE content->>'authorId' = ? AND valid_to IS NULL"
            );
            stmt.setString(1, authorId.asString());
            ResultSet rs = stmt.executeQuery();
            List<BookEntity> result = new LinkedList();
            while (rs.next()) {
                result.add(Mapper.map(rs));
            }
            return result;
        } catch (SqlException e) {
            throw new RuntimeException(e);
        }
    }
    
    // .... other internal methods ....
}

And that’s it when it comes to the repository layer. If necessary, we can extend the repository with new methods for retrieving data, and we can easily enrich our model with new attributes without adding more columns to the database.