Postgres – A new NoSQL

In today’s world of database technologies, there are two major database types: SQL and NoSQL.  Basically both SQL and NoSQL do the same thing but in different way. Depending upon our project needs we need to find the better fit for our project. So, if you need to handle large amount of data with a little or even no structure then NoSQL is the best fit. But if you need transactional support and handle structured data then you should go for the SQL. We will not go into the details of it. But for a quick difference between these two, you can visit here.

For any project we either go for SQL or NoSQL. For our project also we needed acid compliance and transactional support, so we had opted for SQL database(postgres). Now, keeping this in mind and looking at the title of this blog you may wonder Why would I even want to store unstructured data in my database? and that too postgres? Isn’t it better to go for the NoSQL database itself?

Nothing complicated in that. So this is the usecase that I had in my project:

In our project we were storing dish. Now as dish has many ingredients, we wanted to keep a track of some ingredients like whether it contains nuts, milk, eggs, gluten, soy sauce in order to handle the allergies and preferences of customers. Now,  by the conventional approach assuming that its a relational database, we would have created separate column for each. But thinking a little bit, is it a scalable solution? Off course not! Here postgres came into the picture with its hstore support. PostgreSQL has provided a very great platform by incorporating hstore, json and jsonb, which has lead us to use unstructured data in a structured database. So, we created just one column ‘contains’ which stored this data as a key value pair with keys as the ingredients and boolean value.

There are some other usecases also where it makes a lot of sense to incorporate JSON document into your model. For example, it’s perfect when you need to maintain data that comes from an external service in the same structure and format (as JSON) that it arrived to you. Instead of trying to normalize this data across multiple tables, you can store it as it is (and still query against it).

In this blog we will have a quick overview about the NoSQL capabilities of postgres and learn how to use hstore in detail.

HSTORE:

  • Hstore is a schema less key-value store.
  • The best part is it’s acid compliant.
  • It is useful to store parse attributes like product description.
  • The advantage of using hstore is we can store very different types of records with different attributes in the same table and still we can query with SQL.
  • The downside of hstore is that all values are stored as strings.

Postgres Document Store:

  • JSON is the most popular data interchange format on web.
  • Postgres has a native JSON data type and a variety of JSON functions.
  • It is a hierarchical document model.
  • Postgres also supports JSONB column type which is the binary version of JSON.
  • JSONB is faster and robust than JSON.
  • The key difference between JSON and JSONB is that JSON stores exact copy of the text input, which must be reparsed again and again. However, JSONB stores a binary representation that avoids reparsing the data structure.

If you want to learn how to use jsonb with ruby on rails you can visit here

So, we can say that we can use nosql capabilities with the same syntax and in the same ACID transactional environment and rely on the same query planner, optimizer and indexing technologies as conventional SQL-only queries.

Using HSTORE:

To use hstore you must enable the extension by using the command

 CREATE EXTENSION IF NOT EXISTS hstore; 

We can simply create a table as like any other table with hstore as column type

CREATE TABLE dishes(
  name TEXT,
  recipe TEXT,
  contains HSTORE
);

Insertion of the data has nothing magical to do with hstore. We can simply use the same conventional syntax

INSERT INTO dishes (name, recipie, contains) VALUES (
'Green Beans, Tomato and Potato Salad',
'Organic Potatoes red 2 cups, Hot house Tomatoes 1/2 cup, Organic Green Beans 1/2 cup, 1 tbsp Parsley, 1 tsp  Lemon, 1/2 clove Garlic, 1/4 cup Extra Virgin Olive Oil, Salt & Pepper red onoin 1/4 cup kalamata olives 2 tbsp 1 tsp capers',
'"nuts"=>"yes", "dairy"=>"no", "gluten"=>"no", "sesame"=>"no", "egg"=>"no"');

It’s typical for every row to have the same key names, or at least some minimum number of overlapping key names, but you can, of course, use any keys and values you like. It may be the case that there are totally different keys in many of the rows.

Now, let’s see a simple query to retrieve all the dishes containing nuts

SELECT name FROM dishes where contains->'nuts'='yes';

Notice several things here. First, the name of the column remains without any quotes, just as you do when you’re retrieving the full contents of the column. Second, you put the name of the key after the -> arrow. Finally, the returned value always will be of type TEXT. There are numerous operators and functions provided by postgres which you can always refer from the official documentation.

Hstore with Rails:

In rails you can use enable_extension in your migration. Let’s see how to add the column contains to our dishes table by writing a migration

class AddContainsToDish < ActiveRecord::Migration[5.1]
  def change
    enable_extension 'hstore'
    add_column :dishes, :contains, :hstore, default: {}
  end
end

Now you have to identify this column on your model with store_accessor as below:

class Dish < ApplicationRecord
  store_accessor :contains
end

We can now store any kind of attributes in the contains column.

Dish.create(name: 'Green Beans, Tomato and Potato Salad', contains: {'nuts'=>'yes', 'dairy'=>'no', 'gluten'=>'no', 'sesame'=>'no','egg'=>'no'})

Not only hstore allows us to store arbitrary key value pairs but also it allows us to quickly query them.

# Find all dishes that have a key 'nuts' in contains

Dish.where("contains ? :key", :key => 'nuts')

# Find all dishes having sesame

Dish.where("contains @> (:key => :value)", :key => 'sesame', :value => 'yes')

If you’re going to query this column frequently, you must add an index. There are two types you can use: GiST and GIN.

  • GIN indexes are three times faster to search, but they take more time to index. They also take more disk space. Use it when you have more than 100K unique terms.
  • GiST indexes are slower than GIN indexes, but they’re faster to update. Use it when you have up to 100K unique terms.

You can define the index on your migration file with the :using option.

class AddContainsToDish < ActiveRecord::Migration[5.1]
  def change
    enable_extension 'hstore'
    add_column :dishes, :contains, :hstore, default: {}
    add_index :dishes, :contains, using: :gin
  end
end

This is how you can use hstore. If you want to dig into the details, more information is available in the postgres hstore docs.

So, you can say that postgres is a bridge between SQL and NoSQL. You can convert hstore to json as well. Also you can make a sql table look like json document and vice versa in postgres. You can easily combine the sql and json queries in the acid compliant environment of postgres. So, now you can start with storing structured data in your database and then integrate unstructured data as well or start with unstructured dataset and adjust the balance between structured and unstructured data very very easily with postgres. To know more about the NoSQL capabilities visit the official site of enterprisedb.