Practical Application of Row Level Security
Update: I initially wrote thing a year ago while exploring Row Level Security in Postgres, but unpublished it due to performance concerns. I am republishing it now since new methods have come to light and it may serve as a useful reference.
More update: I just finished a more thorough investigation of how to implement RLS performantly, it leverages some new optimizations in PostgreSQL 10.
Here’s a dangerous idea: let’s give each application user a role in the database. PostgreSQL 9.5 added Row Level Security, (RLS), which provides a benefit of implementing this— it’s just a question of whether or not it makes sense.
The goal is to let the database handle object level security using RLS so that our application layer doesn’t have to. This enforces consistency by design, and will hopefully result in a lot less code. Just to set the stage without RLS and separate user roles, data may be returned using something like this:
select data.* from data join user_data
on data.id = user_data.data_id
where user_data.user_id = $userId;
This is a fairly straightforward query, but it becomes much messier when a user is executing complicated queries which need to join over tables. This query is doing two things; there is no separation of concerns. Try two queries instead:
set role $userId;
select * from data;
Now our second query is free to grow as complex as we desire, but it will never have to be concerned about ownership. Nice. Let’s jump straight in to see how we can do this and calculate the costs.
Here is a simple schema we’ll work with. Since we are normalizing the ownership of data in user_data this security structure will work for data with multiple owners.
The users table is basically a pg_roles_meta table, with role as a foreign key. This is only included to show where you can store additional data for a user, but it is not necessary.
We can grant all on data to application_user because RLS will take care of the actual read/write permissions on a per user basis. Note that for a real application we might restrict application_users on other tables more carefully.
This is the actual RLS policy, more details on the syntax here. Our condition is simple: does a user_data record exist for the current_user?
The using part of the policy is how the existing records get filtered, so it operates on select, update, and delete queries only.
The with check part of the policy is used to validate new values present in insert and update queries. This could be useful for validation logic, but we don’t need it for this exercise so we’ll let everything pass through.
This is a helpful function which automatically generates the row checked by our policy; every time a user inserts a piece of data, this will add the user_data record which represents their ownership.
It would be better if this record was automatically generated based on our policy since the two concepts are tightly coupled.
First and foremost, we will continue with a role that inherits from application_user. Creating this user is where the one major drawback is notable: we are storing users as pg_roles and in the users table so we need to generate a role and a row in the users table. I’ll do that for bob and alice behind the scenes.
Now let’s actually try this out with a few million rows and watch what happens. When we insert data, we can see our after trigger get fired:
set role bob;
explain analyze insert into data (value) values ('hello world');Insert on data (cost=0.00..0.01 rows=1 width=56) (actual time=0.573..0.573 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=56) (actual time=0.172..0.172 rows=1 loops=1)
Planning time: 0.035 ms
Trigger insert_user_data_trigger: time=0.660 calls=1
Execution time: 1.266 ms
Nice. Adding new things get’s a lot more interesting. I just added a million or so rows for bob and another million or so for alice behind the scenes, now lets select.
set role alice;
explain analyze select * from data;Result (cost=26396.00..43767.00 rows=1001000 width=30) (actual time=95.911..221.031 rows=1001001 loops=1)
One-Time Filter: $0
InitPlan 1 (returns $0)
-> Seq Scan on user_data (cost=0.00..26396.00 rows=1 width=1) (actual time=0.015..95.888 rows=1 loops=1)
Filter: ((data_id = '3015191e-2ff8-439b-ac2f-c4f915333d2a'::uuid) AND (user_role = ("current_user"())::text))
Rows Removed by Filter: 1001000
-> Seq Scan on data (cost=26396.00..43767.00 rows=1001000 width=30) (actual time=0.020..63.459 rows=1001001 loops=1)
Planning time: 0.124 ms
Execution time: 234.884 ms
From my limited knowledge of query explanations, it looks like PostgreSQL ran the select and policy query at the same time, and used the result of the latter to filter the former, (much like a join). This was a fantastic idea because testing each row against the policy would be far slower.
For demonstration, here is a different way to write the using part of the policy which yields drastically slower results:
select true as bool
where user_data.data_id = data.id
and user_data.user_role = current_user
) = true
This policy makes perfect sense, arguably more than the performant one, but it only selects one row at a time. Check out the explain:
Seq Scan on data (cost=0.00..58530.00 rows=750 width=31) (actual time=0.252..327.946 rows=1000 loops=1)
Filter: (SubPlan 1)
Rows Removed by Filter: 1000
-> Seq Scan on user_data (cost=0.00..39.00 rows=1 width=1) (actual time=0.120..0.161 rows=1 loops=2000)
Filter: ((data_id = data.id) AND (user_role = ("current_user"())::text))
Rows Removed by Filter: 2000
Planning time: 0.152 ms
Execution time: 328.040 ms
As expected, this policy is run on each row, one at a time, which kills performance. It is also worth mentioning that I terminated the initial query after ten minutes and truncated the tables. This query is on about one thousand rows, and it is already slower than our previous attempt!
Real World Time
This is proof enough for me to move ahead with a real world application, (using the first policy of course). We may quickly find some edge-case queries which drag the performance of our policy to a halt, but we might also find ways to leverage this feature which radically improve our development cycle. Either way we’ve learned something.
I plan on implementing this in a Thin Stack, an application architecture constructed on the ideology “write less code”. Check back soon for updates.
I was inspired by a few *.stackexchange question and answers which brought me to attempt this: