cancel
Showing results for 
Search instead for 
Did you mean: 

Beginner Notion Queries

charlattecamp
New Contributor

Hello! I just set up a Fivetran connector to pull data from my Notion into a BigQuery database. The Notion data is extremely unstructured when compared to the data actually in Notion, so I had to write quite a bit of SQL to get the data to look like I wanted it to. The caveat to the title is that these are queries to start using Notion data, but they aren't necessarily beginner level SQL. Another note is that the SQL Syntax used is for Bigquery, so may need to be edited based on which database you're using. 

First, a query to list out the databases you have. This is how I got the database IDs to be able to filter later queries. 

with database_object_title as (
  select
    id,
    title,
    title_unnest,
    JSON_VALUE(title_unnest,"$.type") as type,
    JSON_QUERY(title_unnest,"$.text") as text,
    JSON_QUERY(title_unnest,"$.annotations") as annotations,
    JSON_VALUE(title_unnest,"$.plain_text") as plain_text
  FROM `notion.database_object`, 
    UNNEST(JSON_QUERY_ARRAY(title)) title_unnest
  where title is not null
  )
select 
  database_object.id, 
  database_object_title.plain_text as title,
  created_time, 
  last_edited_time,
  object,
  page_id, 
  parent_database_id,
from notion.database_object
  left join database_object_title on database_object.id = database_object_title.id
where not _fivetran_deleted
order by title nulls last,last_edited_time desc

 This is a query to get all property values of the pages. I basically unnest each different type of property and then join them back together. Note that the resulting table has a composite key. You'll need to join both the property ID and the page ID for matching the property value with the title and page. 

with page_property_title as (
  select
    id,
    page_id,
    title_unnest,
    JSON_VALUE(title_unnest,"$.type") as type,
    JSON_QUERY(title_unnest,"$.text") as text,
    JSON_QUERY(title_unnest,"$.annotations") as annotations,
    JSON_VALUE(title_unnest,"$.plain_text") as plain_text
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(title)) title_unnest
  where title is not null
  )
, page_property_rich_text as (
  select 
    id, 
    page_id,
    rich_text_unnest,
    JSON_VALUE(rich_text_unnest,"$.type") as rich_text_type,
    JSON_QUERY(rich_text_unnest,"$.text") as rich_text_text,
    JSON_QUERY(rich_text_unnest,"$.annotations") as rich_text_annotations,
    JSON_VALUE(rich_text_unnest,"$.plain_text") as rich_text_plain_text
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(rich_text)) rich_text_unnest
  where rich_text is not null
), page_property_relation as (
  select 
    id, 
    page_id,
    relation_unnest,
    JSON_VALUE(relation_unnest,"$.id") as relation_unnest_id
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(relation)) relation_unnest
  where relation is not null
), page_property_multi_select as (
  select 
    id, 
    page_id,
    multi_select_unnest,
    JSON_VALUE(multi_select_unnest,"$.id") as multi_select_unnest_id,
    JSON_VALUE(multi_select_unnest,"$.name") as multi_select_unnest_name
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(multi_select)) multi_select_unnest
  where multi_select is not null
), page_property_rollup as ( 
  select  
    id, 
    page_id,
    JSON_VALUE(page_property.rollup,"$.type") as rollup_type, --just number or array
    JSON_VALUE(page_property.rollup,"$.number") as rollup_number,
    JSON_VALUE(page_property.rollup,"$.function") as rollup_function,
    JSON_VALUE(rollup_unnest,"$.type") as array_type, --relation, rich_text, date
    -- JSON_QUERY(rollup_unnest,"$.relation") as rollup_relation_array,
    JSON_QUERY(rollup_unnest,"$.rich_text") as rollup_rich_text_array,
    JSON_QUERY(rollup_unnest,"$.date") as rollup_date_array,
    JSON_VALUE(JSON_QUERY(rollup_unnest,"$.date"),"$.start") as start_date,
    JSON_VALUE(JSON_QUERY(rollup_unnest,"$.date"),"$.end") as end_date,
    rollup_unnest
  FROM notion.page_property,
    UNNEST(JSON_QUERY_ARRAY(JSON_QUERY(page_property.rollup,"$.array"))) rollup_unnest
  where page_property.rollup is not null
    and JSON_VALUE(rollup_unnest,"$.type") = 'rich_text'
)
, rollup_relation_values as (
  select 
    *,
    JSON_VALUE(relation_unnest,"$.id") as rollup_relation_id
  from page_property_rollup,
  UNNEST(JSON_extract_array(rollup_unnest,"$.relation")) relation_unnest
)
, rollup_rich_text_values as (
  select 
    *,
    JSON_VALUE(rich_text_unnest,"$.type") as rollup_rich_text_type,
    JSON_VALUE(rich_text_unnest,"$.text.content") as rollup_rich_text_content,
  from page_property_rollup,
  UNNEST(JSON_extract_array(rollup_unnest,"$.rich_text")) rich_text_unnest
)

SELECT
  page_property.id,
  page_property.page_id,
  page_property.type,
  -- title
  page_property_title.plain_text as title,
  -- rich_text
  page_property_rich_text.rich_text_plain_text,
  -- checkbox
  page_property.checkbox,
  -- relation
  page_property_relation.relation_unnest_id, --likely should be rolled up. 
  -- last_edited_time
  page_property.last_edited_time,
  -- select
  JSON_VALUE(page_property.select,"$.name") as select_name,
  -- created_time
  page_property.created_time,
  -- rollup
  rollup_rich_text_values.rollup_rich_text_content as rollup_text,
  -- multi_select
  multi_select_unnest_name as multi_select_text,
  -- date
  JSON_VALUE(page_property.date,"$.start") as date_start,
  JSON_VALUE(page_property.date,"$.end") as date_end,
  -- number
  page_property.number
FROM notion.page_property
  left join page_property_title on page_property.page_id = page_property_title.page_id
    and page_property.id = page_property_title.id
  left join page_property_rich_text on page_property.page_id = page_property_rich_text.page_id
    and page_property.id = page_property_rich_text.id 
  left join page_property_relation on page_property.page_id = page_property_relation.page_id
    and page_property.id = page_property_relation.id 
  left join rollup_rich_text_values on page_property.page_id = rollup_rich_text_values.page_id
    and page_property.id = rollup_rich_text_values.id 
  left join page_property_multi_select on page_property.page_id = page_property_multi_select.page_id
    and page_property.id = page_property_multi_select.id 
where not _fivetran_deleted 

From there, we want to use the database properties table to figure out which page properties are actually used in the given database, and what their titles are. For that I used the below query

select 
      page.id,
      page.created_time,
      page.database_id,
      JSON_VALUE(page.icon,"$.emoji") as emoji,
      page.last_edited_time,
      page.object,
      page.parent_page_id,
      case 
        when database_object_property.name ='Related to Bullet Journal (Property)' then 'related_bullet_journals' 
        else database_object_property.name 
      end as property_name,
      database_object_property.id as property_id
  from notion.page 
  left join notion.database_object_property on page.database_id = database_object_property.database_object_id 
  where not page._fivetran_deleted 
    and not database_object_property._fivetran_deleted 

The above query ended up being fairly optional to the end query, as I didn't end up using the pivoting function to make the columns dynamic. For now, I went with the easier option of simply manually aggregating the different property values based on their types. That said, it gives a nice base for the different pages and allows us to join the properties back in:

select 
  pulls_pivot.database_id,
  pulls_pivot.id as page_id,
  string_agg(page_properties_deconstructed.title,',') as name,
  max(pulls_pivot.created_time) as created,
  max(date_start) as date,
  string_agg(case when property_name='Spread' then page_properties_deconstructed.relation_unnest_id else null end,',') as spread, 
  string_agg(page_properties_deconstructed.rollup_text,',') as questions,
  string_agg(case when property_name='Cards' then page_properties_deconstructed.relation_unnest_id else null end,',') as Cards, 
  max(pulls_pivot.last_edited_time) as last_edited_time,
  string_agg(case when property_name='related_bullet_journals' then page_properties_deconstructed.relation_unnest_id else null end,',') as related_bullet_journals,
from pulls_pivot
left join page_properties_deconstructed on pulls_pivot.id = page_properties_deconstructed.page_id 
  and pulls_pivot.property_id = page_properties_deconstructed.id

In the above query, I use case statements to separate out the different properties of the same types (Spread and Cards are both Relation types, for example) and then aggregate them based on their data type (String_agg for strings, max for numerics/dates) and I do that for each property listed out for that database. This will need to be configured specially for your database.

So for my Pulls database that I layout in the above query, there are 7 properties:
Created (datetime), Date (Date), Spread (Relation), Questions (Rollup), Cards (Relation), Last edited time (datetime) and related to bullet journal (relation)

As an example of a full query, I'll use a different database. This is my Cards database which also has 7 properties:
Short meaning (Text), Deck relation (Relation), Number (Select),  Numerology relation (Relation), Suit (Relation), Created time (Datetime), Last edited time (Datetime)
You can see how these are each pulled out and aggregated in the final query.

with page_property_title as (
  select
    id,
    page_id,
    title_unnest,
    JSON_VALUE(title_unnest,"$.type") as type,
    JSON_QUERY(title_unnest,"$.text") as text,
    JSON_QUERY(title_unnest,"$.annotations") as annotations,
    JSON_VALUE(title_unnest,"$.plain_text") as plain_text
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(title)) title_unnest
  where title is not null
  )
, page_property_rich_text as (
  select 
    id, 
    page_id,
    rich_text_unnest,
    JSON_VALUE(rich_text_unnest,"$.type") as rich_text_type,
    JSON_QUERY(rich_text_unnest,"$.text") as rich_text_text,
    JSON_QUERY(rich_text_unnest,"$.annotations") as rich_text_annotations,
    JSON_VALUE(rich_text_unnest,"$.plain_text") as rich_text_plain_text
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(rich_text)) rich_text_unnest
  where rich_text is not null
), page_property_relation as (
  select 
    id, 
    page_id,
    relation_unnest,
    JSON_VALUE(relation_unnest,"$.id") as relation_unnest_id
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(relation)) relation_unnest
  where relation is not null
), page_property_multi_select as (
  select 
    id, 
    page_id,
    multi_select_unnest,
    JSON_VALUE(multi_select_unnest,"$.id") as multi_select_unnest_id,
    JSON_VALUE(multi_select_unnest,"$.name") as multi_select_unnest_name
  FROM `notion.page_property`, 
    UNNEST(JSON_QUERY_ARRAY(multi_select)) multi_select_unnest
  where multi_select is not null
), page_property_rollup as ( 
  select  
    id, 
    page_id,
    JSON_VALUE(page_property.rollup,"$.type") as rollup_type, --just number or array
    JSON_VALUE(page_property.rollup,"$.number") as rollup_number,
    JSON_VALUE(page_property.rollup,"$.function") as rollup_function,
    JSON_VALUE(rollup_unnest,"$.type") as array_type, --relation, rich_text, date
    -- JSON_QUERY(rollup_unnest,"$.relation") as rollup_relation_array,
    JSON_QUERY(rollup_unnest,"$.rich_text") as rollup_rich_text_array,
    JSON_QUERY(rollup_unnest,"$.date") as rollup_date_array,
    JSON_VALUE(JSON_QUERY(rollup_unnest,"$.date"),"$.start") as start_date,
    JSON_VALUE(JSON_QUERY(rollup_unnest,"$.date"),"$.end") as end_date,
    rollup_unnest
  FROM notion.page_property,
    UNNEST(JSON_QUERY_ARRAY(JSON_QUERY(page_property.rollup,"$.array"))) rollup_unnest
  where page_property.rollup is not null
    and JSON_VALUE(rollup_unnest,"$.type") = 'rich_text'
)
, rollup_relation_values as (
  select 
    *,
    JSON_VALUE(relation_unnest,"$.id") as rollup_relation_id
  from page_property_rollup,
  UNNEST(JSON_extract_array(rollup_unnest,"$.relation")) relation_unnest
)
, rollup_rich_text_values as (
  select 
    *,
    JSON_VALUE(rich_text_unnest,"$.type") as rollup_rich_text_type,
    JSON_VALUE(rich_text_unnest,"$.text.content") as rollup_rich_text_content,
  from page_property_rollup,
  UNNEST(JSON_extract_array(rollup_unnest,"$.rich_text")) rich_text_unnest
), page_properties_deconstructed as (
  SELECT
    page_property.id,
    page_property.page_id,
    page_property.type,
    -- title
    page_property_title.plain_text as title,
    -- rich_text
    page_property_rich_text.rich_text_plain_text,
    -- checkbox
    page_property.checkbox,
    -- relation
    page_property_relation.relation_unnest_id, --likely should be rolled up. 
    -- last_edited_time
    page_property.last_edited_time,
    -- select
    JSON_VALUE(page_property.select,"$.name") as select_name,
    -- created_time
    page_property.created_time,
    -- rollup
    rollup_rich_text_values.rollup_rich_text_content as rollup_text,
    -- multi_select
    multi_select_unnest_name as multi_select_text,
    -- date
    JSON_VALUE(page_property.date,"$.start") as date_start,
    JSON_VALUE(page_property.date,"$.end") as date_end,
    -- number
    page_property.number
  FROM notion.page_property
  left join page_property_title on page_property.page_id = page_property_title.page_id
    and page_property.id = page_property_title.id
  left join page_property_rich_text on page_property.page_id = page_property_rich_text.page_id
    and page_property.id = page_property_rich_text.id 
  left join page_property_relation on page_property.page_id = page_property_relation.page_id
    and page_property.id = page_property_relation.id 
  left join rollup_rich_text_values on page_property.page_id = rollup_rich_text_values.page_id
    and page_property.id = rollup_rich_text_values.id 
  left join page_property_multi_select on page_property.page_id = page_property_multi_select.page_id
    and page_property.id = page_property_multi_select.id 
  where not _fivetran_deleted 
), pulls_pivot as (
    select 
      page.id,
      page.created_time,
      page.database_id,
      JSON_VALUE(page.icon,"$.emoji") as emoji,
      page.last_edited_time,
      page.object,
      page.parent_page_id,
      case 
        when database_object_property.name ='Related to Bullet Journal (Property)' then 'related_bullet_journals' 
        else database_object_property.name 
      end as property_name,
      database_object_property.id as property_id
  from notion.page 
  left join notion.database_object_property on page.database_id = database_object_property.database_object_id 
  where not page._fivetran_deleted 
    and not database_object_property._fivetran_deleted 
)
select 
  pulls_pivot.database_id,
  pulls_pivot.id as page_id,
  string_agg(page_properties_deconstructed.title,',') as name,
  string_agg(rich_text_plain_text) as short_meaning,
  string_agg(case when property_name='Deck' then page_properties_deconstructed.relation_unnest_id else null end,',') as Deck_relation,
  max(select_name) as number,
  string_agg(case when property_name='Related to Numerology (Cards)' then page_properties_deconstructed.relation_unnest_id else null end,',') as numerology_relation,
  string_agg(case when property_name='Suit' then page_properties_deconstructed.relation_unnest_id else null end,',') as suit,
  max(pulls_pivot.created_time) as created_time,
  max(pulls_pivot.last_edited_time) as last_edited_time
from pulls_pivot
left join page_properties_deconstructed on pulls_pivot.id = page_properties_deconstructed.page_id 
  and pulls_pivot.property_id = page_properties_deconstructed.id
where database_id='YOUR DB ID HERE' 
group by 1,2

I hope this helps you get started working with your Notion data! Let me know if you have any questions!

5 REPLIES 5

jill-ross
Director of Community
Director of Community

Wow, @charlattecamp - thank you for sharing your work on this! Awesome stuff. 

Toby-Metcalf
Community Admin
Community Admin

Welcome to the Community @charlattecamp - thank you for sharing this.

Cheers,
Toby

brian-dka
New Contributor

Thanks for your queries @charlattecamp ! I wanted to share your queries modified for Snowflake.  There is a couple of instances where I removed a little bit of your code because I didn't need it yet for my notion data (under specific relation conditions).  However, this might help other Snowflake users. 

First - database objects

with database_object_title as (
    select
        id,
        title,
        value:type::varchar(256) as type,
        value:text::varchar(256) as text,
        value:annotations::varchar(256) as annotations,
        value:plain_text::varchar(256) as plain_text
    FROM
        fivetran.notion.database_object,
        lateral flatten(input => title)
)
select
    db.id,
    dot.plain_text as title,
    db.created_time,
    db.last_edited_time,
    db.object,
    db.page_id,
    db.parent_database_id
from
    fivetran.notion.database_object as db
    left join database_object_title as dot
    on db.id = dot.id
where
    not _fivetran_deleted;

and page properties

with page_property_title as (
    select
        id,
        page_id,
        value:type::varchar(256) as type,
        value:text::varchar(256) as text,
        value:annotations::varchar(256) as annotations,
        value:plain_text::varchar(256) as plain_text
    FROM
        fivetran.notion.page_property,
        lateral flatten(input => title)
    where
        title is not null
),
page_property_rich_text as (
    select
        id,
        page_id,
        value:type::varchar as rich_text_type,
        value:text::varchar as rich_text_text,
        value:annotations::varchar as rich_text_annotations,
        value:plain_text::varchar as rich_text_plain_text
    FROM
        fivetran.notion.page_property,
        lateral flatten(input => rich_text)
    where
        rich_text is not null
),
page_property_relation as (
    select
        id,
        page_id,
        value:id::varchar as relation_unnest_id
    FROM
        fivetran.notion.page_property,
        lateral flatten(input => relation)
    where
        relation is not null
),
page_property_multi_select as (
    select
        id,
        page_id,
        value:id::varchar as multi_select_unnest_id,
        value:name::varchar as multi_select_unnest_name
    FROM
        fivetran.notion.page_property,
        lateral flatten(input => multi_select)
    where
        multi_select is not null
),
page_property_rollup as (
    select
        id,
        page_id,
        rollup:type::varchar as rollup_type,
        rollup:number::number as rollup_number,
        rollup:function::varchar as rollup_function
    from
        fivetran.notion.page_property
    where
        page_property.rollup is not null
)
SELECT
    page_property.id,
    page_property.page_id,
    page_property.type,
    page_property_title.plain_text as title,
    page_property_rich_text.rich_text_plain_text,
    page_property.checkbox,
    page_property_relation.relation_unnest_id,
    page_property.last_edited_time,
    page_property.created_time,
    multi_select_unnest_name as multi_select_text 

FROM
    fivetran.notion.page_property
    left join page_property_title on page_property.page_id = page_property_title.page_id
    and page_property.id = page_property_title.id
    left join page_property_rich_text on page_property.page_id = page_property_rich_text.page_id
    and page_property.id = page_property_rich_text.id
    left join page_property_relation on page_property.page_id = page_property_relation.page_id
    and page_property.id = page_property_relation.id
    left join page_property_multi_select on page_property.page_id = page_property_multi_select.page_id
    and page_property.id = page_property_multi_select.id
where
    not _fivetran_deleted

and pages

select
    page.id,
    page.created_time,
    page.database_id,
    page.last_edited_time,
    page.object,
    page.parent_page_id,
    database_object_property.name as property_name,
    database_object_property.id as property_id
from
    fivetran.notion.page
    left join fivetran.notion.database_object_property on page.database_id = database_object_property.database_object_id
where
    not page._fivetran_deleted
    and not database_object_property._fivetran_deleted

 

 

 

 

 

 

 

 

 

Good day @brian-dka 
Welcome to Fivetran Community and thank you for contributing to the conversation.

Cheers,
Toby