Skip to content

Instantly share code, notes, and snippets.

@NightJar
Created February 27, 2023 02:53
Show Gist options
  • Save NightJar/ad374840b4775dd3bf3c3d6d64cfa940 to your computer and use it in GitHub Desktop.
Save NightJar/ad374840b4775dd3bf3c3d6d64cfa940 to your computer and use it in GitHub Desktop.
An SQL query that will get a list of every page on the site (regardless of visibility to any given user) along with published status. Useful for information architecture (IA) reviews by site owners or groups thereof.
-- file names are relative to the active database's data directory (e.g. /var/lib/mysql/SS_yoursite/${outfile}.csv) and are therefore only readable by the mysql user (use sudo) - or output to an absolute path with write permission e.g. /tmp as below
with recursive kids as (
select st.ID, URLSegment, cast(URLSegment as varchar(1000)) as Link, ParentID, Title, st.Version, LastEdited
from SiteTree st
left join Page pagedata on st.ID = pagedata.ID
where st.ParentID = 0
union all
select child.ID, child.URLSegment, cast(concat(parent.Link, '/', child.URLSegment) as varchar(1000)) as Link, child.ParentID, child.Title, child.Version, child.LastEdited
from SiteTree as child
left join Page childpagedata on child.ID = childpagedata.ID
join kids as parent on parent.ID = child.ParentID
)
select 'ID' ID, 'Title' Title, 'Link' Link, 'Last edited' LastEdited, 'Published status' as 'status'
union all
select
kids.ID,
kids.Title,
concat('https://www.your-domain.test/', kids.Link),
kids.LastEdited,
case when stl.Version is null then 'not published' when stl.Version = kids.Version then 'live' else 'draft changes' end 'status'
into outfile '/tmp/allThePagesOnMySite.csv' character set 'utf8' fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n'
from kids
left join SiteTree_Live stl on kids.ID = stl.ID;
@NightJar
Copy link
Author

This query is probably very MySQL (or MariaDB) specific in regards to the into outfile segment at the bottom. The SELECT portion should be relatively portable between RDBMS's, provided they support recursive queries.

The where section in the first select statement can be altered to output e.g. just a single section of the site. E.g. where st.URLSegment = 'product-catalogue' and st.ID = '123' (where 123 is the ID of the product catalogue page).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment