Dive into Lightroom catalogues
Table of contents
This note captures my learnings on the contents of the Adobe Lightroom catalogues, particularly its βLibraryβ module.
This entry is about Lightroom version 11. I have no expectation of any of this holding true for past or future versions. I have only inspected a catalogue from the MacOS version so I don't know if any of my findings are specific to this operating system.
The following will assume a catalogue named βsquirrelβ so file and directory names will match that.
# Overview
A catalogue is composed of a few SQLite databases and some auxiliary files.
squirrel.lrcat
: The catalogue. 116 tables with information about the images, folder structure, metadata (XMP, etc), development history, etc.squirrel Helper.lrdata/helper.db
: 22 tables to manage the search using FTS5.squirrel Previews.lrdata/previews.db
: 11 tables to manage the previews stored in disk as.rlprev
image pyramids.squirrel Previews.lrdata/root-pixels.db
: 2 tables to manage small JPEG thumbnails.
# Database schema
The schema is large so I will focus on the core tables from where the rest hangs. Some tables have redundant information, likely to be convenience denormalisation and precomputed values.
To inspect the whole schema you can either use the .schema
command in a standard SQLite session or use a GUI such as Beekeeper Studio. You'll notice that there are no explicit foreign key contraints so you'll have to work out which fields link to which tables. Field naming is fairly consistent so it should be a reasonable effort.
# Images
The Adobe_images
is the core of the catalogue. In it you can find information about ratings, colour labels, whether it's a copy or an original, orientation and more.
# Filesystem
There are three tables representing the file structure linking between the images in the catalogue and the source images stored outside Lightroom.
AgLibraryRootFolder
has the absolute and relative paths to work out where the images sit in the disk and where they sit in relation to the catalogue.AgLibraryFolder
represents any folder structure existing from the root folder inwards.AgLibraryFile
represents any file found in each known folder. Every image inAdobe_images
has a file in this table.
# Metadata
Metadata is split in multiple tables by provenance:
Adobe_AdditionalMetadata
has a fieldxmp
with the whole XMP blob.AgHarvestedExifMetadata
has the Exif information.AgHarvestedIptcMetadata
has the IPTC information.AgHarvestedDNGMetadata
has the DNG information.
There are other metadata tables such as AgLibraryIPTC
but I haven't worked out whether they are more relevant than the ones listed above.
# Keywords
Keywords are normalised across. The main tables are AgLibraryKeyword
, AgLibraryKeywordImage
and AgLibraryKeywordSynonym
.
# Collections
Collections are normalised across. The main tables are AgLibraryCollection
and AgLibraryCollectionImage
.
# Imports
Both AgLibraryImport
and AgLibraryImportImage
group images to the importing action and the date it happened.
# Querying the catalogue
With an acquaintance of the schema we can start pulling some information out of the catalogue.
The following query lists every image with their respective filesystem path, rating, colour label and details around how the photo was taken.
SELECT
root_folder.name AS root_folder,
folder.pathFromRoot AS relative_path,
file.baseName AS file_basename,
file.extension AS file_extension,
COALESCE(image.rating, 0) AS rating,
image.colorLabels AS colour_label,
image.captureTime as capture_time,
exif.focalLength AS focal_length,
ROUND(exif.aperture, 3) AS aperture,
exif.shutterSpeed AS shutter_speed,
ROUND(exif.isoSpeedRating, 0) AS iso_speed_rating,
camera_model.value AS camera,
lens.value AS lens
FROM
AgLibraryFile AS file
LEFT JOIN AgLibraryFolder AS folder
ON folder.id_local = file.folder
LEFT JOIN AgLibraryRootFolder AS root_folder
ON root_folder.id_local = folder.rootFolder
LEFT JOIN Adobe_images AS image
ON file.id_local = image.rootFile
LEFT JOIN AgHarvestedExifMetadata AS exif
ON exif.image = image.id_local
LEFT JOIN AgInternedExifCameraModel AS camera_model
ON camera_model.id_local = metadata.cameraModelRef
LEFT JOIN AgInternedExifLens AS lens
ON lens.id_local = metadata.lensRef
This next query lists the images with their filesystem path, height, width and the full XMP blob.
SELECT
root_folder.absolutePath || folder.pathFromRoot || file.baseName || '.' || file.extension AS path,
image.fileHeight AS height,
image.fileWidth AS width,
metadata.xmp
FROM
Adobe_images AS image
JOIN AgLibraryFile AS file
ON image.rootFile = file.id_local
JOIN AgLibraryFolder AS folder
ON file.folder = folder.id_local
JOIN AgLibraryRootFolder AS root_folder
ON folder.rootFolder = root_folder.id_local
JOIN Adobe_AdditionalMetadata metadata
ON metadata.image = image.id_local
# Timestamps
Timestamps deserve their own section because Lightroom uses different formats depending on the case and not necessarily one you would expect. For example, some values are ISO8601 datetimes, whereas others are large integers looking like a Unix epoch.
These large integers don't have the conventional Unix epoch starting at 1970-01-01T00:00:00Z
though, they start at 2001-01-01T00:00:00Z
. Thus the corresponding Unix time is 978307200
.
Note : At least this is the case in MacOS and seems to boil down to the Cocoa Core Data Timestamp.
With this in mind, to get the creation time in UTC from a virtual copy of an image you'll need to convert values from existing copies to the Unix Epoch and the other ones either filter them out or reset to 0
(i.e. 1970-01-01T00:00:00Z
):
SELECT
id_local,
copyName,
strftime('%Y-%m-%dT%H:%M:%SZ',
CASE
WHEN copyName IS NULL THEN
copyCreationTime - 63113817600
ELSE
copyCreationTime + 978307200
END, 'unixepoch')
AS copy_creation_time,
FROM
Adobe_images AS image
ORDER BY
timestamp DESC
# Thumbnails
There are two types of thumbnail: the root-pixels.db
database holds small JPEG blobs whilst the previews.db
database holds references to the image pyramid files (.lrprev
).
A quick way to see a thumbnail from the root-pixels.db
is as follows:
| |
We extract the JPEG blob as a hexadecimal string, convert it back to binary with xxd
and display it with Kitty 's icat
.
The previews.db
holds information about the .lrprev
files and their original images. Using the main tables you can get a good view of what's available:
SELECT
ImageCacheEntry.imageId AS image_id,
ImageCacheEntry.orientation,
Pyramid.uuid,
Pyramid.digest,
Pyramid.colorProfile AS color_profile,
Pyramid.croppedWidth AS width,
Pyramid.croppedHeight AS height
FROM
Pyramid
JOIN ImageCacheEntry
ON ImageCacheEntry.uuid = Pyramid.uuid
#
Detour: the .lrprev
file
A .lrprev
file is a binary file containing the full image pyramid generated by Lightroom. In short, the binary layout excluding paddings looks like:
<signature = AgHg> <size> <label = "header">
<data>
<signature = AgHg> <size> <label = "level_1">
<data>
β¦
<signature = AgHg> <size> <label = "level_4">
<data>
β¦
EOF
In other words, the file starts with a four byte signature AgHg
(0x41 0x67 0x48 0x67
), followed with the size (541 bytes in this example) and then the label of the block (header
):
ββββββββββ¬ββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββ¬βββββββββ¬βββββββββ
β00000000β 41 67 48 67 00 20 00 00 β 00 00 00 00 00 00 02 47 βAgHg0 00β000000β’Gβ
β00000010β 00 00 00 00 00 00 00 09 β 68 65 61 64 65 72 00 00 β0000000_βheader00β
ββββββββββ΄ββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββ΄βββββββββ΄βββββββββ
After that comes the actual header which is a key/value structure parseable as a Lua table:
pyramid =
From the header data we know this file has 7 pyramid levels which means the rest of the file is expected to have 7 more blocks labelled from level_1
to level_7
where each data block is a fully functional JPEG of the described width and height. I haven't checked but level 1 looks exactly the same as the small thumbnail found in root-pixels.db
.
Note : The uuid
and digest
are also part of the filename which has a pattern such as {uuid}-{digest}.lrprev
. This information also allows joining the dots with the previews.db
tables Pyramid
, PyramidLevel
and ImageCacheEntry
which in turn link to the main catalogue Adobe_images
table.
After the header, a new block starts with the same signature AgHg
, followed by the size and label of the block (level_1
), then the data of size bytes and then it repeats for as many levels as the pyramid has.
# Closing thoughts
It's fair to say that proprietary undocumented formats are not my favourite topic. That said, the fact that Lightroom uses SQLite and keeps things roughtly consistent softens the problem of having data held hostage by businesses. It would be great if Adobe had easy to find documentation for the internals but I recognise they likely don't get any benefit from doing so.
I'm not entirely sure whether Lightroom Classic is here to stay or not but if it is it might be worth it learning how to write a plug-in for it. A quick search in Adobe's website yields no useful getting started tutorials which is a bit of a shame.