Delete Unreferenced Assets AEM

Problem Statement:

Delete all the assets which don’t have references to improve AEM performance in turn Indexes and search/query performance.

Introduction:

How do assets get published?

  1. The author uploads the images and publishes the assets
  2. Create a launcher and workflow which processes assets metadata and publishes the pages
  3. Whenever we publish any pages and if the page has references to assets, then during publishing, it asks to replicate the references as well.

What happens when the page is unpublished?

  1. When the page is deactivated, assets referenced to the page will not be deactivated because this asset might have reference to the other pages hence out of the box assets won’t be deactivated.
  2. If we perform cleanup, deactivate and delete old pages, we might not be cleaning up assets related to this page.

What advantages of cleaning up of old assets?

  1. Drastically reduces repository size
  2. Improves DAM Asset search
  3. Improves indexing

Generate Published Asset Report by visiting:

Go to Tools -> Assets -> Reports as shown below:

Asset Report Tool Section

Click on create and click on Publish report

Asset Publish Report

Provide folder path and start date and end date

Asset Report Configure Page

Select the columns as per the requirement

Asset Report Custom Column Page

Finally, the report will be ready with all the assets lists as shown below

Asset Report

Download the report to see the final list of images

DAM Report result

To generate the report let’s create a new AEM tool:

AEM OOTB comes with multiple tools in AEM and to access all the tools you need to navigate to the tool section and select the appropriate sections to perform all the operations

For example:
  1. AEM operations
  2. Managing templates
  3. Cloud configurations
  4. ACS Commons tools etc..

Tools are an essential part of AEM and avoid any dependency on Groovy scripts or any add scripts and can be managed or extended at any given point in time.

Broken Asset Report generates a report of all the unreferenced assets by running a Reference Search query across the repository every 30s (by default), you can update the scheduler expression based on your repository size.

The scheduler also checks for the CPU/HEAP size before triggering the reference search process and for more details on Throttled scheduler please refer to the link.

In order to create a perfect content backup package please use the Tool generator to give your tool name and descriptions: https://kiransg.com/2022/11/24/aem-tool-create-generate-tool-from-scratch

Please refer to my GitHub repository for working code on Broken Asset Reference:

Once the repository is built and deployed, you will be able to access the Broken Asset Reference report as shown below:

Broken Asset Reference Tool Section

You can select the report just now created by you in the drop down as shown below:

Select the Asset Report from dropdown

Provide scheduler expression as per your needs and select the result refresh interval for every 10s or as per your needs

You can see the results as shown below once the process is kicked off and it will also show the current row its processing and also CPU/Heap usage.

Reference Search Running status

For some reason, if your system CPU/Heap is throttling then from the backend it takes care of not running your scheduler or you can also manually unschedule the scheduler.

Once your system’s CPU comes back to normal you can go back and select the report and schedule again and report generation picks from the current row where it was left off.

Once the processing is complete click on the report name to download the generated report.

Reference Search Completed Status
Report Excel with Has reference Column

Cross verification:

  1. You can rerun the generated report on the MCP broken asset reference.
  2. Generate the Splunk (logs) results by running a query to get all the assets to call (/content/dam) requests on dispatcher/publisher from the past 1year or so.
  3. You can also reach out to the Analytics team, requesting image impressions (data on image usage) from the past 1 year or so.

Please provide your valuable feedback in the comments.