Broken Page References AEM

Problem Statement:

How to get the list of all the broken references in AEM?


Get a List of all the broken references using MCP and provide the report


OOTB we get a Broken reference report provided by MCP, which can be used to get all the broken references in the content repo.

Broken Refernce Report

It’s highly recommended to run this process during

  1. off hours
  2. Don’t run on the root level
  3. Run it on 2nd level or 3rd level pages

How to run this process?

Provide Source path

Provide the regex so that it will consider only the references which point to /content or /etc (points to AEM)

You can also provide exclude properties to improve the traversal of nodes.

If you want to verify any broken links in the RTE fields or properties, then check the deep check checkbox and provide the properties list.

But the above process has a few issues.

  1. Html properties are not working as expected

We need a few customizations to this process by making a few changes to check HTML level references by adding JSOUP API

Add the following dependencies to your POM.xml


Get the following Broken reference code into your local as shown below:

Add the following code as shown below:

if (htmlFields.contains(property.getKey())) {
            stream = stream.flatMap(val -> {
                try {
                    Document doc = Jsoup.parse(val);
                    Elements anchors ="a");
                    return -> link.attr("href"));
                } catch (Exception e) {
                    log.warn("Could not parse links from property value of {}", property.getKey(), e);
                    return Stream.empty();
At Line number 207

When we run it on wknd site it would look something like this:

Broken Reference Report

Broken Asset References AEM

Problem Statement:

Get a List of all the Assets which are missing references


Get the list of broken asset references to unpublish and remove them repo to improve the system stability and performance.


How do assets get published?

  1.  The author uploads the images and publishes the assets
  2. Create a launcher and workflow which process assets metadata and publish the pages
  3. Whenever we publish any pages and if the page has references to assets, then during publishing, it asks to replicate the references as well.

What happens when the page is unpublished?

  1.  When the page is deactivated, assets referenced to the page will not be deactivated because this asset might have reference to the other pages hence out of the box assets won’t be deactivated.
  2. If we perform cleanup, deactivate and delete old pages, we might not be cleaning up assets related to this page.

Advantages of cleaning up old assets?

  1. Drastically reduces repository size
  2. Improves DAM Asset search
  3. Improves indexing

Get Publish Report using Assets Report:

  • Go to Tools -> Assets -> Reports as shown below:
Asset Reports
  • Click on create and click on Publish report
Select Publish report
  • Provide folder path and start date and end date
Add Report details
  • Select the columns as per requirement
Configure columns for the report
  • Finally, report will be ready with all the assets lists as shown below
Completed Reports
  • Download the report to see the final list of images
Example Report CSV file

If Images are unpublished then we can ask authors to review and delete them

If images are published but has no references to figure this out, we need a new process.

Broken Asset reference:

Add the following dependencies to your pom.xml


Create a new MCP process by calling the MCP service and providing the implementation class:

package com.mysite.core.mcp;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import com.adobe.acs.commons.mcp.ProcessDefinitionFactory;

@Component(service = ProcessDefinitionFactory.class)
public class BrokenAssetsFactory extends ProcessDefinitionFactory<BrokenAssets> {

    Replicator replicator;

    public String getName() {
        return "Broken Asset References";

    public BrokenAssets createProcessDefinitionInstance() {
        return new BrokenAssets(replicator);

Create an implementation class as shown below:

package com.mysite.core.mcp;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import javax.jcr.RepositoryException;
import org.apache.commons.collections4.ListUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.jackrabbit.util.Text;
import org.jetbrains.annotations.NotNull;
import com.adobe.acs.commons.fam.ActionManager;
import com.adobe.acs.commons.mcp.ProcessDefinition;
import com.adobe.acs.commons.mcp.ProcessInstance;
import com.adobe.acs.commons.mcp.form.Description;
import com.adobe.acs.commons.mcp.form.FileUploadComponent;
import com.adobe.acs.commons.mcp.form.FormField;
import com.adobe.acs.commons.mcp.form.RadioComponent;
import com.adobe.acs.commons.mcp.model.GenericReport;
import com.adobe.acs.commons.mcp.model.ManagedProcess;

 * Relocate Pages and/or Sites using a parallelized move process
public class BrokenAssets extends ProcessDefinition {
	private static final String SOURCE_PATH = "Path";
	private static final String CONTAINS_QUERY = "CONTAINS(s.*, '%s') or ";
	private final GenericReport report = new GenericReport();
	List<EnumMap<ReportColumns, String>> reportData = Collections.synchronizedList(new ArrayList<>());
	List<String> assetsList = new LinkedList<>();
    public enum PublishMethod {
        @Description("Select this option to generate Broken References Report")
    Replicator replicatorService;
    @FormField(name = "Mapping Excel", component = FileUploadComponent.class)
    private RequestParameter mappingExcel;
    private Spreadsheet spreadsheet;
    @FormField(name = "Type",
            description = "Please select the oprporiate option to execute",
            component = RadioComponent.EnumerationSelector.class,
            required = true,
            options = {"vertical", "default=BROKEN_REFERENCE_REPORT"})
    public PublishMethod publishMethod = PublishMethod.BROKEN_REFERENCE_REPORT;

	@FormField(name="Content Path",
			description="Content Path for search results",
	public String contentPath = "/content";
    @FormField(name="Chunk count",
            description="Max number of chunk the search results",
    public int chunkCount = 3000;
            description="Max number of retries per commit",
    public int retryCount = 2;
    @FormField(name="Retry delay",
            description="Delay between retries (in milliseconds)",
    public int retryWait = 500;
    public BrokenAssets(Replicator replicator) {
    	replicatorService = replicator;

    public void init() throws RepositoryException {    	
    	if(null != mappingExcel && mappingExcel.getSize() > 0) {
    		try {
                // Read spreadsheet
        		spreadsheet = new Spreadsheet(mappingExcel, SOURCE_PATH).buildSpreadsheet();
        		spreadsheet.getDataRowsAsCompositeVariants().forEach(row -> assetsList.add(getString(row, SOURCE_PATH)) );
            } catch (IOException e) {
                throw new RepositoryException("Unable to process spreadsheet", e);

    ManagedProcess instanceInfo;

    public void buildProcess(ProcessInstance instance, ResourceResolver rr) throws LoginException, RepositoryException {
        instanceInfo = instance.getInfo();
        switch ( {
	        case "broken_reference_report":
	        	instance.getInfo().setDescription("Collecting references");
	        	instance.defineAction("Searching Refs", rr, this::collectRefernce);

    public void storeReport(ProcessInstance instance, ResourceResolver rr) throws RepositoryException, PersistenceException {    	
        report.setRows(reportData, ReportColumns.class);
        report.persist(rr, instance.getPath() + "/jcr:content/report");

    public enum ReportColumns {
    protected void collectRefernce(ActionManager manager) {
    private void collectReferences(ResourceResolver resourceResolver) throws Exception {    	
		if(!assetsList.isEmpty()) {						 
			RetryUtils.withRetry(retryCount, retryWait, () -> {
			    List<List<String>> chunkedAssetList = ListUtils.partition(assetsList, chunkCount);					
			    int counter = 0;
			    while(!chunkedAssetList.isEmpty() && counter < chunkedAssetList.size()) {
			        List<String> chunk = chunkedAssetList.get(counter);
			        @NotNull Iterator<Resource> resourceResults = buildSQLQueryAndFetchResults(resourceResolver, contentPath, chunk);
			        collectReferences(resourceResults, chunk);
			if(!assetsList.isEmpty()) {;

	private void collectReferences(@NotNull Iterator<Resource> resourceResults, List<String> chunk) {
		while(resourceResults.hasNext()) {
			Resource resultRes =;
			if (null != resultRes) {
				resultRes.getValueMap().entrySet().forEach(property -> {
					if(!property.getKey().equalsIgnoreCase("dam:folderThumbnailPaths")) {
						Object prop = property.getValue();
						if (prop.getClass() == String[].class) {
							List<String> propertyValue = Arrays.asList((String[]) prop);
							if (!propertyValue.isEmpty()) {
								List<String> matchingAsset =
										assetPat -> -> sam.contains(assetPat)))
								if(!matchingAsset.isEmpty()) {
						} else if (prop.getClass() == String.class) {
							String propertyValue = (String) prop;
							if (StringUtils.isNotEmpty(propertyValue) ) {
								Optional<String> matchingAsset =;
								if(matchingAsset.isPresent()) {

	private void reportResult(String path) {
		EnumMap<ReportColumns, String> row = new EnumMap<>(ReportColumns.class);
		row.put(ReportColumns.ASSET_PATH, path);		
	private @NotNull Iterator<Resource> buildSQLQueryAndFetchResults(ResourceResolver resolver, String contentPath, List<String> chunk) {
        String groupStr = -> String.format(CONTAINS_QUERY, Text.escapeIllegalXpathSearchChars(r).replaceAll("'", "''"))).collect(Collectors.joining());        
		String querySt = "SELECT * FROM [nt:base] AS s WHERE ISDESCENDANTNODE(["+contentPath+"]) and "+ StringUtils.substringBeforeLast(groupStr, "or");
		return resolver.findResources(querySt, "JCR-SQL2");
	@SuppressWarnings({"rawtypes", "unchecked"})
    private String getString(Map<String, CompositeVariant> row, String path) {
        CompositeVariant v = row.get(path.toLowerCase(Locale.ENGLISH));
        if (v != null) {
            return (String) v.getValueAs(String.class);
        } else {
            return null;
package com.mysite.core.mcp;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RetryUtils {
    private static final Logger log = LoggerFactory.getLogger(RetryUtils.class);

    public static interface CallToRetry {
        void process() throws Exception;

    public static boolean withRetry(int maxTimes, long intervalWait, CallToRetry call) throws Exception {
        if (maxTimes <= 0) {
            throw new IllegalArgumentException("Must run at least one time");
        if (intervalWait <= 0) {
            throw new IllegalArgumentException("Initial wait must be at least 1");
        Exception thrown = null;
        for (int i = 0; i < maxTimes; i++) {
            try {
                return true;
            } catch (Exception e) {
                thrown = e;
      "Encountered failure on {} due to {}, attempt retry {} of {}", call.getClass().getName() , e.getMessage(), (i + 1), maxTimes, e);
            try {
            } catch (InterruptedException wakeAndAbort) {
        throw thrown;

Running Asset Reference Process:

After building the code you can see the new Process showing up in MCP

Borken Asset Refernce Process

Copy the Path column into the new Excel sheet as shown below

Path column into new excel file

Upload into the process and start to see all the images which are published yet unreferenced as shown below

Why does Chunk count?

Chunk count helps the SQL 2 query to group by the paths, which will be maxing 4500 and it won’t take more than that (configurable based on the environment). However basically, if we have 20000 / 4500 = 4.44 ~ 5 we will be running the query max five times to generate the below report

Share the report with the content authors team to validate if images are required if not plan to clean up using

After completing the process
Example report and we can download and share with Authors

Clean up Process:

Authors don’t have to unpublish and delete individual images, then can you use the below process to upload the excel sheet with all the approved image paths and upload it to the process to deactivate and delete

AEM Publish / UnPublish / Delete List of pages – MCP Process