Broken Asset References AEM


Problem Statement:

Get a List of all the Assets which are missing references


Requirement:

Get the list of broken asset references to unpublish and remove them repo to improve the system stability and performance.


Introduction:

How do assets get published?

  1.  The author uploads the images and publishes the assets
  2. Create a launcher and workflow which process assets metadata and publish the pages
  3. Whenever we publish any pages and if the page has references to assets, then during publishing, it asks to replicate the references as well.

What happens when the page is unpublished?

  1.  When the page is deactivated, assets referenced to the page will not be deactivated because this asset might have reference to the other pages hence out of the box assets won’t be deactivated.
  2. If we perform cleanup, deactivate and delete old pages, we might not be cleaning up assets related to this page.

Advantages of cleaning up old assets?

  1. Drastically reduces repository size
  2. Improves DAM Asset search
  3. Improves indexing

Get Publish Report using Assets Report:

  • Go to Tools -> Assets -> Reports as shown below:
Asset Reports
  • Click on create and click on Publish report
Select Publish report
  • Provide folder path and start date and end date
Add Report details
  • Select the columns as per requirement
Configure columns for the report
  • Finally, report will be ready with all the assets lists as shown below
Completed Reports
  • Download the report to see the final list of images
Example Report CSV file

If Images are unpublished then we can ask authors to review and delete them

If images are published but has no references to figure this out, we need a new process.

Broken Asset reference:

Add the following dependencies to your pom.xml

        <dependency>
            <groupId>org.jetbrains</groupId>
            <artifactId>annotations</artifactId>
            <version>17.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.adobe.acs</groupId>
            <artifactId>acs-aem-commons-bundle</artifactId>
            <version>5.2.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.13.1</version>
            <scope>provided</scope>
        </dependency>

Create a new MCP process by calling the MCP service and providing the implementation class:

package com.mysite.core.mcp;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import com.adobe.acs.commons.mcp.ProcessDefinitionFactory;
import com.day.cq.replication.Replicator;

@Component(service = ProcessDefinitionFactory.class)
public class BrokenAssetsFactory extends ProcessDefinitionFactory<BrokenAssets> {

    @Reference
    Replicator replicator;

    @Override
    public String getName() {
        return "Broken Asset References";
    }

    @Override
    public BrokenAssets createProcessDefinitionInstance() {
        return new BrokenAssets(replicator);
    }
}

Create an implementation class as shown below:

package com.mysite.core.mcp;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import javax.jcr.RepositoryException;
import org.apache.commons.collections4.ListUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.jackrabbit.util.Text;
import org.apache.sling.api.request.RequestParameter;
import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.jetbrains.annotations.NotNull;
import com.adobe.acs.commons.data.CompositeVariant;
import com.adobe.acs.commons.data.Spreadsheet;
import com.adobe.acs.commons.fam.ActionManager;
import com.adobe.acs.commons.mcp.ProcessDefinition;
import com.adobe.acs.commons.mcp.ProcessInstance;
import com.adobe.acs.commons.mcp.form.Description;
import com.adobe.acs.commons.mcp.form.FileUploadComponent;
import com.adobe.acs.commons.mcp.form.FormField;
import com.adobe.acs.commons.mcp.form.RadioComponent;
import com.adobe.acs.commons.mcp.model.GenericReport;
import com.adobe.acs.commons.mcp.model.ManagedProcess;
import com.day.cq.replication.ReplicationStatus;
import com.day.cq.replication.Replicator;

/**
 * Relocate Pages and/or Sites using a parallelized move process
 */
public class BrokenAssets extends ProcessDefinition {
	
	private static final String SOURCE_PATH = "Path";
	private static final String CONTAINS_QUERY = "CONTAINS(s.*, '%s') or ";
	private final GenericReport report = new GenericReport();
	List<EnumMap<ReportColumns, String>> reportData = Collections.synchronizedList(new ArrayList<>());
	
	List<String> assetsList = new LinkedList<>();
	
    public enum PublishMethod {
        @Description("Select this option to generate Broken References Report")
        BROKEN_REFERENCE_REPORT
    }
    
    Replicator replicatorService;
    
    @FormField(name = "Mapping Excel", component = FileUploadComponent.class)
    private RequestParameter mappingExcel;
    private Spreadsheet spreadsheet;
    
    @FormField(name = "Type",
            description = "Please select the oprporiate option to execute",
            component = RadioComponent.EnumerationSelector.class,
            required = true,
            options = {"vertical", "default=BROKEN_REFERENCE_REPORT"})
    public PublishMethod publishMethod = PublishMethod.BROKEN_REFERENCE_REPORT;

	@FormField(name="Content Path",
			description="Content Path for search results",
			hint="/content",
			options={"default=/content"})
	public String contentPath = "/content";
    
    @FormField(name="Chunk count",
            description="Max number of chunk the search results",
            hint="4500",
            options={"default=3000"})
    public int chunkCount = 3000;
    
    @FormField(name="Retries",
            description="Max number of retries per commit",
            hint="2",
            options={"default=2"})
    public int retryCount = 2;
    
    @FormField(name="Retry delay",
            description="Delay between retries (in milliseconds)",
            hint="500,1000,...",
            options={"default=500"})
    public int retryWait = 500;
    
    public BrokenAssets(Replicator replicator) {
    	replicatorService = replicator;
	}

    @Override
    public void init() throws RepositoryException {    	
    	if(null != mappingExcel && mappingExcel.getSize() > 0) {
    		try {
                // Read spreadsheet
        		spreadsheet = new Spreadsheet(mappingExcel, SOURCE_PATH).buildSpreadsheet();
        		spreadsheet.getDataRowsAsCompositeVariants().forEach(row -> assetsList.add(getString(row, SOURCE_PATH)) );
            } catch (IOException e) {
                throw new RepositoryException("Unable to process spreadsheet", e);
            }
    	}    	    	    	
    }

    ManagedProcess instanceInfo;

    @Override
    public void buildProcess(ProcessInstance instance, ResourceResolver rr) throws LoginException, RepositoryException {
        instanceInfo = instance.getInfo();
        switch (publishMethod.name().toLowerCase()) {
	        case "broken_reference_report":
	        	instance.getInfo().setDescription("Collecting references");
	        	instance.defineAction("Searching Refs", rr, this::collectRefernce);
	            break;
        }        
    }

    @Override
    public void storeReport(ProcessInstance instance, ResourceResolver rr) throws RepositoryException, PersistenceException {    	
        report.setRows(reportData, ReportColumns.class);
        report.persist(rr, instance.getPath() + "/jcr:content/report");
    }

    public enum ReportColumns {
    	ASSET_PATH
    }
            
    protected void collectRefernce(ActionManager manager) {
    	manager.deferredWithResolver(this::collectReferences); 
    }
    
    private void collectReferences(ResourceResolver resourceResolver) throws Exception {    	
		if(!assetsList.isEmpty()) {						 
			RetryUtils.withRetry(retryCount, retryWait, () -> {
			    List<List<String>> chunkedAssetList = ListUtils.partition(assetsList, chunkCount);					
			    int counter = 0;
			    while(!chunkedAssetList.isEmpty() && counter < chunkedAssetList.size()) {
			        List<String> chunk = chunkedAssetList.get(counter);
			        counter++;
			        @NotNull Iterator<Resource> resourceResults = buildSQLQueryAndFetchResults(resourceResolver, contentPath, chunk);
			        collectReferences(resourceResults, chunk);
			    }
			});	
			if(!assetsList.isEmpty()) {
				assetsList.stream().forEach(this::reportResult);
			}
		}	        			
    }

	private void collectReferences(@NotNull Iterator<Resource> resourceResults, List<String> chunk) {
		while(resourceResults.hasNext()) {
			Resource resultRes = resourceResults.next();
			if (null != resultRes) {
				resultRes.getValueMap().entrySet().forEach(property -> {
					if(!property.getKey().equalsIgnoreCase("dam:folderThumbnailPaths")) {
						Object prop = property.getValue();
						if (prop.getClass() == String[].class) {
							List<String> propertyValue = Arrays.asList((String[]) prop);
							if (!propertyValue.isEmpty()) {
								List<String> matchingAsset = chunk.stream().filter(
										assetPat -> propertyValue.stream().anyMatch(sam -> sam.contains(assetPat)))
										.collect(Collectors.toList());
								if(!matchingAsset.isEmpty()) {
									chunk.removeIf(matchingAsset::contains);
									assetsList.removeIf(matchingAsset::contains);
								}
							}
						} else if (prop.getClass() == String.class) {
							String propertyValue = (String) prop;
							if (StringUtils.isNotEmpty(propertyValue) ) {
								Optional<String> matchingAsset = chunk.stream().filter(propertyValue::contains).findAny();
								if(matchingAsset.isPresent()) {
									chunk.remove(matchingAsset.get());
									assetsList.remove(matchingAsset.get());
								}
							}
						}
					}
				});
			}
		}
	}

	private void reportResult(String path) {
		EnumMap<ReportColumns, String> row = new EnumMap<>(ReportColumns.class);
		row.put(ReportColumns.ASSET_PATH, path);		
		reportData.add(row);
	}	
	
	private @NotNull Iterator<Resource> buildSQLQueryAndFetchResults(ResourceResolver resolver, String contentPath, List<String> chunk) {
        String groupStr = chunk.stream().map(r -> String.format(CONTAINS_QUERY, Text.escapeIllegalXpathSearchChars(r).replaceAll("'", "''"))).collect(Collectors.joining());        
		String querySt = "SELECT * FROM [nt:base] AS s WHERE ISDESCENDANTNODE(["+contentPath+"]) and "+ StringUtils.substringBeforeLast(groupStr, "or");
		return resolver.findResources(querySt, "JCR-SQL2");
    }
	
	@SuppressWarnings({"rawtypes", "unchecked"})
    private String getString(Map<String, CompositeVariant> row, String path) {
        CompositeVariant v = row.get(path.toLowerCase(Locale.ENGLISH));
        if (v != null) {
            return (String) v.getValueAs(String.class);
        } else {
            return null;
        }
    }
}
package com.mysite.core.mcp;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RetryUtils {
    private static final Logger log = LoggerFactory.getLogger(RetryUtils.class);

    public static interface CallToRetry {
        void process() throws Exception;
    }

    public static boolean withRetry(int maxTimes, long intervalWait, CallToRetry call) throws Exception {
        if (maxTimes <= 0) {
            throw new IllegalArgumentException("Must run at least one time");
        }
        if (intervalWait <= 0) {
            throw new IllegalArgumentException("Initial wait must be at least 1");
        }
        Exception thrown = null;
        for (int i = 0; i < maxTimes; i++) {
            try {
                call.process();
                return true;
            } catch (Exception e) {
                thrown = e;
                log.info("Encountered failure on {} due to {}, attempt retry {} of {}", call.getClass().getName() , e.getMessage(), (i + 1), maxTimes, e);
            }
            try {
                Thread.sleep(intervalWait);
            } catch (InterruptedException wakeAndAbort) {
                break;
            }
        }
        throw thrown;
    }
}

Running Asset Reference Process:

After building the code you can see the new Process showing up in MCP

Borken Asset Refernce Process

Copy the Path column into the new Excel sheet as shown below

Path column into new excel file

Upload into the process and start to see all the images which are published yet unreferenced as shown below

Why does Chunk count?

Chunk count helps the SQL 2 query to group by the paths, which will be maxing 4500 and it won’t take more than that (configurable based on the environment). However basically, if we have 20000 / 4500 = 4.44 ~ 5 we will be running the query max five times to generate the below report

Share the report with the content authors team to validate if images are required if not plan to clean up using

After completing the process
Example report and we can download and share with Authors

Clean up Process:

Authors don’t have to unpublish and delete individual images, then can you use the below process to upload the excel sheet with all the approved image paths and upload it to the process to deactivate and delete

AEM Publish / UnPublish / Delete List of pages – MCP Process