Problem Statement:
How to get the list of all the broken references in AEM?
Requirement:
Get a list of all the broken references using MCP and produce a report.
Introduction:
MCP (Manage Controlled Processes, part of ACS AEM Commons) provides a Broken reference report out of the box (OOTB), which can be used to find all the broken references in the content repository.

It’s highly recommended to follow these guidelines when running this process:
- Run it during off hours
- Don’t run it at the root level
- Run it on 2nd-level or 3rd-level pages
How to run this process?
- Provide the source path.
- Provide a regex so that only references pointing to /content or /etc (that is, paths inside AEM) are considered.
- Optionally, provide properties to exclude, which speeds up the traversal of nodes.
- To check for broken links inside RTE fields or other HTML properties, select the deep check checkbox and provide the list of properties.
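To make the regex step more concrete, here is a minimal sketch of how such a pattern narrows the candidate values down to AEM paths. The pattern `^/(content|etc)/.+`, the class name and the method names are assumptions chosen for illustration; use whatever pattern fits your content structure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ReferenceRegexSketch {
    // Hypothetical pattern: accepts only values that start with /content/ or /etc/
    // and have at least one more character after the slash.
    static final Pattern AEM_PATH = Pattern.compile("^/(content|etc)/.+");

    // True when the whole property value looks like a path inside AEM.
    static boolean pointsToAem(String value) {
        return value != null && AEM_PATH.matcher(value).matches();
    }

    // Keeps only the values that the report should treat as references.
    static List<String> keepAemReferences(List<String> values) {
        return values.stream()
                .filter(ReferenceRegexSketch::pointsToAem)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "/content/wknd/us/en",
                "https://example.com/page",
                "/etc/designs/wknd",
                "/apps/wknd/components");
        // External URLs and /apps paths are dropped before any existence check.
        System.out.println(keepAemReferences(values));
    }
}
```

Only the values that pass this filter need to be resolved against the repository, which is why a tight regex keeps the report focused on real AEM references.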
However, the process described above has an issue:
- HTML properties are not handled as expected
To fix this, we need to customize the process so that it checks references inside HTML content, using the Jsoup API to parse it.
Add the following dependencies to your pom.xml (no versions are listed here, so they are assumed to be managed in your parent POM):
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.adobe.acs</groupId>
    <artifactId>acs-aem-commons-bundle</artifactId>
    <scope>provided</scope>
</dependency>
Copy the following Broken reference classes into your local project:
- Broken reference List Process
- Broken reference List Process Factory
Then add the following code, which extracts the href of every anchor tag found in the configured HTML properties:
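// Requires org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements
// and java.util.stream.Stream.
if (htmlFields.contains(property.getKey())) {
    // Replace each HTML value with the href values of its anchor tags.
    stream = stream.flatMap(val -> {
        try {
            Document doc = Jsoup.parse(val);
            Elements anchors = doc.select("a");
            return anchors.stream().map(link -> link.attr("href"));
        } catch (Exception e) {
            // On a parse failure, log it and contribute no links for this value.
            log.warn("Could not parse links from property value of {}", property.getKey(), e);
            return Stream.empty();
        }
    });
}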
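The snippet above relies on `stream`, `property` and `htmlFields` from the surrounding method. The following is a minimal sketch of how that surrounding logic might look, assuming the property value is either a String or a String[]. The class name, the `collectPaths` signature and the pluggable `linkExtractor` are hypothetical. To stay dependency-free, the demo uses a simple regex stand-in for the link extraction; it is not a robust HTML parser, and in the real process the Jsoup-based lambda would take its place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class CollectPathsSketch {
    // Stand-in for Jsoup: finds double-quoted href attributes only (assumption).
    static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    static Stream<String> hrefsViaRegex(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links.stream();
    }

    // Hypothetical shape of the surrounding method: normalize the property value
    // to a stream of strings, then expand HTML fields into the links they contain.
    static Stream<String> collectPaths(Map.Entry<String, Object> property,
                                       Set<String> htmlFields,
                                       Function<String, Stream<String>> linkExtractor) {
        Object value = property.getValue();
        Stream<String> stream;
        if (value instanceof String[]) {
            stream = Arrays.stream((String[]) value);
        } else if (value instanceof String) {
            stream = Stream.of((String) value);
        } else {
            stream = Stream.empty(); // other value types carry no references here
        }
        if (htmlFields.contains(property.getKey())) {
            stream = stream.flatMap(linkExtractor);
        }
        return stream;
    }

    public static void main(String[] args) {
        Map.Entry<String, Object> rte = new java.util.AbstractMap.SimpleEntry<String, Object>(
                "text", "<p><a href=\"/content/wknd/missing\">gone</a></p>");
        collectPaths(rte, java.util.Collections.singleton("text"), CollectPathsSketch::hrefsViaRegex)
                .forEach(System.out::println);
    }
}
```

Passing the extractor in as a function keeps the normalization step independent of the parser, so the Jsoup version and a test double can share the same method.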

When run against the WKND site, the report looks something like this:
