Comparing Stream-Based, Page.listChildren, and Query Builder Methods for Listing AEM Children Pages

Problem Statement:

What is the best way to list all the children in AEM?

Stream-based VS page.listChildren VS Query Builder

Introduction:

AEM Sling Query is a resource traversal tool recommended for content traversal in AEM. Traversal using listChildren(), getChildren(), or the Resource API is preferable to writing JCR Queries as querying can be more costly than traversal. Sling Query is not a replacement for JCR Queries. When traversal involves checking multiple levels down, Sling Query is recommended because it involves lazy evaluation of query results.

JCR queries in AEM development and recommends using them sparingly in production environments due to performance concerns. JCR queries are suitable for end-user searches and structured content retrieval but should not be used for rendering requests such as navigation or content counts.

How can I get all the child pages in AEM using JCR Query?

List<String> queryList = new ArrayList<>();
Map<String, String> map = new HashMap<>();
map.put("path", resource.getPath());
map.put("type", "cq:PageContent");
map.put("p.limit", "-1");

Session session = resolver.adaptTo(Session.class);
Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);
SearchResult result = query.getResult();
ResourceResolver leakingResourceResolverReference = null;
try {
    for (final Hit hit : result.getHits()) {
        if (leakingResourceResolverReference == null) {
            leakingResourceResolverReference = hit.getResource().getResourceResolver();
        }
        queryList.add(hit.getPath());
    }
} catch (RepositoryException e) {
    log.error("Error collecting inherited section search results", e);
} finally {
    if (leakingResourceResolverReference != null) {
        leakingResourceResolverReference.close();
    }
}

But JCR Query consumes more resources

AEM recommends using Page.listchildren because of less complexity

List<String> pageList = new ArrayList<>();
Page page = resource.adaptTo(Page.class);
Iterator<Page> childIterator = page.listChildren(new PageFilter(), true);
StreamSupport.stream(((Iterable<Page>) () -> childIterator).spliterator(), false).forEach( r -> {
    pageList.add(r.getPath());
    }
);

But it sometimes misses some results in the result set and it’s slower compared to Java streams based

How about Java streams?

Java streams can iterate faster and execute faster and consumes very few resources

List<String> streamList = new ArrayList<>();
for (Resource descendant : (Iterable<? extends Resource>) traverse(resource)::iterator) {
    streamList.add(descendant.getPath());
}
private Stream<Resource> traverse(@NotNull Resource resourceRoot) {
    Stream<Resource> children = StreamSupport.stream(resourceRoot.getChildren().spliterator(), false)
            .filter(this::shouldFollow);
    return Stream.concat(
            shouldInclude(resourceRoot) ? Stream.of(resourceRoot) : Stream.empty(),
            children.flatMap(this::traverse)
    );
}

protected boolean shouldFollow(@NotNull Resource resource) {
    return !JcrConstants.JCR_CONTENT.equals(resource.getName());
}

protected boolean shouldInclude(@NotNull Resource resource) {
    return resource.getChild(JcrConstants.JCR_CONTENT) != null;
}

I recently came across this logic while debugging the OOTB sling sitemap generator: https://github.com/apache/sling-org-apache-sling-sitemap

results comparison

Stream-based results took just 3miliseconds compared to page.listChildren or query

Broken Page References AEM


Problem Statement:

How to get the list of all the broken references in AEM?


Requirement:

Get a List of all the broken references using MCP and provide the report


Introduction:

OOTB we get a Broken reference report provided by MCP, which can be used to get all the broken references in the content repo.

Broken Refernce Report

It’s highly recommended to run this process during

  1. off hours
  2. Don’t run on the root level
  3. Run it on 2nd level or 3rd level pages

How to run this process?

Provide Source path

Provide the regex so that it will consider only the references which point to /content or /etc (points to AEM)

You can also provide exclude properties to improve the traversal of nodes.

If you want to verify any broken links in the RTE fields or properties, then check the deep check checkbox and provide the properties list.

But the above process has a few issues.

  1. Html properties are not working as expected

We need a few customizations to this process by making a few changes to check HTML level references by adding JSOUP API

Add the following dependencies to your POM.xml

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.adobe.acs</groupId>
    <artifactId>acs-aem-commons-bundle</artifactId>
    <scope>provided</scope>
</dependency>

Get the following Broken reference code into your local as shown below:

Add the following code as shown below:

if (htmlFields.contains(property.getKey())) {
            stream = stream.flatMap(val -> {
                try {
                    Document doc = Jsoup.parse(val);
                    Elements anchors = doc.select("a");
                    return anchors.stream().map(link -> link.attr("href"));
                } catch (Exception e) {
                    log.warn("Could not parse links from property value of {}", property.getKey(), e);
                    return Stream.empty();
                }
            });
        }
At Line number 207

When we run it on wknd site it would look something like this:

Broken Reference Report

AEM with Java streams

Problem statement:

How to use java streams in AEM? Can I use streams for iterating and resources?

Requirement:

Use Java streams to iterate child nodes, validating and resources and API’s.

Introduction:

There are a lot of benefits to using streams in Java, such as the ability to write functions at a more abstract level which can reduce code bugs, compact functions into fewer and more readable lines of code, and the ease they offer for parallelization

  • Streams have a strong affinity with functions
  • Streams encourage less mutability
  • Streams encourage looser coupling
  • Streams can succinctly express quite sophisticated behavior
  • Streams provide scope for future efficiency gains

Java Objects:

This class consists of static utility methods for operating on objects. These utilities include null-safe or null-tolerant methods for computing the hash code of an object, returning a string for an object, and comparing two objects.

if (Objects.nonNull(resource)) {
  resource.getValueMap().get("myproperty", StringUtils.EMPTY);
}

Java Optional:

Trying using Java Optional util, which is a box type that holds a reference to another object.

Is immutable and non serializable ant there is no public constructor and can only be present or absent

It is created by the of(), ofNullable(), empty() static method.

In the below example Optional resource is created and you can check whether the resource is present and if present then get the valuemap

Optional < Resource > res = Optional.ofNullable(resource);
if (res.isPresent()) {
    res.get().getValueMap().get("myproperty", StringUtils.EMPTY);
}

you can also call stream to get children’s as shown below:

Optional < Resource > res = Optional.ofNullable(resource);
if (res.isPresent()) {
    List < Resource > jam = res.stream().filter(Objects::nonNull).collect(Collectors.toList());
}

Java Stream Support:

Low-level utility methods for creating and manipulating streams. This class is mostly for library writers presenting stream views of data structures; most static stream methods intended for end users are in the various Stream classes.

In the below example we are trying to get a resource iterator to get all the child resources and map the resources to a page and filter using Objects and finally collect the list of pages.

Iterator < Resource > iterator = childResources.getChildren().iterator();
List < Page > pages = StreamSupport.stream(((Iterable < Resource > )() -> iterator).spliterator(), false)
  .map(currentPage.getPageManager()::getContainingPage).filter(Objects::nonNull)
  .collect(Collectors.toList());

We can also Optional utility to get the children resources or empty list to avoid all kinds of null pointer exceptions.

List < Resource > pagesList = Optional.ofNullable(resource.getChild(Teaser.NN_ACTIONS))
  .map(Resource::getChildren)
  .map(Iterable::spliterator)
  .map(s -> StreamSupport.stream(s, false))
  .orElseGet(Stream::empty)
  .collect(Collectors.toList());

We can also adapt the resource to Page API and call the listchilderens to get all the children and using stream support we are going to map the page paths into a list as shown below:

terator < Page > childIterator = childResources.adaptTo(Page.class).listChildren();
StreamSupport.stream(((Iterable < Page > )() -> childIterator).spliterator(), false)
  .filter(Objects::nonNull)
  .map(childPage -> childPage.getPath())
  .collect(Collectors.toList());

Does this works only on resource and page API?

No, we can also use Content Fragment and other API’s as well for example in the below code we are trying to iterate contentfragment and get all the variations of the contentfragment.

Optional < ContentFragment > contentFragment = Optional.ofNullable(resource.adaptTo(ContentFragment.class));
Iterator < VariationDef > versionIterator = contentFragment.get().listAllVariations();
List < String > variationsList = StreamSupport.stream(((Iterable < VariationDef > )() -> versionIterator).spliterator(), false)
  .filter(Objects::nonNull)
  .map(cfVariation -> cfVariation.getTitle())
  .collect(Collectors.toList());

You can also learn more about other tricks and techniques of Java Streams:

AEM Query builder using Java streams