Comparing Stream-Based, Page.listChildren, and Query Builder Methods for Listing AEM Children Pages

Problem Statement:

What is the best way to list all the children in AEM?

Stream-based vs. Page.listChildren vs. Query Builder

Introduction:

Sling Query is a resource traversal library recommended for content traversal in AEM. Traversing with listChildren(), getChildren(), or the Resource API is generally preferable to writing JCR queries, because querying can be more costly than traversal. Sling Query is not a replacement for JCR queries, but when traversal has to check multiple levels down, Sling Query is recommended because it evaluates results lazily.

JCR queries should be used sparingly in AEM production environments because of performance concerns. They are suitable for end-user searches and structured content retrieval, but should not be used for rendering requests such as navigation or content counts.

How can I get all the child pages in AEM using a JCR query?

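import com.day.cq.search.PredicateGroup;
import com.day.cq.search.Query;
import com.day.cq.search.QueryBuilder;
import com.day.cq.search.result.Hit;
import com.day.cq.search.result.SearchResult;
import java.util.*;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import org.apache.sling.api.resource.ResourceResolver;

List<String> queryList = new ArrayList<>();
Map<String, String> map = new HashMap<>();
map.put("path", resource.getPath()); // search below the parent page
map.put("type", "cq:Page");          // match page nodes; cq:PageContent would return .../jcr:content paths
map.put("p.limit", "-1");            // -1 = return all hits instead of the default page size

Session session = resolver.adaptTo(Session.class);
Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);
SearchResult result = query.getResult();
// On some AEM versions the hits are backed by an internal resource resolver
// that is never closed; grab it from the first hit and close it when done.
ResourceResolver leakingResourceResolverReference = null;
try {
    for (final Hit hit : result.getHits()) {
        if (leakingResourceResolverReference == null) {
            leakingResourceResolverReference = hit.getResource().getResourceResolver();
        }
        queryList.add(hit.getPath());
    }
} catch (RepositoryException e) {
    log.error("Error collecting child page search results", e);
} finally {
    if (leakingResourceResolverReference != null) {
        leakingResourceResolverReference.close();
    }
}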
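When tuning a predicate map like this, it can help to see it in the plain key=value form that the Query Builder debugger accepts (commonly found at /libs/cq/search/content/querydebug.html on an author instance; treat that path as an assumption for your AEM version). The sketch below is plain Java with no AEM dependencies; the class name PredicateDump and the /content/mysite/en path are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PredicateDump {

    // Renders a Query Builder predicate map as key=value lines,
    // one predicate per line, in the map's iteration order.
    static String toPredicateText(Map<String, String> predicates) {
        return predicates.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is predictable.
        Map<String, String> map = new LinkedHashMap<>();
        map.put("path", "/content/mysite/en"); // hypothetical site root
        map.put("type", "cq:Page");            // page nodes, not jcr:content
        map.put("p.limit", "-1");              // return every hit
        System.out.println(toPredicateText(map));
        // prints:
        // path=/content/mysite/en
        // type=cq:Page
        // p.limit=-1
    }
}
```

Pasting that output into the debugger is a quick way to check the hit count and the generated JCR query before wiring the map into a component.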

However, a JCR query consumes more resources than traversal.

AEM recommends using Page.listChildren() because it is less complex:

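import com.day.cq.wcm.api.Page;
import com.day.cq.wcm.api.PageFilter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

List<String> pageList = new ArrayList<>();
Page page = resource.adaptTo(Page.class);
if (page != null) { // adaptTo returns null when the resource is not a page
    // deep = true walks all descendant pages, not only direct children.
    // new PageFilter() skips invalid and hidden pages; pass null to keep them.
    Iterator<Page> childIterator = page.listChildren(new PageFilter(), true);
    childIterator.forEachRemaining(child -> pageList.add(child.getPath()));
}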

However, it sometimes misses results, and it is slower than the Java stream-based approach.
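One possible cause of missing results is the filter argument: the no-argument PageFilter constructor excludes invalid pages and pages hidden from navigation. The self-contained model below imitates the documented listChildren(filter, deep) contract with a hypothetical FakePage class (not the real com.day.cq.wcm.api.Page), so you can see how the filter and the deep flag change the result. Whether the real implementation keeps descending below pages the filter rejects is an assumption of this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class ListChildrenModel {

    // Hypothetical stand-in for an AEM page: a path, a "hidden" flag and child pages.
    static final class FakePage {
        final String path;
        final boolean hidden;
        final List<FakePage> children;

        FakePage(String path, boolean hidden, FakePage... children) {
            this.path = path;
            this.hidden = hidden;
            this.children = Arrays.asList(children);
        }
    }

    // deep = false yields direct children, deep = true yields all descendants,
    // and a null filter accepts every page. Assumption: descendants of
    // rejected pages are still visited.
    static Iterator<FakePage> listChildren(FakePage page, Predicate<FakePage> filter, boolean deep) {
        List<FakePage> out = new ArrayList<>();
        collect(page, filter, deep, out);
        return out.iterator();
    }

    private static void collect(FakePage page, Predicate<FakePage> filter, boolean deep, List<FakePage> out) {
        for (FakePage child : page.children) {
            if (filter == null || filter.test(child)) {
                out.add(child);
            }
            if (deep) {
                collect(child, filter, true, out);
            }
        }
    }

    // Drains an iterator of pages into a list of their paths.
    static List<String> paths(Iterator<FakePage> pages) {
        List<String> result = new ArrayList<>();
        pages.forEachRemaining(p -> result.add(p.path));
        return result;
    }

    public static void main(String[] args) {
        FakePage root = new FakePage("/content/site", false,
                new FakePage("/content/site/a", false,
                        new FakePage("/content/site/a/a1", false)),
                new FakePage("/content/site/b", true)); // hidden page

        // Rough analogue of new PageFilter(): skip hidden pages.
        Predicate<FakePage> visibleOnly = p -> !p.hidden;

        System.out.println(paths(listChildren(root, visibleOnly, true)));
        // prints [/content/site/a, /content/site/a/a1]
        System.out.println(paths(listChildren(root, null, true)));
        // prints [/content/site/a, /content/site/a/a1, /content/site/b]
        System.out.println(paths(listChildren(root, null, false)));
        // prints [/content/site/a, /content/site/b]
    }
}
```

If the stream-based traversal returns pages that listChildren() does not, comparing it against a null filter first helps separate filtering effects from genuine inconsistencies.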

How about Java streams?

A stream-based traversal over the Sling Resource API iterates and executes faster and consumes very few resources:

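import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import org.apache.jackrabbit.JcrConstants;
import org.apache.sling.api.resource.Resource;
import org.jetbrains.annotations.NotNull;

// Collect the paths of every page found under (and including) the root resource.
List<String> streamList = traverse(resource)
        .map(Resource::getPath)
        .collect(Collectors.toList());

// Depth-first walk: emit the current resource if it is a page,
// then recurse into every child that should be followed.
private Stream<Resource> traverse(@NotNull Resource resourceRoot) {
    Stream<Resource> children = StreamSupport.stream(resourceRoot.getChildren().spliterator(), false)
            .filter(this::shouldFollow);
    return Stream.concat(
            shouldInclude(resourceRoot) ? Stream.of(resourceRoot) : Stream.empty(),
            children.flatMap(this::traverse)
    );
}

// Do not descend into jcr:content: it holds the page's content, not child pages.
protected boolean shouldFollow(@NotNull Resource resource) {
    return !JcrConstants.JCR_CONTENT.equals(resource.getName());
}

// Treat any resource with a jcr:content child as a page.
// Note: this includes the root resource itself, unlike listChildren().
protected boolean shouldInclude(@NotNull Resource resource) {
    return resource.getChild(JcrConstants.JCR_CONTENT) != null;
}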

I came across this logic while debugging the out-of-the-box Apache Sling Sitemap generator: https://github.com/apache/sling-org-apache-sling-sitemap
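To see how the traverse(), shouldFollow() and shouldInclude() logic behaves without an AEM instance, here is a self-contained model of the same recursion over a hypothetical in-memory Node class (not the Sling Resource API). It shows that the root page itself appears in the output, and that a folder without jcr:content is not listed but is still walked, so pages below it are found.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamTraversalModel {

    // Hypothetical stand-in for a Sling Resource: a path plus children.
    static final class Node {
        final String path;
        final List<Node> children;

        Node(String path, Node... children) {
            this.path = path;
            this.children = Arrays.asList(children);
        }

        String name() {
            return path.substring(path.lastIndexOf('/') + 1);
        }

        Node child(String name) {
            return children.stream().filter(c -> c.name().equals(name)).findFirst().orElse(null);
        }
    }

    // Same shape as traverse(): emit the node if it is a page, then recurse.
    static Stream<Node> traverse(Node root) {
        Stream<Node> children = root.children.stream().filter(StreamTraversalModel::shouldFollow);
        return Stream.concat(
                shouldInclude(root) ? Stream.of(root) : Stream.empty(),
                children.flatMap(StreamTraversalModel::traverse));
    }

    // Never descend into jcr:content.
    static boolean shouldFollow(Node node) {
        return !"jcr:content".equals(node.name());
    }

    // A node counts as a page when it has a jcr:content child.
    static boolean shouldInclude(Node node) {
        return node.child("jcr:content") != null;
    }

    public static void main(String[] args) {
        Node tree = new Node("/content/site",
                new Node("/content/site/jcr:content"),
                new Node("/content/site/en",
                        new Node("/content/site/en/jcr:content"),
                        new Node("/content/site/en/about",
                                new Node("/content/site/en/about/jcr:content"))),
                new Node("/content/site/folder", // no jcr:content: not a page
                        new Node("/content/site/folder/page",
                                new Node("/content/site/folder/page/jcr:content"))));

        List<String> pages = traverse(tree).map(n -> n.path).collect(Collectors.toList());
        System.out.println(pages);
        // prints [/content/site, /content/site/en, /content/site/en/about, /content/site/folder/page]
    }
}
```

Because the root is included, subtract one from the stream-based count when comparing it with listChildren(), which returns descendants only.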

Results comparison

The stream-based approach took just 3 milliseconds, faster than both Page.listChildren and the query.
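To reproduce a timing comparison like this on your own content, a minimal harness along these lines could wrap each approach in a Supplier. TimingHarness, Timed and the summing workload are hypothetical placeholders; a single run after a short warm-up is only a rough indicator, and a benchmark framework such as JMH gives more trustworthy numbers.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.LongStream;

public class TimingHarness {

    // One timed run: the value produced plus the elapsed wall-clock time.
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    // Calls the supplier warmupRuns times (results discarded) so the JIT can
    // compile it, then times one more call with the monotonic System.nanoTime().
    static <T> Timed<T> time(Supplier<T> work, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            work.get();
        }
        long start = System.nanoTime();
        T value = work.get();
        long elapsedNanos = System.nanoTime() - start;
        return new Timed<>(value, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }

    public static void main(String[] args) {
        // Hypothetical workload; replace it with the query, listChildren or
        // stream traversal code wrapped in a Supplier<List<String>>.
        Timed<Long> run = time(() -> LongStream.range(0, 1_000_000).sum(), 5);
        System.out.println("result " + run.value + " in " + run.millis + " ms");
    }
}
```

Timing all three approaches against the same parent page, with the same warm-up, keeps the comparison fair.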

7 thoughts on “Comparing Stream-Based, Page.listChildren, and Query Builder Methods for Listing AEM Children Pages”

  1. Not entirely correct.
    Page.listChildren just lists pages (and only pages) as children, while resource.listChildren lists all child resources, including the mandatory jcr:content child node, any potential ACL nodes/resources, Sling folders, etc. So their semantics are quite different.

    (And the performance difference just comes from these additional checks.)

    1. Hi Jorg,
      I had a similar requirement in my current project. Per the Page API documentation, setting deep to true gets all the children, but with more than 3,000 or 4,000 child pages the results are not consistent.
      By contrast, the Sling sitemap generator uses streams to generate a sitemap: https://github.com/apache/sling-org-apache-sling-sitemap/blob/master/src/main/java/org/apache/sling/sitemap/spi/generator/ResourceTreeSitemapGenerator.java

      1. What do you mean by “results are not consistent”?

        In the end, all this code uses node.getNodes() to iterate through all child nodes. Some implementations add more features (like removing implementation details such as rep:acl nodes), but the majority of the executed code is identical.

      2. But that is caused by the fact that the content is different per environment, and order matters.
        I would call it “non-consistent behavior” if you ran the same code on the same environment (with the same unchanged content) and got different results.

  2. java.util.Iterator<Page> listChildren(Filter<Page> filter, boolean deep)
    Returns an iterator over descendant resources that adapt to a page and are included in the given filter. Other child resources are skipped.
    Parameters:
    filter – for iteration. May be null.
    deep – false traverses only children; true traverses all descendants.
    Returns:
    iterator of child pages

    The Page documentation says that if deep is set to true, it gets all the children. I also thought the Sling sitemap generator team would use this API, but I was wrong: they used streams.
