AEM Bulk Replication allows you to activate a series of pages and/or assets.
How can we make sure bulk replication won’t impact AEM performance (CPU or Heap memory) / throttle the system?
Introduction:
AEM bulk replication or activation is performed on a series of pages and/or assets. We usually perform bulk replication on tree structures or lists of paths.
Use cases:
MSM (Multi Site Manager) – rolling out a series of pages or a site
Editable template – add/remove components on the template structure and activate existing pages
Bulk asset ingestion into the system
Bulk redirect/vanity path update
ACS Commons Throttled Task Runner is built on the Java management API for managing and monitoring the Java VM, and it can pause or terminate tasks based on those stats.
Throttled Task Runner (a managed thread pool) provides a convenient way to run many AEM-specific activities in bulk. It checks the Throttled Task Runner bean and gets the current running stats of the actual work being done.
OSGi Configuration:
The Throttled Task Runner is OSGi configurable, but please note that changing configuration while work is being processed results in resetting the worker pool and can lose active work.
Throttled Task Runner OSGi
Max threads: Recommended not to exceed the number of CPU cores. Default 4.
Max CPU %: Used to throttle activity when CPU exceeds this amount. Range is 0..1; -1 means disable this check.
Max Heap %: Used to throttle activity when heap usage exceeds this amount. Range is 0..1; -1 means disable this check.
Cooldown time: Time to wait for CPU/MEM cooldown between throttle checks (in milliseconds)
Watchdog time: Maximum time allowed (in ms) per action before it is interrupted forcefully.
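The threshold settings above can be illustrated with a minimal sketch. It is not the ACS Commons implementation; it only shows, using the standard java.lang.management API, how a 0..1 limit might be compared against heap usage and how a value of -1 disables the check. The class name ThrottleCheck and the 0.85 limit are hypothetical, and a CPU check would follow the same pattern.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Sketch of a threshold check in the spirit of the Max CPU % / Max Heap % settings. */
class ThrottleCheck {

    /** Returns true when usage exceeds the limit; a negative limit disables the check. */
    static boolean exceeds(double usage, double limit) {
        return limit >= 0 && usage > limit;
    }

    /** Current heap usage as a fraction (0..1) of the maximum heap, or 0 if no maximum is defined. */
    static double heapUsage() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no maximum
        return max > 0 ? (double) heap.getUsed() / max : 0.0;
    }

    public static void main(String[] args) {
        double maxHeap = 0.85; // hypothetical value for "Max Heap %"
        double usage = heapUsage();
        System.out.println("Heap usage: " + usage);
        System.out.println("Throttle? " + exceeds(usage, maxHeap));
    }
}
```

Keeping the comparison in a small pure method makes the "-1 means disabled" rule easy to test separately from the JVM readings.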
JMX MBeans
Throttled Task Runner MBean
This is the core worker pool. All action managers share the same task runner pool, at least in the current implementation. The task runner can be paused or halted entirely, throwing out any unfinished work.
Throttled task runner JMX
How to use ACS Commons throttled task runner
Add the following dependency to your pom
Create a custom service or servlet as shown below:
Throttled Replication
Inside the custom workflow process method, check for low CPU and low memory before starting your task to avoid a performance impact on the system.
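The idea of waiting for low load before starting work can be sketched in plain Java. This is an illustration only: LowLoadGate, its method, and the simulated CPU readings are hypothetical stand-ins for whatever check the Throttled Task Runner performs, and the cooldown parameter mirrors the "Cooldown time" setting.

```java
import java.util.function.DoubleSupplier;

/** Sketch of a gate that blocks until a load reading drops to or below a limit. */
class LowLoadGate {

    /**
     * Polls the reading, sleeping cooldownMs between checks.
     * Returns the number of cooldown pauses taken. A negative limit disables the check.
     */
    static int waitForLowLoad(DoubleSupplier reading, double limit, long cooldownMs)
            throws InterruptedException {
        int pauses = 0;
        while (limit >= 0 && reading.getAsDouble() > limit) {
            Thread.sleep(cooldownMs);
            pauses++;
        }
        return pauses;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated CPU readings: two busy samples, then an idle one.
        double[] samples = {0.95, 0.90, 0.40};
        int[] index = {0};
        DoubleSupplier cpu = () -> samples[Math.min(index[0]++, samples.length - 1)];

        int pauses = waitForLowLoad(cpu, 0.75, 10);
        System.out.println("Started work after " + pauses + " cooldown pauses");
    }
}
```

In a real workflow process step, the reading would come from the JVM or from the task runner rather than from a fixed array.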
AEM Workflows allow you to automate a series of steps that are performed on (one or more) pages and/or assets.
How can we make sure workflow won’t impact AEM performance (CPU or Heap memory) / throttle the system?
Introduction:
AEM Workflows allow you to automate a series of steps that are performed on (one or more) pages and/or assets.
For example, when publishing, an editor has to review the content – before a site administrator activates the page. A workflow that automates this example notifies each participant when it is time to perform their required work:
The author applies the workflow to the page.
The editor receives a work item that indicates that they are required to review the page content. When finished, they indicate that their work item is complete.
The site administrator then receives a work item that requests the activation of the page. When finished, they indicate that their work item is complete.
Typically:
Content authors apply workflows to pages as well as participate in workflows.
The workflows that you use are specific to the business processes of your organization.
ACS Commons Throttled Task Runner is built on the Java management API for managing and monitoring the Java VM, and it can pause or terminate tasks based on those stats.
Throttled Task Runner (a managed thread pool) provides a convenient way to run many AEM-specific activities in bulk. It checks the Throttled Task Runner bean and gets the current running stats of the actual work being done.
OSGi Configuration:
The Throttled Task Runner is OSGi configurable, but please note that changing configuration while work is being processed results in resetting the worker pool and can lose active work.
Throttled task runner OSGi
Max threads: Recommended not to exceed the number of CPU cores. Default 4.
Max CPU %: Used to throttle activity when CPU exceeds this amount. Range is 0..1; -1 means disable this check.
Max Heap %: Used to throttle activity when heap usage exceeds this amount. Range is 0..1; -1 means disable this check.
Cooldown time: Time to wait for CPU/MEM cooldown between throttle checks (in milliseconds)
Watchdog time: Maximum time allowed (in ms) per action before it is interrupted forcefully.
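To make the "Watchdog time" setting concrete, here is a minimal sketch of a watchdog built with the standard java.util.concurrent API. It is an assumption about the general technique (run an action, interrupt it if it overruns), not the ACS Commons code; WatchdogSketch and runWithWatchdog are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class WatchdogSketch {

    /** Runs the action; returns true if it finished within watchdogMs, false if it was interrupted. */
    static boolean runWithWatchdog(Runnable action, long watchdogMs) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = pool.submit(action);
            try {
                future.get(watchdogMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException tooSlow) {
                future.cancel(true); // interrupts the worker thread
                return false;
            } catch (ExecutionException failed) {
                throw new IllegalStateException("Action failed", failed.getCause());
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable quick = () -> { };
        Runnable slow = () -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop when the watchdog fires
            }
        };
        System.out.println("quick finished: " + runWithWatchdog(quick, 1_000));
        System.out.println("slow finished: " + runWithWatchdog(slow, 100));
    }
}
```

Interruption is cooperative in Java, so an action only stops early if it reacts to the interrupt, as the slow example does.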
JMX MBeans
Throttled Task Runner MBean
This is the core worker pool. All action managers share the same task runner pool, at least in the current implementation. The task runner can be paused or halted entirely, throwing out any unfinished work.
AEM Schedulers are commonly used to run bulk tasks during off-hours (non-business hours). Some tasks also run periodically to fetch results that are then cached and served to the front end.
How can we make sure bulk tasks won’t impact AEM performance (CPU or Heap memory) / throttle the system?
Introduction:
In software, scheduling is the practice of planning the execution of a task at a certain point in time. It can be broadly classified into two types:
1. Scheduled task – executing once at a particular future point in time
2. Recurring task – repeating periodically at a fixed interval
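The two scheduling types can be sketched with the JDK's ScheduledExecutorService. This is plain Java rather than the AEM/Sling scheduler API, and the class name SchedulingTypes and the intervals are hypothetical; it only shows the difference between a one-off delayed task and a fixed-rate recurring task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SchedulingTypes {

    /** Schedules a one-off task and a recurring task; returns true if both ran as expected in time. */
    static boolean demo(int repeats, long intervalMs) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        CountDownLatch once = new CountDownLatch(1);
        CountDownLatch recurring = new CountDownLatch(repeats);
        Runnable oneOff = once::countDown;
        Runnable tick = recurring::countDown;
        try {
            // Type 1: runs a single time after the given delay.
            scheduler.schedule(oneOff, intervalMs, TimeUnit.MILLISECONDS);
            // Type 2: runs immediately, then again every intervalMs.
            scheduler.scheduleAtFixedRate(tick, 0, intervalMs, TimeUnit.MILLISECONDS);
            long timeoutMs = intervalMs * (repeats + 10L);
            return once.await(timeoutMs, TimeUnit.MILLISECONDS)
                    && recurring.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            scheduler.shutdownNow(); // stop the recurring task
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Both task types ran: " + demo(3, 50));
    }
}
```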
Use cases for the scheduler:
1. Sitemap generation
2. Syncing product data from AEM Commerce
3. Fetch DB content and place it in the repository to be picked up by frontend and in turn cached in Dispatcher
4. Fetch Stats or reports and place them in the repository to be picked up by frontend and in turn cached in Dispatcher
ACS Commons Throttled Task Runner is built on the Java management API for managing and monitoring the Java VM, and it can pause or terminate tasks based on those stats.
Throttled Task Runner (a managed thread pool) provides a convenient way to run many AEM-specific activities in bulk. It checks the Throttled Task Runner bean and gets the current running stats of the actual work being done.
OSGi Configuration:
The Throttled Task Runner is OSGi configurable, but please note that changing configuration while work is being processed results in resetting the worker pool and can lose active work.
Throttled task runner OSGi
Max threads: Recommended not to exceed the number of CPU cores. Default 4.
Max CPU %: Used to throttle activity when CPU exceeds this amount. Range is 0..1; -1 means disable this check.
Max Heap %: Used to throttle activity when heap usage exceeds this amount. Range is 0..1; -1 means disable this check.
Cooldown time: Time to wait for CPU/MEM cooldown between throttle checks (in milliseconds)
Watchdog time: Maximum time allowed (in ms) per action before it is interrupted forcefully.
JMX MBeans
Throttled Task Runner MBean
This is the core worker pool. All action managers share the same task runner pool, at least in the current implementation. The task runner can be paused or halted entirely, throwing out any unfinished work.
Retry posting the data a set number of times until the response is 200
Retry any Oak operation when an exception occurs
Introduction:
AEM does not ship a retry utility that can rerun a particular operation whenever an exception occurs.
If we perform multiple transactions on the AEM repository, especially on the same node (updating properties or references, for example), the Oak operation may throw exceptions saying that the operation is blocked by another operation or that the modification is invalid.
If we connect to external services through a REST API and the connection fails or times out, there is no out-of-the-box option to retry the connection.
Create the retry utilities as shown below:
Retry on Exception:
package com.test.utils;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RetryOnException {

    private static final Logger log = LoggerFactory.getLogger(RetryOnException.class);

    /** Operation to retry; it signals failure by throwing. */
    public interface CallToRetry {
        void process() throws Exception;
    }

    /** Runs the call up to maxTimes, waiting intervalWait ms after each failure; rethrows the last failure. */
    public static boolean withRetry(int maxTimes, long intervalWait, CallToRetry call) throws Exception {
        if (maxTimes <= 0) {
            throw new IllegalArgumentException("Must run at least one time");
        }
        if (intervalWait <= 0) {
            throw new IllegalArgumentException("Initial wait must be at least 1");
        }
        Exception thrown = null;
        for (int counter = 0; counter < maxTimes; counter++) {
            try {
                call.process();
                return true;
            } catch (Exception e) {
                thrown = e;
                log.info("Encountered failure on {} due to {}, attempt retry {} of {}", call.getClass().getName(), e.getMessage(), (counter + 1), maxTimes, e);
            }
            try {
                Thread.sleep(intervalWait);
            } catch (InterruptedException wakeAndAbort) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                break;
            }
        }
        throw thrown;
    }
}
The above util can be used in any code as shown below. The retry happens only when an exception occurs during the operation.
package com.test.utils;

import java.util.concurrent.atomic.AtomicInteger;

public class ExampleOne {

    public static void main(String[] args) {
        AtomicInteger atomicCounter = new AtomicInteger(0);
        try {
            // Fails on the first two attempts and succeeds on the third.
            RetryOnException.withRetry(3, 500, () -> {
                if (atomicCounter.getAndIncrement() < 2) {
                    System.out.println("Retrying count with Exception " + atomicCounter.get());
                    throw new Exception("Throwing New Exception to test");
                } else {
                    System.out.println("Retrying count without Exception " + atomicCounter.get());
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Exception Result
Retry on the condition:
package com.test.utils;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RetryOnCondition {

    private static final Logger log = LoggerFactory.getLogger(RetryOnCondition.class);

    /** Operation to retry; it returns true once the condition is met. */
    public interface CallToRetry {
        boolean process() throws Exception;
    }

    /** Runs the call up to maxTimes until it returns true, waiting intervalWait ms between attempts. */
    public static boolean withRetry(int maxTimes, long intervalWait, CallToRetry call) throws Exception {
        if (maxTimes <= 0) {
            throw new IllegalArgumentException("Must run at least one time");
        }
        if (intervalWait <= 0) {
            throw new IllegalArgumentException("Initial wait must be at least 1");
        }
        Exception thrown = null;
        for (int counter = 0; counter < maxTimes; counter++) {
            try {
                boolean status = call.process();
                if (status) {
                    return true;
                }
            } catch (Exception e) {
                thrown = e;
                log.info("Encountered failure on {} due to {}, attempt retry {} of {}", call.getClass().getName(), e.getMessage(), (counter + 1), maxTimes, e);
            }
            try {
                Thread.sleep(intervalWait);
            } catch (InterruptedException wakeAndAbort) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                break;
            }
        }
        if (thrown == null) {
            // The condition was never met, but no attempt threw; avoid throwing null.
            return false;
        }
        throw thrown;
    }
}
The above Util can be used to retry based on the condition like if the connection is successful or the response code is 200
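As an illustration of retrying on a condition such as a 200 response, here is a self-contained sketch. It does not use the RetryOnCondition class above (to avoid the slf4j dependency); RetryUntilOk is a hypothetical, simplified stand-in that does not catch exceptions, and the response codes are simulated rather than fetched from a real endpoint.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.Callable;

class RetryUntilOk {

    /** Calls the check up to maxTimes, pausing intervalWait ms between attempts; true once it returns true. */
    static boolean withRetry(int maxTimes, long intervalWait, Callable<Boolean> check) throws Exception {
        for (int attempt = 1; attempt <= maxTimes; attempt++) {
            if (check.call()) {
                return true;
            }
            if (attempt < maxTimes) {
                Thread.sleep(intervalWait); // no need to wait after the final attempt
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Simulated response codes from a hypothetical endpoint: two failures, then success.
        Iterator<Integer> codes = Arrays.asList(503, 500, 200).iterator();
        boolean ok = withRetry(3, 10, () -> codes.next() == 200);
        System.out.println("Succeeded: " + ok);
    }
}
```

In real code the lambda would perform the HTTP call and compare the actual status code.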
How do we create a perfect package in AEM? Whenever we create a package in AEM and provide the content paths in the package filter, we get only the content pages. But what about images and referenced pages? What about experience fragment pages or the ContextHub variations related to XF pages?
Introduction:
MCP (Manage Controlled Processes) is both a dashboard for performing complex tasks and a rich API for defining these tasks as process definitions. In addition to kicking off new processes, users can also monitor running tasks, retrieve information about completed tasks, halt work, and so on.
Add the following maven dependency to your pom to extend MCP
In order to create a perfect content package in AEM, please create an MCP as shown below:
Create Process Definition factory – PackageCreatorFactory
This class tells ACS Commons MCP which process definition to pick up. The getName method returns the process name, and the createProcessDefinitionInstance method returns the implementation class, as shown below:
How to access the CRX package manager in PROD or AEM as a Cloud Service?
Use cases:
Copy the latest content package from PROD to lower environments or local for debugging purposes
Install the content package on PROD
Continue the PROD deployment when Cloud Manager (CM) has an outage mid-deployment
Introduction:
Packages enable the importing and exporting of repository content. For example, you can use packages to install new functionality, transfer content between instances, and back up repository content.
A package is a zip file holding repository content in the form of a file-system serialization (called “vault” serialization). This provides an easy-to-use-and-edit representation of files and folders.
Packages include content, both page content and project-related content, selected using filters.
A package also contains vault meta information, including the filter definitions and import configuration information. Additional content properties (that are not used for package extraction) can be included in the package, such as a description, a visual image, or an icon; these properties are for the content package consumer and for informational purposes only.
In order to access packages in AEM:
MCP (Manage Controlled Processes) is both a dashboard for performing complex tasks and a rich API for defining these tasks as process definitions. In addition to kicking off new processes, users can also monitor running tasks, retrieve information about completed tasks, halt work, and so on.
Add the following maven dependency to your pom to extend MCP
Create Process Definition factory – PackageHandlerFactory
This class tells ACS Commons MCP which process definition to pick up. The getName method returns the process name, and the createProcessDefinitionInstance method returns the implementation class, as shown below:
AEM environment size is increasing because of user-generated packages
Requirement:
Can we purge all the user-generated packages to improve stability?
Introduction:
A package is a zip file holding repository content in the form of a file-system serialization (called “vault” serialization). This provides an easy-to-use-and-edit representation of files and folders.
Packages include content, both page content and project-related content, selected using filters.
A package also contains vault meta information, including the filter definitions and import configuration information. Additional content properties (that are not used for package extraction) can be included in the package, such as a description, a visual image, or an icon; these properties are for the content package consumer and for informational purposes only.
Packages created by developers, AEM content syncs, and even code deployments keep piling up in CRX Package Manager, consuming megabytes and sometimes gigabytes of space.
The more packages there are, the longer crx/packmgr takes to load.
Hence you can create a scheduler that runs during off-hours to clean up the packages, which reclaims the space and avoids extra maintenance tasks.
The scheduler below removes all the packages in the my_packages group; we can add business logic to handle other groups.
package com.mysite.core.schedulers;

import org.osgi.service.metatype.annotations.AttributeDefinition;
import org.osgi.service.metatype.annotations.AttributeType;
import org.osgi.service.metatype.annotations.ObjectClassDefinition;

@ObjectClassDefinition(name = "Old Packages Purge Scheduler", description = "Remove old packages from different paths")
public @interface PurgeOldPackagesSchedulerConfig {

    String DEFAULT_SCHEDULER_EXPRESSION = "0 0 16 ? * SUN *"; // every Sunday 4 PM
    boolean DEFAULT_SCHEDULER_CONCURRENT = false;

    @AttributeDefinition(name = "Enabled", description = "True, if scheduler service is enabled", type = AttributeType.BOOLEAN)
    boolean enabled() default true;

    @AttributeDefinition(name = "Cron expression defining when this Scheduled Service will run", description = "[every minute = 0 * * * * ?], [12:00am daily = 0 0 0 ? * *]", type = AttributeType.STRING)
    String schedulerExpression() default DEFAULT_SCHEDULER_EXPRESSION;

    // Package group folders whose packages are purged
    @AttributeDefinition(name = "package paths", description = "package folder paths", type = AttributeType.STRING)
    String[] packagesPaths() default {"my_packages"};
}
How to bulk add, update, or remove page properties in AEM without using the Groovy console?
Requirement:
Create a reusable process that can be used to search for the pages based on resourceType and do the CRUD operations on the results.
Introduction:
Usually, when we use editable templates, we have some initial content. If we later want to update the experience fragment path or some page properties, we typically run a Groovy script to do the bulk update.
But Adobe Managed Services (AMS) doesn’t install developer tools on PROD, so we need another option. For the above requirement, we can use MCP.
MCP (Manage Controlled Processes) is both a dashboard for performing complex tasks and a rich API for defining these tasks as process definitions. In addition to kicking off new processes, users can also monitor running tasks, retrieve information about completed tasks, halt work, and so on.
Add the following maven dependency to your pom to extend MCP
Create Process Definition factory – PropertyUpdateFactory
This class tells ACS Commons MCP which process definition to pick up. The getName method returns the process name, and the createProcessDefinitionInstance method returns the implementation class, as shown below:
package com.mysite.mcp.process;

import org.osgi.service.component.annotations.Component;

import com.adobe.acs.commons.mcp.ProcessDefinitionFactory;

@Component(service = ProcessDefinitionFactory.class, immediate = true)
public class PropertyUpdateFactory extends ProcessDefinitionFactory<PropertyUpdater> {

    // Name under which the process is listed in the MCP dashboard
    @Override
    public String getName() {
        return "Property Updater";
    }

    // Returns a fresh instance of the process implementation for each run
    @Override
    protected PropertyUpdater createProcessDefinitionInstance() {
        return new PropertyUpdater();
    }
}
Create Process Definition implementation – PropertyUpdater
This is an implementation class where we are defining all the form fields required for the process to run
How to clean up the growing repo? How to safely delete all the unwanted assets and pages?
Requirement:
Find the references of all the assets and pages and clean up unreferenced assets and unwanted pages.
Introduction:
You can think of the process below as an asset and page references report.
Usually, as the repository grows, we rotate and archive logs and run compaction (revision cleanup).
What if we could remove some of the deactivated and unreferenced assets or pages?
How to find references of assets or pages?
Go to the following url https://{domain}/apps/acs-commons/content/manage-controlled-processes.html and click on Start Process and select Renovator process as shown below:
Start Process
Renovator Process
I am trying to check the references of all the assets under the following path:
Source path: /content/dam/wknd/en/activities
Then select any other path as the Destination path: /content/dam/wknd/en/magazine
Please make sure to check the Dry run and Detailed Report checkboxes. If they are not checked, all the assets will be moved to the new folder, i.e., /content/dam/wknd/en/magazine
Process fields selections
Once you start the process, it takes some time to run. You can then click on the process to open the results view or download the Excel report, as shown:
Process result page
View the results popup
Once downloaded delete the following columns:
Remove unwanted columns
You can see that some of the rows have empty references. If you think these assets are no longer required, you can remove them.
Unreferenced rows
How to remove the unreferenced assets or pages safely?
You can run through the above steps on any folder, but please avoid running them on root folders or pages such as /content/dam, /content, or home pages, because that would slow down the servers.
For more information on how to use the renovator process for
Create a package for a list of paths using the ACS commons query package tool
Introduction:
Usually, when we create a content package from a lower or PROD environment, we edit the package and add the list of pages as filters.
Add filters
What about the assets related to those pages? Again, you need to get the list of assets and edit the package.
Is there a way we can create a package using the ACS Commons tool?