Problem statement:
An AEM instance can slow down because of pending or stuck Sling jobs, and for various reasons the queue may not be cleared even after the system is restarted. This article suggests ways to clear all the Sling jobs safely. How can you clear all the Sling jobs safely?
Introduction:
AEM slows down and takes a long time to restart because Sling jobs remain pending or stuck due to various factors:
- Shutdown during active job processing
- The queue is stuck because some write operations are blocked (high CPU / indexing / high memory)
- Other process threads are blocked
- Processing corrupt content, or a corrupt segment store or blob store
Even after a restart, you can check whether jobs are still pending by visiting the following AEM Tools health check URL: http://localhost:4502/libs/granite/operations/content/healthreports/healthreport.html/system/sling/monitoring/mbeans/org/apache/sling/healthcheck/HealthCheck/slingJobs

If your replication agent is broken due to a pending or stuck queue, but the connection is working fine without a timeout, the usual recommendation is to create a new agent, point it at a healthy publisher (one that doesn’t have a pending queue), and disable the existing agent. But does this resolve the queued jobs? No.
Is there a way to check:
- Which Sling jobs are broken?
- Which Sling jobs are active?
- Whether succeeded jobs can be cleaned up?
Yes. You can follow this article from Adobe, which usually asks you to purge the job nodes by going into the content explorer and deleting them recursively: https://helpx.adobe.com/experience-manager/kb/AEM-restart-takes-a-long-time-due-to-sling-jobs-AEM.html
Is there any other way to stop jobs safely, instead of wiping out important Sling job history?
Solution:
AEM comes with multiple tools out of the box (OOTB). To access them, navigate to the Tools section and select the appropriate area for the operation you want to perform.
For example:
- AEM operations
- Managing templates
- Cloud configurations
- ACS Commons tools etc.
Tools are an essential part of AEM. They avoid any dependency on Groovy scripts or other ad hoc scripts, and they can be managed or extended at any time.
Creating a tool from scratch can be time-consuming, but generating one as described at https://kiransg.com/2022/11/24/aem-tool-create-generate-tool-from-scratch/ can streamline the process. Generating the tool, “Clear Out Excess Sling Jobs,” produces the model class, servlet, and other boilerplate code.

Generate a table of all the Sling jobs and their states, based on all the topics registered in AEM, and provide a search filter to find a specific Sling job.
Use the Discovery service to get all the job topics:
// Collect the job topics enabled on every instance in the topology.
TopologyView topology = discoveryService.getTopology();
Set<String> enabledTopics = new TreeSet<>();
for (InstanceDescription instance : topology.getInstances()) {
    // expandCSV is a helper (not shown here) that splits the comma-separated property.
    // Use addAll: a plain assignment would keep only the last instance's topics.
    enabledTopics.addAll(expandCSV(instance.getProperty("org.apache.sling.event.jobs.consumer.topics")));
}
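The snippet above calls an expandCSV helper that is not shown. A minimal sketch of what such a helper could look like follows. The class name TopicCsv and the null handling are assumptions made for illustration, not the code from the repository.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; only the method name expandCSV comes from the article.
final class TopicCsv {
    private TopicCsv() {}

    /** Splits a comma-separated topic list into a sorted set; null or blank input yields an empty set. */
    static Set<String> expandCSV(String csv) {
        Set<String> topics = new TreeSet<>();
        if (csv == null) {
            return topics; // assumption: an instance without the property contributes no topics
        }
        for (String token : csv.split(",")) {
            String topic = token.trim();
            if (!topic.isEmpty()) {
                topics.add(topic);
            }
        }
        return topics;
    }
}
```

Returning an empty set rather than null keeps the addAll call in the loop safe for instances that do not publish the property.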
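Use the Agent Manager to get all the agents configured in AEM and derive the replication job topic of each one:
// Each replication agent has its own job topic, built from a fixed prefix and the agent ID.
Map<String, Agent> agents = this.agentManager.getAgents();
Set<String> keys = agents.keySet().stream().map(key -> "com/day/cq/replication/job/" + key).collect(Collectors.toSet());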
Use the Job Manager service to find all the jobs based on type and topic:
Collection<Job> activeJobs = jobManager.findJobs(JobManager.QueryType.ACTIVE, topic, 100, (Map<String, Object>[]) null);
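To show how the per-topic lookups could feed the table and its search filter, here is a rough sketch. It assumes a hypothetical JobState enum (a subset named after JobManager.QueryType values), a hypothetical TopicRow class, and a jobCounter function that stands in for calling jobManager.findJobs(...) and taking the size of the result. None of these names come from the original code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.ToIntBiFunction;

/** Hypothetical subset of job states, named after JobManager.QueryType values. */
enum JobState { ACTIVE, QUEUED, SUCCEEDED, ERROR }

/** Hypothetical table row: one topic and its job count per state. */
final class TopicRow {
    final String topic;
    final Map<JobState, Integer> counts = new EnumMap<>(JobState.class);

    TopicRow(String topic) {
        this.topic = topic;
    }
}

final class JobTable {
    private JobTable() {}

    /** Builds one row per topic; jobCounter stands in for a findJobs(...).size() lookup. */
    static List<TopicRow> build(Collection<String> topics, ToIntBiFunction<String, JobState> jobCounter) {
        List<TopicRow> rows = new ArrayList<>();
        for (String topic : topics) {
            TopicRow row = new TopicRow(topic);
            for (JobState state : JobState.values()) {
                row.counts.put(state, jobCounter.applyAsInt(topic, state));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Case-insensitive substring filter on the topic name; a null or blank term keeps every row. */
    static List<TopicRow> filter(List<TopicRow> rows, String term) {
        String needle = term == null ? "" : term.trim().toLowerCase(Locale.ROOT);
        List<TopicRow> matches = new ArrayList<>();
        for (TopicRow row : rows) {
            if (row.topic.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(row);
            }
        }
        return matches;
    }
}
```

In a real Sling model the jobCounter would wrap the Job Manager call shown above, and the limit passed to findJobs (100 in that call) would cap each count.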
Once the Sling model is deployed, you will see a dashboard like this, listing the jobs by topic with all their states and actions:

Create a servlet that accepts topics. To clear the jobs safely, stop each job and then remove it, based on its job ID:
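The stop-then-remove step might be sketched as follows. JobControl is a hypothetical stand-in interface so that the example compiles without Sling; its two methods mirror stopJobById and removeJobById on the Sling JobManager. JobCleaner and its return value are likewise assumptions, not the servlet from the repository.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for the two JobManager calls used here. */
interface JobControl {
    void stopJobById(String jobId);

    boolean removeJobById(String jobId);
}

final class JobCleaner {
    private JobCleaner() {}

    /** Stops and then removes each job; returns the IDs that could not be removed. */
    static List<String> clear(JobControl jobs, Collection<String> jobIds) {
        List<String> failed = new ArrayList<>();
        for (String id : jobIds) {
            jobs.stopJobById(id);          // ask a running job to stop before touching it
            if (!jobs.removeJobById(id)) { // false signals that the job could not be removed
                failed.add(id);
            }
        }
        return failed;
    }
}
```

In a servlet, the job IDs would come from a findJobs lookup for each requested topic (via Job.getId()), and the list of failed IDs could be returned to the dashboard so that the user knows which jobs are still present.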

After removing the jobs, refreshing the table should show the updated content. However, the queued jobs count sometimes does not change. Restarting the server fixes this, but it’s not recommended.
The working code is available on GitHub at: https://github.com/kiransg89/aemoperations