chemicals in bottles

http://kb.blackboard.com/display/KB/The PurgeAccumulator task

What is the PurgeAccumulator

Hit statistics for Blackboard are recorded in the ACTIVITY_ACCUMULATOR table, which gets very large over time. The PurgeAccumulator job is designed to copy this data over to another database schema intended purely for reporting purposes, and to maintain (purge) such hit data from the production schema after it has aged out. As a side effect it also calculates some summary data for the day’s activity.

How is PurgeAccumulator run?

This is one of the Scheduled tasks scheduled from config/bb-tasks.xml via the following stanza:
  <task-entry key="bb.stats.purging" version="60">
    <task classname="blackboard.platform.tracking.PurgeApplicationTask">
      <property name="delay" value="21600000" />
      <property name="period" value="86400000" />
      <property name="xml.registered.delay" value="1:00" />
      <property name="xml.registered.period" value="24" />
      <property name="days_to_keep" value="180" />
      <property name="target" value="live" />
      <property name="dev_null" value="/dev/null" />
      <property name="command-line" value="/usr/local/blackboard/tools/admin/PurgeAccumulator.sh" />
    </task>
  </task-entry>
The meaning of these parameters in bb-tasks.xml is:
  • delay = milliseconds after service startup before starting this task
  • period = milliseconds between task invocations
  • xml.registered.delay = time of day (24-hour clock) when task will run (but not earlier than delay parameter specifies, i.e. if at least 6 hours after JVM startup haven’t gone by, it won’t run at this hour)
  • xml.registered.period = how many hours to wait before trying to run the task again (overrides period parameter)
  • days_to_keep = number of days of stats activity to keep in production (bb_bb60) schema as well as in reporting (bb_bb60_stats) schema (parameter specific to this task only)
  • target = whether to purge live or stats data – should always be “live” in bb-tasks.xml (parameter specific to this task only)
Unless commented out in this file, each appserver and each tomcat node will try running this task, but only one of them should run per day, so this is much better scheduled manually (see below).

Manual Invocations

The manual invocation syntax for PurgeAccumulator is:
   blackboard/tools/admin/PurgeAccumulator(.sh|.bat) purge-live BBLEARN days_to_keep
   blackboard/tools/admin/PurgeAccumulator(.sh|.bat) purge-stats BBLEARN cutoff_date|days_to_keep
The first of these is the normal PurgeAccumulator run scheduled nightly, the second purges historical tracking data from the stats DB if you find it too old and stale. That second method is only used during manual invocations of the tool. Normally the task is scheduled from the config/bb-tasks.xml task scheduler configuration file as (Unix syntax):
   /usr/local/blackboard/tools/admin/PurgeAccumulator.sh purge-live BBLEARN 180 > /dev/null
The second parameter to the tool (“bb_uid”) is always BBLEARN, except for institutions running multiple virtual installations. The third parameter (here “180”) is the number of days of stats to keep in the BBLEARN (production) database schema. (The synchronization step run as part of purge-live will always sync the BBLEARN_STATS schema up to the day of running the tool, this is not controllable.) One can optionally enter a specific cutoff date in YYYY-MM-DD format instead, meaning “purge everything older than this date”. The tool can also be invoked manually with a “purge-stats” option instead of “purge-live”; this deletes old records from the BBLEARN_STATS reporting schema irrevocably when you don’t need them anymore. When invoked with this option, the tool does not do any synchronizing (it does not run the daily_system_tracking or sync_… procedures). For better distinction from the purge-live method his option is designed to take a cutoff date parameter (purge everything older than this date), but a days_to_keep value can also be used instead. Example:
   /usr/local/blackboard/tools/admin/PurgeAccumulator.sh purge-stats BBLEARN 2007-12-30
To schedule PurgeAccumulator externally via the Windows Task Scheduler rather than running it as a background task from bb-tasks.xml, first make sure the task is commented out in bb-tasks.xml via regular HTML-style comment tags, then schedule the blackboard\tools\admin\PurgeAccumulator.bat script with these same options in Task Scheduler for 1:00 am every day. On Unix you obviously also need to comment out the task in bb-tasks.xml. You can then either schedule the above command in bbuser’s crontab (preferred, but requires bbuser to be among the users allowed to use cron), or use an “su” construct to schedule it in the root crontab file (the example here schedules it at 1:00 am, the normal time when PurgeAccumulator would run from bb-tasks.xml as well):
0 1 * * * su - bbuser -c "/usr/local/blackboard/tools/admin/PurgeAccumulator.sh purge-live BBLEARN 180"

How does the PurgeAccumulator work?

The PurgeAccumulator task executes a number of stored procedures while logged into the database as the BB_BB60_STATS user.
  • As first part of this task the daily_system_tracking procedure is executed in the BB_BB60 schema, which calculates today’s overall system statistics and inserts them into the system_tracking table. If this procedure is not executing, the General System Statistics overview in the GUI shows zeroes, the easiest way to determine from the GUI that there was a problem with PurgeAccumulator.
  • Afterwards it copies the most recent hit data from the production database into the stats DB via stored procedure BBLEARN_stats.sync_activity_accumulator. On Oracle, this procedure works in batches of 2000 records since 6.3SP1 (and would often fail to complete in 6.2).
  • Then it drops all data from all other tables in the BB_BB60_STATS schema (by truncating the tables) and copies the current data from the production DB (BBLEARN schema) over (of course, in case of VIs, this works similarly for the BB_VINAME and BB_VINAME_STATS schemata). The tables are: application, course_main, course_roles, course_users, data_source, institution_roles, navigation_item, system_roles, system_tracking, users, user_roles. This of course generates a lot of redo/transaction logging, since some of these tables are quite large.
  • Finally, the purge_accumulator procedure is executed in the BBLEARN (production) schema, with the days_to_keep value as only parameter. This deletes all hit data older than days_to_keep days from the production DB. It works in batches of 2000 records at a time.

Verifying PurgeAccumulator operation

The PurgeAccumulator logs to logs/update-tools/update-tool-log.txt on the appserver that ran it, so you can see each night whether it ran and where. It logs each step of its operation – and if it skips synching with the stats database due to licensing that is correct on a Basic system. To verify whether the tool ran correctly and completely, log into the database as BBLEARN_STATS user and run the following queries:
--Most of these single selects will be extremely fast accesses to index endpoints, but if you put them together into one query,
--they will be very slow, needing a full table scan of this huge table, so please don't ever do that!
SELECT MIN(PK1) FROM BBLEARN.ACTIVITY_ACCUMULATOR;
SELECT MAX(PK1) FROM BBLEARN.ACTIVITY_ACCUMULATOR;
SELECT MIN(TIMESTAMP) FROM BBLEARN.ACTIVITY_ACCUMULATOR;
SELECT MAX(TIMESTAMP) FROM BBLEARN.ACTIVITY_ACCUMULATOR;
SELECT COUNT(*) FROM BBLEAN.ACTIVITY_ACCUMULATOR;
--on Oracle, a faster alternative (assuming the table is analyzed regularly) is:
--SELECT NUM_ROWS,LAST_ANALYZED FROM ALL_TABLES WHERE TABLE_NAME='ACTIVITY_ACCUMULATOR' AND OWNER='BBLEARN';
--***section boundary***
SELECT MIN(PK1) FROM BBLEARN_STATS.ACTIVITY_ACCUMULATOR;
SELECT MAX(PK1) FROM BBLEARN_STATS.ACTIVITY_ACCUMULATOR;
SELECT MIN(TIMESTAMP) FROM BBLEARN_STATS.ACTIVITY_ACCUMULATOR;
SELECT MAX(TIMESTAMP) FROM BBLEAN_STATS.ACTIVITY_ACCUMULATOR;
SELECT COUNT(*) FROM BBLEAN_STATS.ACTIVITY_ACCUMULATOR;
--on Oracle, a faster alternative (assuming the table is analyzed regularly) is:
--SELECT NUM_ROWS,LAST_ANALYZED FROM ALL_TABLES WHERE TABLE_NAME='ACTIVITY_ACCUMULATOR' AND OWNER='BBLEARN_STATS';
Under correct operation the first section (production DB) should show the oldest record (min(timestamp)) having a date from 180 days ago (or whatever number of days to keep you have configured), and the newest record in the second section (stats DB) should be from today, meaning the sync has completed. If the sync finished but the purge failed, you would still have records until today in the stats DB, but would have too-old records in the production DB. As of Release 9 and Release 8, Service Pack 5 statistics are no longer directly recorded to the ACTIVITY_ACCUMULATOR table to improve performance. Instead a simple ACTIVITY_ACCUMULATOR_QUEUE table without the indexes required for system report performance is appended to and records are then copied by a SQL Server Agent or Oracle DBA job to the main ACTIVITY_ACCUMULATOR table. If the above values are correct but are missing the latest events for your system, verify that the job is running successfully: Oracle Check that the job is running successfully and is not marked as broken, and verify that the DBA job queue is enabled on your instance.
select job, broken, failures, next_date from dba_jobs where what = 'activity_accumulator_update;';
]]>