Websites often need tasks that run periodically, behind the scenes. Examples include sending email reminders, aggregating denormalized data and permanently deleting archived records. Very often the simplest solution is to setup a cron job to hit a URL on the site that performs the task.
Cron has the advantage of simplicity, but it's not not ideal for the job. You have to take steps to ensure that regular users of the site cannot hit those URLs directly. It also forces you to manage an external configuration. What if you forget to perform the configuration on the qa or production servers? It would be safer and easier if the configuration was in the code for the site.
For Django sites, celery seems to be the solution of choice. Celery is really focused on being a distributed task queue, but it can also be a great scheduler. Their documentation is excellent, but I found that they lack a quickstart guide for getting started with Django and celery, just for replacing cron.
Note: Celery typically runs with RabbitMQ as the back-end. For just task scheduling, this may be overkill. This guide starts out using kombu, which is backed by the database Django is already using.
sudo pip install django-celery
INSTALLED_APPS = ( ... 'kombu.transport.django', 'djcelery', ) BROKER_URL = "django://" # tell kombu to use the Django database as the message queue import djcelery djcelery.setup_loader()
from celery.task.schedules import crontab from celery.decorators import periodic_task # this will run every minute, see http://celeryproject.org/docs/reference/celery.task.schedules.html#celery.task.schedules.crontab @periodic_task(run_every=crontab(hour="*", minute="*", day_of_week="*")) def test(): print "firing test task"
sudo ./manage.py celeryd -v 2 -B -s celery -E -l INFO
At this point, you should see your celery tasks in the console output, and you should see the task firing every minute.
[2012-03-02 09:34:49,170: WARNING/MainProcess] -------------- celery@chase-VirtualBox v2.5.1 ---- **** ----- --- * *** * -- [Configuration] -- * - **** --- . broker: django://localhost// - ** ---------- . loader: djcelery.loaders.DjangoLoader - ** ---------- . logfile: [stderr]@INFO - ** ---------- . concurrency: 1 - ** ---------- . events: ON - *** --- * --- . beat: ON -- ******* ---- --- ***** ----- [Queues] -------------- . celery: exchange:celery (direct) binding:celery [Tasks] . myapp.tasks.test [2012-03-02 09:34:49,236: INFO/PoolWorker-2] child process calling self.run() [2012-03-02 09:34:49,239: WARNING/MainProcess] celery@chase-VirtualBox has started. [2012-03-02 09:34:49,245: INFO/Beat] child process calling self.run() [2012-03-02 09:34:49,249: INFO/Beat] Celerybeat: Starting... [2012-03-02 09:34:49,283: INFO/Beat] Scheduler: Sending due task myapp.tasks.test [2012-03-02 09:34:54,654: INFO/MainProcess] Got task from broker: myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] [2012-03-02 09:34:54,666: WARNING/PoolWorker-2] firing test task [2012-03-02 09:34:54,667: INFO/MainProcess] Task myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] succeeded in 0.00423407554626s: None
You may also want to run celeryd as a service.
Update 3/1/2012: updated instructions Kombu. Tested on Python 2.7.2 and Django 1.3.0 in a clean environment.