Websites often need tasks that run periodically, behind the scenes. Examples include sending email reminders, aggregating denormalized data, and permanently deleting archived records. Very often the simplest solution is to set up a cron job to hit a URL on the site that performs the task.

Cron has the advantage of simplicity, but it's not ideal for the job. You have to take steps to ensure that regular users of the site cannot hit those URLs directly. It also forces you to manage an external configuration. What if you forget to perform the configuration on the QA or production servers? It would be safer and easier if the configuration lived in the code for the site.

For Django sites, Celery seems to be the solution of choice. Celery is really focused on being a distributed task queue, but it can also be a great scheduler. Its documentation is excellent, but I found that it lacks a quickstart guide for getting Django and Celery working together just to replace cron.

Note: Celery typically runs with RabbitMQ as the back-end. For simple task scheduling, this may be overkill. This guide starts out using Kombu's Django transport, which stores the message queue in the database Django is already using.

  1. Install django-celery
    sudo pip install django-celery
    
  2. Edit settings.py, and add the celery config info
    INSTALLED_APPS = (
        ...
        'kombu.transport.django',
        'djcelery',
    )
    
    BROKER_URL = "django://" # tell kombu to use the Django database as the message queue
    
    import djcelery
    djcelery.setup_loader()
    
  3. Add the new tables to the Django database
    ./manage.py syncdb
    
  4. Create a file named tasks.py in your app (at the same level as models.py)
    from celery.task.schedules import crontab
    from celery.decorators import periodic_task
    
    # this will run every minute, see http://celeryproject.org/docs/reference/celery.task.schedules.html#celery.task.schedules.crontab
    @periodic_task(run_every=crontab(hour="*", minute="*", day_of_week="*"))
    def test():
        print "firing test task"
    
  5. Start the celery daemon in "beat" mode, which is required for scheduling
    sudo ./manage.py celeryd -v 2 -B -s celery -E -l INFO
    

At this point, you should see your celery tasks in the console output, and you should see the task firing every minute.

[2012-03-02 09:34:49,170: WARNING/MainProcess]

 -------------- celery@chase-VirtualBox v2.5.1
---- **** -----
--- * ***  * -- [Configuration]
-- * - **** ---   . broker:      django://localhost//
- ** ----------   . loader:      djcelery.loaders.DjangoLoader
- ** ----------   . logfile:     [stderr]@INFO
- ** ----------   . concurrency: 1
- ** ----------   . events:      ON
- *** --- * ---   . beat:        ON
-- ******* ----
--- ***** ----- [Queues]
 --------------   . celery:      exchange:celery (direct) binding:celery


[Tasks]
  . myapp.tasks.test

[2012-03-02 09:34:49,236: INFO/PoolWorker-2] child process calling self.run()
[2012-03-02 09:34:49,239: WARNING/MainProcess] celery@chase-VirtualBox has started.
[2012-03-02 09:34:49,245: INFO/Beat] child process calling self.run()
[2012-03-02 09:34:49,249: INFO/Beat] Celerybeat: Starting...
[2012-03-02 09:34:49,283: INFO/Beat] Scheduler: Sending due task myapp.tasks.test
[2012-03-02 09:34:54,654: INFO/MainProcess] Got task from broker: myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492]
[2012-03-02 09:34:54,666: WARNING/PoolWorker-2] firing test task
[2012-03-02 09:34:54,667: INFO/MainProcess] Task myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] succeeded in 0.00423407554626s: None
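Once the test task is firing, you can swap it for something closer to a real job. Here is a sketch using the same decorator and crontab helper from step 4; the task name and schedule are made up for illustration.

```python
from celery.task.schedules import crontab
from celery.decorators import periodic_task

# run once a day at 4:30 AM (delete_archived_records is a hypothetical example)
@periodic_task(run_every=crontab(hour=4, minute=30))
def delete_archived_records():
    # permanently delete records that were archived long ago
    pass
```

The crontab helper accepts the same kinds of expressions as a crontab entry, so most existing cron schedules translate directly.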

If you want, you can upgrade to RabbitMQ. Just make sure to update your settings.py as well.
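For example, the broker config from step 2 would give way to something like the following in settings.py. The host, vhost, and credentials below are placeholders, not values from this guide; substitute your own RabbitMQ details.

```python
# settings.py -- hypothetical RabbitMQ broker URL; swap in your own
# host, port, vhost, and credentials. You can also drop
# 'kombu.transport.django' from INSTALLED_APPS, since the
# database-backed queue is no longer needed.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
```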

You may also want to run celeryd as a service.

Update 3/1/2012: updated instructions for Kombu. Tested on Python 2.7.2 and Django 1.3.0 in a clean environment.