Mar 04 2017
 


If you’ve created any forms using the Django web framework, you should already be familiar with Django’s CSRF middleware and the protection it provides websites against cross-site request forgery attacks. When the middleware is active, and unless the view has this protection overridden, any POSTed form is expected to contain a hidden field named csrfmiddlewaretoken, whose value must match the token stored in the CSRF cookie attached to the user. Because this value is specific to a user and also changes constantly, it is difficult to test the output of pages with forms against what is expected. What follows is the solution I am using in Django 1.10.

If you are doing Test-Driven Development (TDD), or a less formal kind of testing of your Django project, then one of the things you will likely want to do is test the output of a view function against the output you expect.

If CSRF protection weren’t an issue, you’d probably have a unit test that looks like:

from django.test import TestCase
from django.http import HttpRequest
from django.template.loader import render_to_string

from myapp.views import form_page

class FormPageTest(TestCase):

    def test_form_page_returns_correct_html(self):
        request = HttpRequest()
        response = form_page(request)
        expected_html = render_to_string('form.html', request=request)
        self.assertEqual(response.content.decode(), expected_html)

For the purpose of this example, form_page is the view function located in the views module of the Django app myapp, and form.html is the template used to generate the response. The template contains the form with the {% csrf_token %} template tag, which inserts the csrfmiddlewaretoken hidden field into the form. It looks something like:

<html>
⋮  
  <form method="POST">
    <input type="text" name="a_form_input">
    ⋮
    {% csrf_token %}
  </form>
⋮
</html>

Prior to Django 1.10, the test above would have succeeded. This is because in earlier versions of Django, the CSRF token remained static during the user’s session. In order to protect against BREACH attacks, this behavior changed in Django 1.10. Now the token value placed in the form changes on every request.

The test code above fails because, although a single request object is generated in the line request = HttpRequest(), the token value rendered by the call to the view, response = form_page(request), is not reused afterwards. So even though we pass the same request as an argument to the render_to_string function, the HTML it generates will have a different value for the csrfmiddlewaretoken hidden field than the one in the response from the view.
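To see how two renderings for the same user can legitimately differ, here is a simplified, standalone sketch of the salting idea behind the BREACH protection. It is not Django's actual code: the alphabet, the function names mask_secret and unmask_token, and the sample secret are all assumptions made for illustration.

```python
import secrets
import string

# Alphabet assumed for this illustration only.
ALPHABET = string.ascii_letters + string.digits


def mask_secret(secret):
    """Return salt + masked secret; a fresh random salt is drawn on every call."""
    salt = ''.join(secrets.choice(ALPHABET) for _ in secret)
    masked = ''.join(
        ALPHABET[(ALPHABET.index(s) + ALPHABET.index(k)) % len(ALPHABET)]
        for s, k in zip(secret, salt)
    )
    return salt + masked


def unmask_token(token):
    """Split the token into salt and masked halves and recover the secret."""
    half = len(token) // 2
    salt, masked = token[:half], token[half:]
    return ''.join(
        ALPHABET[(ALPHABET.index(m) - ALPHABET.index(k)) % len(ALPHABET)]
        for m, k in zip(masked, salt)
    )


# Hypothetical 32-character secret that stays fixed for the user.
secret = 'abc123XYZ0abc123XYZ0abc123XYZ0ab'

view_token = mask_secret(secret)      # what the view's response might contain
template_token = mask_secret(secret)  # what a second rendering might contain
```

Both tokens unmask to the same secret, so a server could validate either one, yet their text differs, which is why a plain string comparison of the two HTML documents fails.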

Since we can’t keep the CSRF token unchanged for our testing, and because we can never predict its value, the solution is to remove the hidden field entirely from the HTML we are comparing. To do this, we need a function that takes an HTML string as an argument and strips out the csrfmiddlewaretoken hidden field, returning the rest of the HTML untouched.

Both this solution and the following code were inspired by this StackOverflow answer: http://stackoverflow.com/a/39859042. Instead of an object method like the code that inspired it, I have chosen to implement it as a top-level function in the test module.

The function to strip out the csrfmiddlewaretoken hidden field is:

import re

def remove_csrf(html_code):
    # Match the whole <input ...> tag that contains csrfmiddlewaretoken and drop it.
    csrf_regex = r'<input[^>]+csrfmiddlewaretoken[^>]+>'
    return re.sub(csrf_regex, '', html_code)

In short, we define a regular expression that matches the csrfmiddlewaretoken hidden field and use the sub function provided by Python’s regular expression module, re, to remove it.
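As a quick check outside of Django, the function can be tried on two hand-written HTML strings that differ only in the token value. The markup and the token values below are made up for illustration; they only approximate what a real rendering of the form would contain.

```python
import re


def remove_csrf(html_code):
    csrf_regex = r'<input[^>]+csrfmiddlewaretoken[^>]+>'
    return re.sub(csrf_regex, '', html_code)


# Two hypothetical renderings of the same form; only the token value differs.
view_html = (
    '<form method="POST">'
    '<input type="text" name="a_form_input">'
    "<input type='hidden' name='csrfmiddlewaretoken' value='token-from-view' />"
    '</form>'
)
template_html = view_html.replace('token-from-view', 'token-from-template')

stripped_view = remove_csrf(view_html)
stripped_template = remove_csrf(template_html)

# The raw strings differ, but the stripped versions are identical.
print(view_html == template_html)
print(stripped_view == stripped_template)
```

Because [^>]+ cannot cross a closing angle bracket, the ordinary text input is left alone and only the tag that mentions csrfmiddlewaretoken is removed.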

We now apply the remove_csrf function to both of the HTML strings we are comparing:

    def test_form_page_returns_correct_html(self):
        request = HttpRequest()
        response = form_page(request)
        expected_html = render_to_string('form.html', request=request)
        self.assertEqual(remove_csrf(response.content.decode()), remove_csrf(expected_html))

For completeness, our original example test module now looks like:

from django.test import TestCase
from django.http import HttpRequest
from django.template.loader import render_to_string

import re

from myapp.views import form_page

def remove_csrf(html_code):
    csrf_regex = r'<input[^>]+csrfmiddlewaretoken[^>]+>'
    return re.sub(csrf_regex, '', html_code)

class FormPageTest(TestCase):

    def test_form_page_returns_correct_html(self):
        request = HttpRequest()
        response = form_page(request)
        expected_html = render_to_string('form.html', request=request)
        self.assertEqual(remove_csrf(response.content.decode()), remove_csrf(expected_html))

After searching far and wide, this is the solution I found, but if you’ve got a better one, please share it in the comments below.
