Django analytics middleware
Server-side tracking with Django, Celery and the Google Analytics Measurement Protocol.
Introduction
The most widely used analytics implementation is to include a tracking code in every web page you want to track. Your analytics provider generates the code along with a tracking ID that identifies your application. When a client program (such as a web browser) visits your page, the tracking code runs in the client and sends data about the visit to the provider, tagged with that tracking ID. This is the client-side way of tracking users.
Why server-side?
- Proxy servers and ad-blocking browser plugins like ABP or uBlock usually block access to your analytics provider. Whether analytics gets blocked depends on how sensitive each user is about privacy, so the data you collect is incomplete and not reliable.
- You probably want to give your analytics provider only the data required for your own analysis, not necessarily everything they can gather by running code on your users' machines.
- Your application's clients may not be browsers running JavaScript at all. You may want to gather data about API usage, mobile requests, custom events, etc., or follow your own tracking logic.
Django middleware
Django middleware is a good place to do the tracking because it sits between every HttpRequest and HttpResponse. It is basically a regular Python class with some methods called during the request phase and others called during the response phase. For details about Django middleware, check the documentation: https://docs.djangoproject.com/en/
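To make the two phases concrete, here is a framework-free sketch of the shape Django documents for new-style (1.10+) middleware: a callable built once with `get_response` and then called with each request. `FakeRequest`, `FakeResponse` and `PhaseLoggingMiddleware` are hypothetical names, used so the snippet runs without Django installed.

```python
class FakeRequest:
    """Hypothetical stand-in for django.http.HttpRequest."""

    def __init__(self, path):
        self.path = path


class FakeResponse:
    """Hypothetical stand-in for django.http.HttpResponse."""

    def __init__(self, status_code=200):
        self.status_code = status_code


class PhaseLoggingMiddleware:
    """Same shape as a Django middleware: built once with get_response,
    then called once per request."""

    def __init__(self, get_response):
        self.get_response = get_response
        self.log = []

    def __call__(self, request):
        # Request phase: runs before the view (or the next middleware).
        self.log.append("request phase: " + request.path)
        response = self.get_response(request)
        # Response phase: runs after the view has produced a response.
        self.log.append("response phase: %d" % response.status_code)
        return response


def view(request):
    return FakeResponse(200)


middleware = PhaseLoggingMiddleware(view)
response = middleware(FakeRequest("/blog/"))
print(middleware.log)  # ['request phase: /blog/', 'response phase: 200']
```

Everything before the `get_response(request)` call corresponds to the request phase, everything after it to the response phase; the tracking logic below lives in the response phase.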
Tracking logic, Requirements, Dependencies
So our middleware needs to do the following things:
- Generate a unique tracking id for each user making requests
- Ensure that every request from one user is tracked with the same tracking id
- Exclude some preset web pages from tracking (for example, admin pages or rss pages)
- Exclude error HTTP responses from tracking (track only HTTP status code 200)
- Do the tracking asynchronously (obviously we do not want our users to wait for an external resource)
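The two filtering requirements (track only status 200, skip preset pages) boil down to a small predicate. `should_track` is a hypothetical helper, shown framework-free so it can be tested on its own; the prefix list is just an example value.

```python
def should_track(path, status_code, ignore_prefixes=()):
    """Return True if a response for `path` with `status_code` should be tracked."""
    # Only successful responses are tracked.
    if status_code != 200:
        return False
    # Skip any path that starts with one of the excluded prefixes.
    return not any(path.startswith(prefix) for prefix in ignore_prefixes)


IGNORE = ["/admin/", "/rss/"]
print(should_track("/blog/post-1/", 200, IGNORE))   # True
print(should_track("/admin/login/", 200, IGNORE))   # False
print(should_track("/blog/missing/", 404, IGNORE))  # False
```

Writing the prefixes with a trailing slash keeps "/admin/" from also matching a page such as "/administration/".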
We are going to implement the second requirement (the same tracking id for every request from one user) with a cookie, so another requirement appears. We are required by law to:
- Inform the user that we are using cookies to track them.
The Django messages framework (https://docs.djangoproject.com/en/2.0/ref/contrib/messages/) is a convenient way to implement this notice.
We will also use the Celery infrastructure described in another article to implement the fifth requirement (asynchronous tracking).
For this demonstration we will use Google Analytics as the analytics provider, through its Measurement Protocol.
Of course, we suggest that you use your own analytics infrastructure and remain the owner of your data. You can use both proprietary and open-source products like or . If you use Google Analytics, be aware that your data is retained "forever" and may be used by Google and/or its associates.
Analytics application
Assuming that you already have a project in the current directory, create the new app:
$ python manage.py startapp analytics
Create a file named "tracker.py" in the new app. We will use it as our tracking library:
./analytics/tracker.py
```python
import random

from django.conf import settings

VERSION = settings.ANALYTICS_API_VERSION
COOKIE_NAME = settings.ANALYTICS_COOKIE_NAME
COOKIE_PATH = settings.ANALYTICS_COOKIE_PATH
COOKIE_AGE = settings.ANALYTICS_COOKIE_AGE
ANALYTICS_ID = settings.ANALYTICS_ID


def cookie_exists(request):
    """Return True if the visitor already carries our tracking cookie."""
    return bool(request.COOKIES.get(COOKIE_NAME))


def set_cookie(visitor_id, response):
    """Attach the tracking cookie (the visitor id) to the response."""
    response.set_cookie(
        COOKIE_NAME,
        value=visitor_id,
        max_age=COOKIE_AGE,
        path=COOKIE_PATH,
    )
    return response


def build_params(request, path=None, event=None, referer=None,
                 visitor_id=None, site=None):
    """Build a Measurement Protocol pageview hit for this request."""
    meta = request.META
    referer = referer or request.GET.get('r', '')
    path = path or request.GET.get('p', '/')
    user_agent = meta.get('HTTP_USER_AGENT', 'Unknown')
    visitor_id = visitor_id or request.COOKIES.get(COOKIE_NAME)
    visitor_ip = meta.get('REMOTE_ADDR', '')

    # request.current_page only exists with apps that set it (e.g. django CMS).
    try:
        pagetitle = request.current_page.get_page_title()
    except AttributeError:
        pagetitle = None

    return {
        'v': VERSION,                             # protocol version
        'z': str(random.randint(0, 0x7fffffff)),  # cache buster
        't': 'pageview',                          # hit type
        'dt': pagetitle,                          # document title
        'dh': site,                               # document host name
        'dr': referer,                            # document referrer
        'dp': path,                               # document path
        'tid': ANALYTICS_ID,                      # tracking id
        'cid': visitor_id,                        # client id
        'uip': visitor_ip,                        # IP override
        'ua': user_agent,                         # user agent override
    }
```
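The hit that build_params() returns is a flat dict of Measurement Protocol fields. To see what actually goes over the wire, here is a stdlib sketch (`encode_hit` is a hypothetical helper) that drops None values, such as a missing page title, and form-encodes the rest. As far as we know, requests also skips None values when it form-encodes a dict passed as `data`.

```python
from urllib.parse import urlencode


def encode_hit(params):
    """Drop None values and form-encode the rest as an HTTP POST body."""
    clean = {key: value for key, value in params.items() if value is not None}
    return urlencode(clean)


params = {
    "v": "1",
    "t": "pageview",
    "tid": "UA-XXXXXXXX-X",
    "cid": "35009a79-1a05-49d7-b876-2b884d0f825b",
    "dp": "/blog/post-1/",
    "dt": None,  # no page title available
}
body = encode_hit(params)
print(body)
# v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=35009a79-1a05-49d7-b876-2b884d0f825b&dp=%2Fblog%2Fpost-1%2F
```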
Let's create the Celery task that will submit every pageview to our analytics provider:
./analytics/tasks.py
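```python
import requests
from celery import shared_task  # celery.decorators was removed in Celery 5


@shared_task(ignore_result=True)
def submit_tracking(params, provider_url):
    """POST one hit to the analytics provider from a Celery worker."""
    response = requests.post(provider_url, data=params, timeout=10)
    # Raise on 4xx/5xx so the failure shows up in the worker log.
    response.raise_for_status()
```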
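If you would rather not depend on requests inside the worker, a stdlib-only variant of the HTTP call could look like the sketch below. It is not the article's code: `build_collect_request` and `submit_tracking_stdlib` are hypothetical names, the Celery decorator is left off so the functions can be exercised on their own, and building the request does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PROVIDER_URL = "https://www.google-analytics.com/collect"


def build_collect_request(params, provider_url=PROVIDER_URL):
    """Build (but do not send) a form-encoded POST for the collect endpoint."""
    body = urlencode({k: v for k, v in params.items() if v is not None})
    return Request(
        provider_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit_tracking_stdlib(params, provider_url=PROVIDER_URL, timeout=10):
    """Send the hit; urlopen raises HTTPError on 4xx/5xx, much like raise_for_status()."""
    with urlopen(build_collect_request(params, provider_url), timeout=timeout) as response:
        return response.status
```

Keeping the request construction separate from the network call makes the payload easy to inspect in tests.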
The middleware class contains two methods:
- process_request(), which runs during the request phase, and
- process_response(), which runs during the response phase.
./analytics/middleware.py
```python
import uuid

from django.conf import settings
from django.contrib import messages
from django.contrib.sites.shortcuts import get_current_site
from django.utils.deprecation import MiddlewareMixin

from analytics.tasks import submit_tracking
from analytics.tracker import COOKIE_NAME, build_params, cookie_exists, set_cookie

provider_url = settings.ANALYTICS_PROVIDER_URL


class AnalyticsMiddleware(MiddlewareMixin):
    # MiddlewareMixin lets process_request/process_response
    # work with the MIDDLEWARE setting (Django 1.10+).

    def process_request(self, request):
        if not cookie_exists(request):
            site = get_current_site(request)
            messages.info(
                request,
                '<strong>Welcome!</strong> ' + site.domain +
                ' is using <strong>cookie</strong> technology'
                ' for tracking everything you do.',
                extra_tags='alert-info',
            )
            # Keep one id per session until the cookie has been set.
            request.session.setdefault('visitor_id', str(uuid.uuid4()))
        return None

    def process_response(self, request, response):
        # Track successful responses only, and skip ignored paths.
        if response.status_code != 200:
            return response
        ignore = getattr(settings, 'ANALYTICS_IGNORE_PATH', [])
        if any(request.path.startswith(p) for p in ignore):
            return response

        # A returning visitor may have the cookie but a fresh session.
        visitor_id = (request.session.get('visitor_id')
                      or request.COOKIES.get(COOKIE_NAME))
        if not visitor_id:
            return response  # e.g. the session was flushed during this request
        params = build_params(
            request,
            path=request.path,
            referer=request.META.get('HTTP_REFERER', ''),
            visitor_id=visitor_id,
            site=get_current_site(request).domain,
        )
        response = set_cookie(visitor_id, response)
        submit_tracking.delay(params, provider_url)
        return response
```
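The visitor-id lookup is the subtle part of the middleware: a returning visitor can carry the cookie but arrive with a fresh session, and a new visitor has neither. Here is a pure-Python sketch of a lookup order that covers both cases. `resolve_visitor_id` is a hypothetical helper, and plain dicts stand in for `request.session` and `request.COOKIES`.

```python
import uuid


def resolve_visitor_id(session, cookies, cookie_name="project_stats"):
    """Return a stable visitor id: session first, then cookie, else a new UUID4.

    The id is written back to the session so later requests reuse it.
    """
    visitor_id = session.get("visitor_id") or cookies.get(cookie_name)
    if not visitor_id:
        visitor_id = str(uuid.uuid4())
    session["visitor_id"] = visitor_id
    return visitor_id


# Returning visitor whose session expired: the cookie value wins.
session = {}
print(resolve_visitor_id(session, {"project_stats": "abc"}))  # abc
```

Falling back to the cookie before generating a new id is what keeps a returning visitor on the same client id, instead of overwriting their cookie with an empty or fresh value.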
Now all we have to do is register the middleware class, the new application and a few settings in our project's settings file:
./project/settings.py
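```python
# Use MIDDLEWARE_CLASSES instead on Django versions before 1.10.
# AnalyticsMiddleware must come after the session and message middleware,
# because it uses request.session and the messages framework.
MIDDLEWARE = [
    # ...
    'django.contrib.sessions.middleware.SessionMiddleware',
    # ...
    'django.contrib.messages.middleware.MessageMiddleware',
    'analytics.middleware.AnalyticsMiddleware',
    # ...
]

# ...

INSTALLED_APPS = [
    # ...
    'analytics',
    # ...
]

# ...

# Analytics configuration
ANALYTICS_COOKIE_NAME = 'project_stats'
ANALYTICS_COOKIE_PATH = '/'
ANALYTICS_COOKIE_AGE = 31556926  # 1 year in seconds
ANALYTICS_ID = 'UA-xxxxxxxx-x'
ANALYTICS_API_VERSION = '1'
ANALYTICS_IGNORE_PATH = ['/page1/', '/page2/']
ANALYTICS_PROVIDER_URL = 'https://www.google-analytics.com/collect'
```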
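tracker.py reads these values at import time, so a missing one surfaces as an AttributeError when the module is imported. For a clearer message, a hypothetical check like the one below could run at startup (for example from the app's AppConfig.ready()). `missing_analytics_settings` and `REQUIRED_SETTINGS` are illustrative names, and SimpleNamespace stands in for `django.conf.settings`. ANALYTICS_IGNORE_PATH is left out because the middleware treats it as optional.

```python
from types import SimpleNamespace

REQUIRED_SETTINGS = (
    "ANALYTICS_COOKIE_NAME",
    "ANALYTICS_COOKIE_PATH",
    "ANALYTICS_COOKIE_AGE",
    "ANALYTICS_ID",
    "ANALYTICS_API_VERSION",
    "ANALYTICS_PROVIDER_URL",
)


def missing_analytics_settings(settings):
    """Return the names of required ANALYTICS_* settings that are not defined."""
    return [name for name in REQUIRED_SETTINGS if not hasattr(settings, name)]


fake_settings = SimpleNamespace(ANALYTICS_ID="UA-xxxxxxxx-x", ANALYTICS_API_VERSION="1")
print(missing_analytics_settings(fake_settings))
# ['ANALYTICS_COOKIE_NAME', 'ANALYTICS_COOKIE_PATH', 'ANALYTICS_COOKIE_AGE', 'ANALYTICS_PROVIDER_URL']
```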
To render messages in your base template, you can use something like this:
./project/templates/base.html
```html
{% if messages %}
  {% for message in messages %}
    <div class="alert {{ message.extra_tags }} alert-dismissible" role="alert">
      <button type="button" class="close" data-dismiss="alert" aria-label="Close">
        <span aria-hidden="true">×</span>
      </button>
      {{ message|safe }}
    </div>
  {% endfor %}
{% endif %}
```
Notice that we are using Bootstrap alerts to display our messages to the user. The specific Bootstrap alert class (for example alert-info) is passed through the messages framework as extra_tags. This way we can mark messages as "info", "warning", etc. and render them in the matching color.
Downside
There are some things you have to consider when you are thinking about server-side tracking:
- You will need some extra processing in every request-response cycle, which has a cost in performance.
- In this demo we used a tracking cookie to identify clients, which introduces some caching-related complications. For example, if you want to put Varnish or a CDN service in front of Django, responses that set a cookie generally bypass those caches, so your pages effectively become uncacheable. If you cache with Django's cache framework there is no problem, but it is slower than a front-end caching solution. Of course, client-side tracking (when the user allows it) works well in both of these cases.
- Posted by Kostas Koutsogiannopoulos · Sept. 1, 2016