1. Data Product Architectures. Benjamin Bengfort (@bbengfort), District Data Labs
2. Abstract
3. What is data science? Or what is the goal of data science? Or why do they pay us so much?
4. Two Objectives Orient Data Science to Users
5. Data Products are self-adapting, broadly applicable software-based engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
6. Data Products are Applications that Employ Many Machine Learning Models
7. Data Report
8. Without Feedback, Models are Disconnected. They cannot adapt, tune, or react.
9. Data Products aren’t single models. So how do we architect data products?
10. The Lambda Architecture
11. Three Case Studies
12. Analyst Architecture
13. Analyst Architecture: Document Review
14. Analyst Architecture: Triggers
15. Recommender Architecture
16. Recommender: Annotation Service
17. Partisan Discourse Architecture
18. Partisan Discourse: Adding Documents
19. Partisan Discourse: Documents
20. Partisan Discourse: User Specific Models
21. Commonalities?
22. Microservices Architecture: Smart Endpoints, Dumb Pipe. [Diagram: services connected over HTTP; Stateful Services; Database Backed Services]
23. Django Application Model
24. 
Class Based, Definitional Programming

    from django.db import models
    from rest_framework import serializers as rf
    from rest_framework import viewsets

    class Instance(models.Model):
        # Django expects choices as (stored value, display label) pairs
        SHAPES = (
            ('square', 'square'),
            ('triangle', 'triangle'),
            ('circle', 'circle'),
        )

        # max_length is required on CharField; 32 is an illustrative value
        color = models.CharField(max_length=32, default='red')
        shape = models.CharField(max_length=32, choices=SHAPES)
        amount = models.IntegerField()

    class InstanceSerializer(rf.ModelSerializer):
        prediction = rf.CharField(read_only=True)

        class Meta:
            model = Instance
            # Explicitly declared fields must also be listed here
            fields = ('color', 'shape', 'amount', 'prediction')

    class InstanceViewSet(viewsets.ModelViewSet):
        queryset = Instance.objects.all()
        serializer_class = InstanceSerializer

        # Stubs marking the actions a viewset can override;
        # ModelViewSet already provides working defaults for each.
        def list(self, request): pass
        def create(self, request): pass
        def retrieve(self, request, pk=None): pass
        def update(self, request, pk=None): pass
        def destroy(self, request, pk=None): pass

25. Features and Instances as Star Schema
26. REST API Feature Interaction
27. Model (ML) Build Process: Export Instance Table

    COPY (
        SELECT instances.*
        FROM instances
        JOIN feature ON feature.id = instances.id
        ...
        ORDER BY instances.created
        LIMIT 10000
    ) TO '/tmp/instances.csv' DELIMITER ',' CSV HEADER;

28. Model (ML) Build Process: Build Model

    import pandas as pd

    from sklearn.svm import SVC
    # Note: sklearn.cross_validation was removed in scikit-learn 0.20;
    # newer releases provide KFold in sklearn.model_selection.
    from sklearn.cross_validation import KFold

    # Load Data
    data = pd.read_csv('/tm