Data Pipeline Evolution
How do you build a big data pipeline when the size of your data starts to get out of hand? How do you improve on the initial model when people are demanding more data, faster? This talk covers the evolution of a data pipeline in Python, from daily full load to up-to-the-minute event stream, based on our experience at FanDuel. Technologies covered include Amazon EMR, Redshift, Hadoop, Luigi, Spark, and Kinesis. We also look at the challenges and trade-offs involved, and at building a big data engineering team.
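The shift from daily full load to event stream can be sketched in plain Python. This is a conceptual illustration of the two pipeline styles the talk contrasts, not code from the talk: a batch job rebuilds its aggregates from the whole dataset once a day, while a stream consumer updates them incrementally as each event arrives.

```python
from collections import Counter


def daily_full_load(all_events):
    """Batch style: rebuild aggregates from scratch over the entire dataset."""
    return Counter(event["user"] for event in all_events)


class StreamConsumer:
    """Streaming style: maintain aggregates incrementally, one event at a time."""

    def __init__(self):
        self.counts = Counter()

    def handle(self, event):
        self.counts[event["user"]] += 1


events = [{"user": "a"}, {"user": "b"}, {"user": "a"}]

# Batch: process everything in one pass.
batch = daily_full_load(events)

# Streaming: results are current after every event, not once a day.
consumer = StreamConsumer()
for e in events:
    consumer.handle(e)

assert batch == consumer.counts == Counter({"a": 2, "b": 1})
```

Both styles reach the same totals; the trade-off is freshness versus the simplicity of reprocessing everything from scratch.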
I’m a data engineer at FanDuel, one of Scotland’s two ‘unicorn’ companies, where I work with big data technologies, mostly in Python. My previous experience covers the public and private sectors, contracting, startups, and open source. I’ve also taught adults to code and given talks at international programming conferences.