Julia is a Data Scientist at Stack Overflow with a PhD in astrophysics and an abiding love for Jane Austen (which we totally understand!). Before moving into Data Science and discovering R, Julia worked in academia and ed tech, and was a NASA Datanaut. She enjoys making beautiful charts, programming in R, text mining, and communicating about technical topics with diverse audiences. In fact, she loves R and text mining so much that she literally wrote the book on it: Text Mining with R: A Tidy Approach!
Lovely to speak to you, Julia. Could you give us a bit of background on the work that you do?
The open source work I do focuses on building a bridge between the tidyverse ecosystem of tools and the real-world text data that so many of us need to use in our organizations, so we can use powerful, well-designed tidy tools with text data. In my day job, I work at Stack Overflow, using statistics and machine learning to make our site the best place for people who code to learn and share knowledge online, and to help our clients who want to engage with developers be successful.
What led to your career path?
My academic background is in physics and astronomy, where I was an observational astronomer who spent my time “in the trenches” with real-life data. Also, I’ve been heavily involved in education in various forms for a long time, whether speaking, teaching, writing, or otherwise. All of this together informs how I do data science, because a huge part of what I do is communicate with people about what a complex data analysis means. The fact that I analyze some dataset or train some machine learning model is great, but if I can’t explain it to my business partners, then we can’t make decisions.
Could you tell us what to expect from the content of your talk? And what key advice or tips will delegates come away with?
Many R users working in fields from healthcare to finance to tech deal with messy text data (this includes me at Stack Overflow!); my talk focuses on a practical, flexible approach to using this text data to gain insight and make better decisions.
Can you give an example?
Folks at EARL can expect my talk to start with the fundamentals of exploratory data analysis for text. EDA is a fruitful and important part of the data science process, and in my own work, I know how much bang for the buck I get when I am deliberate about EDA strategies. We won’t stop there, though! We will also cover how to use tidy data principles for supervised and unsupervised machine learning for text.
What inspired you to write your book Text Mining with R: A Tidy Approach?
The book that my collaborator Dave and I wrote together grew organically out of the work we were doing in this space. We started by developing long-form documentation for our R package, invested more time in laying out best practices in workflows through blog posts, and eventually brought a book’s worth of content together in one cohesive, organized place.
Tell us about the type of work you get involved with on a day to day basis.
In my day job at Stack Overflow, I work on two main categories of questions. The first is centered on the ways that we directly generate revenue, through partnering with clients who want to hire, engage with, and enable the world’s developers. The second (which is of course connected to the first) is centered on the public Q&A community of Stack Overflow and the other Stack Exchange sites; I work on questions around how technologies are related to each other and changing, how to scaffold question askers to success, and how to make Stack Overflow more welcoming and inclusive.
What work do you do with the wider data science community and how do you see it evolving?
In my open source work, I maintain my own R packages, blog and speak about data analysis practices, and share resources about data science and tech via social media. I have some ideas for new work I am excited about pursuing soon! I would love to evolve my data science work to more fully support best practices in machine learning for text. Another area that I want to continue to invest energy in, both in my day job and community work, is moving data science and tech toward more just and inclusive practices.
Come and see Julia and be inspired by her love for text mining and tidyverse applications at EARL Seattle on 7th November. We are really looking forward to the conference programme in Seattle, Houston and Boston.