It has been nearly five months since I officially started my data science journey (and a month since I officially joined my organization’s Data Science Unit), and it is still daunting to see how far I am from being a full-fledged Data Scientist.
As part of a small team of data scientists, each member is expected to handle a wide spectrum of tasks, from setting up development environments and pushing models to production, to breaking problems down into hypotheses and working through subject-matter-specific equations. This does not surprise me, since data science is a fundamentally broad field to begin with.
Here are some of the topics on my list to learn within the next three months:
- Cross validation theory and implementation
- Basics of text mining and Natural Language Processing, including TF-IDF, doc2vec, lemmatization and stemming
- Learn how to do proper model stacking and ensembling (very important for my job)
- Revisit and sharpen my understanding of time series analysis and forecasting (ARIMA, autocorrelation)
- Learn how to create effective data pipelines
And the list goes on. Truth be told, the scale of it all gives me some anxiety, and it raises the question: when do I learn all of this? On the job? Over the weekends? At night?
Currently, it's all three. My, the journey ahead looks pretty long.