tl;dr: Choosing the right technology stack can mean the difference between SUCCESS and FAILURE in data science. It can also mean the difference between ‘I Feel Productive’ and ‘Everything I try takes so much time’. When deciding on a toolkit / stack, be sure to learn from others, consider implementation partners, and always remain cautious.
Note: In addition to this post, I am surveying people to determine which data science technology stacks are being used right now. The survey can be accessed via this link: http://bit.ly/ToolkitSurvey (takes less than 200 seconds!), and responses will remain confidential. Those who complete the survey will get a copy of the report (which will detail the pros/cons of popular data science technology stacks).
The context is that I have only a short time to evaluate the quality of a recently completed data science project. This project has generated an insight that tells us something about our business. From my evaluation, I need to decide whether the business should act upon the insights generated. Acting upon insights will cost time and money. If the initiative fails, which is healthy and happens often, I’m going to need to both justify my actions and learn from the failure. To do this, I need to make sure I act only upon quality analyses. Doing this repeatedly requires me to learn how to sniff out the good and bad projects quickly.
If I need to quickly evaluate the quality of a data science project, I try to answer three simple questions:
- Data Understanding: Has the underlying data supporting the output been correctly understood?
- Insight Value: Are the insights generated valuable to the business?
- Execution: Was the technical execution of the initiative robust?
In my experience, a high-quality data scientist will answer these questions as an output of the project. For a new data scientist (or one without business context), asking questions allows me to quickly reach a conclusion on each.
If you want to be an excellent data scientist, nailing each of the above should be the compass that directs your efforts.
I usually try to answer some or all of the following questions to reach a conclusion. Feel free to suggest ones I’ve missed in the comments below.
- Has the data quality been assessed? What did this exercise look like? Were there any significant concerns?
- For data that is manually captured (at some stage) by humans, has this been accounted for? How?
- Is the data history clear? Has this data been through multiple legacy system migrations? Has the impact of this been determined? How was this dealt with?
- How have missing or null values been addressed? What proportion of such values were there?
- Is the organisation’s data capture process clear? Were there any data items (having looked at the data capture process) that didn’t match their definitions?
- Is the insight actionable (can I act upon this insight)? Are there any legal, regulatory, logistical or other challenges that would prevent me from acting upon this insight?
- Is the insight valuable? If this insight is correct, and I take action based upon it, will those actions lead to better outcomes (e.g. profits, subscribers, success) for the organisation?
- Is the insight testable? Is it possible to verify the conclusions of the project? If not, does this present a major problem?
- Will acting on the insights be cost-effective relative to the potential gains?
- Are there any major risks to acting upon this insight? Will I be opening a Pandora’s box (e.g. significantly changing our relationship with customers)? How do these risks stack up against the potential benefits?
- Has the code quality been tested?
- Is the statistical model / implementation based on reliable assumptions (e.g. homogeneity of time series patterns)?
- Have the model outputs been tested independently of the model (e.g. train/test)?
- Was good data governance maintained throughout the project (so nothing got inadvertently messed up)?
- Are the significant data and business assumptions documented?
- Was testing designed and performed independently of the project delivery?
- What statistical methods were tried and rejected? Why were these rejected?
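Two of the questions above are easy to check by hand if the project team hasn't already reported them: what proportion of values are missing, and whether outputs were evaluated on data held out from model fitting. The sketch below is only an illustration using the Python standard library. The function names (`missing_proportions`, `holdout_split`), the list of missing-value markers, and the sample records are all hypothetical, not taken from any particular project or library.

```python
import random
from typing import Any, Dict, List, Tuple

# Assumption: these markers count as "missing". Adjust to the data set's own conventions.
MISSING_MARKERS = (None, "", "NA", "null")


def missing_proportions(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return, for each column, the share of rows where the value is missing.

    A column absent from a row counts as missing for that row.
    """
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) in MISSING_MARKERS) / total
        for col in columns
    }


def holdout_split(
    rows: List[Dict[str, Any]], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Shuffle the rows reproducibly and split them into (train, test).

    The test portion should never be used while fitting the model.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical sample records, purely for illustration.
    records = [
        {"age": 34, "city": "Leeds"},
        {"age": None, "city": ""},
        {"age": 51, "city": "York"},
        {"age": 29},
    ]
    for column, share in missing_proportions(records).items():
        print(f"{column}: {share:.0%} missing")
    train, test = holdout_split(records, test_fraction=0.25, seed=1)
    print(f"{len(train)} training rows, {len(test)} held-out rows")
```

A random split like this is only appropriate when rows are independent. For time series, a split by date is usually the safer choice, which ties back to the question about modelling assumptions.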