Editor’s note: Christian Kendall is a data scientist at computer software firm Salford Systems, San Diego.

If you search for skills that data scientists need, you will find communication in the top five or 10, right next to technical skills like Python, R or SQL. A large part of my job is spent figuring out what pieces of information my team or customers will connect with and then finding ways to show how data can be used with different tools to gain insights.

Data scientists in other industries likely spend a lot more time talking to data and software engineers than stepping into meetings with managers and business intelligence staff. Forums for the data science industry offer little advice on how to communicate effectively to develop insights and work with diverse teams. As data scientists, our greatest value lies in our ability to offer insights and bridge gaps between business and technology. The better we communicate, the more we amplify the impact of good data science. We turn data and technology into actionable knowledge that makes other people’s jobs easier and increases returns. In turn, our colleagues engage us in discussions to design better tests and provide more appropriate data and technology. The cycle takes us on a data-driven ride to high efficiency, high ROI and informed efforts in research and development.

Without communication skills, we struggle to develop insights, ensure knowledge is actionable, access the basic inputs needed to do the job and present outputs in a way that people will actually hear. Here are a few tips to take your communication skills to the next level.

1. Jump right in.

Sometimes you need a quick-and-dirty solution to get output for rough plots and figures. Other times you can’t tell if you are on the right track until you get something to work and see how your data and models behave. Getting something that works gives you a starting point for discussions that can lead to useful findings more quickly.

A while back I read a blog post about applying data science to quality assurance. The hypothetical data scientist started with a simple classifier for checking parts on a production line. After talking with an engineer and an economist, the data scientist set thresholds for obviously good parts and obviously bad parts; parts the model was less confident about were flagged for further inspection by engineers. A simple working model makes it easier to tune parts of your analysis pipeline, but in this example the data scientist used the simple model for discussion first. The result was increased performance and a conversation with engineers on how automation can work with them instead of competing against them.
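
To make the triage idea concrete, here is a minimal sketch of how such a scheme might look. It assumes a fitted binary classifier with a scikit-learn-style predict_proba method, and the 0.9 and 0.1 thresholds are hypothetical stand-ins for values the engineer and economist would actually agree on.

```python
# Minimal sketch of confidence-based triage for production parts.
# Assumes `model` is any fitted binary classifier exposing predict_proba
# (scikit-learn style); the threshold values are purely illustrative.

def triage_parts(model, features, accept_above=0.9, reject_below=0.1):
    """Route each part to 'accept', 'reject', or 'inspect'."""
    prob_good = model.predict_proba(features)[:, 1]  # P(part is good)
    decisions = []
    for p in prob_good:
        if p >= accept_above:
            decisions.append("accept")   # obviously good: ship it
        elif p <= reject_below:
            decisions.append("reject")   # obviously bad: scrap or rework
        else:
            decisions.append("inspect")  # uncertain: route to an engineer
    return decisions
```

A model this simple is easy to talk about: the thresholds are the conversation, and they can be moved as the engineers and the economist weigh the cost of inspection against the cost of shipping a bad part.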

Starting with the simplest case – and a clear idea of how your work will be used – helps you quickly understand your data and make progress. This ensures that you reach useful and presentable conclusions faster, increasing the returns of subsequent work with the data. Initial results allow you to demonstrate and interpret improvements since you can compare models and results to a simple and intuitive strategy. Having a tangible response or performance metric opens up opportunities for you to discuss the problem with both technical and non-technical professionals as opposed to trying a battery of modeling approaches and digging into the idiosyncrasies of the project on your own. Overall, getting some sort of results, or even errors, will inform your next step and give you a better frame of reference on the problem. 

2. Consider – and manage – all viewpoints.

Understanding relationships and incorporating them into your approach is key to implementing viable solutions to real problems. If you do not facilitate communication among groups, your project can quickly unravel. Even challenges that seem purely technical always exist within the context of real business needs, real people and real time constraints.

When I work with customers for support or consulting, I have to think even more about the relationships clients need to navigate. Almost every time I have a serious talk with a prospect, the technical aspects are not the main question. I recently made a training video to help an analyst automate more of his model construction and selection. He had an important deadline from his boss, who preferred open-source tools and an automated pipeline he had written himself. Comparing our software to open-source tools had little to do with getting this customer’s team to use our product. I had to understand the customer and his boss, and their need for similar automation capabilities so that their tools could be compared or could complement each other.

Data scientists in other industries manage these relationships every day, while also working to understand data engineers and system administrators. An infrastructure is a technology, but in practice it has a history and a context. A data engineer can only build for and support so many demands. Systems administration is an information technology job, but managing environments and resources for a large group often leads to conflicts when developers all have individual requests to install various software. Keeping track of relationships gives you the full picture of the stumbling blocks to success, allowing you to reframe questions for more effective discussions and actions.

3. Don’t dive into unnecessary details. 

One key to generating actionable information is making sure the intended parties can understand and process it. A review of a technical concept should quickly bring those unfamiliar with the material up to speed while cueing audience members with more expertise to focus on particular aspects of a familiar topic.

Data science involves technical details from multiple disciplines, and reports are usually prepared for a more results-oriented audience. In run-throughs of my first Salford Systems Webinar, I strove to give complete, technically accurate descriptions of each learning machine I demonstrated. Upon review, I realized that this approach gave the wrong cues. Less-technical audience members would either tune out at the first sighting of equations or fret over copying and understanding the mathematical underpinnings of each approach. People with strong statistical backgrounds would scrutinize how the methods were presented or, more likely, lose focus quickly and miss out on the valuable insights from applying and comparing methods. The detailed background sent the signal that the Webinar would teach you new methods, something the scope and time limits would not allow. Audience members whom I sought to engage by demonstrating a new application would check out or leave within the first five minutes. I reworked the presentation to lead with a summary, which kept both the technical and the marketing test audiences engaged.

Leading with the conclusion or the big picture lets the audience develop their questions. Diving into details that are outside the scope, or bringing up details that are not directly involved in interpreting your findings, can confuse your audience and draw attention away from the insights both technical and non-technical audience members came to see.

4. Summarize.

Finding places to insert summaries helps keep your audience on track. When I delivered that first attempt at a Webinar, our senior scientist wanted to see more indications of milestones and more summaries as we progressed through different methods and concepts. I had designed my slides according to my experience in research, where I’m used to a technical audience paying close attention and waiting to draw conclusions (or to tear the presentation apart). Our senior scientist pointed out that you can’t count on long attention spans when presenting at large meetings, conferences or Webinars. Not only will these people have a lot on their minds, but they may take a break to get coffee, nod off or catch up on e-mail – and you may get cut off with questions or be asked to speed things up and get to the point. Constantly summarizing the motivation, the previous steps and the big takeaways keeps an entry point open for members of your audience who missed a section because they were physically or mentally absent.

The more you practice making summaries, the better you will understand your project. Summaries are your chance to practice and ask yourself, “What am I really doing? How is it important and what’s the best way to communicate that?” I always search for opportunities to look at my projects from a different perspective. With constant practice, you gain the skills to communicate a quick, two-minute summary of what you are working on when you’re at a bar with friends, in the hotel lobby, at a conference or giving a presentation. If a CEO cuts you off five minutes into your presentation to “get to the point” or you need to reel a sleepy conference-goer back in, you will be ready with a strong understanding of why your project is important as well as a practiced and engaging delivery – tested in every professional and candid situation possible.  

5. Frame the discussion.

If you open any textbook about machine learning or statistical methods, you will see descriptions accompanied by examples. Even professors with highly motivated and educated students provide examples and data sets to work with. Keeping methods and analyses rooted in concrete examples and real situations is not hand-holding or patronizing – it is an effective way for both you and your audience to develop a concept by framing the discussion.

For example, I wanted to show how different learning engines handle local trends as an incentive for trying them out in targeted advertising campaigns. I made a few synthetic data sets that I thought highlighted the different behaviors of the learning engines. However, my colleagues suggested tying all of the examples back to the topic of my talk. Looking at my results, I realized that some of the most interesting findings from my real data sets were local trends that certain learning engines accounted for automatically.

Instead of showing that a hypothetical pattern in the data can affect the performance of learning engines differently, I now had an example that kept the material relatable and gave the audience a reason to care. In fact, one of my models partitioned customer income into sub-regions that aligned almost exactly with tax brackets – assigning a different trend in behavior to each. By presenting real scenarios, I could point back to the motivation for the presentation and connect a transitional part of the Webinar directly with the intriguing findings and conclusions of the talk. We got to see how learning engines handled real local trends automatically, which pointed to the conclusion that you can learn more about your customers or your data by using predictive models for interpretation.
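
As a rough illustration of what "handling local trends automatically" means, the sketch below generates hypothetical income data whose behavior changes at an arbitrary boundary, then compares a single global linear fit with a tree-based learner that can carve the feature space into regions on its own. The data, the 60,000 boundary and the model choices are invented for illustration; they are not the actual data or learning engines from the Webinar.

```python
# Hypothetical illustration: a response with different local trends on either
# side of an income boundary. A global linear model averages over the change;
# a shallow tree-based learner partitions income into sub-regions by itself.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
income = rng.uniform(20_000, 120_000, size=(500, 1))

# Rising trend below the (made-up) 60,000 boundary, falling trend above it.
response = np.where(income[:, 0] < 60_000,
                    0.002 * income[:, 0],
                    200.0 - 0.001 * income[:, 0])
response = response + rng.normal(0.0, 5.0, size=500)

for model in (LinearRegression(), DecisionTreeRegressor(max_depth=3)):
    fitted = model.fit(income, response)
    print(type(model).__name__, round(fitted.score(income, response), 3))
```

Printing the fit scores keeps the sketch short; in a real presentation you would plot both fits or inspect the tree’s split points so the audience can watch the model discover the boundary on its own.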

Keeping your discussions anchored with concrete examples that relate to your topic allows you to point back to the reason and motivation for your project. Inevitably you will have to describe the interpretation of a plot, a schematic of a particular process or the meaning of a derived metric. Describing the effects of a treatment you have analyzed with a Cox survival model, for example, is a lot easier when you use real patient numbers, real time scales and real outcomes to build up the concepts of events, survival and hazard. Using examples not only helps the audience understand abstract concepts, but also keeps them involved with your topic. This pays off when discussing results and conclusions.