Agile Methodologies: How They Fit Into Data Science Processes

 

Agile methodologies are a set of frameworks that help teams manage projects iteratively. They emphasize communication and shipping working products early, instead of spending months gathering requirements up front.

This approach to software development gives teams regular opportunities to reassess their project’s direction throughout the development cycle.

There are many agile methodologies, including Scrum, Kanban, and extreme programming (XP).

In this post, we will show how these frameworks can play a role in your next data science project.

Agile Manifesto

Photo by Luca Campioni on Unsplash

The Agile Manifesto is a succinct statement of values that captures the agile mindset. Unlike earlier approaches that emphasized requirements and documentation, agile emphasizes working software and customer collaboration. This contrasts with older project management methods, such as the waterfall model.

The Agile Manifesto is based on these four core values:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

These four points don’t outline specific procedures, practices, or processes for agile software development. The manifesto is more of a philosophical mindset than a rigid framework. That is where methods like scrum and XP come into play.

Scrum

Now we know that agile is a set of guiding principles for taking an iterative approach to software development. The next important concept to understand is scrum.

Scrum is one of many agile methodologies used in the software world. It offers a simple, lightweight framework for tackling complex projects while still delivering high-quality end products. In practice, scrum can get a little confusing, as teams often create hybrids that borrow from other frameworks like Kanban.

There are several key components to scrum, including sprints, user stories, and a backlog. Other agile methods have similar concepts, but these are the terms scrum teams use:

  • A sprint is a fixed interval, typically two to four weeks, in which the developers and scrum master have a set of features that they aim to complete by the end of the period. This helps the team focus on getting features out quickly. Because each sprint is short, changes to features and overall plans can be made rapidly with little impact on the overall scope of work.
  • User stories are the actual tasks that are set up for each sprint. The stories help the developers and project managers keep track of what needs to get done in a sprint. In addition, they allow the developers to break down their work into manageable pieces.
  • The backlog is the list of stories that still need to be completed by the end of the project. Developers can constantly reexamine priorities and organize the next set of tasks using the backlog as a base.
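To make the relationship between these three terms concrete, here is a minimal sketch of sprint planning in Python. The `Story` fields, the "lower number means higher priority" convention, and the point-based capacity are all assumptions made for illustration; scrum itself does not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A user story: one manageable piece of work."""
    title: str
    priority: int  # assumed convention: lower number = more important
    points: int    # assumed unit: rough effort estimate


def plan_sprint(backlog, capacity):
    """Greedily fill a sprint with the highest-priority stories that fit.

    Returns (sprint_stories, remaining_backlog). A story that does not
    fit is left in the backlog, and smaller stories after it may still
    be picked.
    """
    sprint, remaining = [], []
    used = 0
    for story in sorted(backlog, key=lambda s: s.priority):
        if used + story.points <= capacity:
            sprint.append(story)
            used += story.points
        else:
            remaining.append(story)
    return sprint, remaining


# Hypothetical backlog for a data science project.
backlog = [
    Story("Build churn model baseline", priority=2, points=5),
    Story("Load customer data into warehouse", priority=1, points=3),
    Story("Dashboard for model metrics", priority=3, points=8),
    Story("Clean duplicate customer records", priority=1, points=2),
]

sprint, remaining = plan_sprint(backlog, capacity=10)
for story in sprint:
    print(f"In sprint: {story.title} ({story.points} pts)")
for story in remaining:
    print(f"Still in backlog: {story.title}")
```

At the end of the sprint, the team would reprioritize whatever is left in `remaining` and plan again, which is the reexamination of priorities described above.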

Why is all this done? There are specific advantages when using scrum to manage a project. Here are a few.

Flexibility

Scrum uses user stories to describe the functionality that needs to be built, and those stories can be revised from one sprint to the next as requirements change.

Collaboration

With scrum, collaboration becomes a huge benefit, since the scrum master, product owner, and team work closely together on a regular basis. Moreover, regular meetings ensure that the work is organized according to the business’s priorities.

Simplicity

Scrum is carried out in sprints, which are only about two to four weeks long. This allows teams to constantly push out achievable features without having to wait until the end of a two-year project to find out whether a feature works.

Kanban

Kanban is another agile method. It was originally developed for manufacturing processes. One of the mainstays of the Kanban method is the board, on which cards representing tasks are placed and moved as the work progresses.

The Kanban style of project management has many other terms and concepts that could fill an entire course on their own.

The aim of Kanban is to manage and control the flow of work through a process. Unlike Scrum, this methodology is not necessarily iterative.

Work can flow through the board continuously rather than in fixed-length iterations. Despite not being organized around iterations, Kanban still falls into the agile category because it follows many of the principles of the Agile Manifesto.

Kanban projects usually set work in progress (WIP) limits that cap how many tasks can sit in a given stage at once. This keeps the team focused.
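As a rough illustration of how a WIP limit works, here is a small Python sketch of a board that refuses to accept a card into a column that is already full. The class, the column names, and the limit of two are hypothetical and only meant to show the idea; real Kanban tools differ in the details.

```python
class WipLimitExceeded(Exception):
    """Raised when a column is already at its WIP limit."""


class KanbanBoard:
    def __init__(self, wip_limits):
        # wip_limits maps column name -> max number of cards (None = no limit).
        self.wip_limits = dict(wip_limits)
        self.columns = {name: [] for name in wip_limits}

    def _check_capacity(self, column):
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            raise WipLimitExceeded(f"'{column}' is at its WIP limit of {limit}")

    def add(self, card, column):
        self._check_capacity(column)
        self.columns[column].append(card)

    def move(self, card, source, target):
        # Check the target first so a rejected move leaves the board unchanged.
        self._check_capacity(target)
        self.columns[source].remove(card)
        self.columns[target].append(card)


board = KanbanBoard({"To Do": None, "In Progress": 2, "Done": None})
for card in ["Explore sales data", "Train forecast model", "Write report"]:
    board.add(card, "To Do")

board.move("Explore sales data", "To Do", "In Progress")
board.move("Train forecast model", "To Do", "In Progress")

try:
    board.move("Write report", "To Do", "In Progress")
except WipLimitExceeded as err:
    print(f"Blocked: {err}")
```

The third card stays in "To Do" until something in "In Progress" is finished, which is how the limit nudges the team to complete work before starting more.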

XP Method

Another agile approach to software development is XP.

XP has several features that distinguish it from the rest of the agile approaches. For instance, XP focuses heavily on communication, feedback, and simplicity. Instead of spending a long time gathering requirements, you spend more time developing code.

Each extreme programming project is divided into smaller sections, and each section has its own plan, which is adjusted as the work progresses.

All this being said, XP doesn’t prescribe as much project management structure as Scrum or Kanban. This can make it easier to lose control of a project with the XP method, which is not conducive to getting large projects into production. As a result, many large organizations avoid relying on XP alone.

All of these agile methods have been used effectively in the software world for years. In the past decade, the business world has also seen a surge in data-focused projects, data science in particular. With that comes the need to figure out how to manage these projects in order to improve the data science process.

Data Science Process

When it comes to data science projects, there isn’t one set process for analysis. However, there is a generalized framework that a data science team can use.

This is known as the OSEMN framework. Each letter in OSEMN stands for one step in analyzing data: obtain, scrub, explore, model, and interpret. Below, we’ll lay out what happens in each step.

Obtaining Data

The first step is very straightforward.

Data scientists obtain data from available sources. This often means building data pipelines, using workflow tools like Airflow, to move data from operational databases into analytical ones.

In some cases, this work will be done by a data engineer, but in many cases, data scientists will need to step in.

Scrubbing Data

The next step is to scrub the data. This step cleans and filters the data, which directly affects the results of the analysis. The targets are values that are inaccurate, duplicated, or malformed, since data like that can lead to bad conclusions and results.

In this step, there are various data points you might be attempting to clean. This means removing bad data in some cases. In others, it could mean converting data fields into a standardized format.
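Here is a minimal sketch of what scrubbing can look like, using only the Python standard library. The sample rows, the field names, the two accepted date formats, and the rule that negative spend counts as inaccurate are all assumptions for the sake of the example; a real project would likely use a library such as pandas and its own validation rules.

```python
from datetime import datetime

# Hypothetical raw rows, as they might arrive from a CSV export.
RAW_ROWS = [
    {"customer_id": "101", "signup_date": "2021-03-05", "spend": "120.50"},
    {"customer_id": "101", "signup_date": "2021-03-05", "spend": "120.50"},  # duplicate
    {"customer_id": "102", "signup_date": "05/04/2021", "spend": "80"},      # different date format
    {"customer_id": "103", "signup_date": "2021-06-01", "spend": "n/a"},     # malformed number
    {"customer_id": "104", "signup_date": "2021-07-19", "spend": "-15"},     # inaccurate: negative spend
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def scrub(rows):
    """Drop duplicate, malformed, and inaccurate rows; standardize the rest."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)

        date = parse_date(row["signup_date"])
        try:
            spend = float(row["spend"])
        except ValueError:  # malformed number
            continue
        if date is None or spend < 0:  # unparseable date or impossible value
            continue

        clean.append({
            "customer_id": int(row["customer_id"]),
            "signup_date": date.isoformat(),  # standardized to YYYY-MM-DD
            "spend": spend,
        })
    return clean


clean_rows = scrub(RAW_ROWS)
for row in clean_rows:
    print(row)
```

Of the five raw rows, only customers 101 and 102 survive, and both end up with dates in the same standardized format.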

Exploring Data

The next step is to use exploratory data analysis techniques.

By combining both rigorous and exploratory analysis methods, data scientists get a deeper understanding of their data. There is a natural, free-flowing aspect to this kind of analysis. However, it’s important to set goals up front so you know when you’re done.

Modeling Data

This stage is where the real magic occurs. Before starting to model the data, it’s often useful to reduce the dimensionality of the dataset under consideration, for example by dropping features that carry little information.

In this phase, your team finalizes datasets and writes out business logic and models to share across your organization.
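As one simple illustration of reducing dimensionality before modeling, the sketch below drops columns whose values barely vary. This is a variance-threshold feature selection, which is only one basic technique among many (it is not PCA or any other projection method), and the function name, threshold, and sample data are made up for the example.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`.

    `rows` is a list of equal-length numeric lists. Returns the reduced
    rows and the indices of the columns that were kept.
    """
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, keep


# Hypothetical dataset: the first column is constant, so it cannot help a model.
data = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.4],
    [1.0, 6.0, 0.1],
    [1.0, 9.0, 0.3],
]

reduced, kept = drop_low_variance(data)
print("Kept column indices:", kept)
print("First reduced row:", reduced[0])
```

A constant column has zero variance, so it is removed and the dataset goes from three features to two.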

Interpreting Data

The final step is data and model interpretation.

In this step, you present your findings to other members of your team, using charts and visualization tools like Tableau. In doing so, your team aligns its models and work with the rest of the business.

How the Data Science Process Aligns With Agile

Data science work usually involves a high degree of uncertainty. Agile methodologies align well with data science for several reasons.

Prioritization and Planning

Agile methods give data scientists the ability to prioritize models and data according to the goals and requirements of the project. This also helps data scientists give non-technical stakeholders a brief overview of each goal.

Research vs. Development

When it comes to data science projects, it’s hard to plan out exactly what work will be required to finish. You need to constantly run experiments and research. This makes the work more iterative.

Agile’s iterative approach is a natural fit for these projects.

Agile: It’s for More Than Just Software

Data science provides valuable results from data models that answer crucial business questions. Using agile methods can be beneficial when managing your data science processes and workflows. Methods like Scrum, Kanban, and XP can help you manage your various data science projects. The key is learning where the two disciplines cross over!

Ben Rogojan