
Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction, accomplished through summarization and categorization, which together help find patterns and themes in the data for easy identification and linking. The third and last is data analysis itself, which researchers do in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “data analysis and data interpretation form a process representing the application of deductive and inductive logic to the research.”

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is still possible to explore data without a problem; we call it ‘data mining’, and it often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Sometimes, data analysis tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes things by assigning a specific value to them. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, etc. all come under this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; an item included in categorical data cannot belong to more than one group. Example: a survey respondent’s lifestyle, marital status, smoking habit, or drinking habit is categorical data. A chi-square test is a standard method used to analyze this data; a minimal sketch follows this list.
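To make the chi-square idea concrete, here is a minimal sketch in Python using scipy. The categories and counts in the contingency table are hypothetical, invented purely for illustration.

```python
# A minimal sketch of a chi-square test on categorical survey data.
# The categories and counts below are hypothetical.
from scipy.stats import chi2_contingency

# Contingency table: rows = marital status, columns = smoking habit
observed = [
    [90, 60],   # married: non-smoker, smoker
    [70, 80],   # single:  non-smoker, smoker
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value suggests the two categorical variables are associated.
```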


Data analysis in qualitative research

Data analysis in qualitative research works a little differently than with numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process; hence, it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual. Here the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find “food” and “hunger” are the most commonly used words and will highlight them for further analysis.


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example, researchers conducting research and data analysis to study the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is also one of the highly recommended text analysis methods used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, used to determine how a specific text is similar to or different from another.

For example: To find out the importance of a resident doctor in a company, the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: This is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources, such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context within which the communication between the researcher and respondent takes place. Discourse analysis also looks at the respondent’s lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they may alter explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

The first stage in research and data analysis is to prepare the data for analysis so that raw data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets pre-set standards or is biased. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interview, that the interviewer asked all the questions devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein researchers confirm that the provided data is free of such errors. They need to conduct basic checks and outlier checks to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher will create age brackets to distinguish the respondents based on their age. This makes it easier to analyze small data buckets rather than deal with the massive data pile; a minimal coding sketch follows.
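Here is a minimal sketch of what that coding step can look like in Python with pandas; the column name, bracket edges, and labels are assumptions made for illustration.

```python
# A minimal sketch of the data-coding phase: grouping 1,000 respondents
# into age brackets so analysis works on small buckets, not raw values.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
responses = pd.DataFrame({"age": rng.integers(18, 80, size=1000)})

# Code raw ages into labeled brackets (edges and labels are hypothetical)
responses["age_bracket"] = pd.cut(
    responses["age"],
    bins=[17, 25, 35, 50, 65, 80],
    labels=["18-25", "26-35", "36-50", "51-65", "66-80"],
)

print(responses["age_bracket"].value_counts().sort_index())
```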


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical analysis is the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential: categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods fall into two groups: ‘descriptive statistics’, used to describe data, and ‘inferential statistics’, which help in comparing data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond describing the data; any conclusions are still based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to locate the distribution by various points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • Range: the difference between the highest and lowest scores
  • Variance and standard deviation: the average difference between observed scores and the mean
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to showcase how spread out the data is. It helps them identify how far the data is spread out, which directly affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores that help researchers identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count. A combined sketch of these four families of measures follows.
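The sketch below computes one measure from each family above on a hypothetical set of survey scores; the numbers are invented for illustration.

```python
# A minimal sketch of the four families of descriptive measures.
import numpy as np

scores = np.array([72, 85, 85, 90, 64, 77, 85, 91, 68, 74])

# Measures of frequency: how often each score occurs
values, counts = np.unique(scores, return_counts=True)

# Measures of central tendency
mean, median = scores.mean(), np.median(scores)
mode = values[counts.argmax()]

# Measures of dispersion or variation
value_range = scores.max() - scores.min()
variance, std_dev = scores.var(ddof=1), scores.std(ddof=1)

# Measures of position
third_quartile = np.percentile(scores, 75)

print(f"mean={mean:.1f} median={median:.1f} mode={mode}")
print(f"range={value_range} variance={variance:.1f} std={std_dev:.1f}")
print(f"75th percentile={third_quartile}")
```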

In quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are never sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think of the data analysis method best suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average votes cast in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and uses them to say something about the population parameter; a minimal sketch follows this list.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.
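As a minimal sketch of estimating a parameter, here is a normal-approximation confidence interval for the movie-theater example above; the sample numbers are hypothetical.

```python
# Estimate the population share who liked the movie from a sample of 100.
import math

n = 100      # sampled audience members
liked = 85   # said they liked the movie

p_hat = liked / n
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
z = 1.96     # 95% confidence

low, high = p_hat - z * standard_error, p_hat + z * standard_error
print(f"Estimated share who liked it: {p_hat:.0%} "
      f"(95% CI: {low:.0%} to {high:.0%})")
```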

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps with seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strength of the relationship between two variables, researchers rarely look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with multiple independent variables, and you work out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free random manner. A sketch of cross-tabulation and regression appears after this list.
  • Frequency tables: A frequency table records how often each value or category occurs in the data, making it easy to compare the distribution of responses across groups.
  • Analysis of variance: This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
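Here is a minimal sketch of two of these methods, cross-tabulation and simple linear regression, in Python with pandas and scipy; all column names and values are hypothetical.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender":        ["M", "F", "F", "M", "F", "M", "F", "M"],
    "age_group":     ["18-25", "18-25", "26-35", "26-35",
                      "18-25", "26-35", "26-35", "18-25"],
    "hours_studied": [2, 5, 6, 3, 4, 7, 8, 1],
    "test_score":    [55, 74, 79, 62, 70, 85, 90, 50],
})

# Cross-tabulation: number of males and females in each age category
print(pd.crosstab(df["age_group"], df["gender"]))

# Regression: impact of the independent variable (hours studied)
# on the dependent variable (test score)
fit = stats.linregress(df["hours_studied"], df["test_score"])
print(f"slope={fit.slope:.2f}, r={fit.rvalue:.2f}, p={fit.pvalue:.4f}")
```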
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and they should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps design a survey questionnaire, select data collection methods, and choose samples.


  • The primary aim of research and data analysis is to derive ultimate insights that are unbiased. Any mistake in collecting data, selecting an analysis method, or choosing an audience sample, or approaching them with a biased mind, will lead to a biased inference.
  • No degree of sophistication in research data analysis can rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity might mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data alteration, data mining, or developing graphical representations.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them with a medium to collect data by creating appealing surveys.


8 Types of Data Analysis


Data analysis is an aspect of data science and data analytics that is all about analyzing data for different kinds of purposes. The data analysis process involves inspecting, cleaning, transforming, and modeling data to draw useful insights from it.

What Are the Different Types of Data Analysis?

  • Descriptive analysis
  • Diagnostic analysis
  • Exploratory analysis
  • Inferential analysis
  • Predictive analysis
  • Causal analysis
  • Mechanistic analysis
  • Prescriptive analysis

With its multiple facets, methodologies and techniques, data analysis is used in a variety of fields, including business, science and social science, among others. As businesses thrive under the influence of technological advancements in data analytics, data analysis plays a huge role in  decision-making , providing a better, faster and more efficacious system that minimizes risks and reduces  human biases .

That said, there are different kinds of data analysis suited to different goals. We’ll examine each one below.

Two Camps of Data Analysis

Data analysis can be divided into two camps, according to the book  R for Data Science :

  • Hypothesis Generation — This involves looking deeply at the data and combining your domain knowledge to generate hypotheses about why the data behaves the way it does.
  • Hypothesis Confirmation — This involves using a precise mathematical model to generate falsifiable predictions with statistical sophistication to confirm your prior hypotheses.

Types of Data Analysis

Data analysis can be separated and organized into types, arranged in an increasing order of complexity.

1. Descriptive Analysis

The goal of descriptive analysis is to describe or summarize a set of data. Here’s what you need to know:

  • Descriptive analysis is the very first analysis performed in the data analysis process.
  • It generates simple summaries about samples and measurements.
  • It involves common, descriptive statistics like measures of central tendency, variability, frequency and position.

Descriptive Analysis Example

Take the Covid-19 statistics page on Google, for example. The line graph is a pure summary of the cases and deaths, a presentation and description of the infected population of a particular country.

Descriptive analysis is the first step in analysis where you summarize and describe the data you have using descriptive statistics, and the result is a simple presentation of your data.


2. Diagnostic Analysis 

Diagnostic analysis seeks to answer the question “Why did this happen?” by taking a more in-depth look at data to uncover subtle patterns. Here’s what you need to know:

  • Diagnostic analysis typically comes after descriptive analysis, taking initial findings and investigating why certain patterns in data happen. 
  • Diagnostic analysis may involve analyzing other related data sources, including past data, to reveal more insights into current data trends.  
  • Diagnostic analysis is ideal for further exploring patterns in data to explain anomalies.  

Diagnostic Analysis Example

A footwear store wants to review its website traffic levels over the previous 12 months. Upon compiling and assessing the data, the company’s marketing team finds that June experienced above-average levels of traffic while July and August witnessed slightly lower levels of traffic. 

To find out why this difference occurred, the marketing team takes a deeper look. Team members break down the data to focus on specific categories of footwear. In the month of June, they discovered that pages featuring sandals and other beach-related footwear received a high number of views while these numbers dropped in July and August. 

Marketers may also review other factors like seasonal changes and company sales events to see if other variables could have contributed to this trend.   
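A minimal sketch of that drill-down, assuming the traffic data lives in a pandas DataFrame; the category names and view counts are hypothetical.

```python
# Break monthly traffic down by footwear category to locate the anomaly.
import pandas as pd

traffic = pd.DataFrame({
    "month":    ["Jun", "Jun", "Jul", "Jul", "Aug", "Aug"],
    "category": ["sandals", "sneakers"] * 3,
    "views":    [12000, 4000, 5000, 4200, 4500, 4100],
})

by_category = traffic.pivot_table(
    index="month", columns="category", values="views", aggfunc="sum"
)
print(by_category)  # sandals spike in June, then drop off in Jul/Aug
```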

3. Exploratory Analysis (EDA)

Exploratory analysis involves examining or exploring data and finding relationships between variables that were previously unknown. Here’s what you need to know:

  • EDA helps you discover relationships between measures in your data. These relationships are not evidence of causation, as the phrase “correlation doesn’t imply causation” reminds us.
  • It’s useful for discovering new connections and forming hypotheses. It drives design planning and data collection.

Exploratory Analysis Example

Climate change is an increasingly important topic as the global temperature has gradually risen over the years. One example of an exploratory data analysis on climate change involves taking the rise in temperature from 1950 to 2020 together with measures of human activity and industrialization to find relationships in the data. For example, you may look at the number of factories, cars on the road, and airplane flights to see how they correlate with the rise in temperature.

Exploratory analysis explores data to find relationships between measures without identifying the cause. It’s most useful when formulating hypotheses.
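A minimal sketch of such an exploration, with entirely hypothetical yearly indicators:

```python
# Check how strongly candidate factors correlate with temperature anomaly.
import pandas as pd

climate = pd.DataFrame({
    "temp_anomaly_c": [0.10, 0.18, 0.32, 0.45, 0.61, 0.85],
    "factories_k":    [110, 150, 210, 290, 380, 470],
    "flights_m":      [5, 9, 14, 22, 30, 38],
})

# A correlation matrix surfaces candidate relationships to investigate;
# it does not establish that any of them are causal.
print(climate.corr().round(2))
```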

4. Inferential Analysis

Inferential analysis involves using a small sample of data to infer information about a larger population of data.

The goal of statistical modeling itself is all about using a small amount of information to extrapolate and generalize information to a larger group. Here’s what you need to know:

  • Inferential analysis involves using sample data that is representative of a population, and it attaches a measure of uncertainty (a standard error) to your estimation.
  • The accuracy of inference depends heavily on your sampling scheme: if the sample isn’t representative of the population, the generalization will be inaccurate. (The statistical machinery behind these generalizations, such as confidence intervals, rests on results like the central limit theorem.)

Inferential Analysis Example

The idea of drawing an inference about the population at large from a smaller sample is intuitive. Many statistics you see in the media and on the internet are inferential: a prediction about an event based on a small sample. For example, a psychological study on the benefits of sleep might involve a total of 500 people. When the researchers followed up with the candidates, those who slept seven to nine hours reported better overall attention spans and well-being, while those who slept less or more than that range suffered from reduced attention spans and energy. This study, drawn from 500 people, covers just a tiny portion of the roughly 7 billion people in the world, and its conclusion is thus an inference about the larger population.

Inferential analysis extrapolates and generalizes the information of the larger group with a smaller sample to generate analysis and predictions.

5. Predictive Analysis

Predictive analysis involves using historical or current data to find patterns and make predictions about the future. Here’s what you need to know:

  • The accuracy of the predictions depends on the input variables.
  • Accuracy also depends on the types of models. A linear model might work well in some cases, and in other cases it might not.
  • Using a variable to predict another one doesn’t denote a causal relationship.

Predictive Analysis Example

The 2020 US election was a popular topic, and many prediction models were built to predict the winning candidate. FiveThirtyEight did this to forecast the 2016 and 2020 elections. Prediction analysis for an election requires input variables such as historical polling data, trends, and current polling data in order to return a good prediction. Something as large as an election wouldn’t use just a linear model, but a complex model with certain tunings to best serve its purpose.

Predictive analysis takes data from the past and present to make predictions about the future.
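A minimal sketch of the idea, fitting a linear trend to hypothetical past sales and extrapolating one step ahead:

```python
# Fit a line to past data and extrapolate it into the future.
import numpy as np

years = np.array([2016, 2017, 2018, 2019, 2020])
sales = np.array([102.0, 110.5, 118.0, 127.5, 134.0])  # hypothetical

slope, intercept = np.polyfit(years, sales, deg=1)     # linear trend
forecast_2021 = slope * 2021 + intercept
print(f"Forecast for 2021: {forecast_2021:.1f}")
# Whether a linear model is adequate depends on the input variables
# and on how the underlying process actually behaves.
```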


6. Causal Analysis

Causal analysis looks at the cause and effect of relationships between variables and is focused on finding the cause of a correlation. Here’s what you need to know:

  • To find the cause, you have to question whether the observed correlations driving your conclusion are valid. Just looking at the surface data won’t help you discover the hidden mechanisms underlying the correlations.
  • Causal analysis is applied in randomized studies focused on identifying causation.
  • Causal analysis is the gold standard in data analysis and scientific studies where the cause of a phenomenon is to be extracted and singled out, like separating wheat from chaff.
  • Good data is hard to find and requires expensive research and studies. These studies are analyzed in aggregate (multiple groups), and the observed relationships are just average effects (mean) of the whole population. This means the results might not apply to everyone.

Causal Analysis Example  

Say you want to test out whether a new drug improves human strength and focus. To do that, you perform randomized control trials for the drug to test its effect. You compare the sample of candidates for your new drug against the candidates receiving a mock control drug through a few tests focused on strength and overall focus and attention. This will allow you to observe how the drug affects the outcome.

Causal analysis is about finding out the causal relationship between variables, and examining how a change in one variable affects another.
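A minimal sketch of analyzing such a randomized trial with a two-sample t-test (scipy); the scores are hypothetical.

```python
# Compare strength scores: drug group vs. mock-control group.
from scipy import stats

drug_group    = [78, 84, 81, 90, 87, 79, 85, 88]
control_group = [72, 75, 70, 77, 74, 73, 76, 71]

t_stat, p_value = stats.ttest_ind(drug_group, control_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Because assignment was randomized, a small p-value supports a causal
# effect of the drug rather than a mere correlation.
```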

7. Mechanistic Analysis

Mechanistic analysis is used to understand exact changes in variables that lead to other changes in other variables. Here’s what you need to know:

  • It’s applied in the physical and engineering sciences, in situations that require high precision and leave little room for error, where the only noise in the data is measurement error.
  • It’s designed to understand a biological or behavioral process, the pathophysiology of a disease or the mechanism of action of an intervention. 

Mechanistic Analysis Example

Much graduate-level research on complex topics would serve as a suitable example, but to put it in simple terms, let’s say an experiment is done to simulate safe and effective nuclear fusion to power the world. A mechanistic analysis of the study would entail a precise balance of controlling and manipulating variables with highly accurate measures of both the variables and the desired outcomes. It’s this intricate and meticulous modus operandi toward these big topics that allows for scientific breakthroughs and the advancement of society.

Mechanistic analysis is in some ways a predictive analysis, but modified to tackle studies that require high precision and meticulous methodologies for physical or engineering science .

8. Prescriptive Analysis 

Prescriptive analysis compiles insights from other previous data analyses and determines actions that teams or companies can take to prepare for predicted trends. Here’s what you need to know: 

  • Prescriptive analysis may come right after predictive analysis, but it may involve combining many different data analyses. 
  • Companies need advanced technology and plenty of resources to conduct prescriptive analysis. AI systems that process data and adjust automated tasks are an example of the technology required to perform prescriptive analysis.  

Prescriptive Analysis Example

Prescriptive analysis is pervasive in everyday life, driving the curated content users consume on social media. On platforms like TikTok and Instagram, algorithms can apply prescriptive analysis to review past content a user has engaged with and the kinds of behaviors they exhibited with specific posts. Based on these factors, an algorithm seeks out similar content that is likely to elicit the same response and recommends it on a user’s personal feed. 
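Real recommendation systems are far more complex, but a minimal sketch of the prescriptive step might score candidate posts by predicted engagement and recommend the best one; the weights and fields below are hypothetical.

```python
# Score candidate posts and prescribe which one to show next.
candidate_posts = [
    {"id": "a", "similarity_to_liked": 0.91, "freshness": 0.40},
    {"id": "b", "similarity_to_liked": 0.55, "freshness": 0.95},
    {"id": "c", "similarity_to_liked": 0.80, "freshness": 0.70},
]

def expected_engagement(post):
    # Combine signals produced by earlier (descriptive/predictive) analyses
    return 0.7 * post["similarity_to_liked"] + 0.3 * post["freshness"]

recommended = max(candidate_posts, key=expected_engagement)
print(f"Recommend post {recommended['id']!r} next")
```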

When to Use the Different Types of Data Analysis 

  • Descriptive analysis summarizes the data at hand and presents your data in a comprehensible way.
  • Diagnostic analysis takes a more detailed look at data to reveal why certain patterns occur, making it a good method for explaining anomalies. 
  • Exploratory data analysis helps you discover correlations and relationships between variables in your data.
  • Inferential analysis is for generalizing the larger population with a smaller sample size of data.
  • Predictive analysis helps you make predictions about the future with data.
  • Causal analysis emphasizes finding the cause of a correlation between variables.
  • Mechanistic analysis is for measuring the exact changes in variables that lead to other changes in other variables.
  • Prescriptive analysis combines insights from different data analyses to develop a course of action teams and companies can take to capitalize on predicted outcomes. 

A few important tips to remember about data analysis include:

  • Correlation doesn’t imply causation.
  • EDA helps discover new connections and form hypotheses.
  • Accuracy of inference depends on the sampling scheme.
  • A good prediction depends on the right input variables.
  • A simple linear model with enough data usually does the trick.
  • Using a variable to predict another doesn’t denote causal relationships.
  • Good data is hard to find, and to produce it requires expensive research.
  • Studies are analyzed in aggregate, so their results are average effects that might not apply to everyone.


Research Methods | Definition, Types, Examples

Research methods are specific procedures for collecting and analysing data. Developing your research methods is an integral part of your research design . When planning your methods, there are two key decisions you will make.

First, decide how you will collect data . Your methods depend on what type of data you need to answer your research question :

  • Qualitative vs quantitative : Will your data take the form of words or numbers?
  • Primary vs secondary : Will you collect original data yourself, or will you use data that have already been collected by someone else?
  • Descriptive vs experimental : Will you take measurements of something as it is, or will you perform an experiment?

Second, decide how you will analyse the data .

  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.

Table of contents

  • Methods for collecting data
  • Examples of data collection methods
  • Methods for analysing data
  • Examples of data analysis methods
  • Frequently asked questions about methodology

Methods for collecting data

Data are the information that you collect for the purposes of answering your research question . The type of data you need depends on the aims of your research.

Qualitative vs quantitative data

Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.

For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data .

If you want to develop a more mechanistic understanding of a topic, or your research involves hypothesis testing , collect quantitative data .

You can also take a mixed methods approach, where you use both qualitative and quantitative research methods.

Primary vs secondary data

Primary data are any original information that you collect for the purposes of answering your research question (e.g. through surveys , observations and experiments ). Secondary data are information that has already been collected by other researchers (e.g. in a government census or previous scientific studies).

If you are exploring a novel research question, you’ll probably need to collect primary data. But if you want to synthesise existing knowledge, analyse historical trends, or identify patterns on a large scale, secondary data might be a better choice.

Descriptive vs experimental data

In descriptive research , you collect data about your study subject without intervening. The validity of your research will depend on your sampling method .

In experimental research , you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design .

To conduct an experiment, you need to be able to vary your independent variable , precisely measure your dependent variable, and control for confounding variables . If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.

Methods for analysing data

Your data analysis methods will depend on the type of data you collect and how you prepare them for analysis.

Data can often be analysed both quantitatively and qualitatively. For example, survey responses could be analysed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.

Qualitative analysis methods

Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that were collected:

  • From open-ended survey and interview questions, literature reviews, case studies, and other sources that use text rather than numbers.
  • Using non-probability sampling methods .

Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions.

Quantitative analysis methods

Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments).

You can use quantitative analysis to interpret data that were collected either:

  • During an experiment.
  • Using probability sampling methods .

Because the data are collected and analysed in a statistically valid way, the results of quantitative analysis can be easily standardised and shared among researchers.

Frequently asked questions about methodology

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research.

For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

Statistical sampling allows you to test a hypothesis about the characteristics of a population. There are various sampling methods you can use to ensure that your sample is representative of the population as a whole.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Methodology refers to the overarching strategy and rationale of your research project . It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyse data (e.g. experiments, surveys , and statistical tests ).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section .

In a longer or more complex research project, such as a thesis or dissertation , you will probably include a methodology section , where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.



Data Analysis: Types, Methods & Techniques (a Complete List)


While the term sounds intimidating, “data analysis” is nothing more than making sense of information in a table. It consists of filtering, sorting, grouping, and manipulating data tables with basic algebra and statistics.

In fact, you don’t need experience to understand the basics. You have already worked with data extensively in your life, and “analysis” is nothing more than a fancy word for good sense and basic logic.

Over time, people have intuitively categorized the best logical practices for treating data. These categories are what we call today types , methods , and techniques .

This article provides a comprehensive list of types, methods, and techniques, and explains the difference between them.


Descriptive, Diagnostic, Predictive, & Prescriptive Analysis

If you Google “types of data analysis,” the first few results will explore descriptive , diagnostic , predictive , and prescriptive analysis. Why? Because these names are easy to understand and are used a lot in “the real world.”

Descriptive analysis is an informational method, diagnostic analysis explains “why” a phenomenon occurs, predictive analysis seeks to forecast the result of an action, and prescriptive analysis identifies solutions to a specific problem.

That said, these are only four branches of a larger analytical tree.

Good data analysts know how to position these four types within other analytical methods and tactics, allowing them to leverage strengths and weaknesses in each to uproot the most valuable insights.

Let’s explore the full analytical tree to understand how to appropriately assess and apply these four traditional types.

Tree diagram of Data Analysis Types, Methods, and Techniques

Here’s a picture to visualize the structure and hierarchy of data analysis types, methods, and techniques.

[Tree diagram: data analysis splits into quantitative vs qualitative; quantitative splits into mathematical and AI analysis; each branch leads to the methods and techniques described below.]

Note: basic descriptive statistics such as mean , median , and mode , as well as standard deviation , are not shown because most people are already familiar with them. In the diagram, they would fall under the “descriptive” analysis type.

Tree Diagram Explained

The highest-level classification of data analysis is quantitative vs qualitative . Quantitative implies numbers while qualitative implies information other than numbers.

Quantitative data analysis then splits into mathematical analysis and artificial intelligence (AI) analysis . Mathematical types then branch into descriptive , diagnostic , predictive , and prescriptive .

Methods falling under mathematical analysis include clustering , classification , forecasting , and optimization . Qualitative data analysis methods include content analysis , narrative analysis , discourse analysis , framework analysis , and/or grounded theory .

Moreover, mathematical techniques include regression , Naïve Bayes , Simple Exponential Smoothing , cohorts , factors , linear discriminants , and more, whereas techniques falling under the AI type include artificial neural networks , decision trees , evolutionary programming , and fuzzy logic . Techniques under qualitative analysis include text analysis , coding , idea pattern analysis , and word frequency .

It’s a lot to remember! Don’t worry, once you understand the relationship and motive behind all these terms, it’ll be like riding a bike.

We’ll move down the list from top to bottom; refer back to the tree diagram above to follow along.

But first, let’s just address the elephant in the room: what’s the difference between methods and techniques anyway?

Difference between methods and techniques

Though often used interchangeably, methods and techniques are not the same. By definition, methods are the process by which techniques are applied, and techniques are the practical application of those methods.

For example, consider driving. Methods include staying in your lane, stopping at a red light, and parking in a spot. Techniques include turning the steering wheel, braking, and pushing the gas pedal.

Data sets: observations and fields

It’s important to understand the basic structure of data tables to comprehend the rest of the article. A data set consists of one far-left column containing observations, then a series of columns containing the fields (aka “traits” or “characteristics”) that describe each observation. For example, imagine we want a data table for fruit. It might look like this (the values are illustrative):

observation | color  | weight (g)
apple       | red    | 150
banana      | yellow | 120
orange      | orange | 130

Now let’s turn to types, methods, and techniques. Each heading below consists of a description, relative importance, the nature of data it explores, and the motivation for using it.

Quantitative Analysis

  • It accounts for more than 50% of all data analysis and is by far the most widespread and well-known type of data analysis.
  • As you have seen, it holds descriptive, diagnostic, predictive, and prescriptive methods, which in turn hold some of the most important techniques available today, such as clustering and forecasting.
  • It can be broken down into mathematical and AI analysis.
  • Importance : Very high . Quantitative analysis is a must for anyone interested in becoming or improving as a data analyst.
  • Nature of Data: data treated under quantitative analysis is, quite simply, quantitative. It encompasses all numeric data.
  • Motive: to extract insights. (Note: we’re at the top of the pyramid, this gets more insightful as we move down.)

Qualitative Analysis

  • It accounts for less than 30% of all data analysis and is common in social sciences .
  • It can refer to the simple recognition of qualitative elements, which is not analytic in any way, but most often refers to methods that assign numeric values to non-numeric data for analysis.
  • Because of this, some argue that it’s ultimately a quantitative type.
  • Importance: Medium. In general, knowing qualitative data analysis is not common or even necessary for corporate roles. However, for researchers working in social sciences, its importance is very high .
  • Nature of Data: data treated under qualitative analysis is non-numeric. However, as part of the analysis, analysts turn non-numeric data into numbers, at which point many argue it is no longer qualitative analysis.
  • Motive: to extract insights. (This will be more important as we move down the pyramid.)

Mathematical Analysis

  • Description: mathematical data analysis is a subtype of quantitative data analysis that designates methods and techniques based on statistics, algebra, and logical reasoning to extract insights. It stands in opposition to artificial intelligence analysis.
  • Importance: Very High. The most widespread methods and techniques fall under mathematical analysis. In fact, it’s so common that many people use “quantitative” and “mathematical” analysis interchangeably.
  • Nature of Data: numeric. By definition, all data under mathematical analysis are numbers.
  • Motive: to extract measurable insights that can be used to act upon.

Artificial Intelligence & Machine Learning Analysis

  • Description: artificial intelligence and machine learning analyses designate techniques based on the titular skills. They are not traditionally mathematical, but they are quantitative since they use numbers. Applications of AI & ML analysis techniques are developing, but they’re not yet mainstream across the field.
  • Importance: Medium . As of today (September 2020), you don’t need to be fluent in AI & ML data analysis to be a great analyst. BUT, if it’s a field that interests you, learn it. Many believe that in 10 years’ time its importance will be very high .
  • Nature of Data: numeric.
  • Motive: to create calculations that build on themselves in order to extract insights without direct input from a human.

Descriptive Analysis

  • Description: descriptive analysis is a subtype of mathematical data analysis that uses methods and techniques to provide information about the size, dispersion, groupings, and behavior of data sets. This may sound complicated, but just think about mean, median, and mode: all three are types of descriptive analysis. They provide information about the data set. We’ll look at specific techniques below.
  • Importance: Very high. Descriptive analysis is among the most commonly used data analyses in both corporations and research today.
  • Nature of Data: the nature of data under descriptive statistics is sets. A set is simply a collection of numbers that behaves in predictable ways. Data reflects real life, and there are patterns everywhere to be found. Descriptive analysis describes those patterns.
  • Motive: the motive behind descriptive analysis is to understand how numbers in a set group together, how far apart they are from each other, and how often they occur. As with most statistical analysis, the more data points there are, the easier it is to describe the set.

Diagnostic Analysis

  • Description: diagnostic analysis answers the question “why did it happen?” It is an advanced type of mathematical data analysis that manipulates multiple techniques, but does not own any single one. Analysts engage in diagnostic analysis when they try to explain why.
  • Importance: Very high. Diagnostics are probably the most important type of data analysis for people who don’t do analysis because they’re valuable to anyone who’s curious. They’re most common in corporations, as managers often only want to know the “why.”
  • Nature of Data : data under diagnostic analysis are data sets. These sets in themselves are not enough under diagnostic analysis. Instead, the analyst must know what’s behind the numbers in order to explain “why.” That’s what makes diagnostics so challenging yet so valuable.
  • Motive: the motive behind diagnostics is to diagnose — to understand why.

Predictive Analysis

  • Description: predictive analysis uses past data to project future data. It’s very often one of the first kinds of analysis new researchers and corporate analysts use because it is intuitive. It is a subtype of the mathematical type of data analysis, and its three notable techniques are regression, moving average, and exponential smoothing (a sketch of the last appears after this list).
  • Importance: Very high. Predictive analysis is critical for any data analyst working in a corporate environment. Companies always want to know what the future will hold — especially for their revenue.
  • Nature of Data: Because past and future imply time, predictive data always includes an element of time. Whether it’s minutes, hours, days, months, or years, we call this time series data . In fact, this data is so important that I’ll mention it twice so you don’t forget: predictive analysis uses time series data .
  • Motive: the motive for investigating time series data with predictive analysis is to predict the future in the most analytical way possible.

Prescriptive Analysis

  • Description: prescriptive analysis is a subtype of mathematical analysis that answers the question “what will happen if we do X?” It’s largely underestimated in the data analysis world because it requires diagnostic and descriptive analyses to be done before it even starts. More than simple predictive analysis, prescriptive analysis builds entire data models to show how a simple change could impact the ensemble.
  • Importance: High. Prescriptive analysis is most common under the finance function in many companies. Financial analysts use it to build financial models of the financial statements that show how the data would change given alternative inputs.
  • Nature of Data: the nature of data in prescriptive analysis is data sets. These data sets contain patterns that respond differently to various inputs. Data that is useful for prescriptive analysis contains correlations between different variables. It’s through these correlations that we establish patterns and prescribe action on this basis. This analysis cannot be performed on data that exists in a vacuum — it must be viewed on the backdrop of the tangibles behind it.
  • Motive: the motive for prescriptive analysis is to establish, with an acceptable degree of certainty, what results we can expect given a certain action. As you might expect, this necessitates that the analyst or researcher be aware of the world behind the data, not just the data itself.

Clustering Method

  • Description: the clustering method groups data points together based on their relative closeness to further explore and treat them based on these groupings. There are two ways to group clusters: intuitively and statistically (e.g., with k-means).
  • Importance: Very high. Though most corporate roles group clusters intuitively based on management criteria, a solid understanding of how to group them mathematically is an excellent descriptive and diagnostic approach to allow for prescriptive analysis thereafter.
  • Nature of Data: the nature of data useful for clustering is sets with 1 or more data fields. While most people are used to looking at only two dimensions (x and y), clustering becomes more accurate the more fields there are.
  • Motive: the motive for clustering is to understand how data sets group and to explore them further based on those groups.

Classification Method

  • Description: the classification method aims to separate and group data points based on common characteristics. This can be done intuitively or statistically.
  • Importance: High. While simple on the surface, classification can become quite complex. It’s very valuable in corporate and research environments, but can feel like it’s not worth the work. A good analyst can execute it quickly to deliver results.
  • Nature of Data: the nature of data useful for classification is data sets. As we will see, it can be used on qualitative data as well as quantitative. This method requires knowledge of the substance behind the data, not just the numbers themselves.
  • Motive: the motive for classification is to group data not based on mathematical relationships (which would be clustering), but by predetermined outputs. This is why it’s less useful for diagnostic analysis, and more useful for prescriptive analysis.

Forecasting Method

  • Description: the forecasting method uses past time series data to forecast the future.
  • Importance: Very high. Forecasting falls under predictive analysis and is arguably the most common and most important method in the corporate world. It is less useful in research, which prefers to understand the known rather than speculate about the future.
  • Nature of Data: data useful for forecasting is time series data, which, as we’ve noted, always includes a variable of time.
  • Motive: the motive for the forecasting method is the same as that of predictive analysis: to confidently estimate future values.

Optimization Method

  • Description: the optimization method maximizes or minimizes values in a set given a set of criteria (see the sketch after this list). It is arguably most common in prescriptive analysis. In mathematical terms, it is maximizing or minimizing a function given certain constraints.
  • Importance: Very high. The idea of optimization applies to more analysis types than any other method. In fact, some argue that it is the fundamental driver behind data analysis. You would use it everywhere in research and in a corporation.
  • Nature of Data: the nature of optimizable data is a data set of at least two points.
  • Motive: the motive behind optimization is to achieve the best result possible given certain conditions.
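
Here’s a minimal sketch of the idea in Python; the cost function and constraint are hypothetical:

```python
# A minimal sketch of optimization: minimize a hypothetical cost function
# subject to the constraint 0 <= x <= 10, via a simple grid search.

def cost(x):
    return (x - 3) ** 2 + 5  # smallest at x = 3

candidates = [i / 100 for i in range(0, 1001)]  # the feasible region
best_x = min(candidates, key=cost)
print(best_x, cost(best_x))  # -> 3.0 5.0
```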

Content Analysis Method

  • Description: content analysis is a method of qualitative analysis that quantifies textual data to track themes across a document. It’s most common in academic fields and in social sciences, where written content is the subject of inquiry.
  • Importance: High. In a corporate setting, content analysis as such is less common. If anything, Naïve Bayes (a technique we’ll look at below) is the closest corporations come to analyzing text. However, it is of the utmost importance for researchers. If you’re a researcher, check out this article on content analysis.
  • Nature of Data: data useful for content analysis is textual data.
  • Motive: the motive behind content analysis is to understand themes expressed in a large text

Narrative Analysis Method

  • Description: narrative analysis is a method of qualitative analysis that quantifies stories to trace themes in them. It differs from content analysis because it focuses on stories rather than research documents, and the techniques used are slightly different from those in content analysis (the differences are nuanced and outside the scope of this article).
  • Importance: Low. Unless you are highly specialized in working with stories, narrative analysis is rare.
  • Nature of Data: the nature of the data useful for the narrative analysis method is narrative text.
  • Motive: the motive for narrative analysis is to uncover hidden patterns in narrative text.

Discourse Analysis Method

  • Description: the discourse analysis method falls under qualitative analysis and uses thematic coding to trace patterns in real-life discourse. That said, real-life discourse is oral, so it must first be transcribed into text.
  • Importance: Low. Unless you are focused on understanding real-world idea sharing in a research setting, this kind of analysis is less common than the others on this list.
  • Nature of Data: the nature of data useful in discourse analysis is first audio files, then transcriptions of those audio files.
  • Motive: the motive behind discourse analysis is to trace patterns of real-world discussions. (As a spooky sidenote, have you ever felt like your phone microphone was listening to you and making reading suggestions? If it was, the method was discourse analysis.)

Framework Analysis Method

  • Description: the framework analysis method falls under qualitative analysis and uses similar thematic coding techniques to content analysis. However, where content analysis aims to discover themes, framework analysis starts with a framework and only considers elements that fall in its purview.
  • Importance: Low. As with the other textual analysis methods, framework analysis is less common in corporate settings. Even in the world of research, only some use it. Strangely, it’s very common for legislative and political research.
  • Nature of Data: the nature of data useful for framework analysis is textual.
  • Motive: the motive behind framework analysis is to understand what themes and parts of a text match your search criteria.

Grounded Theory Method

  • Description: the grounded theory method falls under qualitative analysis and uses thematic coding to build theories around those themes.
  • Importance: Low. Like other qualitative analysis techniques, grounded theory is less common in the corporate world. Even among researchers, you would be hard pressed to find many using it. Though powerful, it’s simply too rare to spend time learning.
  • Nature of Data: the nature of data useful in the grounded theory method is textual.
  • Motive: the motive of grounded theory method is to establish a series of theories based on themes uncovered from a text.

Clustering Technique: K-Means

  • Description: k-means is a clustering technique in which data points are grouped in clusters that have the closest means (see the sketch after this list). Though often treated as simple statistics rather than AI or ML, it is an unsupervised learning approach that reevaluates clusters as data points are added. Clustering techniques can be used in diagnostic, descriptive, & prescriptive data analyses.
  • Importance: Very important. If you only take 3 things from this article, k-means clustering should be one of them. It is useful in any situation where n observations have multiple characteristics and we want to put them in groups.
  • Nature of Data: the nature of data is at least one characteristic per observation, but the more the merrier.
  • Motive: the motive for clustering techniques such as k-means is to group observations together and either understand or react to them.
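
Here’s a minimal sketch, assuming scikit-learn is installed; the observations are hypothetical:

```python
# A minimal k-means sketch, assuming scikit-learn is installed.
# The six observations (two characteristics each) are hypothetical.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # which cluster each observation belongs to
print(km.cluster_centers_)  # the mean each cluster is built around
```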

Regression Technique

  • Description: simple and multivariable regressions use either one independent variable or a combination of multiple independent variables to calculate a correlation to a single dependent variable using constants (see the sketch after this list). Regressions are almost synonymous with correlation today.
  • Importance: Very high. Along with clustering, if you only take 3 things from this article, regression techniques should be part of it. They’re everywhere in corporate and research fields alike.
  • Nature of Data: the nature of data used in regressions is data sets with “n” number of observations and as many variables as are reasonable. It’s important, however, to distinguish between regression data and time series data: you cannot use regressions on time series data without accounting for time. The easier way is to use techniques under the forecasting method.
  • Motive: The motive behind regression techniques is to understand correlations between independent variable(s) and a dependent one.
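
Here’s a minimal sketch of a simple regression, assuming NumPy is installed; the data are hypothetical:

```python
# A minimal simple-regression sketch using NumPy; the data are hypothetical.
import numpy as np

x = np.array([1, 2, 3, 4, 5])            # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # dependent variable

slope, intercept = np.polyfit(x, y, 1)   # least-squares line y = slope*x + intercept
r = np.corrcoef(x, y)[0, 1]              # correlation coefficient

print(f"y ~ {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
```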

Naïve Bayes Technique

  • Description: Naïve Bayes is a classification technique that uses simple probability to classify items based on previous classifications. In plain English, the formula would be “the chance that a thing with trait x belongs to class c equals the overall chance of trait x appearing in class c, multiplied by the overall chance of class c, divided by the overall chance of getting trait x.” As a formula, it’s P(c|x) = P(x|c) * P(c) / P(x). A worked sketch follows this list.
  • Importance: High. Naïve Bayes is a very common, simple classification technique because it’s effective with large data sets and it can be applied to any instance in which there is a class. Google, for example, might use it to group webpages for certain search engine queries.
  • Nature of Data: the nature of data for Naïve Bayes is at least one class and at least two traits in a data set.
  • Motive: the motive behind Naïve Bayes is to classify observations based on previous data. It’s thus considered part of predictive analysis.
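
Here’s a minimal worked sketch of the formula in Python; the counts are hypothetical:

```python
# A minimal sketch of the Naïve Bayes formula P(c|x) = P(x|c) * P(c) / P(x),
# using hypothetical counts from 100 previously classified items.

n_total = 100       # items classified so far
n_class_c = 40      # items in class c
n_trait_x = 30      # items showing trait x
n_x_in_c = 20       # items in class c that show trait x

p_c = n_class_c / n_total            # P(c)   = 0.40
p_x = n_trait_x / n_total            # P(x)   = 0.30
p_x_given_c = n_x_in_c / n_class_c   # P(x|c) = 0.50

p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)  # -> 0.666...: a 2/3 chance the item belongs to class c
```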

Cohorts Technique

  • Description: cohorts technique is a type of clustering method used in behavioral sciences to separate users by common traits. As with clustering, it can be done intuitively or mathematically, the latter of which would simply be k-means.
  • Importance: Very high. While it resembles k-means, the cohort technique is more of a high-level counterpart. In fact, most people are familiar with it as a part of Google Analytics. It’s most common in marketing departments in corporations, rather than in research.
  • Nature of Data: the nature of cohort data is data sets in which users are the observation and other fields are used as defining traits for each cohort.
  • Motive: the motive for cohort analysis techniques is to group similar users and analyze how you retain them and how they churn.

Factor Technique

  • Description: the factor analysis technique is a way of grouping many traits into a single factor to expedite analysis. For example, factors can be used as traits for Naïve Bayes classifications instead of more general fields.
  • Importance: High. While not commonly employed in corporations, factor analysis is hugely valuable. Good data analysts use it to simplify their projects and communicate them more clearly.
  • Nature of Data: the nature of data useful in factor analysis techniques is data sets with a large number of fields on its observations.
  • Motive: the motive for using factor analysis techniques is to reduce the number of fields in order to more quickly analyze and communicate findings.

Linear Discriminants Technique

  • Description: linear discriminant analysis techniques are similar to regressions in that they use one or more independent variables to determine a dependent variable; however, the linear discriminant technique falls under a classifier method since it uses traits as independent variables and class as a dependent variable. In this way, it becomes a classifying method AND a predictive method.
  • Importance: High. Though the analyst world speaks of and uses linear discriminants less commonly, it’s a highly valuable technique to keep in mind as you progress in data analysis.
  • Nature of Data: the nature of data useful for the linear discriminant technique is data sets with many fields.
  • Motive: the motive for using linear discriminants is to classify observations that would otherwise be too complex for simple techniques like Naïve Bayes.

Exponential Smoothing Technique

  • Description: exponential smoothing is a technique falling under the forecasting method that uses a smoothing factor on prior data in order to predict future values. It can be linear or adjusted for seasonality. The basic principle behind exponential smoothing is to put a percent weight (a value between 0 and 1 called alpha) on more recent values in a series and a smaller percent weight on less recent values. The formula is f(t) = alpha * (current period value) + (1 − alpha) * (previous period forecast). A sketch follows this list.
  • Importance: High. Most analysts still use the moving average technique (covered next) for forecasting because it’s easy to understand, even though it is less efficient than exponential smoothing. However, good analysts will have exponential smoothing techniques in their pocket to increase the value of their forecasts.
  • Nature of Data: the nature of data useful for exponential smoothing is time series data . Time series data has time as part of its fields .
  • Motive: the motive for exponential smoothing is to forecast future values with a smoothing variable.
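
Here’s a minimal sketch of the formula in Python; the series and alpha are hypothetical:

```python
# A minimal sketch of exponential smoothing; series and alpha are hypothetical.
def exponential_smoothing(series, alpha):
    forecast = series[0]  # seed the forecast with the first observation
    for value in series[1:]:
        # weight the newest value by alpha, the running forecast by (1 - alpha)
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast

sales = [100, 110, 105, 120, 125]
print(exponential_smoothing(sales, alpha=0.5))  # next-period forecast: 118.75
```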

Moving Average Technique

  • Description: the moving average technique falls under the forecasting method and uses an average of recent values to predict future ones (see the sketch after this list). For example, to predict rainfall in April, you would take the average of rainfall from January to March. It’s simple, yet highly effective.
  • Importance: Very high. While I’m personally not a huge fan of moving averages due to their simplistic nature and lack of consideration for seasonality, they’re the most common forecasting technique and therefore very important.
  • Nature of Data: the nature of data useful for moving averages is time series data .
  • Motive: the motive for moving averages is to predict future values in a simple, easy-to-communicate way.
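
Here’s a minimal sketch in Python; the rainfall figures are hypothetical:

```python
# A minimal sketch of a moving-average forecast; the figures are hypothetical.
def moving_average_forecast(values, window=3):
    recent = values[-window:]         # the most recent periods
    return sum(recent) / len(recent)  # their average is the forecast

rainfall = [30, 45, 60]  # January to March, in mm
print(moving_average_forecast(rainfall))  # -> 45.0, the forecast for April
```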

Neural Networks Technique

  • Description: neural networks are a highly complex artificial intelligence technique that replicates a human’s neural analysis through a series of hyper-rapid computations and comparisons that evolve in real time. This technique is so complex that an analyst must use computer programs to perform it.
  • Importance: Medium. While the potential for neural networks is theoretically unlimited, it’s still little understood and therefore uncommon. You do not need to know it by any means in order to be a data analyst.
  • Nature of Data: the nature of data useful for neural networks is data sets of astronomical size, meaning hundreds of thousands of fields and at least as many rows.
  • Motive: the motive for neural networks is to understand wildly complex phenomena and data in order to act on them thereafter.

Decision Tree Technique

  • Description: the decision tree technique uses artificial intelligence algorithms to rapidly calculate possible decision pathways and their outcomes on a real-time basis. It’s so complex that computer programs are needed to perform it.
  • Importance: Medium. As with neural networks, decision trees with AI are too little understood and are therefore uncommon in corporate and research settings alike.
  • Nature of Data: the nature of data useful for the decision tree technique is hierarchical data sets that show multiple optional fields for each preceding field.
  • Motive: the motive for decision tree techniques is to compute the optimal choices to make in order to achieve a desired result.

Evolutionary Programming Technique

  • Description: the evolutionary programming technique uses a series of neural networks, sees how well each one fits a desired outcome, and selects only the best to test and retest. It’s called evolutionary because it resembles the process of natural selection by weeding out weaker options.
  • Importance: Medium. As with the other AI techniques, evolutionary programming just isn’t well-understood enough to be usable in many cases. Its complexity also makes it hard to explain in corporate settings and difficult to defend in research settings.
  • Nature of Data: the nature of data in evolutionary programming is data sets of neural networks, or data sets of data sets.
  • Motive: the motive for using evolutionary programming is similar to decision trees: understanding the best possible option from complex data.

Fuzzy Logic Technique

  • Description: fuzzy logic is a type of computing based on “approximate truths” rather than simple truths such as “true” and “false.” It is essentially two tiers of classification. For example, to say whether “apples are good,” you need to first classify that “good is x, y, z.” Only then can you say apples are good. Another way to see it is as helping a computer evaluate truth like humans do: “definitely true, probably true, maybe true, probably false, definitely false.”
  • Importance: Medium. Like the other AI techniques, fuzzy logic is uncommon in both research and corporate settings, which means it’s less important in today’s world.
  • Nature of Data: the nature of fuzzy logic data is huge data tables that include other huge data tables with a hierarchy including multiple subfields for each preceding field.
  • Motive: the motive of fuzzy logic is to replicate human truth valuations in a computer in order to model human decisions based on past data. The obvious possible application is marketing.

Text Analysis Technique

  • Description: text analysis techniques fall under the qualitative data analysis type and use text to extract insights.
  • Importance: Medium. Text analysis techniques, like the qualitative analysis type as a whole, are most valuable for researchers.
  • Nature of Data: the nature of data useful in text analysis is words.
  • Motive: the motive for text analysis is to trace themes in a text across sets of very long documents, such as books.

Coding Technique

  • Description: the coding technique is used in textual analysis to turn ideas into uniform phrases and analyze the number of times and the ways in which those ideas appear. For this reason, some consider it a quantitative technique as well. You can learn more about coding and the other qualitative techniques here.
  • Importance: Very high. If you’re a researcher working in social sciences, coding is THE analysis technique, and for good reason. It’s a great way to add rigor to analysis. That said, it’s less common in corporate settings.
  • Nature of Data: the nature of data useful for coding is long text documents.
  • Motive: the motive for coding is to make tracing ideas on paper more than an exercise of the mind, by quantifying them and understanding them through descriptive methods.

Idea Pattern Technique

  • Description: the idea pattern analysis technique fits into coding as the second step of the process. Once themes and ideas are coded, simple descriptive analysis tests may be run. Some people even cluster the ideas!
  • Importance: Very high. If you’re a researcher, idea pattern analysis is as important as the coding itself.
  • Nature of Data: the nature of data useful for idea pattern analysis is already coded themes.
  • Motive: the motive for the idea pattern technique is to trace ideas in otherwise unmanageably-large documents.

Word Frequency Technique

  • Description: word frequency is a qualitative technique that stands in opposition to coding and uses an inductive approach to locate specific words in a document in order to understand its relevance (see the sketch after this list). Word frequency is essentially the descriptive analysis of qualitative data because it uses stats like mean, median, and mode to gather insights.
  • Importance: High. As with the other qualitative approaches, word frequency is very important in social science research, but less so in corporate settings.
  • Nature of Data: the nature of data useful for word frequency is long, informative documents.
  • Motive: the motive for word frequency is to locate target words to determine the relevance of a document in question.
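
Here’s a minimal sketch using Python’s standard library; the text and target words are hypothetical:

```python
# A minimal word-frequency sketch using the standard library;
# the text and target words are hypothetical.
from collections import Counter
import re

text = "Data analysis turns raw data into insight; analysis adds rigor."
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)

for target in ["analysis", "data", "insight"]:
    print(target, freq[target])  # -> analysis 2, data 2, insight 1
```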

Types of data analysis in research

Types of data analysis in research methodology include every item discussed in this article. As a list, they are:

  • Quantitative
  • Qualitative
  • Mathematical
  • Machine Learning and AI
  • Descriptive
  • Prescriptive
  • Classification
  • Forecasting
  • Optimization
  • Grounded theory
  • Artificial Neural Networks
  • Decision Trees
  • Evolutionary Programming
  • Fuzzy Logic
  • Text analysis
  • Idea Pattern Analysis
  • Word Frequency Analysis
  • Naïve Bayes
  • Exponential smoothing
  • Moving average
  • Linear discriminant

Types of data analysis in qualitative research

As a list, the types of data analysis in qualitative research are the following methods:

  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Framework analysis
  • Grounded theory

Types of data analysis in quantitative research

As a list, the types of data analysis in quantitative research are:

  • Descriptive
  • Diagnostic
  • Predictive
  • Prescriptive
  • Artificial intelligence & machine learning

Data analysis methods

As a list, data analysis methods are:

  • Clustering (quantitative)
  • Classification (quantitative)
  • Forecasting (quantitative)
  • Optimization (quantitative)
  • Content (qualitative)
  • Narrative (qualitative)
  • Discourse (qualitative)
  • Framework (qualitative)
  • Grounded theory (qualitative)

Quantitative data analysis methods

As a list, quantitative data analysis methods are:

  • Clustering
  • Classification
  • Forecasting
  • Optimization

Tabular View of Data Analysis Types, Methods, and Techniques

About the author.

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in a growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.


Introduction to systematic review and meta-analysis

1 Department of Anesthesiology and Pain Medicine, Inje University Seoul Paik Hospital, Seoul, Korea

2 Department of Anesthesiology and Pain Medicine, Chung-Ang University College of Medicine, Seoul, Korea

Systematic reviews and meta-analyses present results by combining and analyzing data from different studies conducted on similar research topics. In recent years, systematic reviews and meta-analyses have been actively performed in various fields including anesthesiology. These research methods are powerful tools that can overcome the difficulties in performing large-scale randomized controlled trials. However, the inclusion of studies with any biases or improperly assessed quality of evidence in systematic reviews and meta-analyses could yield misleading results. Therefore, various guidelines have been suggested for conducting systematic reviews and meta-analyses to help standardize them and improve their quality. Nonetheless, accepting the conclusions of many studies without understanding the meta-analysis can be dangerous. Therefore, this article provides an easy introduction to clinicians on performing and understanding meta-analyses.

Introduction

A systematic review collects all possible studies related to a given topic and design, and reviews and analyzes their results [ 1 ]. During the systematic review process, the quality of studies is evaluated, and a statistical meta-analysis of the study results is conducted on the basis of their quality. A meta-analysis is a valid, objective, and scientific method of analyzing and combining different results. Usually, in order to obtain more reliable results, a meta-analysis is mainly conducted on randomized controlled trials (RCTs), which have a high level of evidence [ 2 ] ( Fig. 1 ). Since 1999, various papers have presented guidelines for reporting meta-analyses of RCTs. Following the Quality of Reporting of Meta-analyses (QUORUM) statement [ 3 ], and the appearance of registers such as Cochrane Library’s Methodology Register, a large number of systematic literature reviews have been registered. In 2009, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [ 4 ] was published, and it greatly helped standardize and improve the quality of systematic reviews and meta-analyses [ 5 ].

Fig. 1. Levels of evidence.

In anesthesiology, the importance of systematic reviews and meta-analyses has been highlighted, and they provide diagnostic and therapeutic value to various areas, including not only perioperative management but also intensive care and outpatient anesthesia [6–13]. Systematic reviews and meta-analyses include various topics, such as comparing various treatments of postoperative nausea and vomiting [ 14 , 15 ], comparing general anesthesia and regional anesthesia [ 16 – 18 ], comparing airway maintenance devices [ 8 , 19 ], comparing various methods of postoperative pain control (e.g., patient-controlled analgesia pumps, nerve block, or analgesics) [ 20 – 23 ], comparing the precision of various monitoring instruments [ 7 ], and meta-analysis of dose-response in various drugs [ 12 ].

Thus, literature reviews and meta-analyses are being conducted in diverse medical fields, and the aim of highlighting their importance is to help better extract accurate, good quality data from the flood of data being produced. However, a lack of understanding about systematic reviews and meta-analyses can lead to incorrect outcomes being derived from the review and analysis processes. If readers indiscriminately accept the results of the many meta-analyses that are published, incorrect data may be obtained. Therefore, in this review, we aim to describe the contents and methods used in systematic reviews and meta-analyses in a way that is easy to understand for future authors and readers of systematic review and meta-analysis.

Study Planning

It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical methods on estimates from two or more different studies to form a pooled estimate [ 1 ]. Following a systematic review, if it is not possible to form a pooled estimate, it can be published as is without progressing to a meta-analysis; however, if it is possible to form a pooled estimate from the extracted data, a meta-analysis can be attempted. Systematic reviews and meta-analyses usually proceed according to the flowchart presented in Fig. 2 . We explain each of the stages below.

Fig. 2. Flowchart illustrating a systematic review.

Formulating research questions

A systematic review attempts to gather all available empirical research by using clearly defined, systematic methods to obtain answers to a specific question. A meta-analysis is the statistical process of analyzing and combining results from several similar studies. Here, the definition of the word “similar” is not made clear, but when selecting a topic for the meta-analysis, it is essential to ensure that the different studies present data that can be combined. If the studies contain data on the same topic that can be combined, a meta-analysis can even be performed using data from only two studies. However, study selection via a systematic review is a precondition for performing a meta-analysis, and it is important to clearly define the Population, Intervention, Comparison, Outcomes (PICO) parameters that are central to evidence-based research. In addition, selection of the research topic is based on logical evidence, and it is important to select a topic that is familiar to readers but for which the evidence is not yet clearly confirmed [ 24 ].

Protocols and registration

In systematic reviews, prior registration of a detailed research plan is very important. In order to make the research process transparent, primary/secondary outcomes and methods are set in advance, and in the event of changes to the method, other researchers and readers are informed when, how, and why. Many studies are registered with an organization like PROSPERO ( http://www.crd.york.ac.uk/PROSPERO/ ), and the registration number is recorded when reporting the study, in order to share the protocol at the time of planning.

Defining inclusion and exclusion criteria

Information is included on the study design, patient characteristics, publication status (published or unpublished), language used, and research period. If there is a discrepancy between the number of patients included in the study and the number of patients included in the analysis, this needs to be clearly explained while describing the patient characteristics, to avoid confusing the reader.

Literature search and study selection

In order to secure a proper basis for evidence-based research, it is essential to perform a broad search that includes as many studies as possible that meet the inclusion and exclusion criteria. Typically, the three bibliographic databases Medline, Embase, and Cochrane Central Register of Controlled Trials (CENTRAL) are used. In domestic studies, the Korean databases KoreaMed, KMBASE, and RISS4U may be included. Effort is required to identify not only published studies but also abstracts, ongoing studies, and studies awaiting publication. Among the studies retrieved in the search, the researchers remove duplicate studies, select studies that meet the inclusion/exclusion criteria based on the abstracts, and then make the final selection of studies based on their full text. In order to maintain transparency and objectivity throughout this process, study selection is conducted independently by at least two investigators. When there is an inconsistency in opinions, it is resolved through debate or by a third reviewer. The methods for this process also need to be planned in advance. It is essential to ensure the reproducibility of the literature selection process [ 25 ].

Quality of evidence

However well planned the systematic review or meta-analysis is, if the quality of evidence in the studies is low, the quality of the meta-analysis decreases and incorrect results can be obtained [ 26 ]. Even when using randomized studies with a high quality of evidence, evaluating the quality of evidence precisely helps determine the strength of recommendations in the meta-analysis. One method of evaluating the quality of evidence in non-randomized studies is the Newcastle-Ottawa Scale, provided by the Ottawa Hospital Research Institute 1) . However, we are mostly focusing on meta-analyses that use randomized studies.

If the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) system ( http://www.gradeworkinggroup.org/ ) is used, the quality of evidence is evaluated on the basis of the study limitations, inaccuracies, incompleteness of outcome data, indirectness of evidence, and risk of publication bias, and this is used to determine the strength of recommendations [ 27 ]. As shown in Table 1 , the study limitations are evaluated using the “risk of bias” method proposed by Cochrane 2) . This method classifies bias in randomized studies as “low,” “high,” or “unclear” on the basis of the presence or absence of six processes (random sequence generation, allocation concealment, blinding participants or investigators, incomplete outcome data, selective reporting, and other biases) [ 28 ].

The Cochrane Collaboration’s Tool for Assessing the Risk of Bias [ 28 ]

Data extraction

Two different investigators extract data based on the objectives and form of the study; thereafter, the extracted data are reviewed. Since the size and format of each variable are different, the size and format of the outcomes are also different, and slight changes may be required when combining the data [ 29 ]. If there are differences in the size and format of the outcome variables that cause difficulties combining the data, such as the use of different evaluation instruments or different evaluation timepoints, the analysis may be limited to a systematic review. The investigators resolve differences of opinion by debate, and if they fail to reach a consensus, a third reviewer is consulted.

Data Analysis

The aim of a meta-analysis is to derive a conclusion with greater power and accuracy than could be achieved in any individual study. Therefore, before analysis, it is crucial to evaluate the direction of effect, size of effect, homogeneity of effects among studies, and strength of evidence [ 30 ]. Thereafter, the data are reviewed qualitatively and quantitatively. If it is determined that the different research outcomes cannot be combined, all the results and characteristics of the individual studies are displayed in a table or in a descriptive form; this is referred to as a qualitative review. A meta-analysis is a quantitative review, in which the clinical effectiveness is evaluated by calculating the weighted pooled estimate for the interventions in at least two separate studies.

The pooled estimate is the outcome of the meta-analysis, and is typically explained using a forest plot (Figs. 3 and 4). The black squares in the forest plot are the odds ratios (ORs) and 95% confidence intervals in each study. The area of the squares represents the weight reflected in the meta-analysis. The black diamond represents the OR and 95% confidence interval calculated across all the included studies. The bold vertical line represents a lack of therapeutic effect (OR = 1); if the confidence interval includes OR = 1, it means no significant difference was found between the treatment and control groups.

Fig. 3. Forest plot analyzed by two different models using the same data. (A) Fixed-effect model. (B) Random-effect model. The figure depicts individual trials as filled squares with the relative sample size and the solid line as the 95% confidence interval of the difference. The diamond shape indicates the pooled estimate and uncertainty for the combined effect. The vertical line indicates the treatment group shows no effect (OR = 1). Moreover, if the confidence interval includes 1, then the result shows no evidence of difference between the treatment and control groups.

Fig. 4. Forest plot representing homogeneous data.

Dichotomous variables and continuous variables

In data analysis, outcome variables can be considered broadly in terms of dichotomous variables and continuous variables. When combining data from continuous variables, the mean difference (MD) and standardized mean difference (SMD) are used ( Table 2 ).

Summary of Meta-analysis Methods Available in RevMan [ 28 ]

The MD is the absolute difference in mean values between the groups, and the SMD is the mean difference between groups divided by the standard deviation. When results are presented in the same units, the MD can be used, but when results are presented in different units, the SMD should be used. When the MD is used, the combined units must be shown. A value of “0” for the MD or SMD indicates that the effects of the new treatment method and the existing treatment method are the same. A value lower than “0” means the new treatment method is less effective than the existing method, and a value greater than “0” means the new treatment is more effective than the existing method.
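
For illustration, the following minimal Python sketch computes the MD and SMD from hypothetical group summaries.

```python
# A minimal sketch of the MD and SMD described above; the summaries are hypothetical.
mean_treatment = 72.0
mean_control = 65.0
pooled_sd = 10.0  # pooled standard deviation of the two groups

md = mean_treatment - mean_control  # 7.0, in the outcome's own units
smd = md / pooled_sd                # 0.7, unit-free and comparable across scales

print(md, smd)
```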

When combining data for dichotomous variables, the OR, risk ratio (RR), or risk difference (RD) can be used. The RR and RD can be used for RCTs, quasi-experimental studies, or cohort studies, and the OR can be used for other case-control studies or cross-sectional studies. However, because the OR is difficult to interpret, using the RR and RD, if possible, is recommended. If the outcome variable is a dichotomous variable, it can be presented as the number needed to treat (NNT), which is the minimum number of patients who need to be treated in the intervention group, compared to the control group, for a given event to occur in at least one patient. Based on Table 3 , in an RCT, if x is the probability of the event occurring in the control group and y is the probability of the event occurring in the intervention group, then x = c/(c + d), y = a/(a + b), and the absolute risk reduction (ARR) = x − y. NNT can be obtained as the reciprocal, 1/ARR.

Calculation of the Number Needed to Treat in the Dichotomous table
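
For illustration, the following minimal Python sketch performs the NNT calculation described above, using hypothetical counts.

```python
# A minimal sketch of the NNT calculation described above, with hypothetical counts.
# a, b: intervention group with / without the event; c, d: control group.
a, b, c, d = 10, 90, 25, 75

x = c / (c + d)   # event probability in the control group      -> 0.25
y = a / (a + b)   # event probability in the intervention group -> 0.10
arr = x - y       # absolute risk reduction                     -> 0.15
nnt = 1 / arr     # number needed to treat                      -> ~6.7, round up to 7

print(arr, nnt)
```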

Fixed-effect models and random-effect models

In order to analyze effect size, two types of models can be used: a fixed-effect model or a random-effect model. A fixed-effect model assumes that the effect of treatment is the same, and that variation between results in different studies is due to random error. Thus, a fixed-effect model can be used when the studies are considered to have the same design and methodology, or when the variability in results within a study is small, and the variance is thought to be due to random error. Three common methods are used for weighted estimation in a fixed-effect model: 1) inverse variance-weighted estimation 3) , 2) Mantel-Haenszel estimation 4) , and 3) Peto estimation 5) .

A random-effect model assumes heterogeneity between the studies being combined, and these models are used when the studies are assumed different, even if a heterogeneity test does not show a significant result. Unlike a fixed-effect model, a random-effect model assumes that the size of the effect of treatment differs among studies. Thus, differences in variation among studies are thought to be due to not only random error but also between-study variability in results. Therefore, weight does not decrease greatly for studies with a small number of patients. Among methods for weighted estimation in a random-effect model, the DerSimonian and Laird method 6) is mostly used for dichotomous variables, as the simplest method, while inverse variance-weighted estimation is used for continuous variables, as with fixed-effect models. These four methods are all used in Review Manager software (The Cochrane Collaboration, UK), and are described in a study by Deeks et al. [ 31 ] ( Table 2 ). However, when the number of studies included in the analysis is less than 10, the Hartung-Knapp-Sidik-Jonkman method 7) can better reduce the risk of type 1 error than does the DerSimonian and Laird method [ 32 ].
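
For illustration, the following minimal Python sketch shows inverse variance-weighted estimation under a fixed-effect model; the per-study estimates and variances are hypothetical.

```python
# A minimal sketch of inverse variance-weighted estimation (fixed-effect model);
# the per-study effect estimates and variances are hypothetical.
effects = [0.30, 0.45, 0.25]
variances = [0.04, 0.09, 0.02]

weights = [1 / v for v in variances]                  # precise studies weigh more
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5                 # standard error of the pooled estimate

print(round(pooled, 3), round(pooled_se, 3))  # -> 0.29 0.108
```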

Fig. 3 shows the results of analyzing outcome data using a fixed-effect model (A) and a random-effect model (B). As shown in Fig. 3 , while the results from large studies are weighted more heavily in the fixed-effect model, studies are given relatively similar weights irrespective of study size in the random-effect model. Although identical data were being analyzed, as shown in Fig. 3 , the significant result in the fixed-effect model was no longer significant in the random-effect model. One representative example of the small study effect in a random-effect model is the meta-analysis by Li et al. [ 33 ]. In a large-scale study, intravenous injection of magnesium was unrelated to acute myocardial infarction, but in the random-effect model, which included numerous small studies, the small study effect resulted in an association being found between intravenous injection of magnesium and myocardial infarction. This small study effect can be controlled for by using a sensitivity analysis, which is performed to examine the contribution of each of the included studies to the final meta-analysis result. In particular, when heterogeneity is suspected in the study methods or results, by changing certain data or analytical methods, this method makes it possible to verify whether the changes affect the robustness of the results, and to examine the causes of such effects [ 34 ].

Heterogeneity

A homogeneity test examines whether the variation in effect sizes calculated from several studies is greater than would be expected from sampling error alone. This makes it possible to test whether the effect sizes calculated from several studies are the same. Three types of homogeneity tests can be used: 1) forest plot, 2) Cochran’s Q test (chi-squared), and 3) Higgins’ I² statistic. In the forest plot, as shown in Fig. 4 , greater overlap between the confidence intervals indicates greater homogeneity. For the Q statistic, when the P value of the chi-squared test, calculated from the forest plot in Fig. 4 , is less than 0.1, it is considered to show statistical heterogeneity and a random-effect model can be used. Finally, I² can be used [ 35 ].

I², calculated as I² = 100% × (Q − df)/Q, where df is the number of studies minus one, returns a value between 0% and 100%. A value less than 25% is considered to show strong homogeneity, a value of 50% is average, and a value greater than 75% indicates strong heterogeneity.
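
For illustration, the following minimal Python sketch computes I² from hypothetical values of Q and the number of studies.

```python
# A minimal sketch of the I^2 statistic from Cochran's Q; Q and the study count
# are hypothetical.
def i_squared(q, n_studies):
    df = n_studies - 1
    return max(0.0, (q - df) / q) * 100  # as a percentage, floored at zero

print(i_squared(q=12.0, n_studies=6))  # -> 58.33...%: moderate heterogeneity
```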

Even when the data cannot be shown to be homogeneous, a fixed-effect model can be used, ignoring the heterogeneity, and all the study results can be presented individually, without combining them. However, in many cases, a random-effect model is applied, as described above, and a subgroup analysis or meta-regression analysis is performed to explain the heterogeneity. In a subgroup analysis, the data are divided into subgroups that are expected to be homogeneous, and these subgroups are analyzed. This needs to be planned in the predetermined protocol before starting the meta-analysis. A meta-regression analysis is similar to a normal regression analysis, except that the heterogeneity between studies is modeled. This process involves performing a regression analysis of the pooled estimate for covariance at the study level, and so it is usually not considered when the number of studies is less than 10. Here, univariate and multivariate regression analyses can both be considered.

Publication bias

Publication bias is the most common type of reporting bias in meta-analyses. This refers to the distortion of meta-analysis outcomes due to the higher likelihood of publication of statistically significant studies rather than non-significant studies. In order to test the presence or absence of publication bias, first, a funnel plot can be used ( Fig. 5 ). Studies are plotted on a scatter plot with effect size on the x-axis and precision or total sample size on the y-axis. If the points form an upside-down funnel shape, with a broad base that narrows towards the top of the plot, this indicates the absence of a publication bias ( Fig. 5A ) [ 29 , 36 ]. On the other hand, if the plot shows an asymmetric shape, with no points on one side of the graph, then publication bias can be suspected ( Fig. 5B ). Second, to test publication bias statistically, Begg and Mazumdar’s rank correlation test 8) [ 37 ] or Egger’s test 9) [ 29 ] can be used. If publication bias is detected, the trim-and-fill method 10) can be used to correct the bias [ 38 ]. Fig. 6 displays results that show publication bias in Egger’s test, which has then been corrected using the trim-and-fill method using Comprehensive Meta-Analysis software (Biostat, USA).

Fig. 5. Funnel plot showing the effect size on the x-axis and sample size on the y-axis as a scatter plot. (A) Funnel plot without publication bias. The individual plots are broader at the bottom and narrower at the top. (B) Funnel plot with publication bias. The individual plots are located asymmetrically.

Fig. 6. Funnel plot adjusted using the trim-and-fill method. White circles: comparisons included. Black circles: imputed comparisons using the trim-and-fill method. White diamond: pooled observed log risk ratio. Black diamond: pooled imputed log risk ratio.

Result Presentation

When reporting the results of a systematic review or meta-analysis, the analytical content and methods should be described in detail. First, a flowchart is displayed with the literature search and selection process according to the inclusion/exclusion criteria. Second, a table is shown with the characteristics of the included studies. A table should also be included with information related to the quality of evidence, such as GRADE ( Table 4 ). Third, the results of data analysis are shown in a forest plot and funnel plot. Fourth, if the results use dichotomous data, the NNT values can be reported, as described above.

The GRADE Evidence Quality for Each Outcome

N: number of studies, ROB: risk of bias, PON: postoperative nausea, POV: postoperative vomiting, PONV: postoperative nausea and vomiting, CI: confidence interval, RR: risk ratio, AR: absolute risk.

When Review Manager software (The Cochrane Collaboration, UK) is used for the analysis, two types of P values are given. The first is the P value from the z-test, which tests the null hypothesis that the intervention has no effect. The second P value is from the chi-squared test, which tests the null hypothesis for a lack of heterogeneity. The statistical result for the intervention effect, which is generally considered the most important result in meta-analyses, is the z-test P value.

A common mistake when reporting results is, given a z-test P value greater than 0.05, to say there was “no statistical significance” or “no difference.” When evaluating statistical significance in a meta-analysis, a P value lower than 0.05 can be explained as “a significant difference in the effects of the two treatment methods.” However, the P value may appear non-significant whether or not there is a difference between the two treatment methods. In such a situation, it is better to state “there was no strong evidence for an effect,” and to present the P value and confidence intervals. Another common mistake is to think that a smaller P value is indicative of a more significant effect. In meta-analyses of large-scale studies, the P value is affected more by the number of studies and patients included than by the size of the effect; therefore, care should be taken when interpreting the results of a meta-analysis.

When performing a systematic literature review or meta-analysis, if the quality of studies is not properly evaluated or if proper methodology is not strictly applied, the results can be biased and the outcomes can be incorrect. However, when systematic reviews and meta-analyses are properly implemented, they can yield powerful results that could usually only be achieved using large-scale RCTs, which are difficult to perform in individual studies. As our understanding of evidence-based medicine increases and its importance is better appreciated, the number of systematic reviews and meta-analyses will keep increasing. However, indiscriminate acceptance of the results of all these meta-analyses can be dangerous, and hence, we recommend that their results be received critically on the basis of a more accurate understanding.

1) http://www.ohri.ca .

2) http://methods.cochrane.org/bias/assessing-risk-bias-included-studies .

3) The inverse variance-weighted estimation method is useful if the number of studies is small with large sample sizes.

4) The Mantel-Haenszel estimation method is useful if the number of studies is large with small sample sizes.

5) The Peto estimation method is useful if the event rate is low or one of the two groups shows zero incidence.

6) The most popular and simplest statistical method used in Review Manager and Comprehensive Meta-analysis software.

7) An alternative random-effect model meta-analysis that has more adequate error rates than the common DerSimonian and Laird method, especially when the number of studies is small. However, even with the Hartung-Knapp-Sidik-Jonkman method, when there are fewer than five studies of very unequal sizes, extra caution is needed.

8) The Begg and Mazumdar rank correlation test uses the correlation between the ranks of effect sizes and the ranks of their variances [ 37 ].

9) The degree of funnel plot asymmetry as measured by the intercept from the regression of standard normal deviates against precision [ 29 ].

10) If there are more small studies on one side, we expect the suppression of studies on the other side. Trimming yields the adjusted effect size and reduces the variance of the effects by adding the original studies back into the analysis as a mirror image of each study.


Types of Research – Explained with Examples

Types of Research

Research is about using established methods to investigate a problem or question in detail with the aim of generating new knowledge about it.

It is a vital tool for scientific advancement because it allows researchers to prove or refute hypotheses based on clearly defined parameters, environments and assumptions. Due to this, it enables us to confidently contribute to knowledge as it allows research to be verified and replicated.

Knowing the types of research and what each of them focuses on will allow you to better plan your project, utilise the most appropriate methodologies and techniques, and better communicate your findings to other researchers and supervisors.

Classification of Types of Research

There are various types of research that are classified according to their objective, depth of study, analysed data, time required to study the phenomenon and other factors. It’s important to note that a research project will not be limited to one type of research, but will likely use several.

According to its Purpose

Theoretical Research

Theoretical research, also referred to as pure or basic research, focuses on generating knowledge, regardless of its practical application. Here, data collection is used to generate new general concepts for a better understanding of a particular field or to answer a theoretical research question.

Results of this kind are usually oriented towards the formulation of theories and are usually based on documentary analysis, the development of mathematical formulas and the reflection of high-level researchers.

Applied Research

Here, the goal is to find strategies that can be used to address a specific research problem. Applied research draws on theory to generate practical scientific knowledge, and its use is very common in STEM fields such as engineering, computer science and medicine.

This type of research is subdivided into two types:

  • Technological applied research : looks towards improving efficiency in a particular productive sector through the improvement of processes or machinery related to said productive processes.
  • Scientific applied research : has predictive purposes. Through this type of research design, we can measure certain variables to predict behaviours useful to the goods and services sector, such as consumption patterns and viability of commercial projects.


According to your Depth of Scope

Exploratory Research

Exploratory research is used for the preliminary investigation of a subject that is not yet well understood or sufficiently researched. It serves to establish a frame of reference and a hypothesis from which an in-depth study can be developed that will enable conclusive results to be generated.

Because exploratory research is based on the study of little-studied phenomena, it relies less on theory and more on the collection of data to identify patterns that explain these phenomena.

Descriptive Research

The primary objective of descriptive research is to define the characteristics of a particular phenomenon without necessarily investigating the causes that produce it.

In this type of research, the researcher must take particular care not to intervene in the observed object or phenomenon, as its behaviour may change if an external factor is involved.

Explanatory Research

Explanatory research is the most common type of research method and is responsible for establishing cause-and-effect relationships that allow generalisations to be extended to similar realities. It is closely related to descriptive research, although it provides additional information about the observed object and its interactions with the environment.

Correlational Research

The purpose of this type of scientific research is to identify the relationship between two or more variables. A correlational study aims to determine how much the other elements of an observed system change when one variable changes.

According to the Type of Data Used

Qualitative Research

Qualitative research is often used in the social sciences to collect, compare and interpret information. It has a linguistic-semiotic basis and is used in techniques such as discourse analysis, interviews, surveys, records and participant observations.

In order to use statistical methods to validate their results, the observations collected must be evaluated numerically. Qualitative research, however, tends to be subjective, since not all data can be fully controlled. Therefore, this type of research design is better suited to extracting meaning from an event or phenomenon (the ‘why’) than its cause (the ‘how’).

Quantitative Research

Quantitative research delves into a phenomenon through quantitative data collection, using mathematical, statistical and computer-aided tools to measure it. This allows generalised conclusions to be projected over time.


According to the Degree of Manipulation of Variables

Experimental Research

It is about designing or replicating a phenomenon whose variables are manipulated under strictly controlled conditions in order to identify or discover their effect on a dependent variable or object. The phenomenon to be studied is measured through study and control groups, according to the guidelines of the scientific method.

Non-Experimental Research

Also known as an observational study, it focuses on the analysis of a phenomenon in its natural context. As such, the researcher does not intervene directly, but limits their involvement to measuring the variables required for the study. Due to its observational nature, it is often used in descriptive research.

Quasi-Experimental Research

It controls only some variables of the phenomenon under investigation and is therefore not entirely experimental. In this case, the study and control groups cannot be randomly selected, but are chosen from existing groups or populations. This is to ensure the collected data is relevant and that the knowledge, perspectives and opinions of the population can be incorporated into the study.

According to the Type of Inference

Deductive Investigation

In this type of research, reality is explained by general laws that point to certain conclusions; the conclusions are expected to be part of the premise of the research problem and are considered correct if the premise is valid and the deductive method is applied correctly.

Inductive Research

In this type of research, knowledge is generated from an observation to achieve a generalisation. It is based on the collection of specific data to develop new theories.

Hypothetical-Deductive Investigation

It is based on observing reality to form a hypothesis, then using deduction to obtain a conclusion, and finally verifying or rejecting that conclusion through experience.


According to the Time in Which it is Carried Out

Longitudinal Study (also referred to as Diachronic Research)

It is the monitoring of the same event, individual or group over a defined period of time. It aims to track changes in a number of variables and see how they evolve over time. It is often used in medical, psychological and social areas.

Cross-Sectional Study (also referred to as Synchronous Research)

Cross-sectional research design is used to observe phenomena, an individual or a group of research subjects at a given time.

According to The Sources of Information

Primary Research

This fundamental research type is defined by the fact that the data is collected directly from the source, that is, it consists of primary, first-hand information.

Secondary research

Unlike primary research, secondary research is developed with information from secondary sources, which are generally based on scientific literature and other documents compiled by another researcher.


According to How the Data is Obtained

Documentary (Desk) Research

Documentary research, based on secondary sources, consists of a systematic review of existing sources of information on a particular subject. This type of scientific research is commonly used when undertaking literature reviews or producing a case study.

Field Research

Field research involves the direct collection of information at the location where the observed phenomenon occurs.

Laboratory Research

Laboratory research is carried out in a controlled environment in order to isolate a dependent variable and establish its relationship with other variables through scientific methods.

Mixed-Method: Documentary, Field and/or Laboratory

Mixed research methodologies combine results from both secondary (documentary) sources and primary sources through field or laboratory research.


8 Types of Analysis in Research


Data analysis is the detailed process of analyzing, cleaning, transforming and presenting useful information with the goal of forming conclusions and supporting decision-making. Data can be analyzed through multiple approaches across multiple domains, and it is essential for every business today to analyze the data it obtains from various sources.

Data analysis is useful in drawing conclusions about the variables present in the research. The approach to analysis, however, depends on the research that is being carried out. Without data analysis, it is difficult to determine the relationships between variables that lead to a meaningful conclusion. Thus, data analysis is an important tool for arriving at a particular conclusion.

Data can be analyzed in various ways. Following are a few methods by which data can be analyzed:


1) Exploratory Data Analysis (EDA)

It is one of the types of analysis in research used to analyze data and establish relationships that were previously unknown. It is specifically used to discover new connections and to define future studies or answer questions pertaining to them.

The answers provided by exploratory analysis are not definitive, but they offer a first glimpse of what is coming. Analyzing data sets with visual methods is the most common technique for EDA. Exploratory data analysis was championed by John Tukey, who began defining the approach in 1961.

Graphical techniques of representation are used primarily in exploratory data analysis; the most-used graphical techniques are the histogram, Pareto chart, stem-and-leaf plot, scatter plot and box plot. The drawback of exploratory analysis is that it cannot be used for generalizing or predicting precisely about upcoming events: the data provides correlations, which do not imply causation. Exploratory data analysis can be applied to census data as well as convenience-sample data sets.

Software and machine-aided tools have become very common in EDA. A few of them are Data Applied, GGobi, JMP, KNIME and Python.
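To make this concrete, here is a minimal EDA sketch in Python using pandas and matplotlib; the file name and column names ("sales.csv", "revenue", "region", "ad_spend") are hypothetical placeholders, not taken from any particular study.

```python
# Minimal EDA sketch; "sales.csv" and its columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")              # load the data set
print(df.head())                           # inspect the first few rows
print(df.describe())                       # summary statistics for numeric columns

df["revenue"].hist(bins=30)                # histogram: shape of one variable
plt.title("Distribution of revenue")
plt.show()

df.boxplot(column="revenue", by="region")  # box plot: compare groups
plt.show()

df.plot.scatter(x="ad_spend", y="revenue")  # scatter plot: possible relationship
plt.show()
```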

2) Descriptive data analysis

This method requires the least amount of effort among all the methods of data analysis. It describes the main features of a collection of data quantitatively and is usually the initial kind of analysis performed on an available data set. Descriptive data analysis is often applied to large volumes of data, such as census data, and has distinct steps for description and interpretation. There are two methods of descriptive statistical analysis, univariate and bivariate; both are types of analysis in research.

A) Univariate descriptive data analysis

The analysis which involves the distribution of a single variable is called univariate analysis.

B)  Bivariate and multivariate analysis

When the data analysis involves a description of the distribution of more than one variable, it is termed bivariate or multivariate analysis. Descriptive statistics, in such cases, may be used to describe the relationship between the pairs of variables.
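As a quick illustration, the sketch below computes univariate summary statistics for one variable and a bivariate correlation between two; the sample values are invented for the example.

```python
# Univariate and bivariate descriptive statistics (invented sample values).
import pandas as pd

scores = pd.Series([72, 85, 90, 66, 78, 95, 81])   # one variable
hours = pd.Series([2, 5, 6, 1, 4, 7, 5])           # a second variable

# Univariate: describe the distribution of a single variable.
print(scores.describe())                           # count, mean, std, quartiles

# Bivariate: describe the relationship between a pair of variables.
print(scores.corr(hours))                          # Pearson correlation
```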

3) Causal data analysis


Causal data analysis is also known as explanatory data analysis. Causal analysis determines the cause-and-effect relationship between variables; it is primarily carried out to see what would happen to one variable if another variable changed.

Applying causal studies usually requires randomized experiments, but there are also approaches for concluding causation even in non-randomized studies. Causal models are considered the gold standard among all other types of data analysis. The approach is considered very complex, and the researcher cannot be certain that the other variables influencing the causal relationship remain constant, especially when the research deals with the attitudes of customers in business.

Often, the researcher has to consider psychological factors that even the respondent may not be aware of; these unconsidered parameters affect the data being analyzed and may distort the conclusions.
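A minimal sketch of the randomized-study logic described above, using simulated data: the group sizes, means and spreads are assumptions chosen purely for illustration.

```python
# Simulated randomized study: difference in means plus a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=50.0, scale=10.0, size=200)   # no intervention
treated = rng.normal(loc=53.0, scale=10.0, size=200)   # receives intervention

effect = treated.mean() - control.mean()               # estimated effect
t_stat, p_value = stats.ttest_ind(treated, control)    # is it significant?
print(f"estimated effect: {effect:.2f}, p-value: {p_value:.4f}")
```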

4) Predictive data analysis

As the name suggests, predictive data analysis involves methods that analyze current trends along with historical facts to make predictions about future trends or events.

The success of the prediction model depends on choosing and measuring the right variables. Predicting future trends is difficult and requires technical expertise in the subject. Machine learning is a modern tool used in predictive analysis for better results. Prediction analysis is used to anticipate rising and changing trends in various industries.

Analytical customer relationship management, clinical decision support systems, collection analytics, fraud detection and portfolio management are a few of the applications of predictive data analysis. Forecasting future financial trends is also a very important application.

A few of the software packages used for predictive analysis are Apache Mahout, GNU Octave, OpenNN and MATLAB.
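The sketch below shows the general idea with a simple linear trend model in scikit-learn; the synthetic monthly sales series is a stand-in for real historical data.

```python
# Fitting a simple trend to synthetic history and projecting it forward.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 25).reshape(-1, 1)        # 24 months of "history"
noise = np.random.default_rng(0).normal(0, 8, size=24)
sales = 100 + 5 * months.ravel() + noise        # synthetic upward trend

model = LinearRegression().fit(months, sales)   # learn the trend
future = np.arange(25, 31).reshape(-1, 1)       # the next six months
print(model.predict(future))                    # predicted values
```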

5) Inferential data analysis

Inferential data analysis is among the types of analysis in research that help test theories about a population based on a sample taken from it. A small part of a population is studied, and the conclusions are extrapolated to the bigger chunk of the population.

The goal of a statistical model is to provide an inference or conclusion based on the study of a small, representative portion of the population. Since the process involves drawing conclusions or inferences, selecting a proper statistical model is very important.

The success of inferential data analysis depends on the statistical models used and on the population and sampling technique. It is crucial that a variety of representative subjects are studied to obtain better results.

Inferential analysis is applied to cross-sectional studies, retrospective data sets and observational data. It can produce reliable results and predictions only if a proper sampling technique is followed, along with good tools for data analysis.
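For example, the following sketch infers a population mean from a small sample by computing a point estimate and a 95% confidence interval; the sample values are hypothetical.

```python
# Point estimate and 95% confidence interval for a population mean.
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0])

mean = sample.mean()
sem = stats.sem(sample)                        # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```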

6) Decision trees

This is classified as a modern classification algorithm in data mining and is a very popular type of analysis in research that relies on machine learning. It is usually represented as a tree-shaped diagram that encodes a regression or classification model.

A decision tree subdivides the data into smaller subsets that have similar values. The branches determine how the tree is built: where one goes with the current choice and where that choice leads next.

The primary advantage of a decision tree is that domain knowledge is not an essential requirement for the analysis. Also, classification with a decision tree is a simple and fast process that consumes less time compared to other data analysis techniques.
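A minimal sketch of the idea, training a shallow decision tree classifier on the classic iris data set with scikit-learn; the depth limit of 3 is an arbitrary choice for illustration.

```python
# Shallow decision tree classifier on the iris data set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(f"test accuracy: {tree.score(X_test, y_test):.2f}")
```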

7) Mechanistic data analysis

This method is the opposite of descriptive data analysis: where descriptive analysis requires the least amount of effort, mechanistic data analysis requires the most. The primary idea behind mechanistic data analysis is to understand exactly how changes in some variables affect other variables.

Mechanistic data analysis is exceptionally difficult except in simpler situations. It is used in the physical and engineering sciences, where systems can be described by deterministic sets of equations. A typical application is the analysis of randomized trial data sets.
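As a toy illustration of a deterministic mechanistic model, the sketch below integrates a simple exponential-decay equation with SciPy; the decay rate and initial value are assumed for the example.

```python
# Deterministic mechanistic model: exponential decay, dy/dt = -k * y.
import numpy as np
from scipy.integrate import solve_ivp

k = 0.3                                  # assumed decay rate

def decay(t, y):
    return -k * y

sol = solve_ivp(decay, t_span=(0, 10), y0=[100.0],
                t_eval=np.linspace(0, 10, 11))
print(sol.y[0])                          # y(t) at each evaluated time point
```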

8) Evolutionary programming

It combines different types of analysis in research using evolutionary algorithms to form meaningful data and is a very common concept in data mining. Genetic algorithms and evolutionary algorithms are the most popular forms of evolutionary programming. These are domain-independent techniques, since they have the ability to search and explore large spaces to discover good solutions.
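The following is a toy genetic-algorithm sketch, not a production implementation: it evolves a population of numbers toward the maximum of a simple quadratic fitness function, with arbitrary population size, selection and mutation settings.

```python
# Toy genetic algorithm: evolve x toward the maximum of f(x) = -(x - 3)^2.
import random

def fitness(x):
    return -(x - 3.0) ** 2               # best possible value is at x = 3

population = [random.uniform(-10, 10) for _ in range(50)]
for _ in range(100):
    population.sort(key=fitness, reverse=True)   # selection: keep fitter half
    parents = population[:25]
    children = []
    for _ in range(25):
        a, b = random.sample(parents, 2)         # crossover: blend two parents
        children.append((a + b) / 2 + random.gauss(0, 0.1))  # mutation
    population = parents + children

population.sort(key=fitness, reverse=True)
print(f"best solution found: {population[0]:.3f}")   # approaches 3
```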


Types of Research Papers: Overview

A research paper is simply a piece of writing that uses outside sources. There are different types of research papers with varying purposes and expectations for sourcing.

While this guide explains those differences broadly, disciplines and assignments vary. Ask your professor for clarification on the purpose and types of appropriate research questions and sources.


The TypeFinder® Personality Test

Beyond Briggs Myers' 16 types: find your true strengths.

This free personality test reveals who you really are. Discover the 16 personalities created by Myers & Briggs, test your personality type, and find your strengths.

To take the personality test, mark your answers based on how well each statement describes you.

Questions? Learn more about the personality test.


TypeFinder Personality Test FAQ

Q. What is this personality test based on?

A. This test is based on the personality theory created by Isabel Myers and Katharine Briggs. It measures your preferences on Myers and Briggs' four dimensions of personality type, as well as 23 more detailed facets of type to personalize your results.

Q. How long is this personality test?

A. The test consists of 130 questions and takes about 10-15 minutes to complete.

Q. Is this personality test really free?

A. You do not need to purchase or register to take this test and view an overview of your results. If you would like, you can purchase a more comprehensive full report for a small fee.

Q. Is this personality test accurate?

A. This test has been researched extensively to ensure it is valid and reliable, using a variety of statistical methods. These results are detailed in the TypeFinder Technical Documentation. Most of our users describe their results as both accurate and insightful. However, it is important to note that no test can determine personality type correctly for everyone—it's essential that you evaluate your results on your own to decide if they describe you well, and research other possible types if necessary.

Q. What will my results for this test look like?

A. You will first see a brief, free report showing the key points from your results. After reviewing your brief report, you then have the option to unlock your full report for a small fee. To see what you can expect from your full report, see this sample report.

Q. How can I access my personality test results?

A. After you take a test, you will have the option to create an account by entering your email address. If you create an account, you can view your test results at any time by returning to Truity.com and logging into your account. We do not email your results to you.

Q. Do I need to complete this personality test all at once?

A. If you’ve created an account and are logged in when you take the test, your responses will be saved as you go through the test. If you do not log in to a Truity account before starting the test, your progress will not be saved and you will need to complete the test all at once.

Q. Can I have my employees, team or group take the TypeFinder test?

A. Absolutely. Our Truity@Work platform is designed to make it easy to give a TypeFinder personality test to your team or group. See discounted group pricing and learn how to quickly and easily set up testing for your group on the Testing for Business page.

Q. Will this test tell me which careers are best for my type?

A. This test has brief information about the careers for your type, but if your main goal is to find the right career for you, then we recommend you take the TypeFinder for Career Planning, which is specifically designed to help you find the right career for your type as well as your individual interests and strengths.

Q. Is this personality test appropriate for children?

A. None of our tests are appropriate for children under the age of 14. Some of our tests may have mature content, and anyone younger than 18 should only take the test with parental guidance.

Q. Where can I find more information about the 16 personalities?

A. You can find comprehensive profiles of each of Myers and Briggs' personality types here: INFP • INFJ • INTP • INTJ • ENFP • ENFJ • ENTP • ENTJ • ISFP • ISFJ • ISTP • ISTJ • ESFP • ESFJ • ESTP • ESTJ

Q. Can my personality type change over time?

A. If you asked Isabel Briggs Myers and Katharine Briggs (the creators of the 16 personality types) or Carl Jung (the psychologist whose theories Briggs and Myers studied), they would say no, a person's personality type does not change over time. However, personality psychologists who study large populations have found that shifts in personality do occur over time. Research shows that age and individual life experiences can cause a shift in your personality. However, drastic shifts in personality are unusual, and most people find that changes are small and gradual.

Q. I'm looking for the official MBTI® assessment. Is this it?

A. The MBTI® is the original assessment developed by Isabel Myers and Katharine Briggs. The TypeFinder® is based on Myers and Briggs' theory, but is not the same as the MBTI® assessment. Some key differences:

The MBTI® Assessment

  • Developed by Isabel Briggs Myers
  • Based on theories of C.G. Jung, Katharine Briggs and Isabel Myers
  • Measures 4 preferences of personality type
  • Available through certified practitioners or online
  • Results cost $49 (for MBTI® Online )

The TypeFinder®

  • Developed by Truity
  • Based on Myers and Briggs' theory and original empirical research
  • Measures 4 dimensions and 23 facets of personality type
  • Available online
  • Results are free, or choose to purchase an expanded report

Q. Are you going to sell my data?

A. No. We do not sell your email or other data to any third parties, and we have a zero-spam policy. We carefully comply with applicable privacy laws in handling your personal information. You can read more in our privacy policy.

Myers-Briggs Type Indicator, Myers-Briggs, and MBTI are registered trademarks of The Myers & Briggs Foundation in the United States and other countries. Truity has no affiliation with the organizations publishing or holding rights to the MBTI® assessment.


Market Research and Analysis - Part 5: Drawdown Analysis

Greg Morris

Note to the reader: This is the sixteenth in a series of articles I'm publishing here taken from my book, "Investing with the Trend." Hopefully, you will find this content useful. Market myths are generally perpetuated by repetition, misleading symbolic connections, and the complete ignorance of facts. The world of finance is full of such tendencies, and here, you'll see some examples. Please keep in mind that not all of these examples are totally misleading -- they are sometimes valid -- but have too many holes in them to be worthwhile as investment concepts. And not all are directly related to investing and finance. Enjoy! - Greg

While the world of finance believes risk is measured by volatility (standard deviation), it is my belief that loss of capital is risk, and not volatility.

In Figure 11.1, example A ends where it begins with zero gain or loss, yet modern finance says it is risky because it is volatile. Example B shows the end price lower than the beginning price, so it shows a loss; modern finance, in this instance, would say there is no risk because there is no volatility. I think you can draw your own conclusions.

Volatility can contribute to risk, but it also can contribute to price gains. Loss of capital is simple and reasonable to use as a risk measure, and in this chapter, risk is defined by drawdown.

[Figure 11.1 omitted]

What Is Drawdown?

  • Drawdown is the percentage that price has moved down from its previous all-time high price.
  • Drawdown is risk.
  • Drawdown is systematic risk.
  • Drawdown is loss of capital.
  • Drawdown can last longer than you can.
  • Drawdown can ruin your retirement plans.

Drawdown Terminology

The following describes the nomenclature used in Figure 11.2.

  • Drawdown Magnitude is the percentage that price has moved down from its previous all-time high.
  • Drawdown Decline is the amount of time the market declined from an all-time high to the trough.
  • Drawdown Duration is the total amount of time that it took the price to recover to its previous all-time high.
  • Drawdown Recovery is the time it took from the trough to get back to an all-time high.

[Figure 11.2 omitted]

Although the terminology for drawdowns is subjective, I'll stick with the ones that Sam Stovall (Standard & Poor's) uses, as they are as good as any. I have often thought one more term for bear markets greater than -40% would be good, such as Super Bear, but I have other battles to fight. See Table 11.1.

[Table 11.1 omitted]

The Mathematics of Drawdown and Equivalent Return

Recovering from a severe drawdown takes an extraordinary return just to get back to where you were. This is sometimes referred to as equivalent return and is represented by this formula:

          Equivalent Return = Percent Drawdown / (1 – Percent Drawdown)

If you don't have a calculator or table handy, just divide the percent decline by its complement (100 – percent), and then mentally place the decimal in the appropriate place. This is best done in privacy and not on a stage in front of many people.

From Figure 11.3, you can see that if you lose 50%, then it takes a 100% gain to get back to even. When was the last time you doubled your money? A 100% gain is the same as doubling your money. The bear market that began on October 9, 2007 dropped more than 55%; to recover, it takes a gain of more than 122% to get back to even. One thing the graphic clearly shows is that, the larger the loss, the greater the gain required to recover.

[Figure 11.3 omitted]
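A small helper makes this arithmetic easy to check; the function below simply implements the formula above for a few of the drawdowns mentioned.

```python
# Equivalent return needed to recover from a drawdown: d / (1 - d).
def equivalent_return(drawdown_pct):
    d = drawdown_pct / 100.0
    return 100.0 * d / (1.0 - d)

for dd in (20, 50, 55, 88.67):
    print(f"{dd}% drawdown -> {equivalent_return(dd):.1f}% gain to break even")
```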

Cumulative Drawdown

Figure 11.4 is an example of cumulative drawdown. The line that moves across the tops of the price data (top plot) only moves up with the data and sideways when the data does not move up; in other words, it is constantly reflecting the price's all-time high value. The bottom plot is the percentage decline from that all-time high line. Whenever that line is at the top, it means that price in the top plot is at its all-time high. As the line in the bottom plot declines, it moves in percentages of where it was last at its all-time high price.

[Figure 11.4 omitted]

In the example shown, a new all-time high in price is reached at the vertical line labeled A. The bottom plot shows that, as prices move down from that point, the drawdown also moves in conjunction with price. The horizontal line that goes through the lower part of the drawdown plot is at -10%. You can't read the dates at the bottom, but it took almost six months before the prices recovered to point B and then moved above the level they had reached at point A. This is an example of drawdown that had a magnitude of -17%, shown by the lowest point reached on the drawdown line in the bottom plot. The drawdown also lasted (duration) almost six months, as shown by the time between line A and line B.
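In code, the all-time-high line and the drawdown line can be computed in a couple of lines with pandas; the short price series below is a toy stand-in for real index data.

```python
# Cumulative drawdown from a price series (toy data) with pandas.
import pandas as pd

prices = pd.Series([100, 105, 103, 98, 92, 97, 104, 106, 101, 108])

all_time_high = prices.cummax()           # the line riding along the tops
drawdown = prices / all_time_high - 1.0   # percent below the last high
print(drawdown.round(4))                  # 0.0 wherever a new high is set
```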

Figure 11.5 shows the percentage of drawdown over the entire history of the Dow Industrials since 1885. The top portion is the Dow Industrials, plotted using semi-log scaling, and the bottom plot is the drawdown percentage. The darker horizontal line through the bottom plot is the mean or average of the drawdown over the full time period since 1885. Its value is -22.1%. The other horizontal lines are shown at zero (top line), -20%, -35%, -50%, and -65%. I think the thing that stands out from this chart is that the period from 1929 through 1954 suffered an enormous drawdown, not only in magnitude but also in duration. The low was on June 28, 1932 at -88.67%. The equivalent return to get back to even from that point was a gain of more than 783%. That is why it took almost 25 years to accomplish.

[Figure 11.5 omitted]

Because the Depression-era drawdown distorts the other drawdowns, Figure 11.6 shows exactly the same data since about 1954, eliminating the scaling effect from the -88% Depression-era drawdown. The drawdown in 2008 clearly stands out as the biggest in modern times at -53.78% on March 9, 2009. As of this writing (in 2013), that drawdown has yet to recover. It should be noted that whenever the drawdown line in the bottom plot is not back up to the top (0%), the market is in a "state of drawdown"; the duration captures this time, while the magnitude captures only the size of the decline.

[Figure 11.6 omitted]

Remember: Every bear market ends, but rarely when you are still trying to pick the bottom.

S&P 500 Drawdown Analysis

The following data is from the S&P 500 Index, not adjusted for dividends or inflation, over the period from December 30, 1927, through December 31, 2012. It was a period that consisted of 21,353 market days and 1,016.81 calendar months. The S&P 500 has been widely regarded as the best single gauge of the large-cap U.S. equities market since the index was first published in 1957 and backfilled to 1927 with the S&P 90. The index has more than US$5.58 trillion benchmarked, with index assets comprising approximately US$1.31 trillion of this total. The index includes 500 leading companies in leading industries of the U.S. economy, capturing 75% coverage of U.S. equities.

Drawdown Decline — S&P 500

Table 11.2 is focused only on the percentage decline of the various drawdowns. The columns in the table are defined as follows:

  • Drawdown range. This is the percentage of drawdown decline, divided into various ranges which make up the rows in the table. The top row of data is for drawdowns with declines greater than 20%, and the bottom row is the data for all drawdowns.
  • Average max drawdown. This is the average of all the drawdowns for the percentage decline in the first column.
  • Average days in decline. This is the average number of market days that the drawdowns were in the decline whose percentage decline is defined by the first column.
  • Average months in decline. This is simply the average market days in decline divided by 21, the average number of market days per month, which yields calendar months.
  • Total days in decline. This is the sum of all the days the particular decline range was in decline.
  • Total months in decline. This is the total market days in decline divided by 21.
  • Percentage of time spent in decline. This is the percentage of time that the declines were in a state of decline based on the total number of market days for the period of analysis.

[Table 11.2 omitted]

From Table 11.2, you can see that all drawdowns greater than 20%, which are also called bear markets, were in a state of decline for almost 17% of the time; in other words, bear market declines accounted for 17% of the total time from 1927 to 2012. The bottom row in the table above shows that all drawdowns (DD), no matter what their magnitude, spent almost 32% of the time declining.

Drawdown Recovery — S&P 500

Drawdown recovery is the term used to define the time spent from when a drawdown bottoms (hits its absolute lowest point and greatest percentage of decline) and completely recovers (gets back up to where the drawdown began). The columns in Table 11.3 are similar to the Drawdown Decline table in Table 11.2; we are just discussing the last portion of the drawdown here instead of the first portion.

[Table 11.3 omitted]

Following a similar discussion as was done in the Drawdown Decline analysis, we can see that Drawdown Recoveries where the magnitude of the drawdown was greater than 20% took more than 50% of the total time to recover. Remember that recoveries from declines always take longer than the declines themselves. This is generally because declines (selling) are more emotionally driven, so they are usually quicker and more abrupt.

There is a new column in the Drawdown Recovery table called Average Gain to Recovery . This is the percentage of gain (recovery) needed to get back to where the drawdown began. See the earlier part of this section that talks about equivalent return for more information. From Table 11.3, you can see that for drawdowns greater than 20%, on average, it takes a gain of more than 69% to get back to even. Remember we are dealing with averages in these tables. Elsewhere in the book are tables showing each of the drawdowns that were greater than 20%.

Drawdown Duration — S&P 500

Drawdown Duration is shown in Table 11.4; this is the total amount of time that a complete drawdown occurred. The previous two tables dealt with the decline and the recovery; this table is the total of those two.

[Table 11.4 omitted]

Drawdowns of greater than 20% averaged 1,433 days, which is more than 68 months, or about 5-6 years. The total number of days of all drawdowns greater than 20% was 14,333 market days, or 682 months, which is more than 56 years. Now the real eye-catcher in this table is the last row, which shows all drawdowns regardless of the percentage decline. It shows that the market from 1927 to 2012 was in a state of drawdown for more than 95% of the time. In other words, the market was making new all-time highs less than 5% of the time.

The Drawdown Message — S&P 500

With all the above tables about the various stages of drawdown, the information taken from the Drawdown Duration table in Table 11.5 is the real message of this Drawdown Analysis: the percentage of time that the market, in this case the S&P 500 Index, has spent in a state of drawdown. In other words, what most folks do not realize is the amount of time that the market has spent just getting back to where it had already been. Even if you eliminate the noise, which is the drawdowns of less than 5%, the market has been in a state of drawdown for 82-95% of the time.

[Table 11.5 omitted]

Alternative Method

Figure 11.7 and the analysis below shows an alternative method to validate this drawdown analysis. In this mathematical process, the amount of time spent making new all-time highs was calculated using the same S&P 500 data. The top plot is the S&P 500 price shown plotted using semi-log scaling. The jagged line that moves along the top of the data is a line representing the all-time high price. It only moves up when the S&P is making a new all-time high, and moves sideways when the S&P is declining below its previous all-time high. The second plot is a calculation to identify only the days in which the all-time high line in the top plot was moving upward; in other words, the days in which the S&P 500 was making a new all-time high. The third plot has two lines; one is a summation of the second plot or the running sum of all the days making a new all-time high in price. The second line in the third plot is just calculating all the days of data in the S&P 500 by using the simple concept of Close price not equal to zero and then doing the running summation. The bottom plot is the percent of the new all-time highs summation to the total of all days of data. You can see (trust me) that the percentage of time the S&P 500 was making new all-time highs is 4.63%. In the previous drawdown analysis, it was shown that all drawdowns contained 95.24 percent of the data. 100% - 95.24% = 4.76%, which means there is only a 0.13% difference between the two totally independent calculations.

[Figure 11.7 omitted]
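The same toy series can be used to sketch this alternative method: counting the fraction of days on which a new all-time high is set.

```python
# Fraction of days spent making new all-time highs (same toy series).
import pandas as pd

prices = pd.Series([100, 105, 103, 98, 92, 97, 104, 106, 101, 108])

new_high = prices == prices.cummax()      # True on days that set a new high
pct = 100.0 * new_high.sum() / len(prices)
print(f"{pct:.2f}% of days were new all-time highs")
```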

Average Drawdown — S&P 500

An additional calculation has been added to Table 11.4 — S&P 500 Drawdown Duration Data, shown in Table 11.6 as two new columns. These are the average of the average drawdown for each percentage category, and the total average of all drawdowns.

While the data in the previous tables breaks down the drawdowns over various ranges of percentage of decline, a common mistake in the world of finance is to focus on a term called maximum drawdown when comparing two issues, such as two mutual funds. One must keep in mind that maximum drawdown is a one-time isolated event and could be misleading.

Here is an example: Let's assume we are looking at two mutual funds, each with a 20-year history of net asset value (NAV). Fund A has a maximum drawdown of 45% and Fund B has a maximum drawdown of 30%. Which fund do you prefer? Most will say that Fund B is better because it has a smaller maximum drawdown. And they would be correct, but I think they need to view more information from the 20 years of data. Let's say Fund B had 12 additional drawdowns of 25% each and Fund A had additional drawdowns, with the largest being only 12%. Now which fund do you like? While the maximum drawdown is greater on Fund A, all of the remaining drawdowns are considerably less than those for Fund B. This is why I prefer to look at Average Drawdown, as shown in Table 11.6.

[Table 11.6 omitted]

Distribution of Drawdowns — S&P 500

Figure 11.8 shows all drawdowns that were greater than 15%. You can see that for the period from 1927 to 2012, the S&P 500 had 2 drawdowns in the 15–19.99% range, 2 in the 20–24.99% range, and so on, for a total of 12 drawdowns of magnitude greater than 15%. Interestingly, there were no drawdowns in the 40–44.99% range. I would guess that once a market has declined over 40%, it creates so much fear that it cannot stop until it has fallen further still.

[Figure 11.8 omitted]

Figure 11.9 shows all drawdowns, no matter how small. You can see that there were 424 drawdowns, with 369 of them less than 5%. Five percent declines are generally considered just noise and part of the market pricing mechanism. Drawdowns between 5% and 10% are considered pullbacks; there were 35 pullbacks during this period. Corrections are drawdowns between 10% and 20%; you can see that there were 10 (the total of the 10–14.99% and 15–19.99% ranges). There were 10 drawdowns of 20% or greater. Also notice that Figure 11.8 is reflected in Figure 11.9; three more distribution percentages were simply added to the left.

[Figure 11.9 omitted]

Cumulative Drawdown for S&P 500

Figure 11.10 shows you a visual of all drawdowns during the analysis period. The 1929 drawdown is clearly exceptional not only in magnitude of decline, but also duration; so much so that it skews the visual effect of the remaining drawdowns.

[Figure 11.10 omitted]

S&P 500 Index Excluding the 1929 Bear Market

Often it is good to remove some statistical outliers, such as the giant drawdown/bear market that began in 1929 and lasted until 1954, 25 years total. The tables that follow (Table 11.7, Table 11.8, and Table 11.9) show the S&P 500 from 1927 to 2012 with exactly the same data as the previous respective tables; however, this time, the Great Depression drawdown has been removed from the data.

[Tables 11.7, 11.8 and 11.9 omitted]

The Average Max Drawdown has decreased from -40.88% to only -35.84% for drawdowns of more than 20%. You can look at the numbers for the remaining columns and see that they are all reduced; however, not by nearly as much as I would have guessed prior to doing the analysis.

Recovery data seems to have been reduced significantly more when removing the 1929 Depression drawdown. This is because not only was the magnitude of that bear market over -86%, but it also lasted for more than 25 years.

Because the Recovery numbers were significantly reduced with the removal of the 1929 bear market, it also stands to reason that the duration numbers would also be significantly reduced. And Table 11.9 confirms that is the case.

Drawdowns Greater than 20% are Bear Markets

Although this is also shown in earlier chapters, it is appropriate to include with this section of the book on Drawdown Analysis because bear markets are merely drawdowns of 20% or greater. Table 11.10 shows all the drawdowns (bear markets) of 20% or greater in the S&P 500 Index since 12/30/1927. Here is a brief description of the statistics that are at the bottom of Table 11.10.

  • Average. The same as the mean in statistics; add all values and then divide by the number of items.
  • Avg Ex 29. This is the Average with the 1929 bear market removed as it skews the data somewhat.
  • Minimum. The minimum value in that column.
  • Maximum. The maximum value in that column.
  • Std. Dev. This is standard deviation or sigma, which is a measure of the dispersion of the values in the column. About 68% of the values will fall within one standard deviation of the mean, and 95% will fall within two standard deviations of the mean.
  • Median. If the data is widely dispersed or has asymptotic outlier data, this is usually a better measure for central tendency than Average.

[Table 11.10 omitted]

The number two drawdown as of 12/31/2012 is still in progress. While its magnitude of decline was -56.78%, its duration ranks only fourth, as the current number 3 and 4 drawdowns, while not as steep, lasted longer.

S&P Total Return Analysis

This data is not as robust as the price data, but it does reflect the reality of the markets for buy-and-hold or index investing, in which one receives and reinvests the dividends earned by the individual stocks that make up the index. This data begins on March 31, 1936, and therefore does not include the Great Depression drawdown that began in 1929. Tables 11.11 through 11.13, Figures 11.11 and 11.12, and Table 11.14 follow the format of the preceding sections.

Drawdown Decline — S&P Total Return

[Table 11.11 omitted]

Drawdown Recovery — S&P Total Return

[Table 11.12 omitted]

Drawdown Duration — S&P Total Return

[Table 11.13 omitted]

Distribution of Drawdowns Greater than 15% — S&P Total Return

[Figure 11.11 omitted]

Distribution of All Drawdowns — S&P Total Return

[Figure 11.12 omitted]

Bear Markets — S&P Total Return

[Table 11.14 omitted]

Dow Jones Industrial Average Drawdown Analysis

 This section follows the same order and format of the previous section on the S&P 500 Index drawdown; the only difference is that the analysis is done on the Dow Jones Industrial Average. (See Tables 11.15 through 11.19, Figures 11.13 through 11.15, and additional Tables 11.20 through 11.23.)

The Dow Jones Industrial Average, also referred to as The Dow, is a price-weighted measure of 30 U.S. blue-chip companies. The Dow covers all industries with the exception of transportation and utilities, which are covered by the Dow Jones Transportation Average and Dow Jones Utility Average. Although stock selection is not governed by quantitative rules, a stock typically is added to the Dow only if the company has an excellent reputation, demonstrates sustained growth and is of interest to a large number of investors. Maintaining adequate sector representation within the indexes is also a consideration in the selection process.

The following data is from the Dow Jones Industrial Average, not adjusted for dividends or inflation, over the period from February 17, 1885, through December 31, 2012. The drawdown analysis for the Dow Industrials consists of 35,179 market days, which translates into 1,675.19 calendar months.

Drawdown Decline — Dow Jones Industrial Average

[Table 11.15 omitted]

Drawdown Recovery — Dow Jones Industrial Average

[Table 11.16 omitted]

Drawdown Duration — Dow Jones Industrial Average

[Table 11.17 omitted]

The Drawdown Message — Dow Jones Industrial Average

[Table 11.18 omitted]

Average Drawdown — Dow Jones Industrial Average

[Table 11.19 omitted]

Distribution of Drawdowns — Dow Jones Industrial Average

[Figures 11.13 and 11.14 omitted]

Cumulative Drawdown for Dow Industrials

[Figure 11.15 omitted]

Dow Industrials Excluding the 1929 Bear Market

[Tables 11.20 through 11.23 omitted]

Dow Industrials Total Return Analysis

This data is only available beginning on March 31, 1963. (See Tables 11.24 through 11.26 and Figures 11.16 and 11.17, followed by Table 11.27.)

Drawdown Decline — Dow Industrials Total Return

[Table 11.24 omitted]

Drawdown Recovery — Dow Industrials Total Return

[Table 11.25 omitted]

Drawdown Duration — Dow Industrials Total Return

[Table 11.26 omitted]

Distribution of Drawdowns — Dow Industrials Total Return

[Figures 11.16 and 11.17 omitted]

Bear Markets for Dow Industrials Total Return

[Table 11.27 omitted]

Gold Drawdown

Drawdowns are not restricted to the stock market; they can be analyzed on any time series data.

Figure 11.18 is a chart of gold. This shows the price of gold in the top plot since 1967 and its cumulative drawdown in the bottom plot. The two horizontal lines in the drawdown plot are at -20% and -50%. I think it is clear that anyone who bought gold in the Hunt Brothers' 1980 silver era, which also marked the end of the exceptional inflationary period of the 1970s, held an investment from 1980 until 2008 before the price of gold recovered. Twenty-eight years is a really long time to hold a loser. With gold's recent surge to new highs (as of 2013), the time value of money would probably continue to erode this 1980 investment, even though those folks are at least feeling better now.

[Figure 11.18 omitted]

Japan's Nikkei 225 Drawdown

Figure 11.19 is of the Japanese stock market and its drawdown. I think at this point no commentary is needed, as you can see that the Nikkei started dropping in late 1989 and is down in the -75% area since the end of 2008.

[Figure 11.19 omitted]

Copper Drawdown

Copper is often referred to as Doctor Copper, as many think it is a measure of economic activity, especially in the construction industry. Figure 11.20 is a chart of copper since 1971, with its cumulative drawdown in the bottom plot. Clearly, copper as an investment has spent an enormous amount of time in a state of drawdown.

[Figure 11.20 omitted]

Drawdown Intensity Evaluator (DIE)

In an attempt to further evaluate the pain of drawdown, I have created an indicator that measures not only the magnitude of the drawdown, but also the duration. Remember, it is not just how big the drop in price is, but also how long it takes to recover.

Figure 11.21 helps you understand how this concept works. The top plot is a price series, the middle plot (with the circles) is the cumulative drawdown, and the bottom plot is the Drawdown Intensity Evaluator (DIE). You can see at point A on the middle plot that a drawdown began and did not end until point D, which, in this example (Consumer Staples), means a time period from the end of 1998 until the middle of 2006. You also see that the DIE was at zero at point A and again at point D (vertical lines). From the middle plot of cumulative drawdown, you can see that the point of maximum drawdown is at point B (early 2000), which also corresponded with an initial peak in DIE. The middle plot of drawdown shows point C, which occurred in early 2003 and is not as low as point B; in fact, in this example, point B is -32.5% and point C is -27.4%. However, when you look at the bottom plot of DIE, the highest point is at point C. This is because even though point C occurred three years after point B, the pain of holding an investment during this time increased because the drawdown was still significant, even though it wasn't at its maximum. After point C, you can see that the drawdown slowly started to decrease, but did not get back to its starting point (A) for more than three years (point D). DIE represents the pain of drawdown using not only magnitude, but also, and equally important, the duration.

[Figure 11.21 omitted]

The DIE in Figure 11.21 uses the data for the entire period to determine the pain. The next example, Figure 11.22, is an attempt to normalize the information using a four-year look-back. Normalizing the data in this case resets the drawdown numbers and gives us a better picture of current conditions relative to a recent period of time; it is particularly useful for long-duration drawdowns. This means that it is measuring DIE over a moving four-year window.

[Figure 11.22 omitted]

Figure 11.22 is a chart of the Dow Industrial Average in the top plot, with the cumulative drawdown in the second plot. The third plot is the Drawdown Intensity Evaluator, or DIE. The bottom plot is the DIE that has been normalized over a four-year period. The data begins in 1969.

The DIE is a relatively simple calculation: it merely takes the percentage of drawdown and multiplies it by the number of cumulative days spent in drawdown. An example is 1987, when there was a large drawdown, but it did not last very long; in fact, the market completely recovered in only two years.
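Based on that description, a rough sketch of the calculation might look like the following; this is my reading of the method (percent drawdown times cumulative days in drawdown), not the author's exact code, and the price series is a toy example.

```python
# Rough DIE sketch: percent drawdown times cumulative days in drawdown.
import pandas as pd

prices = pd.Series([100, 105, 103, 98, 92, 97, 104, 106, 101, 108])

drawdown = prices / prices.cummax() - 1.0               # 0.0 at all-time highs
in_dd = drawdown < 0                                     # True while in drawdown
days_in_dd = in_dd.astype(int).groupby((~in_dd).cumsum()).cumsum()  # days since last high
die = -drawdown * days_in_dd                             # deeper and longer = more pain
print(die.round(3))
```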

The world of finance, with its inadequate mathematics, inappropriate statistics, and faulty assumptions, wants investors to believe that risk is volatility as represented by Standard Deviation (sigma). Although volatility is a contributor to drawdown, it is also a contributor to price gains. Risk is loss of capital, and that is best measured by drawdown. An investment strategy that attempts to tackle and limit drawdowns will be a more comfortable "Investment Ride" for most investors.

This wraps up Part II: Market Research and Analysis. Let's now move to why we want to understand all this — building a trend-following rules-based model designed to participate as much as possible in the good times, trying to avoid the bad times, and most of all, keep the subjectivity out of the process.

Thanks for reading this far. I intend to publish one article in this series every week. Can't wait? The book is for sale here.



MICE Market Size, Share & Trends Analysis Report By Event Type (Meetings, Incentives, Conferences, Events), By Region (Asia Pacific, North America, Central & South America, Europe), And Segment Forecasts, 2023 - 2030


MICE Market Report Highlights

  • The meetings segment accounted for a majority share in 2022, with demand driven by the increase in corporate events worldwide. During the pandemic, this segment underwent a change as a result of the cancellation of significant events and lockdown orders.
  • In addition, the rise in popularity of regional destinations, coupled with less densely populated tier cities, is also driving the demand for meetings in such regions.
  • Asia Pacific was the biggest contributor to the industry in 2022. The expansion of this regional market is significantly influenced by the development of the travel and tourism sector.
  • For instance, according to the World Tourism Organization, Asia Pacific saw international arrivals more than triple (+230%) in the first nine months of 2022.
Companies mentioned in the report:

  • AVIAREPS AG
  • Beyond Summits Ltd.
  • ITL World Company (MICEMINDS)
  • IMC International
  • Global Air-American Express Travel Services



Research Methodology – Types, Examples and Writing Guide

Research Methodology

Definition:

Research Methodology refers to the systematic and scientific approach used to conduct research, investigate problems, and gather data and information for a specific purpose. It involves the techniques and procedures used to identify, collect, analyze, and interpret data to answer research questions or solve research problems. Moreover, it provides the philosophical and theoretical frameworks that guide the research process.

Structure of Research Methodology

Research methodology formats can vary depending on the specific requirements of the research project, but the following is a basic example of a structure for a research methodology section:

I. Introduction

  • Provide an overview of the research problem and the need for a research methodology section
  • Outline the main research questions and objectives

II. Research Design

  • Explain the research design chosen and why it is appropriate for the research question(s) and objectives
  • Discuss any alternative research designs considered and why they were not chosen
  • Describe the research setting and participants (if applicable)

III. Data Collection Methods

  • Describe the methods used to collect data (e.g., surveys, interviews, observations)
  • Explain how the data collection methods were chosen and why they are appropriate for the research question(s) and objectives
  • Detail any procedures or instruments used for data collection

IV. Data Analysis Methods

  • Describe the methods used to analyze the data (e.g., statistical analysis, content analysis )
  • Explain how the data analysis methods were chosen and why they are appropriate for the research question(s) and objectives
  • Detail any procedures or software used for data analysis

V. Ethical Considerations

  • Discuss any ethical issues that may arise from the research and how they were addressed
  • Explain how informed consent was obtained (if applicable)
  • Detail any measures taken to ensure confidentiality and anonymity

VI. Limitations

  • Identify any potential limitations of the research methodology and how they may impact the results and conclusions

VII. Conclusion

  • Summarize the key aspects of the research methodology section
  • Explain how the research methodology addresses the research question(s) and objectives

Research Methodology Types

Types of Research Methodology are as follows:

Quantitative Research Methodology

This is a research methodology that involves the collection and analysis of numerical data using statistical methods. This type of research is often used to study cause-and-effect relationships and to make predictions.

Qualitative Research Methodology

This is a research methodology that involves the collection and analysis of non-numerical data such as words, images, and observations. This type of research is often used to explore complex phenomena, to gain an in-depth understanding of a particular topic, and to generate hypotheses.

Mixed-Methods Research Methodology

This is a research methodology that combines elements of both quantitative and qualitative research. This approach can be particularly useful for studies that aim to explore complex phenomena and to provide a more comprehensive understanding of a particular topic.

Case Study Research Methodology

This is a research methodology that involves in-depth examination of a single case or a small number of cases. Case studies are often used in psychology, sociology, and anthropology to gain a detailed understanding of a particular individual or group.

Action Research Methodology

This is a research methodology that involves a collaborative process between researchers and practitioners to identify and solve real-world problems. Action research is often used in education, healthcare, and social work.

Experimental Research Methodology

This is a research methodology that involves the manipulation of one or more independent variables to observe their effects on a dependent variable. Experimental research is often used to study cause-and-effect relationships and to make predictions.

Survey Research Methodology

This is a research methodology that involves the collection of data from a sample of individuals using questionnaires or interviews. Survey research is often used to study attitudes, opinions, and behaviors.

Grounded Theory Research Methodology

This is a research methodology that involves the development of theories based on the data collected during the research process. Grounded theory is often used in sociology and anthropology to generate theories about social phenomena.

Research Methodology Example

An Example of Research Methodology could be the following:

Research Methodology for Investigating the Effectiveness of Cognitive Behavioral Therapy in Reducing Symptoms of Depression in Adults

Introduction:

The aim of this research is to investigate the effectiveness of cognitive-behavioral therapy (CBT) in reducing symptoms of depression in adults. To achieve this objective, a randomized controlled trial (RCT) will be conducted using a mixed-methods approach.

Research Design:

The study will follow a pre-test and post-test design with two groups: an experimental group receiving CBT and a control group receiving no intervention. The study will also include a qualitative component, in which semi-structured interviews will be conducted with a subset of participants to explore their experiences of receiving CBT.

Participants:

Participants will be recruited from community mental health clinics in the local area. The sample will consist of 100 adults aged 18–65 years who meet the diagnostic criteria for major depressive disorder. Participants will be randomly assigned to either the experimental group or the control group, as sketched below.
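As an illustration only, the following minimal Python sketch shows one way such a 1:1 random allocation could be implemented; the participant IDs and the fixed seed are hypothetical, since the protocol above does not specify its randomization software.

    import random

    def randomize(participant_ids, seed=42):
        """Split participants into equal-sized experimental and control arms at random."""
        rng = random.Random(seed)   # fixed seed makes the allocation reproducible and auditable
        shuffled = list(participant_ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        return {"experimental": shuffled[:half], "control": shuffled[half:]}

    groups = randomize([f"P{i:03d}" for i in range(1, 101)])  # 100 hypothetical participant IDs
    print(len(groups["experimental"]), len(groups["control"]))  # -> 50 50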

Intervention:

The experimental group will receive 12 weekly sessions of CBT, each lasting 60 minutes. The intervention will be delivered by licensed mental health professionals who have been trained in CBT. The control group will receive no intervention during the study period.

Data Collection:

Quantitative data will be collected through the use of standardized measures such as the Beck Depression Inventory-II (BDI-II) and the Generalized Anxiety Disorder-7 (GAD-7). Data will be collected at baseline, immediately after the intervention, and at a 3-month follow-up. Qualitative data will be collected through semi-structured interviews with a subset of participants from the experimental group. The interviews will be conducted at the end of the intervention period, and will explore participants’ experiences of receiving CBT.

Data Analysis:

Quantitative data will be analyzed using descriptive statistics, t-tests, and mixed-model analyses of variance (ANOVA) to assess the effectiveness of the intervention. Qualitative data will be analyzed using thematic analysis to identify common themes and patterns in participants’ experiences of receiving CBT.
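To make the quantitative step concrete, here is a minimal Python sketch of the independent-samples t-test described above, using simulated BDI-II scores in place of real trial data; the protocol specifies SPSS-style analyses, so this is illustrative only.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Simulated post-intervention BDI-II scores (hypothetical; real data would come from the trial)
    cbt = rng.normal(loc=14, scale=6, size=50)       # experimental group (CBT)
    control = rng.normal(loc=22, scale=6, size=50)   # control group (no intervention)

    # Independent-samples t-test comparing the two group means
    t_stat, p_value = stats.ttest_ind(cbt, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")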

Ethical Considerations:

This study will comply with ethical guidelines for research involving human subjects. Participants will provide informed consent before participating in the study, and their privacy and confidentiality will be protected throughout the study. Any adverse events or reactions will be reported and managed appropriately.

Data Management:

All data collected will be kept confidential and stored securely using password-protected databases. Identifying information will be removed from qualitative data transcripts to ensure participants’ anonymity.
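A toy sketch of the de-identification step for transcripts might look like the following; the helper name, pattern, and replacement token are assumptions for illustration, not the study's actual procedure.

    import re

    def deidentify(transcript, names):
        """Replace each known participant name with a neutral token."""
        for name in names:
            transcript = re.sub(rf"\b{re.escape(name)}\b", "[PARTICIPANT]", transcript)
        return transcript

    print(deidentify("Alice said she felt better after session three.", ["Alice"]))
    # -> "[PARTICIPANT] said she felt better after session three."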

Limitations:

One potential limitation of this study is that it only focuses on one type of psychotherapy, CBT, and may not generalize to other types of therapy or interventions. Another limitation is that the study will only include participants from community mental health clinics, which may not be representative of the general population.

Conclusion:

This research aims to investigate the effectiveness of CBT in reducing symptoms of depression in adults. By using a randomized controlled trial and a mixed-methods approach, the study will provide valuable insights into the mechanisms underlying the relationship between CBT and depression. The results of this study will have important implications for the development of effective treatments for depression in clinical settings.

How to Write Research Methodology

Writing a research methodology involves explaining the methods and techniques you used to conduct research, collect data, and analyze results. It’s an essential section of any research paper or thesis, as it helps readers understand the validity and reliability of your findings. Here are the steps to write a research methodology:

  • Start by explaining your research question: Begin the methodology section by restating your research question and explaining why it’s important. This helps readers understand the purpose of your research and the rationale behind your methods.
  • Describe your research design: Explain the overall approach you used to conduct research. This could be a qualitative or quantitative research design, experimental or non-experimental, case study or survey, etc. Discuss the advantages and limitations of the chosen design.
  • Discuss your sample: Describe the participants or subjects you included in your study. Include details such as their demographics, sampling method, sample size, and any exclusion criteria used.
  • Describe your data collection methods: Explain how you collected data from your participants. This could include surveys, interviews, observations, questionnaires, or experiments. Include details on how you obtained informed consent, how you administered the tools, and how you minimized the risk of bias.
  • Explain your data analysis techniques: Describe the methods you used to analyze the data you collected. This could include statistical analysis, content analysis, thematic analysis, or discourse analysis. Explain how you dealt with missing data, outliers, and any other issues that arose during the analysis.
  • Discuss the validity and reliability of your research: Explain how you ensured the validity and reliability of your study. This could include measures such as triangulation, member checking, peer review, or inter-coder reliability.
  • Acknowledge any limitations of your research: Discuss any limitations of your study, including any potential threats to validity or generalizability. This helps readers understand the scope of your findings and how they might apply to other contexts.
  • Provide a summary: End the methodology section by summarizing the methods and techniques you used to conduct your research. This provides a clear overview of your research methodology and helps readers understand the process you followed to arrive at your findings.

When to Write Research Methodology

Research methodology is typically written after the research proposal has been approved and before the actual research is conducted. It should be written prior to data collection and analysis, as it provides a clear roadmap for the research project.

The research methodology is an important section of any research paper or thesis, as it describes the methods and procedures that will be used to conduct the research. It should include details about the research design, data collection methods, data analysis techniques, and any ethical considerations.

The methodology should be written in a clear and concise manner, and it should be based on established research practices and standards. It is important to provide enough detail so that the reader can understand how the research was conducted and evaluate the validity of the results.

Applications of Research Methodology

Here are some of the applications of research methodology:

  • To identify the research problem: Research methodology is used to identify the research problem, which is the first step in conducting any research.
  • To design the research: Research methodology helps in designing the research by selecting the appropriate research method, research design, and sampling technique.
  • To collect data: Research methodology provides a systematic approach to collect data from primary and secondary sources.
  • To analyze data: Research methodology helps in analyzing the collected data using various statistical and non-statistical techniques.
  • To test hypotheses: Research methodology provides a framework for testing hypotheses and drawing conclusions based on the analysis of data.
  • To generalize findings: Research methodology helps in generalizing the findings of the research to the target population.
  • To develop theories: Research methodology is used to develop new theories and modify existing theories based on the findings of the research.
  • To evaluate programs and policies: Research methodology is used to evaluate the effectiveness of programs and policies by collecting data and analyzing it.
  • To improve decision-making: Research methodology helps in making informed decisions by providing reliable and valid data.

Purpose of Research Methodology

Research methodology serves several important purposes, including:

  • To guide the research process: Research methodology provides a systematic framework for conducting research. It helps researchers to plan their research, define their research questions, and select appropriate methods and techniques for collecting and analyzing data.
  • To ensure research quality: Research methodology helps researchers to ensure that their research is rigorous, reliable, and valid. It provides guidelines for minimizing bias and error in data collection and analysis, and for ensuring that research findings are accurate and trustworthy.
  • To replicate research: Research methodology provides a clear and detailed account of the research process, making it possible for other researchers to replicate the study and verify its findings.
  • To advance knowledge: Research methodology enables researchers to generate new knowledge and to contribute to the body of knowledge in their field. It provides a means for testing hypotheses, exploring new ideas, and discovering new insights.
  • To inform decision-making: Research methodology provides evidence-based information that can inform policy and decision-making in a variety of fields, including medicine, public health, education, and business.

Advantages of Research Methodology

Research methodology has several advantages that make it a valuable tool for conducting research in various fields. Here are some of the key advantages of research methodology:

  • Systematic and structured approach: Research methodology provides a systematic and structured approach to conducting research, which ensures that the research is conducted in a rigorous and comprehensive manner.
  • Objectivity: Research methodology aims to ensure objectivity in the research process, which means that the research findings are based on evidence and not influenced by personal bias or subjective opinions.
  • Replicability: Research methodology ensures that research can be replicated by other researchers, which is essential for validating research findings and ensuring their accuracy.
  • Reliability: Research methodology aims to ensure that the research findings are reliable, which means that they are consistent and can be depended upon.
  • Validity: Research methodology ensures that the research findings are valid, which means that they accurately reflect the research question or hypothesis being tested.
  • Efficiency: Research methodology provides a structured and efficient way of conducting research, which helps to save time and resources.
  • Flexibility: Research methodology allows researchers to choose the most appropriate research methods and techniques based on the research question, data availability, and other relevant factors.
  • Scope for innovation: Research methodology provides scope for innovation and creativity in designing research studies and developing new research techniques.

Research Methodology vs. Research Methods

About the author

Muhammad Hassan
Researcher, Academic Writer, Web developer


SYSTEMATIC REVIEW

The top 100 most cited articles on mucopolysaccharidoses: a bibliometric analysis.

Ruyu Liao

  • Department of Orthopedics, The Third People’s Hospital of Chengdu, Chengdu, China

Background: Bibliometrics can trace general research trends in a particular field. Mucopolysaccharidoses (MPS), as a group of rare genetic diseases, seriously affect the quality of life of patients and their families. Scholars have devoted themselves to studying MPS’s pathogenesis and treatment modalities and have published many papers. Therefore, we conducted a bibliometric and visual study of the top 100 most highly cited articles to provide researchers with an indication of the current state of research and potential directions in the field.

Methods: The Web of Science Core Collection was searched for articles on MPS from 1 January 1900, to 8 November 2023, and the top 100 most cited articles were screened. The title, year of publication, institution, country, and first author of the articles were extracted and statistically analyzed using Microsoft Excel 2007. Keyword co-occurrence and collaborative networks were analyzed using VOSviewer 1.6.16.

Results: A total of 9,273 articles were retrieved, and the top 100 most cited articles were filtered out. The articles were cited 18,790 times in total, an average of 188 citations per article (range 122–507). Forty-two journals published these articles, with Molecular Genetics and Metabolism and Proceedings of the National Academy of Sciences of the United States of America being the most prolific journals (N = 8 each), followed by Pediatrics (N = 7) and Blood (N = 6). The United States (N = 68), the UK (N = 25), and Germany (N = 20) were the top contributing countries. The Royal Manchester Children’s Hospital (N = 20) and the University of North Carolina (N = 18) were the most productive institutions. Muenzer J was the most prolific author (N = 14).

Conclusion: We conducted a bibliometric and visual analysis of the top 100 cited articles in MPS. This study identifies the most influential articles currently available in the field of MPS, which provides a good basis for a better understanding of the disease and informs future research directions.

1 Introduction

Mucopolysaccharidoses (MPSs) are a rare and heterogeneous group of inherited lysosomal storage disorders that can be classified into seven major disorders, including 11 subtypes (Kobayashi, 2019). The combined incidence of all MPS ranges from 1.53 to 4.8 cases per 100,000 live births and is characterized by progressive multiorgan involvement (Pinto et al., 2004; Khan et al., 2017). MPS is caused by defects in genes coding for the different lysosomal enzymes that degrade glycosaminoglycans (GAG), such as heparan sulfate (HS), chondroitin sulfate (CS), dermatan sulfate (DS), and keratan sulfate (KS). The deficient enzyme activity leads to systemic storage of GAG and a wide range of clinical manifestations (Puckett et al., 2021). For example, accumulation of GAG in growth plates and articular cartilage accelerates chondrocyte apoptosis and inflammation, leading to growth failure, limited joint range of motion, and reduced mobility (Clarke, 2011). Accumulation of GAG in the eye can lead to a variety of ocular comorbidities, such as corneal clouding, glaucoma, retinopathy, and ocular nerve involvement, which can result in visual disability (Nagpal et al., 2022). There are also significant neurocognitive symptoms associated with MPS, such as developmental delays, behavioral disorders, and hydrocephalus (Shapiro and Eisengart, 2021). These disease manifestations seriously affect the quality of life of patients and their families. Scholars have devoted themselves to studying the pathogenesis and treatment modalities of MPS, exploring the efficacy and pitfalls of therapeutic modalities such as hematopoietic stem cell transplantation (HSCT), enzyme replacement therapy (ERT), and gene therapy (GT), and many papers have been published.

Citation analysis is an essential bibliometric tool for identifying the most influential works on MPS (Zhou et al., 2017; Zhu et al., 2021). In general, the more citations an article has received, the more valuable and significant it is in the field (Kreutzer et al., 2017). Identifying the most cited works is crucial for clinicians and researchers in related fields to identify the most active areas and help guide future work. This method is therefore widely used in other areas of literature analysis (Karslı and Tekin, 2021; Liu et al., 2022) to identify high-quality articles. However, few analyses have reported the most cited works on MPS. Hence, this study conducted a longitudinal review of research in this field to provide a comprehensive picture, identifying the top 100 most cited articles on MPS in the Web of Science (WoS) in an effort to highlight important contributions to the literature and to provide direction for future research.

2 Materials and methods

2.1 Data sources

Citation counts for the same article differ across databases, so to avoid inconsistent results we searched only one database (Zhang et al., 2023). The Web of Science Core Collection (WoSCC) is the most extensively utilized database in academic research (Chen et al., 2023; Liu et al., 2024; Zhou et al., 2023), so we searched the WoSCC for articles related to MPS and sorted them in descending order of citations to filter out the top 100 most cited articles. The search was performed using the following terms: TI=(mucopolysaccharidosis) OR TI=(Mucopolysaccharidoses) OR TI=(Mucopolysaccharide Diseases) OR TI=(Mucopolysaccharides) OR TI=(MPS) OR TI=(Hurler syndrome) OR TI=(Hunter syndrome) OR TI=(Sanfilippo syndrome) OR TI=(Morquio syndrome) OR TI=(Maroteaux-Lamy syndrome) OR TI=(Sly syndrome) OR TI=(Hyaluronidase deficiency). The language was set to English, the article type was not limited, and the period covered 1 January 1900 to 8 November 2023. Two investigators agreed on the search terms and independently screened the articles by reading the abstract or full text; disagreements were resolved by a third researcher. This study did not require ethical approval, as all data were obtained from the publicly available WoS database (Figure 1).

Figure 1. Flowchart of literature selection and analysis.
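The sorting-and-filtering step can be reproduced in a few lines. The sketch below assumes the WoSCC records were exported to a CSV file with hypothetical column names (Title, Year, Citations); this is not the literal WoS export schema, only an illustration of the procedure.

    import pandas as pd

    records = pd.read_csv("wos_export.csv")   # hypothetical export; columns: Title, Year, Citations

    # Sort in descending order of citations and keep the 100 most cited articles
    top100 = (
        records.sort_values("Citations", ascending=False)
               .head(100)
               .reset_index(drop=True)
    )
    top100.to_csv("top100_mps.csv", index=False)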

2.2 Data extraction and organization

We extracted the following data from each article: title, year of publication, first author, research institution and country (based on the first author), journal name, Journal Citation Reports (JCR) quartile (if a journal falls into more than one quartile, the highest is used), impact factor, number of citations, article type, average annual number of citations since publication, and WoS category (if an article belongs to more than one category, the first is used). For country information, we categorized Taiwan as China (Gao et al., 2019; Huang et al., 2023).

2.3 Statistical analysis

Descriptive statistical analysis of the articles was performed using Microsoft Excel 2007, covering title, year of publication, journal, total citations, average citations, impact factor, etc. Correlation analysis was performed in SPSS 24.0 using Pearson’s correlation coefficient (r), with p < 0.05 considered statistically significant. Knowledge-graph analysis was performed with VOSviewer 1.6.16 to map the collaborative networks between countries, institutions, and authors. The network has three features: node size, connectivity, and color. A node represents a specific element such as a country, author, or institution; the node’s size indicates the number or frequency of publications, and its color indicates the year in which the articles were published. The lines between nodes represent the number of times two elements appear together.
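The study reports correlations computed in SPSS; an equivalent open-source sketch, again assuming the hypothetical top100_mps.csv and column names from the earlier sketch, would be:

    import pandas as pd
    from scipy.stats import pearsonr

    top100 = pd.read_csv("top100_mps.csv")             # hypothetical file from the previous sketch
    top100["Age"] = 2023 - top100["Year"]              # article age at the time of analysis
    top100["AvgAnnual"] = top100["Citations"] / top100["Age"]

    for x, y in [("Citations", "AvgAnnual"), ("Citations", "Age"), ("AvgAnnual", "Age")]:
        r, p = pearsonr(top100[x], top100[y])
        print(f"{x} vs {y}: r = {r:.3f}, p = {p:.3f}")  # p < 0.05 treated as significant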

3 Results

3.1 Descriptive statistics

Based on the above search formula, we retrieved 9,273 articles related to MPS and filtered out the top 100 most cited documents.

3.2 Publication year and citations

Years of publication for the top 100 articles ranged from 1979 to 2017; no articles published between 2018 and 2023 made the list. The annual publication rate varied from one to seven articles per year, with a majority of the articles (69%) published since 1998. Notably, the highest number of articles (n = 7) was published in 2011 (Figure 2).

Figure 2. Publication years of the 100 top-cited articles on MPS.

The top 100 articles were cited 18,790 times, an average of 188 citations per article (range 122–507). There were 23 articles with more than 200 citations. Total citations correlated strongly with average annual citations (rs = 0.653, p < 0.001). There was no significant correlation between total citations and article age (rs = −0.007, p = 0.946), whereas average annual citations correlated negatively with article age (rs = −0.668, p < 0.001).

3.3 Article types and contents

The articles were ranked in descending order of citations to obtain the top 100 highly cited articles (Table 1). Among them, 85 were original articles and 15 were review articles.

Table 1. The 100 top-cited articles on MPS.

By reading the titles and abstracts, we found that these articles mainly focused on the epidemiology, drug treatment trials, animal experiments, identification and diagnosis, management, and treatment guidelines of MPS. The articles belong to 17 Web of Science categories, of which the top three are Genetics and Heredity (N = 18), Pediatrics (N = 18), and Endocrinology and Metabolism (N = 13) (Table 2). Bone marrow transplantation (BMT), enzyme replacement therapy, lysosomal storage disease, Hurler syndrome, Hunter syndrome, and central nervous system (CNS) were the most frequent keywords (Figure 3).

Table 2. Type of study and categories in the 100 top-cited studies on MPS.

Figure 3. The co-occurrence network of keywords on MPS.
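VOSviewer computes keyword co-occurrence internally; the underlying count is simple to sketch. The per-article keyword lists below are hypothetical stand-ins for the real WoS records.

    from itertools import combinations
    from collections import Counter

    article_keywords = [
        ["enzyme replacement therapy", "hunter-syndrome", "central nervous system"],
        ["bone marrow transplantation", "hurler syndrome", "lysosomal storage disease"],
        ["enzyme replacement therapy", "hurler syndrome"],
    ]

    cooccurrence = Counter()
    for kws in article_keywords:
        # count each unordered keyword pair once per article
        for pair in combinations(sorted(set(kws)), 2):
            cooccurrence[pair] += 1

    for pair, n in cooccurrence.most_common(5):
        print(n, pair)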

3.4 Journal analysis

Forty-two journals published these articles; Table 3 shows the top 10 journals with more than three publications. Of these, Molecular Genetics and Metabolism and Proceedings of the National Academy of Sciences of the United States of America were the most prolific (N = 8 each), followed by Pediatrics (N = 7). The IFs of the 42 journals ranged from 1.2 to 158.5: 22 journals had an IF below 5, 11 had an IF of 5–10, nine had an IF above 10, and three had an IF above 40. Three journals were not included in the 2022 edition of the JCR. The journal with the highest IF (158.5) was the New England Journal of Medicine, which published three of the most cited articles. Nine of the top 15 journals in the JCR are in Q1, five are in Q2, and one is in Q3.

Table 3. Journals publishing the top 100 most cited articles.

3.5 Analysis of country

A total of 25 countries published these 100 papers. Table 4 shows the top ten countries with the most publications. Among the top 100 most cited articles, the USA (N = 68) contributed the most, followed by the UK (N = 25) and Germany (N = 20). When ranked by the average number of citations per article, the top three are Canada (226), Germany (212), and the UK (199). A vast collaboration network has formed in this field, with the United States, the UK, and Germany collaborating very closely (Figure 4).

Table 4. Top 10 countries contributing to the 100 most cited articles.

Figure 4. The country collaboration network on MPS.

3.6 Analysis of institution

A total of 234 institutions contributed to the one hundred articles. Table 5 shows the top 10 institutions contributing seven or more articles, six of which are from the United States. The largest contributor was the Royal Manchester Children’s Hospital (N = 20) from the UK, followed by the University of North Carolina with 18 articles, and BioMarin Pharmaceutical Inc. and the University of Minnesota with 13 articles each. Led by the top institutions, institutions collaborated extensively and closely, forming a broad collaborative network (Figure 5).

Table 5. Institutions contributing to the 100 most cited articles.

Figure 5. The institution collaboration network on MPS.

3.7 Analysis of author

A total of 557 authors contributed to the 100 articles, and Table 6 shows the top 10 contributors. Muenzer J was the most prolific author, with 14 publications and 3,487 citations, focusing mainly on the treatment of MPS type II, i.e., Hunter syndrome (Muenzer et al., 2006; Muenzer et al., 2007; Muenzer, 2014). He was followed by Harmatz P (N = 12) and Hopwood JJ (N = 12). Muenzer J and Harmatz P are both from the United States, and Hopwood JJ is from Australia. Author collaborations were visualized with VOSviewer; after several plotting iterations, the minimum number of author appearances was set to four (Figure 6). Most researchers do not appear in the graph because they authored fewer than four of these articles. The nodes in the graph represent authors, and the larger the node, the more articles that author published. Extensive collaboration exists between most of the top authors.

Table 6. The top 10 authors most frequently appearing in publications.

Figure 6. The author collaboration network on MPS.
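The four-appearance threshold used for Figure 6 amounts to counting each author’s publications and dropping those below the cutoff, roughly as in this sketch (the author lists are hypothetical):

    from collections import Counter

    article_authors = [
        ["Muenzer J", "Harmatz P"],
        ["Muenzer J", "Hopwood JJ"],
        ["Muenzer J", "Harmatz P", "Hopwood JJ"],
        ["Muenzer J"],
    ]

    counts = Counter(a for authors in article_authors for a in authors)
    MIN_OCCURRENCES = 4                      # threshold used for the VOSviewer map
    visible = {a: n for a, n in counts.items() if n >= MIN_OCCURRENCES}
    print(visible)                           # authors with fewer than four articles are dropped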

5 Discussion

This study reviews clinical and research advances through bibliometric and visual mapping of the top 100 most cited articles in the field of MPS, with the expectation of providing new ideas to researchers. Molecular Genetics and Metabolism and Proceedings of the National Academy of Sciences of the United States of America published the highest number of papers, and the New England Journal of Medicine published the articles with the highest average number of citations. The United States was the most productive country. The Royal Manchester Children’s Hospital was the most influential institution. Muenzer J was the most prolific author, with 14 publications. This study found extensive and close collaboration between the top-ranked countries, institutions, and authors; analyzing these collaborative networks not only visualizes publication counts but also reflects the connections and the evolution of the field as a whole, and it can help us retrieve resources more efficiently (Liu et al., 2024).

Among the top 100 most cited articles, 27 concern MPS I (11 of them MPS IH), 11 MPS II, nine MPS III (four on MPS IIIA), five MPS IV (all on MPS IVA), eight MPS VI, three MPS VII, and one MPS IX; the remaining articles were not categorized by MPS type. MPS I is thus the most studied type among the 100 most cited articles. A possible reason is that MPS I is the most common type of MPS, with a higher prevalence than the other types, and therefore attracts more attention (Scott et al., 1995; Çelik et al., 2021). MPS II was the first MPS disease to be reported; its manifestations were described in detail by Dr. Hunter in 1917 (Hunter, 1917), hence the name Hunter syndrome. However, the treatment of MPS lagged by decades. In 1968, a study by Elizabeth Neufeld and colleagues first showed that MPS progression could be delayed or even halted by providing the deficient enzymes to MPS patients (Fratantoni et al., 1968). This result provided the framework for the modern treatment of MPS. Research in this field has been going on for more than a hundred years, yet its output is far smaller than the short burst of COVID-19-related research (Chen et al., 2021; Wang et al., 2023; Zhang et al., 2022). There are several possible reasons: MPS is a rare disease with a low incidence and thus may attract less research attention (Platt, 2018); limited public awareness of MPS may hamper research funding and promotion; and, because of the small number of MPS patients, few clinical data are available, which may limit the depth of research and the development of new treatments. Therefore, increasing understanding of MPS and investing more research resources are of great significance for improving the diagnosis and treatment of MPS patients.

Keywords are the condensed summary of an article, and keywords that frequently appear together are considered research hotspots in a field (Liu et al., 2024; Zhu and Zhang, 2021). The keyword co-occurrence analysis in this study showed that bone marrow transplantation, enzyme replacement therapy, lysosomal storage disease, Hurler syndrome, children, alpha-L-iduronidase, Hunter syndrome, and central nervous system were the research hotspots in the field of MPS. These keywords mainly concern the classification and treatment of MPS. Lysosomal storage disease (LSD) is a group of inherited metabolic diseases comprising more than 70 disorders (Parenti et al., 2021), of which MPS is a subclass. The earliest attempts to treat LSD used bone marrow transplantation (BMT), also known as hematopoietic stem cell transplantation (HSCT), in patients with MPS I. The success of these attempts has allowed hundreds of patients to benefit from this treatment and extend their life expectancy (Aldenhoven et al., 2015; Rodgers et al., 2017; Taylor et al., 2019; Guffon et al., 2021). In addition, HSCT is effective in improving neurocognitive function; it is therefore still considered a first-line treatment for MPS IH, even though it requires frequent medical interventions and creates a substantial burden of disease (Taylor et al., 2019). The findings of long-term studies and the implementation of management guidelines on enzyme replacement therapy (ERT) suggest that patients with MPS derive multiple benefits from this treatment (Giugliani et al., 2007; Muenzer et al., 2009; Muenzer et al., 2011a; Muenzer et al., 2011b; Hendriksz et al., 2014; Hendriksz et al., 2016). The most cited review article (Wraith et al., 2008) and randomised controlled trial (RCT) (Muenzer et al., 2006) both reported the therapeutic effect of idursulfase replacement therapy for MPS II: a weekly infusion of idursulfase (0.5 mg/kg) significantly increased walking distance, improved lung function, increased elbow range of motion, reduced urinary GAG levels, and reduced organ size in patients with MPS II. However, conventional idursulfase does not cross the blood-brain barrier and may not improve CNS dysfunction in patients with severe MPS II. Therefore, a new generation of ERT has been developed and studied to overcome the inability of conventional ERT to reach the CNS, as described later.

Among the top 100 articles, 45 are basic studies and 31 are clinical studies. Basic research is the cornerstone of the biomedical field and underpins work on the etiology, pathogenesis, and treatment of MPS (Baldiotti et al., 2021). The most cited basic study (Snyder et al., 1995), published in Nature in 1995, transplanted β-glucuronidase-expressing neural progenitor cells into the ventricles of neonatal MPS VII mice and showed that lysosomal storage was significantly reduced or absent in both neurons and glial cells of treated mice compared with untreated controls. This provided a model for using neural progenitor cells to deliver other foreign genes or factors to the CNS. Recent basic studies have shed light on the links between storage-related substances, lysosomal dysfunction, innate immune activation, and the hyperinflammation that aggravates MPS symptoms, and these mechanisms could be important targets for new therapies (Kendall and Holian, 2021; Tillo et al., 2022; Xu and Núñez, 2023).

The United States was the most prolific country, publishing 68 percent of the highly cited articles, followed by the UK, Germany, and France; the majority of the top 100 articles came from Europe and the United States, with only one from Asia. The contribution of the United States is reported to be influential not only in MPS but also in other fields such as orthopedics. On the one hand, the United States has many top academic institutions and researchers (Adnan and Ullah, 2018); on the other, it provides strong support and more funding for academic activities (Bullock et al., 2018; da Costa Rosa et al., 2022), giving academic research a solid foundation. Canada, the UK, and Germany are all important research partners, highly productive in the field and forming a close-knit collaborative network among themselves. However, when the metric is changed to average citations per article, the top three countries become Canada, Germany, and the UK. One reason may be that all three collaborate closely with the United States, increasing the visibility and dissemination of their research and thus its citations (Sugimoto et al., 2017; Chinchilla-Rodríguez et al., 2019). A second reason may be that they publish fewer articles in more focused areas, making each article more likely to be cited. The larger research community in the United States may spread citations across many articles, reducing the average citation rate. Although the average citation count per article is higher for these countries, this does not necessarily reflect a country’s overall research output or impact; we therefore need to consider a variety of indicators when evaluating.

In general, rare diseases rarely attract the attention of pharmaceutical companies because of small patient numbers, complex conditions, and high research and development costs (Platt, 2018). Interestingly, however, some pharmaceutical companies were among the most productive institutions in our study, such as Genzyme Corp. and BioMarin Pharmaceutical Inc., both of which specialize in developing drugs for rare diseases. Genzyme Corp. developed the first biological therapy for an LSD, enzyme replacement therapy for Gaucher disease type 1 (Brady, 2006). This achievement of academic and commercial co-creation yielded promising clinical results and improved outcomes for patients with Gaucher disease. Since then, Genzyme has focused on rare diseases, developing enzyme replacement drugs to improve the quality of life of patients with LSD (Clarke et al., 2009; Muenzer et al., 2011a). Genzyme’s product Aldurazyme™ significantly improves respiratory function and joint movement in patients with MPS I, reduces glycosaminoglycan accumulation, and has a good safety profile. Naglazyme™, developed by BioMarin, significantly improves joint movement, valvular heart disease, and scoliosis in patients (McGill et al., 2010). The development of medicines therefore cannot proceed without the active involvement of pharmaceutical companies.

The impact factor represents the frequency with which a journal has been cited over a period of time and is an important measure of a journal’s academic impact (Mainwaring et al., 2020). The journal with the highest impact factor in this study was the New England Journal of Medicine, with an IF of 158.5; the second and third ranked journals were Nature Medicine and Nature, with IFs of 82.9 and 64.8, respectively. These top journals attract a large number of high-quality papers, whose publication further increases the journals’ academic impact (Callaham et al., 2002). Interestingly, Molecular Genetics and Metabolism (N = 8), one of the journals publishing the most of these highly cited papers, has an IF of only 3.8, which shows that even low-IF journals can carry highly cited papers and that we should attend to the quality of the papers and the value of the research itself as the real contribution to the field (Duan et al., 2022). In addition, the lower IFs of journals focusing exclusively on metabolic diseases may reflect the smaller population studying these rare diseases; a lower IF thus does not diminish the importance of journals such as Molecular Genetics and Metabolism for metabolic diseases.

The number of citations of an article is related to multiple factors, such as the journal’s IF, publication time, and accessibility (Zhu et al., 2021). Typically, the IF reflects the quality and impact of a journal’s articles (Karsan et al., 2019). The most cited article in this study was published in the New England Journal of Medicine; in addition to the importance of its results, the journal’s IF may also contribute to its high citation count. Our analysis found no significant correlation between total citations and article age; that is, articles published later may still receive many citations, similar to the findings of Zhu et al. (2021). An article by Khan et al. published in 2017 (Khan et al., 2017) appeared relatively recently yet ranks third in average annual citations (N = 25), indicating that it has played an important guiding role in the field and can be expected to become a high-impact article in the future. In addition, subscription journals may attract fewer citations than open-access journals, because some readers are unwilling to pay for access and instead look for similar articles in open-access journals.

In recent years, with the joint efforts of scholars all over the world, some promising treatments have emerged in the field of MPS.

Pabinafusp alfa (JR-141), a novel ERT drug developed in Japan, can cross the blood-brain barrier through transferrin receptor-mediated transcytosis; it has shown positive results in clinical trials and has been approved for marketing (Giugliani et al., 2021; Okuyama et al., 2021). These studies showed a significant reduction in GAG accumulation in the cerebrospinal fluid of patients with MPS II, indicating successful delivery of pabinafusp alfa with favorable clinical outcomes. This is potentially valuable for patients with MPS accompanied by CNS disease.

In vivo gene therapy is a promising option. The safety and tolerability of intracerebral administration of AAVrh.10 vectors carrying the human SGSH gene with the PGK promoter have been demonstrated in four patients with MPS IIIA (Tardieu et al., 2014). In addition, Regenxbio announced that a pivotal trial of RGX-121 for the treatment of MPS II achieved its primary endpoint, with patients showing reductions in cerebrospinal fluid biomarkers to below maximum attenuated disease levels (p = 0.00016) (Regenxbio, 2024). RGX-121 had previously been reported to consistently reduce GAGs in CSF (Regenxbio, 2023), with some patients still benefiting up to 3 years later (Regenxbio, 2023). Ex vivo hematopoietic stem cell gene therapy (HSCGT) also has great potential in the treatment of MPS disorders, has proven revolutionary in similar lysosomal disorders, and is currently in several clinical trials (Wood and Bigger, 2022).

Several immunomodulatory drugs have also been used in the treatment of MPS and are promising. In 2017, Polgreen et al. conducted a clinical study of adalimumab (a human monoclonal antibody that blocks TNF-α), which showed that adalimumab may help reduce pain and improve physical and neurological function in patients with MPS I and II (Polgreen et al., 2017). Anakinra, a recombinant, non-glycosylated human interleukin-1 receptor antagonist, can improve neurocognitive symptoms when used in MPS III patients (NCT04018755). Resveratrol is a natural phenolic compound and phytoalexin, and Rintz et al. demonstrated that long-term continuous administration of 50 mg/kg/day of resveratrol improved neurological symptoms and reduced urinary GAG levels in a mouse model of MPS IIIB (Rintz et al., 2023). These treatments hold great promise, and scholars can explore them further on the basis of the above results.

Increased knowledge of the pathophysiology and natural history of MPS, together with therapeutic modalities such as HSCT and ERT, has improved survival and reduced morbidity. However, some issues still need to be addressed, such as the safety of gene therapy, expensive treatments, and bone deformity (Donati et al., 2018). In addition to new treatments, diagnosis should be made earlier; for example, newborn screening for MPS is increasingly being implemented. Before that, however, more comprehensive epidemiologic investigations of MPS patients are needed to provide a basis for determining appropriate newborn screening methods. If managed appropriately, this should lead to earlier initiation of treatment and better outcomes. We also hope that more attention and resources will be devoted to research on MPS and other rare diseases to give patients longer and better lives.

6 Limitations

This study has several limitations. First, the data came from the WoS Core Collection, and articles in other databases, such as PubMed and Scopus, were not searched, which may have led to some research results being missed. Second, the citation counts did not exclude self-citations, which may bias the results, with some high-impact articles receiving fewer counted citations. Some articles may also have been cited more often simply because they have been available for a longer period, which does not reflect their quality. Third, the quality of the top 100 articles was not assessed, so articles of varying quality may affect the interpretation of the results. Last and most importantly, although we reviewed the articles in this field, we did not include influential or highly cited papers published in the last 5 years, so new developments are not reflected in this article; we will analyze the latest developments in a subsequent article.

7 Conclusion

We conducted a bibliometric and visual analysis of the top 100 cited articles in MPS, a rich and promising area of research. This study identifies the most influential articles currently available in the field of MPS, which provides a good basis for a better understanding of the disease and informs future research directions.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the patients/participants or their legal guardians/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author contributions

RL: Data curation, Writing–original draft. RG: Data curation, Writing–original draft. YY: Data curation, Software, Writing–original draft. YX: Data curation, Writing–original draft. LiC: Supervision, Writing–review and editing. LaC: Resources, Writing–review and editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Adnan, S., and Ullah, R. (2018). Top-cited articles in regenerative endodontics: a bibliometric analysis. J. Endod. 44 (11), 1650–1664. doi:10.1016/j.joen.2018.07.015


Aldenhoven, M., Wynn, R. F., Orchard, P. J., O'Meara, A., Veys, P., Fischer, A., et al. (2015). Long-term outcome of Hurler syndrome patients after hematopoietic cell transplantation: an international multicenter study. Blood 125 (13), 2164–2172. doi:10.1182/blood-2014-11-608075

Baldiotti, A. L. P., Amaral-Freitas, G., Barcelos, J. F., Freire-Maia, J., Perazzo, M. F., Freire-Maia, F. B., et al. (2021). The top 100 most-cited papers in cariology: a bibliometric analysis. Caries Res. 55 (1), 32–40. doi:10.1159/000509862

Brady, R. O. (2006). Enzyme replacement for lysosomal diseases. Annu. Rev. Med. 57, 283–296. doi:10.1146/annurev.med.57.110104.115650

Bullock, N., Ellul, T., Bennett, A., Steggall, M., and Brown, G. (2018). The 100 most influential manuscripts in andrology: a bibliometric analysis. Basic Clin. Androl. 28, 15. doi:10.1186/s12610-018-0080-4

Callaham, M., Wears, R. L., and Weber, E. (2002). Journal prestige, publication bias, and other characteristics associated with citation of published studies in peer-reviewed journals. JAMA 287 (21), 2847–2850. doi:10.1001/jama.287.21.2847

Çelik, B., Tomatsu, S. C., Tomatsu, S., and Khan, S. A. (2021). Epidemiology of mucopolysaccharidoses update. Diagn. (Basel) 11 (2), 273. doi:10.3390/diagnostics11020273


Chen, L., Wan, Y., Yang, T., Zhang, Q., Zeng, Y., Zheng, S., et al. (2023). Bibliometric and visual analysis of single-cell sequencing from 2010 to 2022. Front. Genet. 14, 1285599. doi:10.3389/fgene.2023.1285599

Chen, Y., Zhang, X., Chen, S., Zhang, Y., Wang, Y., Lu, Q., et al. (2021). Bibliometric analysis of mental health during the COVID-19 pandemic. Asian J. Psychiatr. 65, 102846. doi:10.1016/j.ajp.2021.102846

Chinchilla-Rodríguez, Z., Sugimoto, C. R., and Larivière, V. (2019). Follow the leader: on the relationship between leadership and scholarly impact in international collaborations. PLoS One 14 (6), e0218309. doi:10.1371/journal.pone.0218309

Clarke, L. A. (2011). Pathogenesis of skeletal and connective tissue involvement in the mucopolysaccharidoses: glycosaminoglycan storage is merely the instigator. Rheumatol. Oxf. 50 (5), v13–v18. doi:10.1093/rheumatology/ker395

Clarke, L. A., Wraith, J. E., Beck, M., Kolodny, E. H., Pastores, G. M., Muenzer, J., et al. (2009). Long-term efficacy and safety of laronidase in the treatment of mucopolysaccharidosis I. Pediatrics 123 (1), 229–240. doi:10.1542/peds.2007-3847

da Costa Rosa, T., Pintor, A. V. B., Magno, M. B., Marañón-Vásquez, G. A., Maia, L. C., and Neves, A. A. (2022). Worldwide trends on molar incisor and deciduous molar hypomineralisation research: a bibliometric analysis over a 19-year period. Eur. Arch. Paediatr. Dent. 23 (1), 133–146. doi:10.1007/s40368-021-00676-5

Donati, M. A., Pasquini, E., Spada, M., Polo, G., and Burlina, A. (2018). Newborn screening in mucopolysaccharidoses. Ital. J. Pediatr. 44 (2), 126. doi:10.1186/s13052-018-0552-3

Duan, S. L., Qi, L., Li, M. H., Liu, L. F., Wang, Y., and Guan, X. (2022). The top 100 most-cited papers in pheochromocytomas and paragangliomas: a bibliometric study. Front. Oncol. 12, 993921. doi:10.3389/fonc.2022.993921

Fratantoni, J. C., Hall, C. W., and Neufeld, E. F. (1968). Hurler and Hunter syndromes: mutual correction of the defect in cultured fibroblasts. Science 162 (3853), 570–572. doi:10.1126/science.162.3853.570

Gao, Y., Shi, S., Ma, W., Chen, J., Cai, Y., Ge, L., et al. (2019). Bibliometric analysis of global research on PD-1 and PD-L1 in the field of cancer. Int. Immunopharmacol. 72, 374–384. doi:10.1016/j.intimp.2019.03.045

Giugliani, R., Harmatz, P., and Wraith, J. E. (2007). Management guidelines for mucopolysaccharidosis VI. Pediatrics 120 (2), 405–418. doi:10.1542/peds.2006-2184

Giugliani, R., Martins, A. M., So, S., Yamamoto, T., Yamaoka, M., Ikeda, T., et al. (2021). Iduronate-2-sulfatase fused with anti-hTfR antibody, pabinafusp alfa, for MPS-II: a phase 2 trial in Brazil. Mol. Ther. 29 (7), 2378–2386. doi:10.1016/j.ymthe.2021.03.019

Guffon, N., Pettazzoni, M., Pangaud, N., Garin, C., Lina-Granade, G., Plault, C., et al. (2021). Long term disease burden post-transplantation: three decades of observations in 25 Hurler patients successfully treated with hematopoietic stem cell transplantation (HSCT). Orphanet J. Rare Dis. 16 (1), 60. doi:10.1186/s13023-020-01644-w

Hendriksz, C. J., Burton, B., Fleming, T. R., Harmatz, P., Hughes, D., Jones, S. A., et al. (2014). Efficacy and safety of enzyme replacement therapy with BMN 110 (elosulfase alfa) for Morquio A syndrome (mucopolysaccharidosis IVA): a phase 3 randomised placebo-controlled study. J. Inherit. Metab. Dis. 37 (6), 979–990. doi:10.1007/s10545-014-9715-6

Hendriksz, C. J., Parini, R., AlSayed, M. D., Raiman, J., Giugliani, R., Solano Villarreal, M. L., et al. (2016). Long-term endurance and safety of elosulfase alfa enzyme replacement therapy in patients with Morquio A syndrome. Mol. Genet. Metab. 119 (1-2), 131–143. doi:10.1016/j.ymgme.2016.05.018

Huang, Y., Chen, P., Peng, B., Liao, R., Huang, H., Huang, M., et al. (2023). The top 100 most cited articles on triple-negative breast cancer: a bibliometric analysis. Clin. Exp. Med. 23 (2), 175–201. doi:10.1007/s10238-022-00800-9

Hunter, C. (1917). A rare disease in two brothers. Proc. R. Soc. Med. 10, 104–116. doi:10.1177/003591571701001833

Karsan, R. B., Powell, A. G., Nanjaiah, P., Mehta, D., and Valtzoglou, V. (2019). The top 100 manuscripts in emergency cardiac surgery. Potential role in cardiothoracic training. A bibliometric analysis. Ann. Med. Surg. (Lond) 43, 5–12. doi:10.1016/j.amsu.2019.05.002

Karslı, B., and Tekin, S. B. (2021). The top 100 most-cited articles on ankle arthroscopy: bibliometric analysis. J. Foot Ankle Surg. 60 (3), 477–481. doi:10.1053/j.jfas.2020.08.028

Kendall, R. L., and Holian, A. (2021). The role of lysosomal ion channels in lysosome dysfunction. Inhal. Toxicol. 33 (2), 41–54. doi:10.1080/08958378.2021.1876188

Khan, S. A., Peracha, H., Ballhausen, D., Wiesbauer, A., Rohrbach, M., Gautschi, M., et al. (2017). Epidemiology of mucopolysaccharidoses. Mol. Genet. Metab. 121 (3), 227–240. doi:10.1016/j.ymgme.2017.05.016

Kobayashi, H. (2019). Recent trends in mucopolysaccharidosis research. J. Hum. Genet. 64 (2), 127–137. doi:10.1038/s10038-018-0534-8

Kreutzer, J. S., Agyemang, A. A., Weedon, D., Zasler, N., Oliver, M., Sorensen, A. A., et al. (2017). The top 100 cited neurorehabilitation papers. NeuroRehabilitation 40 (2), 163–174. doi:10.3233/NRE-161415

Liu, P. C., Lu, Y., Lin, H. H., Yao, Y. C., Wang, S. T., Chang, M. C., et al. (2022). Classification and citation analysis of the 100 top-cited articles on adult spinal deformity since 2011: a bibliometric analysis. J. Chin. Med. Assoc. 85 (3), 401–408. doi:10.1097/jcma.0000000000000642

Liu, R., Peng, B., Yuan, J., Hu, J., Yang, J., Shan, N., et al. (2024). Research on stem cell therapy for spinal cord injury: a bibliometric and visual analysis from 2018-2023. Front. Genet. 15, 1327216. doi:10.3389/fgene.2024.1327216

Mainwaring, A., Bullock, N., Ellul, T., Hughes, O., and Featherstone, J. (2020). The top 100 most cited manuscripts in bladder cancer: a bibliometric analysis (review article). Int. J. Surg. 75, 130–138. doi:10.1016/j.ijsu.2020.01.128

McGill, J. J., Inwood, A. C., Coman, D. J., Lipke, M. L., de Lore, D., Swiedler, S. J., et al. (2010). Enzyme replacement therapy for mucopolysaccharidosis VI from 8 weeks of age--a sibling control study. Clin. Genet. 77 (5), 492–498. doi:10.1111/j.1399-0004.2009.01324.x

Muenzer, J. (2014). Early initiation of enzyme replacement therapy for the mucopolysaccharidoses. Mol. Genet. Metab. 111 (2), 63–72. doi:10.1016/j.ymgme.2013.11.015

Muenzer, J., Beck, M., Eng, C. M., Giugliani, R., Harmatz, P., Martin, R., et al. (2011a). Long-term, open-labeled extension study of idursulfase in the treatment of Hunter syndrome. Genet. Med. 13 (2), 95–101. doi:10.1097/GIM.0b013e3181fea459

Muenzer, J., Beck, M., Giugliani, R., Suzuki, Y., Tylki-Szymanska, A., Valayannopoulos, V., et al. (2011b). Idursulfase treatment of Hunter syndrome in children younger than 6 years: results from the Hunter Outcome Survey. Genet. Med. 13 (2), 102–109. doi:10.1097/GIM.0b013e318206786f

Muenzer, J., Gucsavas-Calikoglu, M., McCandless, S. E., Schuetz, T. J., and Kimura, A. (2007). A phase I/II clinical trial of enzyme replacement therapy in mucopolysaccharidosis II (Hunter syndrome). Mol. Genet. Metab. 90 (3), 329–337. doi:10.1016/j.ymgme.2006.09.001

Muenzer, J., Wraith, J. E., Beck, M., Giugliani, R., Harmatz, P., Eng, C. M., et al. (2006). A phase II/III clinical study of enzyme replacement therapy with idursulfase in mucopolysaccharidosis II (Hunter syndrome). Genet. Med. 8 (8), 465–473. doi:10.1097/01.gim.0000232477.37660.fb

Muenzer, J., Wraith, J. E., Clarke, L. A., and the International Consensus Panel on Management and Treatment of Mucopolysaccharidosis I (2009). Mucopolysaccharidosis I: management and treatment guidelines. Pediatrics 123 (1), 19–29. doi:10.1542/peds.2008-0416

Nagpal, R., Goyal, R. B., Priyadarshini, K., Kashyap, S., Sharma, M., Sinha, R., et al. (2022). Mucopolysaccharidosis: a broad review. Indian J. Ophthalmol. 70 (7), 2249–2261. doi:10.4103/ijo.IJO_425_22

Okuyama, T., Eto, Y., Sakai, N., Nakamura, K., Yamamoto, T., Yamaoka, M., et al. (2021). A phase 2/3 trial of pabinafusp alfa, IDS fused with anti-human transferrin receptor antibody, targeting neurodegeneration in MPS-II. Mol. Ther. 29 (2), 671–679. doi:10.1016/j.ymthe.2020.09.039

Parenti, G., Medina, D. L., and Ballabio, A. (2021). The rapidly evolving view of lysosomal storage diseases. EMBO Mol. Med. 13 (2), e12836. doi:10.15252/emmm.202012836

Pinto, R., Caseiro, C., Lemos, M., Lopes, L., Fontes, A., Ribeiro, H., et al. (2004). Prevalence of lysosomal storage diseases in Portugal. Eur. J. Hum. Genet. 12 (2), 87–92. doi:10.1038/sj.ejhg.5201044

Platt, F. M. (2018). Emptying the stores: lysosomal diseases and therapeutic strategies. Nat. Rev. Drug Discov. 17 (2), 133–150. doi:10.1038/nrd.2017.214

Polgreen, L. E., Kunin-Batson, A., Rudser, K., Vehe, R. K., Utz, J. J., Whitley, C. B., et al. (2017). Pilot study of the safety and effect of adalimumab on pain, physical function, and musculoskeletal disease in mucopolysaccharidosis types I and II. Mol. Genet. Metab. Rep. 10, 75–80. doi:10.1016/j.ymgmr.2017.01.002

Puckett, Y., Mallorga-Hernández, A., and Montaño, A. M. (2021). Epidemiology of mucopolysaccharidoses (MPS) in United States: challenges and opportunities. Orphanet J. Rare Dis. 16 (1), 241. doi:10.1186/s13023-021-01880-8

Regenxbio (2023). Additional positive interim data from phase I/II/III CAMPSIITE™ trial of REGENXBIO's RGX-121 for the treatment of MPS II (hunter syndrome) presented at 19th annual WORLDSymposiumTM. Avaliable at: https://regenxbio.gcs-web.com/news-releases/news-release-details/additional-positive-interim-data-phase-iiiiii-campsiitetm-trial .

Google Scholar

Regenxbio (2024). REGENXBIO announces pivotal trial of RGX-121 for the treatment of MPS II achieves primary endpoint. Avaliable at: https://www.prnewswire.com/news-releases/regenxbio-announces-pivotal-trial-of-rgx-121-for-the-treatment-of-mps-ii-achieves-primary-endpoint-302056283.html .

Rintz, E., Podlacha, M., Cyske, Z., Pierzynowska, K., Węgrzyn, G., and Gaffke, L. (2023). Activities of (Poly)phenolic antioxidants and other natural autophagy modulators in the treatment of sanfilippo disease: remarkable efficacy of resveratrol in cellular and animal models. Neurotherapeutics 20 (1), 254–271. doi:10.1007/s13311-022-01323-7

Rodgers, N. J., Kaizer, A. M., Miller, W. P., Rudser, K. D., Orchard, P. J., and Braunlin, E. A. (2017). Mortality after hematopoietic stem cell transplantation for severe mucopolysaccharidosis type I: the 30-year University of Minnesota experience. J. Inherit. Metab. Dis. 40 (2), 271–280. doi:10.1007/s10545-016-0006-2

Scott, H. S., Bunge, S., Gal, A., Clarke, L. A., Morris, C. P., and Hopwood, J. J. (1995). Molecular genetics of mucopolysaccharidosis type I: diagnostic, clinical, and biological implications. Hum. Mutat. 6 (4), 288–302. doi:10.1002/humu.1380060403

Shapiro, E. G., and Eisengart, J. B. (2021). The natural history of neurocognition in MPS disorders: a review. Mol. Genet. Metab. 133 (1), 8–34. doi:10.1016/j.ymgme.2021.03.002

Snyder, E. Y., Taylor, R. M., and Wolfe, J. H. (1995). Neural progenitor cell engraftment corrects lysosomal storage throughout the MPS VII mouse brain. Nature 374 (6520), 367–370. doi:10.1038/374367a0

Sugimoto, C. R., Robinson-Garcia, N., Murray, D. S., Yegros-Yegros, A., Costas, R., and Larivière, V. (2017). Scientists have most impact when they're free to move. Nature 550 (7674), 29–31. doi:10.1038/550029a

Tardieu, M., Zérah, M., Husson, B., de Bournonville, S., Deiva, K., Adamsbaum, C., et al. (2014). Intracerebral administration of adeno-associated viral vector serotype rh.10 carrying human SGSH and SUMF1 cDNAs in children with mucopolysaccharidosis type IIIA disease: results of a phase I/II trial. Hum. Gene Ther. 25 (6), 506–516. doi:10.1089/hum.2013.238

Taylor, M., Khan, S., Stapleton, M., Wang, J., Chen, J., Wynn, R., et al. (2019). Hematopoietic stem cell transplantation for mucopolysaccharidoses: past, present, and future. Biol. Blood Marrow Transpl. 25 (7), e226–e246. doi:10.1016/j.bbmt.2019.02.012

Tillo, M., Lamanna, W. C., Dwyer, C. A., Sandoval, D. R., Pessentheiner, A. R., Al-Azzam, N., et al. (2022). Impaired mitophagy in Sanfilippo a mice causes hypertriglyceridemia and brown adipose tissue activation. J. Biol. Chem. 298 (8), 102159. doi:10.1016/j.jbc.2022.102159

Wang, W., Wang, H., Yao, T., Li, Y., Yi, L., Gao, Y., et al. (2023). The top 100 most cited articles on COVID-19 vaccine: a bibliometric analysis. Clin. Exp. Med. 23 (6), 2287–2299. doi:10.1007/s10238-023-01046-9

Wood, S. R., and Bigger, B. W. (2022). Delivering gene therapy for mucopolysaccharide diseases. Front. Mol. Biosci. 9, 965089. doi:10.3389/fmolb.2022.965089

Wraith, J. E., Scarpa, M., Beck, M., Bodamer, O. A., De Meirleir, L., Guffon, N., et al. (2008). Mucopolysaccharidosis type II (Hunter syndrome): a clinical review and recommendations for treatment in the era of enzyme replacement therapy. Eur. J. Pediatr. 167 (3), 267–277. doi:10.1007/s00431-007-0635-4

Xu, J., and Núñez, G. (2023). The NLRP3 inflammasome: activation and regulation. Trends Biochem. Sci. 48 (4), 331–344. doi:10.1016/j.tibs.2022.10.002

Zhang, Y., Hu, M., Wang, J., Wang, P., Shi, P., Zhao, W., et al. (2022). A bibliometric analysis of personal protective equipment and COVID-19 researches. Front. Public Health 10, 855633. doi:10.3389/fpubh.2022.855633

Zhang, Y., Rong, L., Wang, Z., and Zhao, H. (2023). The top 100 most cited articles in helical tomotherapy: a scoping review. Front. Oncol. 13, 1274290. doi:10.3389/fonc.2023.1274290

Zhou, J. J., Koltz, M. T., Agarwal, N., Tempel, Z. J., Kanter, A. S., Okonkwo, D. O., et al. (2017). 100 most influential publications in scoliosis surgery. Spine (Phila Pa 1976) 42 (5), 336–344. doi:10.1097/BRS.0000000000001860

Zhou, Y., Liu, M., Huang, X., Liu, Z., Sun, Y., Wang, M., et al. (2023). Emerging trends and thematic evolution of immunotherapy for glioma based on the top 100 cited articles. Front. Oncol. 13, 1307924. doi:10.3389/fonc.2023.1307924

Zhu, H., and Zhang, Z. (2021). Emerging trends and research foci in cataract genes: a bibliometric and visualized study. Front. Genet. 12, 610728. doi:10.3389/fgene.2021.610728

Zhu, Y., Zhang, C., Wang, J., Xie, Y., Wang, L., and Xu, F. (2021). The top 100 highly cited articles on anterior cruciate ligament from 2000 to 2019: a bibliometric and visualized analysis. Orthop. Traumatol. Surg. Res. 107 (8), 102988. doi:10.1016/j.otsr.2021.102988

Keywords: mucopolysaccharidoses, lysosomal storage disease, MPS, bibliometric analysis, VOSviewer

Citation: Liao R, Geng R, Yang Y, Xue Y, Chen L and Chen L (2024) The top 100 most cited articles on mucopolysaccharidoses: a bibliometric analysis. Front. Genet. 15:1377743. doi: 10.3389/fgene.2024.1377743

Received: 28 January 2024; Accepted: 29 March 2024; Published: 12 April 2024.

Reviewed by:

Copyright © 2024 Liao, Geng, Yang, Xue, Chen and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lan Chen, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


Types of Bias in Research | Definition & Examples

Research bias results from any deviation from the truth, causing distorted results and wrong conclusions. Bias can occur at any phase of your research, including during data collection, data analysis, interpretation, or publication. Research bias can occur in both qualitative and quantitative research.

Understanding research bias is important for several reasons.

  • Bias exists in all research, across research designs, and is difficult to eliminate.
  • Bias can occur at any stage of the research process.
  • Bias impacts the validity and reliability of your findings, leading to misinterpretation of data.

It is almost impossible to conduct a study without some degree of research bias. It’s crucial for you to be aware of the potential types of bias, so you can minimize them.

For example, suppose you are evaluating a weight-loss program. The measured success rate of the program will likely be affected if participants start to drop out (attrition). Participants who become disillusioned due to not losing weight may drop out, while those who succeed in losing weight are more likely to continue. This in turn may bias the findings towards more favorable results.

Table of contents

  • Information bias
  • Interviewer bias
  • Publication bias
  • Researcher bias
  • Response bias
  • Selection bias
  • Cognitive bias
  • How to avoid bias in research
  • Other types of research bias
  • Frequently asked questions about research bias

Information bias

Information bias, also called measurement bias, arises when key study variables are inaccurately measured or classified. Information bias occurs during the data collection step and is common in research studies that involve self-reporting and retrospective data collection. It can also result from poor interviewing techniques or differing levels of recall from participants.

The main types of information bias are:

  • Recall bias
  • Observer bias
  • Performance bias
  • Regression to the mean (RTM)

For example, suppose you are studying the relationship between smartphone use and physical symptoms in students. Over a period of four weeks, you ask students to keep a journal, noting how much time they spent on their smartphones along with any symptoms like muscle twitches, aches, or fatigue.

Recall bias is a type of information bias. It occurs when respondents are asked to recall events in the past and is common in studies that involve self-reporting.

As a rule of thumb, infrequent events (e.g., buying a house or a car) will be memorable for longer periods of time than routine events (e.g., daily use of public transportation). You can reduce recall bias by running a pilot survey and carefully testing recall periods. If possible, test both shorter and longer periods, checking for differences in recall.

Suppose you are researching whether diet in early childhood is related to developing childhood cancer. You recruit two groups of participants:

  • A group of children who have been diagnosed, called the case group
  • A group of children who have not been diagnosed, called the control group

You then ask the parents in both groups what their children generally ate in their first years of life. Since the parents are being asked to recall what their children generally ate over a period of several years, there is high potential for recall bias in the case group.

The best way to reduce recall bias is by ensuring your control group will have similar levels of recall bias to your case group. Parents of children who have childhood cancer, which is a serious health problem, are likely to be quite concerned about what may have contributed to the cancer.

Thus, if asked by researchers, these parents are likely to think very hard about what their child ate or did not eat in their first years of life. Parents of children with other serious health problems (aside from cancer) are also likely to be quite concerned about any diet-related questions that researchers ask.

Observer bias

Observer bias is the tendency of researchers to see what they expect or want to see, rather than what is actually occurring. Observer bias can affect the results in observational and experimental studies, where subjective judgment (such as assessing a medical image) or measurement (such as rounding blood pressure readings up or down) is part of the data collection process.

Observer bias leads to over- or underestimation of true values, which in turn compromises the validity of your findings. You can reduce observer bias by using double-blinded and single-blinded research methods.

For example, suppose you and a colleague are observing how medical staff in a hospital ward exchange patient information. Based on discussions you had with other researchers before starting your observations, you are inclined to think that medical staff tend to simply call each other when they need specific patient details or have questions about treatments.

At the end of the observation period, you compare notes with your colleague. Your conclusion was that medical staff tend to favor phone calls when seeking information, while your colleague noted down that medical staff mostly rely on face-to-face discussions. Seeing that your expectations may have influenced your observations, you and your colleague decide to conduct semi-structured interviews with medical staff to clarify the observed events. Note: Observer bias and actor–observer bias are not the same thing.
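Returning to the example above: a simple first check for observer bias is to quantify how often the two observers' records actually agree. Below is a minimal Python sketch (the codings and labels are hypothetical; in practice you would also use a chance-corrected statistic such as Cohen's kappa):

```python
# Two observers independently coded the same six communication events
# as either a phone call ("phone") or a face-to-face discussion ("face").
obs_a = ["phone", "phone", "face", "phone", "face", "phone"]
obs_b = ["face", "phone", "face", "face", "face", "phone"]

# Raw percent agreement: the fraction of events both observers coded identically.
agree = sum(a == b for a, b in zip(obs_a, obs_b))
print(f"Agreement: {agree}/{len(obs_a)} = {agree / len(obs_a):.0%}")

# Low agreement is a warning sign that at least one observer's
# expectations may be coloring what they record.
```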

Performance bias is unequal care between study groups. Performance bias occurs mainly in medical research experiments, if participants have knowledge of the planned intervention, therapy, or drug trial before it begins.

Studies about nutrition, exercise outcomes, or surgical interventions are very susceptible to this type of bias. It can be minimized by using blinding, which prevents participants and/or researchers from knowing who is in the control or treatment groups. If blinding is not possible, then using objective outcomes (such as hospital admission data) is the best approach.

When the subjects of an experimental study change or improve their behavior because they are aware they are being studied, this is called the Hawthorne effect (or observer effect). Similarly, the John Henry effect occurs when members of a control group are aware they are being compared to the experimental group. This causes them to alter their behavior in an effort to compensate for their perceived disadvantage.

Regression to the mean (RTM) is a statistical phenomenon that refers to the fact that a variable that shows an extreme value on its first measurement will tend to be closer to the center of its distribution on a second measurement.

Medical research is particularly sensitive to RTM. Here, interventions aimed at a group or a characteristic that is very different from the average (e.g., people with high blood pressure) will appear to be successful because of the regression to the mean. This can lead researchers to misinterpret results, describing a specific intervention as causal when the change in the extreme groups would have happened anyway.

For example, suppose you are testing a new treatment for people diagnosed with depression. In general, among people with depression, certain physical and mental characteristics have been observed to deviate from the population mean.

This could lead you to think that the intervention was effective when those treated showed improvement on measured post-treatment indicators, such as reduced severity of depressive episodes.

However, given that such characteristics deviate more from the population mean in people with depression than in people without depression, this improvement could be attributed to RTM.
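To see regression to the mean in action, consider the following simulation (a hypothetical sketch: the trait, sample size, and noise levels are all invented for illustration). With no intervention at all, a group selected for extreme scores drifts back toward the average when measured again:

```python
import random

random.seed(42)

# Each person's measurement = a stable true score + independent noise,
# so two measurements of the same person are only partially correlated.
true_scores = [random.gauss(0, 1) for _ in range(10_000)]
first = [t + random.gauss(0, 1) for t in true_scores]
second = [t + random.gauss(0, 1) for t in true_scores]

# Select the "extreme" group: the top 10% on the first measurement.
cutoff = sorted(first)[int(0.9 * len(first))]
extreme = [(f, s) for f, s in zip(first, second) if f >= cutoff]

mean_first = sum(f for f, _ in extreme) / len(extreme)
mean_second = sum(s for _, s in extreme) / len(extreme)

# The second mean sits noticeably closer to the population mean of 0,
# even though nothing was done to the group in between.
print(f"Extreme group, first measurement:  {mean_first:.2f}")
print(f"Extreme group, second measurement: {mean_second:.2f}")
```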

Interviewer bias

Interviewer bias stems from the person conducting the research study. It can result from the way they ask questions or react to responses, but also from any aspect of their identity, such as their sex, ethnicity, social class, or perceived attractiveness.

Interviewer bias distorts responses, especially when the characteristics relate in some way to the research topic. Interviewer bias can also affect the interviewer’s ability to establish rapport with the interviewees, causing them to feel less comfortable giving their honest opinions about sensitive or personal topics.

For example, suppose you are interviewing a participant about how they spend their free time:

Participant: "I like to solve puzzles, or sometimes do some gardening."

You: “I love gardening, too!”

In this case, seeing your enthusiastic reaction could lead the participant to talk more about gardening.

Establishing trust between you and your interviewees is crucial in order to ensure that they feel comfortable opening up and revealing their true thoughts and feelings. At the same time, being overly empathetic can influence the responses of your interviewees, as seen above.

Publication bias

Publication bias occurs when the decision to publish research findings is based on their nature or the direction of their results. Studies reporting results that are perceived as positive, statistically significant, or favoring the study hypotheses are more likely to be published due to publication bias.

Publication bias is related to data dredging (also called p-hacking), where statistical tests on a set of data are run until something statistically significant happens. As academic journals tend to prefer publishing statistically significant results, this can pressure researchers to only submit statistically significant results. P-hacking can also involve excluding participants or stopping data collection once a p value of 0.05 is reached. However, this leads to false positive results and an overrepresentation of positive results in published academic literature.
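A quick simulation sketches why this inflates the published record (all numbers here are illustrative, and the crude t statistic below is a stand-in for a proper significance test). Even when there is no true effect at all, roughly 5% of tests come out "significant" by chance; if only those get published, the literature misrepresents reality:

```python
import random
from statistics import mean, stdev

random.seed(1)

def null_experiment(n=30):
    """Compare two groups drawn from the SAME distribution:
    any 'effect' found here is pure noise."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    t = (mean(a) - mean(b)) / (pooled_sd * (2 / n) ** 0.5)
    return abs(t) > 2.0  # roughly p < 0.05 at this sample size

# Run 1,000 independent studies of a nonexistent effect.
false_positives = sum(null_experiment() for _ in range(1_000))
print(f"'Significant' results out of 1,000 null studies: {false_positives}")
# Expect around 50 (~5%): a journal that publishes only these would
# contain nothing but false positives.
```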

Researcher bias

Researcher bias occurs when the researcher's beliefs or expectations influence the research design or data collection process. Researcher bias can be deliberate (such as claiming that an intervention worked even if it didn't) or unconscious (such as letting personal feelings, stereotypes, or assumptions influence research questions).

The unconscious form of researcher bias is associated with the Pygmalion effect (or Rosenthal effect), where the researcher's high expectations (e.g., that patients assigned to a treatment group will succeed) lead to better performance and better outcomes.

Researcher bias is also sometimes called experimenter bias, but it applies to all types of investigative projects, rather than only to experimental designs.

Biased phrasing of questions is one common manifestation. For example, when studying attitudes toward alcohol consumption among young people:

  • Good question: What are your views on alcohol consumption among your peers?
  • Bad question: Do you think it's okay for young people to drink so much?

Response bias

Response bias is a general term used to describe a number of different situations where respondents tend to provide inaccurate or false answers to self-report questions, such as those asked on surveys or in structured interviews.

This happens because when people are asked a question (e.g., during an interview), they integrate multiple sources of information to generate their responses. Because of that, any aspect of a research study may potentially bias a respondent. Examples include the phrasing of questions in surveys, how participants perceive the researcher, or the desire of the participant to please the researcher and to provide socially desirable responses.

Response bias also occurs in experimental medical research. When outcomes are based on patients’ reports, a placebo effect can occur. Here, patients report an improvement despite having received a placebo, not an active medical treatment.

For example, while interviewing a student, you ask them:

"Do you think it's okay to cheat on an exam?"

Since cheating is widely disapproved of, the student may answer "no" regardless of their true views, simply to avoid appearing dishonest.

Common types of response bias are:

  • Acquiescence bias
  • Demand characteristics
  • Social desirability bias
  • Courtesy bias
  • Question-order bias
  • Extreme responding

Acquiescence bias is the tendency of respondents to agree with a statement when faced with binary response options like “agree/disagree,” “yes/no,” or “true/false.” Acquiescence is sometimes referred to as “yea-saying.”

This type of bias occurs either due to the participant’s personality (i.e., some people are more likely to agree with statements than disagree, regardless of their content) or because participants perceive the researcher as an expert and are more inclined to agree with the statements presented to them.

Q: Are you a social person?

  • Yes
  • No

People who are inclined to agree with statements presented to them are at risk of selecting the first option, even if it isn't fully supported by their lived experiences.

In order to control for acquiescence, consider tweaking your phrasing to encourage respondents to make a choice truly based on their preferences. Here’s an example:

Q: What would you prefer?

  • A quiet night in
  • A night out with friends

Demand characteristics are cues that could reveal the research agenda to participants, risking a change in their behaviors or views. Ensuring that participants are not aware of the research objectives is the best way to avoid this type of bias.

For example, suppose you interview patients about their pain levels at several points after an operation. On each occasion, patients reported their pain as being less than prior to the operation. While at face value this seems to suggest that the operation does indeed lead to less pain, there is a demand characteristic at play. During the interviews, the researcher would unconsciously frown whenever patients reported more post-op pain. This increased the risk of patients figuring out that the researcher was hoping that the operation would have an advantageous effect.

Social desirability bias is the tendency of participants to give responses that they believe will be viewed favorably by the researcher or other participants. It often affects studies that focus on sensitive topics, such as alcohol consumption or sexual behavior.

For example, suppose you are conducting face-to-face semi-structured interviews with a number of employees from different departments about workplace health. When asked whether they would be interested in a smoking cessation program, there was widespread enthusiasm for the idea. However, when the program was actually offered, enrollment turned out to be low: many respondents may have voiced interest simply because it seemed like the socially desirable answer.

Note that while social desirability and demand characteristics may sound similar, there is a key difference between them. Social desirability is about conforming to social norms, while demand characteristics revolve around the purpose of the research.

Courtesy bias stems from a reluctance to give negative feedback, so as to be polite to the person asking the question. Small-group interviewing where participants relate in some way to each other (e.g., a student, a teacher, and a dean) is especially prone to this type of bias.

Question order bias

Question order bias occurs when the order in which interview questions are asked influences the way the respondent interprets and evaluates them. This occurs especially when previous questions provide context for subsequent questions.

When answering subsequent questions, respondents may orient their answers to previous questions (called a halo effect ), which can lead to systematic distortion of the responses.

Extreme responding is the tendency of a respondent to answer in the extreme, choosing the lowest or highest response available, even if that is not their true opinion. Extreme responding is common in surveys using Likert scales, and it distorts people's true attitudes and opinions.

Disposition towards the survey can be a source of extreme responding, as well as cultural components. For example, people coming from collectivist cultures tend to exhibit extreme responses in terms of agreement, while respondents indifferent to the questions asked may exhibit extreme responses in terms of disagreement.

Selection bias

Selection bias is a general term describing situations where bias is introduced into the research from factors affecting the study population.

Common types of selection bias are:

  • Sampling or ascertainment bias
  • Attrition bias
  • Self-selection (or volunteer) bias
  • Survivorship bias
  • Nonresponse bias
  • Undercoverage bias

Sampling bias occurs when your sample (the individuals, groups, or data you obtain for your research) is selected in a way that is not representative of the population you are analyzing. Sampling bias threatens the external validity of your findings and influences the generalizability of your results.

The easiest way to prevent sampling bias is to use a probability sampling method. This way, each member of the population you are studying has an equal chance of being included in your sample.

Sampling bias is often referred to as ascertainment bias in the medical field.
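Drawing a simple random sample, the most basic probability sampling method, is easy to implement. Here is a minimal Python sketch (the population list is a hypothetical stand-in for a real sampling frame):

```python
import random

random.seed(7)

# Sampling frame: a list identifying every member of the target population.
population = [f"resident_{i}" for i in range(5_000)]

# Simple random sampling: every member has an equal chance of selection,
# which is the most direct protection against sampling bias.
sample = random.sample(population, k=200)
print(sample[:5])
```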

Attrition bias occurs when participants who drop out of a study systematically differ from those who remain in the study. Attrition bias is especially problematic in randomized controlled trials for medical research because participants who do not like the experience or have unwanted side effects can drop out and affect your results.

You can minimize attrition bias by offering incentives for participants to complete the study (e.g., a gift card if they successfully attend every session). It’s also a good practice to recruit more participants than you need, or minimize the number of follow-up sessions or questions.

For example, suppose you are evaluating a new training program. You provide a treatment group with weekly one-hour sessions over a two-month period, while a control group attends sessions on an unrelated topic. You complete five waves of data collection to compare outcomes: a pretest survey, three surveys during the program, and a posttest survey. If participants who are not benefiting tend to drop out before the posttest, the completers will no longer be comparable to the group you started with.
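A short simulation shows how outcome-related dropout can manufacture an apparent effect out of nothing (a hedged sketch: the dropout probabilities and outcome distribution are invented for illustration):

```python
import random

random.seed(3)

# 1,000 participants in a program with NO real average effect:
# individual outcomes are pure noise centered on zero.
outcomes = [random.gauss(0, 1) for _ in range(1_000)]

# Participants with poor outcomes are far more likely to drop out
# before the posttest (70% dropout if outcome < 0, 10% otherwise).
completers = [
    y for y in outcomes
    if random.random() > (0.7 if y < 0 else 0.1)
]

true_mean = sum(outcomes) / len(outcomes)
observed_mean = sum(completers) / len(completers)
print(f"True average effect:      {true_mean:+.2f}")    # close to 0
print(f"Average among completers: {observed_mean:+.2f}")  # clearly positive
```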

Self-selection or volunteer bias

Self-selection bias (also called volunteer bias) occurs when individuals who volunteer for a study have particular characteristics that matter for the purposes of the study.

Volunteer bias leads to biased data, as the respondents who choose to participate will not represent your entire target population. You can avoid this type of bias by using random assignment, i.e., placing participants in a control group or a treatment group after they have volunteered to participate in the study.
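Random assignment itself takes only a few lines. A minimal sketch (the participant labels are hypothetical):

```python
import random

random.seed(11)

volunteers = [f"participant_{i}" for i in range(40)]

# Shuffle, then split down the middle: each volunteer is equally likely
# to land in either group, so pre-existing differences are spread across
# groups by chance rather than by self-selection.
random.shuffle(volunteers)
half = len(volunteers) // 2
treatment, control = volunteers[:half], volunteers[half:]
print(len(treatment), len(control))  # 20 20
```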

Closely related to volunteer bias is nonresponse bias, which occurs when a research subject declines to participate in a particular study or drops out before the study's completion.

For example, suppose you recruit volunteers for a health study at a city hospital. Considering that the hospital is located in an affluent part of the city, volunteers are more likely to have a higher socioeconomic standing, higher education, and better nutrition than the general population.

Survivorship bias occurs when you do not evaluate your data set in its entirety: for example, by only analyzing the patients who survived a clinical trial.

This strongly increases the likelihood that you draw (incorrect) conclusions based upon those who have passed some sort of selection process—focusing on “survivors” and forgetting those who went through a similar process and did not survive.

Note that “survival” does not always mean that participants died! Rather, it signifies that participants did not successfully complete the intervention.

A classic example is the belief that dropping out of college is a path to entrepreneurial success, based on a handful of famous founders who did exactly that. However, most college dropouts do not become billionaires. In fact, there are many more aspiring entrepreneurs who dropped out of college to start companies and failed than succeeded.
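A simulation makes the distortion concrete (a sketch with invented numbers: outcomes follow a heavy-tailed distribution, and only the top 1% of ventures are "visible"):

```python
import random

random.seed(5)

# 100,000 ventures with heavy-tailed outcomes: most end up near zero,
# a handful succeed spectacularly.
ventures = [random.lognormvariate(0, 2) for _ in range(100_000)]

# Survivorship: only the top 1% are visible (written about, interviewed).
survivors = sorted(ventures)[-1_000:]

avg_all = sum(ventures) / len(ventures)
avg_survivors = sum(survivors) / len(survivors)
print(f"Average outcome, all ventures:      {avg_all:.1f}")
print(f"Average outcome, visible survivors: {avg_survivors:.1f}")
# Judging the typical outcome from survivors alone wildly
# overestimates the chances of success.
```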

Nonresponse bias occurs when those who do not respond to a survey or research project are different from those who do in ways that are critical to the goals of the research. This is very common in survey research, when participants are unable or unwilling to participate due to factors like lack of the necessary skills, lack of time, or guilt or shame related to the topic.

You can mitigate nonresponse bias by offering the survey in different formats (e.g., an online survey, but also a paper version sent via post), ensuring confidentiality, and sending reminders to complete the survey.

For example, suppose you survey the residents of a neighborhood by visiting their homes, but response rates are low. You notice that your surveys were conducted during business hours, when the working-age residents were less likely to be home.

Undercoverage bias occurs when you only sample from a subset of the population you are interested in. Online surveys can be particularly susceptible to undercoverage bias. Despite being more cost-effective than other methods, they can introduce undercoverage bias as a result of excluding people who do not use the internet.

Cognitive bias

Cognitive bias refers to a set of predictable (i.e., nonrandom) errors in thinking that arise from our limited ability to process information objectively. Rather, our judgment is influenced by our values, memories, and other personal traits. These create "mental shortcuts" that help us process information intuitively and decide faster. However, cognitive bias can also cause us to misunderstand or misinterpret situations, information, or other people.

Because of cognitive bias, people often perceive events to be more predictable after they happen (a tendency known as hindsight bias).

Although there is no general agreement on how many types of cognitive bias exist, some common types are:

  • Anchoring bias
  • Framing effect
  • Actor–observer bias
  • Availability heuristic (or availability bias)
  • Confirmation bias
  • Halo effect
  • The Baader-Meinhof phenomenon

Anchoring bias

Anchoring bias is people’s tendency to fixate on the first piece of information they receive, especially when it concerns numbers. This piece of information becomes a reference point or anchor. Because of that, people base all subsequent decisions on this anchor. For example, initial offers have a stronger influence on the outcome of negotiations than subsequent ones.

Framing effect

Framing effect refers to our tendency to decide based on how the information about the decision is presented to us. In other words, our response depends on whether the option is presented in a negative or positive light, e.g., gain or loss, reward or punishment, etc. This means that the same information can be more or less attractive depending on the wording or what features are highlighted.

Actor–observer bias

Actor–observer bias occurs when you attribute the behavior of others to internal factors, like skill or personality, but attribute your own behavior to external or situational factors.

In other words, when you are the actor in a situation, you are more likely to link events to external factors, such as your surroundings or environment. However, when you are observing the behavior of others, you are more likely to associate behavior with their personality, nature, or temperament.

For example, suppose you are interviewing people about their experiences driving in heavy traffic. One interviewee recalls a morning when it was raining heavily. They were rushing to drop off their kids at school in order to get to work on time. As they were driving down the highway, another car cut them off as they were trying to merge. They tell you how frustrated they felt and exclaim that the other driver must have been a very rude person.

At another point, the same interviewee recalls that they did something similar: accidentally cutting off another driver while trying to take the correct exit. However, this time, the interviewee claimed that they always drive very carefully, blaming their mistake on poor visibility due to the rain.

Availability heuristic

Availability heuristic (or availability bias) describes the tendency to evaluate a topic using the information we can quickly recall to our mind, i.e., that is available to us. However, this is not necessarily the best information; rather, it is the most vivid or recent. Even so, due to this mental shortcut, we tend to think that what we can recall must be right and ignore any other information.

Confirmation bias

Confirmation bias is the tendency to seek out information in a way that supports our existing beliefs while also rejecting any information that contradicts those beliefs. Confirmation bias is often unintentional but still results in skewed results and poor decision-making.

Let's say you grew up with a parent in the military. Chances are that you have a lot of complex emotions around overseas deployments. If you later research the experiences of military families, this can lead you to over-emphasize findings that "prove" that your lived experience is the case for most families, neglecting other explanations and experiences.

Halo effect

The halo effect refers to situations whereby our general impression about a person, a brand, or a product is shaped by a single trait. It happens, for instance, when we automatically make positive assumptions about people based on something positive we notice, while in reality, we know little about them.

The Baader-Meinhof phenomenon

The Baader-Meinhof phenomenon (or frequency illusion) occurs when something that you recently learned seems to appear “everywhere” soon after it was first brought to your attention. However, this is not the case. What has increased is your awareness of something, such as a new word or an old song you never knew existed, not their frequency.

How to avoid bias in research

While very difficult to eliminate entirely, research bias can be mitigated through proper study design and implementation. Here are some tips to keep in mind as you get started.

  • Clearly explain in your methodology section how your research design will help you meet the research objectives and why this is the most appropriate research design.
  • In quantitative studies, make sure that you use probability sampling to select the participants. If you're running an experiment, make sure you use random assignment to assign your control and treatment groups.
  • Account for participants who withdraw or are lost to follow-up during the study. If they are withdrawing for a particular reason, it could bias your results. This applies especially to longer-term or longitudinal studies.
  • Use triangulation to enhance the validity and credibility of your findings.
  • Phrase your survey or interview questions in a neutral, non-judgmental tone. Be very careful that your questions do not steer your participants in any particular direction.
  • Consider using a reflexive journal. Here, you can log the details of each interview, paying special attention to any influence you may have had on participants. You can include these in your final analysis.
Other types of research bias

  • Baader–Meinhof phenomenon
  • Sampling bias
  • Ascertainment bias
  • Self-selection bias
  • Hawthorne effect
  • Omitted variable bias
  • Pygmalion effect
  • Placebo effect

Frequently asked questions about research bias

Why is research bias a problem?

Research bias affects the validity and reliability of your research findings, leading to false conclusions and a misinterpretation of the truth. This can have serious implications in areas like medical research where, for example, a new form of treatment may be evaluated.

What is the difference between observer bias and actor–observer bias?

Observer bias occurs when the researcher's assumptions, views, or preconceptions influence what they see and record in a study, while actor–observer bias refers to situations where respondents attribute internal factors (e.g., bad character) to justify others' behavior and external factors (e.g., difficult circumstances) to justify the same behavior in themselves.

What is response bias?

Response bias is a general term used to describe a number of different conditions or factors that cue respondents to provide inaccurate or false answers during surveys or interviews. These factors range from the interviewer's perceived social position or appearance to the phrasing of questions in surveys.

What is nonresponse bias?

Nonresponse bias occurs when the people who complete a survey are different from those who did not, in ways that are relevant to the research topic. Nonresponse can happen because people are either not willing or not able to participate.

