Data-analysis 2021-2022
Description
Take a look at the full description of the programme here
The Data Analysis Programme consists of 19 individual modules, as listed below:
- M1-Getting Started with R Software for Data Analysis (12 hrs - €480)
- M2-Drawing Conclusions from Data: an Introduction (21 hrs - €960)
- M3-Single Cell Seq Data Analysis Boot Camp (15 hrs - €925)
- M4-Getting Started with Python for Data Scientists (17.5 hrs - €600)
- M5-Exploiting Sources of Variation in your Data: the ANOVA Approach (17.5 hrs - €960)
- M6-Leverage your R Skills: Data Wrangling & Plotting with Tidyverse (6 hrs - €240)
- M7-Structural Equation Modelling I + II: Identifying Latent Data Structures (24 hrs - €1440)
- M8-High Dimensional Data Analysis (21 hrs - €1110)
- M9-Getting Started with NVivo for Qualitative Data Analysis (6 hrs - €240)
- M10-Dynamic Report Generation with R Markdown (6 hrs - €240)
- M11-From Prior Belief to Data Driven Evidence: Bayesian Data Analysis in Action (18 hrs - €1110)
- M12-Explaining and Predicting Outcomes with Linear Regression (17.5 hrs - €960)
- M13-Upgrade your Python Skills: Data Wrangling & Plotting (15 hrs - €600)
- M14-Microbiome Data Analysis Boot Camp (15 hrs - €925)
- M15-Mastering R Skills: Selected Topics for Successful Programming (12 hrs - €480)
- M16-From Language to Information: Natural Language Processing (21 hrs - €1110)
- M17-Building Interactive Apps with Shiny© in R (15 hrs - €600)
- M18-Artificial Neural Networks: from the Ground Up (15 hrs - €925)
- M19-Machine Learning with Python (21 hrs - €1320)
Register here for classes from this course
M1-Getting Started with R Software for Data Analysis
This course targets professionals and investigators from diverse areas with little to no R-programming experience who wish to start using R for their data manipulation, data exploration or statistical analysis.
R is a flexible environment for statistical computing and graphics, which is becoming increasingly popular as a tool to gain insight into often complex data. While in some ways similar to other programming languages (such as C, Java and Perl), R is particularly suited for data analysis because ready-made functions are available for a wide variety of statistical techniques (classical statistical tests, linear and nonlinear modeling, time series analysis, classification, clustering, ...) and graphical techniques.
The base R program can be extended with user-submitted packages, which means new techniques are often implemented in R before being available in other software. This is one of the reasons why R is becoming the de facto standard in certain fields such as bioinformatics (Bioconductor) and financial services.
This course introduces the use of the R environment for the implementation of data management, data exploration, basic statistical analysis and automation of procedures.
It starts with a description of the R GUI, the use of the command line and an overview of basic data structures. The application of standard procedures to import data or to export results to external files will be illustrated.
Creation of new variables, subsetting, merging and stacking of data sets will be covered in the data management section. Exploration of the data by histograms, box plots, scatter plots, summary numbers, correlation coefficients and cross-tabulations will be performed.
Simple statistical procedures that will be covered are:
-comparisons of observed group means (t-test, ANOVA and their non-parametric versions) and proportions
-test for independence in 2-way cross tables and linear regression (focusing on the R-implementation of the statistical methods that are the subject of other modules of the statistics series)
Finally, installing new packages and automation of analysis procedures will also be discussed.
Practical sessions and specific exercises will be provided to allow participants to practice their R skills in interaction with the teacher.
=> More information and enrolment: https://www.ugent.be/we/en/services/ICES/courses/dataanalysis2021-2022/m1.htm
Language: English
Exam: There is no exam connected to this module. Participants receive a certificate of attendance via e-mail at the end of the course.
Dates and times: October 5, 12, 19 and 26, 2021, from 5.30 pm to 9 pm.
Total course length: 12 hrs
Registration fees:
- Private sector: € 480
- Non-profit: € 360; € 310 UGent internal
- Students, job seekers, retirees: € 160; € 140 UGent internal
Register
M2-Drawing Conclusions from Data: an Introduction
This course will benefit professionals and investigators from diverse areas, such as research scientists and clinical research associates, who are involved in data handling and wish to gain insight into basic statistical methods or to refresh their knowledge and practice of statistics. The course is open to all. It is necessary to have an understanding of basic algebra (basic rules, solving equations, ...), exponents and square roots.
The first sessions will be dedicated to getting to know the software packages SPSS and R. Participants are encouraged to participate in both parts.
We start with concise graphical and numerical descriptions of data obtained from observational or experimental studies. The most common and frequently used probability distributions of discrete and continuous variables will be presented. Statistical inference draws conclusions about a population based on sampled data. Chance variations are taken into account such that a level of confidence is attached to these conclusions.
The correct use of the t-test will be discussed. Nonparametric methods are considered as a possible alternative in case the requirements of the t-test are not met.
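To make the quantity behind a two-sample t-test concrete, here is a minimal sketch in Python using only the standard library. It computes Welch's t statistic, one common variant of the test. The function name `welch_t` and the data values are invented for illustration and are not course material; the course itself works in R or SPSS.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Return Welch's t statistic for two independent samples."""
    # statistics.variance uses the n - 1 denominator (sample variance).
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical measurements for two groups (illustration only).
group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.4, 4.7, 4.1, 4.9, 4.5]

t = welch_t(group_a, group_b)
print(f"t = {t:.2f}")
```

A large absolute value of t suggests that the difference in group means is large relative to its standard error. Turning it into a p-value additionally requires the t distribution, which statistical software provides.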
We cover the basic concepts of hypothesis testing for categorical data, including the chi-square test.
Quite often the relationship between two variables, where the outcome of one variable is seen as depending on the value of the other, is the focus of scientific interest.
We will give an introduction to linear regression analysis, where a regression line based on observations obtained in a sample describes this relation.
Hands-on exercises are worked out on the PC using the R software. If preferred, participants can use SPSS.
=> More information and enrolment: https://www.ugent.be/we/en/services/ICES/courses/dataanalysis2021-2022/m2.htm
Language: English
Exam: Project assignment
Dates and times: November 9, 16, 23 and 30, December 7 and 14, 2021, from 5.30 pm to 9.30 pm
Total course length: 21 hrs.
Registration fees:
- Private sector: € 960
- Non-profit: € 720; € 615 intern UGent
- Students, job seekers, retirees: € 325; € 280 intern UGent
- Exam: €30
Register
M3-Single Cell Seq Data Analysis Boot Camp
This course is aimed at biologists, bioinformaticians and statisticians interested in analysing single-cell RNA seq datasets. A basic knowledge of R programming and statistics is assumed.
The course will provide a full single-cell RNA-sequencing (scRNA-seq) data analysis pipeline, starting from raw data up to the identification of trajectories / cell types, and corresponding (marker) genes associated with the biological structure in the data. Participants can expect a mix between background theory as taught through slides and hands-on lab sessions where real scRNA-seq data will be analyzed. The course will focus on tools and methods implemented within the R / Bioconductor environment.
The detailed schedule includes:
1. Overview of the course
2. Introduction to single-cell RNA-seq technology: concepts and protocols of bulk and single-cell RNA sequencing; RNA-seq data characteristics; research questions that can be assessed using bulk and single-cell RNA-sequencing.
3. Preprocessing and quality control of scRNA-seq data: Processing raw FASTQ-files (demultiplexing, mapping, barcode identification); quality control (low-quality/dead cells, doublets, empty droplets); The Bioconductor infrastructure for the analysis of scRNA-seq data; Normalization of scRNA-seq data.
4. Dimensionality reduction, clustering and cell type identification: The curse of dimensionality; linear and non-linear dimensionality reduction methods; unsupervised cell type identification through clustering; (semi-)supervised cell type identification.
5. Dataset integration and batch correction.
6. Trajectory inference: dimensionality reduction for trajectory inference; trajectory inference concepts; RNA velocity.
7. Differential expression between cell types, patients, and across/between trajectories.
Language: English
Exam: Project assignment
Dates and times:
November 17 and 24, December 1, 8, and 15, 2021, from 5.30 pm to 9.30 pm
Total course length: 15 hrs.
Registration fees:
- Private sector: € 925
- Non-profit: € 695
- Students, job seekers, retirees: € 310
- Staff UGent € 595
- Students UGent € 265
Register
M4-Getting Started with Python for Data Scientists
This course targets professionals and investigators from diverse areas with little to no Python-programming experience who wish to start using Python for their data manipulation, data exploration or statistical analysis. The course is open to all interested persons. Knowledge of basic statistical concepts and experience with other programming languages are considered advantages, but are not required for learning the Python language.
Python started off as a general-purpose programming language, but in the last decade it has become a popular environment for data science. The reason is that the community of Python users has created useful add-on packages which are suitable for data manipulation, preparation, visualization and analysis. This practical course introduces both base Python and the most important packages in a hands-on way with many exercises.
Course content:
-Introduction: Python and the Anaconda distribution
-Data types: numbers, strings, lists, tuples, sets and dictionaries
-Automation: control flow and self-defined functions
-Importing data and exporting results
-Managing data with NumPy and pandas
-Graphs with matplotlib and seaborn
-Statistical analysis with statsmodels
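As a rough impression of the first items in the list above, the following sketch shows the built-in data types together with a loop and a self-defined function. It uses base Python only. All variable names, values and the helper `describe` are made up for illustration and are not taken from the course material.

```python
# Built-in data types named in the course content.
count = 3                             # number (int)
name = "sample"                       # string
values = [2.5, 3.0, 4.5]              # list: ordered and mutable
point = (1, 2)                        # tuple: ordered and immutable
labels = {"a", "b", "a"}              # set: duplicates are dropped
record = {"id": 1, "score": 4.5}      # dictionary: key -> value

def describe(numbers):
    """Return the minimum, maximum and mean of a list of numbers."""
    total = 0.0
    for x in numbers:                 # control flow: a for loop
        total += x
    return min(numbers), max(numbers), total / len(numbers)

low, high, avg = describe(values)
print(low, high, avg)
```

In practice, packages such as NumPy and pandas replace hand-written loops like this one for larger data sets.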
The objective of the course is that you are capable of doing data management, visualization and analysis in Python on your own.
Python is an open-source programming language which you can freely download from https://www.anaconda.com/download/ (i.e. the Anaconda distribution). Python version 3 or higher is recommended.
Language: English
Exam: There is no exam connected to this module. Participants receive a certificate of attendance via e-mail at the end of the course.
Dates and times: December 6, 9, 13, 16 and 20, 2021, from 5.30 pm to 9.30 pm
Total course length: 17.5 hrs.
Registration fees:
- Private sector: € 600
- Non-profit: € 450
- Students, job seekers, retirees: € 205
- UGent staff €385
- UGent students €265
Register
M5-Exploiting Sources of Variation in your Data: the ANOVA Approach
This course targets professionals and investigators from diverse areas, who need to use statistical methods in the collection and handling of data in their research, in particular for assessing the effect of e.g. different treatments. Participants are expected to have an active knowledge of the basic principles underlying statistical strategies, at a level equivalent to the "Introductory Statistics" course of this program.
Analysis of variance (ANOVA) is a statistical tool used in the comparison of means of a random variable over populations that differ in one or more characteristics (factors), e.g. treatment, age, sex, subject, etc.
First, we cover one-way ANOVA, where only one factor is of concern. Depending on the type of the factor, the conclusions pertain to just those factor levels included in the study (fixed factor model), or to a population of factor levels of which we observed a sample (random effects model).
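The following minimal sketch illustrates the one-way fixed-effects case: it computes the F statistic as the ratio of between-group to within-group variation. It is written in Python with the standard library only. The function name `one_way_anova_f` and the data are hypothetical, chosen so the arithmetic is easy to follow; the course itself uses R or SPSS.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic of a one-way fixed-effects ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Each sum of squares is divided by its degrees of freedom.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical responses under three treatments (illustration only).
treatments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(treatments)
print(f"F = {f_stat:.2f}")
```

For these data the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so F is 13.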
In two-way and multi-way ANOVA where populations differ in more than one characteristic, the effects of factors are studied simultaneously. This yields information about the main effects of each of the factors as well as about any special joint effects (factorial design).
We also consider nested designs, where each level of a second (mostly random) factor occurs in conjunction with only one level of the first factor. One special challenge in multi-way ANOVA lies in verifying the assumptions that must be satisfied.
In this course we will focus on correct execution of data analysis and understanding its results. We pay attention to expressing these conclusions in a correct and understandable way.
The different methods will be extensively illustrated with examples from scientific studies in a variety of fields.
Exercises are worked out on the PC using the R software. If preferred, participants can use SPSS.
Language: English
Exam: Project assignment
Dates and times: January 11, 18 and 25, February 1 and 8, 2022, from 5.30 pm to 9.30 pm
Total course length: 17.5 hrs.
Registration fees:
- Private sector: € 960
- Non-profit: € 720
- Students, job seekers, retirees: € 325
- Staff UGent € 615
- Students UGent € 275
Register
M6-Leverage your R Skills: Data Wrangling & Plotting with Tidyverse
This course targets anyone who wants to use R for data processing and needs to produce professional-looking graphs and/or summary statistics. The course is open to all interested persons. Basic R skills as provided in Module 1 of this year's program are advised.
Tidyverse is a collection of R-packages used for data wrangling and visualization that share a common design philosophy. The goal of this course is to get you up to speed with the most up-to-date and essential tidyverse tools for data exploration. After attending this course, you’ll have the tools to tackle a wide variety of data wrangling and visualization challenges, using the best parts of R tidyverse.
This course covers the most essential tools from 3 main R tidyverse packages that are frequently used in general data analysis procedures.
Lectures with R code demonstrations are blended with hands-on exercises, which allow you to try out the tools you’ve seen in class under guidance.
What you will learn:
-Data transforming and summarizing with dplyr: narrowing in on observations of interest, creating new variables that are functions of existing variables, and calculating a set of summary statistics (like counts or means)
-Data visualization with ggplot2: creating more informative graphs (e.g., scatter plot, bar plot, histogram, smoother/regression line, …) in an elegant and efficient way. Arranging multiple plots on a grid
-Data ingest and tidying with tidyr: importing data and storing it in a consistent form, so that the way it is stored matches the semantics of the dataset.
-Extra tools for programming: Merging and comparing two datasets based on various matching or filtering criteria. Other useful tools for R programming.
Not included in this course:
-A systematic training guide in the basics of R. If you have never used R or RStudio before, we highly recommend that you take the ICES course “Introduction to R”, which will familiarize you with the R environment for the implementation of data management and exploration tasks.
-Big data. This course focuses on small, in-memory datasets as you can’t tackle big data easily unless you have experience with small data.
-Statistics. Although you will see many basic statistics in this course, the main focus is on R and the tidyverse tools instead of explaining the statistical concepts.
Language: English
Exam: There is no exam connected to this module. Participants receive a certificate of attendance via e-mail at the end of the course.
Dates and times: January 25 and 27, 2022, from 1.30 pm to 4.30 pm
Total course length: 6 hrs.
Registration fees:
- Private sector: € 240
- Non-profit: € 180
- Students, job seekers, retirees: € 80
- UGent staff €155
- UGent students €70
Register
M7-Structural Equation Modelling I + II: Identifying Latent Data Structures
Part I
This course targets everyone with an interest in testing theories or models that involve relationships between both observed and latent variables. The audience for this course can include both novices with little or no previous experience with SEM, as well as existing users who wish to refresh or update their theoretical and practical understanding of structural equation modeling.
Participants should have a solid understanding of regression analysis and basic statistics (hypothesis testing, p-values, etc.). Some knowledge of exploratory factor analysis (or PCA) is recommended, but not required. Because lavaan is an R package, some experience with R (reading in a dataset, fitting a regression model) is recommended, but not required.
Structural equation modeling (SEM) is a general statistical modeling technique to study the relationships among observed variables. It spans a wide range of multivariate methods including path analysis, mediation analysis, confirmatory factor analysis, growth curve modeling, and many more. Many applications of SEM can be found in the social, economic, behavioral and health sciences, but the technology is increasingly used in disciplines like biology, neuroscience and operations research. SEM is often used to test theories or hypotheses that can be represented by a path diagram. In a path diagram, observed variables are depicted by boxes, while latent variables (hypothetical constructs measured by multiple indicators) are depicted by circles. Hypothesized (possibly causal) effects among these variables are represented by single-headed arrows. If you have ever found yourself drawing a path diagram in order to get a better overview of the complex interrelations among some key variables in your data, this course is for you.
The first day of the course provides an introduction to the theory and application of structural equation modeling. On the second day, we discuss several special topics that are often needed by applied users (handling missing data, nonnormal data, categorical data, longitudinal data, etc.). Hands-on sessions are included in order to ensure that all participants are able to perform the analyses using SEM software. The software used in this course is the open-source R package 'lavaan' (see http://lavaan.org).
Part II
This course targets everyone who has had some exposure to multilevel modeling and/or structural equation modeling, and who wants to deepen their understanding of both the theoretical and practical connection between the two frameworks. The course also targets everyone who wants to better understand the new multilevel SEM framework available in lavaan. Participants should have a solid understanding of regression analysis and basic statistics (hypothesis testing, p-values, etc.). At least some minimal knowledge of multilevel modeling and/or structural equation modeling is recommended. Because lavaan is an R package, some experience with R (reading in a dataset, fitting a regression model) is recommended, but not required.
Hierarchically clustered (multilevel or nested) data are common in most scientific fields, including the medical, biological and social sciences. For example, individuals may be nested within geographical areas, institutions, or companies, the canonical example being students nested within schools. Multilevel data also arise in longitudinal studies where one or several outcomes are measured on several occasions. Another feature of multilevel data is that variables can be measured at any level. For example, we may have collected measures of student outcomes and student characteristics, but we may also have collected variables at the school level.
This course starts with a refresher of multilevel modeling (MLM). We will discuss key concepts of MLM, introduce the linear mixed model, and provide several examples of univariate multilevel regression analysis. All analyses will be done in R, using a variety of packages (nlme, lme4, lavaan). Next, we will discuss the relationship between classic (single-level) regression, multilevel regression, and structural equation modeling (SEM). We will do this both from a theoretical point of view and from a software point of view. We will show how and under which conditions (classic, non-multilevel) SEM software can produce the same results as dedicated multilevel (or mixed modeling) software.
On the second day, we will introduce the multilevel SEM framework. We will start from a regression perspective, and gradually proceed from a simple regression analysis, to a two-level regression analysis, towards more complicated (regression) models, exploiting the full power of the multilevel SEM framework. Special attention will be given to multilevel mediation models, and the difference between the latent and manifest covariate approach to represent observed exogenous covariates at the between level. Next, we will take a latent-variable (CFA) perspective, and discuss various examples of multilevel CFA, and eventually multilevel SEM involving latent variables and regressions among latent variables. Here, special attention will be given to the interpretation of the latent variables at both the within and between level, together with a typology of possible approaches. Along the way, we will discuss many practical issues including the role of centering, the treatment of missing and/or non-normal data, and how to deal with categorical data. Finally, we will discuss some alternative approaches to handle clustering in the data in a SEM framework, including the design-based (survey) approach, and the 'wide format' approach.
The main software used in this course is the open-source R package 'lavaan' (see http://lavaan.org).
Language: English
Exam: Project assignment optional, only for participants who follow both parts.
Dates and times:
Part I: February 7 and 8, 2022, from 9 am to 12 pm and from 1 pm to 4 pm
Part II: May 23 and 24, 2022, from 9 am to 12 pm and from 1 pm to 4 pm
Total course length: 24 hrs (12+12).
Registration fees:
- Private sector: Part I: € 720, Part II: € 720
- Non-profit: Part I: € 540, Part II: € 540
- Students, job seekers, retirees: Part I: € 245, Part II: € 245
- Staff UGent: Part I: € 460, Part II: € 460
- Students UGent: Part I: € 210, Part II: € 460
Register
M8-High Dimensional Data Analysis
This course targets professionals and investigators from all areas who work with high-dimensional data. Course prerequisites are a working knowledge of basic statistics: data exploration and descriptive statistics, statistical modeling, and inference (linear models, confidence intervals, t-tests, F-tests, ANOVA, chi-squared test), as covered in M2: Introductory Statistics with R (or SPSS), M5: Analysis of Variance with R (or SPSS) and M12: Applied Linear Regression of this year's course program.
Modern high throughput technologies easily generate data on thousands of variables; e.g. health care data, genomics, chemometrics, environmental monitoring, web logs, movie ratings, …
Conventional statistical methods are no longer suited for effectively analysing such high-dimensional data.
Multivariate statistical methods may be used, but often the dimensionality of the data set is much larger than the number of (biological) samples.
Modern advances in statistical data analyses allow for the appropriate analysis of such data.
Methods for the analysis of high dimensional data rely heavily on multivariate statistical methods.
Therefore a large part of the course content is devoted to multivariate methods, but with a focus on high dimensional settings and issues.
Multivariate statistical analysis covers many methods. In this course a selection of techniques is covered based on our experience that they are frequently used in industry and research institutes.
The course is taught using case studies with applications from different fields (analytical chemistry, ecology, biotechnology, genomics, …).
Content:
1. Dimension reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and biplots for dimension-reduced data visualisation
2. Sparse SVD and sparse PCA
3. Prediction with high dimensional predictors: principal component regression; ridge, lasso and elastic net penalised regression methods
4. Classification (prediction of class membership): (penalised) logistic regression and linear discriminant analysis
5. Evaluation of prediction models: sensitivity, specificity, ROC curves, mean squared error, cross validation
6. Clustering
7. Large-scale hypothesis testing: FDR, FDR control methods, empirical Bayes (local) FDR control
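As an illustration of the last item, FDR control, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used FDR control method. Whether the course covers this exact procedure is an assumption. The code is plain Python; the function name `benjamini_hochberg` and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    # Reject the hypotheses with the `cutoff` smallest p-values.
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical p-values from six tests (illustration only).
p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
print(benjamini_hochberg(p))
```

Because the rule looks for the largest qualifying rank, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, as long as a larger p-value further down the ordering meets its threshold.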
Language: English
Exam: Project assignment
Dates and times: February 7, 10, 14, 17, 21 and 24, 2022, from 5.30 pm to 9.30 pm
Total course length: 21 hrs
Registration fees:
- Private sector: € 1110
- Non-profit: € 835
- Students, job seekers, retirees: € 375
- Staff UGent € 710
- Students UGent € 320
Register
M9-Getting Started with NVivo for Qualitative Data Analysis
This module targets young researchers and data analysts who are new to qualitative research and curious about NVivo. There are no prerequisites for this course.
NVivo is a widely used computer assisted qualitative data analysis software package which provides a potentially useful tool for the management and analysis of qualitative research data. This course is intended as a basic introduction to using NVivo for qualitative data analysis. Whether you are completely new to NVivo or have some previous experience with it, you will find this course both useful and enjoyable. This course blends lectures with hands-on exercises which allows you to try out the tools you've seen in the class under guidance.
What you will learn: At the end of this course you will master the core functionalities to apply the latest version of NVivo (1.0) to your project, including:
-Import - Creating a research project and importing different data formats such as Word documents, PDFs, webpages, audio, video and images into NVivo; classifying data files and managing their classifications
-Organize - Organizing codes, coding text and creating codes; applying coding stripes and highlights; using cases with classification and attributes; making annotations and memos, creating sets and links to files
-Explore - Exploring lexical queries, word frequency and text search; applying code and matrix queries; illustrating with visualizations such as mind maps, concept maps, and coding matrix charts; coordinating team work by applying coding comparison
Not included in this course:
-Theoretical framework of qualitative data analysis - Although this course will introduce some basic concepts of qualitative data analysis it is not a systematic review of the different theories.
-Advanced qualitative methodologies - This course covers only the most salient features of NVivo and does not teach how to analyse qualitative data according to specific qualitative methods or designs, such as thematic analysis, grounded theory, content analysis, discourse analysis etc.
Language: English
Exam: No
Dates and times: February 10, 2022, from 9 am to 12 pm and from 1 pm to 4 pm
Total course length: 6 hrs
Registration fees:
- Private sector: € 240
- Non-profit: € 180
- Students, job seekers, retirees: € 80
- Staff UGent € 155
- Students UGent € 70
M10-Dynamic Report Generation with R Markdown
This module targets anyone who wants to produce professional-looking reports using R. Basic knowledge of R is required. Knowledge of tidyverse is helpful.
R offers many first-class features for statistics and data science. One of these is certainly R Markdown, which allows seamless integration of analysis (code) and text. This greatly improves reproducibility, reduces copy-paste and other errors, and enhances possibilities for automation.
R Markdown offers three main types of output: pdf, html and docx. The first session introduces the basic framework, the output-specific possibilities and the bookdown extension.
The second session explores some general approaches for automation (using self-built templates for report sections or complete reports) and presents Officedown. The latter is less flexible than R Markdown, but offers more options for docx output.
Language: English
Exam: No
Dates and times: February 11, 2022, from 9 am to 12 pm and from 1 pm to 4 pm
Total course length: 6 hrs
Registration fees:
- Private sector: € 240
- Non-profit: € 180
- Students, job seekers, retirees: € 80
- Staff UGent € 155
- Students UGent € 70
M11-From Prior Belief to Data Driven Evidence: Bayesian Data Analysis in Action
This course targets professionals and investigators from diverse areas who wish to get acquainted with Bayesian techniques to be able to apply them to their practical applications. Participants are expected to have an active knowledge of the basic principles underlying statistical strategies, at a level equivalent to the “Introductory Statistics” course of this program.
Basic knowledge of the statistical programming language R is required.
Recent years have seen a tremendous increase in the use and development of Bayesian methods for academic research. In its wake, more and more companies employing statisticians are valuing the knowledge brought by these approaches. The goal of this course is to give participants a brief and intensive introduction to Bayesian statistics.
Participants will learn how Bayesian inference differs from classical inference and how to interpret its results in a meaningful way. They will acquire the skills to use Bayesian techniques correctly in a range of practical applications.
Topics that will be discussed include the difference between Bayesian and frequentist/classical probability, the likelihood function, choice of prior distributions, conjugate priors, the posterior distribution and methods for summarizing the posterior. In addition, an overview will be given about the most important Markov Chain Monte Carlo Methods that are often used to simulate the posterior distribution. These methods include the Gibbs sampler, Importance sampling, Metropolis-Hastings and the Slice sampler.
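To illustrate how a conjugate prior turns prior belief plus data into a posterior, here is a minimal sketch for the simplest case: a Beta prior on a success probability combined with binomial data. It is plain Python. The function names and the numbers are invented for illustration; the course exercises use R with rjags and JAGS instead.

```python
def beta_binomial_update(a, b, successes, failures):
    """Update a Beta(a, b) prior with binomial data; return posterior (a, b)."""
    # Conjugacy: the posterior is again a Beta distribution, with the
    # observed counts added to the prior parameters.
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior belief and data (illustration only).
prior = (2, 2)    # weakly informative prior centred on 0.5
post = beta_binomial_update(*prior, successes=7, failures=3)
print(post, beta_mean(*post))
```

With these numbers the posterior is Beta(9, 5), whose mean of 9/14 lies between the prior mean of 0.5 and the observed proportion of 0.7. For models without such a closed form, sampling methods such as those named above are used to approximate the posterior.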
Depending on the interest and background of the participants, the Bayesian estimation of one (or more) of the following approaches will be explained and discussed: linear regression, choice models (logit, probit, multinomial), Bayesian hypothesis testing, quantile regression, mixed models, Bayesian variable selection, …
All exercises in this course will use R together with the rjags R-package and the JAGS software. Note that JAGS is very similar (if not identical) to the popular BUGS/winBUGS language for Bayesian modeling.
Language: English
Exam: Yes, optional
Dates and times: February 15 and 22, March 1, 8, 15 and 22, 2022, from 5.30 pm to 9 pm
Total course length: 18 hrs
Registration fees:
- Private sector: € 1110
- Non-profit: € 835
- Students, job seekers, retirees: € 375
- Staff UGent € 710
- Students UGent € 320
Register
M12-Explaining and Predicting Outcomes with Linear Regression
This course targets professionals and investigators from all areas who are involved in prediction problems or need to model the relationship between a dependent variable and one or more explanatory variables. Participants are expected to have an active knowledge of the basic principles underlying statistical strategies, at a level equivalent to the "Introductory Statistics" course of this program.
Prediction and effect estimation take a central place in science and in many business applications. Machine Learning and AI applications are on the rise, but statistical models still perform very well, especially outside the realm of ‘Huge Data’. Additionally they often provide much more insight due to better interpretability.
Although different techniques are available for different outcomes (continuous outcomes, binary or categorical outcomes, counts, …), the basics of estimation and interpretation are relatively similar across these techniques. As such, linear regression (for continuous outcomes) provides a perfect stepping stone to more specialized techniques, while being broadly applicable in itself.
The first two sessions of this module introduce the conceptual framework of this method using the simple case of a single predictor. Formulas and technicalities are kept to a minimum and the main focus is on interpretation of results and assessing model validity. This includes confidence statements on the predictor effect (hypothesis tests and confidence intervals), using the regression model to predict future results and verification of model assumptions.
In sessions 3 and 4 we allow for more than one predictor, leading to the multiple linear regression model. We focus on either explanation or prediction. How to arrive at a parsimonious model starting from a large number of predictors will be discussed in detail. In these complex linear models, special attention will be given to interpreting individual predictor effects, as they critically depend on other terms in the model and underlying relations between predictors (confounding).
In the last session a more elaborate data analysis is discussed. We touch on problems where linear regression is not appropriate and is replaced by related approaches such as generalized linear models and mixed models.
Different features will be illustrated with case examples from the instructors' practical experience, and participants are encouraged to bring examples from their own work.
Hands-on exercises are worked out on the PC using the R software. If preferred, participants can use SPSS.
Language: English
Exam: Optional.
Dates and times: March 1, 8, 15, 22 and 29, 2022, from 5.30 pm to 9.30 pm
Total course length: 17,5 hrs
Registration fees:
- Private sector: € 960
- Non-profit: € 720
- Students, job seekers, retirees: € 325
- Staff UGent: € 615
- Students UGent: € 275
Register
M13-Upgrade your Python Skills: Data Wrangling & Plotting
The course is intended for professionals who wish to enhance their general data manipulation and visualization skills in Python, with a specific focus on tabular data. Participants should have at least basic programming skills; a basic (scientific) programming course should suffice. For those whose experience is in another programming language (e.g. Matlab, R, ...), following a Python tutorial prior to the course is strongly recommended. A good introduction is the ‘Python language introduction’ section of the SciPy lecture notes: https://scipy-lectures.org/intro/language/python_language.html
The handling of data is a recurring task for data analysts. Reading in experimental data, checking its properties, and creating visualisations may become tedious tasks. Hence, increasing the efficiency of this process is beneficial for many professionals handling data. Spreadsheet-based software cannot properly support this process because it lacks automation and repeatability. A high-level scripting language such as Python is ideal for these tasks.
This course trains participants to use Python effectively for these tasks. The course focuses on data manipulation and cleaning of tabular data, exploratory analysis and visualisation using important packages such as Pandas, NumPy, Matplotlib and Seaborn.
After setting up the programming environment with the required packages using the conda package manager and an introduction to the Jupyter notebook environment, the data analysis package Pandas and the plotting packages Matplotlib and Seaborn are introduced. Advanced usage of Pandas for different data cleaning and manipulation tasks is taught, and the acquired skills will immediately be put into practice on real-world data sets. Applications include time series handling, categorical data, merging data, geospatial data, ... The course closes with a discussion of the scientific Python ecosystem and the visualisation landscape, teaching participants to create interactive charts.
The course does not cover statistics, data mining, machine learning, or predictive modelling. It aims to provide participants with the means to effectively tackle commonly encountered data handling tasks in order to increase overall efficiency. These skills are useful for both data cleaning and feature engineering.
All sessions are hands-on in Jupyter notebooks.
Language: English
Exam: There is no exam connected to this module. Participants receive a certificate of attendance via e-mail at the end of the course.
Dates and times: March 7, 10, 14, 17 and 24, 2022, from 5.30 pm to 9 pm
Total course length: 15 hrs.
Registration fees:
- Private sector: € 600
- Non-profit: € 450
- Students, job seekers, retirees: € 205
- Staff UGent: € 385
- Students UGent: € 175
Register
M14-Microbiome Data Analysis Boot Camp
This course is intended for scientists who need statistical data analysis for microbiome studies (biologists, statisticians, data scientists, bioinformaticians, …). Participants must have experience with R, and a basic knowledge of sequencing, the microbiome and statistics.
High-throughput sequencing technologies allow easy characterisation of the microbiome, but the data analysis faces many particular issues and difficulties. The data analysis starts with the processing of the raw read counts to turn them into an OTU table. In this process, quality control, filtering and clustering into OTUs are essential steps. Once the OTU count table is ready, the choice of data analysis method depends on the research objectives, but very often a first visual data exploration is performed. Ordination methods, which often originate from ecology, are well suited for this purpose, but new methods tailored to microbiome data behave better for the overdispersed, zero-inflated sequencing data. Formal statistical data analysis methods are required for identifying species that are differentially abundant between several conditions; again there is a need for special methods that can deal with overdispersion, zero-inflation, library size variability and potentially with the compositional nature of microbiome data. The data analysis becomes even more elaborate for longitudinal data when studying the evolution of the microbiome over time. These analyses may focus on either individual taxa or on diversity of the microbial community (richness, alpha and beta diversity, ...).
We focus on 16S rRNA amplicon sequencing data. The course starts with a brief overview of the processing of raw read data into an OTU table (including filtering, trimming and clustering into OTUs). We continue with summarizing, exploring and plotting the high-dimensional data with ordination and clustering methods. Next we focus on the estimation of diversity (including evenness, richness, beta diversity) and relative abundances, while paying attention to normalization issues. We discuss several methods for testing for differential abundance and diversity, including methods for longitudinal data analysis.
During the practical exercises we will use R and several packages that will be provided later on.
Language: English
Exam: Project assignment optional, only for participants who follow both parts.
Dates and times: April 4 and 5, 2022 from 9 am to 12 pm and from 1 pm to 4 pm, and April 6, from 9 am to 12 pm
Total course length: 15 hrs
Registration fees:
- Private sector: € 925
- Non-profit: € 695
- Students, job seekers, retirees: € 310
- Staff UGent: € 595
- Students UGent: € 265
Register
M15-Mastering R Skills: Selected Topics for Successful Programming
This course is aimed at R users with previous experience who want to optimize their workflow. The tools offered are useful both in research and in more commercial applications. Participants need previous experience with R and RStudio. They should have a good insight into how to work with vectors, matrices, data frames and lists. They also need to have a good basic understanding of the tidyverse data wrangling packages (dplyr and tidyr) and ggplot2. The topics covered in previous IPVW courses on R are considered as known.
R is a powerful and extensive language, and in recent years a lot of useful additions have been made. This also makes it confusing at times to link the different paradigms together and find the optimal workflow. This course is aimed at giving you more insight into how to work more efficiently with R and RStudio. We focus on some more challenging data types, and give an overview of the necessary building tools to optimize your workflow. The following topics will be covered:
-Working with text: text comparison and editing with the help of regular expressions
-Working with dates and times
-Working with files, folders and projects to organise your work
-Automating using functions for standard R, tidyverse and ggplot2
=> More information and enrollment procedures
Language: English
Exam: No.
Dates and times: April 4, 7, 11 and 14, 2022, from 5.30 pm to 9 pm
Total course length: 12 hrs
Registration fees:
- Private sector: € 480
- Non-profit: € 360
- Students, job seekers, retirees: € 160
- Staff UGent: € 310
- Students UGent: € 140
Register
M16-From Language to Information: Natural Language Processing
This course is aimed at professionals and investigators from diverse areas, who need to analyze information conveyed by texts. It is of particular interest to researchers, graduate students or postdocs in health-related specialities who need to analyze information conveyed by patient records, scientific publications, social media, etc. The course prerequisite is familiarity with the Python programming language. Some knowledge of supervised machine learning is considered a plus.
In many sources of data, relevant information is conveyed by free text: this is the case for instance when analyzing the contents of patient records, scientific publications, social media, etc. Because of the non-formal nature of human language, contrary for instance to programming languages, computer-based extraction of structured information from natural language text is challenged by the high variation in expression and the importance of context for correct interpretation. Natural Language Processing aims to design methods that address these challenges, using human knowledge or data-driven methods. This course aims to bring participants to the level where they can independently perform text classification and extract data from text for further data processing and analysis.
The course provides an introduction to Natural Language Processing, including how to handle language units such as words, phrases, sentences, and additional information such as part-of-speech and syntactic structure. The most common applications of supervised machine learning to text analytics will be introduced, such as text classification, sequence labelling for information extraction, focusing on entity recognition and classification, as well as the creation and use of word embeddings and neural classifiers. The course will take biomedical text as illustration, supported by a short introduction to the representation and processing of biomedical terminology.
Content structure:
-Introduction to Natural Language Processing
-Basic Natural Language Processing tools
-Machine learning for text classification
-Sequence labelling for information extraction
-Biomedical terminology for entity recognition
-Word embeddings and neural classifiers for entity recognition
Language: English
Exam: Project assignment optional, only for participants who follow both parts.
Dates and times: April 11, 12 and 13, 2022, from 9 am to 12.15 pm and from 2 pm to 5.15 pm
Total course length: 21 hrs.
Registration fees:
- Private sector: € 1110
- Non-profit: € 835
- Students, job seekers, retirees: € 375
- Staff UGent: € 710
- Students UGent: € 320
Register
M17-Building Interactive Apps with Shiny© in R
The course is targeted at experienced R users who want to provide interfaces to their code. These can be researchers, data managers and related professionals from both the academic world and industry. The participants are expected to be fluent in R, including the use of functions and visualisation using ggplot. The material covered in R intermediate is considered known.
Shiny offers the ability to implement interactive interfaces to R code. The framework has gained a lot of traction in recent years, and is used to create a wide variety of dashboards. While many Shiny applications run on servers, recent additions make it possible to include interactivity in R notebooks and R Markdown documents, build interfaces into packages or distribute them as GitHub repositories.
In this course we cover the basics of Shiny and reactive programming, the paradigm on which Shiny is built. We cover the following topics:
-The basics of Shiny applications
-Controlling the layout (front end)
-Reactive programming (back end)
-Deployment of Shiny apps in different formats
-Debugging and profiling your code for speed
-Use of modules (if time allows)
Participants will have ample opportunity for hands-on practice of the concepts learnt.
Language: English
Exam: No
Dates and times: April 19 and 26, May 3, 10 and 17, 2022, from 5.30 pm to 9 pm
Total course length: 15 hrs
Registration fees:
- Private sector: € 600
- Non-profit: € 450
- Students, job seekers, retirees: € 205
- Staff UGent: € 385
- Students UGent: € 175
Register
M18-Artificial Neural Networks: from the Ground Up
This course is aimed at professionals and investigators from diverse areas who want to learn how to apply neural networks to diverse problems, or who want to learn about the possibilities, applicability, and variants of neural networks. The course prerequisite is basic knowledge of the Python programming language.
Since their earliest conception in the 1940s, artificial neural networks have been alternately regarded as extremely promising machine learning models, capable of learning anything, and as glorified linear combinations, unable to achieve relevant results in practice.
However, over the last decade, the availability of general-purpose GPU architectures and large quantities of data has enabled the rise of deep neural networks, which have attained state-of-the-art performance in many applications, from image classification to text translation. This has given rise to a whole new field of research, ranging from generative models to adversarial attacks (and defenses against them).
This course is intended as a first contact with artificial neural networks, followed by an overview of the different architectures that are currently available:
-Introduction to neurons and neural networks
-Training with backpropagation
-Challenges and solutions to train deep neural networks
-Convolutional networks
-Adversarial examples
-Generative models
°Autoregressive models
°Autoencoders
°Variational autoencoders (VAE)
°Generative adversarial networks (GAN)
-Transformers and BERT
-Recurrent neural networks
The practical sessions use the Python library TensorFlow to implement some of the models discussed in the course, with particular emphasis on how to adapt the networks to the characteristics of a specific problem.
Language: English
Exam: Project assignment optional, only for participants who follow both parts.
Dates and times: April 21 and 28, May 5, 12 and 18, 2022, from 5.30 pm to 9 pm
Total course length: 15 hrs
Registration fees:
- Private sector: € 925
- Non-profit: € 695
- Students, job seekers, retirees: € 310
- Staff UGent: € 595
- Students UGent: € 265
Register
M19-Machine Learning with Python
This course targets professionals and investigators from all areas who are involved in predictive modeling based on large and/or high-dimensional databases. Participants are expected to be familiar with basic statistical modeling (as for instance taught in Module 2 of this program), and to have had a first experience programming in Python (as for instance taught in Module 4 of this program).
Many modern digital applications increasingly rely on machine learning as a means to derive predictive strength from high-dimensional data sets. Compared to traditional statistics, the absence of a focus on scientific hypotheses, and the need for easily leveraging detailed signals in the data require a different set of models, tools, and analytical reflexes.
This course aims to bring participants to the level where they can independently tackle the analytical part of data mining projects. This means that the most common types of projects will be addressed: regression-type with continuous outcomes, classification with categorical outcomes, and clustering. For each of these, the practical use of a set of standard methods will be shown, such as Random Forests, Gradient Boosting Machines, Support Vector Machines, k-Nearest-Neighbors, k-means, ... Furthermore, throughout the course, concepts will be highlighted that are of concern in every statistical learning application, such as the curse of dimensionality, model capacity, overfitting and regularization, and practical strategies will be offered to deal with them, introducing techniques such as the Lasso and ridge regression, cross-validation, bagging and boosting. Instruction will also be given on a selection of specific techniques that are often of interest, such as modern visualization of high-dimensional data, model calibration, outlier detection using isolation forests, explanation of black-box models, ... Finally, the last lecture will introduce the idea of deep learning as a powerful tool for data analysis, discussing when and how to practically use it, and when to shy away from it.
=> More information and enrollment procedures
Language: English
Exam: Project assignment optional, only for participants who follow both parts.
Dates and times: April 25, May 2, 9, 16, 19 and 30, June 13, 2022, from 5.30 pm to 9 pm
Total course length: 21 hrs
Registration fees:
- Private sector: € 1320
- Non-profit: € 990
- Students, job seekers, retirees: € 445
- Staff UGent: € 845
- Students UGent: € 380