The structure of the NTR survey data

ANTR and YNTR surveys

The majority of data collection within NTR consists of survey studies. Data collection in NTR is subdivided into two lines of research: the Young NTR (YNTR, young twins and their families) and the Adult NTR (ANTR, adult twins and their families). This does not mean that YNTR and ANTR are separate cohorts: once young twins turn 16, they (and their families) can join the ANTR and participate as adults. The main difference between YNTR and ANTR is the method of data collection and, as a consequence, the structure of the datasets.

ANTR

Data collection in ANTR is straightforward. Every 2-3 years, a new wave of survey research is conducted. Invitations for these surveys are sent to all adult twins and their families, regardless of their age or their role in the family.

Project codes

The ANTR survey projects all have a project code starting with AS. The number in an AS project code typically reflects the number of the wave: AS_1 is the first ANTR survey (in 1991), and AS_14 is the 14th (in 2019). Notable exceptions are:

project codes starting with AS_0 - these are long-running generic surveys sent to new participants when they register, to collect some basic information.
the project AS_COV, which was conducted during the COVID pandemic. It doesn't have a number like the other studies because it ran partly in parallel with AS_14, and under rather different circumstances than the other surveys (which is also reflected in the contents).
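
The naming rules above can be sketched in code. This is a minimal illustration, not an official NTR tool: the function name classify_as_code and the category labels are hypothetical, and the exact form of the codes that start with AS_0 is an assumption (only the prefix is checked).

```python
import re
from typing import Optional, Tuple


def classify_as_code(code: str) -> Tuple[str, Optional[int]]:
    """Classify an ANTR project code as (kind, wave_number).

    Hypothetical helper: the labels "wave", "registration" and "covid"
    are illustrative, not official NTR terms.
    """
    if not code.startswith("AS_"):
        raise ValueError(f"not an ANTR survey code: {code!r}")
    suffix = code[3:]
    if suffix == "COV":
        # The COVID survey has no wave number.
        return ("covid", None)
    if suffix.startswith("0"):
        # Codes starting with AS_0 are the generic registration surveys.
        return ("registration", None)
    if re.fullmatch(r"\d+", suffix):
        # A plain number is the wave number, e.g. AS_14 is wave 14.
        return ("wave", int(suffix))
    return ("other", None)
```

For example, classify_as_code("AS_14") yields ("wave", 14), while classify_as_code("AS_COV") yields ("covid", None).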

Dataset structure

The dataset structure for ANTR surveys is simple: each row is a participant, and each variable is a self-reported questionnaire item.

YNTR

Data collection in YNTR is more complex, for two reasons. First, YNTR surveys are not conducted in waves but are triggered by the age of the twins. Second, the surveys are completed by several different reporters: mothers, fathers, teachers, or the subjects themselves.

Project codes

Most YNTR surveys are sent to parents, who are asked to report on their young twins/multiples. The parents are invited to participate each time their twins reach a specific age: 1, 2, 3, 5, 7, 10 and 12 years. These data collections have a project code that starts with YS (for YNTR survey), and the number in a YS project code reflects the age of the twins. So YS_1 is the survey sent to parents of twins aged 1, and YS_12 is sent to parents of 12-year-old twins.

In addition to mothers and fathers, NTR also invites teachers to report on the twins. These data are collected under project codes starting with YS_TRF, followed by a number indicating the age of the twins: 5, 7, 10 or 12. Teachers of non-twin siblings are also invited to participate, in which case the subject's age will differ somewhat from this number.

Once YNTR twins turn 14, they are invited to fill in the self-report surveys for adolescents. These surveys have project codes starting with YS_DHBQ, followed by a number indicating the age of the twins: 14, 16 or 18. Note that siblings are also invited but will of course have a different age than the twins.
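
As a rough sketch, the three families of YNTR project codes can be told apart with a small parser. The function name parse_ys_code and the reporter labels are hypothetical. The text above does not specify whether a separator sits between TRF or DHBQ and the age, so the pattern below accepts both YS_TRF7 and YS_TRF_7; that is an assumption, not a confirmed format.

```python
import re
from typing import Tuple

# Assumed code shapes: YS_<age>, YS_TRF<age> or YS_TRF_<age>,
# YS_DHBQ<age> or YS_DHBQ_<age>. The optional underscore is a guess.
_YS_PATTERN = re.compile(r"^YS_(TRF|DHBQ)?_?(\d+)$")

_REPORTER_BY_PREFIX = {
    None: "parent",     # plain YS codes: parent report
    "TRF": "teacher",   # YS_TRF codes: teacher report
    "DHBQ": "self",     # YS_DHBQ codes: adolescent self-report
}


def parse_ys_code(code: str) -> Tuple[str, int]:
    """Return (reporter_type, target_age_of_twins) for a YNTR project code."""
    match = _YS_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised YNTR survey code: {code!r}")
    prefix, age = match.groups()
    return (_REPORTER_BY_PREFIX[prefix], int(age))
```

Note that the age returned is the target age of the twins; for siblings, the actual age of the subject will differ from it.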

Changes in content

The age-bound parent and teacher surveys are long-running projects that have been ongoing for decades. While the aim is to keep the content of the surveys the same over the years, this is not always possible. Occasionally, items are removed, added or modified. This means there are many different versions of each survey, and some questionnaire items may only be available for a subset of participants who completed a specific version of the survey.

Dataset structure

The structure of a YNTR dataset depends on the project and the variables. Usually, projects based on parent and teacher reports are organized such that each row represents a child, and the project code, variable names and/or variable labels indicate the reporter. The main ID in the file is that of the child, but there may be an additional ID for the reporter.

The self-report projects (DHBQ) are also organized by the child's ID (fisnumber), but in this case the child is also the reporter, so there is no separate reporter ID.

Finally, in some cases, parents provide data about themselves, such as their education level, which may be valuable for research. These data may be organized under the child's ID (labeled, e.g., parental education level) or under the parent's ID (labeled, e.g., education level [self-report]).

Due to the many possibilities, it is essential to always read the variable labels carefully to see whether a variable represents a self-report, or a parent or teacher report.
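
To make the one-row-per-child layout concrete, the sketch below reshapes a small child-level table into one record per child, item and reporter. Everything in it is invented for illustration: the field names (child_id, score_m, score_f, score_t) and the suffix convention (_m, _f, _t) are assumptions, not actual NTR variable names. In real datasets the reporter must be determined from the project code and variable labels.

```python
# Hypothetical child-level rows: one row per child, with the reporter
# encoded in an invented variable-name suffix.
rows = [
    {"child_id": 101, "score_m": 4, "score_f": 5, "score_t": None},
    {"child_id": 102, "score_m": 2, "score_f": None, "score_t": 3},
]

# Assumed suffix-to-reporter mapping (illustrative only).
REPORTERS = {"m": "mother", "f": "father", "t": "teacher"}


def to_long(rows, id_field="child_id"):
    """Turn one-row-per-child data into one record per child/item/reporter."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name == id_field or value is None:
                continue  # skip the ID column and missing reports
            item, _, suffix = name.rpartition("_")
            reporter = REPORTERS.get(suffix)
            if reporter is None:
                continue  # variable does not follow the assumed suffix scheme
            long_rows.append(
                {
                    "child_id": row[id_field],
                    "item": item,
                    "reporter": reporter,
                    "value": value,
                }
            )
    return long_rows


long_rows = to_long(rows)
```

The long layout makes it explicit who reported each value, which is the information that otherwise has to be read from the variable labels.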

Overlap between waves

When conducting a longitudinal study, it is important to know how much overlap there is in participants between one wave and another. As a general rule of thumb, YS studies have more overlap with other YS studies than with AS studies, and vice versa, and the closer together the ages (YNTR) or years of measurement (ANTR), the greater the overlap. For a better indication, please consult the overlap table. However, exact numbers can only be given for specific combinations of variables.
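
Because exact overlap depends on the variables chosen, it can help to compute it directly once the data are in hand. The following is a minimal sketch under stated assumptions: the field name participant_id, the variable names and the sample rows are all hypothetical, and a value of None stands in for a missing answer.

```python
def ids_with_data(rows, variable, id_field="participant_id"):
    """IDs of participants with a non-missing value for one variable."""
    return {row[id_field] for row in rows if row.get(variable) is not None}


def overlap(ids_a, ids_b):
    """Count participants in each set and in both."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_both": len(shared),
        # Fraction of set A that also appears in set B.
        "share_of_a": len(shared) / len(a) if a else 0.0,
    }


# Invented example data for two surveys.
survey_a = [
    {"participant_id": 1, "height": 180},
    {"participant_id": 2, "height": None},
    {"participant_id": 3, "height": 165},
]
survey_b = [
    {"participant_id": 1, "height": 181},
    {"participant_id": 3, "height": None},
    {"participant_id": 4, "height": 172},
]

result = overlap(
    ids_with_data(survey_a, "height"),
    ids_with_data(survey_b, "height"),
)
```

In this toy example, participant 3 took part in both surveys but only answered the item in the first, so the overlap for this variable is smaller than the overlap in participation.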