When teaching statistics, it is important to let students practise analysing datasets. Ideally, students practise by answering multiple questions, each with its own slightly different dataset. With the dataset parameterisation function in Grasple, this is now possible. In this article, we explain how to create a linear regression based on a sample of a dataset.

The example question we created can be found here and has the following introduction:

Above this introduction is a button to download a dataset. This dataset contains only a sample of a complete dataset and is used to create the linear regression output shown in the second table of the question:

The data in this table can be used to create multiple sub-questions that help students practise interpreting these models. Alternatively, you can ask questions about the consequences of adding a new independent variable. All of these questions can be parameterised, giving students multiple chances to practise the same concepts.

Creating the linear regression table

So how is this table created? In the edit view, the table looks like this:

This table shows all the relevant output of the regression. It is parameterised based on a sample of an uploaded dataset and contains several parameters that simply round the output of another parameter called regression. Their definitions look like this (not all definitions are shown):

We created a parameter named regression, which contains all the information of a linear regression model fitted to columns selected from a sample of the dataset (one dependent and two independent variables). Note that this regression parameter has two sub-fields, coeffs and info, which in turn contain sub-fields with more specific information.
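Conceptually, the regression parameter behaves like a nested record: each display parameter picks one sub-field and rounds it. The Python sketch below mimics that idea. Only the names regression, coeffs and info come from this article; the deeper field names, the dotted-path helper get_field, the display names and the numbers are hypothetical placeholders and do not reflect Grasple's actual output.

```python
from typing import Any, Dict

# Hypothetical stand-in for the "regression" parameter. Only the sub-field
# names "coeffs" and "info" come from the article; the values are placeholders.
regression: Dict[str, Any] = {
    "coeffs": {"intercept": 12.34567, "temperature": 0.98765},
    "info": {"r_squared": 0.456789, "n": 500},
}


def get_field(result: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'coeffs.intercept' into nested dicts."""
    value: Any = result
    for key in path.split("."):
        value = value[key]
    return value


# Each display parameter simply rounds one sub-field of the regression result.
display = {
    "b0": round(get_field(regression, "coeffs.intercept"), 2),
    "bTemp": round(get_field(regression, "coeffs.temperature"), 3),
    "r2": round(get_field(regression, "info.r_squared"), 2),
}
print(display)
```

Keeping the raw result in one parameter and rounding only in the display parameters means every sub-question draws on the same unrounded numbers.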

All sub-fields of a parameter can be inserted into any LaTeX box by selecting them from the drop-down list, just like regular parameters, as shown in the following screenshot:

Creating the parameterised datasets

The first step is to create the parameterised dataset. Here's how:

  1. Create a parameter and select 'Dataset'.
  2. Upload your .csv dataset.

The original dataset contains 1000 rows of data. Keep in mind that, at the moment, any uploaded dataset must:

  • be a CSV file smaller than 2MB,
  • have the column headers in the first row,
  • use a comma to separate columns,
  • use a dot ('.') as the decimal sign,
  • not contain missing values.
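Before uploading, you could check a file against these requirements locally. The Python sketch below is a hypothetical pre-flight check, not Grasple's own validation: it assumes "2MB" means 2 MiB, treats a header row as valid when none of its cells parse as numbers (a heuristic), and flags quoted values such as "3,5" as comma decimals.

```python
import csv
import os
import re
from typing import List

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2MB" interpreted as 2 MiB
COMMA_DECIMAL = re.compile(r"^-?\d+,\d+$")  # e.g. "3,5" instead of "3.5"


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_dataset(path: str) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems: List[str] = []
    if not path.lower().endswith(".csv"):
        problems.append("not a .csv file")
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 2MB or larger")
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))  # csv.reader splits on commas by default
    if not rows:
        return problems + ["file is empty"]
    header, data = rows[0], rows[1:]
    # Heuristic: a header row should consist of names, not numbers.
    if any(is_number(cell) for cell in header):
        problems.append("first row looks like data, not column headers")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"row {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for cell in row:
            if cell.strip() == "":
                problems.append(f"row {line_no}: missing value")
            elif COMMA_DECIMAL.match(cell.strip()):
                problems.append(f"row {line_no}: comma used as decimal sign in {cell!r}")
    return problems
```

For example, `check_dataset("bike_sharing.csv")` (a hypothetical filename) returns an empty list when the file meets every listed requirement.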

The filename is filled in automatically. Here it matters less, because we do not want to make the complete dataset available for download. Instead, every student should get a sampled dataset, which we create next:

This parameter samples 500 rows from the dataset. Note that we sample fewer rows than the original dataset contains, because it is currently not possible to sample more rows than are available.
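The limitation above is what you would expect from sampling without replacement. The Python sketch below illustrates that behaviour under the assumption (not confirmed by this article) that Grasple samples rows without replacement; the function name sample_rows and the per-student seed are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def sample_rows(rows: Sequence[Row], k: int, seed: Optional[int] = None) -> List[Row]:
    """Draw k distinct rows without replacement (assumed sampling method)."""
    if k > len(rows):
        # Without replacement, you cannot draw more rows than exist.
        raise ValueError(f"cannot sample {k} rows from only {len(rows)} available")
    rng = random.Random(seed)  # a different seed per student gives a different sample
    return rng.sample(list(rows), k)
```

With 1000 rows, `sample_rows(rows, 500, seed=1)` works, while asking for 1001 rows raises a ValueError.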

The sampled dataset is made available for download so that students can explore the data themselves. The 'download dataset' button is linked to this sample under the specified filename, in this case 'bike_sharing_sample.csv'. The file contains all the columns of the original dataset, but only the sampled rows.
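To make "all the columns, but only the sampled rows" concrete, the Python sketch below writes such a file: it keeps the header row and every column, and copies a random subset of data rows. It is an illustration only; write_sample and the source filename are hypothetical, and sampling without replacement is an assumption.

```python
import csv
import random


def write_sample(source: str, target: str, k: int, seed: int) -> None:
    """Copy the header and k randomly chosen data rows (all columns) to target."""
    with open(source, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # the first row holds the column headers
        rows = list(reader)
    # random.sample draws without replacement and raises ValueError if k > len(rows).
    sample = random.Random(seed).sample(rows, k)
    with open(target, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(sample)
```

For example: `write_sample("bike_sharing.csv", "bike_sharing_sample.csv", k=500, seed=42)`.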

Select the right columns

The next step is to select the relevant columns from the sampled dataset. We select three columns and store each in its own parameter:

To select a column, specify the source of the selection in the first input field; in our case this is the datasetSample parameter. The second input field indicates the rows to select. Here we want all rows, so we use the ':' range indicator.

In the last input field, we specify the column we want to select. If only two input fields are shown, you can add the third by clicking the 'add selector' button.

In this example, we need the sixteenth column for the number of bikes rented out, the fourth column for the day of the year, and the tenth column for the temperature. Note that these indices are one-based: to select the first column, you specify '1'.
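The selector's source, row range and one-based column index can be mimicked in Python, where indexing is zero-based, so the column number has to be shifted by one. This is a hypothetical sketch: select_column is not a Grasple function, it only supports the ':' (all rows) range, and the parameter names dayOfYear and temperature are made up for illustration.

```python
from typing import List, Sequence


def select_column(dataset: Sequence[Sequence[str]], rows: str, column: int) -> List[str]:
    """Select one column using a one-based index; rows=':' means all rows."""
    if rows != ":":
        raise NotImplementedError("this sketch only supports ':' (all rows)")
    if column < 1:
        raise IndexError("column indices are one-based, so the smallest is 1")
    # Shift the one-based column number to Python's zero-based index.
    return [row[column - 1] for row in dataset]


# Hypothetical usage, with datasetSample holding the sampled rows (header removed):
# bikeCount   = select_column(datasetSample, ":", 16)
# dayOfYear   = select_column(datasetSample, ":", 4)
# temperature = select_column(datasetSample, ":", 10)
```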

In this way we end up with three parameters, each of which contains a column of data from the sampled dataset. With these parameters we can configure our linear regression parameter.

Creating the linear regression

Create a new parameter and select 'Linear Regression'.  

We specify the bikeCount parameter as the dependent variable on the left-hand side, and the other two column parameters as independent variables on the right-hand side. You can add more independent variables by pressing the '+' button after the e term (the error term).
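The configured model reads bikeCount = b0 + b1·dayOfYear + b2·temperature + e. The Python sketch below fits such a model by ordinary least squares with NumPy and returns estimates, standard errors, t values and R². It assumes Grasple fits a standard OLS model with an intercept, which this article does not state; the function fit_ols, the names dayOfYear and temperature, and the field names nested under coeffs and info are hypothetical.

```python
import numpy as np


def fit_ols(y, *predictors):
    """Fit y = b0 + b1*x1 + ... + bk*xk + e by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then one column per predictor.
    X = np.column_stack([np.ones(len(y))] + [np.asarray(x, dtype=float) for x in predictors])
    coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coeffs
    n, p = X.shape
    sse = float(residuals @ residuals)        # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    sigma2 = sse / (n - p)                    # unbiased residual variance
    std_error = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return {
        "coeffs": {"estimate": coeffs, "std_error": std_error, "t_value": coeffs / std_error},
        "info": {"r_squared": 1.0 - sse / sst, "n": n, "df_residual": n - p},
    }
```

With the three selected columns converted to numbers, a call would look like `fit_ols(bikeCount, dayOfYear, temperature)`; adding a predictor to the call mirrors pressing the '+' button.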

After you save this template, you should see generated rows of data that look like this:

You can download both the original and the sampled dataset to verify that the sampling works as expected. You can also download each of the selected columns to verify that it contains the correct data.
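If you prefer to script these checks on the downloaded files, the Python sketch below shows one way to do it. It assumes rows have been read as lists of strings (for example with csv.reader) with the header removed; the multiplicity check assumes sampling without replacement. Both function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def sample_comes_from(original: Sequence[Sequence[str]], sample: Sequence[Sequence[str]]) -> bool:
    """True if every sampled row occurs in the original at least as often as in the sample."""
    available = Counter(tuple(row) for row in original)
    drawn = Counter(tuple(row) for row in sample)
    # A Counter returns 0 for rows it has never seen, so foreign rows fail the check.
    return all(available[row] >= count for row, count in drawn.items())


def column_matches(sample: Sequence[Sequence[str]], column: int, selected: List[str]) -> bool:
    """True if `selected` equals the one-based `column` of the sample, row by row."""
    return selected == [row[column - 1] for row in sample]
```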

Our complete configuration should now look like this:

This configuration can be extended, for example by including more independent variables or by adding a correlation analysis between columns. The latter can be done with the Correlation parameter type.
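As an illustration of the correlation extension, the Python sketch below computes a Pearson correlation coefficient between two numeric columns. Whether the Correlation parameter type uses Pearson's coefficient is an assumption, not something this article confirms; the function name pearson is hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long numeric columns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two columns of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Applied to, say, the temperature and bike-count columns (after converting them to numbers), it yields a value between -1 and 1.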

Create a table or have students download the dataset

You can now use the generated parameters to create a table. Alternatively, you can ask students to download their dataset and perform the analysis themselves.

In that case, fill in the parameters you want in the 'correct answer' box.
Students can then check their results against the linear regression generated in Grasple.
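Because students typically report rounded numbers, a comparison with some tolerance is useful. The Python sketch below accepts a value within an absolute tolerance of the generated one. It is hypothetical: this article does not describe how Grasple compares answers, and the tolerance of 0.005 (half of the second decimal) is only an example.

```python
import math


def check_answer(student_value: float, generated_value: float, abs_tol: float = 0.005) -> bool:
    """Accept the student's value if it lies within abs_tol of the generated value."""
    # rel_tol=0.0 so that only the absolute tolerance decides.
    return math.isclose(student_value, generated_value, rel_tol=0.0, abs_tol=abs_tol)
```

For a generated coefficient of 1.2345, an answer of 1.23 is accepted, while 1.22 is not.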

Do you have feedback?

At this point we are collecting feedback on this new feature. If you have any questions about or feedback on this functionality, just let us know via the chat or email!
