This coursework may be slightly difficult to read on my website. For clearer formatting of equations and access to the appendix, you can download the PDF version here.

Research Question 1: Using the Quarterly Labour Force Survey (QLFS) data, estimate and interpret an econometric model of the wage equation, with a specific focus on estimating the gender wage gap and returns to education

Research Question 2: Evaluate the evidence that the gender wage gap changes for different levels of education.

1.1 Introduction and description of the economic model

The wage equation is a foundational model in labour economics used to explore the factors that affect individuals' wages in the labour market. Models typically present wages as a function of human capital, that is, as Borjas (2006, p.229) puts it, a ‘unique set of abilities and acquired skills’, but also of demographic characteristics such as gender or race. Typically, we associate higher wages with higher levels of education. Additionally, the existence of a gender wage gap is a notable area of discussion. The OECD (2025) defines the gender wage gap as ‘the difference between… earnings of men and of women relative to the… earnings of men’. Hence, to investigate these relationships it is necessary to include variables for education and gender in our model. Additionally, Blau and Kahn (2017) suggest that the gender wage gap varies by education. Indeed, while higher education generally leads to higher wages for both genders, the wage premium appears to be larger for men (Blau and Kahn, 2017). Accordingly, we may expect to find similar results when analysing data from the Quarterly Labour Force Survey (QLFS), i.e. that a gender wage gap exists and that it is larger at higher levels of education.

1.2 Description of your econometric model(s)

This model is based on the Mincer (1974) earnings equation, which estimates the returns to an individual's schooling.

ln(wage_i) = beta0 + beta1*schooling_i + beta2*experience_i + beta3*experience_i^2 + u_i

For our model we have maintained the core structure of the equation but extended it to analyse gender wage inequality more explicitly, and added further controls. This allows for a more nuanced understanding of the gender wage gap and the returns to education. The population regression function used in this discussion is set out below. As in the Mincer equation, we have opted for a log-level wage equation.

Population regression used:
ln(wage_i) = beta0 + beta1*female_i + beta2*education_level_i
+ beta3*(female_i*education_level_i) + beta4*potexp_i
+ beta5*potexp_i^2 + beta6*pt_i + beta7*london_i + u_i

In our case the dependent variable is the natural logarithm of hourly wage. This functional form allows coefficients to be interpreted as approximate percentage changes in wages, and is typical in modelling wage determinants (Borjas, 2006, p.13). Since our model aims to investigate the gender wage gap, we have included the binary variable female. Education is represented categorically by education_level, entered through the dummy variables gcse, alevel and degree, with none (no education) as the base category. This enables comparison of wage premiums across the highest level of education achieved. Furthermore, potexp and potexp2 represent an individual's potential labour market experience, measured as their age minus the age at which they left education. Including both a linear and a quadratic term captures the concave relationship between experience and wages, reflecting diminishing returns to experience over time. Indeed, Mincer (1958, p.301) describes 'earnings [as increasing] with age, up to a point when biological decline begins to affect productivity'. Additionally, the model includes interaction terms between gender and education level. These allow us to test whether the returns to education differ between males and females, and hence whether the gender wage gap varies with education. The remaining variables are self-explanatory: pt and london are binary variables indicating part-time employment status and whether an individual works in London, respectively. As standard, u_i is the error term capturing unobserved factors.
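As a rough illustration of how the regressors described above could be coded for one respondent, the following is a minimal Python sketch. The function name `regressors` and the `female_x_` naming of the interaction columns are assumptions made for illustration; they are not taken from the QLFS or from the software used for the estimation.

```python
# Education dummies that enter the model; "none" is the omitted base category.
EDUCATION_LEVELS = ("gcse", "alevel", "degree")


def regressors(female, education, potexp, pt, london):
    """Build the right-hand-side variables of the wage equation for one person.

    female, pt and london are 0/1 indicators, education is one of
    "none", "gcse", "alevel" or "degree", and potexp is in years.
    """
    row = {"const": 1, "female": female}
    for level in EDUCATION_LEVELS:
        dummy = int(education == level)
        row[level] = dummy
        # The interaction is 1 only for women at this education level.
        row[f"female_x_{level}"] = female * dummy
    row["potexp"] = potexp
    row["potexp2"] = potexp ** 2
    row["pt"] = pt
    row["london"] = london
    return row


# Hypothetical respondent: a full-time female graduate in London, 10 years' experience.
print(regressors(female=1, education="degree", potexp=10, pt=0, london=1))
```

For the base category (a man with no qualifications) every education dummy and interaction is zero, which is why the intercept describes that group.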

Certain variables were omitted for several reasons. For instance, managerial status was not included because education may lead to managerial status, which in turn raises wages; controlling for it would absorb part of the effect of education and lead to biased estimates of the returns to education. Other variables were not statistically significant, or were left out to keep the model parsimonious.

1.3 Presentation of your estimated model(s) and specification tests

Estimated sample regression (coefficients):
ln(wage_i) = 2.341 + (-0.054)*female_i + 0.143*gcse_i + 0.307*alevel_i
+ 0.543*degree_i + 0.026*potexp_i + (-0.0004)*potexp_i^2
+ (-0.176)*pt_i + 0.179*london_i
+ (-0.069)*(female_i*gcse_i)
+ (-0.152)*(female_i*alevel_i)
+ (-0.094)*(female_i*degree_i) + u_i

After the regression, postestimation tests are run to analyse the robustness of the model.

Breusch-Pagan Test (See Appendix B)

A visual examination of the model's residuals against each education level suggests heteroscedasticity: as education level increases, the variance of the residuals increases (See Chart 1, Appendix B). This makes theoretical sense, since those with higher levels of education have a wider range of career choices. The suspicion is confirmed by a Breusch-Pagan test, which returns a significant result (See Appendix B). The test can be done manually by regressing the squared residuals of the model on all explanatory variables (the auxiliary regression), then testing whether the auxiliary regression has explanatory power. Heteroscedasticity violates the Gauss-Markov assumption that errors have constant variance, so OLS is no longer the best linear unbiased estimator: it remains linear and unbiased but is no longer efficient. Furthermore, under heteroscedasticity the usual standard errors of the least squares estimators are incorrect, so hypothesis tests which use them may be invalid. To deal with this, an attempt was made to apply feasible generalised least squares to improve efficiency, but this yielded similar coefficient estimates and did not substantially improve model fit (See Table 1, Appendix B for comparison and method). Additionally, since the variance structure is unknown, approximating it may produce incorrect weights, in which case an efficiency gain is not guaranteed and the usual FGLS standard errors are themselves unreliable. As a result, robust standard errors were used instead, as they provide valid inference without requiring homoscedasticity.
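The manual version of the test described above can be sketched in a few lines of Python. This is a simplified illustration on simulated data, not the procedure from Appendix B: it assumes a single explanatory variable (so the LM statistic has one degree of freedom), and the helper names `simple_ols` and `breusch_pagan_lm` are hypothetical. With several regressors the auxiliary regression would include all of them and the degrees of freedom would equal their number.

```python
import math
import random


def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for a regression of y on one regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ssr / sst


def breusch_pagan_lm(x, y):
    """LM form of the Breusch-Pagan test with one regressor: LM = n * R^2 of the auxiliary regression."""
    intercept, slope, _ = simple_ols(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    _, _, aux_r2 = simple_ols(x, squared_resid)  # auxiliary regression
    lm = len(x) * aux_r2
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of a chi-square with 1 df
    return lm, p_value


random.seed(0)
n = 500
x = [random.uniform(0, 10) for _ in range(n)]
# The error spread grows with x, so these simulated data are heteroscedastic by design.
y = [1.0 + 0.5 * xi + random.gauss(0, 0.5 + xi) for xi in x]

lm, p_value = breusch_pagan_lm(x, y)
print(f"LM = {lm:.1f}, p-value = {p_value:.4g}")
```

A large LM statistic (small p-value) indicates that the squared residuals are systematically related to the regressor, which is the evidence of heteroscedasticity the test looks for.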

RESET Test

The Ramsey RESET test was used to detect model misspecification. The test returned a significant result, suggesting that some relevant functional form or variables may be omitted. Because of the presence of heteroscedasticity, this test was done using robust standard errors (See Appendix D). While some important variables or nonlinearities may be missing, the core variables such as education, gender and experience are theoretically grounded, and no better specification was found.
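As a sketch of what the test involves, assuming the common version that adds the squared and cubed fitted values (the exact powers used in Appendix D are not stated here), RESET augments the original regression and tests the added terms jointly:

```latex
\ln(wage_i) = \mathbf{x}_i'\boldsymbol{\beta}
  + \delta_1 \hat{y}_i^{\,2} + \delta_2 \hat{y}_i^{\,3} + e_i,
\qquad H_0:\ \delta_1 = \delta_2 = 0
```

Here \mathbf{x}_i collects the regressors of the wage equation and \hat{y}_i is the fitted value of ln(wage_i) from the original regression. Rejecting H_0 means that nonlinear functions of the regressors still help to explain wages, which points to a misspecified functional form.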

Multicollinearity, VIF

Although it is not a specification test, and multicollinearity does not bias the estimators, variance inflation factors (VIFs) were used to check for multicollinearity. The results (See Appendix F) indicate some moderate multicollinearity in the model, particularly for female, potential experience, and potential experience squared. However, this is expected, since the model includes both a variable and its squared term, and the inclusion of female and its interactions is theoretically important for estimating gender differences. As a result, despite some high VIFs, all variables were retained, and multicollinearity should not be considered a major issue here.
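To illustrate why a variable and its square mechanically produce a high VIF, the sketch below computes the VIF for potential experience in a stripped-down setting with only two regressors, where VIF = 1 / (1 - r^2) and r is the correlation between them. The experience values (0 to 40 years, one observation each) are hypothetical and are not the QLFS sample; in the full model the R^2 would come from regressing each variable on all the others.

```python
def r_squared(x, y):
    """Squared correlation between x and y (the R^2 of regressing one on the other)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical experience values: one worker at each of 0, 1, ..., 40 years.
potexp = list(range(0, 41))
potexp2 = [value ** 2 for value in potexp]

vif = 1 / (1 - r_squared(potexp, potexp2))
print(f"VIF for potexp when potexp2 is also included: {vif:.1f}")
```

The VIF is well above the usual rule-of-thumb thresholds even though nothing is wrong with the model, which is consistent with retaining both terms.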

1.4 Statement of the hypotheses to be tested

In order to estimate and interpret the gender wage gap and returns to education using the QLFS data, we test hypotheses related to gender differences in wages and how education impacts these differences. Firstly, we examine whether being female significantly affects wages. Since the model includes interactions, this requires a joint significance test of female and all its interactions (See Appendix G).

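(Gender has no effect on wages) h0: female = (female*gcse) = (female*alevel) = (female*degree) = 0
(Gender has an effect on wages) h1: at least one of female, (female*gcse), (female*alevel), (female*degree) =/= 0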

Additionally, to examine whether education level significantly affects log wages we do a joint significance test for education level variables.

(Education level has no effect on wages) h0: gcse = alevel = degree =0
(Education level has an effect on wages) h1: at least one of gcse, alevel, degree =/= 0

Finally in order to assess whether the gender wage gap differs by education level we perform a joint significance test between the gender and education interaction variables.

(The gender gap is the same across education levels) h0: (female*gcse) = (female*alevel) = (female*degree) = 0
(The gender gap is different across education levels) h1: at least one of (female*gcse), (female*alevel), (female*degree) =/= 0

In terms of method, we use joint significance tests for all three questions: the gender pay gap, the returns to education, and whether the pay gap varies across education levels. These F-tests compare the sum of squared residuals from the restricted model against that of the unrestricted model, assessing whether imposing the restrictions significantly worsens model fit. This approach allows us to evaluate the effects and differences in wage structures across gender and education groups.
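For reference, the comparison of restricted and unrestricted models described above corresponds to the standard F statistic, written here as a sketch in textbook notation (the symbols are not taken from Appendix G):

```latex
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}
```

where SSR_r and SSR_{ur} are the sums of squared residuals of the restricted and unrestricted models, q is the number of restrictions (3 for the education and interaction tests, 4 when female is tested together with its three interactions), n is the sample size, and k = 11 is the number of regressors in the unrestricted model. Under h0 and the classical assumptions the statistic follows an F distribution with (q, n - k - 1) degrees of freedom. When robust standard errors are used, software typically reports a heteroscedasticity-robust version of this statistic rather than the SSR form, so the formula is best read as the intuition behind the test.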

1.5 Interpretation of your results (See Appendix G for Hypothesis Results)

The estimated log-level wage equation provides insights into the gender wage gap and returns to education. The dependent variable is the natural logarithm of hourly wage, so the coefficients should be interpreted as percentage changes in wages. This is done by exponentiating the coefficient, subtracting one and multiplying by 100. Starting with the intercept, we can interpret the base category individual, i.e. a full-time male worker outside London with no qualifications and no potential experience. They are expected to have a log hourly wage of 2.341, which corresponds to an hourly wage of approximately £10.39 (exp(2.341)).

The returns to education are substantial and statistically significant for both genders. Compared to men with no qualifications, men with GCSEs earn 15.4% more, men with A-levels earn 35.9% more, and men with degrees earn 72.1% more. The joint F-test on the education dummies confirms that education significantly affects wages. These findings are consistent with human capital theory, which suggests that increased education leads to higher productivity and thus higher wages (Borjas, 2006).

Women also have positive returns to education, although the education premium for women is not as great. For instance, women with A-level qualifications earn 18.6% less, and women with degrees 13.8% less, than men with equivalent education levels. These differences are calculated by adding the coefficient on female to the relevant interaction coefficient and then interpreting the sum as usual. However, the gender differences at the no-qualification and GCSE levels are not as significant. Nonetheless, to assess the overall impact of gender, we performed the first joint hypothesis test from section 1.4. The test is significant, meaning that gender has a statistically significant effect on wages overall. For a more nuanced interpretation, the joint significance test of the interaction terms between gender and education is also significant, confirming that the gender wage gap differs across education levels.
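The percentage figures in this section can be reproduced from the reported coefficients with a short Python sketch. The coefficients are the rounded values from the estimated equation in section 1.3; the dictionary keys and the helper name `pct_effect` are chosen here for illustration only.

```python
import math

# Rounded coefficients from the estimated wage equation (section 1.3).
coef = {
    "female": -0.054,
    "gcse": 0.143,
    "alevel": 0.307,
    "degree": 0.543,
    "female_x_gcse": -0.069,
    "female_x_alevel": -0.152,
    "female_x_degree": -0.094,
}


def pct_effect(log_points):
    """Convert a log-wage difference into a percentage wage difference."""
    return (math.exp(log_points) - 1) * 100


for level in ("gcse", "alevel", "degree"):
    # Return to this qualification for men, relative to men with no qualifications.
    male_return = pct_effect(coef[level])
    # Gender gap at this level: female coefficient plus the matching interaction.
    gender_gap = pct_effect(coef["female"] + coef[f"female_x_{level}"])
    print(f"{level}: male return {male_return:.1f}%, gender gap {gender_gap:.1f}%")
```

Because the inputs are rounded to three decimal places, the output can differ from the appendix figures in the last digit.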

Experience also has a significant effect: each additional year of potential experience initially increases wages by about 2.6%, and the negative quadratic term confirms diminishing returns over time. Additionally, the control variables have the expected signs, with part-time workers earning 16.2% less and London workers earning 19.6% more than the base category.
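The diminishing returns to experience can be made concrete with a small Python sketch using the rounded coefficients on potexp and potexp squared. The function names are hypothetical, and because the quadratic coefficient is reported to only one significant figure, the implied turning point is indicative rather than precise.

```python
B_POTEXP = 0.026     # coefficient on potexp (rounded, from section 1.3)
B_POTEXP2 = -0.0004  # coefficient on potexp squared (rounded, from section 1.3)


def marginal_return(potexp, b1=B_POTEXP, b2=B_POTEXP2):
    """Approximate change in log wage from one more year of experience: b1 + 2*b2*potexp."""
    return b1 + 2 * b2 * potexp


def turning_point(b1=B_POTEXP, b2=B_POTEXP2):
    """Years of experience at which the estimated wage profile peaks: -b1 / (2*b2)."""
    return -b1 / (2 * b2)


for years in (0, 10, 20, 30):
    print(f"{years} years: marginal return of about {marginal_return(years) * 100:.1f}% per year")
print(f"Implied peak of the wage-experience profile: about {turning_point():.1f} years")
```

The marginal return falls steadily from its initial value, which is the concave profile that the quadratic term is meant to capture.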

To conclude, both men and women benefit from education, especially at A-level and degree level. However, the premium is greater for men than for women, and the gap is widest at the A-level and degree stages.

2.1 Discussion and Limitations

While the model provides some insight into the determinants of wages, several limitations affect the reliability of the results. For instance, the model does not necessarily support a causal interpretation of the effect of education on wages. This is because of possible omitted variable bias: factors not included in the model, and not collected in the QLFS dataset, are likely to influence both education and earnings. Such variables may include motivation, ability, or parental education and support. If these variables are positively correlated with both education and wages, our OLS estimates will be biased upwards. In other words, education is likely endogenous, as individuals who pursue more education may possess unobserved characteristics which also influence wages. This could be partly addressed with a fixed effects model if the data were available as a panel rather than a single cross-section.

Additionally, the use of potential experience (age minus the age at which education ended) as a proxy for actual labour market experience introduces measurement error, as people are not continuously employed. For instance, women who take career breaks to bring up children may be out of work for extended periods. Furthermore, the categorisation into GCSE, A-level and degree leaves out other forms of education such as BTECs or other vocational qualifications. People with these qualifications may have higher or lower returns to education. Moreover, the gender wage gap may differ for these qualifications.

Furthermore, while heteroscedasticity is accounted for via robust standard errors, the Ramsey RESET test suggests misspecification, likely from omitted variables, which can bias our coefficient estimates. Consequently, given this and the other limitations discussed, we should proceed with caution when interpreting the results.

2.2 Endogeneity Issues and Possible Remedies

Perhaps one of the most significant challenges in estimating the return to education is the problem of endogeneity. Endogeneity arises when an explanatory variable in a regression model is correlated with the error term, which leads to biased estimates, in this case likely biased upwards. For example, those who choose to pursue more education may have unobserved traits, such as ability, motivation and familial support, all of which also make them more productive and higher-earning regardless of education. To address this issue, we may use instrumental variables with two-stage least squares (2SLS), which allows for consistent estimation even in the presence of endogeneity. A valid instrumental variable must be correlated with the endogenous regressor but uncorrelated with the error term. For the returns to education, this means the instrument must not affect wages except through education. For example, Card (1993) uses geographic proximity to a college as an instrument for education. The rationale is that individuals living closer to a college are more likely to pursue higher education, but this proximity is plausibly unrelated to unobserved determinants of wages. In the first stage of Card's 2SLS procedure, years of education are regressed on college proximity and controls; in the second stage, wages are regressed on the predicted education values. This isolates exogenous variation in education driven by geographic factors.
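The two stages described above can be illustrated with a small simulation in Python. This is a sketch under stated assumptions, not a replication of Card (1993): the data are simulated, there are no control variables, the "true" return of 0.08 and all other parameter values are invented for the example, and the helper name `simple_ols` is hypothetical.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(1)
n = 5000
TRUE_RETURN = 0.08  # assumed true return to a year of education in this simulation

ability = [random.gauss(0, 1) for _ in range(n)]         # unobserved by the researcher
near_college = [random.randint(0, 1) for _ in range(n)]  # instrument, unrelated to ability
educ = [12 + 1.0 * z + 1.0 * a + random.gauss(0, 1)
        for z, a in zip(near_college, ability)]
lwage = [1.5 + TRUE_RETURN * e + 0.2 * a + random.gauss(0, 0.3)
         for e, a in zip(educ, ability)]

# Naive OLS: biased upwards because ability raises both education and wages.
_, ols_return = simple_ols(educ, lwage)

# Stage 1: regress education on the instrument and form the fitted values.
pi0, pi1 = simple_ols(near_college, educ)
educ_hat = [pi0 + pi1 * z for z in near_college]

# Stage 2: regress log wages on the fitted education values.
_, tsls_return = simple_ols(educ_hat, lwage)

print(f"OLS estimate:  {ols_return:.3f}")
print(f"2SLS estimate: {tsls_return:.3f} (assumed true value {TRUE_RETURN})")
```

Running the two stages by hand gives the 2SLS point estimate, but the standard errors from the second-stage regression are not valid; dedicated IV routines correct them.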

Another example is Angrist and Krueger (1991), who use quarter of birth as an instrument. This exploits the interaction between school-starting age rules and compulsory schooling laws in the U.S.: those born earlier in the year tend to enter school at an older age and so can legally leave school with fewer years of completed education. Since quarter of birth is assumed to be mostly random, it provides exogenous variation in education. They similarly use 2SLS to estimate the effect of education on earnings. However, Bound, Jaeger, and Baker (1995) suggest it is ‘not obvious that season of birth is unrelated to unobserved factors that affect wage’ (Wooldridge, 2019, p.502).

To adapt these approaches to analysing whether the gender wage gap varies across education levels, one might follow their methods and use college proximity or quarter of birth, but interacted with gender. In 2SLS, this involves predicting both education and its interaction with gender in the first stage, then including the fitted values in the second-stage wage regression. These methods would support stronger causal inference than our analysis by addressing the problem of endogeneity.
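One way to write this adaptation down, as a sketch only, is shown below. It assumes education is measured in years (educ_i) as in Card (1993) rather than by the qualification dummies used in our model, writes the instrument as z_i (college proximity or quarter of birth), and omits the other controls for brevity; the coefficient symbols are illustrative.

```latex
% First stage: one equation for each endogenous regressor
educ_i = \pi_0 + \pi_1 z_i + \pi_2 (female_i \cdot z_i) + \pi_3 female_i + v_{1i}
female_i \cdot educ_i = \rho_0 + \rho_1 z_i + \rho_2 (female_i \cdot z_i) + \rho_3 female_i + v_{2i}

% Second stage: fitted values replace the endogenous regressors
\ln(wage_i) = \beta_0 + \beta_1 female_i + \beta_2 \widehat{educ}_i
  + \beta_3 \widehat{(female_i \cdot educ_i)} + u_i
```

In this form \beta_3 measures how the return to education differs for women, so a test of \beta_3 = 0 plays the same role as the joint test on the interaction terms in section 1.4.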

References

Angrist, J.D. and Krueger, A.B., (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics, 106(4), pp.979–1014.
Blau, F.D. and Kahn, L.M., (2017). The gender wage gap: Extent, trends, and explanations. Journal of Economic Literature, 55(3), pp.789–865.
Borjas, G.J., (2006). Labor Economics. 3rd ed. Boston: McGraw-Hill.
Card, D., (1993). Using geographic variation in college proximity to estimate the return to schooling. NBER Working Paper No. 4483. Cambridge, MA: National Bureau of Economic Research.
Mincer, J., (1958). Investment in human capital and personal income distribution. Journal of Political Economy, 66(4), pp.281–302.
Mincer, J., (1974). Schooling, Experience and Earnings. New York: National Bureau of Economic Research.
OECD, (2025). Gender wage gap. [Online] OECD. Available at: https://data.oecd.org/earnwage/gender-wage-gap.htm [Accessed 25 May 2025].
Office for National Statistics (ONS), (2024). Quarterly Labour Force Survey (QLFS). UK Data Service.
Wooldridge, J.M., (2019). Introductory Econometrics: A Modern Approach. 7th ed. Boston: Cengage Learning.