Big Data Analytics: Insights from DepEd’s Budgeting | Philippine News

The Department of Education (DepEd) maintains some of the most up-to-date datasets of any government agency in the Philippines. These data can be downloaded from:

[Image: DepEd’s Budgeting 1]

Let us look at two of these datasets: Public Schools Maintenance and Other Operating Expenses (MOOE) for FY 2016, and Public Schools Enrollment for SY 2015-2016. What insights can we glean by applying descriptive analytics to them?
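
As a first descriptive step, the budget rows can be summed per region and school level, which is the kind of aggregation a regional bar chart is built on. The sketch below assumes each row of the MOOE file has already been read (for example with Python’s `csv.DictReader`) into a dict with the hypothetical keys `region`, `level` and `amount`; the real column names in DepEd’s files may differ, and the sample rows are made up.

```python
from collections import defaultdict


def budget_by_region(rows):
    """Sum budget amounts per (region, level) pair.

    Each row is a dict with the hypothetical keys 'region', 'level' and
    'amount'; amounts may be strings with thousands separators.
    """
    totals = defaultdict(float)
    for row in rows:
        amount = float(str(row["amount"]).replace(",", ""))
        totals[(row["region"], row["level"])] += amount
    return dict(totals)


def top_region(totals):
    """Return the region with the largest budget across all levels."""
    per_region = defaultdict(float)
    for (region, _level), amount in totals.items():
        per_region[region] += amount
    return max(per_region, key=per_region.get)


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not DepEd figures.
    sample = [
        {"region": "Region X", "level": "Elementary", "amount": "300,000"},
        {"region": "Region X", "level": "High School", "amount": "500,000"},
        {"region": "Region Y", "level": "Elementary", "amount": "250,000"},
        {"region": "Region Y", "level": "High School", "amount": "400,000"},
    ]
    totals = budget_by_region(sample)
    print(top_region(totals))  # Region X
```

Keeping the level in the key lets the same totals drive a stacked or side-by-side bar per region.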

[Figure: DepEd MOOE budget by region, FY 2016; orange bars: high school, blue bars: elementary]

The graph above shows that Region IV-A Calabarzon, not NCR, receives the largest share of DepEd’s national MOOE budget. The orange bars represent the budget for high schools, and the blue bars the budget for elementary schools. This raises the question: how does DepEd actually allocate its budget?

We can use Big Data analytics to help answer this. If we correlate each school’s budget with its actual enrollment, can we get an idea of how DepEd allocates the budget on a per-school basis?
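
Correlating the two datasets first requires pairing each school’s budget with its enrollment. A minimal sketch, assuming both files share a school identifier column (called `school_id` here, a hypothetical name) and that enrollment is read from the ‘Grand Total’ column the article uses; schools missing from the enrollment file are simply skipped.

```python
def join_budget_enrollment(budget_rows, enrollment_rows):
    """Return a list of (students, amount) pairs, one per school.

    'school_id' and 'amount' are hypothetical column names; 'Grand Total'
    is the enrollment column. Numbers may contain thousands separators.
    """
    def to_number(value):
        return float(str(value).replace(",", ""))

    enrollment = {
        row["school_id"]: int(to_number(row["Grand Total"]))
        for row in enrollment_rows
    }
    pairs = []
    for row in budget_rows:
        school = row["school_id"]
        if school in enrollment:  # skip schools with no enrollment record
            pairs.append((enrollment[school], to_number(row["amount"])))
    return pairs


if __name__ == "__main__":
    budget = [
        {"school_id": "S1", "amount": "150,000"},
        {"school_id": "S2", "amount": "90,000"},
    ]
    enrollment = [{"school_id": "S1", "Grand Total": "150"}]
    print(join_budget_enrollment(budget, enrollment))  # [(150, 150000.0)]
```

Each resulting pair is one circle on a budget-versus-enrollment scatter plot.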

First, for elementary schools:

The graph below shows each elementary school as a small circle. The X axis is the number of students enrolled in the school (taken from the ‘Grand Total’ column of the enrollment dataset), and the Y axis is the budget allocated to that school.

[Figure: elementary schools, budget vs. number of enrolled students, one circle per school, with trend line]

We can then compute a trend line:

P-value: < 0.0001
Equation: Amount = 487.356 × Students + 77934.2
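
The article does not say exactly how the trend line was fitted. Assuming an ordinary least-squares linear fit (the usual default for a linear trend line), the slope and intercept come from the standard closed-form formulas. The sketch below runs them on made-up points; applied to the real (students, amount) pairs, this is the kind of calculation that yields an equation of the form above.

```python
def fit_trend_line(pairs):
    """Ordinary least-squares fit of amount = slope * students + intercept.

    pairs: list of (students, amount) tuples, one per school.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two schools to fit a line")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all schools have the same enrollment")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up points lying exactly on amount = 2 * students + 1.
    print(fit_trend_line([(0, 1), (1, 3), (2, 5)]))  # (2.0, 1.0)
```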

Next, for high schools, we plot a similar graph, with the number of enrolled students on the X axis and each school’s budget on the Y axis. The line across the graph is the trend line.

[Figure: high schools, budget vs. number of enrolled students, one circle per school, with trend line]

P-value: < 0.0001
Equation: Amount = 1067.4 × Students + 133840
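
To read the two equations concretely, we can plug in an enrollment. The 500-student school below is a hypothetical example, not a school from the dataset; the coefficients are copied from the two trend-line equations above, and the amounts are presumably in pesos.

```python
# Coefficients copied from the two trend-line equations above.
ES_SLOPE, ES_INTERCEPT = 487.356, 77934.2    # elementary schools
HS_SLOPE, HS_INTERCEPT = 1067.4, 133840.0    # high schools


def predicted_amount(students, slope, intercept):
    """Budget predicted by a linear trend line for a given enrollment."""
    return slope * students + intercept


if __name__ == "__main__":
    students = 500  # hypothetical enrollment
    es = predicted_amount(students, ES_SLOPE, ES_INTERCEPT)
    hs = predicted_amount(students, HS_SLOPE, HS_INTERCEPT)
    print(round(es, 1), round(hs, 1))      # 321612.2 667540.0
    # Ratio of the per-student slopes (high school vs elementary).
    print(round(HS_SLOPE / ES_SLOPE, 2))   # 2.19
```

Read this way, each additional enrolled student is associated with roughly 2.19 times as much extra budget in a high school as in an elementary school. This describes the fitted lines only; it does not reveal DepEd’s actual allocation formula.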


Just by looking at the trend lines on the two graphs, we can already see a strong positive relationship between the budget a school receives and its number of enrolled students. The p-value is one statistical test of whether this relationship is meaningful.

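This p-value “compares the fit of the entire model to the fit of a model composed solely of the grand mean (the average of data in the data view)”. In other words, it asks whether the trend line explains school budgets better than simply assigning every school the average budget. A p-value below 5% (0.05) is conventionally considered statistically significant.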
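
The comparison in that quote can be sketched directly: measure how far the budgets fall from a flat line at the grand mean, measure how far they fall from the trend line, and test whether the improvement is larger than chance would explain. This is a minimal sketch of the standard regression F-test, not the code of whatever tool produced the graphs. To stay within Python’s standard library it approximates the p-value with a normal tail, which is reasonable only for large samples such as thousands of schools; an exact F-distribution tail (for example SciPy’s `scipy.stats.f.sf`) would be better for small samples. The R² it also returns is the usual measure of how well the line fits, as distinct from the p-value.

```python
import math


def compare_to_grand_mean(pairs, slope, intercept):
    """Compare a least-squares trend line with the grand-mean model.

    Returns (r_squared, f_stat, approx_p). The p-value uses a large-sample
    normal approximation: with one predictor, F equals t squared.
    """
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three schools")
    mean_y = sum(y for _, y in pairs) / n
    # Squared error of the grand-mean model (every school gets the average).
    sse_mean = sum((y - mean_y) ** 2 for _, y in pairs)
    # Squared error of the trend line.
    sse_line = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    if sse_mean == 0:
        raise ValueError("all budgets are identical")
    r_squared = 1 - sse_line / sse_mean
    if sse_line == 0:
        return r_squared, math.inf, 0.0  # the line fits every point exactly
    # F statistic with 1 and n - 2 degrees of freedom.
    f_stat = (sse_mean - sse_line) / (sse_line / (n - 2))
    # Two-sided normal tail for t = sqrt(F); approximate, large n only.
    approx_p = math.erfc(math.sqrt(max(f_stat, 0.0)) / math.sqrt(2))
    return r_squared, f_stat, approx_p


if __name__ == "__main__":
    # Made-up points with their least-squares line (slope 2.3, intercept 0.8).
    r2, f, _p = compare_to_grand_mean([(0, 1), (1, 3), (2, 5), (3, 8)], 2.3, 0.8)
    print(round(r2, 3), round(f, 1))  # 0.989 176.3
```

The four made-up points only show the mechanics; at that size the normal approximation to the p-value is not trustworthy.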

In both cases (elementary and junior high schools), the p-values are below 0.0001, far under the 5% threshold. This strongly suggests that DepEd uses enrollment as one of the major bases for setting each school’s budget. Observant readers will notice outliers in both graphs: schools whose budgets lie ‘far’ from what the trend line predicts for their enrollment.
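
Outlier detection is left for the next article, but the idea of being ‘far’ from the trend line can be made concrete with residuals: the gap between a school’s actual budget and the budget the line predicts for its enrollment. The sketch below flags schools whose residual exceeds `k` standard deviations of all residuals; the cutoff `k = 3` is an arbitrary illustrative choice, not the author’s method.

```python
import statistics


def flag_outliers(pairs, slope, intercept, k=3.0):
    """Return (students, amount, residual) for schools far from the trend line.

    A school is flagged when |actual - predicted| exceeds k standard
    deviations of all residuals. k = 3.0 is an illustrative cutoff.
    """
    residuals = [y - (slope * x + intercept) for x, y in pairs]
    if len(residuals) < 2:
        return []
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # every school sits exactly on the line
    return [
        (x, y, r)
        for (x, y), r in zip(pairs, residuals)
        if abs(r) > k * spread
    ]


if __name__ == "__main__":
    # Made-up data: ten schools on amount = 2 * students + 1, plus one anomaly.
    schools = [(x, 2 * x + 1) for x in range(10)] + [(5, 100)]
    print(flag_outliers(schools, 2.0, 1.0))  # [(5, 100, 89.0)]
```

Because an extreme outlier also inflates the standard deviation it is measured against, this simple rule can miss anomalies in small samples.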

We will cover how to quickly identify outliers, anomalies, and typographical errors in the data in our next article.
