Domino effect

Today we are examining the Economic Indicator dataset, which reveals a fascinating web of interconnectedness between different components of the economy. For example, I’m looking at how the volume of passengers at Logan Airport can provide information about hotel occupancy rates and the overall health of business and tourism travel. Another interesting area of study is how the housing and employment markets interact: while a slow job market can cause a decline in real estate activity, a robust job market frequently drives strong demand for housing. Major development projects also have a notable impact on local economies, stimulating the housing market and creating jobs. This article aims to unravel these economic strands and demonstrate how changes in one industry can spread to other areas, presenting a complete picture of our local economic environment.

 

Trends in the housing market

This blog post examines the housing market, with a particular emphasis on the evolution of median home prices. This journey reflects the broader economy and involves more than just pricing.

The median home price graph that we examined is comparable to a road map, illustrating the highs and lows of the market. Rising prices frequently indicate a robust economy with confident buyers and a strong demand for homes. Conversely, price declines or plateaus may indicate a cooling of the market, perhaps as a result of shifting consumer attitudes or economic difficulties.

However, these trends are not isolated. They are entwined with other economic strands such as interest rates, employment rates, and the general state of the economy. For example, a strong job market may increase people’s ability to purchase homes, which would raise prices. Similarly, fluctuations in interest rates can motivate or deter buyers, which in turn influences prices.

It’s interesting to note that we also observed possible seasonal fluctuations in the housing market. Prices may be slightly impacted by periods of increased activity during the year.

Understanding these subtleties in housing prices is essential. They provide information on both the real estate market and the overall state of the economy. Buyers, sellers, investors, and policymakers can all benefit greatly from this analysis, which helps them make well-informed decisions in a constantly changing market.
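As a rough illustration of how that seasonal fluctuation could be checked (the file name and the Month and med_housing_price column names are assumptions based on the Economic Indicator dataset described later in this blog):

```python
# A small sketch of checking for within-year seasonality in median home prices.
# File and column names ("Month", "med_housing_price") are assumptions.
import pandas as pd

df = pd.read_csv("economic_indicators.csv")  # hypothetical file name

# Average median home price by calendar month hints at seasonal fluctuation.
monthly_profile = df.groupby("Month")["med_housing_price"].mean()
print(monthly_profile)
```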

EDA

Using a trend analysis of important economic indicators, we’re going to examine the Boston economy in more detail today. It’s similar to being an economic detective, putting together clues to figure out the overall picture.

The unemployment rate, hotel occupancy rates, and median home prices were our three primary points of interest, and each provides a different insight. The unemployment rate tells us how many people are out of work, acting much like a thermometer for the labor market. When this number declines, more people typically have jobs, which is good news!

We then examined the hotel occupancy rate, or how full hotels are. This rate gives us a glimpse of tourism and business travel: high occupancy frequently indicates more visitors and active business, while low occupancy might imply the opposite.

Finally, we investigated the median home price. This indicator functions as a window into the housing market. Rising prices can indicate a robust economy and high demand for homes; conversely, declining or stagnant prices may indicate a cooling of the market.

We can gauge the state of the economy by examining these patterns.
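A minimal sketch of this trend analysis follows; the column and file names are assumptions based on the dataset described below:

```python
# Plot the three indicators discussed above over time.
# File and column names are assumptions about the Economic Indicator dataset.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("economic_indicators.csv")  # hypothetical file name
# Build a date axis from the Year and Month columns.
df["date"] = pd.to_datetime(df["Year"].astype(str) + "-" + df["Month"].astype(str))

fig, axes = plt.subplots(3, 1, figsize=(8, 9), sharex=True)
for ax, col in zip(axes, ["unemployment_rate", "hotel_occup_rate", "med_housing_price"]):
    ax.plot(df["date"], df[col])  # one panel per indicator
    ax.set_title(col)
plt.tight_layout()
plt.show()
```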

New dataset: Economic Indicator

The dataset contains various monthly economic statistics, organized by year and month. An overview of what each column represents is given below, followed by a short sketch of how the file might be loaded:

  • Year and Month: The time period of each record, stored in separate year and month columns.
  • logan_passengers: The number of passengers passing through Logan Airport.
  • logan_intl_flights: Logan Airport’s international flight count.
  • hotel_occup_rate: The hotel occupancy rate.
  • hotel_avg_daily_rate: The average daily cost of lodging.
  • overall_jobs: The total number of jobs.
  • unemployment_rate: The unemployment rate.
  • employee_part_rate: The labor force participation rate.
  • pipeline_unit: Details about real estate or development initiatives, such as the number of units.
  • pipeline_total_dev_cost: The total development cost for pipeline projects.
  • pipeline_sqft: The total square footage of pipeline development projects.
  • pipeline_const_jobs: The number of pipeline construction jobs created.
  • number_of_foreclosure_petitions: The number of foreclosure petitions.
  • number_of_foreclosure_deeds: The number of foreclosure deeds.
  • med_housing_price: The median price of a home.
  • housing_sales_vol: The volume of housing sales.
  • new_housing_const_permits: The number of new housing construction permits issued.
  • new-affordable_housing_permits: The number of new affordable housing permits issued.
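As a minimal loading sketch (the file name is an assumption, since only the columns are described here):

```python
# Load and inspect the Economic Indicator dataset; the file name is hypothetical.
import pandas as pd

df = pd.read_csv("economic_indicators.csv")  # hypothetical file name

print(df.shape)       # number of monthly records and columns
print(df.dtypes)      # the data type of each column listed above
print(df.describe())  # summary statistics for the numeric indicators
```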

SARIMA

In the field of time series analysis, the SARIMA model serves as a foundation. SARIMA (Seasonal Autoregressive Integrated Moving Average), an extension of the ARIMA model, adds another level of complexity to forecasting and is especially helpful when handling seasonal data.

A statistical model called SARIMA forecasts subsequent points in a time series. It excels at processing data with seasonal patterns, such as monthly sales data that peaks around holidays or daily temperature variations from season to season. By incorporating seasonality, the model expands on ARIMA and gains greater adaptability.

Components:

The SARIMA model has four components: seasonal (S), autoregressive (AR), integrated (I), and moving average (MA); a brief fitting sketch follows the list.

  • Seasonal (S): Models the seasonality in the data, capturing patterns that recur over a fixed period.
  • Autoregressive (AR): Describes the relationship between an observation and a number of lagged observations.
  • Integrated (I): Applies differencing to the series to make it stationary, which many time series models require.
  • Moving Average (MA): Models the relationship between an observation and the residual error from a moving average model applied to lagged observations.
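Here is a minimal fitting sketch using the SARIMAX implementation in statsmodels; the file and column names are assumptions based on the dataset described elsewhere in this blog, and the orders are illustrative rather than tuned:

```python
# A minimal SARIMA sketch with statsmodels; data source and column are assumptions.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("economic_indicators.csv")  # hypothetical file name
series = df["med_housing_price"]             # assumed monthly series

# (p, d, q) = non-seasonal AR/differencing/MA orders;
# (P, D, Q, s) = seasonal orders with a 12-month period.
model = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)

print(result.summary())
print(result.forecast(steps=12))  # forecast the next 12 months
```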

“Demystifying Time Series Analysis: A Guide to Forecasting and Pattern Recognition”

A crucial component of data science is time series analysis, which examines data points collected sequentially over time. This technique is essential for forecasting future trends based on historical data in various sectors, including meteorology and economics. This blog aims to make time series analysis more approachable for novices while maintaining its technical foundation.

Time series analysis focuses on data points gathered at successive points in time. It forecasts future trends, finds patterns, and extracts useful statistics. Numerous fields, including weather forecasting, market trends prediction, and strategic business planning, depend on this kind of study.

Relevant Ideas:

Crucial concepts include trend (long-term movement), seasonality (patterns or cycles that recur), noise (random variability), and stationarity (the assumption that statistical properties stay constant over time).
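A short illustration of these concepts, assuming the Economic Indicator dataset described elsewhere in this blog (the file and column names are assumptions):

```python
# Decompose a monthly series into trend/seasonal/noise and test for stationarity.
# File and column names ("Year", "Month", "logan_passengers") are assumptions.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

df = pd.read_csv("economic_indicators.csv")  # hypothetical file name
series = df["logan_passengers"].copy()
series.index = pd.to_datetime(df["Year"].astype(str) + "-" + df["Month"].astype(str))

# Trend, seasonality, noise: split the series into those three parts.
decomposition = seasonal_decompose(series, model="additive", period=12)
print(decomposition.seasonal.head(12))  # the repeating monthly pattern

# Stationarity: Augmented Dickey-Fuller test; a small p-value suggests stationarity.
adf_stat, p_value = adfuller(series.dropna())[:2]
print(f"ADF statistic: {adf_stat:.2f}, p-value: {p_value:.3f}")
```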

 

Making Data Decisions for Project 3: The Search for Perceptive Analysis

As we move forward with Project 3, we have an abundance of choices because the Analyze Boston website has 246 datasets available. Right now, our team is working to determine which of these options best fits the goals of our project. This selection procedure is essential since it establishes the framework for our analysis that follows. Once a dataset has been chosen, our attention will turn to carefully going over each of its details in order to find a compelling and obvious question that arises from the data. Our analysis will be built around this question, which will help us discover fresh perspectives. Our project is in an exciting phase right now, full of possibilities for exploration as well as obstacles to overcome.

“Understanding the Pros and Cons of Decision Trees in Data Analysis”

In today’s class, I learned about decision trees. In essence, decision trees are graphical depictions of decision-making procedures. Consider them as a sequence of questions whose answers lead you toward a final choice. You start with the first question at the root of the tree, and as you answer each one, you move down the branches until you reach the final decision.

Constructing a decision tree involves choosing the most informative question to ask at each node. These questions are selected using statistical measures such as entropy, Gini impurity, and information gain, based on different characteristics of the data. The objective is to choose the most relevant attribute at each node in order to optimize the decision-making process.
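As a hedged illustration (not the course’s exact example), here is a minimal scikit-learn decision tree trained on a toy dataset, using entropy as the split criterion:

```python
# A minimal decision tree sketch; the iris data is illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" picks splits by information gain; "gini" would use Gini impurity.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(f"Test accuracy: {tree.score(X_test, y_test):.2f}")
print(export_text(tree))  # the learned sequence of questions at each node
```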

Decision trees do have certain drawbacks, though, particularly in situations where the data shows a significant spread or departure from the mean. In our most recent Project 2, we came across a dataset where the mean was significantly off from the majority of data points, which reduced the effectiveness of the decision tree method. This emphasizes how crucial it is to take the distribution and features of the data into account when selecting the best statistical method for analysis. Although decision trees are a useful tool, their effectiveness depends on the type of data they are used on. In certain cases, other statistical techniques may be more appropriate for handling these kinds of scenarios.

Exploring K-means and DBSCAN

K-means is a clustering technique that seeks to divide a set of data points into a predetermined number of groups, or “clusters.” The procedure starts by choosing “k” starting points, or “centroids,” at random. Each data point is then assigned to the closest centroid, and new centroids are recalculated as the average of all points in each cluster. Assigning points to the nearest centroid and recalculating the centroids is repeated until the centroids no longer change noticeably. The result is “k” clusters, each a group of data points closer to one another than to points in other clusters. The number “k,” which denotes the desired number of clusters, must be specified by the user beforehand.
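A minimal K-means sketch with scikit-learn on synthetic data (the blob data and parameter values are illustrative only):

```python
# K-means on synthetic blobs; n_clusters is the "k" chosen in advance.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # the final centroids
print(labels[:10])              # cluster assignment of the first few points
```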

DBSCAN is a clustering method that groups data points according to their density and proximity. Unlike k-means, which requires the user to choose the number of clusters ahead of time, DBSCAN analyzes the data to identify high-density zones and distinguishes them from sparse areas. Each data point is given a neighborhood, and if a sufficient number of points are close to one another (signaling high density), they are regarded as belonging to the same cluster. Points in low-density zones are treated as noise and are not part of any cluster. Because of this, DBSCAN is particularly helpful for handling noisy data and finding clusters of different sizes and shapes.
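And a corresponding DBSCAN sketch, again on synthetic data; eps and min_samples are the density parameters discussed above, and the values here are illustrative:

```python
# DBSCAN on synthetic moon-shaped data; label -1 marks points treated as noise.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps: neighborhood radius; min_samples: points needed to form a dense region.
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Clusters found: {n_clusters}, noise points: {np.sum(labels == -1)}")
```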

Possible pitfalls:

DBSCAN –

  • The density parameters (neighborhood radius and minimum points) must be chosen by the user.
  • A poor choice may overlook clusters or merge distinct ones.
  • Struggles when the density of the clusters varies.
  • May label sparse clusters as noise.
  • Performance can deteriorate on high-dimensional data, where distance measures become less meaningful.
  • Points on the border between two clusters may be assigned somewhat arbitrarily.

K-means:

  • The number of clusters must be specified in advance.
  • The wrong choice can result in subpar clustering.
  • The final clusters may change depending on the random initialization.
  • May converge to local optima depending on the starting positions.
  • Assumes clusters are roughly spherical and of similar size.
  • Struggles with elongated or irregularly shaped clusters.
  • Prone to distortion by outliers, which can pull cluster centroids away from the data.

 

“Exploring the Interplay of Age, Race, and Threat Levels in Relation to Mental Illness in Fatal Police Shootings: A Statistical Analysis”

In our investigation of fatal police shootings, we have explored how factors such as age, ethnicity, and perceived threat level relate to signs of mental illness.

Age and Mental Health: The investigation showed a significant relationship between a person’s age and the presence of mental illness indicators. A t-test comparing individuals who showed signs of mental illness with those who did not found a substantial age difference; the t-statistic was 8.51 and the p-value was nearly zero. This result emphasizes how strongly age and mental health indicators are associated in these cases.

Ethnicity and Mental Health: We first encountered data issues while examining ethnicity, but they were resolved and a chi-square test was carried out. The results showed a significant relationship between ethnicity and mental health indicators, with a chi-square value of 171.23 and a tiny p-value of 3.98×10^-35.

Danger Perception and Mental Health: A chi-square statistic of 24.48 and a p-value of 4.82×10^-6 indicate a significant relationship between the perceived danger level and mental health indicators in our study.
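As a hedged sketch of how such tests could be run with SciPy (the file name and column names such as signs_of_mental_illness, age, and race are assumptions, not confirmed details of our dataset):

```python
# Sketch of the t-test and chi-square test described above; names are assumptions.
import pandas as pd
from scipy import stats

df = pd.read_csv("fatal_police_shootings.csv")  # hypothetical file name
signs = df["signs_of_mental_illness"].astype(bool)

# t-test: compare ages of those with and without signs of mental illness.
with_signs = df.loc[signs, "age"].dropna()
without_signs = df.loc[~signs, "age"].dropna()
t_stat, t_p = stats.ttest_ind(with_signs, without_signs, equal_var=False)

# Chi-square: test the association between race and mental illness indicators.
contingency = pd.crosstab(df["race"], signs)
chi2, chi_p, dof, _ = stats.chi2_contingency(contingency)

print(f"t = {t_stat:.2f}, p = {t_p:.3g}")
print(f"chi2 = {chi2:.2f}, p = {chi_p:.3g}, dof = {dof}")
```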

In incidents of lethal police contact, the research has illuminated the strong relationships between age, ethnicity, perceived danger level, and mental health markers. These results pave the way for more thorough studies and improve our understanding of these pivotal moments. The distribution of threat levels and their interactions with other variables will be the main focus of our upcoming study phase.