ITECH1103 Big Data and Analytics Assignment Sample
IT - Report
You will use an analytical tool (i.e. WEKA) to explore, analyse and visualize a dataset of your choosing. An important part of this work is preparing a good quality report, which details your choices, content, and analysis, and that is of an appropriate style.
The dataset should be chosen from the following repository:
UC Irvine Machine Learning Repository https://archive.ics.uci.edu/ml/index.php
The aim is to use the data set allocated to provide interesting insights, trends and patterns amongst the data. Your intended audience is the CEO and middle management of the Company for whom you are employed, and who have tasked you with this analysis.
Task 1 – Data choice. Choose any dataset from the repository that has at least five attributes, and for which the default task is classification. Transform this dataset into the ARFF format required by WEKA.
Task 2 – Background information. Write a description of the dataset and project, and its importance for the organization. Provide an overview of what the dataset is about, including from where and how it has been gathered, and for what purpose. Discuss the main benefits of using data mining to explore datasets such as this. This discussion should be suitable for a general audience. Information must come from at least two appropriate sources and be appropriately referenced.
Task 3 – Data description. Describe how many instances the dataset contains, how many attributes there are in the dataset and their names, and state which is the class attribute. Include in your description details of any missing values, and any other relevant characteristics. For at least 5 attributes, describe the range of possible values of the attributes, and visualise these in a graphical format.
Task 4 – Data preprocessing. Preprocess the dataset attributes using WEKA's filters. Useful techniques include removing certain attributes, exploring different ways of discretizing continuous attributes, and replacing missing values. Discretizing is the conversion of numeric attributes into "nominal" ones by binning numeric values into intervals. Missing values in ARFF files are represented with the character "?". If you replaced missing values, explain what strategy you used to select the replacement values. Use and describe at least three different preprocessing techniques.
Task 5 – Data mining. Compare and contrast at least three different data mining algorithms on your data, for instance: k-nearest neighbour, Apriori association rules, decision tree induction. For each experiment you ran, describe the data you used for the experiments, that is, whether you used the entire dataset or just a subset of it. You must include screenshots and results from the techniques you employ.
Task 6 – Discussion of findings. Explain your results and include the usefulness of the approaches for the purpose of the analysis. Include any assumptions that you may have made about the analysis. In this discussion you should explain what each algorithm provides to the overall analysis task. Summarize your main findings.
Task 7 – Report writing. Present your work in the form of an analytics report.
To analyse the selected data set, the data must first be made suitable for the WEKA tool. WEKA works natively with the ARFF data format, and the ARFF viewer in the tool helps to view the data set and convert it into the appropriate format (Bharati, Rahman & Podder, 2018). For this task, a data set related to the phone call campaign of a Portuguese banking institution has been downloaded. The data set was initially in CSV format and was imported into the analytical platform in that form. The figure below shows the CSV file as it was initially imported. To meet the tool's requirements, this data file needs to be transformed into ARFF format.
Figure 1: Original csv data
After importing the CSV file into the ARFF viewer, the file has been saved in ARFF format to convert the data set.
Figure 2: Transformed ARFF data
The above figure shows that the data set has been transformed into ARFF format and that all the attributes of the data frame are present in the converted data set. The conversion has been completed successfully, so further analysis can be carried out on the selected data set.
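The conversion described above was done through WEKA's ARFF viewer. As an illustration of what the resulting file contains, the following is a minimal Python sketch that builds ARFF text from CSV text. The function name `csv_to_arff`, the relation name and the three sample rows are hypothetical and are not taken from the report's data. The sketch assumes values contain no spaces or commas, which would need quoting in a real ARFF file.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        # "?" marks a missing value in ARFF, so it is ignored when typing.
        values = [row[col] for row in data if row[col] != "?"]
        if values and all(is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the distinct values in sorted order.
            labels = ",".join(sorted(set(values)))
            lines.append(f"@attribute {name} {{{labels}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample rows, for illustration only.
sample_csv = "age,marital,desired_target\n30,married,no\n41,single,yes\n"
arff_text = csv_to_arff(sample_csv, "bank_campaign")
print(arff_text)
```

The header section declares each attribute as numeric or nominal, and the `@data` section repeats the original rows unchanged.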
In this project, a data set containing bank customer details has been chosen. The data were generated by collecting information from clients through phone calls, with the aim of predicting whether they are interested in investing in term policies. The data frame contains several attributes that describe the clients, together with their data types. The data set can be analysed in the WEKA data mining tool, which offers a range of analytical algorithms that can classify data with respect to a particular class variable. The data set has been downloaded from the UCI Machine Learning Repository, a website that hosts a large number of data sets covering a variety of topics and categories. The data set is mainly about client details, and all the necessary analysis can be carried out on the attributes present in the data frame.
To obtain useful insights and findings from the analysis, clients can be classified into two categories: those who are interested in investing in term policies and those who are not. In this project, a bank market analysis will be carried out to understand the investment patterns of the clients. The project focuses mainly on data analysis using the WEKA analytical tool. Based on the given data and statistics, several essential insights can be extracted with the analytical platform. These will help the Portuguese banking institution to make crucial decisions on client engagement and investment issues. Based on the analysis of the client data, attractive offers and terms can be given to potential investors. In addition, analysts can obtain complete statistics on previous campaigns and their outcomes. This statistical analysis will help the organization to increase its subscription rate. All the major findings are documented in this report.
To fulfil the project aim and objectives, the WEKA data mining tool will be used, as it provides the main features needed to pre-process and analyse the data set. Users can gain several benefits from a data mining tool. Depending on the data types and the project objective, data mining tools can be used for multiple purposes. Business managers can obtain information from a variety of reputable sources using data mining tools and methodologies. After studying a large data set, industry professionals can gain a number of important observations. In addition, with analytics solutions such as WEKA, a significant volume of data and information can be readily handled and controlled. Furthermore, policy makers can make a variety of critical judgments after evaluating the information with data mining methods, which could lead to positive outcomes such as business expansion.
Bank campaign data have been selected for this project and will be analysed to obtain vital information on the clients. It is important to understand the data set properly in order to analyse it well on the analytical platform. The data set contains all the major attributes related to the clients, including age, marital status, loan details, campaign details, campaign outcome and some other indexes. The attributes fall into three groups: client data, campaign data, and social and economic attributes. All the attributes can be pre-processed before the analysis, taking the class attribute and its categories into account. The last attribute of the data frame, desired_target, will be treated as the target class. The main focus of the entire analysis is whether clients are interested in investing in the bank's term policies. Some data pre-processing will also be done to make the data frame suitable for the analysis, using the filtering features provided by the analytical platform.
There are 19 attributes in the data frame, and all the necessary attributes will be considered for this analysis. Here, five major attributes have been evaluated based on their importance for this analysis:
• Job: This is a major attribute that gives an overview of the client's field of work. The client's income level can also be inferred from the job profile, which could play a vital role in the client's investment strategy. From a business perspective, the client's job profile can help to customize offers and policies for the client.
• Marital status: The client's investment tendency can be inferred from marital status. Consumers sometimes make different financial investments depending on their relationship status. The clients' expenses also vary with relationship status.
• Loan: The client's previous loans or financial transactions should also be considered by the business analysts of the banking institution. This attribute indicates whether or not the client has previously taken a loan.
• Poutcome: This feature should also be considered when predicting whether or not the client is interested in investing. The outcome of each previous campaign is recorded as success or failure, which plays a vital role in this analysis.
• Cons.conf.idx: The consumer confidence index is another essential attribute that must be analyzed to predict whether or not the client is interested in investing.
The above five attributes are the most essential aspects that must be analyzed to get insights on the campaign data and its possibilities. However, the campaign strategy could be changed based on the previous results and outcomes.
Data pre-processing is the first stage that analysts must perform to make the data suitable for analysis. Several issues commonly arise during analysis and must be mitigated with different pre-processing techniques. Data cleaning, transformation and some other operations are performed during data pre-processing. In this task, different pre-processing steps have been followed to make the data frame suitable for the analysis.
Removing attributes from the data frame
Figure 3: Unnecessary attributes removed
In the above figure, two attributes, euribor3m and nr.employed, have been removed from the data frame. These two attributes are not expected to provide any vital insights into the campaign data.
Figure 4: Discretizing attributes
In the above figure, four attributes have been selected and transformed from the numeric to the nominal data type. This makes the analysis against the nominal class attribute easier. In addition, some features of the selected analytical tool work less well with numeric data types, and the tool gives better visualization of nominal values.
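The report does not state which binning strategy or how many bins were used. As an illustration only, the sketch below shows equal-width binning, one common way to discretize a numeric attribute. The function name `equal_width_bins`, the bin labels and the sample ages are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a nominal label using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            # All values are identical, so everything falls into one bin.
            index = 0
        else:
            # The maximum value is clamped into the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{index + 1}")
    return labels


# Hypothetical ages, for illustration only.
ages = [18, 25, 33, 41, 58, 78]
age_bins = equal_width_bins(ages, 3)
print(age_bins)
```

With three bins over the range 18 to 78, each interval is 20 years wide, so the numeric ages become one of three nominal labels.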
Removing duplicated values
Duplicated values can distort the analytical results for the data frame. For this reason, it is important to remove them. In the figure below, all the attributes have been selected and a filter has then been applied to remove duplicated values from the data frame.
Figure 5: Removing duplicated values
After removing the duplicated values, the counts for each column and category have been reduced, and only distinct records remain in the data frame.
These three pre-processing steps have been applied in this project to make the data set appropriate for the analysis. After the data were prepared in this way, all the necessary analysis was carried out and the insights were derived.
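The effect of the duplicate-removal filter can be sketched in a few lines of Python. This is a minimal illustration of the idea, not WEKA's implementation. The function name `remove_duplicates` and the sample rows are hypothetical.

```python
def remove_duplicates(rows):
    """Return the rows with exact duplicates dropped, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # Tuples are hashable, so they can be stored in a set.
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows (job, marital status, class), for illustration only.
rows = [
    ["admin.", "married", "no"],
    ["technician", "single", "yes"],
    ["admin.", "married", "no"],
]
distinct_rows = remove_duplicates(rows)
print(len(rows), "->", len(distinct_rows))
```

A row counts as a duplicate only when every attribute value matches an earlier row, which is why the per-category counts fall after filtering.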
Several data mining techniques can be applied to the data set to obtain insights into, and visualizations of, current business operations and activities. Depending on the business requirement, a classification algorithm can be applied to the data frame. Once the algorithms have been run successfully, users can analyse the given problem context. In this task, three different algorithms have been selected and executed on the data frame.
Random Forest algorithm
Random Forest is a widely used classification algorithm that assigns the instances of a data set to different categories. It builds many decision trees, each on a random sample of the data, and combines the predictions of the individual trees to produce the final result. In this project, the random forest has been applied to the data frame to classify clients as potential subscribers or non-subscribers. Aggregating the predictions of trees built on different sub-samples improves the accuracy of the model.
In the above figure, a random forest algorithm has been executed on the campaign data set in order to classify the clients. All the attributes have been included in this run, and the model has been tested with 10-fold cross-validation.
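To make the testing option concrete, the following minimal Python sketch shows how k-fold cross-validation partitions instance indices into training and test sets. It is a plain, unstratified split written for illustration. It does not reproduce how WEKA assigns instances to folds, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n_instances-1 into k (train, test) pairs."""
    # Fold i receives every k-th index starting at i.
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(20, 10)
for train, test in splits[:2]:
    print("test:", test, "train size:", len(train))
```

Each instance is used for testing exactly once and for training in the other nine rounds, and the reported performance is aggregated over the ten rounds.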
Figure 7: Output of RandomForest model
After the classification algorithm was run, the complete statistics of the model's performance were produced, as shown in the above figure. All the relevant parameters are included in these statistics. The developed model achieved an accuracy of about 85% and was built in 5.66 seconds. A confusion matrix has also been produced for the model, showing how the instances were classified into the two categories.
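The relationship between a confusion matrix and the reported accuracy can be illustrated with a short Python sketch. The counts below are hypothetical and are not the values from the figure. The function name `summarize` is also hypothetical.

```python
def summarize(confusion):
    """Compute accuracy and per-class recall from confusion[actual][predicted]."""
    labels = list(confusion)
    total = sum(confusion[a][p] for a in labels for p in labels)
    correct = sum(confusion[label][label] for label in labels)
    accuracy = correct / total
    recall = {
        label: confusion[label][label] / sum(confusion[label].values())
        for label in labels
    }
    return accuracy, recall


# Hypothetical counts, for illustration only (not the report's results).
hypothetical = {
    "no": {"no": 70, "yes": 10},
    "yes": {"no": 8, "yes": 12},
}
accuracy, recall = summarize(hypothetical)
print(f"accuracy={accuracy:.2f}", recall)
```

Accuracy is the share of instances on the diagonal of the matrix, while the per-class recall shows how well each of the two categories is recognized on its own.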
Naive Bayes algorithm
The Naïve Bayes classification algorithm is a supervised algorithm that is simple and effective for classifying and predicting a particular feature. It is termed naïve because it assumes that, given the class, the occurrence of one feature is unrelated to the occurrence of the others. As a result, each feature contributes to the classification on its own, without relying on the others. To classify the instances in the data frame, the naïve Bayes classifier relies on predefined class labels (Hawari & Sinaga, 2019). The algorithm treats the variables in the data frame as independent of one another, and predictions are made on the basis of these probabilistic assumptions.
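The independence assumption described above can be sketched in Python for nominal attributes. This is a minimal illustration with add-one smoothing, an assumption made here to avoid zero probabilities. It is not WEKA's implementation, and the function names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] = count
    value_counts = defaultdict(Counter)
    attribute_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, row):
    """Return the class with the highest log-probability for the row."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # Prior probability of the class.
        for i, value in enumerate(row):
            # Each attribute contributes independently (add-one smoothing).
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(attribute_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical (marital, loan) rows and classes, for illustration only.
rows = [
    ("married", "yes"), ("married", "no"), ("single", "no"),
    ("single", "no"), ("married", "yes"), ("single", "yes"),
]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("single", "no")))
```

Because the per-attribute probabilities are simply multiplied (added in log space), training only requires counting, which is why the algorithm is fast and simple.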
Figure 8: Naïve Bayes classifier
In the above figure, a naïve Bayes algorithm has been executed on the given data frame. The instances have been classified into two categories, yes and no. 10-fold cross-validation has been selected as the testing option. Based on the values of the features, this model assigns each instance to one of the class values.
Figure 9: Output of Naive Bayes model
After the naïve Bayes algorithm was run, the above statistics were obtained, showing all the essential parameters of the model. This model is able to classify the instances with an accuracy of more than 83%. The confusion matrix is also shown, giving an overview of the classification capability of the model.
K-nearest neighbor algorithm
In this model, new data are classified by checking their similarity to previously seen data. The KNN algorithm can classify given data into categories based on the previous data records. Both regression and classification models can be developed with the KNN algorithm. The algorithm makes no assumptions about the underlying data. However, it is a lazy learner: it stores the training data and defers the computation until a test instance has to be classified.
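The similarity check described above can be sketched in Python as a majority vote among the k closest training instances, using Euclidean distance. The function name `knn_predict`, the two numeric attributes and the sample points are hypothetical. The value of k and the distance measure used in the report are not stated, so those used here are assumptions.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    # No model is built in advance: all work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical (age, number of contacts) points, for illustration only.
train_points = [(25, 1), (30, 2), (35, 1), (60, 5), (65, 6), (70, 5)]
train_labels = ["yes", "yes", "yes", "no", "no", "no"]
print(knn_predict(train_points, train_labels, (33, 2), k=3))
```

Since every prediction scans the stored training data, classification becomes slower as the data set grows, which is the cost of the lazy approach.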
Figure 10: K-nearest neighbor
In the above figure, a lazy classifier algorithm has been executed on the data frame. The nearest neighbors are identified based on some predefined characteristics. Here, 10-fold cross-validation has been used.
Figure 11: Output of K-nearest neighbor
The above figure shows several performance parameters as the output of the model, together with the model's confusion matrix.
Discussion of findings
The analysis of the given data set produced a number of vital insights, which are discussed in this section with supporting evidence.
Figure 12: Age variable vs. desired_target
The above figure shows that clients aged between 33 and 41 are the most interested in investing in term policies. The rate of investment decreases as the age of the clients increases.
Figure 13: Job variable vs. desired_target
Clients with administrative jobs account for the largest share of both subscribers and non-subscribers.
Figure 14: Marital status variable vs. desired_target
Here, the analysis is based on the marital status of the client. It shows that married clients are the most interested in making investments.
Figure 15: Loan variable vs. desired_target
Clients who have already taken a loan are interested in making investments. In addition, the percentage of non-subscribers is lower among clients who have not taken any loan.
Figure 16: Poutcome variable vs. desired_target
However, the outcome statistics for the phone campaign show that most of the previous campaigns did not produce any definite result.
Figure 17: Cons.conf.idx variable vs. desired_target
The confidence index of the consumers is another essential aspect that has also been analyzed in the above figure.
Figure 18: Cons.price.idx variable vs. desired_target
The consumer price index is illustrated in the above figure. Breaking this feature down by desired_target yields some vital insights.