Machine learning now offers a wide range of applications that were once thought unachievable. To judge which approach can solve the problem at hand, it is important to understand its two main branches: supervised learning and unsupervised learning. With that aim, we invite you to take a deep dive with us into the subject and the differences between the two.
In this webinar, you will learn how machine learning, a subset of artificial intelligence, is changing entire industries, including the way we work. We will also present examples and discuss strategies to help you get started.
The following are the answers to the questions that were asked during the live webinar.
Answer 1: If you already use Azure, AWS, Google Cloud, or another cloud provider, or have configured Hadoop/Apache on site, machine learning services can be implemented without requiring significant change to infrastructure. Otherwise, depending on factors such as volume and velocity of data, re-training intervals, desired level of performance, and so on, small to significant infrastructure changes may be required. Additionally, some kinds of machine learning tasks such as image classification or models such as deep neural networks may require specialized hardware.
Answer 2: We assume that you store all your transactions electronically, along with buyer information. From that data we can learn the buying patterns, purchase frequencies, volumes, and other behaviors of your customers. When a customer's behavior deviates from the usual pattern, the change can easily go unnoticed, because you have thousands of customers and new ones keep arriving. By employing machine learning algorithms, you can predict your customers' next actions, such as whether they are planning another purchase or are about to churn.
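As a rough illustration of spotting a deviation in buying behavior, the sketch below flags customers whose current silence is much longer than their usual gap between purchases. It is a simple statistical baseline rather than a trained model, and the customer IDs, dates, function names, and the threshold `k` are all assumptions made up for the example.

```python
from datetime import date
import statistics


def purchase_gaps(dates):
    """Days between consecutive purchases, oldest to newest."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def churn_risk(purchases, today, k=2.0):
    """Return customer IDs whose time since last purchase exceeds
    their typical gap by more than k standard deviations.

    purchases: dict mapping a customer ID to a list of purchase dates.
    k: hypothetical sensitivity setting; tune it on your own data.
    """
    flagged = []
    for customer, dates in purchases.items():
        gaps = purchase_gaps(dates)
        if len(gaps) < 2:
            continue  # too little history to estimate a pattern
        typical = statistics.mean(gaps)
        spread = statistics.stdev(gaps)
        silence = (today - max(dates)).days
        if silence > typical + k * spread:
            flagged.append(customer)
    return flagged


if __name__ == "__main__":
    # Made-up transaction history for two hypothetical customers.
    history = {
        "cust_a": [date(2024, 1, 1), date(2024, 1, 15), date(2024, 1, 29), date(2024, 2, 12)],
        "cust_b": [date(2024, 1, 5), date(2024, 2, 4), date(2024, 3, 5), date(2024, 4, 4)],
    }
    print(churn_risk(history, today=date(2024, 4, 10)))
```

In practice a churn model would use many more signals (order volume, product mix, support contacts), but the same idea applies: learn what is normal per customer, then look for departures from it.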
Answer 3: Everything you can! Usually, additional data should only be collected if better model performance is desired or if we know that the current training data is not representative enough of the real world. However, there is no guarantee that providing a model with more data or adding new attributes (features) to it will lead to improvements in performance.
Answer 4: First of all, a data scientist normally deals with data objectively and works with samples of the data. Secondly, the data resides in the cloud. Financial and health care institutions often provide their highly secured data in masked or anonymized form, with any identifiers removed.
Answer 5: The answer to this question could take a lot of discussion. Under some common assumptions, though, if you opt for a cloud-based infrastructure, the cost will most of the time be covered by your monthly billing. Most cloud providers, such as Amazon and Microsoft Azure, offer machine learning components and services that you can consume and pay for according to your usage. So, ideally, you should not invest upfront in hardware infrastructure; just add the services if you have already opted for the cloud.
Answer 6: Again, it depends. Diversity in terms of features contributes to model performance if the features are of high quality. Redundant or noisy features have the opposite effect. Therefore, it is often better to work with a limited set of good features than to include a large number of features without regard for their quality. Diversity in terms of representativeness is always good: make sure that the training data accurately represents the larger population.
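To make the point about redundant features concrete, here is a minimal sketch of a filter that drops constant columns and columns that nearly duplicate one already kept. The column names, the sample values, and the 0.95 correlation cutoff are assumptions chosen for illustration; real projects usually combine several selection techniques.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant column
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, max_corr=0.95):
    """Keep features that vary and are not near-copies of a kept feature.

    columns: dict mapping a feature name to its list of values.
    max_corr: hypothetical cutoff for treating two features as redundant.
    """
    kept = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # a constant column carries no information
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue  # nearly duplicates a feature we already kept
        kept.append(name)
    return kept


if __name__ == "__main__":
    # Made-up feature table for illustration.
    table = {
        "order_total_usd": [10.0, 25.0, 40.0, 55.0, 70.0],
        "order_total_cents": [1000.0, 2500.0, 4000.0, 5500.0, 7000.0],  # same information, other unit
        "store_id": [1, 1, 1, 1, 1],  # constant
        "days_since_last_order": [3, 30, 7, 1, 14],
    }
    print(select_features(table))  # ['order_total_usd', 'days_since_last_order']
```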
Answer 7: This is subjective as well. Size is determined both by the number of training examples and by the number of features. Many algorithms require the former to be larger than the latter in order to produce good models. There are methods in machine learning for determining how much training data is needed before performance levels off. Beyond that, the training data should be large enough to represent the larger population whose phenomena we are attempting to model. This representativeness criterion varies widely depending on factors such as the problem domain, the desired level of model performance, and so on.
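One common way to see where performance levels off is a learning curve: train on progressively larger samples and track accuracy on held-out data. The sketch below does this with synthetic one-feature data and a nearest-centroid classifier, both assumptions picked to keep the example dependency-free. Libraries such as scikit-learn offer a ready-made `learning_curve` utility for real models.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic two-class data with one noisy feature (illustrative only)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)  # class means are 0.0 and 2.0
        data.append((x, label))
    return data


def fit_centroids(train):
    """Nearest-centroid 'model': the mean feature value of each class seen."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        if xs:
            centroids[label] = statistics.mean(xs)
    return centroids


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, test):
    hits = sum(1 for x, y in test if predict(centroids, x) == y)
    return hits / len(test)


def learning_curve(sizes, repeats=30, seed=0):
    """Mean held-out accuracy for each training-set size in `sizes`."""
    rng = random.Random(seed)
    test = make_data(2000, rng)  # fixed held-out set
    curve = []
    for n in sizes:
        scores = []
        for _ in range(repeats):
            train = make_data(n, rng)
            scores.append(accuracy(fit_centroids(train), test))
        curve.append((n, statistics.mean(scores)))
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([2, 5, 10, 50, 200, 1000]):
        print(f"{n:5d} training examples -> mean test accuracy {acc:.3f}")
```

When the accuracy stops improving noticeably from one size to the next, collecting more examples of the same kind is unlikely to help much, and better features or a different model may be the more useful investment.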