<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Smile - ML Overview</title>
<meta name="description" content="Statistical Machine Intelligence and Learning Engine">
<!-- prettify js and CSS -->
<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js?lang=scala&lang=kotlin&lang=clj"></script>
<style>
.prettyprint ol.linenums > li { list-style-type: decimal; }
</style>
<!-- Bootstrap core CSS -->
<link href="css/cerulean.min.css" rel="stylesheet">
<link href="css/custom.css" rel="stylesheet">
<script src="https://code.jquery.com/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
<!-- slider -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.carousel.min.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.carousel.css" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.transitions.css" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.theme.min.css" type="text/css" />
<!-- table of contents auto generator -->
<script src="js/toc.js" type="text/javascript"></script>
<!-- styles for pager and table of contents -->
<link rel="stylesheet" href="css/pager.css" type="text/css" />
<link rel="stylesheet" href="css/toc.css" type="text/css" />
<!-- Vega-Lite Embed -->
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-57GD08QCML"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-57GD08QCML');
</script>
<!-- Sidebar and testimonial-slider -->
<script type="text/javascript">
$(document).ready(function(){
// scroll/follow sidebar
// #sidebar is defined in the content snippet
// This script has to be executed after the snippet loaded.
// $.getScript("js/follow-sidebar.js");
$("#testimonial-slider").owlCarousel({
items: 1,
singleItem: true,
pagination: true,
navigation: false,
loop: true,
autoPlay: 10000,
stopOnHover: true,
transitionStyle: "backSlide",
touchDrag: true
});
});
</script>
</head>
<body>
<div class="container" style="max-width: 1200px;">
<header>
<div class="masthead">
<p class="lead">
<a href="index.html">
<img src="images/smile.jpg" style="height:100px; width:auto; vertical-align: bottom; margin-top: 20px; margin-right: 20px;">
<span class="tagline">Smile — Statistical Machine Intelligence and Learning Engine</span>
</a>
</p>
</div>
<nav class="navbar navbar-default" role="navigation">
<!-- Brand and toggle get grouped for better mobile display -->
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse" id="navbar-collapse">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Overview <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="quickstart.html">Quick Start</a></li>
<li><a href="overview.html">What's Machine Learning</a></li>
<li><a href="data.html">Data Processing</a></li>
<li><a href="visualization.html">Data Visualization</a></li>
<li><a href="vegalite.html">Declarative Visualization</a></li>
<li><a href="gallery.html">Gallery</a></li>
<li><a href="faq.html">FAQ</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Supervised Learning <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="classification.html">Classification</a></li>
<li><a href="regression.html">Regression</a></li>
<li><a href="deep-learning.html">Deep Learning</a></li>
<li><a href="feature.html">Feature Engineering</a></li>
<li><a href="validation.html">Model Validation</a></li>
<li><a href="missing-value-imputation.html">Missing Value Imputation</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Unsupervised Learning <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="clustering.html">Clustering</a></li>
<li><a href="vector-quantization.html">Vector Quantization</a></li>
<li><a href="association-rule.html">Association Rule Mining</a></li>
<li><a href="mds.html">Multi-Dimensional Scaling</a></li>
<li><a href="manifold.html">Manifold Learning</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">LLM & NLP <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="llm.html">Large Language Model (LLM)</a></li>
<li><a href="nlp.html">Natural Language Processing (NLP)</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Math <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="linear-algebra.html">Linear Algebra</a></li>
<li><a href="statistics.html">Statistics</a></li>
<li><a href="wavelet.html">Wavelet</a></li>
<li><a href="interpolation.html">Interpolation</a></li>
<li><a href="graph.html">Graph Data Structure</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">API <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="api/java/index.html" target="_blank">Java</a></li>
<li><a href="api/scala/index.html" target="_blank">Scala</a></li>
<li><a href="api/kotlin/index.html" target="_blank">Kotlin</a></li>
<li><a href="api/clojure/index.html" target="_blank">Clojure</a></li>
<li><a href="api/json/index.html" target="_blank">JSON</a></li>
</ul>
</li>
<li><a href="https://mybinder.org/v2/gh/haifengl/smile/notebook?urlpath=lab%2Ftree%2Fshell%2Fsrc%2Funiversal%2Fnotebooks%2Findex.ipynb" target="_blank">Try It Online</a></li>
</ul>
</div>
<!-- /.navbar-collapse -->
</nav>
</header>
<div id="content" class="row">
<div class="col-md-3 col-md-push-9 hidden-xs hidden-sm">
<div id="sidebar">
<div class="sidebar-toc" style="margin-bottom: 20px;">
<p class="toc-header">Contents</p>
<div id="toc"></div>
</div>
<div id="search">
<script>
(function() {
var cx = '010264411143030149390:ajvee_ckdzs';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<gcse:searchbox-only></gcse:searchbox-only>
</div>
</div>
</div>
<div class="col-md-9 col-md-pull-3">
<h1 id="overview-top" class="title">What's Machine Learning</h1>
<p>Machine learning is a type of artificial intelligence that provides computers with the ability
to learn without being explicitly programmed. Machine learning algorithms can make
predictions on data by building a model from example inputs.</p>
<p>A core objective of machine learning is to generalize from its experience.
Generalization is the ability of a learning machine to perform accurately
on new, unseen examples/tasks after having experienced a training data set.
The training examples come from some generally unknown probability distribution
and the learner has to build a general model about this space that enables it
to produce sufficiently accurate predictions in new cases.</p>
<p>Machine learning tasks are typically classified into three broad categories, depending
on the nature of the learning "signal" or "feedback" available to a learning system.</p>
<dl>
<dt>Supervised learning</dt>
<dd><p>The computer is presented with example inputs and their desired outputs,
given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.</p>
</dd>
<dt>Unsupervised learning</dt>
<dd><p>No labels are given to the learning algorithm, leaving it on its own to find structure in
its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data)
or a means towards an end (feature learning).</p>
</dd>
<dt>Reinforcement learning</dt>
<dd><p>A computer program interacts with a dynamic environment in which it must perform a certain goal,
without a teacher explicitly telling it whether it has come close to
its goal.</p>
</dd>
</dl>
<p>Between supervised and unsupervised learning is semi-supervised learning, where the teacher gives an
incomplete training signal: a training set with some (often many) of the target outputs missing.</p>
<h2 id="features">Features</h2>
<p>A feature is an individual measurable property of a phenomenon being observed.
Features are also called explanatory variables, independent variables, predictors, regressors, etc.
Any attribute could be a feature, but choosing informative, discriminating and
independent features is a crucial step for effective algorithms in machine learning.
Features are usually numeric and a set of numeric features can be conveniently
described by a feature vector. Structural features such as strings, sequences and
graphs are also used in areas such as natural language processing, computational biology, etc.</p>
<p>Feature engineering is the process of using domain knowledge of the data to create features that make
machine learning algorithms work. Feature engineering is fundamental to the application of machine
learning, and is both difficult and expensive. It requires experimenting with multiple
possibilities and combining automated
techniques with the intuition and knowledge of the domain expert.</p>
<p>The initial set of raw features can be redundant and too large to be managed. Therefore,
a preliminary step in many applications consists of selecting a subset of features,
or constructing a new and reduced set of features to facilitate learning, and
to improve generalization and interpretability.</p>
<h2 id="supervised-learning">Supervised Learning</h2>
<p>In supervised learning, each example is a pair consisting of an input object (typically a feature vector)
and a desired output value (also called the response variable or dependent variable).
Supervised learning algorithms try to learn a function (often called the hypothesis) from the input object to the output value.
By analyzing the training data, the algorithm produces an inferred function
(referred to as a model), which can be used for mapping new examples.</p>
<p>Supervised learning problems are often solved by optimizing a loss function that
represents the price paid for inaccurate predictions. The risk associated with a hypothesis
is then defined as the expectation of the loss function. In general, the risk cannot be computed
because the underlying distribution is unknown. However, we can compute an approximation,
called the empirical risk, by averaging the loss function over the training set.</p>
<p>The empirical risk minimization principle states that the learning algorithm should choose
a hypothesis that minimizes the empirical risk.</p>
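<p>For concreteness, the following minimal sketch (illustrative only, not part of the Smile API)
computes the empirical risk with a squared-error loss and applies the ERM principle to a tiny
hypothesis space of lines through the origin:</p>
<pre class="prettyprint lang-scala"><code>
// A self-contained sketch of empirical risk minimization (ERM). Illustrative only.
object ERM {
  type Hypothesis = Double => Double

  // Empirical risk: the loss averaged over the training set.
  def empiricalRisk(h: Hypothesis, x: Array[Double], y: Array[Double]): Double = {
    val losses = x.zip(y).map { case (xi, yi) =>
      val e = h(xi) - yi
      e * e                                  // squared-error loss
    }
    losses.sum / losses.length
  }

  def main(args: Array[String]): Unit = {
    val x = Array(0.0, 1.0, 2.0, 3.0)
    val y = Array(0.1, 1.9, 4.2, 5.8)        // roughly y = 2x
    // A tiny hypothesis space: lines y = w * x for a few candidate slopes.
    val hypotheses = Seq(0.5, 1.0, 2.0, 3.0).map(w => (w, (xi: Double) => w * xi))
    // ERM chooses the hypothesis with the smallest empirical risk.
    val (bestW, _) = hypotheses.minBy { case (_, h) => empiricalRisk(h, x, y) }
    println(s"ERM picks slope $bestW")       // prints 2.0 for this data
  }
}
</code></pre>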
<p>Batch learning algorithms generate the model by learning on the entire training data set at once.
In contrast, online learning methods update the model with new data in a sequential order.
Online learning is a common technique for big data, where
it is computationally infeasible to train over the entire dataset at once.
It is also used when the data itself is generated over time.</p>
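<p>For example, an online learner updates its parameters one example at a time. The sketch below
(illustrative only) performs a single stochastic gradient step per incoming example for a simple
linear model with squared-error loss:</p>
<pre class="prettyprint lang-scala"><code>
// A minimal online learning sketch: the model y = w * x + b is updated
// sequentially, one example at a time. Illustrative only.
class OnlineLinearModel(var w: Double = 0.0, var b: Double = 0.0, lr: Double = 0.01) {
  def predict(x: Double): Double = w * x + b

  // Update the parameters from a single new example (x, y).
  def update(x: Double, y: Double): Unit = {
    val error = predict(x) - y
    w -= lr * error * x
    b -= lr * error
  }
}

// Usage: stream examples through the model instead of training on the full batch.
// val model = new OnlineLinearModel()
// stream.foreach { case (x, y) => model.update(x, y) }
</code></pre>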
<p>If the response variable takes categorical values, the supervised learning problem is called classification.
If the response variable takes real values, the problem is referred to as regression.</p>
<h3 id="overfitting">Overfitting</h3>
<p>When a model describes random error or noise instead of the underlying relationship, it is called overfitting.
Overfitting generally occurs when a model is excessively complex, such as having too many parameters
relative to the number of observations. An overfit model will generally have poor generalization
performance, as it can exaggerate minor fluctuations in the data.</p>
<div style="width: 100%; display: inline-block; text-align: center;">
<img src="https://upload.wikimedia.org/wikipedia/commons/1/19/Overfitting.svg" width="480px">
<div class="caption" style="min-width: 480px;">The overfit model in green makes no
errors on the training data, but it is overly complex and describes random noise.</div>
</div>
<h3 id="model-validation">Model Validation</h3>
<p>To assess whether a model is overfit and whether it can generalize to an independent data set,
out-of-sample evaluation is generally employed. If the model has been estimated on some, but not all,
of the available data, then the fitted model can be used to predict the
held-back data.</p>
<p>A popular model validation technique is cross-validation. One round of cross-validation involves
partitioning a sample of data into complementary subsets, performing the analysis on one subset
(called the training set), and validating the analysis on the other subset (called the testing set).
To reduce variability, multiple rounds of cross-validation are performed using different partitions,
and the validation results are averaged over the rounds.</p>
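<p>The sketch below shows the mechanics of k-fold cross-validation: shuffle the indices,
split them into k folds, train on k-1 folds and score on the held-out fold, and average
the scores. It is a conceptual outline only; Smile's utilities for this are covered
in the Model Validation section.</p>
<pre class="prettyprint lang-scala"><code>
// A bare-bones k-fold cross-validation sketch (conceptual only).
// `train` fits a model on the given row indices; `score` evaluates it on held-out indices.
def crossValidate[M](n: Int, k: Int)
                    (train: Array[Int] => M)
                    (score: (M, Array[Int]) => Double): Double = {
  val shuffled = scala.util.Random.shuffle((0 until n).toVector)
  val folds = shuffled.grouped(math.ceil(n.toDouble / k).toInt).toVector
  val scores = folds.map { testFold =>
    val testIdx  = testFold.toArray
    val trainIdx = shuffled.filterNot(i => testFold.contains(i)).toArray
    val model = train(trainIdx)              // fit on the other k-1 folds
    score(model, testIdx)                    // validate on the held-out fold
  }
  scores.sum / scores.size                   // average the validation results over the rounds
}
</code></pre>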
<h3 id="regularization">Regularization</h3>
<p>Regularization refers to a process of introducing additional information in order to prevent overfitting
(or to solve an ill-posed problem). In general, a regularization term, typically a penalty on the complexity of
the hypothesis, is added to the loss function, with a parameter controlling the importance of
the regularization term. For example, the regularization term may impose restrictions on smoothness
or bounds on the vector space norm.</p>
<p>Regularization can be used to learn simpler models, induce models to be sparse, introduce group structure
into the learning problem, and more.</p>
<p>A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution.
From a Bayesian point of view, many regularization techniques correspond to imposing certain prior
distributions on model parameters.</p>
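<p>As a concrete example, with an L2 (ridge) penalty the regularized objective is simply the
empirical loss plus &lambda; times the squared norm of the weight vector, as in this sketch
(illustrative only):</p>
<pre class="prettyprint lang-scala"><code>
// A sketch of a regularized objective: empirical squared-error loss plus an L2 penalty.
// lambda controls the importance of the regularization term; larger values favor
// simpler (smaller-norm) weight vectors. Illustrative only.
def ridgeObjective(w: Array[Double],
                   x: Array[Array[Double]],
                   y: Array[Double],
                   lambda: Double): Double = {
  val squaredErrors = x.zip(y).map { case (xi, yi) =>
    val prediction = xi.zip(w).map { case (a, b) => a * b }.sum
    val e = prediction - yi
    e * e
  }
  val empiricalLoss = squaredErrors.sum / x.length
  val penalty = lambda * w.map(wi => wi * wi).sum      // ||w||^2
  empiricalLoss + penalty
}
</code></pre>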
<h2 id="unsupervised-learning">Unsupervised Learning</h2>
<p>Unsupervised learning tries to infer a function to describe hidden structure from unlabeled data.
Since the examples given to the learner are unlabeled, there is no error or reward signal
to evaluate a potential solution.</p>
<p>Unsupervised learning is closely related to the problem of density estimation in statistics.
However, unsupervised learning also encompasses many other techniques that seek to summarize
and explain key features of the data.</p>
<h3 id="clustering">Clustering</h3>
<p>Cluster analysis or clustering is the task of grouping a set of objects such that objects
in the same group (called a cluster) are more similar to each other than to those in other groups.</p>
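<p>K-means is perhaps the best-known clustering algorithm. The conceptual sketch below alternates
between assigning each point to its nearest centroid and moving each centroid to the mean of its
assigned points; Smile's full clustering algorithms are described in the Clustering section.</p>
<pre class="prettyprint lang-scala"><code>
// A conceptual k-means sketch (naive initialization, fixed iteration count). Illustrative only.
def kmeans(points: Array[Array[Double]], k: Int, iterations: Int = 20): Array[Int] = {
  def distance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  var centroids = points.take(k).map(_.clone)
  var labels = Array.fill(points.length)(0)

  (0 until iterations).foreach { _ =>
    // Assignment step: each point joins the cluster of its nearest centroid.
    labels = points.map(p => centroids.indices.minBy(c => distance(p, centroids(c))))
    // Update step: each centroid moves to the mean of its assigned points.
    centroids = centroids.indices.toArray.map { c =>
      val members = points.zip(labels).collect { case (p, label) if label == c => p }
      if (members.isEmpty) centroids(c)
      else members.transpose.map(col => col.sum / members.length)
    }
  }
  labels
}
</code></pre>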
<h3 id="latent-variable-models">Latent Variable Models </h3>
<p>In statistics, latent variables are variables that are not directly observed but are rather inferred
from other observed variables. Mathematical models that aim to explain observed variables in terms
of latent variables are called latent variable models.</p>
<h3 id="association-rules">Association Rules</h3>
<p>Association rule mining identifies strong and interesting relations between variables in large databases.
Introduced by Rakesh Agrawal et al., a typical use case is to discover regularities between products
in large-scale transaction data recorded by point-of-sale systems in supermarkets. For example,
the rule <code>{onions, potatoes} => {burger meat}</code> found in the sales data of
a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also
buy hamburger meat. Such information can be used as the basis for decisions about marketing activities
(e.g., promotional pricing or product placements).</p>
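<p>The strength of a rule X => Y is usually measured by its support (how often X and Y occur
together) and its confidence (how often Y occurs among the transactions that contain X). The
sketch below computes both for a candidate rule; it is illustrative only, and association rule
mining with Smile is covered in its own section.</p>
<pre class="prettyprint lang-scala"><code>
// Support and confidence of a candidate rule X => Y over a list of transactions.
//   support(X => Y)    = fraction of transactions containing both X and Y
//   confidence(X => Y) = support(X and Y) / support(X), an estimate of P(Y | X)
def ruleStrength(transactions: Seq[Set[String]],
                 x: Set[String], y: Set[String]): (Double, Double) = {
  val n = transactions.size.toDouble
  val countX  = transactions.count(t => x.subsetOf(t))
  val countXY = transactions.count(t => (x union y).subsetOf(t))
  val support = countXY / n
  val confidence = if (countX == 0) 0.0 else countXY.toDouble / countX
  (support, confidence)
}

// Example with the classic rule from the text:
// val (s, c) = ruleStrength(transactions, Set("onions", "potatoes"), Set("burger meat"))
</code></pre>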
<h2 id="semi-supervised-learning">Semi-supervised Learning</h2>
<p>The acquisition of labeled data for a learning problem is usually labor-intensive, time-consuming, and
costly. On the other hand, the acquisition of unlabeled data is relatively inexpensive.
Researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data,
can produce considerable improvement in model accuracy.
Semi-supervised learning is a class of supervised learning tasks and techniques that make use of
both a large amount of unlabeled data and a small amount of labeled data.</p>
<p>In order to make any use of unlabeled data, some relationship to the underlying distribution of
data must exist. Semi-supervised learning algorithms make use of at least one of the following
assumptions:</p>
<dl>
<dt>Continuity assumption</dt>
<dd><p>Points that are close to each other are more likely to share a label.
This is also generally assumed in supervised learning and yields a preference
for geometrically simple decision boundaries. In the case of semi-supervised
learning, the smoothness assumption additionally yields a preference for
decision boundaries in low-density regions, so few points are close to each
other but in different classes.</p>
</dd>
<dt>Cluster assumption</dt>
<dd><p>The data tend to form discrete clusters, and points in the same cluster are more
likely to share a label (although data that shares a label may spread across multiple
clusters). This is a special case of the smoothness assumption and gives rise to
feature learning with clustering algorithms.</p>
</dd>
<dt>Manifold assumption</dt>
<dd><p>The data lie approximately on a manifold of much lower dimension than the input space.
In this case learning the manifold using both the labeled and unlabeled data can avoid
the curse of dimensionality. Then learning can proceed using distances and densities
defined on the manifold. The manifold assumption is practical when high-dimensional data
are generated by some process that may be hard to model directly, but which has only a
few degrees of freedom.</p>
</dd>
</dl>
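<p>One simple way to exploit these assumptions is self-training: fit a model on the labeled data,
adopt the predictions the model is most confident about as labels for unlabeled points, and retrain.
The generic outline below is illustrative only and is not a Smile API:</p>
<pre class="prettyprint lang-scala"><code>
// A generic self-training sketch for semi-supervised learning. Illustrative only.
// `fit` trains a model on labeled data; `predict` returns a (label, confidence) pair.
def selfTrain[X, M](labeled: Vector[(X, Int)],
                    unlabeled: Vector[X],
                    rounds: Int,
                    threshold: Double)
                   (fit: Vector[(X, Int)] => M)
                   (predict: (M, X) => (Int, Double)): M = {
  var train = labeled
  var pool = unlabeled
  var model = fit(train)
  (0 until rounds).foreach { _ =>
    // Pseudo-label the unlabeled points the current model is most confident about.
    val scored = pool.map(x => (x, predict(model, x)))
    val (confident, rest) = scored.partition { case (_, (_, conf)) => conf >= threshold }
    train = train ++ confident.map { case (x, (label, _)) => (x, label) }
    pool = rest.map(_._1)
    model = fit(train)                       // retrain on the enlarged labeled set
  }
  model
}
</code></pre>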
<h2 id="self-learning">Self-Supervised Learning</h2>
<p>A self-supervised learning model is trained on a task using the data itself to
generate supervisory signals, rather than relying on external labels
provided by humans. Like supervised learning methods, the goal of
self-supervised learning is to generate a classified output from the input.
However, it does not require the explicit use of labeled input-output pairs.
Instead, correlations, metadata embedded in the data, or domain knowledge
present in the input are implicitly and autonomously extracted from the data
for training. For example, a masked language model (a self-supervised model
typically built on the Transformer architecture) essentially learns to "fill in the blanks."</p>
<h2 id="GenAI">Generative AI</h2>
<p>Generative AI (GenAI) can produce a wide variety of highly realistic and
complex content, such as images, videos, audio, text, and 3D models by
learning patterns from training data. Transformer is the state-of-the-art
GenAI model architecture in natural language generation. It is based on
the multi-head attention mechanism. Text is converted to numerical
representations called tokens, and each token is converted into a vector
via a lookup in a word embedding table. At each layer, each token is
then contextualized within the scope of the context window with other
(unmasked) tokens via a parallel multi-head softmax-based attention mechanism,
allowing the signal for key tokens to be amplified and less important tokens
to be diminished. GPTs (generative pre-trained transformers) are based on
the decoder-only Transformer architecture. Each generation of GPT models
is significantly more capable than the previous one, due to increased model size
(number of trainable parameters) and larger training data sets.</p>
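<p>At the heart of each attention head is scaled dot-product attention,
softmax(QK<sup>T</sup>/&radic;d)V. The minimal single-head sketch below makes the computation
concrete; it is illustrative only, and production Transformers add learned projections,
multiple heads, masking, and positional information.</p>
<pre class="prettyprint lang-scala"><code>
// A single-head scaled dot-product attention sketch. Illustrative only.
def softmax(v: Array[Double]): Array[Double] = {
  val m = v.max                              // subtract the max for numerical stability
  val e = v.map(x => math.exp(x - m))
  val s = e.sum
  e.map(_ / s)
}

// One token per row in q (queries), k (keys), and v (values); d columns per vector.
def attention(q: Array[Array[Double]],
              k: Array[Array[Double]],
              v: Array[Array[Double]]): Array[Array[Double]] = {
  val d = q(0).length.toDouble
  q.map { qi =>
    // Similarity of this token's query to every key, scaled by sqrt(d).
    val scores = k.map(kj => qi.zip(kj).map { case (a, b) => a * b }.sum / math.sqrt(d))
    val weights = softmax(scores)            // attention weights over the context window
    // Output for this token: the attention-weighted sum of the value vectors.
    Array.tabulate(v(0).length) { col =>
      weights.indices.map(j => weights(j) * v(j)(col)).sum
    }
  }
}
</code></pre>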
<p>Stable Diffusion is a text-to-image deep learning model based on latent
diffusion techniques. Diffusion models are trained with the objective
of removing successive applications of Gaussian noise on training images,
which can be thought of as a sequence of denoising autoencoders.
Stable Diffusion consists of three parts: the variational autoencoder (VAE),
the U-Net, and an optional text encoder. The VAE encoder compresses the image
from pixel space to a lower-dimensional latent space, capturing a more
fundamental semantic meaning of the image. Gaussian noise is iteratively
applied to the compressed latent representation during forward diffusion.
The U-Net block, composed of a ResNet backbone, denoises the output of the
forward diffusion process to recover a clean latent representation. Finally,
the VAE decoder generates the final image by converting the representation
back into pixel space.</p>
<p>A generative adversarial network (GAN) is another framework for
generative AI. In a GAN, two neural networks contest with each other in a game.
The generative network generates candidates while the discriminative network evaluates
them. The contest operates in terms of data distributions. Typically, the generative
network learns to map from a latent space to a data distribution of interest, while
the discriminative network distinguishes candidates produced by the generator from
the true data distribution. The generative network's training objective is to increase
the error rate of the discriminative network.</p>
<p>A known dataset serves as the initial training data for the discriminator.
Typically, the generator is seeded with randomized input that is sampled
from a predefined latent space. Thereafter, candidates synthesized by the
generator are evaluated by the discriminator. Backpropagation is applied
in both networks so that the generator produces better images, while the
discriminator becomes more skilled at flagging synthetic images.</p>
<h2 id="reinforcement-learning">Reinforcement Learning</h2>
<p>Reinforcement learning is about a learning agent interacting with its environment
to achieve a goal. The learning agent has to map situations to actions to maximize
a numerical reward signal. Different from supervised learning, the learner is not
told which actions to take but instead must discover which actions yield the most
reward by trying them. Moreover, actions may affect not only the immediate reward
but also all subsequent rewards. Trial-and-error search and delayed reward are the
most important features of reinforcement learning.</p>
<p>Markov decision processes (MDPs) provide a mathematical framework for modeling decision-making
in situations where outcomes are partly random and partly under the control of a decision maker.
In contrast, deep reinforcement learning uses a deep neural network without explicitly
designing the state space.</p>
<p>The major challenge in reinforcement learning is the tradeoff
between exploration and exploitation. Reinforcement learning focuses on
finding a balance between exploration (of uncharted territory) and
exploitation (of current knowledge). To obtain a lot of reward, an agent
must prefer actions that it has tried in the past and found to be effective
in producing reward. But to discover such actions, it has to try actions
that it has not selected before. The agent has to exploit what it has
already experienced in order to obtain reward, but it also has to explore
in order to make better action selections in the future. Reinforcement learning
requires clever exploration mechanisms. Randomly selecting actions, without
reference to an estimated probability distribution, shows poor performance.
The case of (small) finite MDPs is relatively well understood. However,
due to the lack of algorithms that scale well with the number of states
(or scale to problems with infinite state spaces), simple exploration
methods are the most practical.</p>
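<p>A standard simple exploration mechanism is the &epsilon;-greedy strategy: with probability
&epsilon; the agent takes a random action, and otherwise it takes the action currently estimated
to be best. The sketch below applies it to a multi-armed bandit (illustrative only):</p>
<pre class="prettyprint lang-scala"><code>
// An epsilon-greedy action selector for a multi-armed bandit. Illustrative only.
class EpsilonGreedy(numActions: Int, epsilon: Double) {
  private val values = Array.fill(numActions)(0.0)     // estimated value of each action
  private val counts = Array.fill(numActions)(0)
  private val rng = new scala.util.Random()

  def selectAction(): Int =
    if (rng.nextDouble() >= epsilon) values.indices.maxBy(a => values(a))  // exploit
    else rng.nextInt(numActions)                                           // explore

  // Incrementally update an action's value estimate from an observed reward.
  def update(action: Int, reward: Double): Unit = {
    counts(action) += 1
    values(action) += (reward - values(action)) / counts(action)
  }
}
</code></pre>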
<p>There are four main components in a reinforcement learning system: a policy,
a reward signal, a value function, and optionally a model of the
environment. A policy defines the learning agent's way of behaving at
a given time. A reward signal defines the goal of a reinforcement
learning problem. The agent's objective is to maximize the total
reward it receives over the long run. The reward signal is the primary
basis for altering the policy; if an action selected by the policy is
followed by low reward, then the policy may be changed to select some
other action in that situation in the future. While the reward
is an immediate signal, a value function specifies what is good in the
long run. The value of a state may be regarded as the total amount
of reward an agent can expect to accumulate over the future, starting
from that state. Action choices are made based on value judgments.
We seek actions that bring about states of highest value, not the highest
reward. Unfortunately, it is much harder to determine values than
it is to determine rewards. Rewards are basically given directly
by the environment, but values must be estimated and re-estimated from
the sequences of observations an agent makes over its entire lifetime.
Optionally, some reinforcement learning systems have a model of the
environment. It allows inferences to be made about how the environment
will behave. For example, given a state and action, the model might
predict the resulting next state and next reward, which can be used for
planning.</p>
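<p>Value estimates are typically learned from experience. For example, tabular Q-learning updates
its estimate of the value of taking an action in a state from each observed transition
(state, action, reward, next state), as in this sketch (illustrative only):</p>
<pre class="prettyprint lang-scala"><code>
// A tabular Q-learning update sketch. Illustrative only.
// q(s)(a) estimates the long-run (discounted) value of taking action a in state s.
class QLearning(numStates: Int, numActions: Int,
                alpha: Double = 0.1,                   // learning rate
                gamma: Double = 0.9) {                 // discount factor for future rewards
  val q = Array.fill(numStates, numActions)(0.0)

  // Update the value estimate from one observed transition.
  def update(state: Int, action: Int, reward: Double, nextState: Int): Unit = {
    val bestNext = q(nextState).max                    // value of the best action in the next state
    val target = reward + gamma * bestNext             // immediate reward plus discounted future value
    q(state)(action) += alpha * (target - q(state)(action))
  }

  // Greedy policy with respect to the current value estimates.
  def bestAction(state: Int): Int = q(state).indices.maxBy(a => q(state)(a))
}
</code></pre>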
<div id="btnv">
<span class="btn-arrow-left">← </span>
<a class="btn-prev-text" href="quickstart.html" title="Previous Section: Quick Start"><span>Quick Start</span></a>
<a class="btn-next-text" href="data.html" title="Next Section: Data"><span>Data Processing</span></a>
<span class="btn-arrow-right"> →</span>
</div>
</div>
<script type="text/javascript">
$('#toc').toc({exclude: 'h1, h5, h6', context: '', autoId: true, numerate: false});
</script>
</div>
</div>
<a href=https://github.com/haifengl/smile><img style="position: fixed; top: 0; right: 0; border: 0" src=/images/forkme_right_orange.png alt="Fork me on GitHub"></a>
<!-- Place this tag right after the last button or just before your close body tag. -->
<script async defer id="github-bjs" src="https://buttons.github.io/buttons.js"></script>
</body>
</html>