Speech and Language Processing
An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition
Third Edition draft
Daniel Jurafsky
Stanford University
James H. Martin
University of Colorado at Boulder
Copyright ©2020. All rights reserved.
Draft of December 30, 2020. Comments and typos welcome!
Summary of Contents
1 Introduction 1
2 Regular Expressions, Text Normalization, Edit Distance 2
3 N-gram Language Models 29
4 Naive Bayes and Sentiment Classification 55
5 Logistic Regression 76
6 Vector Semantics and Embeddings 96
7 Neural Networks and Neural Language Models 127
8 Sequence Labeling for Parts of Speech and Named Entities 148
9 Deep Learning Architectures for Sequence Processing 173
10 Contextual Embeddings 202
11 Machine Translation and Encoder-Decoder Models 203
12 Constituency Grammars 231
13 Constituency Parsing 259
14 Dependency Parsing 280
15 Logical Representations of Sentence Meaning 305
16 Computational Semantics and Semantic Parsing 331
17 Information Extraction 332
18 Word Senses and WordNet 355
19 Semantic Role Labeling 373
20 Lexicons for Sentiment, Affect, and Connotation 393
21 Coreference Resolution 415
22 Discourse Coherence 442
23 Question Answering 464
24 Chatbots & Dialogue Systems 492
25 Phonetics 526
26 Automatic Speech Recognition and Text-to-Speech 548
Bibliography 575
Subject Index 607
https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
Mecha Learn
Tuesday 29 December 2020
Monday 22 April 2019
Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
Part I. The Fundamentals of Machine Learning
1. The Machine Learning Landscape 3
What Is Machine Learning? 4
Why Use Machine Learning? 4
Types of Machine Learning Systems 8
Supervised/Unsupervised Learning 8
Batch and Online Learning 15
Instance-Based Versus Model-Based Learning 18
Main Challenges of Machine Learning 24
Insufficient Quantity of Training Data 24
Nonrepresentative Training Data 26
Poor-Quality Data 27
Irrelevant Features 27
Overfitting the Training Data 28
Underfitting the Training Data 30
Stepping Back 30
Testing and Validating 31
Hyperparameter Tuning and Model Selection 32
Data Mismatch 33
Exercises 34
2. End-to-End Machine Learning Project 37
Working with Real Data 38
Look at the Big Picture 39
Frame the Problem 39
Select a Performance Measure 42
Check the Assumptions 45
Get the Data 45
Create the Workspace 45
Download the Data 49
Take a Quick Look at the Data Structure 50
Create a Test Set 54
Discover and Visualize the Data to Gain Insights 58
Visualizing Geographical Data 59
Looking for Correlations 62
Experimenting with Attribute Combinations 65
Prepare the Data for Machine Learning Algorithms 66
Data Cleaning 67
Handling Text and Categorical Attributes 69
Custom Transformers 71
Feature Scaling 72
Transformation Pipelines 73
Select and Train a Model 75
Training and Evaluating on the Training Set 75
Better Evaluation Using Cross-Validation 76
Fine-Tune Your Model 79
Grid Search 79
Randomized Search 81
Ensemble Methods 82
Analyze the Best Models and Their Errors 82
Evaluate Your System on the Test Set 83
Launch, Monitor, and Maintain Your System 84
Try It Out! 85
Exercises 85
3. Classification 87
MNIST 87
Training a Binary Classifier 90
Performance Measures 90
Measuring Accuracy Using Cross-Validation 91
Confusion Matrix 92
Precision and Recall 94
Precision/Recall Tradeoff 95
The ROC Curve 99
Multiclass Classification 102
Error Analysis 104
Multilabel Classification 108
Multioutput Classification 109
Exercises 110
4. Training Models 113
Linear Regression 114
The Normal Equation 116
Computational Complexity 119
Gradient Descent 119
Batch Gradient Descent 123
Stochastic Gradient Descent 126
Mini-batch Gradient Descent 129
Polynomial Regression 130
Learning Curves 132
Regularized Linear Models 136
Ridge Regression 137
Lasso Regression 139
Elastic Net 142
Early Stopping 142
Logistic Regression 144
Estimating Probabilities 144
Training and Cost Function 145
Decision Boundaries 146
Softmax Regression 149
Exercises 153
5. Support Vector Machines 155
Linear SVM Classification 155
Soft Margin Classification 156
Nonlinear SVM Classification 159
Polynomial Kernel 160
Adding Similarity Features 161
Gaussian RBF Kernel 162
Computational Complexity 163
SVM Regression 164
Under the Hood 166
Decision Function and Predictions 166
Training Objective 167
Quadratic Programming 169
The Dual Problem 170
Kernelized SVM 171
Online SVMs 174
Exercises 175
6. Decision Trees 177
Training and Visualizing a Decision Tree 177
Making Predictions 179
Estimating Class Probabilities 181
The CART Training Algorithm 182
Computational Complexity 183
Gini Impurity or Entropy? 183
Regularization Hyperparameters 184
Regression 185
Instability 188
Exercises 189
7. Ensemble Learning and Random Forests 191
Voting Classifiers 192
Bagging and Pasting 195
Bagging and Pasting in Scikit-Learn 196
Out-of-Bag Evaluation 197
Random Patches and Random Subspaces 198
Random Forests 199
Extra-Trees 200
Feature Importance 200
Boosting 201
AdaBoost 202
Gradient Boosting 205
Stacking 210
Exercises 213
8. Dimensionality Reduction 215
The Curse of Dimensionality 216
Main Approaches for Dimensionality Reduction 218
Projection 218
Manifold Learning 220
PCA 222
Preserving the Variance 222
Principal Components 223
Projecting Down to d Dimensions 224
Using Scikit-Learn 224
Explained Variance Ratio 225
Choosing the Right Number of Dimensions 225
PCA for Compression 226
Randomized PCA 227
Incremental PCA 227
Kernel PCA 228
Selecting a Kernel and Tuning Hyperparameters 229
LLE 232
Other Dimensionality Reduction Techniques 234
Exercises 235
9. Unsupervised Learning Techniques 237
Clustering 238
K-Means 240
Limits of K-Means 250
Using Clustering for Image Segmentation 251
Using Clustering for Preprocessing 252
Using Clustering for Semi-Supervised Learning 254
DBSCAN 256
Other Clustering Algorithms 259
Gaussian Mixtures 260
Anomaly Detection using Gaussian Mixtures 266
Selecting the Number of Clusters 267
Bayesian Gaussian Mixture Models 270
Other Anomaly Detection and Novelty Detection Algorithms 274
Part II. Neural Networks and Deep Learning
10. Introduction to Artificial Neural Networks with Keras 277
From Biological to Artificial Neurons 278
Biological Neurons 279
Logical Computations with Neurons 281
The Perceptron 281
Multi-Layer Perceptron and Backpropagation 286
Regression MLPs 289
Classification MLPs 290
Implementing MLPs with Keras 292
Installing TensorFlow 2 293
Building an Image Classifier Using the Sequential API 294
Building a Regression MLP Using the Sequential API 303
Building Complex Models Using the Functional API 304
Building Dynamic Models Using the Subclassing API 309
Saving and Restoring a Model 311
Using Callbacks 311
Visualization Using TensorBoard 313
Fine-Tuning Neural Network Hyperparameters 315
Number of Hidden Layers 319
Number of Neurons per Hidden Layer 320
Learning Rate, Batch Size and Other Hyperparameters 320
Exercises 322
11. Training Deep Neural Networks 325
Vanishing/Exploding Gradients Problems 326
Glorot and He Initialization 327
Nonsaturating Activation Functions 329
Batch Normalization 333
Gradient Clipping 338
Reusing Pretrained Layers 339
Transfer Learning With Keras 341
Unsupervised Pretraining 343
Pretraining on an Auxiliary Task 344
Faster Optimizers 344
Momentum Optimization 345
Nesterov Accelerated Gradient 346
AdaGrad 347
RMSProp 349
Adam and Nadam Optimization 349
Learning Rate Scheduling 352
Avoiding Overfitting Through Regularization 356
ℓ1 and ℓ2 Regularization 356
Dropout 357
Monte-Carlo (MC) Dropout 360
Max-Norm Regularization 362
Summary and Practical Guidelines 363
Exercises 364
12. Custom Models and Training with TensorFlow 367
A Quick Tour of TensorFlow 368
Using TensorFlow like NumPy 371
Tensors and Operations 371
Tensors and NumPy 373
Type Conversions 374
Variables 374
Other Data Structures 375
Customizing Models and Training Algorithms 376
Custom Loss Functions 376
Saving and Loading Models That Contain Custom Components 377
Custom Activation Functions, Initializers, Regularizers, and Constraints 379
Custom Metrics 380
Custom Layers 383
Custom Models 386
Losses and Metrics Based on Model Internals 388
Computing Gradients Using Autodiff 389
Custom Training Loops 393
TensorFlow Functions and Graphs 396
Autograph and Tracing 398
TF Function Rules 400
13. Loading and Preprocessing Data with TensorFlow 403
The Data API 404
Chaining Transformations 405
Shuffling the Data 406
Preprocessing the Data 409
Putting Everything Together 410
Prefetching 411
Using the Dataset With tf.keras 413
The TFRecord Format 414
Compressed TFRecord Files 415
A Brief Introduction to Protocol Buffers 415
TensorFlow Protobufs 416
Loading and Parsing Examples 418
Handling Lists of Lists Using the SequenceExample Protobuf 419
The Features API 420
Categorical Features 421
Crossed Categorical Features 421
Encoding Categorical Features Using One-Hot Vectors 422
Encoding Categorical Features Using Embeddings 423
Using Feature Columns for Parsing 426
Using Feature Columns in Your Models 426
TF Transform 428
The TensorFlow Datasets (TFDS) Project 429
14. Deep Computer Vision Using Convolutional Neural Networks 431
The Architecture of the Visual Cortex 432
Convolutional Layer 434
Filters 436
Stacking Multiple Feature Maps 437
TensorFlow Implementation 439
Memory Requirements 441
Pooling Layer 442
TensorFlow Implementation 444
CNN Architectures 446
LeNet-5 449
AlexNet 450
GoogLeNet 452
VGGNet 456
ResNet 457
Xception 459
SENet 461
Implementing a ResNet-34 CNN Using Keras 464
Using Pretrained Models From Keras 465
Pretrained Models for Transfer Learning 467
Classification and Localization 469
Object Detection 471
Fully Convolutional Networks (FCNs) 473
You Only Look Once (YOLO) 475
Semantic Segmentation 478
Exercises 482
https://www.knowledgeisle.com/wp-content/uploads/2019/12/2-Aur%C3%A9lien-G%C3%A9ron-Hands-On-Machine-Learning-with-Scikit-Learn-Keras-and-Tensorflow_-Concepts-Tools-and-Techniques-to-Build-Intelligent-Systems-O%E2%80%99Reilly-Media-2019.pdf
Monday 31 December 2018
Foundations of Machine Learning
Contents
Preface xiii
1 Introduction 1
1.1 What is machine learning? 1
1.2 What kind of problems can be tackled using machine learning? 2
1.3 Some standard learning tasks 3
1.4 Learning stages 4
1.5 Learning scenarios 6
1.6 Generalization 7
2 The PAC Learning Framework 9
2.1 The PAC learning model 9
2.2 Guarantees for finite hypothesis sets — consistent case 15
2.3 Guarantees for finite hypothesis sets — inconsistent case 19
2.4 Generalities 21
2.4.1 Deterministic versus stochastic scenarios 21
2.4.2 Bayes error and noise 22
2.5 Chapter notes 23
2.6 Exercises 23
3 Rademacher Complexity and VC-Dimension 29
3.1 Rademacher complexity 30
3.2 Growth function 34
3.3 VC-dimension 36
3.4 Lower bounds 43
3.5 Chapter notes 48
3.6 Exercises 50
4 Model Selection 61
4.1 Estimation and approximation errors 61
4.2 Empirical risk minimization (ERM) 62
4.3 Structural risk minimization (SRM) 64
4.4 Cross-validation 68
4.5 n-Fold cross-validation 71
4.6 Regularization-based algorithms 72
4.7 Convex surrogate losses 73
4.8 Chapter notes 77
4.9 Exercises 78
5 Support Vector Machines 79
5.1 Linear classification 79
5.2 Separable case 80
5.2.1 Primal optimization problem 81
5.2.2 Support vectors 83
5.2.3 Dual optimization problem 83
5.2.4 Leave-one-out analysis 85
5.3 Non-separable case 87
5.3.1 Primal optimization problem 88
5.3.2 Support vectors 89
5.3.3 Dual optimization problem 90
5.4 Margin theory 91
5.5 Chapter notes 100
5.6 Exercises 100
6 Kernel Methods 105
6.1 Introduction 105
6.2 Positive definite symmetric kernels 108
6.2.1 Definitions 108
6.2.2 Reproducing kernel Hilbert space 110
6.2.3 Properties 112
6.3 Kernel-based algorithms 116
6.3.1 SVMs with PDS kernels 116
6.3.2 Representer theorem 117
6.3.3 Learning guarantees 117
6.4 Negative definite symmetric kernels 119
6.5 Sequence kernels 121
6.5.1 Weighted transducers 122
6.5.2 Rational kernels 126
6.6 Approximate kernel feature maps 130
6.7 Chapter notes 135
6.8 Exercises 137
7 Boosting 145
7.1 Introduction 145
7.2 AdaBoost 146
7.2.1 Bound on the empirical error 149
7.2.2 Relationship with coordinate descent 150
7.2.3 Practical use 154
7.3 Theoretical results 154
7.3.1 VC-dimension-based analysis 154
7.3.2 L1-geometric margin 155
7.3.3 Margin-based analysis 157
7.3.4 Margin maximization 161
7.3.5 Game-theoretic interpretation 162
7.4 L1-regularization 165
7.5 Discussion 167
7.6 Chapter notes 168
7.7 Exercises 170
8 On-Line Learning 177
8.1 Introduction 178
8.2 Prediction with expert advice 178
8.2.1 Mistake bounds and Halving algorithm 179
8.2.2 Weighted majority algorithm 181
8.2.3 Randomized weighted majority algorithm 183
8.2.4 Exponential weighted average algorithm 186
8.3 Linear classification 190
8.3.1 Perceptron algorithm 190
8.3.2 Winnow algorithm 198
8.4 On-line to batch conversion 201
8.5 Game-theoretic connection 204
8.6 Chapter notes 205
8.7 Exercises 206
9 Multi-Class Classification 213
9.1 Multi-class classification problem 213
9.2 Generalization bounds 215
9.3 Uncombined multi-class algorithms 221
9.3.1 Multi-class SVMs 221
9.3.2 Multi-class boosting algorithms 222
9.3.3 Decision trees 224
9.4 Aggregated multi-class algorithms 228
9.4.1 One-versus-all 229
9.4.2 One-versus-one 229
9.4.3 Error-correcting output codes 231
9.5 Structured prediction algorithms 233
9.6 Chapter notes 235
9.7 Exercises 237
10 Ranking 239
10.1 The problem of ranking 240
10.2 Generalization bound 241
10.3 Ranking with SVMs 243
10.4 RankBoost 244
10.4.1 Bound on the empirical error 246
10.4.2 Relationship with coordinate descent 248
10.4.3 Margin bound for ensemble methods in ranking 250
10.5 Bipartite ranking 251
10.5.1 Boosting in bipartite ranking 252
10.5.2 Area under the ROC curve 255
10.6 Preference-based setting 257
10.6.1 Second-stage ranking problem 257
10.6.2 Deterministic algorithm 259
10.6.3 Randomized algorithm 260
10.6.4 Extension to other loss functions 262
10.7 Other ranking criteria 262
10.8 Chapter notes 263
10.9 Exercises 264
11 Regression 267
11.1 The problem of regression 267
11.2 Generalization bounds 268
11.2.1 Finite hypothesis sets 268
11.2.2 Rademacher complexity bounds 269
11.2.3 Pseudo-dimension bounds 271
11.3 Regression algorithms 275
11.3.1 Linear regression 275
11.3.2 Kernel ridge regression 276
11.3.3 Support vector regression 281
11.3.4 Lasso 285
11.3.5 Group norm regression algorithms 289
11.3.6 On-line regression algorithms 289
11.4 Chapter notes 290
11.5 Exercises 292
12 Maximum Entropy Models 295
12.1 Density estimation problem 295
12.1.1 Maximum Likelihood (ML) solution 296
12.1.2 Maximum a Posteriori (MAP) solution 297
12.2 Density estimation problem augmented with features 297
12.3 Maxent principle 298
12.4 Maxent models 299
12.5 Dual problem 299
12.6 Generalization bound 303
12.7 Coordinate descent algorithm 304
12.8 Extensions 306
12.9 L2-regularization 308
12.10 Chapter notes 312
12.11 Exercises 313
13 Conditional Maximum Entropy Models 315
13.1 Learning problem 315
13.2 Conditional Maxent principle 316
13.3 Conditional Maxent models 316
13.4 Dual problem 317
13.5 Properties 319
13.5.1 Optimization problem 320
13.5.2 Feature vectors 320
13.5.3 Prediction 321
13.6 Generalization bounds 321
13.7 Logistic regression 325
13.7.1 Optimization problem 325
13.7.2 Logistic model 325
13.8 L2-regularization 326
13.9 Proof of the duality theorem 328
13.10 Chapter notes 330
13.11 Exercises 331
14 Algorithmic Stability 333
14.1 Definitions 333
14.2 Stability-based generalization guarantee 334
14.3 Stability of kernel-based regularization algorithms 336
14.3.1 Application to regression algorithms: SVR and KRR 339
14.3.2 Application to classification algorithms: SVMs 341
14.3.3 Discussion 342
14.4 Chapter notes 342
14.5 Exercises 343
15 Dimensionality Reduction 347
15.1 Principal component analysis 348
15.2 Kernel principal component analysis (KPCA) 349
15.3 KPCA and manifold learning 351
15.3.1 Isomap 351
15.3.2 Laplacian eigenmaps 352
15.3.3 Locally linear embedding (LLE) 353
15.4 Johnson-Lindenstrauss lemma 354
15.5 Chapter notes 356
15.6 Exercises 356
16 Learning Automata and Languages 359
16.1 Introduction 359
16.2 Finite automata 360
16.3 Efficient exact learning 361
16.3.1 Passive learning 362
16.3.2 Learning with queries 363
16.3.3 Learning automata with queries 364
16.4 Identification in the limit 369
16.4.1 Learning reversible automata 370
16.5 Chapter notes 375
16.6 Exercises 376
17 Reinforcement Learning 379
17.1 Learning scenario 379
17.2 Markov decision process model 380
17.3 Policy 381
17.3.1 Definition 381
17.3.2 Policy value 382
17.3.3 Optimal policies 382
17.3.4 Policy evaluation 385
17.4 Planning algorithms 387
17.4.1 Value iteration 387
17.4.2 Policy iteration 390
17.4.3 Linear programming 392
17.5 Learning algorithms 393
17.5.1 Stochastic approximation 394
17.5.2 TD(0) algorithm 397
17.5.3 Q-learning algorithm 398
17.5.4 SARSA 402
17.5.5 TD(λ) algorithm 402
17.5.6 Large state space 403
17.6 Chapter notes 405
Conclusion 407
A Linear Algebra Review 409
A.1 Vectors and norms 409
A.1.1 Norms 409
A.1.2 Dual norms 410
A.1.3 Relationship between norms 411
A.2 Matrices 411
A.2.1 Matrix norms 411
A.2.2 Singular value decomposition 412
A.2.3 Symmetric positive semidefinite (SPSD) matrices 412
B Convex Optimization 415
B.1 Differentiation and unconstrained optimization 415
B.2 Convexity 415
B.3 Constrained optimization 419
B.4 Fenchel duality 422
B.4.1 Subgradients 422
B.4.2 Core 423
B.4.3 Conjugate functions 423
B.5 Chapter notes 426
B.6 Exercises 427
C Probability Review 429
C.1 Probability 429
C.2 Random variables 429
C.3 Conditional probability and independence 431
C.4 Expectation and Markov’s inequality 431
C.5 Variance and Chebyshev’s inequality 432
C.6 Moment-generating functions 434
C.7 Exercises 435
D Concentration Inequalities 437
D.1 Hoeffding’s inequality 437
D.2 Sanov’s theorem 438
D.3 Multiplicative Chernoff bounds 439
D.4 Binomial distribution tails: Upper bounds 440
D.5 Binomial distribution tails: Lower bound 440
D.6 Azuma’s inequality 441
D.7 McDiarmid’s inequality 442
D.8 Normal distribution tails: Lower bound 443
D.9 Khintchine-Kahane inequality 443
D.10 Maximal inequality 444
D.11 Chapter notes 445
D.12 Exercises 445
E Notions of Information Theory 449
E.1 Entropy 449
E.2 Relative entropy 450
E.3 Mutual information 453
E.4 Bregman divergences 453
E.5 Chapter notes 456
E.6 Exercises 457
F Notation 459
Bibliography 461
Index 475
https://www.dropbox.com/s/38p0j6ds5q9c8oe/10290.pdf
Natural Language Processing Succinctly
OVERVIEW
AI assistants represent a significant frontier for development. But the complexities of such systems pose a significant barrier for developers. In Natural Language Processing Succinctly, author Joseph Booth will guide readers through designing a simple system that can interpret and provide reasonable responses to written English text. With this foundation, readers will be prepared to tackle the greater challenges of natural language development.
TABLE OF CONTENTS
Natural Language Processing
What We're Building
Extracting Sentences
Extracting Words
Tagging
Entity Recognition
Knowledge Base
Answering Questions
Cloudmersive
Google Cloud NLP API
Microsoft Cognitive Services
Other NLP Uses
Summary
Penn Treebank Tags
Universal POS Tags
About the Code
https://www.syncfusion.com/succinctly-free-ebooks/natural-language-processing-succinctly
Friday 30 November 2018
Machine Learning For Dummies
Table of Contents
INTRODUCTION 1
About This Book 1
Foolish Assumptions 2
Icons Used in This Book 2
CHAPTER 1: Understanding Machine Learning 3
What Is Machine Learning? 4
Iterative learning from data 5
What’s old is new again 5
Defining Big Data 6
Big Data in Context with Machine Learning 7
The Need to Understand and Trust your Data 8
The Importance of the Hybrid Cloud 9
Leveraging the Power of Machine Learning 9
Descriptive analytics 10
Predictive analytics 10
The Roles of Statistics and Data Mining with Machine Learning 11
Putting Machine Learning in Context 12
Approaches to Machine Learning 14
Supervised learning 15
Unsupervised learning 15
Reinforcement learning 16
Neural networks and deep learning 17
CHAPTER 2: Applying Machine Learning 19
Getting Started with a Strategy 19
Using machine learning to remove biases from strategy 20
More data makes planning more accurate 22
Understanding Machine Learning Techniques 22
Tying Machine Learning Methods to Outcomes 23
Applying Machine Learning to Business Needs 23
Understanding why customers are leaving 24
Recognizing who has committed a crime 25
Preventing accidents from happening 26
CHAPTER 3: Looking Inside Machine Learning 27
The Impact of Machine Learning on Applications 28
The role of algorithms 28
Types of machine learning algorithms 29
Training machine learning systems 33
Data Preparation 34
Identify relevant data 34
Governing data 36
The Machine Learning Cycle 37
CHAPTER 4: Getting Started with Machine Learning 39
Understanding How Machine Learning Can Help 39
Focus on the Business Problem 40
Bringing data silos together 41
Avoiding trouble before it happens 42
Getting customer focused 43
Machine Learning Requires Collaboration 43
Executing a Pilot Project 44
Step 1: Define an opportunity for growth 44
Step 2: Conducting a pilot project 44
Step 3: Evaluation 45
Step 4: Next actions 45
Determining the Best Learning Model 46
Tools to determine algorithm selection 46
Approaching tool selection 47
CHAPTER 5: Learning Machine Skills 49
Defining the Skills That You Need 49
Getting Educated 53
IBM-Recommended Resources 56
CHAPTER 6: Using Machine Learning to Provide Solutions to Business Problems 57
Applying Machine Learning to Patient Health 57
Leveraging IoT to Create More Predictable Outcomes 58
Proactively Responding to IT Issues 59
Protecting Against Fraud 60
CHAPTER 7: Ten Predictions on the Future of Machine Learning 63
Machine Learning For Dummies, IBM Limited Edition. These materials are © 2018 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Saturday 30 December 2017
Deep Learning for Natural Language Processing
Contents
III Data Preparation 34
IV Bag-of-Words 61
V Word Embeddings 114
VI Text Classification 144
VII Language Modeling 189
VIII Image Captioning 244
IX Machine Translation 331
X Appendix 372
XI Conclusions 395
Thursday 31 July 2014
Natural Language Processing with Python [html edition]
Natural Language Processing with Python
– Analyzing Text with the Natural Language Toolkit
Steven Bird, Ewan Klein, and Edward Loper
This version of the NLTK book is updated for Python 3 and NLTK 3. The first edition of the book, published by O'Reilly, is available at http://nltk.org/book_1ed/. (There are currently no plans for a second edition of the book.)
0. Preface
1. Language Processing and Python
2. Accessing Text Corpora and Lexical Resources
3. Processing Raw Text
4. Writing Structured Programs
5. Categorizing and Tagging Words (minor fixes still required)
6. Learning to Classify Text
7. Extracting Information from Text
8. Analyzing Sentence Structure
9. Building Feature Based Grammars
10. Analyzing the Meaning of Sentences (minor fixes still required)
11. Managing Linguistic Data (minor fixes still required)
12. Afterword: Facing the Language Challenge
Bibliography
Term Index