Automated data collection with R : a practical guide to Web scraping and text mining / (Record no. 207650)
000 -LEADER | |
---|---|
fixed length control field | 06313cam a2200733 i 4500 |
001 - CONTROL NUMBER | |
control field | ocn889941400 |
003 - CONTROL NUMBER IDENTIFIER | |
control field | OCoLC |
005 - DATE AND TIME OF LATEST TRANSACTION | |
control field | 20171026112149.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS | |
fixed length control field | m o d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION | |
fixed length control field | cr ||||||||||| |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
fixed length control field | 140902s2014 enk ob 001 0 eng |
010 ## - LIBRARY OF CONGRESS CONTROL NUMBER | |
LC control number | 2014035023 |
015 ## - NATIONAL BIBLIOGRAPHY NUMBER | |
National bibliography number | GBB4D4768 |
Source | bnb |
016 7# - NATIONAL BIBLIOGRAPHIC AGENCY CONTROL NUMBER | |
Record control number | 016955944 |
Source | Uk |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
International Standard Book Number | 9781118834787 |
Qualifying information | electronic bk. |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
International Standard Book Number | 111883478X |
Qualifying information | electronic bk. |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
International Standard Book Number | 9781118834800 |
Qualifying information | electronic bk. |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
International Standard Book Number | 1118834801 |
Qualifying information | electronic bk. |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
Canceled/invalid ISBN | 9781118834817 (hardback) |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
Canceled/invalid ISBN | 9781118834732 |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
Canceled/invalid ISBN | 1118834739 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | NLGGC |
System control number | 383512123 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | AU@ |
System control number | 000053665594 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | NZ1 |
System control number | 15913639 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | CHVBK |
System control number | 334092922 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | CHBIS |
System control number | 010442318 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | AU@ |
System control number | 000058372878 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | DEBBG |
System control number | BV043397096 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | CHVBK |
System control number | 375152725 |
029 1# - OTHER SYSTEM CONTROL NUMBER (OCLC) | |
OCLC library identifier | CHSLU |
System control number | 001259237 |
035 ## - SYSTEM CONTROL NUMBER | |
System control number | (OCoLC)889941400 |
037 ## - SOURCE OF ACQUISITION | |
Stock number | AC8C3FD3-C56D-49B4-B991-612E67C28A99 |
Source of stock number/acquisition | OverDrive, Inc. |
Note | http://www.overdrive.com |
040 ## - CATALOGING SOURCE | |
Original cataloging agency | DLC |
Language of cataloging | eng |
Description conventions | rda |
Transcribing agency | DLC |
Modifying agency | N$T |
-- | UKMGB |
-- | YDXCP |
-- | DG1 |
-- | CDX |
-- | RECBK |
-- | OCLCF |
-- | OCLCO |
-- | TEFOD |
-- | EBLCP |
-- | OCLCQ |
-- | DEBBG |
042 ## - AUTHENTICATION CODE | |
Authentication code | pcc |
049 ## - LOCAL HOLDINGS (OCLC) | |
Holding library | MAIN |
050 00 - LIBRARY OF CONGRESS CALL NUMBER | |
Classification number | QA76.9.D343 |
072 #7 - SUBJECT CATEGORY CODE | |
Subject category code | COM |
Subject category code subdivision | 000000 |
Source | bisacsh |
082 00 - DEWEY DECIMAL CLASSIFICATION NUMBER | |
Classification number | 006.3/12 |
Edition number | 23 |
084 ## - OTHER CLASSIFICATION NUMBER | |
Classification number | COM021030 |
Number source | bisacsh |
100 1# - MAIN ENTRY--PERSONAL NAME | |
Personal name | Munzert, Simon. |
245 10 - TITLE STATEMENT | |
Title | Automated data collection with R : a practical guide to Web scraping and text mining / |
Statement of responsibility, etc. | Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis. |
Medium | [electronic resource] |
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE | |
Place of production, publication, distribution, manufacture | Chichester, West Sussex, United Kingdom ; |
-- | : |
Name of producer, publisher, distributor, manufacturer | Wiley, |
Date of production, publication, distribution, manufacture, or copyright notice | 2014. |
300 ## - PHYSICAL DESCRIPTION | |
Extent | 1 online resource. |
336 ## - CONTENT TYPE | |
Content type term | text |
Source | rdacontent |
337 ## - MEDIA TYPE | |
Media type term | computer |
Source | rdamedia |
338 ## - CARRIER TYPE | |
Carrier type term | online resource |
Source | rdacarrier |
504 ## - BIBLIOGRAPHY, ETC. NOTE | |
Bibliography, etc | Includes bibliographical references and index. |
505 8# - FORMATTED CONTENTS NOTE | |
Formatted contents note | Machine generated contents note: Dedication Table of Contents List of Figures List of Tables Preface 1 Introduction 1.1 Case Study: World Heritage Sites in Danger 1.2 Some Remarks on Web Data Quality 1.3 Technologies for Disseminating, Extracting and Storing Web Data 1.3.1 Technologies for disseminating content on the Web 1.4 Structure of the Book Part One A Primer on Web and Data Technologies 2 HTML 2.1 Browser Presentation and Source Code 2.2 Syntax Rules 2.3 Tags and Attributes 2.4 Parsing Summary Further Reading Problems 3 XML and JSON 3.1 A Short Example XML Document 3.2 XML Syntax Rules 3.3 When Is an XML Document Well-formed or Valid? 3.4 XML Extensions and Technologies 3.5 XML and R in Practice 3.6 A Short Example JSON Document 3.7 JSON Syntax Rules 3.8 JSON and R in Practice Summary Further Reading Problems 4 XPath 4.1 XPath -- a Querying Language for Web Documents 4.2 Identifying Node Sets with XPath 4.3 Extracting Node Elements Summary Further Reading Problems 5 HTTP 5.1 HTTP Fundamentals 5.2 Advanced Features of HTTP 5.3 Protocols beyond HTTP 5.4 HTTP in Action Summary Further Reading Problems 6 AJAX 6.1 JavaScript 6.2 XHR 6.3 Exploring AJAX with Web Developer Tools Summary Further Reading Problems 7 SQL and Relational Databases 7.1 Overview and Terminology 7.2 Relational Databases 7.3 SQL: a Language to Communicate with Databases 7.4 Databases in Action Summary Further Reading Problems 8 Regular Expressions and String Functions 8.1 Regular Expressions 8.2 String Processing 8.3 A Word on Character Encodings Summary Further Reading Problems Part Two A Practical Toolbox for Web Scraping and Text Mining 9 Scraping the Web 9.1 Retrieval Scenarios 9.2 Extraction Strategies 9.3 Web Scraping: Good Practice 9.4 Valuable Sources of Inspiration Summary Further Reading Problems 10 Statistical Text Processing 10.1 The running example: classifying press releases of the British government 10.2 Processing Textual Data 10.3 Supervised Learning Techniques 10.4 Unsupervised Learning Techniques Summary Further reading 11 Managing Data Projects 11.1 Interacting with the File System 11.2 Processing Multiple Documents/Links 11.3 Organizing Scraping Procedures 11.4 Executing R Scripts on a Regular Basis Part Three A Bag of Case Studies 12 Collaboration Networks in the U.S. Senate 12.1 Information on the Bills 12.2 Information on the Senators 12.3 Analyzing the network structure 12.4 Conclusion 13 Parsing Information from Semi-Structured Documents 13.1 Downloading Data from the FTP Server 13.2 Parsing Semi-Structured Text Data 13.3 Visualizing station and temperature data 14 Predicting the 2014 Academy Awards using Twitter 14.1 Twitter APIs: Overview 14.2 Twitter-based Forecast of the 2014 Academy Awards 14.3 Conclusion 15 Mapping the Geographic Distribution of Names 15.1 Developing a Data Collection Strategy 15.2 Web Site Inspection 15.3 Data Retrieval and Information Extraction 15.4 Mapping Names 15.5 Automating the Process 15.6 Summary 16 Gathering Data on Mobile Phones 16.1 Page Exploration 16.2 Scraping Procedure 16.3 Graphical Analysis 16.4 Data storage 17 Analyzing Sentiments of Product Reviews 17.1 Introduction 17.2 Collecting the data 17.3 Analyzing the Data 17.4 Conclusion References Bibliography Indices General Index Package Index Function Index. |
520 ## - SUMMARY, ETC. | |
Summary, etc. | "This book provides a unified framework of web scraping and information extraction from text data with R for the social sciences"-- |
Assigning source | Provided by publisher. |
588 ## - SOURCE OF DESCRIPTION NOTE | |
Source of description note | Description based on print version record and CIP data provided by publisher. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Data mining. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Automatic data collection systems. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Social sciences |
General subdivision | Research |
-- | Data processing. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | R (Computer program language) |
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | COMPUTERS / Database Management / Data Mining. |
Source of heading or term | bisacsh |
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Automatic data collection systems. |
Source of heading or term | fast |
Authority record control number | (OCoLC)fst00822733 |
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Data mining. |
Source of heading or term | fast |
Authority record control number | (OCoLC)fst00887946 |
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | R (Computer program language) |
Source of heading or term | fast |
Authority record control number | (OCoLC)fst01086207 |
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Social sciences |
General subdivision | Research |
-- | Data processing. |
Source of heading or term | fast |
Authority record control number | (OCoLC)fst01122948 |
655 #4 - INDEX TERM--GENRE/FORM | |
Genre/form data or focus term | Electronic books. |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY | |
Relationship information | Print version: |
Main entry heading | Munzert, Simon. |
Title | Automated data collection with R |
Place, publisher, and date of publication | Hoboken ; Chichester, West Sussex, United Kingdom : John Wiley & Sons Inc., 2014 |
International Standard Book Number | 9781118834817 |
Record control number | (DLC) 2014032266 |
856 40 - ELECTRONIC LOCATION AND ACCESS | |
Uniform Resource Identifier | http://onlinelibrary.wiley.com/book/10.1002/9781118834732 |
Public note | Wiley Online Library |
942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
Source of classification or shelving scheme | |
Koha item type | Books |