1001 Freelance Projects -- Data acquisition

Latest Projects from
Freelance Marketplaces

View this project in detail (Note: you will be redirected to external marketplace)
Project title:
Data acquisition
Posted by:
External project from PeoplePerHour
Started:
10-Jan-2025 12:49 GMT
Description:
We are a software company. For one of our projects we need to download information from a website containing articles about medical topics. The website contains approximately 10,000 HTML pages of paged article listings in Czech. Each listing contains article titles, and each title links to a detail HTML page with the article text.

We need someone to write wget and other scripts that download the titles of all articles, parse the links from those titles, download the detail pages of the articles, and distill the text shown on each page. The listings and the detail pages mostly share the same structure, which allows for automated work. This does not hold in 100% of cases, however: there may be several types of structure, so extracting the correct information may require some attention.

The result of this work will be a set of static HTML files (see https://fomenot.com/z/dwld24/main.html for the structure). That is, the result will contain the contents of each article separated into paragraphs of normal text and captions (nothing else, no images or other text). We only want the main text of the article that is visible to the user on screen, with no other text or HTML content. A second deliverable will be the raw HTML output for each of the detail pages.

To accept the output, we will check the result ourselves. If we find errors, we will give examples of them and will expect the vendor to fix all such errors in the result, not just the examples. If there are only a few errors, we may not find them, and that is acceptable, but any errors we do find must be corrected. We expect the raw HTML files to be 100% error-free (for these we will not give examples; we will simply require that they be fixed). For the text-based results we will give examples before requiring fixes.
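
As an illustration of the distilling step described above, here is a minimal sketch in Python using only the standard library. It is not part of the original posting. It assumes that body text sits in `<p>` elements and captions in `<figcaption>` elements, which is a guess, since the real pages have not been shared. The names `ArticleTextExtractor` and `distill` are hypothetical. A real script would also need to restrict itself to the article container and handle the several page structures the description warns about.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect the text of paragraph and caption elements, ignoring everything else."""

    # Assumption: these tags carry the visible article text on the target site.
    KEEP = {"p": "paragraph", "figcaption": "caption"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # list of (kind, text) in document order
        self._kind = None      # kind of the block currently being read
        self._tag = None       # tag that opened the current block
        self._buf = []         # text fragments of the current block
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.KEEP and self._kind is None:
            self._kind = self.KEEP[tag]
            self._tag = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif self._kind is not None and tag == self._tag:
            # Collapse runs of whitespace so each block is one clean line.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append((self._kind, text))
            self._kind = None

    def handle_data(self, data):
        if self._kind is not None and not self._skip_depth:
            self._buf.append(data)


def distill(html):
    """Return [(kind, text), ...] where kind is 'paragraph' or 'caption'."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `distill(raw_html)` on a saved detail page would give an ordered list of paragraphs and captions that can then be written into the static HTML output, while the untouched `raw_html` is stored separately as the raw deliverable.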
An example of such a source page can be found here: https://www.idnes.cz/onadnes/zdravi/2 It shows a list of articles, each with a link leading to the detail page, followed by a paging control that loads more articles from the next page. This is NOT the page we need to download, only a similar one; we include it so that you understand the task. Let us know whether you can do it and at what price. We will provide the real links to the selected candidate.
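
The listing step (reading article titles and their links from each paged listing) could be sketched along the following lines, again in Python with the standard library only. This is an illustration, not part of the original posting. It assumes that titles are links inside `<h2>` or `<h3>` headings and that listing pages follow a numbered URL pattern. Both are assumptions, because the real links will only be provided to the selected candidate. `TitleLinkParser`, `extract_article_links`, and `listing_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for links found inside heading tags."""

    def __init__(self, base_url, heading_tags=("h2", "h3")):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.heading_tags = set(heading_tags)  # assumption: titles are headings
        self.links = []
        self._in_heading = 0
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.heading_tags:
            self._in_heading += 1
        elif tag == "a" and self._in_heading and self._href is None:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the listing page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_endtag(self, tag):
        if tag in self.heading_tags and self._in_heading:
            self._in_heading -= 1
        elif tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            self.links.append((title, self._href))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def extract_article_links(listing_html, base_url):
    """Return [(title, url), ...] for one listing page."""
    parser = TitleLinkParser(base_url)
    parser.feed(listing_html)
    parser.close()
    return parser.links


def listing_urls(pattern, first, last):
    """Expand a hypothetical paging pattern such as '.../zdravi/{page}'."""
    return [pattern.format(page=n) for n in range(first, last + 1)]
```

The URLs produced by `listing_urls` could be written to a file and fetched with `wget -i`, and the links returned by `extract_article_links` fed to a second download pass for the detail pages.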
Project ID:
3416433
Project category:

Project budget:


