1001 Freelance Projects
Latest Projects from
Freelance Marketplaces
View Project
View this project in detail
(Note: you will be redirected to external marketplace)
Project title:
Data acquisition
Posted by:
External project from PeoplePerHour
Started:
10-Jan-2025 12:49 GMT
Description:
We are a software company. For one of our projects we need to download
information from a website containing articles on medical topics.
The website contains approximately 10,000 HTML pages of paginated article
listings in Czech. Each listing shows article titles, and each title
links to a detail HTML page with the article text.
We need someone to write wget and other scripts that download the titles of
all articles, parse the links from those titles, download the detail pages
of the articles, and extract the text shown on each page.
The listing pages and the detail pages mostly share the same structure, which
allows the work to be automated. This does not hold in 100% of cases, however:
there may be several structure types, so extracting the correct information
may require some attention.
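
The link-parsing step described above could be sketched as follows. This is a minimal illustration, not the required implementation: it assumes that article titles sit in <h2> or <h3> elements wrapping an <a>, which is a guess, since the real site's markup is not given here. The names TitleLinkParser and extract_article_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for links found inside headings.

    Assumption: article titles are <a> elements inside <h2>/<h3>.
    The real site may differ, so HEADING_TAGS is meant to be adjusted.
    """

    HEADING_TAGS = {"h2", "h3"}

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links = []          # finished (title, url) pairs
        self._heading_depth = 0  # > 0 while inside a heading element
        self._href = None        # href of the <a> currently open, if any
        self._text = []          # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._heading_depth += 1
        elif tag == "a" and self._heading_depth:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the listing page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._heading_depth:
            self._heading_depth -= 1
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the title text.
            title = " ".join("".join(self._text).split())
            self.links.append((title, self._href))
            self._href = None


def extract_article_links(listing_html, base_url):
    """Return the list of (title, url) pairs found on one listing page."""
    parser = TitleLinkParser(base_url)
    parser.feed(listing_html)
    parser.close()
    return parser.links
```

Since several structure types may occur, a real script would probably need one such rule set per structure type and a report of listing pages on which no rule matched.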
The result of this work will be a set of static HTML files. You can view the
expected structure at
https://fomenot.com/z/dwld24/main.html
That is, the result will contain the content of each article separated into
paragraphs of normal text and captions (nothing else, no images or other
text). We want only the main article text that is visible to the user on
screen, and no other text or HTML content.
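
One possible way to separate an article into paragraphs and captions is sketched below. It rests on assumptions that are not confirmed by this posting: that body text lives in <p> elements and that captions are <h1> to <h3> or <figcaption> elements. The names ArticleTextParser and extract_blocks are hypothetical.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collect visible text blocks as (kind, text) pairs.

    Assumptions: paragraphs are <p>; captions are <h1>-<h3> or
    <figcaption>. Text outside these elements is ignored.
    """

    BLOCK_KINDS = {
        "p": "paragraph",
        "h1": "caption",
        "h2": "caption",
        "h3": "caption",
        "figcaption": "caption",
    }
    HIDDEN_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._open_tag = None  # block element currently being collected
        self._parts = []       # text fragments of that block
        self._hidden = 0       # > 0 while inside script/style/noscript

    def flush(self):
        """Store the block collected so far, if it has any text."""
        if self._open_tag is not None:
            text = " ".join("".join(self._parts).split())
            if text:
                self.blocks.append((self.BLOCK_KINDS[self._open_tag], text))
        self._open_tag = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden += 1
        elif tag in self.BLOCK_KINDS:
            # A new block also ends an unclosed previous one (e.g. <p><p>).
            self.flush()
            self._open_tag = tag

    def handle_data(self, data):
        if self._open_tag is not None and not self._hidden:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden:
            self._hidden -= 1
        elif tag == self._open_tag:
            self.flush()


def extract_blocks(article_html):
    """Return the article's text as a list of (kind, text) pairs."""
    parser = ArticleTextParser()
    parser.feed(article_html)
    parser.close()
    parser.flush()
    return parser.blocks
```

This sketch does not restrict extraction to the main article container, so a <p> in a footer or sidebar would also be captured. A real script would first have to locate the article body for each structure type.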
A second deliverable will be the raw HTML output for each of the detail pages.
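
Keeping the raw HTML intact could look like the sketch below: the response body is written as bytes, without decoding, so the stored file matches what the server sent. The file-naming scheme (a truncated SHA-256 of the URL) and the names raw_html_path and save_raw_html are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def raw_html_path(url, out_dir):
    """Map a detail-page URL to a stable file name (hypothetical scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(out_dir) / f"{digest}.html"


def save_raw_html(url, body, out_dir):
    """Write the response body unchanged, as bytes, and return its path."""
    path = raw_html_path(url, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path
```

Writing bytes rather than text avoids re-encoding the page, which matters if the site does not serve UTF-8.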
To accept the output, we will run our own check of the result. If we find errors,
we will give examples of them and will expect the vendor to fix
all such errors in the result, not just the examples. If there are only a few
errors, we may not find them, and that is acceptable. But any errors we do find
must be corrected.
We expect the raw HTML files to be 100% error-free (for these we will not
give examples; we will simply require that they be fixed). For the text-based
results we will give examples before requiring fixes.

An example of such a source page can be found here:
https://www.idnes.cz/onadnes/zdravi/2
It shows a list of articles, each with a link to the detail page,
followed by a paging control that loads more articles from the next page.
This is NOT the page we need downloaded, only a similar one. We include the example
solely to illustrate the task.
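
Walking the paged listing might be sketched as below, assuming that the page number is the last path segment of the URL, as in the example link above. The real site may page differently. The function walk_listing and its fetch and parse_links parameters are hypothetical; fetch stands for whatever download step is used (wget, for instance) and parse_links for the link-extraction step.

```python
def walk_listing(base_url, fetch, parse_links, max_pages):
    """Yield article links from listing pages base_url/1, base_url/2, ...

    fetch(url) returns the page HTML; parse_links(html) returns its links.
    Stops at max_pages or at the first page that yields no new links.
    """
    base = base_url.rstrip("/")
    seen = set()
    for page in range(1, max_pages + 1):
        links = parse_links(fetch(f"{base}/{page}"))
        new_links = [link for link in links if link not in seen]
        if not new_links:
            break  # empty or fully repeated page: assume the listing ended
        seen.update(new_links)
        yield from new_links
```

Deduplicating with the seen set guards against articles that shift between pages while the download is running.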

Let us know whether you can do it and at what price. We will provide the real links
to the selected candidate.
Project ID:
3416433
Project category:
Project budget: