Fetch Data from the Web

How the web crawler fetches pages and extracts clean content from a site

```mermaid
flowchart TD
    A([URL]) --> B[Open page in browser]
    B --> C[Scroll to trigger\nlazy-loaded content]
    C --> D[Wait for network idle\n+ 2s render buffer]
    D --> E{Page loaded?}
    E -->|No| F{Attempts left?}
    F -->|Yes| G[Wait and retry\n5s · 10s · 15s]
    G --> B
    F -->|No| H([Dead Letter Queue])
    E -->|Yes| I[Extract markdown content]
    I --> J[Strip noise\ntracking pixels · nav · footer]
    J --> K[Save with metadata\ntitle · url · type · timestamp]
    K --> L{More pages\non this site?}
    L -->|Yes| M[Follow next link\nin BFS queue]
    M --> B
    L -->|No| N[Sort files by\nURL structure]
    N --> O([Ready for chunker])
    style A fill:#339af0,color:#fff
    style H fill:#ff6b6b,color:#fff
    style O fill:#51cf66,color:#fff
```
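The retry branch of the flowchart (wait 5 s, 10 s, then 15 s, and send the URL to the Dead Letter Queue once no attempts are left) can be sketched as follows. This is a minimal sketch, not the crawler's actual code: `load_page` is a hypothetical callable standing in for the browser steps (open, scroll, wait) that returns `True` when the page loaded, the dead letter queue is modeled as a plain list, and we assume three retries after the first attempt, one per listed delay.

```python
import time
from typing import Callable, List, Sequence

# Wait times from the diagram; assumed to mean three retries after the first attempt.
RETRY_DELAYS_S: Sequence[float] = (5, 10, 15)


def fetch_with_retry(
    url: str,
    load_page: Callable[[str], bool],  # hypothetical: open/scroll/wait, True if loaded
    dead_letter_queue: List[str],      # stand-in for a real DLQ (assumption)
    delays: Sequence[float] = RETRY_DELAYS_S,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Load `url`, retrying after each delay; park it in the DLQ if every attempt fails."""
    # The first attempt has no wait; each retry first waits for the next delay.
    for wait_s in (0, *delays):
        if wait_s:
            sleep(wait_s)
        if load_page(url):
            return True
    # No attempts left: keep the URL for later inspection instead of dropping it.
    dead_letter_queue.append(url)
    return False
```

Injecting `sleep` lets the retry logic be exercised without real waits. Growing delays give a slow or rate-limited site more time to recover on each attempt; exponential backoff with jitter is a common alternative, but the diagram specifies these fixed steps.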
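The "Strip noise" step removes page furniture such as navigation, footers, and tracking pixels. The flowchart places it after markdown extraction; the sketch below instead filters the raw HTML with Python's standard-library `html.parser`, which is one possible approach and an assumption rather than the crawler's method. The extra noise tags beyond `nav` and `footer`, and the rule that a 0x0 or 1x1 `<img>` is a tracking pixel, are illustrative heuristics.

```python
from html.parser import HTMLParser

# nav and footer come from the diagram; script, style, noscript are assumed extras.
NOISE_TAGS = {"nav", "footer", "script", "style", "noscript"}


def is_tracking_pixel(attrs: dict) -> bool:
    """Heuristic (assumption): an image declared as 0x0 or 1x1 is a tracking pixel."""
    return attrs.get("width") in {"0", "1"} and attrs.get("height") in {"0", "1"}


class NoiseStripper(HTMLParser):
    """Re-emits HTML, dropping noise subtrees and tracking-pixel images."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; as written
        self.parts: list = []
        self.skip_tag = None  # noise tag we are currently inside, if any
        self.skip_depth = 0   # nesting count of that same tag

    def _keep_tag(self, tag: str, attrs) -> bool:
        if self.skip_tag:
            return False
        return not (tag == "img" and is_tracking_pixel(dict(attrs)))

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
        elif tag in NOISE_TAGS:
            self.skip_tag, self.skip_depth = tag, 1
        elif self._keep_tag(tag, attrs):
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> have no end tag to balance.
        if tag not in NOISE_TAGS and self._keep_tag(tag, attrs):
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.skip_tag:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_tag:
            self.parts.append(f"&#{name};")


def strip_noise(html: str) -> str:
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Tracking only the nesting depth of the noise tag that opened the skip (rather than every tag) keeps the filter robust to unclosed inner tags such as a stray `<li>` inside a `<nav>`.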
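The "Save with metadata" step stores each page's markdown alongside its title, URL, type, and timestamp. The sketch below assumes a format the flowchart does not specify: a `---`-delimited header holding the metadata as JSON, followed by the markdown body, written to a file whose name is derived from the URL by a hypothetical scheme. What the `type` field holds is also an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def page_filename(url: str) -> str:
    """Hypothetical naming scheme: host plus URL path, with '/' replaced by '__'."""
    parsed = urlparse(url)
    slug = (parsed.netloc + parsed.path).strip("/").replace("/", "__")
    return f"{slug or 'index'}.md"


def render_page(url: str, title: str, page_type: str, markdown: str,
                fetched_at: Optional[datetime] = None) -> str:
    """Prefix the markdown with a '---'-delimited header holding the metadata as JSON."""
    meta = {
        "title": title,
        "url": url,
        "type": page_type,  # assumed meaning, e.g. "docs" or "blog"
        "timestamp": (fetched_at or datetime.now(timezone.utc)).isoformat(),
    }
    header = json.dumps(meta, ensure_ascii=False, indent=2)
    return f"---\n{header}\n---\n\n{markdown}"


def save_page(out_dir: Path, url: str, title: str, page_type: str, markdown: str) -> Path:
    """Write one page to `out_dir` and return the path of the new file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / page_filename(url)
    path.write_text(render_page(url, title, page_type, markdown), encoding="utf-8")
    return path
```

Storing the timestamp in UTC ISO 8601 form keeps files from different runs comparable. Note that `page_filename` ignores query strings, so URLs differing only in their query would collide; a real implementation would need to handle that.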
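The loop from "More pages on this site?" through "Follow next link in BFS queue" back to the browser is a breadth-first traversal restricted to one host. A minimal sketch, assuming a hypothetical `get_links(url)` callable that returns the raw `href` values found on a page (in the real crawler these would come from the loaded page) and a `max_pages` cap that the diagram does not mention:

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def bfs_crawl(start_url: str,
              get_links: Callable[[str], Iterable[str]],  # hypothetical link source
              max_pages: int = 100) -> List[str]:
    """Visit pages breadth-first, staying on the start URL's host."""
    site = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    queue = deque([start])
    seen = {start}          # everything ever queued, so no page is queued twice
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()  # FIFO order is what makes the traversal breadth-first
        visited.append(url)    # in the full pipeline: load, extract, strip, save here
        for href in get_links(url):
            # Resolve relative links and drop #fragments so each page has one key.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Marking links as seen when they are queued (not when they are visited) prevents the same page from entering the queue many times when several pages link to it.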
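The final "Sort files by URL structure" step can be approximated by ordering saved pages by host and then by path segments, so that each parent page precedes its children and sibling sections stay grouped. This sketch assumes each saved-page record is a dict with a `"url"` key; the record shape is hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def url_sort_key(url: str) -> Tuple[str, Tuple[str, ...]]:
    """Order by host, then by path segments, so parents precede their children."""
    parsed = urlparse(url)
    segments = tuple(seg for seg in parsed.path.split("/") if seg)
    return (parsed.netloc.lower(), segments)


def sort_pages(pages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort saved-page records (assumed to carry a "url" key) by URL structure."""
    return sorted(pages, key=lambda page: url_sort_key(page["url"]))
```

Comparing tuples of segments rather than raw strings keeps `/docs` next to `/docs/intro` even when a sibling such as `/docs-old` exists: as raw strings, `-` sorts before `/`, so `/docs-old` would land between `/docs` and `/docs/intro`.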