Extract file hoster links from a web page

The script below extracts all file hoster links from a saved web page and writes them to a file named after the input with .out appended. Here is a breakdown of what it does:

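  1. Insert a line break before every occurrence of http, so that each URL starts on its own line.
  2. Keep only the lines that contain http.
  3. Remove the first double quote and everything that follows it.
  4. Remove the first space and everything that follows it.
  5. Remove the first < symbol and everything that follows it.
  6. Remove &quot; and everything that follows it.
  7. Remove ]] and everything that follows it.
  8. Remove &lt; and everything that follows it.
  9. Keep only the lines that mention one of the wanted file hosters.
  10. Collapse adjacent duplicate lines and write the result to the output file.
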
#!/bin/bash
# Usage: pass the saved web page as the first argument; links go to <file>.out
# Note: the \n replacement and the I flag require GNU sed.
sed -e 's/http/\nhttp/gI' "$1" \
 | grep -i http \
 | sed -e 's/".*//' \
 | sed -e 's/ .*//' \
 | sed -e 's/<.*//' \
 | sed -e 's/&quot;.*//' \
 | sed -e 's/]].*//' \
 | sed -e 's/&lt;.*//' \
 | grep -Ei "depositfiles|rapidshare|megaupload|qshare|uploadbox|letitbit|storage\.to|shareflare|multiupload|mediafire|sendspace|kewlshare|uploading\.com" \
 | uniq > "$1.out"
 
# Unwanted file hosters:
# filesonic|fileserve|hotfile|ul.to|uploaded|easy-share|oron.com|sharingmatrix|
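
As a variation, the same steps can be wrapped in a function that reads from standard input, which makes it easier to feed from another command. This is only a sketch: the names extract_links and HOSTERS are made up for illustration, the sed steps are merged into one call, and uniq is swapped for sort -u so that non-adjacent duplicates are dropped too (at the cost of the original link order). GNU sed is assumed, as in the script above.

```shell
#!/bin/bash
# Sketch only: extract_links and HOSTERS are illustrative names,
# not part of the original script.
HOSTERS='depositfiles|rapidshare|megaupload|qshare|uploadbox|letitbit|storage\.to|shareflare|multiupload|mediafire|sendspace|kewlshare|uploading\.com'

extract_links() {
  # 1. put every "http" at the start of its own line (GNU sed: \n and I flag)
  # 2. keep lines containing http
  # 3. cut each line at the first ", space, <, &quot;, ]] or &lt;
  # 4. keep wanted hosters, then sort and drop all duplicates
  sed -e 's/http/\nhttp/gI' \
    | grep -i http \
    | sed -e 's/".*//' -e 's/ .*//' -e 's/<.*//' \
          -e 's/&quot;.*//' -e 's/]].*//' -e 's/&lt;.*//' \
    | grep -Ei "$HOSTERS" \
    | sort -u
}
```

With the function loaded, a saved page could be processed with something like: extract_links < page.html > page.html.out (page.html is a placeholder file name).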


Converted to run in DOS (the Windows command prompt), assuming Windows ports of sed, grep, and uniq are on the PATH:

REM DOS version
REM Usage: pass the saved web page as the first argument; links go to <file>.out
sed -e "s/http/\nhttp/gI" "%~1" ^
 | grep -i http ^
 | sed -e "s/\x22.*//" ^
 | sed -e "s/ .*//" ^
 | sed -e "s/\x3C.*//" ^
 | sed -e "s/&quot;.*//" ^
 | sed -e "s/]].*//" ^
 | sed -e "s/&lt;.*//" ^
 | grep -Ei "depositfiles|rapidshare|megaupload|qshare|uploadbox|letitbit|storage\.to|shareflare|multiupload|mediafire|sendspace|kewlshare|uploading\.com" ^
 | uniq > "%~1.out"

REM filesonic|fileserve|hotfile|ul.to|uploaded|easy-share|oron.com|sharingmatrix|