I'm trying to extract the torrent name from torrent files.
Without looking too deeply into how torrent files are structured, I noticed that I only need to match the last occurrence of all characters between two strings, which in my case are ":" and "12:piece lengthi".
Here is the beginning of an Arch Linux ISO torrent file:
d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi
I need to extract archlinux-2015.07.01-dual.iso, which sits between the last ":" and "12:piece lengthi". I checked this pattern against other torrent files, and in my case it works. I can't figure out how to combine the regexes
(?<=:)(.*)(?=12:piece lengthi) and
:(?:.(?!:))+$ if they are even correct at all.
I'm trying to write a bash script using sed, or anything else that can be run as a Linux command.
Final perfectly working solution (thoroughly tested):
This works with non-ASCII characters in the name, for example Cyrillic.
torrent_title=$(tr -d "\n" < "$filename" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')
Update: All the suggestions work, but torrent files are binary files. For example, I tried grep --text and strings file piped to grep or sed, but random strings from the binary part of the file mess up the output.
Update 2 (solved): the final command is this:
head -1 file.torrent | strings | tr -d "\n\r" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/'
I figured out that the info is only in the first line of the file.
In my original example I forgot to copy a few more strings at the end of that line:
d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:
which are part of the first line, so I needed to slightly change hek2mgl's sed command by appending .* to the pattern, so that the rest of the line is discarded as well.
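To see why the trailing .* matters, here is a small sketch that runs both sed expressions on the longer first line quoted above. The variable names (line, without_tail, with_tail) are made up for this illustration, and the sample is plain text rather than a real torrent file.

```shell
# First line of the torrent as quoted above, including the part after "12:piece lengthi".
line='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:'

# Pattern without a trailing ".*": the text after "12:piece lengthi" is left in place.
without_tail=$(printf '%s\n' "$line" | sed 's/.*:\(.*\)12:piece lengthi/\1/')

# Pattern with a trailing ".*": the rest of the line is matched and discarded.
with_tail=$(printf '%s\n' "$line" | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')

printf 'without trailing .*: %s\n' "$without_tail"   # archlinux-2015.07.01-dual.iso524288e6:pieces25840:
printf 'with trailing .*:    %s\n' "$with_tail"      # archlinux-2015.07.01-dual.iso
```

In both cases the greedy .*: backtracks to the last colon that is still followed by "12:piece lengthi", which is the colon right before the name.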
Update 3: The right way to do this is to use a parser; I learned that the hard way.
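As a rough illustration of what a parser can rely on: torrent files are bencoded, and a bencoded string is stored as <length>:<bytes>, so the name can be read by its declared length instead of guessing where it ends. The sketch below is not a full parser. It assumes that the first "4:name" in the data is the name key, that the text up to the name fits in a shell variable (no NUL bytes), and it uses made-up variable names.

```shell
# Sample header text from above; a real script would read this from the file.
header='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:'

# Everything after the first "4:name" key (assumption: this is the name key we want).
rest=${header#*4:name}

# The digits before the next colon are the byte length of the value.
len=${rest%%:*}

# The value starts right after that colon.
value=${rest#*:}

# Take exactly $len bytes; cut -b counts bytes, matching the bencode length.
name=$(printf '%s\n' "$value" | cut -b "1-$len")

printf '%s\n' "$name"   # archlinux-2015.07.01-dual.iso
```

Because the length prefix says the name is 29 bytes long, this does not depend on what follows the name, unlike the "12:piece lengthi" marker.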
I would use sed for that, like this:
sed 's/.*:\(.*\)12:piece lengthi/\1/' input.torrent
Try this with GNU grep:
grep -oP ':\K[^:]*(?=12:piece lengthi$)' file
sed -e 's/12:piece lengthi//' -e 's/.*://'
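A quick way to check this two-step variant is to feed it the sample header from the question on stdin. This sketch assumes the line ends right after "12:piece lengthi", as in the first sample; the variable names are only illustrative.

```shell
# Sample header from the question, ending at "12:piece lengthi".
header='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi'

# Step 1 removes the "12:piece lengthi" marker,
# step 2 removes everything up to and including the last remaining colon.
name=$(printf '%s\n' "$header" | sed -e 's/12:piece lengthi//' -e 's/.*://')

printf '%s\n' "$name"   # archlinux-2015.07.01-dual.iso
```

If the line continues past the marker with more colon-separated fields, the second expression would strip up to one of those later colons instead, so this form fits only input that stops at the marker.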