
linux - Regex match last occurrence of all characters between two strings

Question:

I'm trying to extract the torrent name from torrent files.

Without looking too deeply into how torrent files are structured, I noticed that I only need to match the last occurrence of all characters between two strings, which in my case are : and 12:piece lengthi.

Here is the beginning of the Arch Linux ISO torrent file:

d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi

I need to extract archlinux-2015.07.01-dual.iso, which sits between the last : and 12:piece lengthi. I checked this pattern against other torrent files, and it holds in my case. I can't figure out how to combine the regexes (?<=:)(.*)(?=12:piece lengthi) and :(?:.(?!:))+$, if they are even correct at all.

I'm trying to write a bash script using grep, awk, sed, or any other standard Linux command.

Final perfectly working solution (thoroughly tested):

This works with non-ASCII characters as well, for example Cyrillic.

torrent_title=$(tr -d "\n" < "$filename" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')

Update: All the suggestions work, but torrent files are binary files. For example, I tried grep --text, and also strings file piped to grep or sed, but random strings from the binary data mess up the output.

Update 2 (solved): the final command is this:

head -1 file.torrent | strings | tr -d "\n\r" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/'

I figured that the info is only in the first line of the file.

In the example in my original post I forgot to copy a few more characters at the end:

 d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:

which are part of the first line, so I needed to slightly change hek2mgl's sed answer by adding a trailing .* after 12:piece lengthi.

Update 3: The right way to do this is to use a parser. I learned that the hard way.
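
A full bencode parser is more than a one-liner, but a small step in that direction is to honor the length prefix that bencode puts in front of every string: 4:name29: means that a 29-byte value follows. The sketch below reads that length instead of guessing where the name ends. The function name torrent_name is hypothetical, and the code assumes GNU grep (for -a, -o and -b) plus head and tail with -c support. It takes the first 4:name key, on the assumption that the name key comes before the binary pieces data because bencode dictionary keys are sorted. It is still not a real parser and could be fooled by a comment that happens to contain the same text.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the value of the
# first "4:name<len>:" key in a torrent file, using the bencode length prefix.
# Assumes GNU grep and head/tail that accept -c.
torrent_name() {
    # -a: treat binary input as text, -o: print only the match,
    # -b: prefix each match with its 0-based byte offset.
    match=$(LC_ALL=C grep -aob '4:name[0-9][0-9]*:' "$1" | head -n 1)
    [ -n "$match" ] || return 1

    offset=${match%%:*}   # byte offset where "4:name" starts
    key=${match#*:}       # the matched key, e.g. "4:name29:"
    len=${key#4:name}     # strip the "4:name" prefix ...
    len=${len%:}          # ... and the trailing colon, leaving the length

    # tail -c +N is 1-based, so the value starts at offset + key length + 1.
    start=$((offset + ${#key} + 1))
    tail -c "+$start" "$1" | head -c "$len"
}

# Usage example with the header text from the question written to a temp file.
tmp=$(mktemp)
printf '%s' 'd8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:' > "$tmp"
name=$(torrent_name "$tmp")
rm -f "$tmp"
printf '%s\n' "$name"
```

Because the value is cut by byte count, colons or the text 12:piece lengthi inside the name itself do not confuse it, which is the case the regex approaches cannot handle.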

Answer:

I would use sed for that, like this:

sed 's/.*:\(.*\)12:piece lengthi/\1/' input.torrent
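
A quick way to check this is to run it against the sample from the question. The leading .*: is greedy, so it consumes everything up to the last colon that is still followed by 12:piece lengthi, which leaves only the name in the capture group. The sample variable below is just the header text copied from the question, not a real torrent file.

```shell
#!/bin/sh
# Header text copied from the question (ends right after "12:piece lengthi").
sample='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi'

# Greedy ".*:" backtracks to the last colon before the name; \1 keeps the name.
name=$(printf '%s\n' "$sample" | sed 's/.*:\(.*\)12:piece lengthi/\1/')
printf '%s\n' "$name"   # prints archlinux-2015.07.01-dual.iso
```

If the input continues past 12:piece lengthi, as a real file does, the rest of the line is left in place after the name unless a trailing .* is added to the pattern.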

Answer:

Try this with GNU grep:

 grep -oP ':\K[^:]*(?=12:piece lengthi$)' file

Output:

archlinux-2015.07.01-dual.iso

Answer:

Try this:

 sed -e 's/12:piece lengthi//' -e 's/.*://'