Nixers project: Bittorrent library - Community & Forums Related Discussions
z3bra
Thanks for having thought about the implementation!

I didn't think a lot about it myself, but my first idea was to go with your latest proposal: write directly to the end file, and allocate the full size (which is known) first. This has the advantage of failing quickly in case you don't have enough space.

First of all, let's use the correct terms. A "torrent" is the final file. Each torrent is split into "pieces", each having a hash, and each piece is sent as multiple "blocks" (see the RFC for details).
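To make these terms concrete, here is a minimal sketch of the arithmetic behind pieces and blocks. The function names are made up for illustration, and the 16 KiB block size is the value clients conventionally request, not something the library has settled on:

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_LEN 16384 /* assumed: the usual 16 KiB request size */

/* number of pieces needed to cover `total` bytes */
size_t
piece_count(size_t total, size_t piece_len)
{
    assert(piece_len > 0);
    return (total + piece_len - 1) / piece_len;
}

/* length of piece `index`; only the last piece may be shorter */
size_t
piece_size(size_t total, size_t piece_len, size_t index)
{
    size_t n = piece_count(total, piece_len);

    assert(index < n);
    if (index == n - 1)
        return total - index * piece_len;
    return piece_len;
}

/* number of blocks a piece of `len` bytes is sent as */
size_t
block_count(size_t len)
{
    return (len + BLOCK_LEN - 1) / BLOCK_LEN;
}
```

So a 1000-byte torrent with 256-byte pieces has four pieces, and the last one is only 232 bytes long.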

Your idea is to put each piece into its own file, and then concatenate them when you have all the parts. The main advantage I see here is that you can easily know which parts you've already downloaded, allowing easy stops and starts without a "cache file".
The main disadvantage is that you will have a lot of I/O upon concatenation, and that would be a single-threaded process.
This would also use a shitton of inodes, but that's not the biggest issue.

The "sort as they come" idea seems slow, bloated and complex to implement for no practical reason, so I'll just ignore it.

By writing directly to the file, you save some time, as when you receive the last piece, your end file will be ready. This is also the simplest approach for me, as you will know the size of the pieces, and their position, so writing to the file is just an fseek() away. The problem is that this is not atomic, and requires a "cache file" to remember which pieces are retrieved (although you should be able to reconstruct it by checking which pieces of the end file are zeroed).
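A rough sketch of that "just an fseek() away" write, using only stdio. The name piece_write() is hypothetical, the piece is assumed to have been hash-checked already, and preallocating the file is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write a verified piece at its final position in the end file.
 * Returns 0 on success, -1 on error.
 * fseek() takes a long, so a real implementation would want
 * fseeko() or pwrite() for big files on 32-bit systems.
 */
int
piece_write(FILE *f, size_t index, size_t piece_len,
    const unsigned char *buf, size_t len)
{
    assert(len <= piece_len);
    if (fseek(f, (long)(index * piece_len), SEEK_SET) != 0)
        return -1;
    if (fwrite(buf, 1, len, f) != len)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

Pieces can arrive in any order: each one lands at index * piece_len, whatever was written before it.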

For now, I find the latter simpler, as it eliminates intermediary parts from the process. It comes down to: receive a piece, check its hash, write it to the file, and that's it. Your idea adds another step, which is "put all the parts together", and another step means more code, and thus more attack surface for bugs :)


I think there is no urgency to settle this now.
For now, the structs for pieces/blocks are not yet finished. We also cannot talk to trackers, or other clients. The advantage of this RFC is that (I think) all parts are explained in a logical order. For example, you cannot understand how a metainfo file is written if you don't know what bencoding is.
You're talking here about the way we write received pieces to the torrent, which comes, I think, at 6.3.10 in the RFC ("Piece" message).
The library is currently stuck at point 3: "pieces and blocks". We have many things to do before deciding how to write them!

Implementing the full protocol is a big task, and we need to stay focused on what needs to be done if we want to get it working quickly.

I chose to implement all the points in the RFC one by one, in the order they appear. It might not be the best way to handle this project, but at least I can easily know where I'm at, and what needs to be done.
If you think of a better way, we can still discuss it though. I want this project to be the community's, not just mine.
josuah
(06-08-2017, 06:40 PM)z3bra Wrote: The main disadvantage is that you will have a lot of I/O upon concatenation, and that would be a single threaded process.
This would also use a shitton of inodes, but that's not the biggest issue.

Thanks, I did not think about this.

(06-08-2017, 06:40 PM)z3bra Wrote: Writing directly to the file [...] this is not atomic, and requires a "cache file"

I mostly thought about concurrent seeding and leeching. I asked "is it possible to read and write a file at the same time" ? If not, one big torrent file open for writing one piece could not be used for reading another piece. But z3bra said on irc: "you can do that by mmap'ing the file".

So there is no real advantage in saving the parts to different files: one big buffer file can provide everything needed.
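As a sketch of the mmap approach (POSIX only): torrent_open() and piece_ptr() are hypothetical names, and ftruncate() only creates a sparse file, so posix_fallocate() would be needed to really fail early on a full disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or open) the end file, give it its full size and map it.
 * MAP_SHARED sends writes back to the file, and the same mapping can
 * be read (seeding) while other pieces are written (leeching).
 * Returns NULL on failure.
 */
unsigned char *
torrent_open(const char *path, size_t len)
{
    void *p;
    int fd;

    if ((fd = open(path, O_RDWR | O_CREAT, 0644)) < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}

/* address of piece `index` inside the mapping */
unsigned char *
piece_ptr(unsigned char *data, size_t index, size_t piece_len)
{
    assert(data != NULL);
    return data + index * piece_len;
}
```

Writing a piece is then a memcpy() to piece_ptr(), and serving one to a peer is a read from the same address.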

The cache file will contain a list of completed pieces, and this will bring atomicity back: a piece that is not fully written will not be in the cache file, and will be downloaded again if something went wrong.
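One possible shape for that cache file, as a sketch: a text file with one completed piece index per line. The format and the names cache_mark() and cache_load() are assumptions, nothing has been decided. Note that fflush() only hands the data to the kernel; surviving a power cut would also need fsync().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Record piece `index` as complete. Call this only after the piece
 * itself has been written, so an interrupted write is never recorded.
 * Returns 0 on success, -1 on error.
 */
int
cache_mark(FILE *cache, unsigned long index)
{
    if (fprintf(cache, "%lu\n", index) < 0)
        return -1;
    return fflush(cache) == 0 ? 0 : -1;
}

/*
 * Read the cache back on startup: done[i] is set to 1 for each piece
 * listed. The caller zeroes `done` first; out-of-range entries are
 * ignored. Returns the number of distinct completed pieces.
 */
size_t
cache_load(FILE *cache, unsigned char *done, size_t npieces)
{
    unsigned long index;
    size_t n = 0;

    assert(done != NULL);
    rewind(cache);
    while (fscanf(cache, "%lu", &index) == 1) {
        if (index < npieces && !done[index]) {
            done[index] = 1;
            n++;
        }
    }
    return n;
}
```

Any piece missing from the list is simply requested again.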

(06-08-2017, 06:40 PM)z3bra Wrote: I think there is no urgency to settle this now.

Let's start at the beginning. I can get started with UTF-8, as discussed with z3bra:
http://bittorrent.org/beps/bep_0003.html tells us that:

Quote: All strings in a .torrent file that contains text must be UTF-8 encoded.

And the clients using the library may have a broken locale, or simply use a different encoding. The wchar.h standard library header (added to C90 by the 1995 amendment) can convert from the encoding of the current locale, whatever it is, to an encoding-independent number (the character's position in Unicode, on platforms that define __STDC_ISO_10646__).
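Once we have such a number, turning it into UTF-8 is mechanical. A sketch of that second half, with a hypothetical utf8_encode() helper that assumes its input really is a Unicode code point:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into `out` (at least 4 bytes).
 * Returns the number of bytes written, or 0 if `cp` is not a valid
 * code point (a surrogate, or above U+10FFFF).
 */
size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```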

(06-08-2017, 06:40 PM)z3bra Wrote: I chose to implement all the points in the RFC one by one, in the order they appear. It might not be the best way to handle this project, but at least I can easily know where I'm at, and what needs to be done.

We will need to notify each other of our progress and self-assignments.
z3bra
Just a quick bump to thank xkco for his participation. It's not much, but it shows the interest you have in the project, and the will to participate. Be it a welcoming start for anyone willing to participate in this project!
josuah
xkco: you rock!



Venam started to make structs to hold the metainfo. That is a good idea I think, as it will help us work on multiple parts of the code independently and merge everything easily.

Here is the proposition: if we need to have an array of complex items, use a struct. Otherwise just use variables.

We can have multiple torrents per process, and we do not know in advance how many.
I think we need a 'torrent' struct (torrent[i]->data).

Some torrents can have multiple files, struct as well (files[i]->path).

Code:
struct file {
    size_t       length;
    char        *path;
};

Code:
struct torrent {
    char       **announce_list;     /* NULL-terminated array of urls */
    char        *name;
    size_t       length;
    size_t       pieces_length;
    size_t       pieces_count;      /* for convenience           */
    char        *pieces_sha1;       /* pieces_count * 20 bytes   */
    char         torrent_sha1[20];  /* to compute ourselves      */
    struct file **files;            /* is NULL for single-file   */
    char        *data;              /* pointer to mmap-ed memory */
    char        *downloaded;        /* bitmask for the parts     */
};

If there is only 'announce' in the metainfo file, 'announce_list' would only
hold the first (preferred) tracker url:

Code:
torrent->announce_list = (char *[]){ "http://tracker-1.tld", NULL };
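With a NULL terminator there is no need to store a count next to the list. A tiny sketch (announce_count() is a made-up helper):

```c
#include <assert.h>
#include <stddef.h>

/* count the tracker urls in a NULL-terminated announce list */
size_t
announce_count(char **list)
{
    size_t n = 0;

    assert(list != NULL);
    while (list[n] != NULL)
        n++;
    return n;
}
```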

I do not see the need to separate metainfo from the actual data.

We could do a 'part' struct as well, to avoid implementing a bitmask, and for convenience.
z3bra
The bitmask will have to be implemented imo, as the RFC says it should be sent to peers. Here is what I had in mind:

Code:
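/* pointers rather than [] members: C allows a single
 * flexible array member, and only as the last field */
struct torrent {
    char **announce;         /* array of tracker urls */
    char **files;            /* array of file paths   */
    uint8_t *bits;           /* one bit per piece     */
    struct piece {
        size_t len;
        uint8_t sha1[20];
        uint8_t *data;
    } *pieces;
};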

This is, IMO, the bare minimum that needs to be implemented. We could then add on top of that things like comments, creation date, ... (all optional fields actually).
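For the bitmask itself, a small sketch with hypothetical bits_*() helpers. The bit order follows the wire format of the "bitfield" message, where the high bit of the first byte stands for piece 0, so the array could be sent to peers as-is:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bytes needed to hold one bit per piece */
size_t
bits_len(size_t npieces)
{
    return (npieces + 7) / 8;
}

/* mark piece `index` as downloaded */
void
bits_set(uint8_t *bits, size_t index)
{
    assert(bits != NULL);
    bits[index / 8] |= (uint8_t)(0x80 >> (index % 8));
}

/* non-zero if piece `index` is marked as downloaded */
int
bits_has(const uint8_t *bits, size_t index)
{
    assert(bits != NULL);
    return (bits[index / 8] & (0x80 >> (index % 8))) != 0;
}
```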
josuah
That looks good to me. IMHO, the less data, the more stateless it will be ;) and the easier it will be to manage its state.

I did not know you could define nested structs. That is pretty neat.

One detail, though: how do we know at which byte to cut the big data blob to extract the files if we do not know their length?

Code:
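/* pointers rather than [] members: C allows a single
 * flexible array member, and only as the last field */
struct torrent {
    char **announce;         /* array of tracker urls */
    uint8_t *bits;           /* one bit per piece     */
    struct file {
        char *name;
        size_t len;
    } *files;
    struct piece {
        size_t len;
        uint8_t sha1[20];
        uint8_t *data;
    } *pieces;
};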
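With the lengths stored, cutting the blob is a matter of walking the file list. A sketch, where file_at() is a hypothetical helper and the files are assumed to be laid out back to back in list order:

```c
#include <assert.h>
#include <stddef.h>

struct file {
    char *name;
    size_t len;
};

/*
 * Find the file holding byte `off` of the concatenated data.
 * Returns the file index and stores the offset inside that file
 * in *rel, or returns -1 if `off` is past the end of the last file.
 */
long
file_at(const struct file *files, size_t nfiles, size_t off, size_t *rel)
{
    size_t i;

    assert(rel != NULL);
    for (i = 0; i < nfiles; i++) {
        if (off < files[i].len) {
            *rel = off;
            return (long)i;
        }
        off -= files[i].len;
    }
    return -1;
}
```

A piece that straddles two files then gets split at the point where file_at() starts returning the next index.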
z3bra
Good catch! I forgot to add a field for the file size. I'd use "path" rather than "name" though, as in the RFC, "name" refers to the filename without the path, while we'd store both in this case.
josuah
(12-08-2017, 08:42 AM)z3bra Wrote: I'd use "path" rather than "name" though.

That is fine.

I'm ok with adding it to the library then. :)
nas
Hi Guys, I'd like to try contributing to this project but two things:
1) This would be my first time doing some "real" thing in C. (I use Java for work)
2) How do we test it if it works?
josuah
(24-08-2017, 11:53 AM)nas Wrote: Hi Guys, I'd like to try contributing to this project but two things:

Nice! :) Welcome to the project. Make sure to join #unix @ irc.nixers.net if you have questions, or ask here.

(24-08-2017, 11:53 AM)nas Wrote: 1) This would be my first time doing some "real" thing in C. (I use Java for work)

I started C by contributing to dvtm. I sent awful patches, and martanne helped me to correct them (they were not merged into dvtm, though).

(24-08-2017, 11:53 AM)nas Wrote: 2) How do we test it if it works?

We can set up our own tracker and use existing clients. A BitTorrent system needs two things: clients and a tracker (not really a server).
  • The clients do all the heavy lifting: they pass the file directly to each other, peer to peer (as you know already), using the "peer wire protocol" (section 6).
  • The tracker helps the clients find each other (by maintaining a peer list), and this goes through HTTP GET requests.
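To give an idea of what such a GET request looks like, here is a sketch that builds an announce url. The parameter names (info_hash, peer_id, port, uploaded, downloaded, left) are the ones the spec lists; url_encode() and announce_url() are made-up names, and the tracker url is assumed to have no query string of its own:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Percent-encode `len` raw bytes into `out` (needs 3 * len + 1 bytes).
 * Unreserved characters (letters, digits, '-', '.', '_', '~') are
 * copied as-is. Returns the length of the resulting string.
 */
size_t
url_encode(const uint8_t *in, size_t len, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        uint8_t c = in[i];

        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~') {
            out[n++] = (char)c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
    return n;
}

/*
 * Build the announce request url for a fresh download.
 * Returns the snprintf() result: the url was truncated if the
 * value is >= size.
 */
int
announce_url(char *buf, size_t size, const char *tracker,
    const uint8_t hash[20], const uint8_t peer_id[20],
    unsigned port, unsigned long long left)
{
    char h[61], p[61];

    url_encode(hash, 20, h);
    url_encode(peer_id, 20, p);
    return snprintf(buf, size,
        "%s?info_hash=%s&peer_id=%s&port=%u"
        "&uploaded=0&downloaded=0&left=%llu",
        tracker, h, p, port, left);
}
```

The tracker answers with a bencoded dictionary that contains the peer list.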

The ideal is to test with many different trackers and clients, but if we have only one at first it is fine.

Open Tracker looks good to me, I will try to serve an instance at http://josuah.net:6969...



