CPU buffer vs cache


On a performance-sensitive section of code I asked myself which of the two is better (tests done in C++):

int a;


int a;

With the first version, a wave of information coming from the CPU core fills the various buffers (L1, L2, L3) all the way to main memory in a non-blocking way. This causes heavy consumption along the whole path of the information. It is faster on an overloaded in-order ARMv6 because it is non-blocking (2x), but 30% slower on an idle x86.

The second version is slower on an overloaded system because the CPU spends its time bringing the information back into cache, due to the context switches that flush the various caches. Of course this solution is preferable if it avoids heavy processing, but the first solution is better for the simple management of flags.
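The two snippets above are printed identically, so the exact code being compared did not survive; my assumption (not the original code) is that the comparison is between an unconditional store and a check-before-store on a shared flag:

```cpp
#include <cassert>

// Variant 1 (assumed): always store the flag. The store enters the
// store buffer and drains through L1/L2/L3 towards main memory without
// stalling the core, but it consumes bandwidth on the whole path even
// when the value is already set.
void set_flag_always(volatile int &flag) {
    flag = 1;
}

// Variant 2 (assumed): load first, store only on change. The load can
// stall on a cache miss (e.g. after a context switch has flushed the
// caches), but it avoids redundant store traffic.
void set_flag_if_needed(volatile int &flag) {
    if (flag != 1)
        flag = 1;
}
```

Variant 1 always generates store traffic through the cache hierarchy; variant 2 pays for a load that can miss after a context switch, which matches the behavior described above.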


File transfer for file copy/move

Hello, here I will talk about file transfer for copy/move operations. You first need to understand some basic terms.

I have discovered that each OS and FS behaves differently.

  • Windows/NTFS has a proper FS layer: the inode and data layers are asynchronous, synchronous mode is not allowed, and flush therefore returns instantly regardless of whether the data has really been written to disk (this prevents freezing the application; in any case, data still in memory is lost on failure whether the application blocks or not).
  • Linux/ext4 is only partially asynchronous on the data layer (in a read/write loop, the read blocks the write and the write blocks the read); inode creation and manipulation are synchronous, but since the volume is low, aggressive inode access can be done without a big loss of performance.
In my work I have found a lot of problems; the main ones were:
  • Closing a file descriptor after writing a lot of content triggers a flush (or similar), which slows everything down until the whole file is written, regardless of whether the disk is idle or only lightly loaded.
  • In a read-one-block / write-one-block loop, running out of cache/buffer makes one operation block the other for no good reason.
  • Inode accesses can be issued in parallel so the OS can group them. During a parallel copy, however, the mkdir calls that create the destination folders can compete with each other.
  • The graphics thread, and therefore the main thread, can be slowed down in some conditions (e.g. Linux with slow open-source graphics drivers and a large file-copy list), and the I/O access is blocking.

Ultracopier 0.1

while { read(position); write(position); }

  • Advantage: very simple, and used by most developers who just need a simple file copy.
  • Disadvantage: a cache miss makes the read block, so even when the buffer is not full and the write could proceed without blocking, the loop (and the thread) is stuck in the read function, and vice versa.
  • Implementation mistake: a slowdown in the interface slows down the copy.

Ultracopier 0.2

Thread 1: while { read(position); }, a thread pool for writes: while { write(position); } close();, with pipe-like communication between them.

  • Advantage: the close function blocks only inside its own thread, and while it is blocked another write thread takes over. The read no longer blocks the write, and vice versa (each medium keeps its own advantage, and the buffer and cache levels can evolve separately and compete with each other).
  • Disadvantage: it is complicated in some programming aspects; I had to use goto to keep the code small, and ended up with one very big read function. Deciding which write thread to use can also be complicated. The list parsing and the read share the same code, for intuitive programming. The extra work, such as variable initialization, is not parallelized, to avoid slowing down operations that are not real copies. Parallel inode and data operations are not possible. Destination-file write corruption cannot be recovered.
  • Implementation mistake: built with threads and a lot of blocking functions (not events, which would have given a cleaner design), based on the 0.1 design. A slowdown in the interface slows down the copy.

Ultracopier 0.3

Thread 1: the copy list, sending and receiving events (start transfer, stop transfer); a list of {read thread, write thread, transfer thread} per transfer, with pipe-like communication.

  • Advantage: can parallelize inode and data access, and prevents non-copy operations from slowing down the copy. Can group inode accesses via parallel access, even though parallel data access is bad in general. Gets asynchronous-like behavior on every OS/FS (including synchronous ones such as Linux/ext4). Much cleaner design, with separate control possible over each transfer. Can recover from destination-file write corruption.
  • Disadvantage: requires mastering multi-threading, data locality, and a lot of advanced algorithms.
The new copy engine is the best I have made; if you want to write your own copy engine and do better, build it as an Ultracopier plugin and compare it with mine.
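The 0.3 event-driven control can be sketched like this (the names and the event set are illustrative, not the real Ultracopier API):

```cpp
#include <map>
#include <queue>
#include <utility>

// 0.3-style control: the copy-list thread drives every transfer with
// events instead of blocking calls, so each transfer can be started
// and stopped independently of the others.
enum class Event { Start, Stop };
enum class State { Queued, Running, Stopped };

struct CopyList {
    std::map<int, State> transfers;            // transfer id -> state
    std::queue<std::pair<int, Event>> events;  // pipe-like event queue

    void send(int id, Event e) { events.push({id, e}); }

    // One pass of the event loop: apply every pending event.
    void process() {
        while (!events.empty()) {
            std::pair<int, Event> ev = events.front();
            events.pop();
            State &s = transfers[ev.first];
            if (ev.second == Event::Start)
                s = State::Running;
            else if (ev.second == Event::Stop && s == State::Running)
                s = State::Stopped;
        }
    }
};
```

In the real engine each running transfer would own its {read thread, write thread, transfer thread} set; here only the event plumbing is shown.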