-
Notifications
You must be signed in to change notification settings - Fork 228
extend preprocess_data_dist to handle jsonl files #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
269af4e
9ba081b
d29a702
ed49713
e94f2a0
687ff32
af59545
4f648a0
9f2ba6a
72d6c9c
8b67bec
3eca1f3
a691b48
e4a34e2
8b168ca
7a02693
eca2940
b14491d
ec11281
354d13b
2dc3f7a
980e904
ebd20a6
69b2f49
50de06a
af290ad
4b58c74
71a2fdc
73d3a24
b9e69be
da615c6
c42f41f
a3a7d53
163310a
01b2be0
4b6e8ff
ea08555
2524fce
ca14d48
d482f36
28d76f5
f706108
f122883
57c012e
eed8327
a75cfc2
afcfcf9
74b733a
dadb51b
a2f8fa0
39e6cd7
2a29d99
d6fa895
1216c0a
a64d3da
fde439e
fb274bf
d428c02
ba14351
852fdd0
61f4b46
18881ae
bd6f41f
3488d0b
1305fe9
6bcac1f
813d068
0510081
1fea302
d360313
20a43af
7b08347
8d448bc
6f7519f
3f9078d
b9aa845
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -144,6 +144,7 @@ def all_sum_(self, vals: np.array): | |
|
|
||
| def open(self, filename, truncate=None): | ||
| """Create, truncate, and open a file for writing shared by all ranks.""" | ||
| f = None | ||
|
|
||
| # Don't truncate existing file until all ranks reach this point | ||
| self.barrier() | ||
|
|
@@ -163,6 +164,8 @@ def open(self, filename, truncate=None): | |
|
|
||
| except Exception as e: | ||
| err = e | ||
| if f is not None: | ||
| f.close() | ||
|
|
||
| # Verify that rank 0 created the file | ||
| self.allraise_if(err) | ||
|
|
@@ -176,12 +179,20 @@ def open(self, filename, truncate=None): | |
| err = e | ||
|
|
||
| # Verify that all ranks successfully opened the file | ||
| if not self.alltrue(err is None): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I overall curious why you need to close the file? Raise should destroy everything no?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, good point. I think the raise should do the trick, since the file handle will not be returned and go out of scope in that case. I'll simplify that code. |
||
| # Someone failed to open the file. | ||
| # If we succeeded, close our file. | ||
| if f is not None: | ||
| f.close() | ||
|
|
||
| # All raise an exception if anyone did | ||
| self.allraise_if(err) | ||
|
|
||
| return f | ||
|
|
||
| def openread(self, filename): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't that just
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the open-for-write function, I have that written as a two phase process, where rank 0 creates and truncates the file, then other ranks open the file afterwards. In the open-for-read, all procs open the file simultaneously. I think it's useful to keep the two-phase step for creating the file, because create/truncate can be expensive on some file systems. However, I suspect this can be refactored to have a single open call so that |
||
| """Open a shared file for reading by all ranks.""" | ||
| f = None | ||
|
|
||
| # Don't attempt to open until all ranks are ready. | ||
| self.barrier() | ||
|
|
@@ -198,7 +209,7 @@ def openread(self, filename): | |
| if not self.alltrue(err is None): | ||
| # Someone failed to open the file. | ||
| # If we succeeded, close our file. | ||
| if err is None: | ||
| if f is not None: | ||
| f.close() | ||
|
|
||
| # All raise an exception if anyone did | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead scope everything? It would allow exception handling to be specific? Like if truncate fails then we need to close the file.